Top AI developers by monthly star count
Top AI organization accounts by AI repo star count
Top AI projects by category star count
Fastest-growing projects by speed of gaining stars
Little-known developers who have created influential repos
Projects and developers that remain popular despite not having been updated for a long time
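The "growing speed" ranking above is presumably derived from star-count snapshots taken over time. A minimal sketch of such a metric, assuming per-repo (date, star count) snapshots; the repo names and star counts below are illustrative placeholders, not actual measurements:

```python
from datetime import date

def star_growth_per_day(snapshots):
    """Estimate star-gain speed from (date, star_count) snapshots, oldest first.

    Returns the average number of stars gained per day over the window.
    """
    (d0, s0), (d1, s1) = snapshots[0], snapshots[-1]
    days = max((d1 - d0).days, 1)  # avoid division by zero for same-day data
    return (s1 - s0) / days

# Hypothetical snapshots for two repos (values are made up for illustration).
repos = {
    "UI-TARS-desktop": [(date(2025, 6, 1), 12000), (date(2025, 7, 1), 15000)],
    "VLM-R1": [(date(2025, 6, 1), 4800), (date(2025, 7, 1), 5300)],
}

# Rank repos by growth speed, fastest first.
ranked = sorted(repos, key=lambda r: star_growth_per_day(repos[r]), reverse=True)
print(ranked)  # → ['UI-TARS-desktop', 'VLM-R1']
```

In practice the snapshots would come from periodically polling the GitHub API's `stargazers_count` field; the ranking itself is then just a sort on this rate.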
Ranking | Project | Project intro | Star count |
---|---|---|---|
1 | UI-TARS-desktop | The Open All-in-One Multimodal AI Agent Stack connecting Cutting-edge AI Models and Agent Infra. | 15.0K | |
2 | VLM-R1 | Solve Visual Understanding with Reinforced VLMs | 5.3K | |
3 | SpatialLM | SpatialLM: Training Large Language Models for Structured Indoor Modeling | 3.5K | |
4 | MiniMax-01 | The official repo of MiniMax-Text-01 and MiniMax-VL-01, large-language-model & vision-language-model based on Linear Attention | 3.0K | |
5 | Skywork-R1V | Skywork-R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning | 2.6K | |
6 | Local-File-Organizer | An AI-powered file management tool that preserves privacy by organizing local text and image files. Using the Llama3.2 3B and Llava v1.6 models with the Nexa SDK, it scans, restructures, and organizes files for quick, seamless access and easy retrieval. | 2.4K | |
7 | vlms-zero-to-hero | This series will take you on a journey from the fundamentals of NLP and Computer Vision to the cutting edge of Vision-Language Models. | 1.0K | |
8 | GLM-4.1V-Thinking | GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning. | 644 | |
9 | VisRAG | Parsing-free RAG supported by VLMs | 611 | |
10 | UniWorld-V1 | UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation | 583 | |
11 | vlmrun-hub | A hub for various industry-specific schemas to be used with VLMs. | 510 | |
12 | llama-assistant | AI-powered assistant to help you with your daily tasks, powered by Llama 3, DeepSeek R1, and many more models on HuggingFace. | 486 | |
13 | LLM-RL-Visualized | 🌟 100+ original LLM/RL principle diagrams 📚, contributed by the author of "大模型算法" (Large Model Algorithms) 🎉 (100+ LLM/RL algorithm maps) | 458 | |
14 | ghostwriter | Use the reMarkable2 as an interface to vision-LLMs (ChatGPT, Claude, Gemini). Ghost in the machine! | 436 | |
15 | Flame-Code-VLM | Flame is an open-source multimodal AI system designed to translate UI design mockups into high-quality React code. It leverages vision-language modeling, automated data synthesis, and structured training workflows to bridge the gap between design and front-end development. | 367 | |
16 | joycaption | JoyCaption is an image captioning Visual Language Model (VLM) being built from the ground up as a free, open, and uncensored model for the community to use in training Diffusion models. | 349 | |
17 | VoRA | [Fully open] [Encoder-free MLLM] Vision as LoRA | 299 | |
18 | VLM2Vec | This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR 2025] | 287 | |
19 | open-cuak | Reliable Automation Agents at Scale | 279 | |
20 | dingo | Dingo: A Comprehensive AI Data Quality Evaluation Tool | 256 | |
21 | Kolosal | Kolosal AI is an OpenSource and Lightweight alternative to LM Studio to run LLMs 100% offline on your device. | 227 | |
22 | Llama3.2-Vision-Finetune | An open-source implementation for fine-tuning the Llama3.2-Vision series by Meta. | 156 | |
23 | ChatRex | Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding | 156 | |
24 | Namo-R1 | A CPU Realtime VLM in 500M. Surpassed Moondream2 and SmolVLM. Training from scratch with ease. | 133 | |
25 | qapyq | An image viewer and AI-assisted editing/captioning/masking tool that helps with curating datasets for generative AI models, finetunes and LoRA. | 130 | |
26 | video-search-and-summarization | Blueprint for ingesting massive volumes of live or archived video and extracting insights for summarization and interactive Q&A | 130 | |
27 | simlingo | [CVPR 2025, Spotlight] SimLingo (CarLLava): Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment | 126 | |
28 | BALROG | Benchmarking Agentic LLM and VLM Reasoning On Games | 117 | |
29 | BreezeApp | BreezeAPP is a purely on-device AI application for Android and iOS. Download it from the App Store and enjoy a range of AI features fully offline. Source code is provided by MediaTek Research. It promotes two ideas: anyone is free to choose and run their own LLM on their phone, and any developer can easily create purely phone-based AI apps. | 110 | |
30 | TrustEval-toolkit | Toolkit for evaluating the trustworthiness of generative foundation models. | 105 | |
31 | Surveillance_Video_Summarizer | VLM driven tool that processes surveillance videos, extracts frames, and generates insightful annotations using a fine-tuned Florence-2 Vision-Language Model. Includes a Gradio-based interface for querying and analyzing video footage. | 102 | |
32 | pyvisionai | The PyVisionAI Official Repo | 97 | |
33 | Modality-Integration-Rate | The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate". | 96 | |
34 | Helpful-Doggybot | Helpful DoggyBot: Open-World Object Fetching using Legged Robots and Vision-Language Models | 90 | |
35 | Mini-LLaVA | A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability. | 89 | |
36 | VLM-Grounder | [CoRL 2024] VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding | 85 | |
37 | SparseVLMs | Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference". | 77 | |
38 | Awesome-Interleaving-Reasoning | Interleaving Reasoning: Next-Generation Reasoning Systems for AGI | 77 | |
39 | tokens | A token management platform that reverse-engineers the conversation interfaces of ChatGPT, Cursor, Grok, Claude, Windsurf, Gemini, and Sora, converting them into the OpenAI API format. | 76 | |
40 | Mirage | Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025) | 63 | |
41 | 3d-conditioning | Enhance and modify high-quality compositions using real-time rendering and generative AI output without affecting a hero product asset. | 61 | |
42 | SeeDo | [IROS 2025] Human Demo Videos to Robot Action Plans | 54 | |
43 | InteractVLM | [CVPR 2025] InteractVLM: 3D Interaction Reasoning from 2D Foundational Models | 51 | |
44 | GVA-Survey | Official repository of the paper "Generalist Virtual Agents: A Survey on Autonomous Agents Across Digital Platforms" | 49 | |
45 | ReachQA | Code & Dataset for Paper: "Distill Visual Chart Reasoning Ability from LLMs to MLLMs" | 48 | |
46 | SeeGround | [CVPR'25] SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding | 46 | |
47 | vlm-grpo | An implementation of GRPO for Unsloth's VLMs training | 40 | |
48 | all-things-multimodal | Hub for researchers exploring VLMs and Multimodal Learning:) | 40 | |
49 | Emma-X | Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning | 39 | |
50 | awesome-turkish-language-models | A curated list of Turkish AI models, datasets, papers | 38 | |
51 | PhysBench | [ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding> | 36 | |
52 | reverse_vlm | 🔥 Official implementation of "Generate, but Verify: Reducing Visual Hallucination in Vision-Language Models with Retrospective Resampling" | 34 | |
53 | Video-Bench | Video Generation Benchmark | 32 | |
54 | AIN | AIN - The First Arabic Inclusive Large Multimodal Model. It is a versatile bilingual LMM excelling in visual and contextual understanding across diverse domains. | 31 | |
55 | vision-ai-checkup | Take your LLM to the optometrist. | 31 | |
56 | IR3D-Bench | Official Code of IR3D-Bench: Evaluating Vision-Language Model Scene Understanding as Agentic Inverse Rendering | 30 | |
57 | Automodel | Day-0 support for any Hugging Face model leveraging PyTorch native functionalities while providing performance and memory optimized training and inference recipes. | 26 | |
58 | SAM_Molmo_Whisper | An integration of Segment Anything Model, Molmo, and Whisper to segment objects using voice and natural language. | 23 | |
59 | saint | A training-free approach to accelerate ViTs and VLMs by pruning redundant tokens based on similarity | 22 | |
60 | gptparse | Document parser for RAG | 20 | |
61 | Re-Align | A novel alignment framework that leverages image retrieval to mitigate hallucinations in Vision Language Models. | 19 | |
62 | bubbaloop | 🦄 Serving Platform for Spatial AI and Robotics. | 19 | |
63 | cadrille | cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning | 19 | |
64 | SubtitleAI | An AI-powered tool for summarizing YouTube videos by generating scene descriptions, translating them, and creating subtitled videos with text-to-speech narration | 17 | |
65 | ScaleDP | ScaleDP is an Open-Source extension of Apache Spark for Document Processing | 13 | |
66 | wildcard | The latest guide to the WildCard virtual credit card: how to register, how to activate a WildCard credit card, and how to top it up and withdraw funds. | 13 | |
67 | srbench | Source code for the Paper "Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models" | 12 | |
68 | CAD-GPT | [AAAI2025] CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs | 12 | |
69 | TRIM | We introduce a new approach, Token Reduction using CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without sacrificing their performance. | 11 | |
70 | computer-agent-arena-hub | Computer Agent Arena Hub: Compare & Test AI Agents on Crowdsourced Real-World Computer Use Tasks | 11 | |
71 | Cross-the-Gap | [ICLR 2025] - Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion | 11 | |
72 | VLM-Safety-MU | Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-tuning | 11 | |
73 | CII-Bench | Can MLLMs Understand the Deep Implication Behind Chinese Images? | 9 | |
74 | sentinel | Securade.ai Sentinel - A monitoring and surveillance application that enables visual Q&A and video captioning for existing CCTV cameras. | 9 | |
75 | EgoNormia | EgoNormia: Benchmarking Physical Social Norm Understanding in VLMs | 9 | |
76 | ImagineFSL | Official implementation of "ImagineFSL: Self-Supervised Pretraining Matters on Imagined Base Set for VLM-based Few-shot Learning" [CVPR 2025 Highlight] | 9 | |
77 | Awesome-HCI-LLM | Awesome-HCI (Ubiquitous, LLM, MLLM, Agent, RAG, Embodied-AI, RLHF) | 9 | |
78 | OptVL | AVL + python + optimization = OptVL | 9 | |
79 | MyColPali | A PyQt6 application using ColPali and OpenAI to demonstrate efficient document retrieval with vision-language models | 8 | |
80 | DASH | DASH: Detection and Assessment of Systematic Hallucinations of VLMs | 8 | |
81 | vlm-api | REST API for computing cross-modal similarity between images and text using the ColPaLI vision-language model | 7 | |
82 | Chitrarth | Chitrarth: Bridging Vision and Language for a Billion People | 7 | |
83 | ollama | Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 2, and other large language models. | 7 | |
84 | VisPruner | [ICCV 2025] Official code for paper: Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs | 7 | |
85 | CoIN | [ICCV 25] Official repository of "Collaborative Instance Object Navigation: Leveraging Uncertainty-Awareness to Minimize Human-Agent Dialogues" | 7 | |
86 | VLM-CADFeatureRecognition | This repository provides code and resources for automating manufacturing feature recognition in CAD designs using vision-language models. | 7 | |
87 | ide-cap-chan | ide-cap-chan is a utility for batch image captioning with natural language using various VL models | 6 | |
88 | Geminio | [ICCV 2025] Geminio is a VLM-powered gradient inversion attack in federated learning (FL). It allows the adversary (the FL server) to describe the data of value and reconstruct the victim client's private data matching the description. | 6 | |
89 | RadVLM | A Multitask Conversational Vision-Language Model for Radiology | 6 | |
90 | Dex-GAN-Grasp | DexGANGrasp: Dexterous Generative Adversarial Grasping Synthesis for Task-Oriented Manipulation (IEEE-RAS Humanoids 2024), DOI: 10.1109/Humanoids58906.2024.10769950 | 5 | |
91 | RoomAligner | A focus on aligning room elements for better flow and space utilization. | 5 | |
92 | google-veo3-from-scratch | An implementation of Google Veo 3, a cutting-edge text-to-video generation system, built from scratch. 🎥 Explore the code to create high-quality videos from text prompts and enhance your projects with advanced AI capabilities. 🌟 | 5 | |
93 | VLM-ZSAD-Paper-Review | Reviews of papers on zero-shot anomaly detection using vision-Language models | 4 | |
94 | Multimodal-VideoRAG | Multimodal-VideoRAG: Using BridgeTower Embeddings and Large Vision Language Models | 4 | |
95 | LLMs-Journey | Various LLM resources and experiments | 4 | |
96 | VLMLight | Official implementation of VLMLight | 4 | |
97 | casp | [CVPR 2025 Highlight] CASP: Compression of Large Multimodal Models Based on Attention Sparsity | 4 | |
98 | ComfyUI-YALLM-node | Yet another set of LLM nodes for ComfyUI (for local/remote OpenAI-like APIs, multi-modal models supported) | 3 | |
99 | CIDER | This is the official repository for Cross-modality Information Check for Detecting Jailbreaking in Multimodal Large Language Models. | 3 | |
100 | awesome-text-to-video-plus | The Ultimate Guide to Effortlessly Creating AI Videos for Social Media Go From Text to Eye-Catching Videos in Just a Few Steps | 3 |