Top AI developers by monthly star count
Top AI organization accounts by AI repo star count
Top AI projects by star count within each category
Fastest-growing projects, ranked by the rate at which they gain stars
Lesser-known developers who have created influential repos
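Rankings like the table below boil down to sorting repo metadata by star count. A minimal sketch, assuming the repo records have already been fetched (e.g. from the GitHub REST API's `GET /search/repositories` endpoint, which returns a `stargazers_count` field); the helper names here are illustrative, not part of any library:

```python
# Rank repos by star count and format counts the way the table below
# does (e.g. 4900 -> "4.9K"). Input records mimic the GitHub API's
# repository objects; only "name" and "stargazers_count" are used.

def format_stars(n: int) -> str:
    """Render a star count in the table's style: 994 -> '994', 4900 -> '4.9K'."""
    return f"{n / 1000:.1f}K" if n >= 1000 else str(n)

def rank_repos(repos: list[dict]) -> list[tuple[int, str, str]]:
    """Return (rank, name, formatted_stars) rows, sorted by stars descending."""
    ordered = sorted(repos, key=lambda r: r["stargazers_count"], reverse=True)
    return [(i + 1, r["name"], format_stars(r["stargazers_count"]))
            for i, r in enumerate(ordered)]

# Sample data taken from the table below.
sample = [
    {"name": "SUPIR", "stargazers_count": 4000},
    {"name": "sglang", "stargazers_count": 4900},
    {"name": "ShareGPT4Video", "stargazers_count": 994},
]
for rank, name, stars in rank_repos(sample):
    print(f"{rank} | {name} | {stars}")
```

The monthly and growth-speed lists work the same way, just sorting by the star delta over a time window instead of the absolute count.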
Ranking | Project | Project intro | Star count |
---|---|---|---|
1 | sglang | SGLang is a fast serving framework for large language models and vision language models. | 4.9K | |
2 | SUPIR | SUPIR aims at developing Practical Algorithms for Photo-Realistic Image Restoration In the Wild. Our new online demo is also released at suppixel.ai. | 4.0K | |
3 | MobileAgent | Mobile-Agent: The Powerful Mobile Device Operation Assistant Family | 2.4K | |
4 | SoraWebui | SoraWebui is an open-source Sora web client, enabling users to easily create videos from text with OpenAI's Sora model. | 2.3K | |
5 | DeepSeek-VL | DeepSeek-VL: Towards Real-World Vision-Language Understanding | 1.9K | |
6 | cambrian | Cambrian-1 is a family of multimodal LLMs with a vision-centric design. | 1.5K | |
7 | minisora | MiniSora: A community that aims to explore the implementation path and future development direction of Sora. | 1.1K | |
8 | ShareGPT4Video | An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions | 994 | |
9 | sorafm | Sora AI Video Generator by Sora.FM | 907 | |
10 | Bunny | A family of lightweight multimodal models. | 808 | |
11 | VLMEvalKit | Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 30+ benchmarks | 747 | |
12 | LLaVA-pp | 🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3) | 730 | |
13 | Osprey | [CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning" | 728 | |
14 | AlphaCLIP | [CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want | 595 | |
15 | Awesome-LM-SSP | A reading list for large models safety, security, and privacy (including Awesome LLM Security, Safety, etc.). | 534 | |
16 | Groma | Grounded Multimodal Large Language Model with Localized Visual Tokenization | 466 | |
17 | TinyLLaVA_Factory | A Framework of Small-scale Large Multimodal Models | 462 | |
18 | EAGLE | EAGLE: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders | 281 | |
19 | ai-devices | AI Device Template Featuring Whisper, TTS, Groq, Llama3, OpenAI and more | 262 | |
20 | ViP-LLaVA | [CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts | 242 | |
21 | meme_search | Index your memes by their content and text, making them easily retrievable for your meme warfare pleasures. Find funny fast. | 235 | |
22 | ScreenAgent | ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model (IJCAI-24) | 234 | |
23 | chatgpt-share-web | A complete recreation of the official ChatGPT and Claude web apps, including all of their features, with a full user-account system and traffic-monetization system. | 233 | |
24 | Awesome-Open-AI-Sora | Sora AI Awesome List – Your go-to resource hub for all things Sora AI, OpenAI's groundbreaking model for crafting realistic scenes from text. Explore a curated collection of articles, videos, podcasts, and news about Sora's capabilities, advancements, and more. | 208 | |
25 | OPERA | [CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation | 206 | |
26 | awesome-vlm-architectures | Famous Vision Language Models and Their Architectures | 200 | |
27 | colpali | The code used to train and run inference with the ColPali architecture. | 196 | |
28 | SoraFlows | The most powerful and modular Sora WebUI, API, and backend for OpenAI's Sora model, collecting the highest-quality prompts for Sora. Built with NextJs and Tailwind CSS. | 189 | |
29 | RLHF-V | [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback | 188 | |
30 | IAmDirector-Text2Video-NextJS-Client | Everyone can become a director: the open-source NextJS front end of an AI-driven platform for automatic movie/video generation, from GPT-based script generation to text-to-video movie generation. A free-to-try AI video creation platform that aims to let anyone quickly turn everyday ideas into high-quality videos, whether films, marketing videos, or social-media content. | 175 | |
31 | Awesome-LLM-related-Papers-Comprehensive-Topics | Awesome LLM-related papers and repos on very comprehensive topics. | 170 | |
32 | EVE | EVE: Encoder-Free Vision-Language Models from BAAI | 168 | |
33 | mlx-vlm | MLX-VLM is a package for running Vision LLMs locally on your Mac using MLX. | 166 | |
34 | Phi-3-Vision-MLX | Phi-3 for Mac: Locally-run Vision and Language Models for Apple Silicon | 160 | |
35 | PromptKD | [CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models" | 157 | |
36 | Open-LLaVA-NeXT | An open-source implementation for training LLaVA-NeXT. | 148 | |
37 | seemore | From scratch implementation of a vision language model in pure PyTorch | 136 | |
38 | ollama-ai | A Ruby gem for interacting with Ollama's API that allows you to run open source AI LLMs (Large Language Models) locally. | 134 | |
39 | embodied-agents | Seamlessly integrate state-of-the-art transformer models into robotics stacks | 134 | |
40 | t2v_metrics | Evaluating text-to-image/video/3D models with VQAScore | 132 | |
41 | Mantis | Official code for Paper "Mantis: Multi-Image Instruction Tuning" | 127 | |
42 | ELM | [ECCV 2024] Embodied Understanding of Driving Scenarios | 119 | |
43 | image-textualization | Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions | 117 | |
44 | Prompt-Highlighter | [CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs | 110 | |
45 | LLaRA | LLaRA: Large Language and Robotics Assistant | 110 | |
46 | captcha-solver | A basic Google reCAPTCHA solver using llava-v1.6-7b | 99 | |
47 | AUITestAgent | AUITestAgent is the first automatic, natural language-driven GUI testing tool for mobile apps, capable of fully automating the entire process of GUI interaction and function verification. | 91 | |
48 | VidProM | VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models | 91 | |
49 | InCTRL | Official implementation of CVPR'24 paper 'Toward Generalist Anomaly Detection via In-context Residual Learning with Few-shot Sample Prompts'. | 86 | |
50 | KarmaVLM | 🧘🏻♂️KarmaVLM (相生): A family of efficient and powerful vision-language models. | 84 | |
51 | Awesome-VLGFM | A Survey on Vision-Language Geo-Foundation Models (VLGFMs) | 82 | |
52 | graphist | Official Repo of Graphist | 82 | |
53 | Spider2-V | Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? | 82 | |
54 | freegenius | FreeGenius AI, an advanced AI assistant that can talk and take multi-step actions. Supports numerous open-source LLMs via Llama.cpp or Ollama or Groq Cloud API, with optional integration with AutoGen agents, OpenAI API, Google Gemini Pro and unlimited plugins. | 81 | |
55 | merlin | [ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds | 77 | |
56 | RobustVLM | [ICML 2024] Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models | 73 | |
57 | MoE-Mamba | Implementation of MoE Mamba from the paper: "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in Pytorch and Zeta | 72 | |
58 | rai | RAI is a multi-vendor agent framework for robotics, utilizing Langchain and ROS 2 tools to perform complex actions, defined scenarios, free interface execution, log summaries, voice interaction and more. | 71 | |
59 | PsyDI | PsyDI: Towards a Personalized and Progressively In-depth Chatbot for Psychological Measurements. (e.g. MBTI Measurement Agent) | 68 | |
60 | VL-RLHF | A RLHF Infrastructure for Vision-Language Models | 67 | |
61 | MM-NIAH | This is the official implementation of the paper "Needle In A Multimodal Haystack" | 66 | |
62 | Uniaa | Unified Multi-modal IAA Baseline and Benchmark | 66 | |
63 | Know-Your-Neighbors | [CVPR 2024] 🏡Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning | 64 | |
64 | MMTrustEval | A toolbox for benchmarking trustworthiness of multimodal large language models (MultiTrust) | 61 | |
65 | matryoshka-mm | Matryoshka Multimodal Models | 61 | |
66 | GPA-LM | A live list of papers on game-playing agents and large multimodality models, accompanying "A Survey on Game Playing Agents and Large Models: Methods, Applications, and Challenges". | 57 | |
67 | VoCo-LLaMA | VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models". | 55 | |
68 | rscir | Official PyTorch implementation and benchmark dataset for IGARSS 2024 ORAL paper: "Composed Image Retrieval for Remote Sensing" | 54 | |
69 | Ovis | A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings. | 52 | |
70 | M3DBench | M3DBench introduces a comprehensive 3D instruction-following dataset with support for interleaved multi-modal prompts. Furthermore, M3DBench provides a new benchmark to assess large models across 3D vision-centric tasks. | 52 | |
71 | Dream2Real | [ICRA 2024] Dream2Real: Zero-Shot 3D Object Rearrangement with Vision-Language Models | 50 | |
72 | LLaVA-JP | LLaVA-JP is a Japanese VLM trained with the LLaVA method | 47 | |
73 | captain | Give your computer an AI Brain | 47 | |
74 | DMN | CVPR2024: Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models | 45 | |
75 | STIC | Enhancing Large Vision Language Models with Self-Training on Image Comprehension. | 45 | |
76 | Awesome-Robotics-3D | A curated list of 3D vision papers relating to the robotics domain in the era of large models (LLMs/VLMs), inspired by awesome-computer-vision; includes papers, code, and related websites | 44 | |
77 | ollama-open-webui | Self-host a ChatGPT-style web interface for Ollama 🦙 | 42 | |
78 | Awesome-SD-Inference | 📖A small curated list of Awesome SD/DiT/ViT/Diffusion Inference with Distributed/Caching/Sampling: DistriFusion, PipeFusion, AsyncDiff, DeepCache, Block Caching etc. | 42 | |
79 | CVPR2024_MAVL | Multi-Aspect Vision Language Pretraining - CVPR2024 | 39 | |
80 | CharXiv | CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs | 39 | |
81 | MLM_Filter | Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters". | 39 | |
82 | LLaVA-CLI-with-multiple-images | LLaVA inference with multiple images at once for cross-image analysis. | 38 | |
83 | FreeVA | FreeVA: Offline MLLM as Training-Free Video Assistant | 38 | |
84 | VisualWebBench | Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" | 38 | |
85 | UndergraduateDissertation | Undergraduate Dissertation of Guilin University of Electronic Technology | 38 | |
86 | imagenet_d | [CVPR2024 Highlight] Official Code for "ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object" | 36 | |
87 | LLaVA-MORE | LLaVA-MORE: Enhancing Visual Instruction Tuning with LLaMA 3.1 | 33 | |
88 | CARES | [arXiv'24 & ICMLW'24] CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models | 32 | |
89 | Elysium | [ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM | 32 | |
90 | SpeechLLM | This repository contains the training, inference, evaluation code for SpeechLLM models and details about the model releases on huggingface. | 31 | |
91 | LLM-Image-Classification | Image Classification Testing with LLMs | 30 | |
92 | WCA | [ICML 2024] "Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models" | 29 | |
93 | AAPL | AAPL: Adding Attributes to Prompt Learning for Vision-Language Models (CVPRw 2024) | 28 | |
94 | KDPL | [ECCV 2024] - Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation | 26 | |
95 | Chinese-LLaVA-Med | A Chinese multimodal large model for medicine: Large Chinese Language-and-Vision Assistant for BioMedicine | 26 | |
96 | ConBench | Official implementation of paper "Unveiling the Tapestry of Consistency in Large Vision-Language Models". | 26 | |
97 | VLM-Captioning-Tools | Python scripts to use for captioning images with VLMs | 24 | |
98 | VLGuard | [ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models. | 24 | |
99 | LLaVA-UHD-Better | A bug-free and improved implementation of LLaVA-UHD, based on the code from the official repo | 24 | |
100 | Kling-AI-Webui | Kling AI, Make Imagination Alive. This is a revolutionary text-to-video model like Sora. Kling AI WebUI is the open source project to integrate Kling AI Video Generation Model. | 24 |