Top AI Projects by Category

A list of the most influential open-source AI projects, grouped by category. (Data sourced from GitHub, updated automatically every day.)
Rankings
Organization Account
Related Project
Project intro
Star count
1

sgl-project

68 followers
-
sglang
SGLang is a fast serving framework for large language models and vision language models.
4.9K
2

Fanghua-Yu

95 followers
-
SUPIR
SUPIR aims at developing Practical Algorithms for Photo-Realistic Image Restoration In the Wild. Our new online demo is also released at suppixel.ai.
4.0K
3

X-PLUG

219 followers
-
MobileAgent
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
2.4K
4

SoraWebui

58 followers
-
SoraWebui
SoraWebui is an open-source Sora web client, enabling users to easily create videos from text with OpenAI's Sora model.
2.3K
5

deepseek-ai

1.1K followers
-
DeepSeek-VL
DeepSeek-VL: Towards Real-World Vision-Language Understanding
1.9K
6

cambrian-mllm

29 followers
-
cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
1.5K
7

mini-sora

35 followers
-
minisora
MiniSora: A community that aims to explore the implementation path and future development direction of Sora.
1.1K
8

ShareGPT4Omni

12 followers
-
ShareGPT4Video
An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
994
9

all-in-aigc

602 followers
-
sorafm
Sora AI Video Generator by Sora.FM
907
10

BAAI-DCAI

120 followers
-
Bunny
A family of lightweight multimodal models.
808
11

open-compass

221 followers
China
VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 30+ benchmarks
747
12

mbzuai-oryx

200 followers
-
LLaVA-pp
🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
730
13

CircleRadon

56 followers
Hangzhou
Osprey
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
728
14

SunzeY

80 followers
Shanghai, China
AlphaCLIP
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
595
15

ThuCCSLab

49 followers
Beijing, China
Awesome-LM-SSP
A reading list for large models safety, security, and privacy (including Awesome LLM Security, Safety, etc.).
534
16

FoundationVision

249 followers
-
Groma
Grounded Multimodal Large Language Model with Localized Visual Tokenization
466
17

TinyLLaVA

6 followers
-
TinyLLaVA_Factory
A Framework of Small-scale Large Multimodal Models
462
18

NVlabs

5.5K followers
-
EAGLE
EAGLE: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
281
19

developersdigest

322 followers
-
ai-devices
AI Device Template Featuring Whisper, TTS, Groq, Llama3, OpenAI and more
262
20

WisconsinAIVision

17 followers
-
ViP-LLaVA
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
242
21

neonwatty

297 followers
-
meme_search
Index your memes by their content and text, making them easily retrievable for your meme warfare pleasures. Find funny fast.
235
22

niuzaisheng

18 followers
Changchun, Jilin, China
ScreenAgent
ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model (IJCAI-24)
234
23

jurieo

9 followers
-
chatgpt-share-web
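A complete recreation of the official ChatGPT and Claude websites, including all of their features. Comes with a full user system and a traffic monetization system.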
233
24
Awesome-Open-AI-Sora
Sora AI Awesome List – Your go-to resource hub for all things Sora AI, OpenAI's groundbreaking model for crafting realistic scenes from text. Explore a curated collection of articles, videos, podcasts, and news about Sora's capabilities, advancements, and more.
208
25

shikiw

48 followers
-
OPERA
[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
206
26

gokayfem

86 followers
Turkey
awesome-vlm-architectures
Famous Vision Language Models and Their Architectures
200
27

illuin-tech

14 followers
Paris, France
colpali
The code used to train and run inference with the ColPali architecture.
196
28

SoraFlows

1 follower
-
SoraFlows
The most powerful and modular Sora WebUI, API, and backend with OpenAI's Sora model. Collecting the highest quality prompts for Sora. Built using Next.js and Tailwind CSS.
189
29

RLHF-V

13 followers
-
RLHF-V
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
188
30

JiaqiLi404

6 followers
Hong Kong
IAmDirector-Text2Video-NextJS-Client
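This project open-sources a Next.js-based front end, intended as a reference web front-end platform for generative-AI text-to-video, especially for film production from scriptwriting through video generation. Everyone can become a director. The Next.js front end of an AI-driven platform for automatic movie/video generation (from GPT script generation to text2video movie generation). This is a free-trial AI video creation platform that integrates GPT-based video script generation and video generation. Our vision is to let everyone become a director, turning any everyday idea into a high-quality video as quickly as possible, whether it is a film, a marketing video, or a social media video.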
175
31

shure-dev

12 followers
Japan
Awesome-LLM-related-Papers-Comprehensive-Topics
Awesome LLM-related papers and repos on very comprehensive topics.
170
32

baaivision

412 followers
China
EVE
EVE: Encoder-Free Vision-Language Models from BAAI
168
33

Blaizzy

156 followers
Poland
mlx-vlm
MLX-VLM is a package for running Vision LLMs locally on your Mac using MLX.
166
34

JosefAlbers

14 followers
-
Phi-3-Vision-MLX
Phi-3 for Mac: Locally-run Vision and Language Models for Apple Silicon
160
35

zhengli97

75 followers
-
PromptKD
[CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models"
157
36

xiaoachen98

71 followers
-
Open-LLaVA-NeXT
An open-source implementation for training LLaVA-NeXT.
148
37

AviSoori1x

71 followers
San Francisco
seemore
From scratch implementation of a vision language model in pure PyTorch
136
38

gbaptista

164 followers
Brazil
ollama-ai
A Ruby gem for interacting with Ollama's API that allows you to run open source AI LLMs (Large Language Models) locally.
134
39

mbodiai

15 followers
United States of America
embodied-agents
Seamlessly integrate state-of-the-art transformer models into robotics stacks
134
40

linzhiqiu

116 followers
-
t2v_metrics
Evaluating text-to-image/video/3D models with VQAScore
132
41

TIGER-AI-Lab

141 followers
Canada
Mantis
Official code for Paper "Mantis: Multi-Image Instruction Tuning"
127
42

OpenDriveLab

1.2K followers
Hong Kong
ELM
[ECCV 2024] Embodied Understanding of Driving Scenarios
119
43

sterzhang

4 followers
-
image-textualization
Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions
117
44

dvlab-research

560 followers
-
Prompt-Highlighter
[CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs
110
45

LostXine

79 followers
Bellevue, WA
LLaRA
LLaRA: Large Language and Robotics Assistant
110
46

notune

10 followers
-
captcha-solver
Basic Google reCAPTCHA solver using llava-v1.6-7b
99
47

bz-lab

0 followers
-
AUITestAgent
AUITestAgent is the first automatic, natural language-driven GUI testing tool for mobile apps, capable of fully automating the entire process of GUI interaction and function verification.
91
48

WangWenhao0716

67 followers
Sydney
VidProM
VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models
91
49

mala-lab

45 followers
Singapore
InCTRL
Official implementation of CVPR'24 paper 'Toward Generalist Anomaly Detection via In-context Residual Learning with Few-shot Sample Prompts'.
86
50

thomas-yanxin

261 followers
Beijing, China
KarmaVLM
🧘🏻‍♂️KarmaVLM (相生): A family of highly efficient and powerful visual language models.
84
51

zytx121

117 followers
Singapore
Awesome-VLGFM
A Survey on Vision-Language Geo-Foundation Models (VLGFMs)
82
52

graphic-design-ai

5 followers
-
graphist
Official Repo of Graphist
82
53

xlang-ai

387 followers
-
Spider2-V
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?
82
54

eliranwong

170 followers
Europe
freegenius
FreeGenius AI, an advanced AI assistant that can talk and take multi-step actions. Supports numerous open-source LLMs via Llama.cpp or Ollama or Groq Cloud API, with optional integration with AutoGen agents, OpenAI API, Google Gemini Pro and unlimited plugins.
81
55

Ahnsun

13 followers
-
merlin
[ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds
77
56

chs20

3 followers
-
RobustVLM
[ICML 2024] Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models
73
57

kyegomez

1.3K followers
Palo Alto
MoE-Mamba
Implementation of MoE Mamba from the paper: "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in Pytorch and Zeta
72
58

RobotecAI

134 followers
Poland
rai
RAI is a multi-vendor agent framework for robotics, utilizing Langchain and ROS 2 tools to perform complex actions, defined scenarios, free interface execution, log summaries, voice interaction and more.
71
59

opendilab

1.1K followers
China
PsyDI
PsyDI: Towards a Personalized and Progressively In-depth Chatbot for Psychological Measurements. (e.g. MBTI Measurement Agent)
68
60

TideDra

23 followers
Beijing, CN
VL-RLHF
A RLHF Infrastructure for Vision-Language Models
67
61

OpenGVLab

1.9K followers
-
MM-NIAH
This is the official implementation of the paper "Needle In A Multimodal Haystack"
66
62

KwaiVGI

342 followers
-
Uniaa
Unified Multi-modal IAA Baseline and Benchmark
66
63

ruili3

39 followers
Zürich, Switzerland
Know-Your-Neighbors
[CVPR 2024] 🏡Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning
64
64

thu-ml

594 followers
FIT Building, Tsinghua University, Beijing, China
MMTrustEval
A toolbox for benchmarking trustworthiness of multimodal large language models (MultiTrust)
61
65

mu-cai

45 followers
Madison, WI
matryoshka-mm
Matryoshka Multimodal Models
61
66

BAAI-Agents

62 followers
-
GPA-LM
This repo is a live list of papers on game playing and large multimodality model - "A Survey on Game Playing Agents and Large Models: Methods, Applications, and Challenges".
57
67

Yxxxb

33 followers
Shenzhen
VoCo-LLaMA
VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".
55
68

billpsomas

64 followers
Athens, Greece
rscir
Official PyTorch implementation and benchmark dataset for IGARSS 2024 ORAL paper: "Composed Image Retrieval for Remote Sensing"
54
69

AIDC-AI

3 followers
-
Ovis
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
52
70

OpenM3D

3 followers
-
M3DBench
M3DBench introduces a comprehensive 3D instruction-following dataset with support for interleaved multi-modal prompts. Furthermore, M3DBench provides a new benchmark to assess large models across 3D vision-centric tasks.
52
71

FlyCole

27 followers
UK
Dream2Real
[ICRA 2024] Dream2Real: Zero-Shot 3D Object Rearrangement with Vision-Language Models
50
72

tosiyuki

6 followers
-
LLaVA-JP
LLaVA-JP is a Japanese VLM trained with the LLaVA method
47
73

blib-la

13 followers
Germany
captain
Give your computer an AI Brain
47
74

YBZh

113 followers
Hong Kong
DMN
CVPR2024: Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models
45
75

yihedeng9

20 followers
-
STIC
Enhancing Large Vision Language Models with Self-Training on Image Comprehension.
45
76

zubair-irshad

212 followers
Silicon Valley, CA, USA
Awesome-Robotics-3D
A curated list of 3D vision papers relating to the robotics domain in the era of large models (i.e., LLMs/VLMs), inspired by awesome-computer-vision, including papers, code, and related websites
44
77

fly-apps

266 followers
-
ollama-open-webui
Self-host a ChatGPT-style web interface for Ollama 🦙
42
78

DefTruth

975 followers
Guangzhou, China
Awesome-SD-Inference
📖A small curated list of Awesome SD/DiT/ViT/Diffusion Inference with Distributed/Caching/Sampling: DistriFusion, PipeFusion, AsyncDiff, DeepCache, Block Caching etc.
42
79

HieuPhan33

11 followers
-
CVPR2024_MAVL
Multi-Aspect Vision Language Pretraining - CVPR2024
39
80

princeton-nlp

1.0K followers
-
CharXiv
CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs
39
81

Victorwz

90 followers
-
MLM_Filter
Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".
39
82

mapluisch

13 followers
Germany
LLaVA-CLI-with-multiple-images
LLaVA inference with multiple images at once for cross-image analysis.
38
83

whwu95

131 followers
Sydney, Australia
FreeVA
FreeVA: Offline MLLM as Training-Free Video Assistant
38
84

VisualWebBench

1 follower
-
VisualWebBench
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
38
85

bonjour-npy

19 followers
-
UndergraduateDissertation
Undergraduate Dissertation of Guilin University of Electronic Technology
38
86

chenshuang-zhang

6 followers
-
imagenet_d
[CVPR2024 Highlight] Official Code for "ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object"
36
87

aimagelab

109 followers
Modena, Italy
LLaVA-MORE
LLaVA-MORE: Enhancing Visual Instruction Tuning with LLaMA 3.1
33
88

richard-peng-xia

26 followers
-
CARES
[arXiv'24 & ICMLW'24] CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models
32
89

Hon-Wong

5 followers
-
Elysium
[ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM
32
90

skit-ai

41 followers
Bangalore, India
SpeechLLM
This repository contains the training, inference, evaluation code for SpeechLLM models and details about the model releases on huggingface.
31
91

robert-mcdermott

12 followers
Seattle
LLM-Image-Classification
Image Classification Testing with LLMs
30
92

tmlr-group

80 followers
Hong Kong
WCA
[ICML 2024] "Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models"
29
93

Gahyeonkim09

4 followers
-
AAPL
AAPL: Adding Attributes to Prompt Learning for Vision-Language Models (CVPRw 2024)
28
94

miccunifi

31 followers
Firenze - Viale Morgagni 65 - Italia
KDPL
[ECCV 2024] - Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation
26
95

BUAADreamer

45 followers
Beijing
Chinese-LLaVA-Med
Chinese medical multimodal large model: Large Chinese Language-and-Vision Assistant for BioMedicine
26
96
ConBench
Official implementation of paper "Unveiling the Tapestry of Consistency in Large Vision-Language Models".
26
97

ProGamerGov

132 followers
Multiverse
VLM-Captioning-Tools
Python scripts to use for captioning images with VLMs
24
98

ys-zong

14 followers
Edinburgh
VLGuard
[ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models.
24
99

ParadoxZW

43 followers
Hangzhou, China
LLaVA-UHD-Better
A bug-free and improved implementation of LLaVA-UHD, based on the code from the official repo
24
100

ai-aigc-studio

9 followers
-
Kling-AI-Webui
Kling AI, Make Imagination Alive. This is a revolutionary text-to-video model like Sora. Kling AI WebUI is the open source project to integrate Kling AI Video Generation Model.
24