mbzuai-oryx

A Library for Large Vision-Language Models

GitHub Data

Followers 281

Following 0

Links

https://ival-mbzuai.com

AI Project

Public repos: 26

Public gists: 0

LLMVoX

LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM

star: 103

fork: 6

language: Python

created at: 2025-03-06

updated at: 2025-03-12

AIN

AIN - The First Arabic Inclusive Large Multimodal Model. It is a versatile bilingual LMM excelling in visual and contextual understanding across diverse domains.

star: 31

fork: 0

language: HTML

created at: 2025-01-27

updated at: 2025-03-04

VideoGPT-plus

Official repository of the paper "VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding"

star: 252

fork: 15

language: Python

created at: 2024-06-13

updated at: 2025-02-07

LLaVA-pp

🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)

star: 828

fork: 61

language: Python

created at: 2024-04-26

updated at: 2025-02-07

MobiLlama

MobiLlama: Small Language Model tailored for edge devices

star: 619

fork: 48

language: Python

created at: 2024-02-23

updated at: 2025-02-04

GeoChat

[CVPR 2024 🔥] GeoChat, the first grounded Large Vision Language Model for Remote Sensing

star: 527

fork: 45

language: Python

created at: 2023-11-23

updated at: 2025-03-02

groundingLMM

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

star: 835

fork: 42

language: Python

created at: 2023-11-02

updated at: 2025-03-03

XrayGPT

[BIONLP@ACL 2024] XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models.

star: 483

fork: 58

language: Python

created at: 2023-05-18

updated at: 2025-02-07

Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.

star: 1.3K

fork: 111

language: Python

created at: 2023-05-18

updated at: 2025-02-09