mbzuai-oryx

A Library for Large Vision-Language Models

GitHub Data

Followers: 200
Following: 0

Public repos: 14
Public gists: 0

VideoGPT-plus

Official repository of the paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
star: 161
fork: 8
language: Python
created at: 2024-06-13
updated at: 2024-07-25

LLaVA-pp

🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
star: 730
fork: 48
language: Python
created at: 2024-04-26
updated at: 2024-08-11

MobiLlama

MobiLlama: A small language model tailored for edge devices
star: 563
fork: 40
language: Python
created at: 2024-02-23
updated at: 2024-08-05

GeoChat

[CVPR 2024 🔥] GeoChat, the first grounded Large Vision-Language Model for remote sensing
star: 382
fork: 29
language: Python
created at: 2023-11-23
updated at: 2024-08-14

groundingLMM

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
star: 645
fork: 30
language: Python
created at: 2023-11-02
updated at: 2024-08-11

XrayGPT

XrayGPT: Chest Radiograph Summarization using Medical Vision-Language Models
star: 438
fork: 51
language: Python
created at: 2023-05-18
updated at: 2024-08-09

Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversations about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' framework for video-based conversational models.
star: 1.0K
fork: 92
language: Python
created at: 2023-05-18
updated at: 2024-08-10