mbzuai-oryx

A Library for Large Vision-Language Models

GitHub Data

Followers 220
Following 0

AI Project

Public repos: 17
Public gists: 0

VideoGPT-plus

Official repository of the paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
star: 217
fork: 15
language: Python
created at: 2024-06-13
updated at: 2024-11-16

LLaVA-pp

🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
star: 813
fork: 61
language: Python
created at: 2024-04-26
updated at: 2024-11-17

MobiLlama

MobiLlama: Small Language Model tailored for edge devices
star: 595
fork: 45
language: Python
created at: 2024-02-23
updated at: 2024-11-17

GeoChat

[CVPR 2024 🔥] GeoChat, the first grounded Large Vision Language Model for Remote Sensing
star: 447
fork: 36
language: Python
created at: 2023-11-23
updated at: 2024-11-18

groundingLMM

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
star: 783
fork: 38
language: Python
created at: 2023-11-02
updated at: 2024-11-20

XrayGPT

[BIONLP@ACL 2024] XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models.
star: 468
fork: 56
language: Python
created at: 2023-05-18
updated at: 2024-11-20

Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
star: 1.2K
fork: 108
language: Python
created at: 2023-05-18
updated at: 2024-11-20