GitHub Data

Followers: 4433
Following: 0

Links

AI Project

Public repos: 1277
Public gists: 0

auto-round

Advanced Quantization Algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA and HPU. Seamlessly integrated with Torchao, Transformers, and vLLM.
stars: 524
forks: 42
language: Python
created at: 2024-01-04
updated at: 2025-06-27
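As background for what a quantization algorithm like auto-round improves on, here is a minimal sketch of the round-to-nearest (RTN) baseline: group-wise symmetric INT4 weight quantization. This is illustrative pure Python with hypothetical function names, not the auto-round API.

```python
# Minimal sketch of group-wise round-to-nearest (RTN) INT4 weight
# quantization -- the naive baseline that learned-rounding algorithms
# such as auto-round aim to beat. Function names are hypothetical.

def quantize_rtn_int4(weights, group_size=4):
    """Quantize a flat list of floats to signed INT4 values per group.

    Returns (quantized ints in [-8, 7], one scale per group)."""
    qs, scales = [], []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        # Symmetric scale: map the largest magnitude in the group to 7.
        scale = max(abs(w) for w in group) / 7 or 1.0
        scales.append(scale)
        qs.extend(max(-8, min(7, round(w / scale))) for w in group)
    return qs, scales

def dequantize(qs, scales, group_size=4):
    """Reconstruct approximate floats from INT4 codes and group scales."""
    return [q * scales[i // group_size] for i, q in enumerate(qs)]

w = [0.12, -0.55, 0.31, 0.07, 1.4, -0.9, 0.02, 0.66]
q, s = quantize_rtn_int4(w)
w_hat = dequantize(q, s)
```

Per-group scales keep the rounding error proportional to each group's largest weight, which is why smaller group sizes trade memory for accuracy; auto-round-style methods go further by tuning the rounding decisions themselves.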

llm-on-ray

Pretrain, finetune and serve LLMs on Intel platforms with Ray
stars: 109
forks: 30
language: Python
created at: 2023-11-13
updated at: 2025-02-09

ipex-llm-tutorial

Accelerate LLM with low-bit (FP4 / INT4 / FP8 / INT8) optimizations using ipex-llm
stars: 157
forks: 39
language: Jupyter Notebook
created at: 2023-07-27
updated at: 2025-02-04

xFasterTransformer

stars: 428
forks: 69
language: C++
created at: 2023-06-14
updated at: 2025-06-17

intel-extension-for-transformers

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡
stars: 2.2K
forks: 211
language: Python
created at: 2022-11-11
updated at: 2025-02-10

neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
stars: 2.3K
forks: 262
language: Python
created at: 2020-07-21
updated at: 2025-02-10
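Alongside quantization, the description above mentions sparsity. A common building block there is unstructured magnitude pruning: zeroing the smallest-magnitude weights. The sketch below is illustrative pure Python under that assumption, not the neural-compressor API.

```python
# Minimal sketch of unstructured magnitude pruning -- one of the basic
# sparsity techniques behind model-compression toolkits. Pure Python for
# illustration; this is not the neural-compressor API.

def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude entries until the requested
    fraction of weights is zero."""
    n_prune = int(len(weights) * sparsity)
    # Indices of the n_prune smallest-magnitude entries.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(order[:n_prune])
    return [0.0 if i in dropped else w for i, w in enumerate(weights)]

w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]
pruned = magnitude_prune(w, sparsity=0.5)
# The three smallest magnitudes (0.01, -0.05, 0.2) are zeroed.
```

In practice the surviving weights are usually fine-tuned afterwards to recover accuracy, and structured variants prune whole blocks or channels so the sparsity maps to real speedups.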

lms

stars: 35
forks: 10
language: C++
created at: 2019-04-09
updated at: 2025-02-09

lmbench

stars: 305
forks: 140
language: C
created at: 2018-07-31
updated at: 2025-04-23

ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, DeepSpeed, Axolotl, etc.
stars: 8.1K
forks: 1.4K
language: Python
created at: 2016-08-29
updated at: 2025-07-01