STACKQUADRANT

llama.cpp

ggml-org/llama.cpp
8.2

llama.cpp: LLM inference in C/C++.

Inference Engines
118.5k · 20.0k · C++ · MIT · today

vLLM

vllm-project/vllm
8.6

vLLM: a high-throughput and memory-efficient inference and serving engine for LLMs.

Inference Engines
84.7k · 18.6k · Python · Apache-2.0 · today

gpt4all

nomic-ai/gpt4all
7.1

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.

Inference Engines
77.4k · 8.3k · C++ · MIT · 1y ago

ray

ray-project/ray
8.4

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Inference Engines
43.0k · 7.7k · Python · Apache-2.0 · 1d ago

gitleaks

gitleaks/gitleaks
8.2

Find secrets with Gitleaks 🔑

Inference Engines
27.9k · 2.1k · Go · MIT · 4d ago

llm-action

liguodongiot/llm-action
6.8

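This project shares the technical principles behind large models and hands-on experience with them (large-model engineering and deploying large-model applications).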

Inference Engines
24.6k · 2.8k · HTML · Apache-2.0 · 3d ago

litgpt

Lightning-AI/litgpt
7.8

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

Inference Engines
13.4k · 1.5k · Python · Apache-2.0 · 2d ago

OpenLLM

bentoml/OpenLLM
7.4

Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI compatible API endpoint in the cloud.

Inference Engines
12.4k · 819 · Python · Apache-2.0 · 6d ago

mistral-inference

mistralai/mistral-inference
7.2

Official inference library for Mistral models

Inference Engines
10.8k · 1.1k · Jupyter Notebook · Apache-2.0 · 12d ago

openvino

openvinotoolkit/openvino
7.9

OpenVINO™ is an open source toolkit for optimizing and deploying AI inference

Inference Engines
10.4k · 3.3k · C++ · Apache-2.0 · 2d ago

PowerInfer

Tiiny-AI/PowerInfer
6.9

High-speed Large Language Model Serving for Local Deployment

Inference Engines
9.6k · 586 · C++ · MIT · 1mo ago

BentoML

bentoml/BentoML
8.0

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

Inference Engines
8.7k · 979 · Python · Apache-2.0 · 6d ago

lmdeploy

InternLM/lmdeploy
7.5

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Inference Engines
7.9k · 700 · Python · Apache-2.0 · 2d ago

openevolve

algorithmicsuperintelligence/openevolve
6.6

Open-source implementation of AlphaEvolve

Inference Engines
6.6k · 1.1k · Python · Apache-2.0 · 3mo ago

plano

katanemo/plano
7.4

Plano is an AI-native proxy server and data plane for agentic apps - centralizing orchestration, safety, observability, and smart LLM routing so you can deliver agents faster.

Inference Engines
6.6k · 433 · Rust · Apache-2.0 · 3d ago

flashinfer

flashinfer-ai/flashinfer
7.6

FlashInfer: Kernel Library for LLM Serving

Inference Engines
5.9k · 1.1k · Python · Apache-2.0 · today

kserve

kserve/kserve
7.7

Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes

Inference Engines
5.6k · 1.5k · Go · Apache-2.0 · 3d ago

shimmy

Michael-A-Kuykendall/shimmy
6.3

⚡ Python-free Rust inference server — OpenAI-API compatible. GGUF + SafeTensors, hot model swap, auto-discovery, single binary. FREE now, FREE forever.

Inference Engines
5.5k · 530 · Rust · Apache-2.0 · 11d ago

Awesome-LLM-Inference

xlite-dev/Awesome-LLM-Inference
6.6

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Inference Engines
5.4k · 417 · Python · GPL-3.0 · 5d ago

gpustack

gpustack/gpustack
7.0

Performance-optimized AI inference on your GPUs. Unlock superior throughput by selecting and tuning engines like vLLM or SGLang.

Inference Engines
5.2k · 556 · Python · Apache-2.0 · today

eko

FellouAI/eko
7.0

Eko (Eko Keeps Operating) - Build Production-ready Agentic Workflow with Natural Language - eko.fellou.ai

Inference Engines
4.9k · 439 · TypeScript · MIT · 3mo ago

lemonade

lemonade-sdk/lemonade
7.2

Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our discord: https://discord.gg/5xXzkMu8Zk

Inference Engines
4.7k · 371 · C++ · Apache-2.0 · today

RuVector

ruvnet/RuVector
7.0

RuVector is a High Performance, Real-Time, Self-Learning, Vector Graph Neural Network, and Database built in Rust.

Inference Engines
4.3k · 568 · Rust · MIT · today

optillm

algorithmicsuperintelligence/optillm
6.5

Optimizing inference proxy for LLMs

Inference Engines
4.2k · 368 · Python · Apache-2.0 · 1mo ago

lorax

predibase/lorax
6.8

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

Inference Engines
3.8k · 322 · Python · Apache-2.0 · 1mo ago

deepsparse

neuralmagic/deepsparse
5.9

Sparsity-aware deep learning inference runtime for CPUs

Inference Engines
3.2k · 191 · Python · NOASSERTION · 1y ago

spiceai

spiceai/spiceai
7.1

A portable accelerated SQL query, search, and LLM-inference engine, written in Rust, for data-grounded AI apps and agents.

Inference Engines
3.0k · 207 · Rust · Apache-2.0 · today

distributed-llama

b4rtaz/distributed-llama
6.1

Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference.

Inference Engines
3.0k · 237 · C++ · MIT · 2mo ago

Medusa

FasterDecoding/Medusa
5.4

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Inference Engines
2.8k · 202 · Jupyter Notebook · Apache-2.0 · 2y ago

kvcached

ovg-project/kvcached
5.8

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Inference Engines
1.1k · 120 · Python · Apache-2.0 · 16d ago

nobodywho

nobodywho-ooo/nobodywho
6.4

NobodyWho is an inference engine that lets you run LLMs locally and efficiently on any device.

Inference Engines
1.0k · 70 · Rust · EUPL-1.2 · 1d ago

ZhiLight

zhihu/ZhiLight
5.3

A highly optimized LLM inference acceleration engine for Llama and its variants.

Inference Engines
906 · 102 · C++ · Apache-2.0 · 3mo ago

mlxstudio

jjang-ai/mlxstudio
5.2

MLX Studio - Home of JANG_Q - Image Gen/Edit + Chat/Code All in one - + OpenClaw (Anthropic API)

Inference Engines
830 · 57 · 5d ago

yalm

andrewkchan/yalm
3.7

Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O

Inference Engines
590 · 64 · C++ · 9mo ago

KuiperLLama

zjhellofss/KuiperLLama
4.0

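A good project for campus, autumn, and spring recruitment and for internships: build from scratch an LLM inference framework that supports LLama2/3 and Qwen2.5.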

Inference Engines
547 · 143 · C++ · 8mo ago

openinfer

openinfer-project/openinfer
5.8

Pure Rust + CUDA LLM inference engine — no PyTorch, OpenAI-compatible, serves Qwen3 to Kimi-K2

Inference Engines
481 · 70 · Rust · Apache-2.0 · today

tessera

zengxiao-he/tessera
4.3

From teacher to tiles — a from-scratch LLM distillation & serving engine: custom Triton/CUDA kernels, FSDP distillation, paged-KV continuous batching, speculative decoding, a Rust gateway, a JAX oracle, and interpretability tooling.

Inference Engines
386 · 4 · Python · NOASSERTION · 23d ago

swiftLLM

interestingLSY/swiftLLM
3.8

A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).

Inference Engines
329 · 37 · Python · Apache-2.0 · 1y ago