AI/LLM Ecosystem Directory — Repository Ratings

Plano is an AI-native proxy server and data plane for agentic apps - centralizing orchestration, safety, observability, and smart LLM routing so you can deliver agents faster.

Inference Engines

★ 6.6k · ◇ 433 · Rust · Apache-2.0 · 3d ago

flashinfer

flashinfer-ai/flashinfer

7.6

FlashInfer: Kernel Library for LLM Serving

Inference Engines

★ 5.9k · ◇ 1.1k · Python · Apache-2.0 · today

kserve

kserve/kserve

7.7

Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes

Inference Engines

★ 5.6k · ◇ 1.5k · Go · Apache-2.0 · 3d ago

shimmy

Michael-A-Kuykendall/shimmy

6.3

⚡ Python-free Rust inference server — OpenAI-API compatible. GGUF + SafeTensors, hot model swap, auto-discovery, single binary. FREE now, FREE forever.

Inference Engines

★ 5.5k · ◇ 532 · Rust · Apache-2.0 · 11d ago

Awesome-LLM-Inference

xlite-dev/Awesome-LLM-Inference

6.6

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Inference Engines

★ 5.4k · ◇ 417 · Python · GPL-3.0 · 5d ago

gpustack

gpustack/gpustack

7.0

Performance-optimized AI inference on your GPUs. Unlock superior throughput by selecting and tuning engines like vLLM or SGLang.

Inference Engines

★ 5.2k · ◇ 556 · Python · Apache-2.0 · today

eko

FellouAI/eko

7.0

Eko (Eko Keeps Operating) - Build Production-ready Agentic Workflow with Natural Language - eko.fellou.ai

Inference Engines

★ 4.9k · ◇ 439 · TypeScript · MIT · 3mo ago

lemonade

lemonade-sdk/lemonade

7.2

Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our discord: https://discord.gg/5xXzkMu8Zk

Inference Engines

★ 4.7k · ◇ 371 · C++ · Apache-2.0 · today

RuVector

ruvnet/RuVector

7.3

RuVector is a High Performance, Real-Time, Self-Learning, Vector Graph Neural Network, and Database built in Rust.

Inference Engines

★ 4.3k · ◇ 568 · Rust · MIT · today

ruvector

ruvnet/ruvector

7.0

RuVector is a High Performance, Real-Time, Self-Learning, Vector Graph Neural Network, and Database built in Rust.

Inference Engines

★ 4.3k · ◇ 568 · Rust · MIT · today

optillm

algorithmicsuperintelligence/optillm

6.5

Optimizing inference proxy for LLMs

Inference Engines

★ 4.2k · ◇ 367 · Python · Apache-2.0 · 1mo ago

lorax

predibase/lorax

6.8

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

Inference Engines

★ 3.8k · ◇ 322 · Python · Apache-2.0 · 1mo ago

deepsparse

neuralmagic/deepsparse

5.9

Sparsity-aware deep learning inference runtime for CPUs

Inference Engines

★ 3.2k · ◇ 191 · Python · NOASSERTION · 1y ago

spiceai

spiceai/spiceai

7.0

A portable accelerated SQL query, search, and LLM-inference engine, written in Rust, for data-grounded AI apps and agents.

Inference Engines

★ 3.0k · ◇ 207 · Rust · Apache-2.0 · today

distributed-llama

b4rtaz/distributed-llama

6.1

Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference.

Inference Engines

★ 3.0k · ◇ 237 · C++ · MIT · 2mo ago

Medusa

FasterDecoding/Medusa

5.4

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Inference Engines

★ 2.8k · ◇ 202 · Jupyter Notebook · Apache-2.0 · 2y ago

kvcached

ovg-project/kvcached

5.8

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Inference Engines

★ 1.1k · ◇ 121 · Python · Apache-2.0 · 16d ago

nobodywho

nobodywho-ooo/nobodywho

6.4

NobodyWho is an inference engine that lets you run LLMs locally and efficiently on any device.

Inference Engines

★ 1.0k · ◇ 70 · Rust · EUPL-1.2 · 1d ago

ZhiLight

zhihu/ZhiLight

5.3

A highly optimized LLM inference acceleration engine for Llama and its variants.

Inference Engines

★ 906 · ◇ 102 · C++ · Apache-2.0 · 3mo ago

mlxstudio

jjang-ai/mlxstudio

5.2

MLX Studio - Home of JANG_Q - Image Gen/Edit + Chat/Code All in one - + OpenClaw (Anthropic API)

Inference Engines

★ 830 · ◇ 57 · 5d ago

yalm

andrewkchan/yalm

3.7

Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O

Inference Engines

★ 590 · ◇ 64 · C++ · 9mo ago

KuiperLLama

zjhellofss/KuiperLLama

4.0

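A good project for campus, fall, and spring recruiting and internships: build from scratch an LLM inference framework that supports LLama2/3 and Qwen2.5.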

Inference Engines

★ 547 · ◇ 143 · C++ · 8mo ago

openinfer

openinfer-project/openinfer

5.8

Pure Rust + CUDA LLM inference engine — no PyTorch, OpenAI-compatible, serves Qwen3 to Kimi-K2

Inference Engines

★ 482 · ◇ 70 · Rust · Apache-2.0 · today

tessera

zengxiao-he/tessera

4.3

From teacher to tiles — a from-scratch LLM distillation & serving engine: custom Triton/CUDA kernels, FSDP distillation, paged-KV continuous batching, speculative decoding, a Rust gateway, a JAX oracle, and interpretability tooling.

Inference Engines

★ 389 · ◇ 4 · Python · NOASSERTION · 23d ago

swiftLLM

interestingLSY/swiftLLM

3.8

A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).

Inference Engines

★ 329 · ◇ 37 · Python · Apache-2.0 · 1y ago