AI/LLM Ecosystem Directory — Repository Ratings

🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSys, etc. 🗃️ Llama3, Mistral, etc. 🧑‍💻 Video Tutorials.

Model Serving

★ 4.1k◇ 401MIT11mo ago

LightLLM

ModelTC/LightLLM

6.5

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Model Serving

★ 4.1k◇ 335PythonApache-2.02d ago

chitu

thu-pacman/chitu

6.8

High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.

Model Serving

★ 3.1k◇ 266PythonApache-2.01d ago

ramalama

containers/ramalama

7.5

RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.

Model Serving

★ 2.9k◇ 344PythonMIT1d ago

inference

roboflow/inference

7.0

Turn any computer or edge device into a command center for your computer vision projects.

Model Serving

★ 2.3k◇ 277PythonNOASSERTION2d ago

vllm-ascend

vllm-project/vllm-ascend

7.2

Community maintained hardware plugin for vLLM on Ascend

Model Serving

★ 2.3k◇ 1.5kC++Apache-2.0today

envd

tensorchord/envd

6.8

🏕️ Reproducible development environment for humans and agents

Model Serving

★ 2.2k◇ 169GoApache-2.01mo ago

sie

superlinked/sie

6.6

Superlinked Inference Engine is an Open-source inference server and production cluster for embeddings, reranking, and extraction.

Model Serving

★ 2.1k◇ 183PythonApache-2.01d ago

aici

microsoft/aici

4.9

AICI: Prompts as (Wasm) Programs

Model Serving

★ 2.1k◇ 84RustMIT1y ago

mlrun

mlrun/mlrun

7.2

MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.

Model Serving

★ 1.7k◇ 308PythonApache-2.0today

kitops

kitops-ml/kitops

7.0

An open source DevOps tool from the CNCF for packaging and versioning AI/ML models, datasets, code, and configuration into an OCI Artifact.

Model Serving

★ 1.4k◇ 176GoApache-2.02d ago

hopsworks

logicalclocks/hopsworks

5.8

Hopsworks - Data-Intensive AI platform with a Feature Store

Model Serving

★ 1.3k◇ 158JavaAGPL-3.01y ago

rtp-llm

alibaba/rtp-llm

6.0

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

Model Serving

★ 1.2k◇ 219CudaApache-2.0today

truss

basetenlabs/truss

6.8

The simplest way to serve AI/ML models in production

Model Serving

★ 1.2k◇ 109PythonMIT2d ago

Nanoflow

efeslab/Nanoflow

4.7

A throughput-oriented high-performance serving framework for LLMs

Model Serving

★ 965◇ 50Jupyter Notebook3mo ago

mosec

mosecorg/mosec

6.5

A high-performance ML model serving framework, offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine

Model Serving

★ 902◇ 73PythonApache-2.03d ago

model_server

openvinotoolkit/model_server

6.5

A scalable inference server for models optimized with OpenVINO™

Model Serving

★ 892◇ 260C++Apache-2.02d ago

pipeless

pipeless-ai/pipeless

4.9

An open-source computer vision framework to build and deploy apps in minutes

Model Serving

★ 849◇ 52RustApache-2.02y ago

Yatai

bentoml/Yatai

6.1

Model Deployment at Scale on Kubernetes 🦄️

Model Serving

★ 844◇ 76TypeScriptNOASSERTION29d ago

ServerlessLLM

ServerlessLLM/ServerlessLLM

5.8

Serverless LLM Serving for Everyone.

Model Serving

★ 687◇ 74PythonApache-2.01mo ago

timber

kossisoroyce/timber

5.4

Ollama for classical ML models. AOT compiler that turns XGBoost, LightGBM, scikit-learn, CatBoost & ONNX models into native C99 inference code. One command to load, one command to serve. 336x faster than Python inference.

Model Serving

★ 685◇ 23PythonNOASSERTION2mo ago

fastapi-ml-skeleton

eightBEC/fastapi-ml-skeleton

4.5

FastAPI Skeleton App to serve machine learning models production-ready.

Model Serving

★ 604◇ 91PythonApache-2.05mo ago

pinferencia

underneathall/pinferencia

4.7

Python + Inference - Model Deployment library in Python. Simplest model inference server ever.

Model Serving

★ 543◇ 83PythonApache-2.03y ago

ome

ome-projects/ome

6.1

Open Model Engine (OME) — Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, TensorRT-LLM, and Triton

Model Serving

★ 472◇ 83GoApache-2.01d ago

JetStream

AI-Hypercomputer/JetStream

4.8

JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).

Model Serving

★ 447◇ 66PythonApache-2.05mo ago

xFasterTransformer

intel/xFasterTransformer

4.3

xFasterTransformer — open-source AI/LLM project.

Model Serving

★ 436◇ 75C++Apache-2.09mo ago

gpu-rest-engine

NVIDIA/gpu-rest-engine

3.7

A REST API for Caffe using Docker and Go

Model Serving

★ 422◇ 93C++BSD-3-Clause7y ago

stable-diffusion-deploy

Lightning-Universe/stable-diffusion-deploy

4.6

Learn to serve Stable Diffusion models on cloud infrastructure at scale. This Lightning App shows load-balancing, orchestrating, pre-provisioning, dynamic batching, GPU-inference, micro-services working together via the Lightning Apps framework.

Model Serving

★ 391◇ 39PythonApache-2.02y ago

TurboOCR

aiptimizer/TurboOCR

5.1

Fast GPU OCR server. 270 img/s on FUNSD. TensorRT FP16, PP-OCRv5, HTTP + gRPC.

Model Serving

★ 305◇ 37C++MITtoday

pmetal

Epistates/pmetal

5.0

PMetal: high-performance Apple Silicon framework for local LLM inference, LoRA/QLoRA fine-tuning, serving, quantization, and MLX/Metal acceleration.

Model Serving

★ 300◇ 21RustNOASSERTION23d ago

podman-desktop-extension-ai-lab

containers/podman-desktop-extension-ai-lab

5.9

Work with LLMs on a local environment using containers

Model Serving

★ 291◇ 82TypeScriptApache-2.06d ago

BMW-YOLOv4-Inference-API-GPU

BMW-InnovationLab/BMW-YOLOv4-Inference-API-GPU

4.1

This is a repository for an nocode object detection inference API using the Yolov3 and Yolov4 Darknet framework.

Model Serving

★ 277◇ 67PythonBSD-3-Clause4y ago

llm-server

raketenkater/llm-server

4.8

Auto-tuned launcher for GGUF models on llama.cpp / ik_llama.cpp — OpenAI-compatible server with multi-GPU tensor-split, MoE expert placement, measured flag tuning (AI Tune), hardware-matched HuggingFace downloads, and crash recovery. An Ollama alternative for multi-GPU rigs.

Model Serving

★ 237◇ 12GoMIT3d ago

ggrun

BMW-YOLOv4-Inference-API-CPU

BMW-InnovationLab/BMW-YOLOv4-Inference-API-CPU

3.9

This is a repository for an nocode object detection inference API using the Yolov4 and Yolov3 Opencv.

Model Serving

★ 218◇ 58PythonNOASSERTION4y ago