vllm

Here are 116 public repositories matching this topic...

meta-llama / llama-cookbook

Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems using the Llama model family and how to use the models on various provider services.

python machine-learning ai pytorch llama finetuning llm langchain vllm llama2

Updated Jan 17, 2025
Jupyter Notebook

xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need: it lets you run inference with any open-source language model, speech recognition model, or multimodal model, whether in the cloud, on-premises, or even on your laptop.

Updated Jan 17, 2025
Python

katanaml / sparrow

Data processing with ML, LLM and Vision LLM

computer-vision machinelearning gpt nlp-machine-learning rag huggingface-transformers llm vllm

Updated Jan 15, 2025
Python

OpenRLHF / OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)

reinforcement-learning raylib transformers proximal-policy-optimization large-language-models reinforcement-learning-from-human-feedback vllm openai-o1

Updated Jan 16, 2025
Python

DefTruth / Awesome-LLM-Inference

📖 A curated list of Awesome LLM/VLM Inference Papers with code, such as FlashAttention, PagedAttention, Parallelism, etc. 🎉🎉

sora vllm llm-inference awesome-llm flash-attention flash-attention-2 tensorrt-llm paged-attention deepseek flash-attention-3 deepseek-v3 minimax-01

Updated Jan 16, 2025

bricks-cloud / BricksLLM

🔒 Enterprise-grade API gateway that helps you monitor and impose cost or rate limits per API key. Get fine-grained access control and monitoring per user, application, or environment. Supports OpenAI, Azure OpenAI, Anthropic, vLLM, and open-source LLMs.

api docker golang open-source security privacy ai azure rest-api postgresql self-hosted artificial-intelligence ycombinator openai gpt llm generative-ai anthropic vllm

Updated Jan 5, 2025
Go

prometheus-eval / prometheus-eval

Evaluate your LLM's response with Prometheus and GPT4 💯

python evaluation gpt4 llm llmops vllm litellm llm-as-a-judge llm-as-evaluator

Updated Jan 7, 2025
Python

substratusai / kubeai

AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports LLMs, embeddings, and speech-to-text.

kubernetes ai k8s whisper autoscaler openai-api llm vllm faster-whisper ollama vllm-operator ollama-operator inference-operator

Updated Jan 19, 2025
Go

jakobdylanc / llmcord

Make Discord your LLM frontend ● Supports any OpenAI-compatible API (Ollama, LM Studio, vLLM, OpenRouter, xAI, Mistral, Groq and more)

Updated Jan 14, 2025
Python

harleyszhang / llm_note

LLM notes covering model inference, transformer model structure, and LLM framework code analysis.

cuda-programming transformer-models kv-cache llm vllm llm-inference triton-kernels

Updated Jan 14, 2025
Python

varunshenoy / super-json-mode

Low-latency JSON generation using LLMs ⚡️

openai huggingface-transformers llm vllm

Updated Mar 10, 2024
Jupyter Notebook

ModelTC / llmc

[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".