@REServeLLM

REServe: Reliable and Efficient Large Language Models Serving System

Pinned

  1. Initializer Public

    Initializer for KServe Cluster

    Shell 1 1

Repositories

Showing 8 of 8 repositories
  • vllm Public Forked from vllm-project/vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python 0 Apache-2.0 4,636 0 0 Updated Nov 5, 2024
  • vllm-pdd Public Forked from KuntaiDu/vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python 0 Apache-2.0 4,636 0 0 Updated Oct 22, 2024
  • ServerlessLLM Public Forked from ServerlessLLM/ServerlessLLM

    Cost-efficient and fast multi-LLM serving.

    Python 0 Apache-2.0 30 0 0 Updated Jul 31, 2024
  • core Public

    Core components for REServe

    Python 0 0 0 0 Updated Jul 29, 2024
  • Initializer Public

    Initializer for KServe Cluster

    Shell 1 Apache-2.0 1 0 0 Updated Jul 29, 2024
  • tensorrtllm_backend Public Forked from triton-inference-server/tensorrtllm_backend

    The Triton TensorRT-LLM Backend

    Python 0 Apache-2.0 106 0 0 Updated Jul 29, 2024
  • TensorRT-LLM Public Forked from NVIDIA/TensorRT-LLM

    TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

    C++ 0 Apache-2.0 1,000 0 0 Updated Jul 9, 2024
  • kserve Public Forked from kserve/kserve

    Standardized Serverless ML Inference Platform on Kubernetes

    Python 0 Apache-2.0 1,076 0 0 Updated Jul 4, 2024
