    Repositories list

    • 0002 Updated Feb 14, 2025
    • 0002 Updated Feb 14, 2025
    • Deploying a GPT-Neo model with dynamic batching of incoming requests. <metadata> collections: ["Dynamic Batching","HF Transformers"] </metadata>
      Python
      0000 Updated Feb 14, 2025
    • Deploy Tinyllama-1.1B in GGUF format with vLLM for efficient inference on quantized LLMs, leveraging vLLM’s optimized backend to accelerate prompt processing and reduce memory usage. <metadata> gpu: A100 | collections: ["Using NFS Volumes", "vLLM"] </metadata>
      Python
      1100 Updated Feb 14, 2025
    • Vicuna-7b-1.1

      Public template
      Open-source chatbot fine-tuned from LLaMA on 70K ShareGPT conversations, optimized for research and conversational tasks. <metadata> gpu: A100 | collections: ["HF Transformers"] </metadata>
      Python
      2000 Updated Feb 14, 2025
    • Zephyr-7B-Streaming

      Public template
      Zephyr-7B model with Server-Sent Events (SSE) for real-time streaming, delivering efficient, interactive updates for chat-based applications. <metadata> gpu: A100 | collections: ["Streaming LLMs", "SSE Events"] </metadata>
      Python
      1000 Updated Feb 14, 2025
    • whisper-large-v3

      Public template
      Flagship speech recognition model for English, delivering state‑of‑the‑art transcription accuracy across diverse audio scenarios. <metadata> gpu: T4 | collections: ["CTranslate2"] </metadata>
      Python
      101500 Updated Feb 14, 2025
    • Python
      0000 Updated Feb 14, 2025
    • MeloTTS

      Public
      Python
      1000 Updated Feb 14, 2025
    • Phi-2

      Public template
      A small language model delivering robust text generation, reasoning, and instruction following with efficient long-context comprehension. <metadata> gpu: T4 | collections: ["vLLM","Batch Input Processing"] </metadata>
      Python
      5000 Updated Feb 14, 2025
    • bark-streaming

      Public template
      Bark text-to-speech with Server-Sent Events (SSE) streams real-time audio, providing interactive and efficient updates for chat-based applications. <metadata> gpu: A100 | collections: ["SSE Events"] </metadata>
      Python
      2000 Updated Feb 14, 2025
    • 0001 Updated Feb 14, 2025
    • Bark

      Public template
      A transformer-based text-to-audio model that generates highly realistic, multilingual speech as well as other audio, including music, background noise, and simple sound effects. <metadata> gpu: T4 | collections: ["HF Transformers"] </metadata>
      Python
      11400 Updated Feb 14, 2025
    • Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the repository for the 7B fine-tuned model, optimized for dialogue use cases and converted for the Hugging Face Transformers format.
      Python
      3000 Updated Feb 14, 2025
    • A state-of-the-art coder LLM, tailored for instruction-based tasks, particularly code generation, reasoning, and repair. <metadata> gpu: A100 | collections: ["vLLM"] </metadata>
      Python
      2100 Updated Feb 14, 2025
    • 0001 Updated Feb 14, 2025
    • A state-of-the-art model that segments and labels audio recordings by accurately distinguishing different speakers. <metadata> gpu: T4 | collections: ["HF Transformers"] </metadata>
      Python
      1200 Updated Feb 13, 2025
    • Llama-2-7b-hf

      Public template
      A 7B-parameter model fine-tuned for dialogue using supervised learning and RLHF, supporting a context length of up to 4,000 tokens. <metadata> gpu: A10 | collections: ["HF Transformers"] </metadata>
      Python
      2100 Updated Feb 13, 2025
    • Falcon-7b-instruct

      Public template
      A 7B instruction-tuned language model that excels in following detailed prompts and effectively performing a wide variety of natural language processing tasks. <metadata> gpu: A100 | collections: ["HF Transformers"] </metadata>
      Python
      1000 Updated Feb 13, 2025
    • DreamShaper

      Public template
      A ControlNet model designed for Stable Diffusion, providing brightness adjustment for colorizing or recoloring images. <metadata> gpu: T4 | collections: ["Diffusers"] </metadata>
      Python
      2000 Updated Feb 13, 2025
    • Phi-3.5-MoE-instruct

      Public template
      A mixture-of-experts, instruction-tuned variant of Phi-3.5, delivering efficient, context-aware responses across diverse language tasks. <metadata> gpu: A100 | collections: ["HF Transformers"] </metadata>
      Python
      1000 Updated Feb 13, 2025
    • LLama2-13B-8bit-GPTQ

      Public template
      A quantized version of the 13B fine-tuned model, optimized for dialogue use cases. <metadata> gpu: T4 | collections: ["HF Transformers","GPTQ"] </metadata>
      Python
      3000 Updated Feb 13, 2025
    • Mistral-7B

      Public template
      A 7B autoregressive language model by Mistral AI, optimized for efficient text generation and robust reasoning. <metadata> gpu: A100 | collections: ["vLLM"] </metadata>
      Python
      11300 Updated Feb 13, 2025
    • A lightweight, accelerated variant of RealVisXL V4.0, engineered for real‑time, high‑quality image generation with enhanced efficiency. <metadata> gpu: T4 | collections: ["Diffusers"] </metadata>
      Python
      4000 Updated Feb 13, 2025
    • SDXL-Lightning

      Public template
      A lightning-fast text-to-image generation model that generates high-quality 1024px images in a few steps. <metadata> gpu: T4 | collections: ["Diffusers"] </metadata>
      Python
      5300 Updated Feb 13, 2025
    • Stable-diffusion-v1-5

      Public template
      A text-to-image model by Stability AI, renowned for generating high-quality, diverse images from text prompts. <metadata> gpu: T4 | collections: ["Diffusers"] </metadata>
      Python
      1000 Updated Feb 13, 2025
    • Vicuna-13b-8k

      Public template
      A GPTQ‑quantized, 13‑billion‑parameter uncensored language model with an extended 8K context window, designed for dynamic, high‑performance conversational tasks. <metadata> gpu: T4 | collections: ["GPTQ"] </metadata>
      Python
      1000 Updated Feb 13, 2025
    • Codellama-34B

      Public template
      A 34B-parameter, Python-specialized model for advanced code synthesis, completion, and understanding. <metadata> gpu: A100 | collections: ["AWQ"] </metadata>
      Python
      5000 Updated Feb 12, 2025
    • CodeLlama-34B-vLLM

      Public template
      A 34B-parameter, Python-specialized model for advanced code synthesis, deployed with vLLM for efficient inference. <metadata> gpu: A100 | collections: ["vLLM"] </metadata>
      Python
      3000 Updated Feb 12, 2025
    • 0000 Updated Feb 12, 2025