Inferless

149 repositories

Llama-3.2-3B-Instruct
Public
0•0•0•2•Updated Feb 14, 2025
Mistral-7B-Instruct-v0.3
Public
0•0•0•2•Updated Feb 14, 2025
GPT-Neo-Dynamic-Batching
Public template
Deploy a GPT-Neo model with dynamic batching, where incoming requests are grouped into batches on the fly. <metadata> collections: ["Dynamic Batching","HF Transformers"] </metadata>
generate-text
Python
•0•0•0•0•Updated Feb 14, 2025
Tinyllama-1.1B-chat-vLLM-GGUF
Public template
Deploy Tinyllama-1.1B in GGUF format with vLLM for efficient inference on quantized LLMs, leveraging vLLM’s optimized backend to accelerate prompt processing and reduce memory usage. <metadata> gpu: A100 | collections: ["Using NFS Volumes", "vLLM"] </metadata>
generate-text
Python
•1•1•0•0•Updated Feb 14, 2025
Vicuna-7b-1.1
Public template
Open-source chatbot fine-tuned from LLaMA on 70K ShareGPT conversations, optimized for research and conversational tasks. <metadata> gpu: A100 | collections: ["HF Transformers"] </metadata>
generate-text
Python
•2•0•0•0•Updated Feb 14, 2025
Zephyr-7B-Streaming
Public template
Zephyr-7B with Server-Sent Events (SSE) for real-time streaming, delivering efficient, interactive updates for chat-based applications. <metadata> gpu: A100 | collections: ["Streaming LLMs", "SSE Events"] </metadata>
generate-text
Python
•1•0•0•0•Updated Feb 14, 2025
whisper-large-v3
Public template
Flagship speech recognition model for English, delivering state‑of‑the‑art transcription accuracy across diverse audio scenarios. <metadata> gpu: T4 | collections: ["CTranslate2"] </metadata>
audio-to-text
Python
•10•15•0•0•Updated Feb 14, 2025
s3-model-import
Public
Python
•0•0•0•0•Updated Feb 14, 2025
MeloTTS
Public
Python
•1•0•0•0•Updated Feb 14, 2025
Phi-2
Public template
A small language model delivering robust text generation, reasoning, and instruction following with efficient long-context comprehension. <metadata> gpu: T4 | collections: ["vLLM","Batch Input Processing"] </metadata>
generate-text
Python
•5•0•0•0•Updated Feb 14, 2025
bark-streaming
Public template
Bark text-to-speech with Server-Sent Events (SSE) streams real-time audio, providing interactive and efficient updates for chat-based applications. <metadata> gpu: A100 | collections: ["SSE Events"] </metadata>
audio-generation
Python
•2•0•0•0•Updated Feb 14, 2025
Qwen2.5-VL-7B-Instruct
Public
0•0•0•1•Updated Feb 14, 2025
Bark
Public template
A transformer-based text-to-audio model that generates highly realistic, multilingual speech as well as other audio, including music, background noise, and simple sound effects. <metadata> gpu: T4 | collections: ["HF Transformers"] </metadata>
audio-generation
Python
•11•4•0•0•Updated Feb 14, 2025
Llama-2-13b-chat-AWQ
Public
Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the repository for the 13B fine-tuned model, optimized for dialogue use cases and converted for the Hugging Face Transformers format.
text-generation
Python
•3•0•0•0•Updated Feb 14, 2025
Qwen2.5-Coder-32B-Instruct
Public template
A state-of-the-art coder LLM, tailored for instruction-based tasks, particularly code generation, reasoning, and repair. <metadata> gpu: A100 | collections: ["vLLM"] </metadata>
code-generation
Python
•2•1•0•0•Updated Feb 14, 2025
Mistral-Small-24B-Instruct
Public
0•0•0•1•Updated Feb 14, 2025
pyannote-speaker-diarization-3.1
Public template
A state-of-the-art model that segments and labels audio recordings by accurately distinguishing different speakers. <metadata> gpu: T4 | collections: ["HF Transformers"] </metadata>
audio-generation
Python
•1•2•0•0•Updated Feb 13, 2025
Llama-2-7b-hf
Public template
A 7B-parameter model fine-tuned for dialogue using supervised learning and RLHF, supporting a context length of up to 4,000 tokens. <metadata> gpu: A10 | collections: ["HF Transformers"] </metadata>
generate-text
Python
•2•1•0•0•Updated Feb 13, 2025
Falcon-7b-instruct
Public template
A 7B instruction-tuned language model that excels at following detailed prompts and performing a wide variety of natural language processing tasks. <metadata> gpu: A100 | collections: ["HF Transformers"] </metadata>
generate-text
Python
•1•0•0•0•Updated Feb 13, 2025
DreamShaper
Public template
A ControlNet model designed for Stable Diffusion, providing brightness adjustment for colorizing or recoloring images. <metadata> gpu: T4 | collections: ["Diffusers"] </metadata>
image-editing
Python
•2•0•0•0•Updated Feb 13, 2025
Phi-3.5-MoE-instruct
Public template
A mixture-of-experts, instruction-tuned variant of Phi-3.5, delivering efficient, context-aware responses across diverse language tasks. <metadata> gpu: A100 | collections: ["HF Transformers"] </metadata>
generate-text
Python
•1•0•0•0•Updated Feb 13, 2025
LLama2-13B-8bit-GPTQ
Public template
A quantized version of the 13B fine-tuned model, optimized for dialogue use cases. <metadata> gpu: T4 | collections: ["HF Transformers","GPTQ"] </metadata>
generate-text
Python
•3•0•0•0•Updated Feb 13, 2025
Mistral-7B
Public template
A 7B autoregressive language model by Mistral AI, optimized for efficient text generation and robust reasoning. <metadata> gpu: A100 | collections: ["vLLM"] </metadata>
generate-text
Python
•11•3•0•0•Updated Feb 13, 2025
RealVisXL_V4.0_Lightning
Public template
A lightweight, accelerated variant of RealVisXL V4.0, engineered for real‑time, high‑quality image generation with enhanced efficiency. <metadata> gpu: T4 | collections: ["Diffusers"] </metadata>
image-generation
Python
•4•0•0•0•Updated Feb 13, 2025
SDXL-Lightning
Public template
A lightning-fast text-to-image generation model that generates high-quality 1024px images in a few steps. <metadata> gpu: T4 | collections: ["Diffusers"] </metadata>
image-generation
Python
•5•3•0•0•Updated Feb 13, 2025
Stable-diffusion-v1-5
Public template
A text-to-image model by Stability AI, renowned for generating high-quality, diverse images from text prompts. <metadata> gpu: T4 | collections: ["Diffusers"] </metadata>
image-generation
Python
•1•0•0•0•Updated Feb 13, 2025
Vicuna-13b-8k
Public template
A GPTQ‑quantized, 13‑billion‑parameter uncensored language model with an extended 8K context window, designed for dynamic, high‑performance conversational tasks. <metadata> gpu: T4 | collections: ["GPTQ"] </metadata>
generate-text
Python
•1•0•0•0•Updated Feb 13, 2025
Codellama-34B
Public template
A 34B-parameter, Python-specialized model for advanced code synthesis, completion, and understanding. <metadata> gpu: A100 | collections: ["AWQ"] </metadata>
code-generation
Python
•5•0•0•0•Updated Feb 12, 2025
CodeLlama-34B-vLLM
Public template
A 34B-parameter, Python-specialized model for advanced code synthesis, deployed with vLLM for efficient inference. <metadata> gpu: A100 | collections: ["vLLM"] </metadata>
code-generation
Python
•3•0•0•0•Updated Feb 12, 2025
Stable-Diffusion-3.5-large-turbo
Public
0•0•0•0•Updated Feb 12, 2025