Hugging Face Inference Endpoints now supports GGUF out of the box! #9669
ngxson started this conversation in Show and tell
You can now deploy any GGUF model on your own endpoint, in just a few clicks!

Simply select GGUF, pick a hardware configuration, and you're done! An endpoint powered by llama-server (built from the `master` branch) will be deployed automatically. It works with all llama.cpp-compatible models of all sizes, from 0.1B up to 405B parameters.

Try it now --> https://ui.endpoints.huggingface.co/
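Since the endpoint is llama-server under the hood, it exposes llama-server's OpenAI-compatible `/v1/chat/completions` route. Here is a minimal sketch of querying a deployed endpoint from Python; the endpoint URL and `hf_...` token are placeholders for your own deployment:

```python
import requests

# Placeholders: substitute your own endpoint URL and Hugging Face token
ENDPOINT_URL = "https://YOUR-ENDPOINT.endpoints.huggingface.cloud"
HF_TOKEN = "hf_..."

# llama-server serves an OpenAI-compatible chat completions API
response = requests.post(
    f"{ENDPOINT_URL}/v1/chat/completions",
    headers={"Authorization": f"Bearer {HF_TOKEN}"},
    json={
        "messages": [{"role": "user", "content": "Write a haiku about GGUF."}],
        "max_tokens": 128,
    },
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```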
A huge thanks to @ggerganov, @slaren, and the @huggingface team for making this possible!
Demo video: llama.hfe.ok.mp4
Replies: 1 comment

The Hermes 405B model can be deployed on 2xA100. The generation speed is around 8 t/s, which is not bad!
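If you want to sanity-check a throughput figure like that from the client side, here is a rough sketch that streams a chat completion over the same OpenAI-compatible API and counts chunks per second. Each streamed chunk carries roughly one generated token; the URL and token are placeholders, and network latency plus prompt processing will skew the estimate somewhat:

```python
import json
import time

import requests

# Placeholders: substitute your own endpoint URL and Hugging Face token
ENDPOINT_URL = "https://YOUR-ENDPOINT.endpoints.huggingface.cloud"
HF_TOKEN = "hf_..."

start = time.time()
n_tokens = 0

# Stream the response as server-sent events and count content chunks;
# each chunk is roughly one generated token, so chunks/second ~ t/s.
with requests.post(
    f"{ENDPOINT_URL}/v1/chat/completions",
    headers={"Authorization": f"Bearer {HF_TOKEN}"},
    json={
        "messages": [{"role": "user", "content": "Tell me a short story."}],
        "max_tokens": 256,
        "stream": True,
    },
    stream=True,
) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        if not line or not line.startswith(b"data: "):
            continue
        payload = line[len(b"data: "):]
        if payload.strip() == b"[DONE]":
            break
        chunk = json.loads(payload)
        if chunk["choices"][0]["delta"].get("content"):
            n_tokens += 1

elapsed = time.time() - start
print(f"~{n_tokens / elapsed:.1f} tokens/second (wall clock, including prompt processing)")
```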