Issues: huggingface/text-generation-inference
#2534: TGI keeps referencing the default model in the image (bigscience/bloom) (opened Sep 18, 2024 by BeylasanRuzaiqi; 2 of 4 tasks)
#2531: Support for returning a `CompletionUsage` object when `streaming=True` (opened Sep 17, 2024 by andrewrreed)
#2530: xpu/cpu: docker images referenced in documentation do not exist (opened Sep 17, 2024 by dvrogozh)
#2526: `* HTTP 1.0, assume close after body < HTTP/1.0 503 Service Unavailable` (opened Sep 17, 2024 by aditivw; 4 tasks)
#2523: Add `response_format` input parameter to `v1/chat/completions` endpoint (opened Sep 16, 2024 by ktrapeznikov)
#2522: tgi server launch fails with latest-rocm docker image (opened Sep 13, 2024 by gurpreet-dhami; 3 of 4 tasks)
#2509: RuntimeError: weight `model.embed_tokens.weight` does not exist (opened Sep 11, 2024 by jayus71; 3 of 4 tasks)
#2503: Add support for Idefics 3 [label: new model] (opened Sep 7, 2024 by stelterlab; 2 tasks done)
#2483: A seeming typo in `text_generation_server/utils/adapters.py` (opened Sep 2, 2024 by sadra-barikbin)
#2467: Quantization failure with bitsandbytes on SageMaker TGI deployment: compatibility issue? (opened Aug 28, 2024 by imadoualid)
#2465: Could not import SGMV kernel from Punica, falling back to loop (opened Aug 28, 2024 by ksajan; 2 of 4 tasks)
#2457: Support Phi-3.5 MoE [label: new model] (opened Aug 25, 2024 by maziyarpanahi)
#2448: [Volta] [No flash attention] Dependencies missing for running quantized Llama models in docker (opened Aug 22, 2024 by ladi-pomsar; 2 of 4 tasks)