Issues: vllm-project/vllm
[RFC]: Reimplement and separate beam search on top of vLLM core #8306, opened Sep 9, 2024 by youkaichao
Issues list
- [Bug]: AttributeError: module 'cv2.dnn' has no attribute 'DictValue' (bug) #8650, opened Sep 20, 2024 by eyuansu62
- [Bug]: Using FlashInfer with FP8 model with FP8 KV cache produces an error (bug) #8641, opened Sep 19, 2024 by Syst3m1cAn0maly
- [Performance]: The accept rate of typical acceptance sampling (performance) #8639, opened Sep 19, 2024 by hustxiayang
- [Bug]: loading embedding model intfloat/e5-mistral-7b-instruct results in a bind error (bug) #8638, opened Sep 19, 2024 by nickandbro
- [Usage]: Ray + vLLM OpenAI (offline) Batch Inference (usage) #8636, opened Sep 19, 2024 by mbuet2ner
- [Feature]: OpenAI o1-like Chain-of-thought (CoT) inference workflow (feature request) #8633, opened Sep 19, 2024 by kozuch
- [Misc]: How are quantized models loaded compared to non-quantized models? (misc) #8632, opened Sep 19, 2024 by gnpinkert
- [Feature]: Online Inference on local model with OpenAI Python SDK (feature request) #8631, opened Sep 19, 2024 by pesc101
- [Bug]: OpenGVLab/InternVL2-Llama3-76B: view size is not compatible with input tensor's size and stride (bug) #8630, opened Sep 19, 2024 by erkintelnyx
- [Bug]: Speculative decoding interferes with CPU-only execution (bug) #8628, opened Sep 19, 2024 by NickLucche
- [Bug]: MistralTokenizer Detokenization Issue (bug) #8627, opened Sep 19, 2024 by ywang96
- [Usage]: doesn't work on pascal tesla P100 (usage) #8626, opened Sep 19, 2024 by Stargate256
- [Bug]: Wrong "completion_tokens" counts in streaming usage (bug) #8625, opened Sep 19, 2024 by yuhon0528
- qwen2-vl: AttributeError: '_OpNamespace' '_C' object has no attribute 'gelu_quick' (bug) #8624, opened Sep 19, 2024 by xiangxinhello
- [Feature]: Output logps of given output (feature request) #8622, opened Sep 19, 2024 by lycheeyolo
- [Bug]: vllm deploy medusa, draft acceptance rate: 0.000 (bug) #8620, opened Sep 19, 2024 by xhjcxxl
- [Usage]: Number of requests currently in the queue (usage) #8617, opened Sep 19, 2024 by shubh9m
- [Usage]: Standalone Debugging and Measuring the vLLM Engine Backend (usage) #8586, opened Sep 19, 2024 by htang2012
- [Usage]: How to run VLLM on multiple tpu hosts V4-32 (usage) #8582, opened Sep 18, 2024 by sparsh35
- [Bug]: Wrong Response with Gemma2 with 8k context length (bug) #8580, opened Sep 18, 2024 by hahmad2008
- [Bug]: lm-format-enforcer guided decoding kills MQLLMEngine (bug) #8578, opened Sep 18, 2024 by joerunde