I am serving gemma2-9b on 4 GPUs with an 8k context length. I disabled the sliding window and enabled the flashinfer backend. However, when I send a long prompt, I get weird responses in which the same first sentences are repeated over and over.
Example:
The prompt asks the model to answer a question given the content of a paper.
Response:
"content": "\n\nThe paper you'\n\nLet me know if you'\n\nLet me know if you have any other questions about the paper.\n\nLet me know if you have any other questions about the paper.\n\nLet me know if you have any other questions about the paper.\n\nLet me know if you have any other questions about the paper.\n\nLet me know if you have any other questions about the paper.\n\nLet me know if you have any other questions about the paper.\n\nLet me know if you have any other questions about the paper.\n\nLet me know if you have any other questions about the paper.\n\nLet me know if you have any other questions about the paper.\n\nLet me know if you have any other questions about the paper.\n\nLet me know if you have any other questions about the paper.\n\nLet me know if you have any other questions about the paper.\n\nLet me know if you have any other questions about the paper.\n\nLet me know if you have any other questions about the paper.\n\nLet me know if you have any other questions about the paper.\n\nLet me know if you have any other questions about the paper.\n\nLet me know if you have any other questions about the paper.\n\nLet me know if you have any other questions about the paper.\n\nLet me know if you have any other questions about the paper.\n\nLet me know if you have any other questions about the paper.\n\nLet me know if you have any other questions about the paper.\n\nLet me know if you have any other questions about the paper.\n\nLet me know if you have any other questions about the paper.\n\nLet me know if you have any other questions about the paper.\n\nLet me know if you have any other questions about the paper"
How can I solve this issue?