blogs/a-note-on-deepseek-r1 #8
Comments
Thank you for the guide.
WARN source=server.go:216 msg="flash attention enabled but not supported by model"
WARN source=server.go:234 msg="quantized kv cache requested but flash attention disabled" type=q8_0
So it looks like the DeepSeek model doesn't support flash attention.
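(These two warnings usually show up when flash attention and a quantized KV cache were requested through ollama's environment variables; a minimal sketch of settings that would trigger them, assuming ollama is launched from a shell:)

```bash
# Illustrative settings that would produce the warnings above: they request
# flash attention and a q8_0-quantized KV cache before starting the server.
export OLLAMA_FLASH_ATTENTION=1
export OLLAMA_KV_CACHE_TYPE=q8_0
# The KV cache quantization falls back to f16 when flash attention is unavailable for the model.
ollama serve
```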
Nice article, followed!
Great work, thanks for the data. I'm planning a dual-socket EPYC 9654 build to run the full Q8 version, but the numbers I've seen so far suggest that setup tops out at 8-9 tps. If hybrid inference can speed things up, adding four 2080 Ti cards should make it usable.
The download from https://huggingface.co/unsloth/DeepSeek-R1-GGUF/tree/main/DeepSeek-R1-UD-IQ1_M consists of 3-4 files ending in .gguf, so which one does /home/snowkylin/DeepSeek-R1-UD-IQ1_M.gguf actually refer to?
Ignore that, I missed that the Note already describes the merge procedure.
This should be it: llama-gguf-split --merge DeepSeek-R1-UD-IQ1_M-00001-of-00004.gguf DeepSeek-R1-UD-IQ1_S.gguf
@lcgogo Thanks for the reminder, it's fixed now.
Why is my DeepSeek-R1-UD-IQ1_M version so dumb? There's no reasoning process, and the output is wrong too:
The word "strawberry" contains 2 'r's.
@KKIverson 1. Please check that you downloaded the model from https://huggingface.co/unsloth/DeepSeek-R1-GGUF/tree/main/DeepSeek-R1-UD-IQ1_M (about 158 GB in total; on a typical connection the download takes tens of hours). 2. Check that the TEMPLATE is correct (see the sketch below). 3. Generate a few more times and compare. 4. Try other questions.
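(For step 2, the TEMPLATE lives in the ollama Modelfile; a minimal sketch assuming the DeepSeek R1 prompt format with its <｜User｜>/<｜Assistant｜> special tokens, which may differ in detail from the template in the guide:)

```
# Illustrative chat template line for DeepSeek R1 in an ollama Modelfile (assumption; verify against the guide)
TEMPLATE """{{ if .System }}{{ .System }}{{ end }}<｜User｜>{{ .Prompt }}<｜Assistant｜>"""
```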
@snowkylin Thanks! After rerunning ollama run it seems to be fine again.
One more question: the post says DeepSeek-R1-Q4_K_M runs at 2-4 tokens/s. Is that with GPU hybrid inference, or pure CPU inference? If it's the hybrid-inference speed, did the GPU provide the same kind of speedup as it does for the dynamically quantized version, or did the low PCIe bandwidth actually drag inference speed down?
How fast is it for you? Loading onto the cards uses about 50 GB, but GPU utilization stays low. How can I use the GPU more efficiently? With long inputs I only get 1-2 tokens/s.
I tried running DeepSeek-R1-UD-IQ1_M on an A100.
It turned out very slow; inference took a long time (with PARAMETER num_gpu 14 it took 17 minutes).
What's odd is that I have 2 A100s, but ollama only used the VRAM of one of them.
@nilecui We're running DeepSeek-R1-UD-IQ1_M.gguf on 2x A800, configured along the lines below. I originally wanted to set the context length to 8k or 16k, but got a KV cache out-of-memory error, so I set it to 4k. Each card uses about 60-70+ GB.
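(A hypothetical Modelfile along these lines; num_ctx 4096 matches the 4k context mentioned above, while the path and the other values are illustrative assumptions rather than this commenter's actual configuration:)

```
# Hypothetical path to the merged GGUF
FROM /path/to/DeepSeek-R1-UD-IQ1_M.gguf
# 8k/16k hit KV cache out-of-memory on 2x A800, so the context is capped at 4k
PARAMETER num_ctx 4096
# Number of layers offloaded to GPU (illustrative value)
PARAMETER num_gpu 61
TEMPLATE """<｜User｜>{{ .Prompt }}<｜Assistant｜>"""
```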
@canghaimeng That's the speed with GPU hybrid inference. Because the test machine has limited RAM (384 GB), it can't run the 4-bit model with pure CPU inference, so I couldn't actually compare pure CPU versus hybrid inference at 4-bit. That said, I'm inclined to think the GPU still provided some speedup.
Why does my model fail to run?
@qfb594 Are you using AnythingLLM? I haven't used it yet, so I'm not sure what the problem is. Maybe try
Running the model gives an error: Error: Post "http://127.0.0.1:11434/api/generate": read tcp 127.0.0.1:58813->127.0.0.1:11434: wsarecv: An existing connection was forcibly closed by the remote host. But the Qwen model runs fine. How can I solve this?
Thanks everyone. I borrowed 8x L20 cards and plan to try a deployment. I've just set up llama.cpp and am about to download UD-IQ2_XXS to play with. Within China, do I need to switch to the hf-mirror source to speed up downloads? Any pointers would be appreciated, I'm just getting started. UPDATE: these are my deployment notes, kept as a record and reference. Thanks again to @snowkylin for the pioneering work; I drew on it a lot and found it very inspiring!
1 + 1 equals 2. This is the result of basic addition.
After that it outputs a bunch of garbled text; does anyone know what's going on?
On the first question the output never closes properly; on the second question the garbled text no longer appears.
root@196cbfc9c720:/deepseek# ollama run DeepSeek-R1-UD-IQ1_S:latest
It's cool. I deployed DeepSeek R1 on one GPU (AMD Instinct MI300X) by referring to this guide. I'm sharing my experience at https://medium.com/@alexhe.amd/deploy-deepseek-r1-in-one-gpu-amd-instinct-mi300x-7a9abeb85f78
Hi, I'm running the DeepSeek-R1-UD-IQ1_S model on an 8x 4090 machine with num_gpu 61. It runs at about 14 tokens/s, but why is the CPU utilization so high, constantly above 2000%? Is there a way to fix that?
Current status: "context shift is disabled" keeps appearing; still investigating.
ollama create DeepSeek-R1-UD-IQ1_M -f DeepSeekQ1_Modelfile
Hi, I also deployed R1 and the answers from the server side come out garbled. How did you solve it?
What is the ollama version? 0.5.7?
@zengqingfu1442 Yes, it is.
It depends on the machine. This step feels fairly heavy on CPU and disk I/O; I didn't look closely, but it took roughly 10+ minutes.
I use ollama to run DeepSeek-R1-Q4_K_M. Why does ollama say the model architecture is deepseekv2? DeepSeek-V2 has 61 layers, but DeepSeek-R1 should have 32 layers, right?
Currently using the llama.cpp server as the backend; I measured about 5 tokens/s generation on 8x L20, and haven't tested concurrency.
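(A minimal llama-server invocation of this kind, assuming the merged GGUF and standard flags; the model path, layer count, and context size below are illustrative, not the commenter's exact command:)

```bash
# Illustrative llama.cpp server launch; --n-gpu-layers offloads layers to the GPUs,
# --ctx-size sets the context window, and the model path is a placeholder.
./llama-server \
  -m /path/to/DeepSeek-R1-UD-IQ1_M.gguf \
  --n-gpu-layers 62 \
  --ctx-size 4096 \
  --host 0.0.0.0 --port 8080
```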
@lcgogo One card sitting idle may be because you only had a single context during your test. On my side, also with two A100s, it uses both cards once multiple terminals access it. Also, with num_gpu set to 61 I get roughly 14 tokens/s.
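(If one card sits idle with a single context, one knob worth trying is ollama's scheduler spread setting; a sketch, assuming ollama is started from a shell rather than as a service:)

```bash
# Ask ollama to spread a single loaded model across all visible GPUs
export OLLAMA_SCHED_SPREAD=1
# Make both cards visible (illustrative device indices)
export CUDA_VISIBLE_DEVICES=0,1
ollama serve
```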
ollama's underlying inference engine is llama.cpp too, right? The speed should be about the same.
Right, but openwebui calling ollama feels quite slow to me; it doesn't respond as quickly as llama.cpp. I'm not going to benchmark it further and will focus on learning to build enterprise RAG applications instead.
A newbie question: among the unsloth models, are the ones with -UD- in the name the dynamically quantized ones? Also, which one would you recommend for a server with 512 GB of RAM?
I'm running DeepSeek-R1-UD-IQ1_M (671B, 1.73-bit dynamic quantization) on a machine with two 4090 24G cards. I've already set the ollama parameters, but the GPUs aren't used at runtime and I get less than 1 token/s. Is there any way to fix this?
@wwl5600 Try adding environment variables that make Ollama fill the GPU as much as possible at runtime, then run the model again with ollama run.
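(Whichever variables end up being set, a quick way to check whether ollama is actually offloading layers to the GPUs:)

```bash
# While the model is loaded, show how it is split between CPU and GPU
ollama ps        # the PROCESSOR column reports e.g. "100% GPU" or "40%/60% CPU/GPU"
# Confirm VRAM is actually allocated on both 4090s
nvidia-smi
```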
@rodickmini I tried changing those two environment variables, but it still doesn't work; the GPU still isn't being used. Thanks anyway~
Try reinstalling the driver or ollama.
A Note on DeepSeek R1 Deployment
This is a (minimal) note on deploying DeepSeek R1 671B (the full version without distillation) locally with ollama…
https://snowkylin.github.io/blogs/a-note-on-deepseek-r1.html