Increasing performance of 65B llama #369
SpeedyCraftah started this conversation in General
Replies: 2 comments 2 replies
-
64GB should make a big difference. With 32GB it is likely swapping on the hard disk all the time.
-
This doesn't answer your question, but I think the 65B parameter model isn't worth it; I'm getting much better responses from the 30B model. Could just be that more training/tuning is needed. Their paper says the 65B model is better, I just haven't seen it.
-
I am wondering if there is a way to speed up the 65B LLaMA model. I have 32GB of RAM at 3600MHz and a Ryzen 7 5800X, and it maxes out both (the CPU probably because of RAM swapping); it generates at a rate of 2-3 minutes per token.
I am running Windows 11, but I thought I might put Linux on a flash drive and try running the model from there to see if there is an improvement, since Linux probably handles swap a little better and uses much less RAM than Windows.