Replies: 5 comments 4 replies
-
That rig will be awesome. If you look at TheBloke's GGUF conversion descriptions (like this one for Llama 2 70B Chat), they say how much RAM is required for different quantization levels. You should be able to run the big models with that RAM with space left over (perhaps for a draft model for speculative sampling when that gets integrated more widely).
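For anyone unfamiliar with the draft-model idea mentioned above, here is a minimal toy sketch of speculative sampling: a cheap draft distribution proposes tokens, and the target distribution accepts or rejects each one. The fixed distributions and vocabulary size are made-up stand-ins for real model outputs, not anything from llama.cpp.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and hypothetical "models": fixed next-token distributions
# standing in for one forward pass of a target and a draft model.
VOCAB = 8
p_target = rng.dirichlet(np.ones(VOCAB))  # expensive target model
q_draft = rng.dirichlet(np.ones(VOCAB))   # cheap draft model

def speculative_step(k=4):
    """Draft up to k tokens from q_draft, verifying each against p_target
    with the standard accept-with-prob-min(1, p/q) rule."""
    accepted = []
    for _ in range(k):
        tok = rng.choice(VOCAB, p=q_draft)
        if rng.random() < min(1.0, p_target[tok] / q_draft[tok]):
            accepted.append(tok)          # target agrees often enough: keep it
        else:
            # On rejection, resample from the residual distribution and stop.
            residual = np.maximum(p_target - q_draft, 0)
            residual /= residual.sum()
            accepted.append(rng.choice(VOCAB, p=residual))
            break
    return accepted

tokens = speculative_step()
```

The payoff is that each accepted draft token costs only one cheap draft pass plus a shared verification pass, so a small model that agrees with the big one most of the time speeds up generation without changing the output distribution.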
-
I've recently demonstrated Q8 LLaMAv2 70B running on an M2 Ultra 192GB at ~8 t/s with Metal inference. Note that Metal can access only ~155GB of the total 192GB (more info). You can access all 192GB with the CPU (i.e. without Metal), but this is significantly slower. These are some objective numbers, valid only as of today.

In case you are looking for a more subjective and potentially biased opinion, here is mine: the M2 Ultra is the absolute best personal LLM inference node that you can buy today. But again - this is the opinion of an Apple "fanboy", so take it with a grain of salt 😄
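As a sanity check on that ~8 t/s figure: single-stream token generation is roughly memory-bandwidth bound, since every token streams the full weight set through memory. A back-of-the-envelope sketch, assuming the M2 Ultra's advertised ~800 GB/s unified memory bandwidth and ~70 GB of Q8 weights for a 70B model (both approximate assumptions, not measured values):

```python
# Rough upper bound on generation speed: each token reads all weights once.
bandwidth_gb_s = 800   # M2 Ultra unified memory bandwidth (approx.)
weights_gb = 70        # ~70 GB of weights for a 70B model at Q8 (approx.)

ceiling_tps = bandwidth_gb_s / weights_gb
print(f"theoretical ceiling: {ceiling_tps:.1f} t/s")  # ~11.4 t/s
```

Observed ~8 t/s is about 70% of that ceiling, which is a plausible efficiency for real inference kernels.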
-
Have you heard about macweb.com? They have M2 Ultra configurations available for testing.
-
I have a 128 GB M2 Ultra (60 core). I can run Falcon 180B at Q3_K_M quantization (80GB), without needing to resort to patching the VRAM split. With 192 GB, you could probably run up to Falcon 180B @ Q6_K (148 GB). |
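The file-size arithmetic behind those figures is just parameters times bits per weight. A quick sketch, using approximate bits-per-weight values for llama.cpp's K-quants (these vary slightly by model architecture, so treat them as assumptions):

```python
def gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate model file size in GB: parameters x bits/weight / 8."""
    return n_params * bits_per_weight / 8 / 1e9

# Approximate bits/weight for common llama.cpp quantization levels.
quants = {"Q3_K_M": 3.55, "Q6_K": 6.56, "Q8_0": 8.5}

for name, bpw in quants.items():
    print(f"Falcon 180B @ {name}: ~{gguf_size_gb(180e9, bpw):.0f} GB")
```

This reproduces the ~80 GB (Q3_K_M) and ~148 GB (Q6_K) numbers above, before any KV cache or activation overhead.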
-
Will inference on the Mac M2 Ultra be slow when running an LLM?
-
I'm currently exploring the capabilities of the M2 Ultra and its 192 GB RAM configuration. I've read that it's possible to fit the Llama 2 70B model. However, I'm curious if this is the upper limit or if it's feasible to fit even larger models within this memory capacity.
Any insights or experiences regarding the maximum model size (in terms of parameters) that can comfortably fit within the 192 GB RAM would be greatly appreciated.
Thank you in advance!
edit: 192 GB 😅
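One way to frame the question is to invert the size formula: given the ~155 GB that Metal can address on a 192 GB machine (per the numbers earlier in this thread), how many parameters fit at each quantization level? A rough sketch, with bits-per-weight figures that are approximate assumptions and with KV cache and activation overhead ignored:

```python
def max_params_b(mem_gb: float, bits_per_weight: float) -> float:
    """Largest parameter count (in billions) whose weights fit in mem_gb,
    ignoring KV cache and activation memory."""
    return mem_gb * 8 / bits_per_weight

# Approximate bits/weight for common llama.cpp quantization levels.
for name, bpw in [("Q8_0", 8.5), ("Q6_K", 6.56), ("Q4_K_M", 4.85)]:
    print(f"{name}: up to ~{max_params_b(155, bpw):.0f}B params")
```

So 70B at Q8 is nowhere near the ceiling: something in the 150-250B range is feasible depending on how aggressively you quantize, consistent with the Falcon 180B reports above.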