
Eval bug: RAM not released after running llama-bench with CUDA 12.8 and DeepSeek-R1 Q6 #11965

Open
Xxianna opened this issue Feb 20, 2025 · 0 comments

Xxianna commented Feb 20, 2025

Name and Version

./llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 2080 Ti, compute capability 7.5, VMM: yes
version: 4743 (d07c621)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu

Operating systems

Linux

GGML backends

CUDA

Hardware

Intel Xeon E5-2686 v4 + RTX 2080 Ti 22 GB

Models

DeepSeek-R1 1776 Q6_K (Unsloth)

Problem description & steps to reproduce

llamacppb4743cuda/llama-bench -p 128,512 -n 128,512 \
  --model /mnt/fast10k/deepseekr1/q6k1776/r1-1776-Q6_K-00001-of-00012.gguf \
  --threads 36 --mmap 0 \
  --numa distribute \
  -ngl 3

The model fails to load (full log in the Relevant log output section below):

main: error: failed to load model '/mnt/fast10k/deepseekr1/q6k1776/r1-1776-Q6_K-00001-of-00012.gguf'
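
For reference, a minimal way to quantify how much memory the run leaves behind is to snapshot free memory before and after the benchmark (a sketch using standard Linux tools; this wrapper is not part of the original report):

# snapshot before the run
free -h > /tmp/mem_before.txt
# ... run the llama-bench command above ...
# snapshot after the run and compare
free -h > /tmp/mem_after.txt
diff /tmp/mem_before.txt /tmp/mem_after.txt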

But then

I lost 490 GB of RAM 😭😭😭 The memory was not released after the failed run.

(Two screenshots of system memory usage attached.)
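
A few quick checks for where the missing memory is accounted and whether it can be reclaimed (a diagnostic sketch assuming a standard Linux setup; these commands are suggestions, not from the original report):

# is a llama-bench (or child) process still alive and holding the allocation?
pgrep -af llama

# where does the kernel account the memory? (free vs. page cache vs. shared memory vs. huge pages)
free -h
grep -E 'MemAvailable|Cached|Shmem|HugePages_Total' /proc/meminfo
ls -lh /dev/shm

# per-NUMA-node breakdown, relevant because the run used --numa distribute (requires the numactl tools)
numastat -m

# page cache can be dropped; anonymous memory of an exited process should already have been freed by the kernel
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches

If the missing memory shows up under "cached" it is reclaimable page cache, but if it stays in "used" with no live llama-bench process, that would point to a real leak (for example in shared-memory or huge-page allocations).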

First Bad Commit

No response

Relevant log output

llamacppb4743cuda/llama-bench -p 128,512 -n 128,512 \
  --model /mnt/fast10k/deepseekr1/q6k1776/r1-1776-Q6_K-00001-of-00012.gguf \
  --threads 36 --mmap 0 \
  --numa distribute \
  -ngl 3

ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 2080 Ti, compute capability 7.5, VMM: yes
| model                          |       size |     params | backend    | ngl | mmap |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | ------------: | -------------------: |
main: error: failed to load model '/mnt/fast10k/deepseekr1/q6k1776/r1-1776-Q6_K-00001-of-00012.gguf'