
Eval bug: RAM not released after running llama-bench with CUDA 12.8 and DeepSeek-R1 Q6 #11965

Open
Xxianna opened this issue Feb 20, 2025 · 0 comments

Xxianna commented Feb 20, 2025

Name and Version

./llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 2080 Ti, compute capability 7.5, VMM: yes
version: 4743 (d07c621)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu

Operating systems

Linux

GGML backends

CUDA

Hardware

Intel Xeon E5-2686 v4 + RTX 2080 Ti 22 GB

Models

DeepSeek-R1 1776 Q6_K (Unsloth)

Problem description & steps to reproduce

llamacppb4743cuda/llama-bench -p 128,512 -n 128,512 \
  --model /mnt/fast10k/deepseekr1/q6k1776/r1-1776-Q6_K-00001-of-00012.gguf \
  --threads 36 --mmap 0 \
  --numa distribute \
  -ngl 3

The model fails to load (full log in the Relevant log output section below):

main: error: failed to load model '/mnt/fast10k/deepseekr1/q6k1776/r1-1776-Q6_K-00001-of-00012.gguf'
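
For reference, a minimal way to quantify how much memory the run leaves behind is to snapshot free memory before and after the benchmark (a sketch using standard Linux tools; this wrapper is not part of the original report):

# snapshot before the run
free -h > /tmp/mem_before.txt
# ... run the llama-bench command above ...
# snapshot after the run and compare
free -h > /tmp/mem_after.txt
diff /tmp/mem_before.txt /tmp/mem_after.txt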

But then

I lost 490 GB of RAM 😭😭😭 The memory was not released after the failed run.

(Two screenshots of system memory usage attached.)
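
A few quick checks for where the missing memory is accounted and whether it can be reclaimed (a diagnostic sketch assuming a standard Linux setup; these commands are suggestions, not from the original report):

# is a llama-bench (or child) process still alive and holding the allocation?
pgrep -af llama

# where does the kernel account the memory? (free vs. page cache vs. shared memory vs. huge pages)
free -h
grep -E 'MemAvailable|Cached|Shmem|HugePages_Total' /proc/meminfo
ls -lh /dev/shm

# per-NUMA-node breakdown, relevant because the run used --numa distribute (requires the numactl tools)
numastat -m

# page cache can be dropped; anonymous memory of an exited process should already have been freed by the kernel
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches

If the missing memory shows up under "cached" it is reclaimable page cache, but if it stays in "used" with no live llama-bench process, that would point to a real leak (for example in shared-memory or huge-page allocations).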

First Bad Commit

No response

Relevant log output

llamacppb4743cuda/llama-bench -p 128,512 -n 128,512 \
  --model /mnt/fast10k/deepseekr1/q6k1776/r1-1776-Q6_K-00001-of-00012.gguf \
  --threads 36 --mmap 0 \
  --numa distribute \
  -ngl 3

ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 2080 Ti, compute capability 7.5, VMM: yes
| model                          |       size |     params | backend    | ngl | mmap |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | ------------: | -------------------: |
main: error: failed to load model '/mnt/fast10k/deepseekr1/q6k1776/r1-1776-Q6_K-00001-of-00012.gguf'