Why is the compiled binary of the new llama-gguf-split way bigger than the old gguf-split? #9536

andibuwono · 2024-09-18T10:11:01Z

Hello all,

Just curious: after building the llama.cpp suite, I noticed that gguf-split has been replaced by llama-gguf-split, and the compiled binary is much bigger now (from ~3MB to ~66MB).

Are there significant changes that make the binary bigger than it was?

Many thanks.

ab

ggerganov · 2024-09-18T11:17:22Z

Maintainer

No changes have been introduced. The difference probably comes from some sub-optimal build steps. On macOS, the binary produced with make is just 3.5MB.

6 replies

slaren Sep 19, 2024
Collaborator

If you are using a CUDA build, large binaries are expected. The only way to avoid that is with a shared library.

andibuwono Sep 19, 2024
Author

@slaren I used the standard Linux make build on Ubuntu 24.04, without any specific compute solutions. I might need to explore the build options more. Thanks.

slaren Sep 19, 2024
Collaborator

It's the default behavior of the cmake build that you mentioned before; it's not possible to build shared libraries with the make build. gguf-split needs to link to llama.cpp, and with a static build the llama.cpp library can be very big, since it includes all the CUDA kernels. Alternatively, you can build llama.cpp and gguf-split using only the CPU backend.

ggerganov Sep 19, 2024
Maintainer

I just tested on my Linux box, building static binaries with make without CUDA or other backends, and the resulting static binaries are much larger than the ones that I get on macOS:

$ make clean
$ make -j
$ ls -lh

-rwxrwxr-x  1 ggerganov ggerganov  45M сеп 19 17:47 llama-baby-llama
-rwxrwxr-x  1 ggerganov ggerganov  45M сеп 19 17:47 llama-batched
-rwxrwxr-x  1 ggerganov ggerganov  45M сеп 19 17:47 llama-batched-bench
-rwxrwxr-x  1 ggerganov ggerganov  48M сеп 19 17:47 llama-bench
-rwxrwxr-x  1 ggerganov ggerganov 4,1M сеп 19 17:47 llama-benchmark-matmult
-rwxrwxr-x  1 ggerganov ggerganov  46M сеп 19 17:47 llama-cli
-rwxrwxr-x  1 ggerganov ggerganov  45M сеп 19 17:47 llama-convert-llama2c-to-ggml
-rwxrwxr-x  1 ggerganov ggerganov  46M сеп 19 17:47 llama-cvector-generator
-rwxrwxr-x  1 ggerganov ggerganov  45M сеп 19 17:47 llama-embedding
-rwxrwxr-x  1 ggerganov ggerganov  45M сеп 19 17:47 llama-eval-callback
-rwxrwxr-x  1 ggerganov ggerganov  46M сеп 19 17:47 llama-export-lora
-rwxrwxr-x  1 ggerganov ggerganov 3,0M юли 19 15:51 llama-finetune
-rwxrwxr-x  1 ggerganov ggerganov  45M сеп 19 17:47 llama-gbnf-validator
-rwxrwxr-x  1 ggerganov ggerganov  45M сеп 19 17:47 llama-gen-docs
-rwxrwxr-x  1 ggerganov ggerganov 4,1M сеп 19 17:47 llama-gguf
-rwxrwxr-x  1 ggerganov ggerganov  45M сеп 19 17:47 llama-gguf-hash
-rwxrwxr-x  1 ggerganov ggerganov  45M сеп 19 17:47 llama-gguf-split
-rwxrwxr-x  1 ggerganov ggerganov  45M сеп 19 17:47 llama-gritlm
-rwxrwxr-x  1 ggerganov ggerganov  46M сеп 19 17:47 llama-imatrix
-rwxrwxr-x  1 ggerganov ggerganov  45M сеп 19 17:47 llama-infill
-rwxrwxr-x  1 ggerganov ggerganov  49M сеп 19 17:48 llama-llava-cli
-rwxrwxr-x  1 ggerganov ggerganov  45M сеп 19 17:47 llama-lookahead
-rwxrwxr-x  1 ggerganov ggerganov  45M сеп 19 17:47 llama-lookup
-rwxrwxr-x  1 ggerganov ggerganov  45M сеп 19 17:47 llama-lookup-create
-rwxrwxr-x  1 ggerganov ggerganov  45M сеп 19 17:47 llama-lookup-merge
-rwxrwxr-x  1 ggerganov ggerganov  45M сеп 19 17:47 llama-lookup-stats
-rwxrwxr-x  1 ggerganov ggerganov  49M сеп 19 17:48 llama-minicpmv-cli
-rwxrwxr-x  1 ggerganov ggerganov  45M сеп 19 17:47 llama-parallel
-rwxrwxr-x  1 ggerganov ggerganov  45M сеп 19 17:47 llama-passkey
-rwxrwxr-x  1 ggerganov ggerganov  46M сеп 19 17:47 llama-perplexity
-rwxrwxr-x  1 ggerganov ggerganov 4,1M сеп 19 17:47 llama-q8dot
-rwxrwxr-x  1 ggerganov ggerganov  45M сеп 19 17:47 llama-quantize
-rwxrwxr-x  1 ggerganov ggerganov  47M сеп 19 17:47 llama-quantize-stats
-rwxrwxr-x  1 ggerganov ggerganov  46M сеп 19 17:47 llama-retrieval
-rwxrwxr-x  1 ggerganov ggerganov  45M сеп 19 17:47 llama-save-load-state
-rwxrwxr-x  1 ggerganov ggerganov  58M сеп 19 17:47 llama-server
-rwxrwxr-x  1 ggerganov ggerganov  45M сеп 19 17:47 llama-simple
-rwxrwxr-x  1 ggerganov ggerganov  46M сеп 19 17:47 llama-speculative
-rwxrwxr-x  1 ggerganov ggerganov  45M сеп 19 17:47 llama-tokenize
-rwxrwxr-x  1 ggerganov ggerganov 3,0M юли 19 15:51 llama-train-text-from-scratch
-rwxrwxr-x  1 ggerganov ggerganov 4,1M сеп 19 17:47 llama-vdot

Then I applied this patch to the Makefile to remove the debug symbols, and the sizes became similar to what I get on macOS without the patch:

diff --git a/Makefile b/Makefile
index f922f708..6d1d8127 100644
--- a/Makefile
+++ b/Makefile
@@ -337,8 +337,8 @@ ifdef LLAMA_DEBUG
        endif
 else
        MK_CPPFLAGS   += -DNDEBUG
-       MK_CFLAGS     += -O3 -g
-       MK_CXXFLAGS   += -O3 -g
+       MK_CFLAGS     += -O3
+       MK_CXXFLAGS   += -O3
        MK_NVCCFLAGS  += -O3 -g
 endif

$ make clean
$ make -j
$ ls -lh

-rwxrwxr-x  1 ggerganov ggerganov 3,2M сеп 19 17:50 llama-baby-llama
-rwxrwxr-x  1 ggerganov ggerganov 3,2M сеп 19 17:50 llama-batched
-rwxrwxr-x  1 ggerganov ggerganov 3,2M сеп 19 17:50 llama-batched-bench
-rwxrwxr-x  1 ggerganov ggerganov 3,4M сеп 19 17:50 llama-bench
-rwxrwxr-x  1 ggerganov ggerganov 832K сеп 19 17:50 llama-benchmark-matmult
-rwxrwxr-x  1 ggerganov ggerganov 3,3M сеп 19 17:50 llama-cli
-rwxrwxr-x  1 ggerganov ggerganov 3,3M сеп 19 17:50 llama-convert-llama2c-to-ggml
-rwxrwxr-x  1 ggerganov ggerganov 3,3M сеп 19 17:50 llama-cvector-generator
-rwxrwxr-x  1 ggerganov ggerganov 3,2M сеп 19 17:50 llama-embedding
-rwxrwxr-x  1 ggerganov ggerganov 3,2M сеп 19 17:50 llama-eval-callback
-rwxrwxr-x  1 ggerganov ggerganov 3,3M сеп 19 17:50 llama-export-lora
-rwxrwxr-x  1 ggerganov ggerganov 3,0M юли 19 15:51 llama-finetune
-rwxrwxr-x  1 ggerganov ggerganov 3,2M сеп 19 17:50 llama-gbnf-validator
-rwxrwxr-x  1 ggerganov ggerganov 3,2M сеп 19 17:50 llama-gen-docs
-rwxrwxr-x  1 ggerganov ggerganov 836K сеп 19 17:50 llama-gguf
-rwxrwxr-x  1 ggerganov ggerganov 3,3M сеп 19 17:50 llama-gguf-hash
-rwxrwxr-x  1 ggerganov ggerganov 3,2M сеп 19 17:50 llama-gguf-split
-rwxrwxr-x  1 ggerganov ggerganov 3,2M сеп 19 17:50 llama-gritlm
-rwxrwxr-x  1 ggerganov ggerganov 3,3M сеп 19 17:50 llama-imatrix
-rwxrwxr-x  1 ggerganov ggerganov 3,2M сеп 19 17:50 llama-infill
-rwxrwxr-x  1 ggerganov ggerganov 3,5M сеп 19 17:50 llama-llava-cli
-rwxrwxr-x  1 ggerganov ggerganov 3,2M сеп 19 17:50 llama-lookahead
-rwxrwxr-x  1 ggerganov ggerganov 3,2M сеп 19 17:50 llama-lookup
-rwxrwxr-x  1 ggerganov ggerganov 3,2M сеп 19 17:50 llama-lookup-create
-rwxrwxr-x  1 ggerganov ggerganov 3,2M сеп 19 17:50 llama-lookup-merge
-rwxrwxr-x  1 ggerganov ggerganov 3,2M сеп 19 17:50 llama-lookup-stats
-rwxrwxr-x  1 ggerganov ggerganov 3,5M сеп 19 17:50 llama-minicpmv-cli
-rwxrwxr-x  1 ggerganov ggerganov 3,2M сеп 19 17:50 llama-parallel
-rwxrwxr-x  1 ggerganov ggerganov 3,2M сеп 19 17:50 llama-passkey
-rwxrwxr-x  1 ggerganov ggerganov 3,3M сеп 19 17:50 llama-perplexity
-rwxrwxr-x  1 ggerganov ggerganov 831K сеп 19 17:50 llama-q8dot
-rwxrwxr-x  1 ggerganov ggerganov 3,3M сеп 19 17:50 llama-quantize
-rwxrwxr-x  1 ggerganov ggerganov 3,3M сеп 19 17:50 llama-quantize-stats
-rwxrwxr-x  1 ggerganov ggerganov 3,2M сеп 19 17:50 llama-retrieval
-rwxrwxr-x  1 ggerganov ggerganov 3,2M сеп 19 17:50 llama-save-load-state
-rwxrwxr-x  1 ggerganov ggerganov 4,1M сеп 19 17:50 llama-server
-rwxrwxr-x  1 ggerganov ggerganov 3,2M сеп 19 17:50 llama-simple
-rwxrwxr-x  1 ggerganov ggerganov 3,3M сеп 19 17:50 llama-speculative
-rwxrwxr-x  1 ggerganov ggerganov 3,2M сеп 19 17:50 llama-tokenize
-rwxrwxr-x  1 ggerganov ggerganov 3,0M юли 19 15:51 llama-train-text-from-scratch
-rwxrwxr-x  1 ggerganov ggerganov 832K сеп 19 17:50 llama-vdot

$ ldd ./llama-gguf-split
	linux-vdso.so.1 (0x00007ffd4c749000)
	libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f6b64400000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f6b64c2a000)
	libgomp.so.1 => /lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f6b64be0000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f6b64bc0000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6b64000000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f6b64d2b000)

Not sure why there is a difference between the two systems. On macOS, the -g flag does not significantly affect the final size of the executables, though removing it does significantly reduce the size of the intermediate .o object files. For example, ./common/arg.o goes from 6.8MB with -g down to 600KB without -g. But then, for the final link step producing the executable, I guess clang somehow automatically discards unused stuff and gcc does not? Sounds weird; I might be missing something.

Answer selected by andibuwono

slaren Sep 19, 2024
Collaborator

It seems to be an Apple thing: they leave the debug info in "dSYM" files instead of linking it into the executable. There are some details here: https://wiki.dwarfstd.org/Apple%27s_%22Lazy%22_DWARF_Scheme.md
