Excessive memory retention using either mimalloc v2 or v3 on macOS #1025

Open
h-a-n-a opened this issue Feb 26, 2025 · 12 comments

h-a-n-a commented Feb 26, 2025

Rspack uses mimalloc to speed up memory allocation. Rspack is a tool for transforming and bundling (mostly) JavaScript files. While testing to resolve this issue, we found that on macOS there is some strange memory retention as Rspack rebuilds. I tried to create a minimal reproducible demo, but unfortunately those attempts all failed, so I have to offer my local testing demo (in Rust) for reproduction:

https://github.com/h-a-n-a/rspack-allocation-test

In this demo, an Rspack compiler is created to compile the JavaScript files in the 10000 directory. For each build and rebuild, Rspack triggers tokio-rs to spawn (if not already spawned) a few green threads to drive asynchronous tasks. Rspack then runs a series of JavaScript module transformations, followed by optimizations. Finally, the assets generated by each build or rebuild are emitted to the dist output.

During compilation, the initial memory usage on macOS is around 600 MB, and after a few rebuilds it skyrockets to 1 GB and more. This does not happen on my ubuntu-22.04 machine or when using macOS's system allocator, but it does happen with both mimalloc v2 and v3.

I've added some details in the repo to help reproduce the issue, and I will try my best to create a minimal reproducible demo. Please bear with me.

Looking forward to hearing from you.
Cheers!


daanx commented Feb 27, 2025

Yikes -- thanks for the report. Strange that it happens with both v2 and v3, and not on ubuntu. Thanks for the repo -- if I find time I will try it out and see. Can you try the following environment settings on the latest dev3-bin branch:

  • MIMALLOC_ARENA_EAGER_COMMIT=0
  • and, independently, MIMALLOC_PAGE_COMMIT_ON_DEMAND=1

Also, as an experiment, trying MIMALLOC_PURGE_DELAY=0 would be interesting (this can slow things down, but it would perhaps give us a clue).
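
For reference, these are ordinary environment variables that mimalloc reads at startup, so they can be set per run -- a sketch, assuming the demo is launched with cargo run --release:

    $ MIMALLOC_ARENA_EAGER_COMMIT=0 cargo run --release
    $ MIMALLOC_PAGE_COMMIT_ON_DEMAND=1 cargo run --release
    $ MIMALLOC_PURGE_DELAY=0 cargo run --release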


h-a-n-a commented Mar 3, 2025

@daanx Thanks for the quick reply!

I've tested these options on my macOS machine. Nothing was changed in the demo other than tweaking a value to make it rebuild indefinitely.

Here are some stats I pulled from top. The results show that memory consumption on branch dev3-bin accumulates more slowly than on dev2-bin, but it still accumulates as time passes.

With branch dev3-bin:

top -stats command,rprvt -r | grep -i mimalloc
mimalloc-test    633M+
mimalloc-test    635M+
mimalloc-test    639M+
mimalloc-test    666M+
mimalloc-test    700M+
mimalloc-test    704M+
mimalloc-test    706M+
mimalloc-test    706M+
mimalloc-test    719M+
mimalloc-test    737M+
mimalloc-test    738M+
mimalloc-test    738M+
mimalloc-test    738M
mimalloc-test    741M+
mimalloc-test    753M+
mimalloc-test    771M+
mimalloc-test    772M+
mimalloc-test    788M+
mimalloc-test    801M+
mimalloc-test    803M+
mimalloc-test    804M+
mimalloc-test    804M+
mimalloc-test    804M
mimalloc-test    810M+
mimalloc-test    813M+
mimalloc-test    832M+
mimalloc-test    833M+
mimalloc-test    834M+
mimalloc-test    836M+
mimalloc-test    836M+
mimalloc-test    837M+
mimalloc-test    837M+
mimalloc-test    852M+
mimalloc-test    869M+
mimalloc-test    869M+
mimalloc-test    870M+
mimalloc-test    876M+
mimalloc-test    896M+
mimalloc-test    896M
mimalloc-test    896M+
mimalloc-test    896M
mimalloc-test    898M+
mimalloc-test    901M+
mimalloc-test    901M
mimalloc-test    902M+
mimalloc-test    907M+
mimalloc-test    926M+
mimalloc-test    931M+
mimalloc-test    934M+
mimalloc-test    934M+
mimalloc-test    934M-
mimalloc-test    934M+
mimalloc-test    942M+
mimalloc-test    944M+
mimalloc-test    946M+
mimalloc-test    948M+
mimalloc-test    965M+
mimalloc-test    965M+
mimalloc-test    965M

With branch dev2-bin:

top -stats command,rprvt -r | grep -i mimalloc
mimalloc-test    584M+
mimalloc-test    720M+
mimalloc-test    843M+
mimalloc-test    885M+
mimalloc-test    946M+
mimalloc-test    1015M+
mimalloc-test    1122M+
mimalloc-test    1253M+
mimalloc-test    1268M+
mimalloc-test    1303M+
mimalloc-test    1326M+
mimalloc-test    1339M+
mimalloc-test    1344M+
mimalloc-test    1353M+
mimalloc-test    1355M+
mimalloc-test    1358M+
mimalloc-test    1360M+
mimalloc-test    1444M+
mimalloc-test    1485M+
mimalloc-test    1507M+
mimalloc-test    1511M+
mimalloc-test    1526M+
mimalloc-test    1529M+
mimalloc-test    1536M+
mimalloc-test    1556M+
mimalloc-test    1558M+
mimalloc-test    1562M+
mimalloc-test    1567M+
mimalloc-test    1569M+
mimalloc-test    1571M+
mimalloc-test    1634M+
mimalloc-test    1669M+
mimalloc-test    1671M+
mimalloc-test    1691M+
mimalloc-test    1713M+
mimalloc-test    1716M+
mimalloc-test    1717M+
mimalloc-test    1719M+
mimalloc-test    1721M+
mimalloc-test    1724M+
mimalloc-test    1727M+
mimalloc-test    1729M+
mimalloc-test    1731M+
mimalloc-test    1733M+
mimalloc-test    1735M+
mimalloc-test    1753M+
mimalloc-test    1756M+
mimalloc-test    1758M+
mimalloc-test    1762M+
mimalloc-test    1763M+
mimalloc-test    1834M+
mimalloc-test    1841M+
mimalloc-test    1861M+
mimalloc-test    1867M+
mimalloc-test    1886M+
mimalloc-test    1887M+
mimalloc-test    1890M+
mimalloc-test    1921M+
mimalloc-test    1924M+
mimalloc-test    1936M+
mimalloc-test    1938M+

With branch dev3-bin and environment set to MIMALLOC_ARENA_EAGER_COMMIT=0:

top -stats command,rprvt -r | grep -i mimalloc
mimalloc-test    576M+
mimalloc-test    634M+
mimalloc-test    661M+
mimalloc-test    663M+
mimalloc-test    682M+
mimalloc-test    698M+
mimalloc-test    700M+
mimalloc-test    717M+
mimalloc-test    732M+
mimalloc-test    734M+
mimalloc-test    735M+
mimalloc-test    735M+
mimalloc-test    735M
mimalloc-test    740M+
mimalloc-test    740M
mimalloc-test    741M+
mimalloc-test    763M+
mimalloc-test    764M+
mimalloc-test    767M+
mimalloc-test    781M+
mimalloc-test    798M+
mimalloc-test    801M+
mimalloc-test    809M+
mimalloc-test    829M+
mimalloc-test    829M
mimalloc-test    830M+
mimalloc-test    830M+
mimalloc-test    830M+
mimalloc-test    831M+
mimalloc-test    832M+
mimalloc-test    836M+
mimalloc-test    857M+
mimalloc-test    862M+
mimalloc-test    862M+
mimalloc-test    862M+
mimalloc-test    865M+
mimalloc-test    865M
mimalloc-test    865M
mimalloc-test    869M+
mimalloc-test    894M+
mimalloc-test    897M+
mimalloc-test    897M
mimalloc-test    897M
mimalloc-test    897M
mimalloc-test    897M
mimalloc-test    897M+
mimalloc-test    906M+
mimalloc-test    923M+
mimalloc-test    928M+
mimalloc-test    928M+
mimalloc-test    928M
mimalloc-test    928M+
mimalloc-test    928M
mimalloc-test    938M+
mimalloc-test    941M+
mimalloc-test    959M+
mimalloc-test    959M
mimalloc-test    959M+
mimalloc-test    972M+
mimalloc-test    974M+
mimalloc-test    974M
mimalloc-test    974M
mimalloc-test    991M+
mimalloc-test    991M+
mimalloc-test    991M
mimalloc-test    991M+
mimalloc-test    991M
mimalloc-test    992M+
mimalloc-test    1002M+
mimalloc-test    1002M
mimalloc-test    1005M+

With branch dev3-bin and environment set to MIMALLOC_PAGE_COMMIT_ON_DEMAND=1:

top -stats command,rprvt -r | grep -i mimalloc
mimalloc-test    661M
mimalloc-test    681M+
mimalloc-test    691M+
mimalloc-test    692M+
mimalloc-test    693M+
mimalloc-test    697M+
mimalloc-test    711M+
mimalloc-test    728M+
mimalloc-test    731M+
mimalloc-test    731M+
mimalloc-test    733M+
mimalloc-test    751M+
mimalloc-test    751M
mimalloc-test    752M+
mimalloc-test    761M+
mimalloc-test    762M+
mimalloc-test    766M+
mimalloc-test    782M+
mimalloc-test    782M
mimalloc-test    793M+
mimalloc-test    797M+
mimalloc-test    797M+
mimalloc-test    798M+
mimalloc-test    814M+
mimalloc-test    815M+
mimalloc-test    823M+
mimalloc-test    825M+
mimalloc-test    826M+
mimalloc-test    826M
mimalloc-test    827M+
mimalloc-test    831M+
mimalloc-test    831M
mimalloc-test    836M+
mimalloc-test    839M+
mimalloc-test    857M+
mimalloc-test    859M+
mimalloc-test    862M+
mimalloc-test    862M
mimalloc-test    878M+
mimalloc-test    884M+
mimalloc-test    885M+
mimalloc-test    892M+
mimalloc-test    893M+
mimalloc-test    893M
mimalloc-test    903M+
mimalloc-test    903M+
mimalloc-test    908M+
mimalloc-test    925M+
mimalloc-test    925M
mimalloc-test    925M+
mimalloc-test    927M+
mimalloc-test    927M+
mimalloc-test    927M+
mimalloc-test    951M+
mimalloc-test    958M+
mimalloc-test    958M+
mimalloc-test    958M+
mimalloc-test    971M+
mimalloc-test    971M
mimalloc-test    988M+
mimalloc-test    988M+

With branch dev3-bin and environment set to MIMALLOC_PURGE_DELAY=0:

top -stats command,rprvt -r | grep -i mimalloc
mimalloc-test    548M+
mimalloc-test    629M+
mimalloc-test    644M+
mimalloc-test    662M+
mimalloc-test    664M+
mimalloc-test    666M+
mimalloc-test    696M+
mimalloc-test    698M+
mimalloc-test    699M+
mimalloc-test    700M+
mimalloc-test    718M+
mimalloc-test    731M+
mimalloc-test    731M+
mimalloc-test    733M+
mimalloc-test    734M+
mimalloc-test    745M+
mimalloc-test    746M+
mimalloc-test    746M+
mimalloc-test    762M+
mimalloc-test    763M+
mimalloc-test    766M+
mimalloc-test    778M+
mimalloc-test    779M+
mimalloc-test    798M+
mimalloc-test    798M+
mimalloc-test    798M
mimalloc-test    798M
mimalloc-test    815M+
mimalloc-test    823M+
mimalloc-test    823M+
mimalloc-test    823M+
mimalloc-test    827M+
mimalloc-test    828M+
mimalloc-test    830M+
mimalloc-test    831M+
mimalloc-test    831M-
mimalloc-test    841M+
mimalloc-test    841M+
mimalloc-test    841M
mimalloc-test    858M+
mimalloc-test    858M+
mimalloc-test    863M+
mimalloc-test    864M+
mimalloc-test    864M
mimalloc-test    864M
mimalloc-test    864M
mimalloc-test    864M
mimalloc-test    872M+
mimalloc-test    894M+
mimalloc-test    894M
mimalloc-test    895M+
mimalloc-test    895M
mimalloc-test    896M+
mimalloc-test    896M
mimalloc-test    897M+
mimalloc-test    903M+
mimalloc-test    928M+
mimalloc-test    928M
mimalloc-test    929M+
mimalloc-test    929M
mimalloc-test    941M+
mimalloc-test    959M+
mimalloc-test    959M
mimalloc-test    959M
mimalloc-test    961M+
mimalloc-test    961M-
mimalloc-test    961M+
mimalloc-test    961M+
mimalloc-test    962M+
mimalloc-test    962M
mimalloc-test    974M+
mimalloc-test    991M+
mimalloc-test    991M+
mimalloc-test    991M
mimalloc-test    991M
mimalloc-test    991M+
mimalloc-test    992M+
mimalloc-test    992M
mimalloc-test    992M+
mimalloc-test    995M+
mimalloc-test    995M
mimalloc-test    995M+
mimalloc-test    1003M+
mimalloc-test    1023M+
mimalloc-test    1023M
mimalloc-test    1023M


daanx commented Mar 3, 2025

Good to see v3 does much better than v2; since it doesn't occur on Linux, I guess it must be something system specific, like an allocation in a thread that is about to be terminated (reinitializing the heap and leaving it orphaned). Not sure. I'll try to repro when I find some time.


daanx commented Mar 3, 2025

I tried to compile from your repo (awesome that you constructed this! maybe we can use it as a standard benchmark in the future).
I first got an error telling me to use nightly, so I used cargo +nightly build --release, but now I get:

error[E0599]: no method named `get_many_mut` found for struct `HashMap` in the current scope
   --> /Users/daan/.cargo/git/checkouts/rspack-c7c50c913aba6932/a04609d/crates/rspack_collections/src/ukey.rs:156:16
    |
156 |     self.inner.get_many_mut(ids)
    |                ^^^^^^^^^^^^
    |
help: there is a method `get_mut` with a similar name
    |
156 -     self.inner.get_many_mut(ids)
156 +     self.inner.get_mut(ids)
    |

For more information about this error, try `rustc --explain E0599`.
error: could not compile `rspack_collections` (lib) due to 1 previous error

Can you help?


h-a-n-a commented Mar 4, 2025

I tried to compile from your repo (awesome that you constructed this! maybe we can use it as a standard benchmark in the future). I first got an error telling me to use nightly, so I used cargo +nightly build --release, but now I get:

error[E0599]: no method named `get_many_mut` found for struct `HashMap` in the current scope
   --> /Users/daan/.cargo/git/checkouts/rspack-c7c50c913aba6932/a04609d/crates/rspack_collections/src/ukey.rs:156:16
    |
156 |     self.inner.get_many_mut(ids)
    |                ^^^^^^^^^^^^
    |
help: there is a method `get_mut` with a similar name
    |
156 -     self.inner.get_many_mut(ids)
156 +     self.inner.get_mut(ids)
    |

For more information about this error, try `rustc --explain E0599`.
error: could not compile `rspack_collections` (lib) due to 1 previous error

Can you help?

It turned out the Rust lang team had changed this API in the latest nightly release. I've pushed a new commit adding a rust-toolchain.toml that locks the Rust toolchain to a specific version. Would you please pull and run cargo build --release again?

You can also check the active Rust toolchain in the repo directory with the command rustup show.

...

active toolchain
----------------
name: nightly-2024-11-27-aarch64-apple-darwin
active because: overridden by '/path-to-rspack-allocation-test/rust-toolchain.toml'
installed targets:
  aarch64-apple-darwin
  x86_64-pc-windows-msvc
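
For reference, the pinned toolchain above corresponds to a rust-toolchain.toml along these lines (a sketch inferred from the rustup show output; the actual file in the repo may differ):

    # rust-toolchain.toml (sketch)
    [toolchain]
    channel = "nightly-2024-11-27"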


daanx commented Mar 5, 2025

Thanks -- I got it running locally, but it doesn't quite reproduce. I set it to 100 iterations using the latest dev3 branch. The memory usage is much lower than yours though (around 280 MiB at peak). Is that expected? Secondly, I couldn't quite reproduce the behaviour, as it stabilizes after about 30 to 50 iterations (from ~180 MiB rss to ~280 MiB rss). This is on an Apple M1, Sequoia 15.3, with cargo build --release and the latest dev3.

When I set MIMALLOC_PURGE_DELAY=0 I see that there are about 400 threads, and the memory looks like this (using mi_arenas_print, available in the latest dev3):

[Image: mi_arenas_print arena map output]

Here I see lots of low-use pages (the red P's), which I guess belong to many rarely used threads. Over each iteration the heap keeps looking like this, with around 9 chunks in use and sometimes some large singleton objects (like the final green s page, which is about 5 MiB I guess). Maybe the initial growth is due to the threadpool-like nature of tokio, where per-thread pages slowly get used a bit more depending on the tasks that happen to execute on them -- but in the end it stabilizes? Maybe not, as you remarked you didn't see this on Linux. Is the benchmark reading from disk? Are you writing to a log file in that same directory?

Maybe I need the larger workload that you observed of 800MiB+ -- let me know how to do that.

ps. with no options on a release build (with latest dev3, 100 iterations):

$ top -stats command,rprvt -r | grep -i mimalloc
mimalloc-test    257M 
mimalloc-test    257M- 
mimalloc-test    257M+ 
mimalloc-test    257M  
mimalloc-test    257M  
mimalloc-test    257M- 
mimalloc-test    257M  
mimalloc-test    257M  
mimalloc-test    257M  
mimalloc-test    257M  
mimalloc-test    257M+ 
mimalloc-test    257M  
...
mimalloc-test    287M  
mimalloc-test    288M+ 
mimalloc-test    288M  
mimalloc-test    288M  
mimalloc-test    288M  
mimalloc-test    288M  
mimalloc-test    288M  
mimalloc-test    288M  
mimalloc-test    288M+ 
mimalloc-test    288M  
mimalloc-test    288M  
mimalloc-test    288M  
mimalloc-test    288M+ 
mimalloc-test    288M  
mimalloc-test    288M  
mimalloc-test    286M- 
mimalloc-test    286M+ 
mimalloc-test    286M  
mimalloc-test    286M  
mimalloc-test    286M  
mimalloc-test    286M  
mimalloc-test    286M  
mimalloc-test    286M  
mimalloc-test    286M  
mimalloc-test    286M  
mimalloc-test    286M  
mimalloc-test    286M  
mimalloc-test    286M  
mimalloc-test    286M  
mimalloc-test    286M  

chenjiahan commented:

I can also reproduce the problem by following the steps in this demo.

  • environment: Apple M3 Max / 15.3.1
  • mimalloc branch: dev3
  • command: cargo build --release

Result:

top -stats command,rprvt -r | grep -i mimalloc
mimalloc-test    643M+ 
mimalloc-test    672M+ 
mimalloc-test    676M+ 
mimalloc-test    679M+ 
mimalloc-test    681M+ 
mimalloc-test    693M+ 
mimalloc-test    712M+ 
mimalloc-test    717M+ 
mimalloc-test    735M+ 
mimalloc-test    746M+ 
mimalloc-test    746M  
mimalloc-test    762M+ 
mimalloc-test    762M+ 
mimalloc-test    779M+ 
mimalloc-test    780M+ 
mimalloc-test    780M+ 
mimalloc-test    780M+ 
mimalloc-test    782M+ 
mimalloc-test    784M+ 
mimalloc-test    784M+ 
mimalloc-test    802M+ 
mimalloc-test    802M+ 
mimalloc-test    831M+ 
mimalloc-test    832M+ 
mimalloc-test    832M+ 
mimalloc-test    833M+ 
mimalloc-test    833M  
mimalloc-test    834M+ 
mimalloc-test    834M+ 
mimalloc-test    834M+ 
mimalloc-test    842M+ 
mimalloc-test    842M  
mimalloc-test    843M+ 
mimalloc-test    843M+ 
mimalloc-test    843M  
mimalloc-test    845M+ 
mimalloc-test    845M+ 
mimalloc-test    845M+ 
mimalloc-test    845M  
mimalloc-test    845M+ 
mimalloc-test    848M+ 
mimalloc-test    860M+ 
mimalloc-test    876M+ 
mimalloc-test    877M+ 
mimalloc-test    877M  
mimalloc-test    877M  
mimalloc-test    878M+ 
mimalloc-test    878M  
mimalloc-test    878M  
mimalloc-test    879M+ 
mimalloc-test    879M  
mimalloc-test    883M+ 
mimalloc-test    910M+ 
mimalloc-test    910M  
mimalloc-test    910M  
mimalloc-test    910M  
mimalloc-test    910M  
mimalloc-test    910M  
mimalloc-test    940M+ 
mimalloc-test    940M  
mimalloc-test    956M+ 


h-a-n-a commented Mar 5, 2025

The memory usage is much lower though than yours (around 280MiB at peak). Is that expected?

Unfortunately, this is not expected.

I forgot to mention that the node modules in the 10000 directory need to be installed; this might explain the low memory consumption in your reproduction. I've modified the README.md accordingly: you may need to install Node on the machine and call pnpm install to install the node modules. My edit to README.md: h-a-n-a/rspack-allocation-test@ecf28cb. After that, calling cargo run --release should emit no error.
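
In other words, something along these lines (a sketch; I'm assuming pnpm is installed and that the install runs where the demo's package.json lives, e.g. inside the 10000 directory):

    $ cd 10000 && pnpm install && cd ..
    $ cargo run --release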

Is the benchmark reading from disk?

The benchmark does heavy recursive reading of files from disk (in the 10000 directory). This invokes a tokio blocking thread (in Rspack this type of thread is used only for reading files) to issue the filesystem read. The thread is kept alive for the default timeout of 10 seconds (i.e. the thread_keep_alive option).
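
For context, a minimal sketch of that pattern (not Rspack's actual code; the path is hypothetical, and I'm assuming the read goes through tokio's blocking pool, e.g. via spawn_blocking or tokio::fs):

    // Sketch only: read a file on tokio's blocking pool. The blocking thread that
    // serviced this call then lingers for `thread_keep_alive` (10 s by default)
    // waiting for more work before it exits.
    async fn read_source(path: std::path::PathBuf) -> std::io::Result<String> {
        tokio::task::spawn_blocking(move || std::fs::read_to_string(path))
            .await
            .expect("blocking read task panicked")
    }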

Are you writing to a log file in that same directory?

We don't normally write log files. The only file-writing operation happens at the end of each rebuild, and it is driven by the same kind of tokio blocking thread.


daanx commented Mar 6, 2025

I got it running now -- it still uses much less memory, but I can see it grow now, from about 240 MiB to 480 MiB after 100 iterations. One thing I noticed is that every iteration one or more extra threads are created: it starts at 339 threads and increases to 526 threads after 100 iterations. I will test on Linux too to see if that happens there as well. I guess those extra threads are the issue? edit: on Linux the thread count stays constant at 199 (but it uses more memory and runs much faster?)

Question: how can I configure the project to use the standard macOS allocator?


h-a-n-a commented Mar 6, 2025

One thing I noticed was that every iteration an extra thread(s) is created, it starts at 339 threads and increases to 526 threads after 100 iterations. I will test on Linux too to see if that happens there too. I guess that extra thread is the issue?

I did some tests on my local machine and found that the thread count and the memory consumption are indeed related and intertwined. In my opinion, it's strongly linked to these two configurations passed to tokio-rs:

  • max_blocking_threads: specifies the limit for additional threads spawned by the Runtime (i.e. for some fs operations)
  • thread_keep_alive: sets a custom timeout for a thread in the blocking pool

Tuning either of these options will lower the memory usage.

For example, if I set max_blocking_threads to 8, which is now the default blocking thread count in Rspack, it keeps tokio-rs from creating as many blocking threads as it might like to, which decreases memory usage.

Even with mimalloc v2.1.7, which consumes more memory than v3 (dev3-bin), memory usage is much better than before:

# Mimalloc v2.1.7 with `max_blocking_threads` set to 8
$ top -stats command,threads,vprvt,rprvt,mem -r | grep -i mimalloc
mimalloc-test    33/1   2054M+ 616M+  618M+
mimalloc-test    33     2062M+ 677M+  679M+
mimalloc-test    33     2070M+ 718M+  720M+
mimalloc-test    33/8   2070M  726M+  728M+
mimalloc-test    33     2086M+ 737M+  739M+
mimalloc-test    33     2214M+ 759M+  761M+
mimalloc-test    33/1   2222M+ 764M+  766M+
mimalloc-test    33     2222M  767M+  769M+
mimalloc-test    33     2222M  780M+  782M+
mimalloc-test    33/19  2222M  780M+  782M+
mimalloc-test    33     2222M  783M+  785M+
mimalloc-test    33/1   2222M  785M+  787M+
mimalloc-test    34/2   2224M+ 790M+  792M+
mimalloc-test    33     2222M- 792M+  794M+
mimalloc-test    33/20  2222M  792M+  794M+
mimalloc-test    33     2222M  792M+  794M+
mimalloc-test    33/1   2222M  793M+  795M+
mimalloc-test    33/20  2222M  794M+  796M+
mimalloc-test    33     2222M  794M+  796M+
mimalloc-test    33/1   2222M  795M+  797M+
mimalloc-test    34/2   2224M+ 795M+  797M+
mimalloc-test    33     2222M- 797M+  799M+
mimalloc-test    33/5   2222M  797M+  799M+
mimalloc-test    34/2   2224M+ 797M+  799M+

# Mimalloc v2.1.7 with `max_blocking_threads` set to default (512)
$ top -stats command,threads,vprvt,rprvt,mem -r | grep -i mimalloc
mimalloc-test    272/1  10G+   611M+  617M+
mimalloc-test    272    10G+   706M+  712M+
mimalloc-test    272/1  10G+   761M+  767M+
mimalloc-test    272    10G    790M+  796M+
mimalloc-test    272/1  10G    819M+  825M+
mimalloc-test    272    10G    826M+  832M+
mimalloc-test    272    10G    832M+  838M+
mimalloc-test    272/40 10G    837M+  844M+
mimalloc-test    272    10G    839M+  845M+
mimalloc-test    272    10G    845M+  851M+
mimalloc-test    272/3  10G    846M+  852M+
mimalloc-test    272    10G    850M+  856M+
mimalloc-test    272/20 10G    862M+  868M+
mimalloc-test    284    10G+   919M+  925M+
mimalloc-test    284    10G+   924M+  931M+
mimalloc-test    284/56 10G    927M+  933M+
mimalloc-test    299    11G+   1005M+ 1012M+
mimalloc-test    299/16 11G+   1025M+ 1031M+
mimalloc-test    299    11G    1026M+ 1033M+
mimalloc-test    317    11G+   1081M+ 1088M+
mimalloc-test    317/1  11G    1086M+ 1093M+
mimalloc-test    317    12G+   1105M+ 1112M+
mimalloc-test    317/53 12G    1108M+ 1114M+
mimalloc-test    317    12G    1109M+ 1116M+

The result for mimalloc-v3 (dev3-bin):

# Mimalloc v3 with `max_blocking_threads` set to 8
$ top -stats command,threads,vprvt,rprvt,mem -r | grep -i mimalloc
mimalloc-test    33     1443M+ 605M+  607M+
mimalloc-test    33/21  1442M- 615M+  616M+
mimalloc-test    33     1442M+ 615M+  617M+
mimalloc-test    33     1442M  618M+  619M+
mimalloc-test    33/20  1442M  618M+  619M+
mimalloc-test    33     1442M  621M+  623M+
mimalloc-test    33/12  1442M  622M+  623M+
mimalloc-test    33     1442M  622M+  623M+
mimalloc-test    33     1442M  622M   623M
mimalloc-test    33/18  1442M  622M+  623M+
mimalloc-test    33     1450M+ 622M+  624M+
mimalloc-test    33/12  1450M  622M+  624M+
mimalloc-test    33     1450M  622M+  624M+
mimalloc-test    33/12  1450M  623M+  625M+
mimalloc-test    33     1458M+ 623M+  625M+
mimalloc-test    33/1   1458M  623M+  625M+
mimalloc-test    33     1458M  623M+  625M+
mimalloc-test    33     1458M  623M+  625M+
mimalloc-test    33/3   1458M  623M+  625M+
mimalloc-test    33     1458M  623M+  625M+
mimalloc-test    33/13  1458M  623M   625M
mimalloc-test    33     1458M  623M   625M
mimalloc-test    33     1458M  623M+  625M+
mimalloc-test    33/20  1458M  623M+  625M+
mimalloc-test    33     1458M  623M   625M
mimalloc-test    33/12  1458M  623M   625M
mimalloc-test    33     1458M  624M+  625M+
mimalloc-test    33     1458M  624M+  625M+
mimalloc-test    33/20  1458M  624M+  625M+
mimalloc-test    33     1458M  624M+  625M+
mimalloc-test    33/12  1458M  624M+  625M+
mimalloc-test    33     1586M+ 640M+  642M+
mimalloc-test    33     1586M  640M+  642M+
mimalloc-test    33/17  1586M  640M   642M
mimalloc-test    33     1586M  640M+  642M+
mimalloc-test    33/1   1586M  644M+  646M+


# Mimalloc v3 with `max_blocking_threads` set to default (512)
$ top -stats command,threads,vprvt,rprvt,mem -r | grep -i mimalloc
mimalloc-test    209/55 1816M+ 563M+  565M+
mimalloc-test    227    1980M+ 628M+  630M+
mimalloc-test    227/29 1981M+ 630M+  632M+
mimalloc-test    243    2013M+ 651M+  653M+
mimalloc-test    263/19 2054M+ 651M+  653M+
mimalloc-test    263    2054M  653M+  655M+
mimalloc-test    263/1  2054M  653M+  655M+
mimalloc-test    263    2054M  663M+  665M+
mimalloc-test    264/2  2056M+ 663M+  666M+
mimalloc-test    270    2068M+ 673M+  675M+
mimalloc-test    270    2068M  674M+  676M+
mimalloc-test    270    2068M  694M+  696M+
mimalloc-test    270    2068M  696M+  698M+
mimalloc-test    270    2068M  696M+  698M+
mimalloc-test    276    2080M+ 696M+  698M+
mimalloc-test    276    2080M  701M+  703M+
mimalloc-test    276    2080M  701M+  703M+
mimalloc-test    276/12 2080M  701M+  703M+
mimalloc-test    276    2080M  701M+  703M+
mimalloc-test    276/1  2080M  705M+  707M+
mimalloc-test    276    2080M  722M+  724M+
mimalloc-test    276/1  2080M  728M+  730M+
mimalloc-test    276    2080M  731M+  733M+
mimalloc-test    296/12 2121M+ 732M+  734M+
mimalloc-test    323    2176M+ 732M+  734M+
mimalloc-test    370/12 2271M+ 739M+  741M+
mimalloc-test    370    2399M+ 764M+  766M+
mimalloc-test    370/16 2399M  764M+  766M+
mimalloc-test    370    2399M  764M   766M
mimalloc-test    370/8  2399M  764M   766M
mimalloc-test    370    2399M  764M   766M
mimalloc-test    370/19 2399M  765M+  768M+
mimalloc-test    370    2399M  770M+  773M+
mimalloc-test    370/39 2399M  790M+  793M+

Memory still goes up pretty quickly on a larger workload, though.

To adjust these two options, you can go to src/main.rs and add them to the tokio runtime initialization:

    // in src/main.rs (requires `use std::time::Duration;`)
    let rt = tokio::runtime::Builder::new_multi_thread()
        .enable_all()
        // cap the extra (blocking-pool) threads tokio may spawn
        .max_blocking_threads(8)
        // let idle blocking threads exit after 1 s instead of the 10 s default
        .thread_keep_alive(Duration::from_millis(1_000))
        .build()
        .unwrap();

on linux the threads stay constant at 199 (but uses more memory and goes much faster?)

IIRC, my test on Linux stabilizes at around ~550 MB. I think this is totally fine as long as it's not increasing.


how can I configure the project to use the standard macOS allocator?

Just comment out these two lines at the top of src/main.rs and recompile.

#[global_allocator]
static GLOBAL: MiMalloc = MiMalloc;
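
As an alternative to editing the source each time, the allocator could be gated behind a cargo feature -- just a sketch on my side; the "use-mimalloc" feature name is hypothetical and would need a matching entry in Cargo.toml:

    // src/main.rs (sketch; "use-mimalloc" is a hypothetical cargo feature)
    #[cfg(feature = "use-mimalloc")]
    #[global_allocator]
    static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;

With that, cargo run --release uses the system allocator and cargo run --release --features use-mimalloc switches back to mimalloc.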


daanx commented Mar 6, 2025

Thanks! It is good to see that v3 does much better, as it was essentially redesigned to deal with threadpools better.
Just to clarify, the threads on Linux stay constant under 200 (and memory growth stabilizes). Btw, in a threadpool-like scenario we expect an initial growth of memory as each thread retains some owned memory, but it should level out after a while.

Maybe the creation of the new threads that I see on macOS is the root cause of the memory growth (where there is an increasing number of threads that are never terminated -- maybe due to mimalloc? maybe not?).


h-a-n-a commented Mar 11, 2025

Thanks for the reply!

Maybe the creation of the new threads that I see on macOS is the root cause of the memory growth (where there is an increasing number of threads that are never terminated -- maybe due to mimalloc? maybe not?).

That's weird. If I set max_blocking_threads to 8 with mimalloc v3, the thread count stabilizes at 33, but the memory still grows over time. I've uploaded a larger workload with 20000 modules, yet it still does not quite reproduce the memory growth at a large scale, even though it does grow slowly. There is a big workload in my company project that I'm unable to share:

For each rebuild, it consumed and retained about 300 MiB of memory.

$ top -stats pid,command,threads,vprvt,rprvt,mem -r | grep -i 5732
5732   node             50     5353M 2117M 2286M
5732   node             52/1   3805M- 1842M- 2296M+
5732   node             52     4583M+ 1985M+ 2296M+
5732   node             52     4583M  1985M  2296M
5732   node             51     4582M- 1894M- 2152M-
5732   node             50/1   4524M- 1857M- 2110M-
5732   node             50     4524M  1857M+ 2110M+
5732   node             50/1   4524M  1857M  2110M+
5732   node             50     4524M  1857M  2110M+
5732   node             50/1   4524M  1857M  2110M+
5732   node             50     4524M  1857M  2110M+
5732   node             50     4524M  1857M+ 2111M+
5732   node             50/1   4524M  1857M  2111M+
5732   node             50     4524M  1857M  2111M+
5732   node             50     4524M  1857M  2111M+
5732   node             51/3   4509M- 1773M- 2067M-
5732   node             51/13  4309M- 1832M+ 2167M+
5732   node             51/2   4688M+ 2168M+ 2498M+
5732   node             51/7   4633M- 2356M+ 2766M+
5732   node             51/2   5171M+ 2460M+ 2833M+
5732   node             51/13  5209M+ 2537M+ 2919M+
5732   node             51/2   5869M+ 2380M- 2773M-
5732   node             51/1   5886M+ 2392M+ 2786M+
5732   node             51/1   5886M  2392M  2786M
5732   node             51/1   5886M  2392M  2786M
5732   node             51/1   5886M  2392M  2786M
5732   node             50/1   5862M- 2364M- 2723M-
5732   node             50     5851M- 2355M- 2712M-  ⬅️⬅️⬅️⬅️⬅️⬅️⬅️⬅️ First build finished
5732   node             50     5851M  2355M  2712M+
5732   node             50     5851M  2355M  2712M+
5732   node             50     5851M  2355M  2712M+
5732   node             50/1   5851M  2355M  2712M+
5732   node             50/1   5851M  2355M  2712M+
5732   node             50     5851M  2355M  2712M+
5732   node             50/1   5851M  2355M  2712M+
5732   node             50     5851M  2355M+ 2712M+
5732   node             50/1   5851M  2355M+ 2712M+
5732   node             50     5851M  2355M  2712M+
5732   node             50/1   5851M  2355M  2712M+
5732   node             50     5851M  2355M  2712M+
5732   node             50/1   5851M  2355M  2712M+
5732   node             51/14  5972M+ 2266M- 2666M-
5732   node             51/1   5958M- 2349M+ 2750M+
5732   node             52/3   5624M- 2553M+ 3005M+
5732   node             52/3   6091M+ 2794M+ 3236M+
5732   node             52/2   6124M+ 2787M- 3208M-
5732   node             52/13  6128M+ 2808M+ 3243M+
5732   node             52/1   6295M+ 2723M- 3188M-
5732   node             52/1   6286M- 2663M- 3128M-
5732   node             52     6294M+ 2667M+ 3128M
5732   node             52     6294M  2667M  3128M
5732   node             52     6294M  2667M+ 3129M+
5732   node             50     6239M- 2611M- 3035M- ⬅️⬅️⬅️⬅️⬅️⬅️⬅️⬅️ Second build finished
5732   node             50/1   6239M- 2611M+ 3035M+
5732   node             50/1   6239M  2611M  3035M+
5732   node             50/1   6239M  2611M  3035M+
5732   node             50     6239M  2611M  3035M+
5732   node             50     6239M  2611M  3035M+
5732   node             50     6239M  2611M  3035M+
5732   node             50/1   6239M  2611M  3035M+
5732   node             50/1   6239M  2611M  3035M+
5732   node             50     6239M  2611M  3035M+
5732   node             50/1   6239M  2611M  3035M
5732   node             50     6239M  2611M  3035M+
5732   node             50     6239M  2611M  3035M+
5732   node             50/1   6239M  2611M  3036M+
5732   node             50/1   6239M  2611M  3036M+
5732   node             50/1   6239M  2611M  3035M-
5732   node             50/1   6239M  2611M  3035M+
5732   node             50/1   6239M  2611M  3035M+
5732   node             50     6239M  2611M  3035M+
5732   node             51/5   6194M- 2565M- 3002M-
5732   node             51/22  6036M- 2656M+ 3104M+
5732   node             52/13  6348M+ 2886M+ 3350M+
5732   node             52/2   6288M- 3105M+ 3587M+
5732   node             52/1   6402M+ 3065M- 3545M-
5732   node             52/13  6394M- 3073M+ 3556M+
5732   node             52     6303M- 3013M- 3472M-
5732   node             52/1   6292M- 2946M- 3413M-
5732   node             52     6300M+ 2950M+ 3413M
5732   node             52/1   6300M  2950M  3413M
5732   node             52/1   6300M  2950M  3413M
5732   node             52     6259M- 2902M- 3329M-
5732   node             50     6245M- 2897M- 3322M- ⬅️⬅️⬅️⬅️⬅️⬅️⬅️⬅️ Third build finished
5732   node             50     6245M  2897M+ 3322M+
5732   node             50/1   6245M  2897M  3322M+
5732   node             50     6245M  2897M  3322M+
5732   node             50/1   6245M  2897M  3322M+
5732   node             50/1   6245M  2897M  3323M+
5732   node             50/1   6245M  2897M+ 3323M+
5732   node             50     6245M  2897M  3323M+
5732   node             50/1   6245M  2897M  3323M+
5732   node             50     6245M  2897M  3323M+
5732   node             50/1   6245M  2897M  3323M+
5732   node             50/1   6245M  2897M+ 3323M+
5732   node             50     6245M  2897M  3323M+
5732   node             50     6245M  2897M  3323M+
5732   node             50     6245M  2897M  3323M-
5732   node             50/1   6245M  2897M  3323M+
5732   node             53/15  6253M+ 2895M- 3313M-
5732   node             52/3   6356M+ 2909M+ 3372M+
5732   node             52/3   5970M- 3020M+ 3515M+
5732   node             52/2   6039M+ 3249M+ 3754M+
5732   node             52/2   6361M+ 3444M+ 3957M+
5732   node             52/1   6842M+ 3391M- 3957M-
5732   node             52/1   6956M+ 3492M+ 4066M+
5732   node             52/2   6805M- 3228M- 3790M-
5732   node             52/1   6795M- 3234M+ 3797M+
5732   node             52     6803M+ 3234M+ 3797M
5732   node             52     6803M  3235M+ 3798M+
5732   node             52     6803M  3235M  3798M
5732   node             50/6   6780M- 3209M- 3737M-
5732   node             50     6769M- 3202M- 3729M- ⬅️⬅️⬅️⬅️⬅️⬅️⬅️⬅️ Fourth build finished
5732   node             50     6769M  3202M  3729M+
5732   node             50     6769M  3202M  3729M+
5732   node             50     6769M  3202M  3729M+
5732   node             50/1   6769M  3202M  3729M+
5732   node             50     6769M  3202M  3729M+
5732   node             50/1   6769M  3202M  3729M+
5732   node             50     6769M  3202M+ 3729M+
5732   node             50/1   6769M  3202M  3729M+
5732   node             50     6769M  3202M+ 3729M+
5732   node             50/1   6769M  3203M+ 3729M+

So... do you know of any recommended way to debug this? For example, finding out which thread is holding on to a piece of memory and keeping it from being released. Or is there anything else I should check locally?
