Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

auto-update: mkDebounce leaking the thread #980

Open
Yuras opened this issue Mar 21, 2024 · 11 comments
Open

auto-update: mkDebounce leaking the thread #980

Yuras opened this issue Mar 21, 2024 · 11 comments

Comments

@Yuras
Copy link

Yuras commented Mar 21, 2024

mkDebounce internally forks a thread, but never kills it. To reproduce, run the following with +RTS -s -RTS and observe memory usage.

import Control.Debounce
import Control.Monad

main :: IO ()
main = forM_ [(0::Int) .. 100000] $ \_ -> do
  r <- mkDebounce defaultDebounceSettings
  r

Output:

     250,129,600 bytes allocated in the heap
     417,814,184 bytes copied during GC
      47,018,288 bytes maximum residency (8 sample(s))
       3,325,648 bytes maximum slop
             132 MiB total memory in use (0 MB lost due to fragmentation)

132MiB memory is too much for the above program.

As a possible solution, consider attaching a finalizer to the debounced action that will kill the thread. I'd be happy to prepare a patch if you agree with such a solution.
Other possibility is to use a closable channel instead of MVar. I didn't try it though.

A bit of context: fast-logger uses mkDebounce to flush logs, so each logger comes with a thread which never exits.

Thanks!

@Vlix
Copy link
Contributor

Vlix commented May 21, 2024

I wonder how this could be fixed.

I guess mkDebounce could also provide a finalizer?

Or create a function that captures its own MVar and doesn't need to fork a thread?

mkDebounce :: IO () -> IO (IO ())
mkDebounce action = do
  baton <- newMVar ()
  pure $ do
    mBaton <- tryTakeMVar baton
    case mBaton of
      Nothing -> pure ()
      Just () -> do
        -- handle Leading/Trailing
        -- and `putMVar ()` when done with action + delay

@kazu-yamamoto would this idea be an ok replacement for Control.Debounce.mkDebounce? Or do you see any issues with it? (a finally putMVar might also be needed, come to think of it)

@kazu-yamamoto
Copy link
Contributor

This module was created by @snoyberg.
I don't have strong opinions but yes, cleaning up threads is a MUST.

@Vlix Vlix mentioned this issue Jun 8, 2024
3 tasks
@Vlix
Copy link
Contributor

Vlix commented Oct 29, 2024

Will try to reproduce with my PR #996 at the earliest convenience.

@kazu-yamamoto
Copy link
Contributor

Note that the recent GHC has GHC.Conc.Sync.listThreads.
Using it with lableThreads, we can check if the clean up works well.
Now "label" can be passwd to auto-update stuffs via fooThreadName.

@Vlix
Copy link
Contributor

Vlix commented Oct 30, 2024

I'll add listThreads to my reproduction.
I don't know what you mean with "passing 'label' to 'auto-update stuffs'", though 🤔

@kazu-yamamoto
Copy link
Contributor

See updateThreadName.

@Yuras
Copy link
Author

Yuras commented Nov 1, 2024

There is a high change that the issue is actually caused by labelThread, see https://gitlab.haskell.org/ghc/ghc/-/issues/23949

The reproducer in the issue description is probably wrong, it's not letting the threads chance to receive BlockedIndefinitelyOnMVar. There should be threadDelay and performMajorGC somewhere in the loop.

@Vlix
Copy link
Contributor

Vlix commented Nov 1, 2024

There is a high change that the issue is actually caused by labelThread, see https://gitlab.haskell.org/ghc/ghc/-/issues/23949

The reproducer in the issue description is probably wrong, it's not letting the threads chance to receive BlockedIndefinitelyOnMVar. There should be threadDelay and performMajorGC somewhere in the loop.

labelThread was not added in the debouncing function(s) until auto-update-0.2.2, which was released about a month ago, so unless the labelThread issue was always a problem for literally every fork, even if the labelThread wasn't called. This would still be an issue.

@Vlix
Copy link
Contributor

Vlix commented Nov 2, 2024

I've ran OP's example on master with GHC 9.6.6 and GHC 9.2.8 and found no significant difference.
(They were both always above 100MB total memory in use, and would also sometimes result in the 130~ish MB the OP showed.)

And I've ran it with my PR #996 on GHC 9.6.6, which used a lot less memory, and also found that when I added a print statement in the debounce action, the printing would stagger and lag with the current code, but with my PR it was a lot smoother and just flooded my screen as fast as possible, instead of having hiccups and freezes.

Average results from trying the OP's example
-------------------------------
`master` BRANCH GHC 9.6.6
-------------------------------
     132,844,528 bytes allocated in the heap
     354,407,584 bytes copied during GC
      37,886,632 bytes maximum residency (6 sample(s))
       4,250,408 bytes maximum slop
             116 MiB total memory in use (0 MiB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0        29 colls,     0 par    0.061s   0.061s     0.0021s    0.0033s
  Gen  1         6 colls,     0 par    0.071s   0.071s     0.0118s    0.0166s

  INIT    time    0.001s  (  0.000s elapsed)
  MUT     time    9.728s  ( 10.614s elapsed)
  GC      time    0.132s  (  0.132s elapsed)
  EXIT    time    0.007s  (  0.014s elapsed)
  Total   time    9.868s  ( 10.760s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    13,655,472 bytes per MUT second

  Productivity  98.6% of total user, 98.6% of total elapsed

-------------------------------
`master` BRANCH GHC 9.2.8
-------------------------------
     253,499,096 bytes allocated in the heap
     378,401,504 bytes copied during GC
      37,405,288 bytes maximum residency (8 sample(s))
       4,423,064 bytes maximum slop
             113 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0        52 colls,     0 par    0.064s   0.064s     0.0012s    0.0028s
  Gen  1         8 colls,     0 par    0.078s   0.078s     0.0098s    0.0149s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    9.738s  (  9.738s elapsed)
  GC      time    0.143s  (  0.143s elapsed)
  EXIT    time    0.266s  (  0.266s elapsed)
  Total   time   10.148s  ( 10.148s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    26,031,989 bytes per MUT second
  Productivity  96.0% of total user, 96.0% of total elapsed

-------------------------------
NEW (#996) DEBOUNCE GHC 9.6.6
-------------------------------

     641,195,544 bytes allocated in the heap
     433,286,048 bytes copied during GC
      14,834,760 bytes maximum residency (15 sample(s))
       2,876,344 bytes maximum slop
              48 MiB total memory in use (0 MiB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       146 colls,     0 par    0.063s   0.063s     0.0004s    0.0015s
  Gen  1        15 colls,     0 par    0.064s   0.064s     0.0043s    0.0062s

  INIT    time    0.001s  (  0.001s elapsed)
  MUT     time    9.412s  (  9.389s elapsed)
  GC      time    0.127s  (  0.127s elapsed)
  EXIT    time    0.278s  (  0.284s elapsed)
  Total   time    9.817s  (  9.801s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    68,126,982 bytes per MUT second

  Productivity  95.9% of total user, 95.8% of total elapsed

@Vlix
Copy link
Contributor

Vlix commented Nov 2, 2024

Now some tests with listThreads.

The master branch kept about 57000 threads active if the debounceAction was only pure (). It kept 47000 active if the debounceAction had a threadDelay 1_000_000 and this goes down the more the threadDelay increases... not entirely sure why. (See Addendum 1)

The #996 way of doing debouncing still has 11000~ish threads open after all that, and this goes down by a bit with a 1s threadDelay as the debounceAction, but somehow the amount of threads open at the end increases again if the debounceAction is set to 2s threadDelay? 🤔 (See Addendum 2)

This might have to do with the GC taking longer to recognize the threads can be cleaned up? Or something like that? I honestly don't have too much knowledge about the GHC runtime to know how it cleans up green threads, and how accurate listThreads is.
But all in all, #996 does seem like an improvement 🤷

Source code that was run:
{-# LANGUAGE NumericUnderscores #-}
import Control.Concurrent (threadDelay)
import Control.Monad (forM_)
import GHC.Conc (listThreads)
import System.Environment (getArgs)
import System.Mem (performMajorGC)

import Control.Debounce (
    debounceAction,
    defaultDebounceSettings,
    mkDebounce,
 )

main :: IO ()
main = do
    args <- getArgs
    let action =
            case args of
                [] -> pure ()
                "0" : _ -> pure ()
                i : _ -> threadDelay $ read i * 1_000_000
    forM_ [(0::Int) .. 100_000] $ \_ -> do
        r <- mkDebounce defaultDebounceSettings{debounceAction = action}
        r
    nowThreads <- length <$> listThreads
    threadDelay 5_000_000

    laterThreads <- length <$> listThreads
    threadDelay 1_000_000

    performMajorGC

    gcThreads <- length <$> listThreads
    threadDelay 1_000_000

    putStr "\nThreads amount right after: " >> print nowThreads
    putStr "\nThreads amount 5 sec later: " >> print laterThreads
    putStr "\nThreads amount after GC:    " >> print gcThreads
    putStrLn ""
Addendum 1
---------------------------
`master` BRANCH GHC 9.6.6 (with `listThreads`)
---------------------------
Threads amount right after: 57671

Threads amount 5 sec later: 57671

Threads amount after GC:    57671

     149,547,936 bytes allocated in the heap
     421,271,680 bytes copied during GC
      64,671,328 bytes maximum residency (7 sample(s))
       4,768,160 bytes maximum slop
             144 MiB total memory in use (0 MiB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0        32 colls,     0 par    0.066s   0.066s     0.0021s    0.0039s
  Gen  1         7 colls,     0 par    0.100s   0.100s     0.0143s    0.0258s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    9.709s  ( 16.607s elapsed)
  GC      time    0.166s  (  0.167s elapsed)
  EXIT    time    0.005s  (  0.006s elapsed)
  Total   time    9.881s  ( 16.780s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    15,402,472 bytes per MUT second

  Productivity  98.3% of total user, 99.0% of total elapsed

---------------------------
`master` BRANCH GHC 9.6.6 (with `listThreads`) 
** 1 second `threadDelay` **
---------------------------
Threads amount right after: 47431

Threads amount 5 sec later: 47431

Threads amount after GC:    47431

     493,958,480 bytes allocated in the heap
     452,738,768 bytes copied during GC
      53,122,272 bytes maximum residency (10 sample(s))
       3,930,912 bytes maximum slop
             121 MiB total memory in use (0 MiB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       113 colls,     0 par    0.078s   0.078s     0.0007s    0.0041s
  Gen  1        10 colls,     0 par    0.103s   0.103s     0.0103s    0.0227s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time   20.493s  ( 26.422s elapsed)
  GC      time    0.181s  (  0.181s elapsed)
  EXIT    time    0.003s  (  0.007s elapsed)
  Total   time   20.677s  ( 26.610s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    24,103,976 bytes per MUT second

  Productivity  99.1% of total user, 99.3% of total elapsed

---------------------------
`master` BRANCH GHC 9.6.6 (with `listThreads`) 
** 2 second `threadDelay` **
---------------------------
Threads amount right after: 38926

Threads amount 5 sec later: 38926

Threads amount after GC:    38926

     491,857,688 bytes allocated in the heap
     455,557,856 bytes copied during GC
      42,842,008 bytes maximum residency (10 sample(s))
       4,241,224 bytes maximum slop
             125 MiB total memory in use (0 MiB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       112 colls,     0 par    0.085s   0.085s     0.0008s    0.0029s
  Gen  1        10 colls,     0 par    0.116s   0.116s     0.0116s    0.0220s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time   26.134s  ( 31.982s elapsed)
  GC      time    0.201s  (  0.201s elapsed)
  EXIT    time    0.002s  (  0.007s elapsed)
  Total   time   26.337s  ( 32.190s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    18,820,478 bytes per MUT second

  Productivity  99.2% of total user, 99.4% of total elapsed
Addendum 2
---------------------------
NEW (#996) BRANCH GHC 9.6.6 (with `listThreads`)
---------------------------
Threads amount right after: 11371

Threads amount 5 sec later: 11371

Threads amount after GC:    11371

     363,695,808 bytes allocated in the heap
     362,168,344 bytes copied during GC
      14,189,632 bytes maximum residency (15 sample(s))
       2,079,680 bytes maximum slop
              46 MiB total memory in use (0 MiB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0        80 colls,     0 par    0.050s   0.050s     0.0006s    0.0023s
  Gen  1        15 colls,     0 par    0.061s   0.061s     0.0040s    0.0062s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    9.509s  ( 16.404s elapsed)
  GC      time    0.111s  (  0.111s elapsed)
  EXIT    time    0.000s  (  0.005s elapsed)
  Total   time    9.620s  ( 16.520s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    38,248,628 bytes per MUT second

  Productivity  98.8% of total user, 99.3% of total elapsed

---------------------------
NEW (#996) BRANCH GHC 9.6.6 (with `listThreads`) 
** 1 second `threadDelay` **
---------------------------
Threads amount right after: 11068

Threads amount 5 sec later: 11068

Threads amount after GC:    11068

     710,811,992 bytes allocated in the heap
     385,014,272 bytes copied during GC
      13,558,344 bytes maximum residency (17 sample(s))
       1,990,072 bytes maximum slop
              45 MiB total memory in use (0 MiB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       161 colls,     0 par    0.055s   0.055s     0.0003s    0.0016s
  Gen  1        17 colls,     0 par    0.065s   0.065s     0.0038s    0.0064s

  INIT    time    0.001s  (  0.001s elapsed)
  MUT     time   19.795s  ( 26.229s elapsed)
  GC      time    0.120s  (  0.120s elapsed)
  EXIT    time    0.000s  (  0.001s elapsed)
  Total   time   19.916s  ( 26.350s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    35,907,809 bytes per MUT second

  Productivity  99.4% of total user, 99.5% of total elapsed

---------------------------
NEW (#996) BRANCH GHC 9.6.6 (with `listThreads`) 
** 2 second `threadDelay` **
---------------------------
Threads amount right after: 18413

Threads amount 5 sec later: 18413

Threads amount after GC:    18413

     712,927,520 bytes allocated in the heap
     396,490,936 bytes copied during GC
      20,182,664 bytes maximum residency (16 sample(s))
       2,793,136 bytes maximum slop
              59 MiB total memory in use (0 MiB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       163 colls,     0 par    0.065s   0.066s     0.0004s    0.0034s
  Gen  1        16 colls,     0 par    0.072s   0.072s     0.0045s    0.0085s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time   25.060s  ( 31.797s elapsed)
  GC      time    0.138s  (  0.138s elapsed)
  EXIT    time    0.001s  (  0.005s elapsed)
  Total   time   25.198s  ( 31.940s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    28,449,329 bytes per MUT second

  Productivity  99.4% of total user, 99.6% of total elapsed

@Vlix
Copy link
Contributor

Vlix commented Nov 2, 2024

Ah, after adding some BangPatterns in front of the results from listThread (like !nowThreads <-), this is the result with #996 (debounceAction is pure ()):

#996 Executable
Threads amount right after: 12638

Threads amount 5 sec later: 12638

Threads amount after GC:    1

     362,847,320 bytes allocated in the heap
     350,826,880 bytes copied during GC
      14,189,632 bytes maximum residency (15 sample(s))
       2,079,680 bytes maximum slop
              45 MiB total memory in use (0 MiB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0        80 colls,     0 par    0.049s   0.049s     0.0006s    0.0022s
  Gen  1        15 colls,     0 par    0.057s   0.057s     0.0038s    0.0060s

  INIT    time    0.001s  (  0.000s elapsed)
  MUT     time    9.428s  ( 16.301s elapsed)
  GC      time    0.106s  (  0.106s elapsed)
  EXIT    time    0.000s  (  0.003s elapsed)
  Total   time    9.534s  ( 16.411s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    38,487,966 bytes per MUT second

  Productivity  98.9% of total user, 99.3% of total elapsed

And this is the result on the master thread:

master Executable
Threads amount right after: 57749

Threads amount 5 sec later: 57749

Threads amount after GC:    34841

     148,134,400 bytes allocated in the heap
     425,123,096 bytes copied during GC
      64,377,464 bytes maximum residency (8 sample(s))
       6,040,968 bytes maximum slop
             145 MiB total memory in use (0 MiB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0        31 colls,     0 par    0.066s   0.066s     0.0021s    0.0035s
  Gen  1         8 colls,     0 par    0.115s   0.116s     0.0145s    0.0242s

  INIT    time    0.001s  (  0.001s elapsed)
  MUT     time    9.660s  ( 16.566s elapsed)
  GC      time    0.181s  (  0.181s elapsed)
  EXIT    time    0.001s  (  0.002s elapsed)
  Total   time    9.843s  ( 16.750s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    15,335,172 bytes per MUT second

  Productivity  98.1% of total user, 98.9% of total elapsed

So it does seem the new way of debouncing actually cleans up the threads, whereas the current one leaks at least somewhat?


Some extra testing

1 sec leaves less:

`master` branch 1s `threadDelay` action
Threads amount right after: 46215

Threads amount 5 sec later: 46215

Threads amount after GC:    23412

2 sec leaves even less:

`master` branch 2s `threadDelay` action
Threads amount right after: 38577

Threads amount 5 sec later: 38577

Threads amount after GC:    11457

But 4 sec leaves more?:

`master` branch 4s `threadDelay` action
Threads amount right after: 42459

Threads amount 5 sec later: 42459

Threads amount after GC:    19088

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants