Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace unsafe_free! with finalize? #2444

Open
maleadt opened this issue Jul 12, 2024 · 2 comments
Open

Replace unsafe_free! with finalize? #2444

maleadt opened this issue Jul 12, 2024 · 2 comments
Labels
performance How fast can we go? speculative Not sure about this one yet.

Comments

@maleadt
Copy link
Member

maleadt commented Jul 12, 2024

I seemed to remember that finalize is slow, and that is why we implemented our own refcounting and provided unsafe_free!. However, the cost seems manageable:

julia> @benchmark finalize(a) setup=(a=CuArray([1]))
BenchmarkTools.Trial: 10000 samples with 997 evaluations.
 Range (min  max):  18.506 ns  36.669 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     19.458 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   19.489 ns ±  0.536 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                                            ▂▅█
  ▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▁▁▁▂▂▂▂▁▂▂▂▂▃▇▇▄▃███▅▃▃▃▃▂▂▂▂▂▁▂▁▂ ▃
  18.5 ns         Histogram: frequency by time        19.8 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark CUDA.unsafe_free!(a) setup=(a=CuArray([1]))
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min  max):  3.010 ns  18.370 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     3.080 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.093 ns ±  0.292 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                            █    ▅   ▁
  ▂▁▁▁▂▁▁▁▂▁▁▁▂▃▁▁▁▃▁▁▁▂▆▁▁▁█▁▁▁▂█▁▁▁█▁▁▁▂▆▁▁▁▄▁▁▁▂▃▁▁▁▃▁▁▁▂ ▂
  3.01 ns        Histogram: frequency by time        3.14 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

@gbaraldi @vchuravy Thoughts? Does the cost maybe only manifest when the GC is loaded?

@maleadt maleadt added performance How fast can we go? speculative Not sure about this one yet. labels Jul 12, 2024
@vchuravy
Copy link
Member

IIRC it's a linear scan over the finalizer list, to remove the object from it.

So maybe create a couple thousand object with a finalizer and benchmark it then.

@maleadt
Copy link
Member Author

maleadt commented Jul 13, 2024

Hmm, doesn't seem to significantly affect the performance:

julia> mutable struct ListNode
         key::Int64
         next::ListNode
         ListNode() = new()
         ListNode(x)= new(x)
         ListNode(x,y) = new(x,y)
       end

julia> function list(n=128)
           start::ListNode = ListNode(1)
           current::ListNode = start
           for i = 2:(n*1024^2)
               current = ListNode(i,current)
               finalizer(identity, current)
           end
           return current.key
       end
list (generic function with 2 methods)

julia> x = list();

julia> @benchmark finalize(a) setup=(a=CuArray([1]))
BenchmarkTools.Trial: 10000 samples with 997 evaluations.
 Range (min … max):  19.117 ns … 38.384 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.599 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   19.941 ns ±  0.811 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

         █▇▆▄▃          ▆▆▄▄▁                                 ▂
  ▇▆▇▆▆████████▄▅▆▇▆▆▆▇▇██████▄▃▁▃▁▁▁▄▁▄▆▃▄█▇▆▅▄▃▁▁▃▃▁▄▅▄▄▆▆▄ █
  19.1 ns      Histogram: log(frequency) by time      22.5 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Even though the code for jl_finalize_th and finalize_object does indeed seems fairly complex, iterating finalizers and even allocating a list. Not sure why that isn't visible here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance How fast can we go? speculative Not sure about this one yet.
Projects
None yet
Development

No branches or pull requests

2 participants