Replace unsafe_free! with finalize? #2444

maleadt · 2024-07-12T14:47:23Z

I seem to remember that finalize is slow, which is why we implemented our own refcounting and provided unsafe_free!. However, the cost seems manageable:

julia> @benchmark finalize(a) setup=(a=CuArray([1]))
BenchmarkTools.Trial: 10000 samples with 997 evaluations.
 Range (min … max):  18.506 ns … 36.669 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.458 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   19.489 ns ±  0.536 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                                            ▂▅█
  ▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▁▁▁▂▂▂▂▁▂▂▂▂▃▇▇▄▃███▅▃▃▃▃▂▂▂▂▂▁▂▁▂ ▃
  18.5 ns         Histogram: frequency by time        19.8 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark CUDA.unsafe_free!(a) setup=(a=CuArray([1]))
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  3.010 ns … 18.370 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     3.080 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.093 ns ±  0.292 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                            █    ▅   ▁
  ▂▁▁▁▂▁▁▁▂▁▁▁▂▃▁▁▁▃▁▁▁▂▆▁▁▁█▁▁▁▂█▁▁▁█▁▁▁▂▆▁▁▁▄▁▁▁▂▃▁▁▁▃▁▁▁▂ ▂
  3.01 ns        Histogram: frequency by time        3.14 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

@gbaraldi @vchuravy Thoughts? Does the cost maybe only manifest when the GC is loaded?

vchuravy · 2024-07-12T17:45:24Z

IIRC it's a linear scan over the finalizer list, to remove the object from it.

So maybe create a couple thousand objects with finalizers and benchmark it then.
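A minimal sketch of that experiment, assuming plain Julia objects rather than CuArrays. The `Tracked` type and the `make_tracked` and `time_finalize` helpers are hypothetical names used only for illustration. The objects are returned and kept in a variable so that they stay reachable; otherwise the GC could collect them and run their finalizers before the measurement, which would take them off the list again. A single `@elapsed` call is only a rough indication, so a proper run would use `@benchmark` as in the measurements above.

```julia
# Hypothetical helper type; finalizers can only be attached to mutable objects.
mutable struct Tracked
    id::Int
end

# Create `n` objects with a no-op finalizer and return them, so the caller
# can keep them reachable (and therefore registered with the GC).
function make_tracked(n)
    objs = Vector{Tracked}(undef, n)
    for i in 1:n
        obj = Tracked(i)
        finalizer(_ -> nothing, obj)
        objs[i] = obj
    end
    return objs
end

# Time a single `finalize` call on a freshly created object.
function time_finalize()
    obj = Tracked(0)
    finalizer(_ -> nothing, obj)
    return @elapsed finalize(obj)
end

time_finalize()                  # warm up, so compilation is not measured
baseline = time_finalize()
live = make_tracked(100_000)     # keep a reference so these stay registered
loaded = time_finalize()
println("baseline: ", baseline, " s; with ", length(live), " live finalizers: ", loaded, " s")
```

If the linear scan matters, `loaded` should grow with the number of live objects passed to `make_tracked`.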

maleadt · 2024-07-13T08:30:35Z

Hmm, that doesn't seem to significantly affect the performance:

julia> mutable struct ListNode
         key::Int64
         next::ListNode
         ListNode() = new()
         ListNode(x) = new(x)
         ListNode(x,y) = new(x,y)
       end

julia> function list(n=128)
           start::ListNode = ListNode(1)
           current::ListNode = start
           for i = 2:(n*1024^2)
               current = ListNode(i,current)
               finalizer(identity, current)
           end
           return current.key
       end
list (generic function with 2 methods)

julia> x = list();

julia> @benchmark finalize(a) setup=(a=CuArray([1]))
BenchmarkTools.Trial: 10000 samples with 997 evaluations.
 Range (min … max):  19.117 ns … 38.384 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.599 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   19.941 ns ±  0.811 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

         █▇▆▄▃          ▆▆▄▄▁                                 ▂
  ▇▆▇▆▆████████▄▅▆▇▆▆▆▇▇██████▄▃▁▃▁▁▁▄▁▄▆▃▄█▇▆▅▄▃▁▁▃▃▁▄▅▄▄▆▆▄ █
  19.1 ns      Histogram: log(frequency) by time      22.5 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

That's even though the code for jl_finalize_th and finalize_object does indeed seem fairly complex, iterating over the finalizers and even allocating a list. Not sure why that isn't visible here.

maleadt added the performance and speculative labels on Jul 12, 2024
