I seemed to remember that `finalize` is slow, and that is why we implemented our own refcounting and provided `unsafe_free!`. However, the cost seems manageable:
julia> @benchmark finalize(a) setup=(a=CuArray([1]))
BenchmarkTools.Trial: 10000 samples with 997 evaluations.
Range (min … max): 18.506 ns … 36.669 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 19.458 ns ┊ GC (median): 0.00%
Time (mean ± σ): 19.489 ns ± 0.536 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▂▅█
▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▁▁▁▂▂▂▂▁▂▂▂▂▃▇▇▄▃███▅▃▃▃▃▂▂▂▂▂▁▂▁▂ ▃
18.5 ns Histogram: frequency by time 19.8 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark CUDA.unsafe_free!(a) setup=(a=CuArray([1]))
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
Range (min … max): 3.010 ns … 18.370 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 3.080 ns ┊ GC (median): 0.00%
Time (mean ± σ): 3.093 ns ± 0.292 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█ ▅ ▁
▂▁▁▁▂▁▁▁▂▁▁▁▂▃▁▁▁▃▁▁▁▂▆▁▁▁█▁▁▁▂█▁▁▁█▁▁▁▂▆▁▁▁▄▁▁▁▂▃▁▁▁▃▁▁▁▂ ▂
3.01 ns Histogram: frequency by time 3.14 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
@gbaraldi @vchuravy Thoughts? Does the cost maybe only manifest when the GC is loaded?
Hmm, doesn't seem to significantly affect the performance:
julia> mutable struct ListNode
           key::Int64
           next::ListNode
           ListNode() = new()
           ListNode(x) = new(x)
           ListNode(x, y) = new(x, y)
       end
julia> function list(n=128)
           start::ListNode = ListNode(1)
           current::ListNode = start
           for i = 2:(n*1024^2)
               current = ListNode(i, current)
               finalizer(identity, current)
           end
           return current.key
       end
list (generic function with 2 methods)
julia> x = list();
julia> @benchmark finalize(a) setup=(a=CuArray([1]))
BenchmarkTools.Trial: 10000 samples with 997 evaluations.
Range (min … max): 19.117 ns … 38.384 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 19.599 ns ┊ GC (median): 0.00%
Time (mean ± σ): 19.941 ns ± 0.811 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█▇▆▄▃ ▆▆▄▄▁ ▂
▇▆▇▆▆████████▄▅▆▇▆▆▆▇▇██████▄▃▁▃▁▁▁▄▁▄▆▃▄█▇▆▅▄▃▁▁▃▃▁▄▅▄▄▆▆▄ █
19.1 ns Histogram: log(frequency) by time 22.5 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
Even though the code for `jl_finalize_th` and `finalize_object` does indeed seem fairly complex, iterating over finalizers and even allocating a list. Not sure why that isn't visible here.
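One way to probe this without a GPU is to time `finalize` on plain Julia objects while a large set of other finalizers is still registered on live objects. The following is only a rough CPU-only sketch using `Base` alone: the names `Box`, `time_finalize`, `runs_immediately`, and `ballast` are made up for illustration, the sizes are arbitrary, and it says nothing definite about `CuArray`. If `finalize` really does scan the registered finalizers, the "loaded" number should come out larger than the baseline.

```julia
# Hypothetical helper type for this sketch; any mutable struct accepts a finalizer.
mutable struct Box
    x::Int
end

# Check that `finalize` runs a registered finalizer right away.
function runs_immediately()
    ran = Ref(0)
    o = Box(0)
    finalizer(_ -> (ran[] += 1), o)
    finalize(o)
    return ran[] == 1
end

# Average wall time (ns) of one `finalize` call, measured over n fresh objects.
# Each object gets a trivial finalizer just before the timed call.
function time_finalize(n::Int)
    total = UInt64(0)
    for i in 1:n
        o = Box(i)
        finalizer(identity, o)
        t0 = time_ns()
        finalize(o)
        total += time_ns() - t0
    end
    return total / n
end

time_finalize(10)                 # warm-up so compilation is not timed
baseline = time_finalize(200)

# Keep many objects with finalizers alive so their finalizers stay registered.
ballast = [Box(i) for i in 1:100_000]
foreach(o -> finalizer(identity, o), ballast)

loaded = time_finalize(200)

println("finalizer ran immediately: ", runs_immediately())
println("baseline: ", round(baseline; digits=1), " ns per finalize")
println("loaded:   ", round(loaded; digits=1), " ns per finalize")
```

Note that `ballast` is kept reachable on purpose. In the `list()` experiment above only `current.key` is returned, so the nodes become garbage once the function returns and their finalizers may already have been run and dropped by the time the benchmark starts.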