fill() performance #57

Open
belm0 opened this issue Apr 15, 2019 · 6 comments
belm0 commented Apr 15, 2019

For the use case of collecting execution time samples during a program run (and ultimately reporting quantiles), I'd like fill() to be fairly fast.

physt's fill() execution time seems to be independent of binning strategy (trivially using a constant data value in my tests). I was surprised that the bin search is implemented via np.searchsorted() in all cases, even for fixed_width binning.

$ python -m timeit -s 'from physt import h1; h = h1(None, "exponential", 100, range=(1e-6, 1))' 'h.fill(.1)'
10000 loops, best of 5: 38 usec per loop

$ python -m timeit -s 'from physt import h1; h = h1(None, "fixed_width", .01, range=(0, .5))' 'h.fill(.1)'
10000 loops, best of 5: 36.9 usec per loop

Comparing to (unmaintained) https://github.com/carsonfarmer/streamhist:

python -m timeit -s 'from streamhist import StreamHist; h = StreamHist()' 'h.update(.1)'
50000 loops, best of 5: 7.52 usec per loop

(aside: streamhist is quite nice about managing binning and being able to report arbitrary quantiles. Perhaps some of it could be adopted.)

@janpipek janpipek self-assigned this Apr 15, 2019
@janpipek
Owner
This can be solved by moving find_bin from Histogram1D to BinningBase (and including more efficient variants in daughter classes).
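A rough sketch of what that refactoring could look like (the class and method names follow the comment above, but the bodies are my own assumptions, not physt's actual code): the base class keeps a generic binary search over the edges, and the fixed-width daughter class overrides find_bin with an O(1) arithmetic lookup.

```python
import bisect


class BinningBase:
    """Generic binning: binary search over sorted bin edges."""

    def __init__(self, edges):
        self.edges = edges  # sorted list of bin edges

    def find_bin(self, value):
        # fallback: binary search, analogous to np.searchsorted
        if value < self.edges[0] or value >= self.edges[-1]:
            return None
        return bisect.bisect_right(self.edges, value) - 1


class FixedWidthBinning(BinningBase):
    """Daughter class with a more efficient find_bin variant."""

    def __init__(self, low, width, n_bins):
        super().__init__([low + i * width for i in range(n_bins + 1)])
        self.low, self.width, self.n_bins = low, width, n_bins

    def find_bin(self, value):
        # uniform spacing allows O(1) arithmetic instead of a search
        idx = int((value - self.low) // self.width)
        return idx if 0 <= idx < self.n_bins else None
```

A production version would also need to decide how values landing exactly on a bin edge are assigned, so that both variants agree with np.searchsorted's conventions.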


belm0 commented Apr 16, 2019

I don't believe that StreamHist uses fixed-width bins, yet is still able to have 5x faster update in pure Python. The README credits https://github.com/grantjenks/sorted_containers, if I understand correctly.
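For context, StreamHist implements the Ben-Haim & Tom-Tov streaming histogram: it keeps a bounded list of (centroid, count) pairs and, when a new value pushes it over the limit, merges the two closest centroids. A toy sketch of that idea (my own simplification using stdlib bisect, not StreamHist's actual code):

```python
import bisect


class TinyStreamHist:
    """Minimal streaming histogram: at most max_bins (centroid, count) pairs."""

    def __init__(self, max_bins=64):
        self.max_bins = max_bins
        self.bins = []  # sorted list of [centroid, count]

    def update(self, value):
        # insert the value as its own bin, keeping the list sorted
        i = bisect.bisect(self.bins, [value, 0])
        self.bins.insert(i, [value, 1])
        if len(self.bins) > self.max_bins:
            # merge the adjacent pair with the smallest centroid gap
            j = min(range(len(self.bins) - 1),
                    key=lambda k: self.bins[k + 1][0] - self.bins[k][0])
            (c1, n1), (c2, n2) = self.bins[j], self.bins[j + 1]
            merged = [(c1 * n1 + c2 * n2) / (n1 + n2), n1 + n2]
            self.bins[j:j + 2] = [merged]
```

The merge step is what makes constant-value benchmarks misleading (as noted later in this thread): identical inputs never trigger it, so the cost only shows up with varied data.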


janpipek commented Apr 16, 2019

I probably won't be able to do a significant refactoring soon, but in any case I'd recommend that you use the "fill_n" method if you can.

In [26]: data = np.random.randn(100000)

In [27]: HA = physt.h1(None, "fixed_width", 0.1, adaptive=True)

In [28]: %time for d in data: HA.fill(d)
CPU times: user 3.86 s, sys: 56.4 ms, total: 3.92 s
Wall time: 3.84 s

In [29]: HA = physt.h1(None, "fixed_width", 0.1, adaptive=True)

In [30]: %time HA.fill_n(data)
CPU times: user 16.2 ms, sys: 4.01 ms, total: 20.2 ms
Wall time: 19.2 ms

Or, more realistically (simulating that the data come from somewhere one by one):

In [48]: HA = physt.h1(None, "fixed_width", 0.1, adaptive=True)

In [49]: %time l = []; [l.append(i) for i in data]; HA.fill_n(l)
CPU times: user 36.9 ms, sys: 4.04 ms, total: 40.9 ms
Wall time: 40.3 ms
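The batching pattern above can be wrapped so callers still fill one value at a time while flushes happen in chunks. A hypothetical helper (not part of physt), assuming any fill_n-style callable that accepts a sequence:

```python
class BufferedFiller:
    """Buffer single-value fills and flush them in batches to fill_n."""

    def __init__(self, fill_n, batch_size=1024):
        self._fill_n = fill_n        # e.g. a physt histogram's fill_n method
        self._batch_size = batch_size
        self._buffer = []

    def fill(self, value):
        self._buffer.append(value)   # cheap: just a list append
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self):
        # push the accumulated batch in one fill_n call, amortizing
        # the per-call overhead over batch_size samples
        if self._buffer:
            self._fill_n(self._buffer)
            self._buffer = []
```

Usage would be roughly `bf = BufferedFiller(h.fill_n); bf.fill(0.1)`, with a final `bf.flush()` when done. The trade-off is latency: samples are invisible to the histogram until the next flush, which matters for real-time use.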


belm0 commented Apr 16, 2019

My use case is real-time, and latency spikes from fill_n() batches would be unwanted. Also, fill_n() is very slow for small arrays (probably because numpy is).

python -m timeit -s 'from physt import h1; h = h1(None, "fixed_width", .01, range=(.0, .5))' 'h.fill(.1)'
10000 loops, best of 5: 35.3 usec per loop

python -m timeit -s 'from physt import h1; h = h1(None, "fixed_width", .01, range=(0, .5)); d=[.1]*10' 'h.fill_n(d)'
500 loops, best of 5: 740 usec per loop

@janpipek
Owner

Ok, I'll try to optimize the single-value fill soon-ish.


belm0 commented Apr 25, 2019

Here is a fairer timing of streamhist. Since my previous test filled with a constant value, the compute-intensive merging of bins was never triggered.

$ python -m timeit -s 'from random import random; from streamhist import StreamHist; h = StreamHist();' 'h.update(random())'
10000 loops, best of 5: 35.4 usec per loop

That result is just with the default max bin count of 64. It gets worse quickly as max bins is increased. (Note: the overhead of random() is negligible, about 50 ns.)

However, physt is not off the hook so easily... I have an implementation working at 12 usec for the same max bin count. More at #58 (comment).
