np.histogram for cubes #5902
Comments
@SciTools/peloton Thanks @schlunma for this suggestion. We are curious about your use case: could you share it with us, and explain why you require this additional Iris-specific functionality rather than just using numpy?
The numpy and dask versions always collapse the entire array; there is no way of calculating the histogram along one or more specified axes. However, this is exactly what I need for my particular use case: I want to calculate a metric called Earth mover's distance across different coordinates. The default numpy and dask histograms would only allow me to calculate that metric across the entire dataset. More details can be found in the corresponding ESMValCore PR, where you can also find working code.
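For illustration, here is a minimal NumPy sketch of the metric mentioned above, assuming both histograms share the same equal-width bins. In one dimension, the Earth mover's distance (Wasserstein-1) between two histograms normalised to unit mass equals the bin width times the summed absolute difference of their cumulative distributions. The helper name `emd_1d` is hypothetical, and this is not necessarily how the ESMValCore PR computes it. Because it works along the last axis, it can compare many histograms at once, e.g. one per grid point.

```python
import numpy as np


def emd_1d(hist_p, hist_q, bin_width):
    """Earth mover's distance between histograms on shared equal-width bins.

    Hypothetical helper: operates along the last axis, so it can compare
    many histograms at once (e.g. one histogram per grid point).
    """
    p = np.asarray(hist_p, dtype=float)
    q = np.asarray(hist_q, dtype=float)
    # Normalise each histogram to unit mass.
    p = p / p.sum(axis=-1, keepdims=True)
    q = q / q.sum(axis=-1, keepdims=True)
    # In 1-D, the EMD is the L1 distance between the two CDFs times the bin width.
    cdf_diff = np.cumsum(p, axis=-1) - np.cumsum(q, axis=-1)
    return bin_width * np.abs(cdf_diff).sum(axis=-1)
```

For example, moving all mass from the first of three unit-width bins to the last gives a distance of 2, while identical histograms give 0.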
There is an open issue in numpy about adding the
Thanks for the link @rcomer! Especially the I am also completely fine with including this in ESMValCore, so we can close this if it is not relevant for you.
I reckon we need further discussion first, now that we have a more detailed use case.
For me, the key question here is: what is the point of making this a function of a Cube, rather than just an operation on an array? It could be that the coords add some validity to the operation, or that a Cube with a 'value_bins' dimension is itself useful. Perhaps iris.plot has a role.
I don't have a killer argument for this; I guess it's just nicer to have this work with labeled dimensions instead of axes, and to include proper metadata handling. For my specific use case, it would also be totally fine to have this work with arrays. However, your argument could also be applied to most mathematical operations in iris, right? For example, why do you have
Totally, it's a judgement call. In this case, I guess the result cube would always have a count or frequency identity, so probably a long_name and units of '1'.
✨ Feature Request
I am currently working on an ESMValTool preprocessor that calculates histograms from cubes along given coordinates, similar to np.histogram. I think this would also be a nice fit for iris in the iris.analysis.stats module. Here is a possible call signature:

This function should fully support lazy and/or masked data. If this is considered relevant for iris, I can open a PR (I already have some code for this).
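A minimal eager sketch of the requested behaviour on plain NumPy arrays, assuming fixed bin edges: the collapsed axes are moved to the end and flattened, masked points are dropped, and `np.histogram` is applied to each remaining sample. The result keeps the non-collapsed dimensions plus one trailing bins dimension (the role a 'value_bins' coordinate could play on a cube). The name `histogram_along_axes` and its arguments are hypothetical, not an Iris API, and lazy (dask) support and coordinate/metadata handling are not covered.

```python
import numpy as np


def histogram_along_axes(data, bins, axes):
    """Histogram `data` over `axes` (hypothetical helper, eager NumPy only).

    Returns integer counts with shape (*kept_dims, len(bins) - 1), where
    kept_dims are the dimensions of `data` that are not collapsed.
    Masked points are ignored.
    """
    data = np.ma.asanyarray(data)
    bins = np.asarray(bins)
    # Normalise the collapsed axes to sorted, unique, non-negative ints.
    axes = tuple(sorted({int(ax) % data.ndim for ax in np.atleast_1d(axes)}))
    kept = tuple(ax for ax in range(data.ndim) if ax not in axes)
    kept_shape = tuple(data.shape[ax] for ax in kept)

    # Put kept axes first and flatten the collapsed axes into one last axis.
    flat = data.transpose(kept + axes).reshape(kept_shape + (-1,))
    values = np.ma.getdata(flat)
    mask = np.ma.getmaskarray(flat)

    counts = np.zeros(kept_shape + (bins.size - 1,), dtype=np.int64)
    for idx in np.ndindex(*kept_shape):
        # Drop masked points, then count the rest into the fixed bins.
        sample = values[idx][~mask[idx]]
        counts[idx], _ = np.histogram(sample, bins=bins)
    return counts
```

The Python loop over the kept points keeps the sketch simple but is slow for large grids; a vectorised version, or a dask-based one (e.g. built on `dask.array.map_blocks`), would likely be needed for lazy data.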
Motivation
Calculating histograms is a common task in geosciences.