-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add timedelta, timedelta64 and datetime64 plus respective conversions #509
base: main
Are you sure you want to change the base?
Conversation
@cjdoris what do you think about this PR? |
@cjdoris still hoping to get this integrated ... |
Just chiming in with some comments: 1: Could you add some tests for this? This adds a lot of new features so I think should have testing to cover everything. Ideally the test coverage of the diff should be 100%. It should also cover your intended usecase with
as this seems subtle. 2: Is the 3-arg version of the |
year::Int=_year, month::Int=_month, day::Int=_day, hour::Int=_hour, minute::Int=_minute, second::Int=_second, | ||
millisecond::Int=_millisecond, microsecond::Int=_microsecond, nanosecond::Int=_nanosecond | ||
) | ||
pyimport("numpy").datetime64("$(DateTime(year, month, day, hour, minute, second))") + pytimedelta64(;millisecond, microsecond, nanosecond) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is pyimport("numpy")
the correct API call, or is that just to be used in user packages?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw similar calls at different places in the package, so I took this approach. But I also wouldn't know how to code a timedelta64 without calling pyimport.
Please let me know if there's a better solution.
src/Convert/numpy.jl
Outdated
T = types[findfirst(==(unit), units)] | ||
pyconvert_return(CompoundPeriod(T(value * count)) |> canonicalize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The proper way to do this would be to use Base.Cartesian.@nif
. That way you could write this code to avoid dynamic dispatch on types
(which will be very slow).
src/Convert/numpy.jl
Outdated
units = ("Y", "M", "W", "D", "h", "m", "s", "ms", "us", "ns") | ||
types = (Year, Month, Week, Day, Hour, Minute, Second, Millisecond, Microsecond, Nanosecond) | ||
T = types[findfirst(==(unit), units)] | ||
pyconvert_return(DateTime(_base_datetime) + T(value * count)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to other comment – you should write this using Base.Cartesian.@nif
over the types
tuple to avoid dynamic dispatch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, tested with a julia function calls and found that the julia part is around 25ns, whereas the python call is around 1.5microsecond
src/Convert/numpy.jl
Outdated
_year::Int=0, _month::Int=0, _day::Int=0, _hour::Int=0, _minute::Int=0, _second::Int=0, _millisecond::Int=0, _microsecond::Int=0, _nanosecond::Int=0, _week::Int=0; | ||
year::Int=_year, month::Int=_month, day::Int=_day, hour::Int=_hour, minute::Int=_minute, second::Int=_second, microsecond::Int=_microsecond, millisecond::Int=_millisecond, nanosecond::Int=_nanosecond, week::Int=_week) | ||
pytimedelta64(sum(( | ||
Year(year), Month(month), # you cannot mix year or month with any of the below units in python, the error will be thrown by `pytimedelta64(::CompoundPeriod)` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment
you cannot mix year or month with any of the below units in python, the error will be thrown by
pytimedelta64(::CompoundPeriod)
Should be presented to the user as a descriptive error message rather than a comment in the function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the comment isn't clear enough.
Python throws a well understandable descriptive error in case of wrong usage, so no need for us to do so. Agree?
|
||
function pyconvert_rule_timedelta64(::Type{CompoundPeriod}, x::Py) | ||
unit, count = pyconvert(Tuple, pyimport("numpy").datetime_data(x)) | ||
value = reinterpret(Int64, pyconvert(Vector, x))[1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is reinterpret
safe here? Is there a better alternative to use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought, pyconvert creates a new Julia Vector which is not mapped onto Python data. If that would be the case, we'd need to wrap the vector by a copy()
.
src/Convert/numpy.jl
Outdated
for T in (CompoundPeriod, Year, Month, Day, Hour, Minute, Second, Millisecond, Microsecond, Nanosecond, Week) | ||
pyconvert_add_rule("numpy:timedelta64", T, pyconvert_rule_timedelta64, priority) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since Julia is unlikely to unroll this loop, you should use Base.Cartesian.@nexprs
here to avoid dynamic dispatch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried my best, but I'm not sure how to test whether this will speed up things
src/Convert/numpy.jl
Outdated
args = T .== (Day, Second, Millisecond, Microsecond, Minute, Hour, Week) | ||
pydatetime64(x.value .* args...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably better to rewrite this with Base.Cartesian.@nif
rather than doing a masked sum, since you know there will be only 1 element in the sum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sum is dammed fast (16ns), and I couldn't beat it with a different version.
Thanks for all the comments, I already learned some new things about Julia, which is always nice. I will go through them the next days and once we've agreed on the solutions write the test functions. Just want to re-raise my question from the previous PR. Do you think, we should stick with CompoundPeriod as default conversion type? |
Co-authored-by: Miles Cranmer <[email protected]>
Co-authored-by: Miles Cranmer <[email protected]>
Concerning the three-argument version of |
@cjdoris @MilesCranmer |
This PR replaces #334 and takes into account the major refactoring of PythonCall.
Particularly, it fixes #293.
What's new?
Python Constructors
Conversion to Julian types
DataFrame handling
I've set the priority of datetime, timedelta, datetime64 and timedelta64 to ARRAY, which allows for automatic Table conversion - I hope that's the intended way to do it.
Default Conversion
I chose to use
Dates.CompoundPeriod
as result type of default conversion from timedelta64 as both types support year, month and minor period units. This is debatable, we could also change it toPeriod
, hence the resulting type would depend on the input.Both Python and Julia do not convert between Year/Month and the other period types, so there is no danger with this choice to arrive at ill-determined intervals.
The difference is that Julia allows addition/subtraction of mixed types while Python/Numpy throws an error.
The difference to the previous PR is that all conversions rely on either builtin or numpy functions and do not use pandas.
Ordering of arguments for
pytimedelta()
was chosen to be identical to the python version, while ordering forpytimedelta64()
is strictly descending, exceptweek
which comes last.EDIT: add comments in what's new code, added conversions
EDIT2: removed comment about datetime_data, I had misunderstood the meaning and have updated the code