Reticulate can't handle large integer values: Seems to experience integer overflow when it exceeds the limit for 32-bit signed integers #1647

Open
BorgeJorge opened this issue Aug 14, 2024 · 0 comments


The easiest way to see this is to run:

> py_eval("10000**2")
[1] 100000000
> py_eval("100000**2")
[1] -1
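
For reference, the arithmetic behind the two calls above can be checked in plain Python, where integers have arbitrary precision. This is only a sketch of the range check: `fits_int32` and the two constants are hypothetical names introduced here, and nothing in it shows how reticulate itself performs the conversion.

```python
# Bounds of a 32-bit signed integer (the size of R's native integer type).
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def fits_int32(value: int) -> bool:
    """Return True if value is representable as a 32-bit signed integer."""
    return INT32_MIN <= value <= INT32_MAX


small = 10000**2   # 100_000_000, within range
large = 100000**2  # 10_000_000_000, beyond INT32_MAX

print(small, fits_int32(small))
print(large, fits_int32(large))
```

The first value fits in 32 bits and the second does not, which matches where the result returned to R stops being correct.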

To give another example: I'm using the Splink package for probabilistic record linkage. At one point I call linker.training.estimate_probability_two_random_records_match(), which calculates a Cartesian product and stores it in an interim variable. My code runs fine in native Python but returns the following error when I run it through reticulate:

ValueError: Deterministic matching rules led to more observed matches than is consistent with supplied recall. With these rules, recall must be at least -0.00.

A line of Splink code before this error message is:

num_total_comparisons = summary_record["cartesian"]

When I display the value of num_total_comparisons, it shows as:

-2147483648

This is -2^31, the minimum value of a 32-bit signed integer. When I reduce the size of my data frame so that the Cartesian product of the record counts is less than 2^31, it runs fine.
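
As a rough illustration of where that threshold sits, the sketch below assumes the Cartesian count is simply the product of two record counts; that is an assumption made here, not something confirmed from the Splink source. `cartesian_count` and `max_records` are hypothetical names.

```python
import math

INT32_MAX = 2**31 - 1  # largest 32-bit signed integer


def cartesian_count(n_left: int, n_right: int) -> int:
    """Number of pairs in a full Cartesian product (assumed to be n_left * n_right)."""
    return n_left * n_right


# Largest record count n for which n * n still fits in 32 bits,
# assuming a single data frame compared against itself.
max_records = math.isqrt(INT32_MAX)

print(max_records)
print(cartesian_count(max_records, max_records) <= INT32_MAX)
print(cartesian_count(max_records + 1, max_records + 1) <= INT32_MAX)
```

Under that assumption, a self-comparison of a data frame would cross the 32-bit limit at a few tens of thousands of records.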

Let me know if you need any more info. Thanks.
