
Improve the error message when build_multivariate_dataframe is called with a stat_vars list longer than the batch_size #184

Open
sharadshriram opened this issue Oct 13, 2022 · 0 comments

cc: @shifucun

I was using a script that calls build_multivariate_dataframe with a stat_vars list longer than 50 entries and got the following error:

Traceback (most recent call last):
  File "/home/sharadshriram/accessible_charts/datasets/datacommons/get_data.py", line 88, in <module>
    save_statvar_to_csv(place, 'data.csv')
  File "/home/sharadshriram/accessible_charts/datasets/datacommons/get_data.py", line 67, in save_statvar_to_csv
    df = dpd.build_multivariate_dataframe([place], stat_vars)
  File "/home/sharadshriram/env/lib/python3.10/site-packages/datacommons_pandas/df_builder.py", line 314, in build_multivariate_dataframe
    df = pd.DataFrame.from_records(_multivariate_pd_input(places, stat_vars))
  File "/home/sharadshriram/env/lib/python3.10/site-packages/datacommons_pandas/df_builder.py", line 238, in _multivariate_pd_input
    rows_dict = _group_stat_all_by_obs_options(places,
  File "/home/sharadshriram/env/lib/python3.10/site-packages/datacommons_pandas/df_builder.py", line 88, in _group_stat_all_by_obs_options
    stat_all = dc.get_stat_all(places, stat_vars)
  File "/home/sharadshriram/env/lib/python3.10/site-packages/datacommons_pandas/stat_vars.py", line 226, in get_stat_all
    batches = -(-len(places) // places_per_batch)
ZeroDivisionError: integer division or modulo by zero

However, the message "ZeroDivisionError: integer division or modulo by zero" did not help me understand what actually caused the failure. After backtracking, I found that the error was raised not because of the batching of places, but because the length of the stat_vars list passed to dc.get_stat_all(places, stat_vars) was greater than 50.
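For context, here is my reading of the traceback as a minimal sketch. The constant name and the exact batching arithmetic are my assumptions; only the last line is taken from the traceback in stat_vars.py. Once len(stat_vars) exceeds the per-request limit, the places-per-batch integer division yields 0 and the subsequent ceiling division fails:

# Hypothetical sketch of the batching arithmetic in get_stat_all.
# MAX_ITEMS_PER_QUERY is a placeholder name, not the library's real constant.
MAX_ITEMS_PER_QUERY = 50

places = ['geoId/06']
stat_vars = ['Count_Person_{}'.format(i) for i in range(51)]  # 51 > 50

places_per_batch = MAX_ITEMS_PER_QUERY // len(stat_vars)  # 50 // 51 == 0
batches = -(-len(places) // places_per_batch)             # ZeroDivisionError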

Is it possible for the error message to state clearly that the stat_vars list passed is longer than the batch_size limit of 50?
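An explicit guard along these lines would have saved me the debugging time. This is only a sketch of the kind of check I mean; the limit of 50 and the name MAX_STAT_VARS_PER_REQUEST are my assumptions, not the library's actual constants:

# Hypothetical guard at the top of get_stat_all (the limit of 50 is assumed).
MAX_STAT_VARS_PER_REQUEST = 50

def get_stat_all(places, stat_vars):
    if len(stat_vars) > MAX_STAT_VARS_PER_REQUEST:
        raise ValueError(
            'get_stat_all supports at most {} stat_vars per call, but got {}. '
            'Please split stat_vars into smaller chunks.'.format(
                MAX_STAT_VARS_PER_REQUEST, len(stat_vars)))
    # ... existing batching and API-request logic ...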

I also wonder whether we could extend the get_stat_all() method to chunk long stat_vars lists into batches of 50 and issue the API query per chunk. I'd like to hear your thoughts.
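As a user-side workaround, and roughly the behaviour I am proposing for the library, something like the sketch below works for me. The chunk size of 50 and the merge-by-place semantics are my assumptions about the response shape:

import datacommons as dc

def get_stat_all_chunked(places, stat_vars, chunk_size=50):
    """Call dc.get_stat_all in chunks of at most chunk_size stat vars
    and merge the per-place results into one dict."""
    merged = {place: {} for place in places}
    for start in range(0, len(stat_vars), chunk_size):
        chunk = stat_vars[start:start + chunk_size]
        result = dc.get_stat_all(places, chunk)
        for place, stat_var_data in result.items():
            merged[place].update(stat_var_data)
    return merged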
