EC-Earth3 downscaled precip ssp126 data has nans #581

Closed
emileten opened this issue Feb 17, 2022 · 6 comments

Comments

@emileten
Contributor

@brews I am surprised we find nans after cleaning -- is it possible that we create nans during the pipeline? I was tempted to re-run this; maybe something just went wrong in that run...

Workflow: https://argo.cildc6.org/archived-workflows/default/ae0379ac-6f74-47af-bd91-1c4e6266bf11

Log:

Validating gs://downscaled-288ec5ac/stage/ScenarioMIP/EC-Earth-Consortium/EC-Earth3/ssp126/r1i1p1f1/day/pr/gr/v20220217012505.zarr
Traceback (most recent call last):
  File "/argo/staging/script", line 47, in <module>
    tasks = dask.compute(*tasks)
  File "/opt/conda/lib/python3.9/site-packages/dask/base.py", line 570, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/dask/threaded.py", line 79, in get
    results = get_async(
  File "/opt/conda/lib/python3.9/site-packages/dask/local.py", line 507, in get_async
    raise_exception(exc, tb)
  File "/opt/conda/lib/python3.9/site-packages/dask/local.py", line 315, in reraise
    raise exc
  File "/opt/conda/lib/python3.9/site-packages/dask/local.py", line 220, in execute_task
    result = _execute_task(task, data)
  File "/opt/conda/lib/python3.9/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/argo/staging/script", line 29, in clear_memory_intensive_tests
    _test_for_nans(d, v)
  File "/opt/dodola/dodola/core.py", line 701, in _test_for_nans
    assert ds[var].isnull().sum() == 0, "there are nans!"
AssertionError: there are nans!

@brews
Member

brews commented Feb 17, 2022

It's weird that we're only seeing this in ssp126 and after downscaling...

I suppose the first step is to examine the output downscaled data (gs://downscaled-288ec5ac/stage/ScenarioMIP/EC-Earth-Consortium/EC-Earth3/ssp126/r1i1p1f1/day/pr/gr/v20220217012505.zarr) and see if these nans have a consistent pattern.
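
For reference, a minimal sketch of that check (assuming the variable is pr with lat/lon dims, and that gcsfs is available in the environment):

```python
import xarray as xr

# Sketch only: open the staged zarr store and look for structure in the NaNs.
# Variable name "pr" and dims "lat"/"lon" are assumptions here.
ds = xr.open_zarr(
    "gs://downscaled-288ec5ac/stage/ScenarioMIP/EC-Earth-Consortium/"
    "EC-Earth3/ssp126/r1i1p1f1/day/pr/gr/v20220217012505.zarr"
)
nan_mask = ds["pr"].isnull()
print(int(nan_mask.sum()))                          # total NaN count
print(nan_mask.any(dim=["lat", "lon"]).compute())   # which timesteps are affected
```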

Edit:
Oh, and ds[var].isnull().sum() == 0 is potentially a case of the Mike bug...? I'm not positive whether this matters when .data is a numpy or a dask array in this case...
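
If it does matter, a more defensive version of that assertion (a sketch, not the actual dodola code) would force the reduction down to a plain number before comparing:

```python
# Sketch only: compute the reduction eagerly so the assertion compares a plain
# int, regardless of whether ds[var].data is a numpy or a dask array.
nan_count = int(ds[var].isnull().sum().compute())
assert nan_count == 0, f"there are {nan_count} nans!"
```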

@brews
Member

brews commented Feb 17, 2022

Okay, after manually searching I was unable to find nulls in the data. I manually reran the quality-control check that failed — it passed.

So it appears that the data is good. Might have been an issue with transferring Zarr data from GCS. The retry-backoff for the QC steps starts at 5 seconds. It might be worth increasing that to 30 seconds if this really is the problem.
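
The real backoff lives in the workflow retry configuration rather than in Python, but the idea is just exponential backoff around the failing QC step. A rough sketch (the qc_check callable here is hypothetical):

```python
import time

def run_qc_with_backoff(qc_check, retries=3, initial_delay=30):
    """Retry a QC callable with exponential backoff (sketch only).

    `qc_check` stands in for something like the NaN test that raised above;
    waiting between attempts gives GCS a chance to serve chunks it dropped.
    """
    delay = initial_delay
    for attempt in range(retries):
        try:
            return qc_check()
        except AssertionError:
            if attempt == retries - 1:
                raise
            time.sleep(delay)   # back off: 30 s, 60 s, 120 s, ...
            delay *= 2
```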

Other than that, the run and data appear fine.

@emileten
Contributor Author

Thanks for digging, @brews. I am not sure I see how a change in the retry-backoff would have fixed that, though...?

In general, it sounds like we can retry this run? I can do it if you're not already on it.

@brews
Member

brews commented Feb 17, 2022

@emileten I don't know for sure. This is all speculation on my part.

Zarr returns NaNs for missing chunks — might be that GCS is slow, busy, or limiting I/O. Waiting may give GCS a better chance to "empty its buffer" and catch up, or for other I/O or network-intensive work to complete. We've generally used a backoff of 15 to 30 seconds in other workflow steps, but this QC step starts at only 5 seconds.
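
A toy illustration of that missing-chunk behavior (local store, zarr v2 API, unrelated to the actual GCS bucket):

```python
import numpy as np
import zarr

# Toy example only: a tiny local store to show how zarr fills in missing chunks.
store = zarr.DirectoryStore("demo.zarr")
z = zarr.create(
    shape=(4, 4), chunks=(2, 2), dtype="f8",
    fill_value=np.nan, store=store, overwrite=True,
)
z[:] = 1.0

# Simulate a chunk that never arrives (e.g. a dropped or throttled GCS read).
del store["0.0"]

# zarr silently substitutes fill_value (NaN here) for the missing chunk.
print(zarr.open(store)[:])
```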

I'll first see if I can get it to pass just by retrying the existing workflow.

@brews
Member

brews commented Feb 17, 2022

After retrying the same workflow, it succeeded.

I'm going to close this issue unless you have other ideas, concerns, or suggestions, @emileten.

@emileten
Contributor Author

emileten commented Feb 17, 2022

Wait, does it resume where it failed last time when using retry? I didn't know that. Amazing 😳.

(Edit: I don't have anything useful to add, of course... Thanks!)
