-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Transfers pipeline improvements #54
Conversation
…-0 institution and rm duplicate model
…l possibilities to avoid row_di reference before assignment error
…ptive ls error messages; handle multiple row returns in make_holdings_table() and check for non-dict rows; incorporate retry into operation_ls()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all is looking good to me!! As you know I have made a few changes:
- improved network error handling of the
esgf_holdings.py
- it seemed like we were really only getting 502 errors (at least I was) on directories that were in fact valid paths / could feasibly have data. I tried to make this more robust and add retry logic that, while it seems to work, can really stretch the execution time (exponential backoff). See also revision suggestion on the operation_ls function. - changed things in the config a bit to include E3SM models and added an explicit "models_of_interest" list and similar tweaks to relevant scripts that makes it clear "we are interested in some models. we want to download a subset of those based on results"
- added a script
esgf_holdings_e3sm.py
for getting the holdings of E3SM only. This seemed like the less complex way of doing things at the time. it creates a CSV for E3SM holdings only. - updated the
select_variants.ipynb
based on updated holdings and knowledge of ensemble selection - update the
holdings_summary.ipynb
notebook based on updated holdings - reviewed made a couple of tweaks to the
select_grids.ipynb
notebook
transfers/utils.py
Outdated
try: | ||
r = tc.operation_ls(ep, path) | ||
except globus_sdk.TransferAPIError as exc: | ||
return exc.http_status | ||
|
||
n = 5 | ||
for retry in range(n): | ||
try: | ||
r = tc.operation_ls(ep, path) | ||
except globus_sdk.TransferAPIError as exc: | ||
if retry < n - 1: | ||
continue | ||
else: | ||
return exc.http_status | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we actually might want to keep this as is. I added an exponential time backoff to the call of this function should it return specific status. I think I was still getting errors even with the retries because this would retry immediately. I think it is kind of helpful to keep this function simply "try it once, and return the error code if it fails" and we can add wrappers for the retries like I did in esgf_holdings.py
. Open to discussing though.
try: | |
r = tc.operation_ls(ep, path) | |
except globus_sdk.TransferAPIError as exc: | |
return exc.http_status | |
n = 5 | |
for retry in range(n): | |
try: | |
r = tc.operation_ls(ep, path) | |
except globus_sdk.TransferAPIError as exc: | |
if retry < n - 1: | |
continue | |
else: | |
return exc.http_status | |
try: | |
r = tc.operation_ls(ep, path) | |
except globus_sdk.TransferAPIError as exc: | |
return exc.http_status |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revised in c5f3c2b
This PR closes #39 , closes #38 , addresses #32, and closes #3.
Changes to model ensemble
The following models were added to the transfer pipeline:
and the following model was removed:
All files were successfully transferred by running the pipeline as described in the README, except the E3SM files. Though the E3SM models appear in the holdings, only the historical period is available on the LLNL ESGF2 globus node. Therefore the E3SM models did not have a variant chosen for the
config.prod_variant_lu
table and were not included in the batch files. We will need to keep #32 open for now and decide what to do about the incomplete data for the E3SM models.Retry in
ls
utilityThe
utils.operation_ls()
function was improved by incorporating a "retry" logic, to attempt 5 retries if the operation recieves an HTTP error. This seemed to fix the intermittent errors from issue #3 and resulted in a clean transfers run. By printing error statements from the holdings audit (see changes toesgf_holdings.get_filenames()
function) and capturing the output in a text file (see the newesgf_holdings_output.txt
) it should be possible to parse any persistent errors and potentially retry those specfic paths in a new transfer session. (This is somewhat related to issue #53 in that we may want to include the ability to audit/transfer smaller batches of files.) That output text file also prints errors for empty directories, which could be retried at a later date to check if more data has been uploaded to the globus node.Allow multiple institutions & multiple grid types
The code was refactored to allow for including lists of institutions and grid types in
config.py
. Some functions were revised to return a list of dicts instead of just a dict, so that if multiple directories were encountered duringls
operations, all contents of the directories could be audited. (Since there is no recursivels
function availble in the globus SDK, this was necessary in cases where we don't know which models have multiple grid types.) Using the newselect_grids.ipynb
notebook, we see that only two models (CNRM-CM6-1-HR and EC-Earth3-Veg) require more than one grid type in order to capture all frequency/variable combinations. Otherwise, models with more than one grid type have equal representation of frequency/variable combinations regardless of which grid type is chosen; in those cases, the native gridgn
is always specified in theconfig.py
.In all cases where the
config.model_inst_lu
table is used, a conditional statement was added in order to allow for multiple institutions. This only applies to the MPI-ESM1-2-HR model, but the code now supports this situation in general without flagging this model explicitly. From what I can tell, these revisions to the transfersconfig.py
should not affect the regridding or indicators pipelines since those use their ownconfig.py
files.TO TEST:
I don't think you need to run the entire transfer, but you can if you really want to! What I'd really appreciate is a sanity check on the list logic in the
esgf_holdings.py
functions, and the conditional statements ingenerate_manifest.py
,generate_batch_files.py
, andtests/test_mirror.py
. I think my testing confirms that this works for the cases we have identified, but are there other cases that could cause this to break or behave unexpectedly? Are there any obvious cases that are not captured by the conditional statements inesgf_holdings.get_filenames()
?Check out the new
select_grids.ipynb
to see the analysis. This notebook uses very similar logic and borrows some code fromselect_variants.ipynb
.Also, if there are any ideas to improve the printed error statements that end up in
esgf_holdings_output.txt
, let me know how that could be improved to make that output more useful in subsequent transfer attempts.