Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automated processing of Counterfactuals #91

Open · 11 of 16 tasks
A-Buch opened this issue Jul 24, 2023 · 5 comments

A-Buch commented Jul 24, 2023

Hi @SimonTreu,

I just added my notes and the things we discussed here as an issue, so we have everything in one place. Feel free to edit this comment with your bullet points and tasks.

  • Maybe some descriptions of the workflow could later be moved to README.md

ToDos set up:

  • adapt settings.py, runscript.sh, etc. to my paths and input folder
  • optional: adapt the variable placeholder and tile placeholder inside runscript.sh
  • read out the job id → in bash: runid=$(sbatch slurm.sh) (see the sketch below)
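
A minimal sketch of reading out the job id, assuming the slurm.sh name from above; --parsable makes sbatch print only the id:

```bash
# read out the job id of a submitted job;
# --parsable makes sbatch print just the id instead of "Submitted batch job <id>"
runid=$(sbatch --parsable slurm.sh)
echo "submitted job ${runid}"
```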

runscript.sh: Aim of the script: run all variables for one tile

  • creates a folder for each variable per tile
  • copies the respective Python scripts into these folders
  • set the array size inside the bash scripts to around 100
  • test the --dependency=after... option, which starts a new variable only after the previous one has finished; 4-5 variables in parallel runs should actually be possible (see the sketch below)
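
A sketch of such a dependency chain, with hypothetical per-variable submit scripts; Slurm's flag is --dependency=...:

```bash
# start the job for the next variable only after the previous one has finished
# (afterany = regardless of exit status, afterok = only on success)
prev=$(sbatch --parsable submit_tas.sh)   # hypothetical script names
next=$(sbatch --parsable --dependency=afterany:"${prev}" submit_pr.sh)
```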

ToDos processing:

  • merge the traces into one file, e.g. as a dictionary saved via pickle (not as netcdf) (Simon: netcdf is also ok, doesn't really matter); do the same for the timeseries files (see the sketch below)
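
A minimal sketch of the trace merge, assuming the per-cell traces were written as pickle files with a hypothetical naming scheme:

```bash
# merge all per-cell trace pickles into one dictionary and save it via pickle
python - <<'PY'
import glob
import pickle

merged = {}
for path in sorted(glob.glob("traces/trace_*.pkl")):  # assumed naming scheme
    with open(path, "rb") as f:
        merged[path] = pickle.load(f)

with open("traces_merged.pkl", "wb") as f:            # assumed output name
    pickle.dump(merged, f)
PY
```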

ToDos debugging rechunk_netcdf():

sanity tests and further checks:

  • write small tests for the timeseries and trace files, e.g. via xarray, to ensure that at least the checked file is as expected
  • check the number of failing cells (if this is not already done automatically in one of the scripts)
  • logp is stored in the trace files as a deterministic variable, so check via xxx.py whether logp is below the threshold (see the sketch below)
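
A sketch of such checks; file names, variable names, and the logp threshold are placeholder assumptions:

```bash
python - <<'PY'
import pickle
import xarray as xr

# small test that a timeseries file is as expected
ds = xr.open_dataset("output/timeseries_tas.nc")   # assumed file name
assert "tas" in ds, "expected variable is missing"
assert not ds["tas"].isnull().all(), "file contains only nan values"

# logp is stored in the trace files as a deterministic variable
with open("traces/trace_00001.pkl", "rb") as f:    # assumed file name
    trace = pickle.load(f)
LOGP_THRESHOLD = -1e5                              # assumed threshold
assert trace["logp"].mean() > LOGP_THRESHOLD, "logp below threshold"
PY
```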

ToDos after processing:

  • visual check of the generated files, e.g. via ncview or xarray (see the sketch below)
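
ncview works directly on the file; a hypothetical xarray alternative that saves a map of the first timestep without needing X forwarding:

```bash
ncview tas_cfact.nc   # assumed file name

python - <<'PY'
import matplotlib
matplotlib.use("Agg")            # render without a display
import matplotlib.pyplot as plt
import xarray as xr

ds = xr.open_dataset("tas_cfact.nc")   # assumed file name
ds["tas"].isel(time=0).plot()          # first timestep as a map
plt.savefig("tas_cfact_check.png")
PY
```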

After the sanity tests, merge the files and then finally delete intermediate files that are no longer needed, e.g. the .h5 files; keep the merged traces and merged timeseries files as a backup (a cleanup sketch follows below).
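
A hypothetical cleanup sketch, assuming the intermediate .h5 files live below the current run folder:

```bash
# delete intermediate per-cell files only after all sanity tests have passed
find . -name "*.h5" -delete   # assumed location of the intermediates
# keep traces_merged.pkl and the merged timeseries file as backup
```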

In the end we should have the following files for each tile:

  • *_cfact.nc
  • *_rechunked.nc > reversed rechunking
  • *_rechunked_valid.nc > the approved version of *_rechunked.nc, with inf and nan values replaced; this will be the final counterfactual file. If everything is correct, the other nc file can be deleted.
  • *_trend.nc > needed just for verification
  • *_yearmean.nc > needed just for verification

ToDo during run:

  • check memory usage, number of files, etc. with mmlsquota
  • if needed, cancel jobs starting from the newest one (see the sketch below)
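
A sketch for cancelling jobs newest-first, assuming all listed jobs belong to the current user (%V is squeue's submission-time field, -S "-V" sorts it descending):

```bash
# cancel own jobs starting from the most recently submitted one
for jobid in $(squeue -u "$USER" -h -o "%i" -S "-V"); do
    scancel "${jobid}"
done
```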

Be aware:

  • first test with one tile; if this works fine, multiple tiles can be run in sequence
  • only if the memory limit is exceeded: split the tasks and let them run in the personal section and in the project folder
  • optional: check if 8 cpus-per-task is too much (decrease only if necessary)
  • check the code for hard-coded values

SimonTreu commented Jul 24, 2023

I added my notes below

  • symlink to attrici_input
    • change input_dir to the one provided by Dominik
  • create runfolders for a tile (see the sketch below)
    • use create_runscripts.sh
    • change paths as required
    • call it, e.g. for tile t00001: create_runscripts.sh 00001
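
A sketch of these setup steps; the source path of the symlink is a placeholder:

```bash
# link the input data provided by Dominik (placeholder path)
ln -s /path/to/attrici_input attrici_input

# create the runfolders for tile t00001
./create_runscripts.sh 00001
```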

data_processing_workflow.py

  • for each variable → test whether it is ok to start the processes for all variables in parallel, or better serially (see the sketch after this list)
    • cd into the folder
    • run sbatch submit.sh
      • save the job id into a variable
    • run sbatch --dependency=afterany:prev_jobid sanity_check.sh
      • sanity_check.sh should:
      • count the number of created files and compare it to the number of non-masked cells in the landmask, else fail
      • check that there is no error in outdir/failing_cells.log, else → fail
    • if sanity_check was ok → sbatch --dependency=afterok:prev_jobid sbatch_write_netcdf.sh
    • in parallel, if sanity_check was ok → run merge traces
    • if successful, run quality_check on the merged files → check the logs for error messages; the number of nan values should not be much larger than the number of nan values in the (masked) original file
    • also test the generated netCDF files in some way
      • if ok, delete the hdf files with the timeseries
    • sanity check on the merged trace files
      • if ok, delete the pickle files with the traces
    • output a command to cancel all started or queued jobs in the correct order (last in, first out) so that dependent jobs don't get started
  • end for
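
A minimal bash sketch of this loop; the script names are taken from the list above, the variable list and folder layout are assumptions:

```bash
# one dependency chain per variable; collect cancel commands in reverse
# order (last in, first out) so dependent jobs don't get started
cancel_cmds=""
for var in tas pr ps; do   # assumed variable list, one runfolder per variable
    cd "${var}" || exit 1
    runid=$(sbatch --parsable submit.sh)
    checkid=$(sbatch --parsable --dependency=afterany:"${runid}" sanity_check.sh)
    writeid=$(sbatch --parsable --dependency=afterok:"${checkid}" sbatch_write_netcdf.sh)
    cancel_cmds="scancel ${writeid} ${checkid} ${runid}; ${cancel_cmds}"
    cd ..
done
echo "to cancel all started or queued jobs in the correct order, run:"
echo "${cancel_cmds}"
```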

Manual steps while this is running and after all is done

  • check the data_processing_workflow log → all ok?
  • if it is necessary to cancel jobs, cancel them in the correct order
  • check disk space in /p/tmp and in /p/projects/ou/rd3/dmcci
    • mmlsquota --block-size G -j dmcci projects
    • mmlsquota --block-size G tmp


A-Buch commented Jul 24, 2023

One further question: where should I push my commits, to a new branch or to the update_pymc branch? A new branch would probably be better to avoid mixing, right?

SimonTreu commented:

It might be necessary to change

cdo_processing = True

to

cdo_processing = False

if this step takes too much time. The trend was used as a quick check: trend2 should be close to 0 for the counterfactual at all grid points (see the sketch below).
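
A sketch of that check with cdo; the input file name is an assumption. cdo trend writes the intercept to the first output file and the slope (trend2) to the second:

```bash
cdo trend tas_cfact.nc trend1.nc trend2.nc   # trend2 holds the slope
cdo infon trend2.nc                          # should be close to 0 everywhere
```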

SimonTreu commented:

> One further question: where should I push my commits, to a new branch or to the update_pymc branch? A new branch would probably be better to avoid mixing, right?

Yes, please use a new branch, as those changes are not directly connected to the update anymore.


A-Buch commented Aug 10, 2023

cdo_processing = False is needed, at least for tiles with many land cells, because otherwise it takes too much time to generate the netcdf files.
