Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automated processing of Counterfactuals #91

Open · 11 of 16 tasks
A-Buch opened this issue Jul 24, 2023 · 5 comments

A-Buch commented Jul 24, 2023

Hi @SimonTreu,

I just added my notes and the things we discussed here as an issue, so we have everything in one place. Feel free to edit this comment with your bullet points and tasks.

  • Maybe some descriptions of the workflow could later be moved to README.md

ToDos set up:

  • adapt settings.py, runscript.sh, etc. to my paths and input folder
  • optional: adapt the variable placeholder and tile placeholder inside runscript.sh
  • read out the job id → in bash: runid=$(sbatch slurm.sh) (see the sketch below)
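
A minimal sketch of reading out the job id, assuming the slurm.sh name from above; --parsable makes sbatch print only the id:

```bash
# read out the job id of a submitted job;
# --parsable makes sbatch print just the id instead of "Submitted batch job <id>"
runid=$(sbatch --parsable slurm.sh)
echo "submitted job ${runid}"
```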

runscript.sh: Aim of the script: run all variables for one tile

  • creates a folder for each variable per tile
  • copies the respective Python scripts into these folders
  • set the array size inside the bash scripts to around 100
  • test the --dependency=after... option, which starts a new variable only after the previous one has finished; 4-5 variables in parallel runs should actually be possible (see the sketch below)
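
A sketch of such a dependency chain, with hypothetical per-variable submit scripts; Slurm's flag is --dependency=...:

```bash
# start the job for the next variable only after the previous one has finished
# (afterany = regardless of exit status, afterok = only on success)
prev=$(sbatch --parsable submit_tas.sh)   # hypothetical script names
next=$(sbatch --parsable --dependency=afterany:"${prev}" submit_pr.sh)
```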

ToDos processing:

  • merge the traces into one file, e.g. as a dictionary saved via pickle (not as netcdf) (Simon: netcdf is also ok, doesn't really matter); do the same for the timeseries files (see the sketch below)
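
A minimal sketch of the trace merge, assuming the per-cell traces were written as pickle files with a hypothetical naming scheme:

```bash
# merge all per-cell trace pickles into one dictionary and save it via pickle
python - <<'PY'
import glob
import pickle

merged = {}
for path in sorted(glob.glob("traces/trace_*.pkl")):  # assumed naming scheme
    with open(path, "rb") as f:
        merged[path] = pickle.load(f)

with open("traces_merged.pkl", "wb") as f:            # assumed output name
    pickle.dump(merged, f)
PY
```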

ToDos debugging rechunk_netcdf():

sanity tests and further checks:

  • write small tests for the timeseries and trace files, e.g. via xarray, to ensure that at least the checked file is as expected
  • check the number of failing cells (if this is not already done automatically in one of the scripts)
  • logp is stored in the trace files as a deterministic variable, so check via xxx.py whether logp is below the threshold (see the sketch below)
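
A sketch of such checks; file names, variable names, and the logp threshold are placeholder assumptions:

```bash
python - <<'PY'
import pickle
import xarray as xr

# small test that a timeseries file is as expected
ds = xr.open_dataset("output/timeseries_tas.nc")   # assumed file name
assert "tas" in ds, "expected variable is missing"
assert not ds["tas"].isnull().all(), "file contains only nan values"

# logp is stored in the trace files as a deterministic variable
with open("traces/trace_00001.pkl", "rb") as f:    # assumed file name
    trace = pickle.load(f)
LOGP_THRESHOLD = -1e5                              # assumed threshold
assert trace["logp"].mean() > LOGP_THRESHOLD, "logp below threshold"
PY
```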

ToDos after processing:

  • visual check of the generated files, e.g. via ncview or xarray (see the sketch below)
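
ncview works directly on the file; a hypothetical xarray alternative that saves a map of the first timestep without needing X forwarding:

```bash
ncview tas_cfact.nc   # assumed file name

python - <<'PY'
import matplotlib
matplotlib.use("Agg")            # render without a display
import matplotlib.pyplot as plt
import xarray as xr

ds = xr.open_dataset("tas_cfact.nc")   # assumed file name
ds["tas"].isel(time=0).plot()          # first timestep as a map
plt.savefig("tas_cfact_check.png")
PY
```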

After the sanity tests, merge the files and then finally delete intermediate files that are no longer needed, e.g. the .h5 files; keep the merged traces and merged timeseries files as a backup (a cleanup sketch follows below).
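
A hypothetical cleanup sketch, assuming the intermediate .h5 files live below the current run folder:

```bash
# delete intermediate per-cell files only after all sanity tests have passed
find . -name "*.h5" -delete   # assumed location of the intermediates
# keep traces_merged.pkl and the merged timeseries file as backup
```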

In the end we should have the following files for each tile:

  • *_cfact.nc
  • *_rechunked.nc > reversed rechunking
  • *_rechunked_valid.nc > the approved version of *_rechunked.nc, with inf and nan values replaced; this will be the final counterfactual file. If everything is correct, the other nc file can be deleted.
  • *_trend.nc > needed just for verification
  • *_yearmean.nc > needed just for verification

ToDo during run:

  • check memory usage, number of files, etc. with mmlsquota
  • if needed, cancel jobs starting from the newest one (see the sketch below)
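
A sketch for cancelling jobs newest-first, assuming all listed jobs belong to the current user (%V is squeue's submission-time field, -S "-V" sorts it descending):

```bash
# cancel own jobs starting from the most recently submitted one
for jobid in $(squeue -u "$USER" -h -o "%i" -S "-V"); do
    scancel "${jobid}"
done
```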

Be aware:

  • first test with one tile; if this works fine, multiple tiles can be run in sequence
  • only if the memory limit is exceeded: split the tasks and let them run in the personal section and in the project folder
  • optional: check if 8 cpus-per-task is too much (decrease only if necessary)
  • check the code for hard-coded values

SimonTreu commented Jul 24, 2023

I added my notes below

  • symlink to attrici_input
    • change input_dir to the one provided by Dominik
  • create runfolders for a tile (see the sketch below)
    • use create_runscripts.sh
    • change paths as required
    • call it, e.g. for tile t00001: create_runscripts.sh 00001
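
A sketch of these setup steps; the source path of the symlink is a placeholder:

```bash
# link the input data provided by Dominik (placeholder path)
ln -s /path/to/attrici_input attrici_input

# create the runfolders for tile t00001
./create_runscripts.sh 00001
```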

data_processing_workflow.py

  • for each variable → test whether it is ok to start the processes for all variables in parallel, or better serially (see the sketch after this list)
    • cd into the folder
    • run sbatch submit.sh
      • save the job id into a variable
    • run sbatch --dependency=afterany:prev_jobid sanity_check.sh
      • sanity_check.sh should:
      • count the number of created files and compare it to the number of non-masked cells in the landmask, else fail
      • check that there is no error in outdir/failing_cells.log, else → fail
    • if sanity_check was ok → sbatch --dependency=afterok:prev_jobid sbatch_write_netcdf.sh
    • in parallel, if sanity_check was ok → run merge traces
    • if successful, run quality_check on the merged files → check the logs for error messages; the number of nan values should not be much larger than the number of nan values in the (masked) original file
    • also test the generated netCDF files in some way
      • if ok, delete the hdf files with the timeseries
    • sanity check on the merged trace files
      • if ok, delete the pickle files with the traces
    • output a command to cancel all started or queued jobs in the correct order (last in, first out) so that dependent jobs don't get started
  • end for
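
A minimal bash sketch of this loop; the script names are taken from the list above, the variable list and folder layout are assumptions:

```bash
# one dependency chain per variable; collect cancel commands in reverse
# order (last in, first out) so dependent jobs don't get started
cancel_cmds=""
for var in tas pr ps; do   # assumed variable list, one runfolder per variable
    cd "${var}" || exit 1
    runid=$(sbatch --parsable submit.sh)
    checkid=$(sbatch --parsable --dependency=afterany:"${runid}" sanity_check.sh)
    writeid=$(sbatch --parsable --dependency=afterok:"${checkid}" sbatch_write_netcdf.sh)
    cancel_cmds="scancel ${writeid} ${checkid} ${runid}; ${cancel_cmds}"
    cd ..
done
echo "to cancel all started or queued jobs in the correct order, run:"
echo "${cancel_cmds}"
```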

Manual steps while this is running and after all is done

  • check the data_processing_workflow log → all ok?
  • if it is necessary to cancel jobs, cancel them in the correct order
  • check disk space in /p/tmp and in /p/projects/ou/rd3/dmcci
    • mmlsquota --block-size G -j dmcci projects
    • mmlsquota --block-size G tmp


A-Buch commented Jul 24, 2023

One further question: where should I push my commits, to a new branch or to the update_pymc branch? A new branch would probably be better to avoid mixing, right?

SimonTreu commented:

It might be necessary to change

cdo_processing = True

to

cdo_processing = False

if this step takes too much time. The trend was used as a quick check: trend2 should be close to 0 for the counterfactual at all grid points (see the sketch below).
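
A sketch of that check with cdo; the input file name is an assumption. cdo trend writes the intercept to the first output file and the slope (trend2) to the second:

```bash
cdo trend tas_cfact.nc trend1.nc trend2.nc   # trend2 holds the slope
cdo infon trend2.nc                          # should be close to 0 everywhere
```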

SimonTreu commented:

> One further question: where should I push my commits, to a new branch or to the update_pymc branch? A new branch would probably be better to avoid mixing, right?

Yes, please use a new branch, as those changes are not directly connected to the update anymore.


A-Buch commented Aug 10, 2023

cdo_processing = False is needed, at least for tiles with many land cells, because otherwise it takes too much time to generate the netcdf files.
