Framework for tracking what data we actually have #29

kyleredilla · 2024-02-27T20:02:52Z

We need some kind of tool / method / framework for quickly understanding what data we have. We have discussed this a few times but here is the ticket finally :D This would be most useful for the raw mirrored CMIP6 data, but eventually we will want it for all derived datasets.

The essential information is which combinations of time periods / models / scenarios / variables / frequencies we have data for. Perhaps a dashboard, perhaps something built on a testing framework, perhaps an audit like what the esgf_holdings.py script does but for our own filesystem, maybe an integration with Google Sheets, not sure what is best!
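As a rough starting point for the filesystem audit idea, here is a minimal sketch that walks a directory tree and groups files by model / scenario / variable / table. It assumes the mirrored files keep the standard CMIP6 filename pattern (`<variable>_<table>_<model>_<experiment>_<variant>_<grid>[_<time range>].nc`) and uses the table ID (e.g. `Amon`, `day`) as a stand-in for frequency. The function names and the root path are hypothetical, not anything that exists in the repo.

```python
from collections import defaultdict
from pathlib import Path

# Facet names in the order they appear in a CMIP6-style filename (assumed layout).
FIELDS = ("variable", "table_id", "model", "scenario", "variant", "grid")


def parse_cmip6_filename(name):
    """Split a CMIP6-style filename into facets; return None if it does not match."""
    if not name.endswith(".nc"):
        return None
    parts = name[:-3].split("_")
    # Six facets, plus an optional time range (absent for fixed fields).
    if len(parts) not in (6, 7):
        return None
    record = dict(zip(FIELDS, parts[:6]))
    record["time_range"] = parts[6] if len(parts) == 7 else ""
    return record


def summarize_holdings(root):
    """Map (model, scenario, variable, table_id) to the list of time ranges found."""
    holdings = defaultdict(list)
    for path in sorted(Path(root).rglob("*.nc")):
        record = parse_cmip6_filename(path.name)
        if record is None:
            continue  # skip files that do not follow the assumed pattern
        key = (record["model"], record["scenario"],
               record["variable"], record["table_id"])
        holdings[key].append(record["time_range"])
    return dict(holdings)
```

Calling `summarize_holdings("/path/to/mirror")` would give one entry per combination, which could then be dumped to a table or compared against the expected holdings.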

Joshdpaul · 2024-04-25T16:17:10Z

An internal holdings audit is currently accomplished by running transfers/holdings_summary.ipynb and manually joining the CSV outputs into a Google sheet. A more automated way of running the audit, generating the sheet, and sharing the results would be a great feature to add.
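The manual joining step could probably be scripted. Below is a minimal sketch, assuming "joining" here means stacking the per-run CSV outputs into one table; I have not checked it against the notebook's actual output columns, so the function name and file paths are hypothetical. Pushing the combined file to a Google sheet is left out.

```python
import csv


def combine_csvs(csv_paths, out_path):
    """Stack several CSV files into one, keeping the union of their columns."""
    rows, columns = [], []
    for csv_path in csv_paths:
        with open(csv_path, newline="") as f:
            reader = csv.DictReader(f)
            # Preserve first-seen column order across all input files.
            for col in reader.fieldnames or []:
                if col not in columns:
                    columns.append(col)
            rows.extend(reader)
    with open(out_path, "w", newline="") as f:
        # restval fills in blanks for columns a given input file did not have.
        writer = csv.DictWriter(f, fieldnames=columns, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

The return value is the number of data rows written, which is handy as a quick sanity check against the individual file sizes.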

Maybe implement via a scheduled Prefect flow? Maybe host the resulting table on ARDAC somewhere for reference?
