structure of data export to InfluxDB #234
Replies: 3 comments
-
This of course leads to another issue, similar to the second point described: if a measurement is re-collected with a new data-quality level, the old record is overwritten and is no longer available. Having the data-quality level as a tag rather than a field would result in two distinct records.
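To make the overwrite behaviour concrete, here is a minimal sketch in plain Python that mimics how InfluxDB identifies a point (measurement, tag set, and timestamp; field values are not part of the identity). It does not talk to a real database. The measurement name `weather`, the tag `station_id`, the station id, and the quality-level values are assumptions chosen for illustration, not what the exporter actually writes.

```python
# Simplified model: a point is identified by measurement + tag set + timestamp.
# Writing a second point with the same identity merges its fields into the
# existing one, and conflicting field values are replaced by the new write.


def write_point(store, measurement, tags, fields, timestamp):
    """Store a point under its identity key, overwriting clashing fields."""
    key = (measurement, tuple(sorted(tags.items())), timestamp)
    store.setdefault(key, {}).update(fields)


ts = 1577836800  # 2020-01-01T00:00:00Z in seconds (illustrative)

# Case 1: quality level stored as a FIELD (hypothetical schema).
as_field = {}
write_point(as_field, "weather", {"station_id": "01048"},
            {"air_temperature": 2.1, "quality_level": 1}, ts)
write_point(as_field, "weather", {"station_id": "01048"},
            {"air_temperature": 2.3, "quality_level": 3}, ts)

# Case 2: quality level stored as a TAG (hypothetical schema).
as_tag = {}
write_point(as_tag, "weather", {"station_id": "01048", "quality_level": "1"},
            {"air_temperature": 2.1}, ts)
write_point(as_tag, "weather", {"station_id": "01048", "quality_level": "3"},
            {"air_temperature": 2.3}, ts)

print(len(as_field))  # one record left, the first write is gone
print(len(as_tag))    # both records are kept
```

With the quality level as a field, only the re-collected value survives; with the quality level as a tag, both records remain queryable side by side.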
-
Dear @wetterfrosch, thanks for your valuable feedback. I have to admit that, while the data export feature was important to me, it apparently has not grown beyond a proof-of-concept implementation yet. Apart from the fact that it croaks on larger amounts of data (we will come back to this in separate issues I will open later), these are important details you are bringing to the table here. As I am in a rush right now, just two quick answers:
See you soon and with kind regards,
-
This can be resolved by adding what you have called a "tag". I have already written about this in #142: we should differentiate between historical and other data, as only historical data is quality-checked. Once per year those records must be rewritten. For other periods, one can append the data by using "recent" or "now", but for those products I'd expect the data to always stay the same, for the given reasons. Furthermore: should we make tidy_data the standard? This would in turn ease the process of mapping/renaming.
-
As @amotl asked what I think about the way the data set in InfluxDB is organized, I am opening this issue to collect some thoughts. Here are the first two:
Data quality ("Qualitätsniveau", i.e. quality level) appears as a field rather than a tag. I assume that filtering on a field is more expensive than filtering on a tag, and probably not feasible in every query.
Missing tag value for the product. If I conclude correctly: as long as two products/sources (e.g. hourly and 10_minutes) are submitted to the same database, I assume that an hourly record will overwrite an existing 10_minutes record and vice versa. IMHO that doesn't pay the necessary tribute to our data provider (both data sources have different attributions, and the scopes of the products may differ!), and more importantly: this can (and does!) result in irregularities!
To explain a particular issue: The air_temperature value within the 10_minutes product is gathered as the 1-minute average of the minute before the timestamp (see https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/10_minutes/air_temperature/historical/BESCHREIBUNG_obsgermany_climate_10min_tu_historical_de.pdf), while the air_temperature value of the hourly product is gathered as the 1-minute mean of the minute 10 minutes before the timestamp (see https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly/air_temperature/recent/BESCHREIBUNG_obsgermany_climate_hourly_tu_recent_de.pdf)!
Having a tag for the product on every record of a series would help to distinguish these products when they are written to the same database.
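As a rough illustration of what such a product tag could look like on the wire, the following sketch builds InfluxDB line protocol strings by hand. The measurement name `weather`, the tag keys `station_id` and `product`, the station id, and the helper functions are hypothetical; the actual exporter may name and structure things differently. Escaping of special characters is deliberately left out.

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point as line protocol: measurement,tags fields timestamp.

    Simplified: assumes float field values and no characters that need escaping.
    """
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_part} {field_part} {timestamp_ns}"


def series_key(line):
    """Return the measurement plus tag set, i.e. the part before the first space."""
    return line.split(" ")[0]


ts_ns = 1577836800 * 10**9  # 2020-01-01T00:00:00Z in nanoseconds (illustrative)

# Same station and timestamp, but different products (hypothetical tag values).
line_10min = to_line_protocol(
    "weather", {"station_id": "01048", "product": "10_minutes"},
    {"air_temperature": 2.1}, ts_ns)
line_hourly = to_line_protocol(
    "weather", {"station_id": "01048", "product": "hourly"},
    {"air_temperature": 2.4}, ts_ns)

print(line_10min)
# weather,product=10_minutes,station_id=01048 air_temperature=2.1 1577836800000000000
print(line_hourly)
# weather,product=hourly,station_id=01048 air_temperature=2.4 1577836800000000000
```

Because the tag sets differ, the two lines belong to different series, so the hourly value would no longer overwrite the 10_minutes value at the same timestamp.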