structure of data export to InfluxDB #234

wetterfrosch · 2020-11-21T10:49:54Z

wetterfrosch
Nov 21, 2020

Since @amotl asked what I think about how the data set in InfluxDB is organized, I am opening this issue to collect some thoughts. Here are the first two:

Data quality ("Qualitätsniveau", i.e. quality level) appears as a field rather than a tag. I assume that filtering on a field is more expensive than filtering on a tag and probably not feasible in every query.

Missing tag value for the product. If I understand correctly: as long as two products/sources (e.g. hourly and 10_minutes) are written to the same database, I assume that an hourly record will overwrite an existing 10_minutes record and vice versa. IMHO that doesn't give due credit to our data provider (the two data sources have different attributions, and the scopes of the products may differ!) and, more importantly, this can (and does!) result in irregularities!

To explain a particular issue: The air_temperature value within the 10_minutes product is gathered as the 1-minute average of the minute before the timestamp (see https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/10_minutes/air_temperature/historical/BESCHREIBUNG_obsgermany_climate_10min_tu_historical_de.pdf), while the air_temperature value of the hourly product is gathered as the 1-minute mean of the minute 10 minutes before the timestamp (see https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly/air_temperature/recent/BESCHREIBUNG_obsgermany_climate_hourly_tu_recent_de.pdf)!
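To make the offset concrete: taking the two descriptions above at face value, an hourly value stamped 11:00 refers to the same minute as the 10_minutes value stamped 10:50, not the one stamped 11:00. A minimal sketch of that alignment follows; the function name and the constant are made up for illustration and are not part of any existing code.

```python
from datetime import datetime, timedelta, timezone

# Assumption taken from the two product descriptions cited above: the hourly
# air_temperature refers to the minute 10 minutes before its timestamp.
HOURLY_OFFSET = timedelta(minutes=10)


def comparable_10_minutes_timestamp(hourly_timestamp: datetime) -> datetime:
    """Return the 10_minutes timestamp covering the same minute as an hourly record."""
    return hourly_timestamp - HOURLY_OFFSET


hourly_ts = datetime(2020, 11, 21, 11, 0, tzinfo=timezone.utc)
print(comparable_10_minutes_timestamp(hourly_ts).isoformat())
```

So when both products share a timestamp in one series, the two values do not describe the same minute.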

Having a tag for the product on every record of a series would help to distinguish these products when they are written to the same database.
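The overwrite can be illustrated without a running database. InfluxDB identifies a point by measurement, tag set, and timestamp; writing a second point with the same identity replaces the field values. The sketch below only mimics that rule with a dictionary. The measurement name, station id, tag names, and temperature values are hypothetical.

```python
from datetime import datetime, timezone


def write_point(store, measurement, tags, fields, timestamp):
    """Mimic InfluxDB's rule: measurement + tag set + timestamp identify a point."""
    key = (measurement, tuple(sorted(tags.items())), timestamp)
    # A second write with the same key updates the existing fields in place.
    store.setdefault(key, {}).update(fields)


ts = datetime(2020, 11, 21, 10, 0, tzinfo=timezone.utc)

# Without a product tag, both products end up in the same point.
untagged = {}
write_point(untagged, "weather", {"station_id": "01048"}, {"air_temperature": 4.1}, ts)
write_point(untagged, "weather", {"station_id": "01048"}, {"air_temperature": 3.8}, ts)

# With a product tag, each product keeps its own point.
tagged = {}
write_point(tagged, "weather", {"station_id": "01048", "product": "10_minutes"},
            {"air_temperature": 4.1}, ts)
write_point(tagged, "weather", {"station_id": "01048", "product": "hourly"},
            {"air_temperature": 3.8}, ts)

print(len(untagged), len(tagged))
```

In the untagged case only the value written last survives; in the tagged case both records remain queryable.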

wetterfrosch · 2020-11-21T11:10:21Z

wetterfrosch
Nov 21, 2020
Author

Data quality ("Qualitätsniveau", i.e. quality level) appears as a field rather than a tag. I assume that filtering on a field is more expensive than filtering on a tag and probably not feasible in every query.

This of course results in another issue, similar to the second thought described above: if a measurement is re-collected with a new data quality level, the old record is overwritten and no longer available. Having the data quality level as a tag rather than a field would result in two distinct records.
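As a rough sketch of the difference, here is how the same record could look in InfluxDB line protocol with the quality level as a field versus as a tag. The helper functions, the measurement name, the station id, and the quality values 3 and 10 are illustrative assumptions; the formatter does not escape special characters and only handles int and float fields.

```python
from datetime import datetime, timezone


def format_field(value):
    # Line protocol marks integer fields with a trailing "i"; floats are written as-is.
    return f"{value}i" if isinstance(value, int) else repr(float(value))


def to_line_protocol(measurement, tags, fields, timestamp):
    """Render one point as line protocol (simplified, no escaping)."""
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={format_field(v)}" for k, v in fields.items())
    nanoseconds = int(timestamp.timestamp()) * 10**9
    return f"{measurement}{tag_part} {field_part} {nanoseconds}"


def point_identity(line):
    """Measurement plus tag set, and timestamp: the parts that identify a point."""
    key, _fields, timestamp = line.split(" ")
    return key, timestamp


ts = datetime(2020, 11, 21, 10, 0, tzinfo=timezone.utc)
station = {"station_id": "01048"}

# Quality as a field: the re-collected record has the same identity as the old one.
as_field_old = to_line_protocol("weather", station, {"air_temperature": 4.1, "quality": 3}, ts)
as_field_new = to_line_protocol("weather", station, {"air_temperature": 4.1, "quality": 10}, ts)

# Quality as a tag: the two records belong to different series.
as_tag_old = to_line_protocol("weather", {**station, "quality": "3"}, {"air_temperature": 4.1}, ts)
as_tag_new = to_line_protocol("weather", {**station, "quality": "10"}, {"air_temperature": 4.1}, ts)

print(as_field_new)
print(as_tag_new)
print(point_identity(as_field_old) == point_identity(as_field_new))
print(point_identity(as_tag_old) == point_identity(as_tag_new))
```

One trade-off to keep in mind: every distinct tag value creates its own series, so a quality tag multiplies the number of series per station.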


amotl · 2020-11-21T11:57:12Z

amotl
Nov 21, 2020
Maintainer

Dear @wetterfrosch,

thanks for your valuable feedback. I have to admit that, while the data export feature was important to me, it apparently has not grown beyond a proof-of-concept implementation yet. Apart from the fact that it croaks on larger amounts of data (we will come back to this through separate issues I will open later), these are important details you are bringing to the table here.

As I am in a rush right now, just two quick answers:

Data quality should be an InfluxDB tag rather than a field: Acknowledged!
Differentiating between different resolutions: You should be able to ingest them into different tables, no? However, we may also think about mixing them into the same table, but then we will have to introduce another tag designating the resolution, as you suggested. So: Sure, we might consider doing this.

Hasta pronto and with kind regards,
Andreas.


gutzbenj · 2020-11-21T12:12:24Z

gutzbenj
Nov 21, 2020
Maintainer

Data quality ("Qualitätsniveau", i.e. quality level) appears as a field rather than a tag. I assume that filtering on a field is more expensive than filtering on a tag and probably not feasible in every query.

This of course results in another issue, similar to the second thought described above: if a measurement is re-collected with a new data quality level, the old record is overwritten and no longer available. Having the data quality level as a tag rather than a field would result in two distinct records.

This can be resolved by adding what you have called a "tag". I have already written about this in #142: we should differentiate between historical and other data, because only historical data is quality-checked. Those records must be rewritten once a year; for the other periods, one can append the data by using "recent" or "now", and for those products I'd expect the data to always stay the same, for the given reasons.

Furthermore: Should we make tidy_data the standard? This would in consequence ease the process of mapping/renaming.
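To illustrate what I mean, here is a small sketch, assuming "tidy" means one row per station, timestamp, and parameter. The column names and values are made up for the example and do not necessarily match what the library emits.

```python
def to_tidy(row, id_columns=("station_id", "date")):
    """Turn one wide record into one row per parameter (long/tidy layout)."""
    ids = {column: row[column] for column in id_columns}
    return [
        {**ids, "parameter": name, "value": value}
        for name, value in row.items()
        if name not in id_columns
    ]


# Hypothetical wide record with one column per parameter.
wide = {
    "station_id": "01048",
    "date": "2020-11-21T10:00:00Z",
    "air_temperature": 4.1,
    "humidity": 87.0,
}

tidy = to_tidy(wide)
for record in tidy:
    print(record)
```

With this layout, renaming a parameter means mapping one value in the `parameter` column instead of renaming a column per product.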
