Bug harvesting data from WDC for services other than NWISUV & NWISDV #2417
Comments
Thanks for the investigation and examples @emiliom, this is a great catch. I'll take a look when I get the chance.
Hi @emiliom, I just made a gist demoing the error we get for CoCoRaHS values: https://gist.github.com/rajadain/22b0bc5546fcd0dfdc0dba874bb31698. In this gist, I first do a GetSeriesCatalogForBox2 search, then find a CoCoRaHS series record, then fetch the date range for it using ulmo. Looking at the error output, it seems to be related to how the return values are being parsed. Do you have any insight into this?
@rajadain I'm looking into it. I'll try to follow up today; otherwise, definitely by early Monday.
I've found that the date range you're searching (2016-11-30 to 2016-12-31) doesn't actually have any data, despite what the series metadata reports.

I have no clue how widespread this problem is, within the CoCoRaHS service or others. But obviously in this particular case the problem is not on your end: you're applying correct logic, issuing date range parameters based on the metadata extracted from the series catalog response.

There's nothing constructive we can do at this time, that I can think of right now. Except, maybe, changing the default date interval to be harvested initially to one year. Let's keep in mind that we've tended to assume (biased by often focusing on NWISUV) that all data sets are fairly high frequency, such as hourly. But many datasets are likely to be daily. The volume of data is not a function solely of the date range, but also of the data time step: even a year of hourly data might not be a very large response, for a single variable. But we'll only find out by trial and error. I'll give it more thought on Monday ...
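To make that volume argument concrete, here's a back-of-the-envelope sketch (the function name and time steps are illustrative, not part of the portal code) estimating how many values a harvest window would return at a given sampling frequency:

```python
from datetime import datetime, timedelta

def expected_value_count(start, end, time_step):
    """Estimate the number of values returned for a date range,
    assuming a regular sampling interval (time_step is a timedelta)."""
    span = end - start
    # integer number of steps that fit in the window
    return int(span.total_seconds() // time_step.total_seconds())

# One year of hourly data is still only ~8800 values per variable,
# while one year of daily data is ~366.
year = (datetime(2016, 1, 1), datetime(2017, 1, 1))
print(expected_value_count(*year, timedelta(hours=1)))  # 8784 (2016 is a leap year)
print(expected_value_count(*year, timedelta(days=1)))   # 366
```

The point being that the response size depends on range times frequency, so a one-year default is cheap for daily services even if it might be heavy for sub-hourly ones.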
I should add that many datasets are much less frequent than daily! Some are very sporadic and sparse. For example, within the bounding box @rajadain used in his sample notebook, we can see two such examples (at least according to the series metadata).
Thanks for the insight. Given that, I'll try to benchmark the time difference between fetching a month and a year for NWISUV / NWISDV and report back.
I just tried fetching 2 years of data (1 January 2015 to 31 December 2016) and got the same error: https://gist.github.com/rajadain/a03c6009b0ef80d1b3326e41c3c174af |
My apologies; what I reported on earlier about the CoCoRaHS site (being able to ingest data up to 2016-4-30) was probably based on tests that used lower-level requests. I'm able to reproduce the error with:

```python
# suds-based SOAP request against the WaterOneFlow service
from suds.client import Client

serv_client = Client(sample.ServURL + '?WSDL', timeout=5)
response = serv_client.service.GetValues(sample.location, sample.VarCode,
                                         startDate=from_date.strftime('%m/%d/%Y'),
                                         endDate=to_date.strftime('%m/%d/%Y'))
```

I'll try to track down the problem in ulmo (which, FYI, likely involves an idiosyncrasy with the service response). But as @ajrobbins said in her email today, we need to step back and decide whether pursuing much more effort on this issue is worthwhile. In the meantime, I'll comment briefly on your other comments:
That's the crux. I don't have anything new to say yet, beyond my earlier suggestion of a year.

Well, NWISDV will definitely be fine. The D stands for "daily", so a one-year request will return 366 values at most. That shouldn't be a problem at all. For comparison, since NWISUV is typically hourly (I think), the one-month request you're already doing returns 24 * 30 = 720 values.
No blanket guarantees. Every service is different. There will be some that are in fact hourly or more frequent, and others that are super sparse -- yearly, irregular, or a single value 80 years ago! Let me see if I can come up with a scheme that doesn't involve much hard-wiring and is generally applicable.
I've found the source of the problem that @rajadain demoed in his latest notebook (where he tries fetching two years of data for a CoCoRaHS site). Unfortunately, it's due to an interaction between conventions used in the WDC response (less than ideal, or possibly wrong, depending on what the standards spec says) and ulmo's expectations. I think I (with help from Don) can create a fix to ulmo by tomorrow; however, if you are currently using the ulmo conda package, you'd have to switch to a pip install from an ulmo fork instead. Anyway, more fodder for discussion and decision making.
@emiliom, it would be great if you and Don found the fix. I just had a conversation with @ajrobbins and let her know that we can use WPF funds to fix this, as it will be important for Monitor My Watershed. So, if you do get a fix working in ulmo by tomorrow, there might just be enough time for us to fix this bug before finalizing the Nov. release.
Thanks, @aufdenkampe. I assume that Azavea has the time, though? I thought the core bottleneck was more time than money. Be that as it may ... I'd like to hear from @rajadain first whether he's able to pip install ulmo from a GitHub fork. (This would be an interim solution, of course. Ultimately we would submit a PR to ulmo, but we can't count on a time frame for the PR to be accepted there and turned into a new release.)
Yes, we can support custom GitHub fork URLs. We'll incorporate it as soon as it is available.
Great, thanks. Fingers crossed ...
@lsetiawan Look over the exchanges on this issue, from the start. I'll send you more info later today. I'll take a shot late today/tonight at implementing the fix. Or if things go well, I just may be done (including testing) by tomorrow morning 😸
@rajadain I have a tweaked fork of ulmo with the fix; you can grab it from that fork.

Note that for expedience, my fix simply skips the "metadata" parsing for elements that were causing the error due to the convention mismatch. This means that metadata is not being read. I'm going to fix that later before I submit a PR to ulmo. But for the portal application, the skipped metadata is actually never used at this time, so you won't notice its absence.

Let me know if you have any questions or run into any issues. I hope it works cleanly! Of course, we still have to deal with the remaining issue.
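As an illustration of that stopgap approach (the function and field names below are hypothetical, not ulmo's actual internals): the parser can treat metadata as best-effort and skip elements that don't follow the expected convention, instead of letting one malformed element sink the whole response.

```python
def parse_series_values(raw_elements):
    """Parse value elements, skipping any whose optional metadata
    doesn't follow the expected convention (hypothetical sketch,
    not ulmo's actual implementation)."""
    values = []
    for element in raw_elements:
        record = {'value': element['value']}
        try:
            # metadata parsing is best-effort; a convention mismatch
            # here should not abort parsing of the whole response
            record['method'] = element['metadata']['method']
        except (KeyError, TypeError):
            pass  # skip the metadata, keep the value itself
        values.append(record)
    return values

raw = [
    {'value': 1.2, 'metadata': {'method': 'observed'}},
    {'value': 3.4, 'metadata': None},  # mismatched convention
    {'value': 5.6},                    # metadata missing entirely
]
print(parse_series_values(raw))
```

Since the portal never reads that metadata today, dropping it is invisible to users, which is what makes this safe as an interim fix.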
Because of parse issues in the main repo. See here for background: #2417 (comment)
@rajadain Thanks for cc'ing me on #2446. That's looking great! One quick question/suggestion: we're going with a year of data as the default. Hopefully that length of time is specified in such a way that we could throttle it easily later on, as we accumulate experience and conclude that, for example, it's slowing down NWISUV responses too much and should be reduced.
I went with a year across all services because it seemed more consistent, although for many we don't actually get a year (I think NWISUV values go back only to July, so maybe a quarter?). We can have different limits for different service providers, and if it makes sense to limit NWISUV to one month, that can be done.
I understand and agree with the reasoning for having a single length across all services (a year). What I'm suggesting is that if the implementation specifies that time interval in a single place/configuration, we'd be able to adjust it easily (still across all services) later on.
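A minimal sketch of that idea (constant and function names are hypothetical, not the portal's actual code): keep the harvest window in one configuration constant, with optional per-service overrides, so throttling later is a one-line change.

```python
from datetime import timedelta

# Single place to tune the default harvest window (hypothetical config).
DEFAULT_HARVEST_WINDOW = timedelta(days=365)

# Optional per-service overrides, e.g. if NWISUV turns out to be too slow:
# SERVICE_HARVEST_WINDOWS['NWISUV'] = timedelta(days=30)
SERVICE_HARVEST_WINDOWS = {}

def harvest_window(service_code):
    """Return the harvest window for a service, falling back to the default."""
    return SERVICE_HARVEST_WINDOWS.get(service_code, DEFAULT_HARVEST_WINDOW)

print(harvest_window('NWISDV').days)  # 365 until an override is added
```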
At our last call, @rajadain mentioned something wrong with the GHCN service available via WDC, such that the data are not plotted in the detailed view. I've done some digging and tests on those other services. I also discussed this with @aufdenkampe on Thursday, though that was before I did more thorough testing.
I think the bug is in the portal code itself, and I suspect it has to do with trying to query data from services that don't include recent data (the last 1-3 months, the default window used by the BiG CZ portal).
The problem is not just with GHCN. No service other than NWISUV & NWISDV seems to work, not even NWISGW. I've taken two sites identified via the portal (one from GHCN and one from CoCoRaHS; note these may be temporary links) and tried them on the CUAHSI/WDC HydroClient. In both cases, the sites were fine on the latter.
If confirmed, this is an important bug. WDC data access that's limited to NWISUV & NWISDV is a very constrained implementation.
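If that hypothesis is right, one defensive pattern (a sketch under that assumption, not the portal's actual logic) is to fall back to the series' reported end date whenever the default recent window would come back empty:

```python
from datetime import date, timedelta

def choose_query_window(today, series_end_date, default_days=90):
    """Pick a query window: use the default recent window if the series
    extends into it; otherwise anchor the window at the series' last
    reported date (hypothetical fallback, not the portal's real code)."""
    window_start = today - timedelta(days=default_days)
    if series_end_date >= window_start:
        return window_start, today
    # No recent data: query the last `default_days` the series covers.
    return series_end_date - timedelta(days=default_days), series_end_date

# A GHCN-like series whose metadata says it ends well before "today"
print(choose_query_window(date(2017, 11, 1), date(2016, 4, 30)))
```

This would let sparse or stale services (GHCN, CoCoRaHS, NWISGW) return something plottable instead of an empty response.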
cc @ajrobbins