Issue with get_nhdplushr for a few basins #411

Open
tjstagni opened this issue Sep 13, 2024 · 9 comments

@tjstagni

tjstagni commented Sep 13, 2024

Hi @dblodgett-usgs, I'm trying to load a few basins with the get_nhdplushr function and I'm running into a couple of issues.

For HUC04 basins 1704, 0318, and 0307, the function never completes and the data is not loaded into R. I had basin 0318 running for a few hours and it still did not complete. I've downloaded all HUC04s surrounding these basins and had no issues with any of the adjacent basins.

Also, basin 0201 gave me this error, and the basin was not downloaded by download_nhdplushr:

[image: screenshot of the error]

Here is a reproducible example. I'm using the latest version, 1.2.1:

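library(nhdplusTools)

# Cache directory for the downloaded HR data
temp_dir = file.path(nhdplusTools_data_dir(), "temp_hr_cache")

# Download the data for HUC04 basin 1704
download_dir = download_nhdplushr(temp_dir, "1704")

# Read the selected layers and write them to a GeoPackage
hr = get_nhdplushr(download_dir, file.path(download_dir, "nhdplus_out.gpkg"),
                   layers=c("NHDFlowline","NHDPlusBurnWaterbody"))
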
@dblodgett-usgs
Collaborator

I see how this is just stalled out. Will have to do some looking to see what the deal is.

There is a mix of casing in the nhdplushr attributes that causes some issues. I thought I'd handled all of them but may have missed one. To verify, are you on the latest hydroloom as well? https://github.com/DOI-USGS/hydroloom

@tjstagni
Author

Yes, I have hydroloom 1.1.0 installed, but I'm not using it for this.

@dblodgett-usgs
Collaborator

OK -- so my theory is that this is the culprit -- it did eventually finish.

Warning message:
In CPL_read_ogr(dsn, layer, query, as.character(options), quiet,  :
  GDAL Message 1: organizePolygons() received a polygon with more than 100 parts. The processing may be really slow.  You can skip the processing by setting METHOD=SKIP, or only make it analyze counter-clock wise parts by setting METHOD=ONLY_CCW if you can assume that the outline of holes is counter-clock wise defined

@dblodgett-usgs
Collaborator

Yeah -- I turned off polygon ring direction checks and it runs fine. NHDPlusHR is a fairly clean dataset from that perspective, so I feel comfortable leaving the checks off. Once #412 is merged, you can install from GitHub and this should work.

@tjstagni
Author

@dblodgett-usgs thanks so much for working so quickly on the updates. Do you have a sense of how long get_nhdplushr should run with the new updates? My current run with basin "1704" has been going for 45 minutes and still has not completed.

Also, I still have the same error with basin "0201". The basin does not download when using download_nhdplushr.

@dblodgett-usgs
Collaborator

Apologies -- I rushed this "fix" and missed what was actually changed to make it work faster. (I had a cached gpkg that was being read fast!) Will get the actual fix up in a moment.

@tjstagni
Author

@dblodgett-usgs no worries, thank you! Does this fix also solve the issue for basin "0201"?

@dblodgett-usgs
Collaborator

0201 doesn't exist in either the archive or the current folder here: https://prd-tnm.s3.amazonaws.com/index.html?prefix=StagedProducts/Hydrography/NHDPlusHR/VPU/

So not a lot I can do there. :/

@dblodgett-usgs
Collaborator

dblodgett-usgs commented Sep 16, 2024

OK -- so my guess about what this was turned out to be wrong.

Something was hung in the make_standalone() call.

This should work:

library(nhdplusTools)

temp_dir = file.path(nhdplusTools_data_dir(), "temp_hr_cache")

download_dir = download_nhdplushr(temp_dir, "1704")

# Delete any existing output GeoPackage so a cached copy is not read
unlink(file.path(download_dir, "nhdplus_out.gpkg"))

# check_terminals = FALSE skips the call to make_standalone()
hr = get_nhdplushr(download_dir, file.path(download_dir, "nhdplus_out.gpkg"),
                   layers=c("NHDFlowline","NHDPlusBurnWaterbody"),
                   check_terminals = FALSE)

Note that check_terminals = FALSE just avoids the call to make_standalone(), so your network won't be self-contained unless everything terminates at the coast.

I'll try to figure out what's hanging in that call when I have a few minutes, but that should get you unstuck for now.
