Perform local lat/lon check against GeoTIFF coverage masks to check for nodata before fetching from Rasdaman #553
Closes #532.
This PR adds a GeoTIFF mask file for nearly every Rasdaman coverage used by the Data API. The API uses these files to do a local lat/lon check against the GeoTIFF to verify that data will be available before sending a data fetch to Rasdaman. This has a couple of big benefits:
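The core of the local check is just mapping a lat/lon to a pixel in the mask and testing it against the nodata value. A minimal sketch of that idea, assuming a simple north-up raster whose mask band has already been read into memory (in the real API a library like rasterio would supply the transform and mask; `point_has_data` and `NODATA` here are illustrative names, not the PR's actual API):

```python
import math

NODATA = 0  # assumed nodata value in the mask band

def point_has_data(lat, lon, origin_x, origin_y, pixel_size, mask):
    """Return True if (lat, lon) falls on a valid (non-nodata) pixel.

    origin_x/origin_y: coordinates of the raster's upper-left corner.
    pixel_size: (square) pixel width in degrees.
    mask: row-major 2D list of mask values.
    """
    col = math.floor((lon - origin_x) / pixel_size)
    row = math.floor((origin_y - lat) / pixel_size)
    # A point outside the raster entirely is treated as nodata.
    if row < 0 or col < 0 or row >= len(mask) or col >= len(mask[0]):
        return False
    return mask[row][col] != NODATA

# Tiny 2x2 mask covering lat 64..66, lon -150..-148 at 1-degree pixels:
mask = [[1, 0],
        [1, 1]]
print(point_has_data(65.5, -149.5, -150.0, 66.0, 1.0, mask))
```

If the function returns False, the endpoint can return an HTTP 404 immediately instead of sending the request to Rasdaman.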
This PR only checks GeoTIFFs for single lat/lon points, not areas, because deciding what to do about an area with partial nodata is not obvious, and performing pixel intersection operations against a local GeoTIFF could add (I think) a non-trivial amount of work to each area request. But we can revisit this later if needed.
I've omitted the ALFRESCO endpoints from the local GeoTIFF check for the reason mentioned above. Since all ALFRESCO point queries get turned into HUC-12 area queries, it's not clear how to handle this or whether it's worth it.
I've also omitted local GeoTIFF checks for the CMIP6 monthly and CMIP6 indicators coverages since Rasdaman deals with the antimeridian nodata "ray of hope" issues better than a local GeoTIFF check, often returning valid data, whereas a local GeoTIFF check returns nodata. Plus, the CMIP6 data footprint is basically just a rectangular BBOX that is better handled via BBOX lat/lon validation.
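For a footprint that is essentially rectangular, the BBOX check mentioned above is just a bounds comparison. A sketch, with placeholder bounds (the tuple below is not the CMIP6 coverages' true extent):

```python
# Placeholder extent: (min lon, min lat, max lon, max lat).
CMIP6_BBOX = (-180.0, 50.0, 180.0, 90.0)

def within_bbox(lat, lon, bbox=CMIP6_BBOX):
    """Return True if the point falls inside the coverage's bounding box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon

print(within_bbox(65.0, -147.7))  # True
print(within_bbox(30.0, -147.7))  # False
```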
I've also had to tweak every impacted endpoint to recognize the new 404 error code that will result from finding nodata in a local GeoTIFF.
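The per-endpoint pattern amounts to short-circuiting with a 404 before Rasdaman is ever contacted. A hedged sketch of that flow, in the shape of a Flask-style view returning a `(body, status)` pair; `check_mask` and `fetch_from_rasdaman` are stand-ins, not the Data API's real function names:

```python
def check_mask(lat, lon):
    # Stand-in for the local GeoTIFF nodata lookup; pretend
    # this one ocean point is nodata for demonstration purposes.
    return (lat, lon) != (57.59, -151.81)

def fetch_from_rasdaman(lat, lon):
    # Stand-in for the actual Rasdaman query; placeholder payload.
    return {"tas": 1.3}

def point_endpoint(lat, lon):
    """Return (body, HTTP status), the way a Flask view might."""
    if not check_mask(lat, lon):
        # Nodata in the local GeoTIFF: skip the Rasdaman fetch entirely.
        return {"error": "No data at this location."}, 404
    return fetch_from_rasdaman(lat, lon), 200

print(point_endpoint(57.59, -151.81)[1])  # 404
print(point_endpoint(64.84, -147.72)[1])  # 200
```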
Testing this PR is tricky because there are so many endpoints to test, and deciding which lat/lon coordinate to check depends largely on the endpoint/coverage being tested. To make this easier, I've whipped up the following throwaway Python script that tests all affected endpoints with random lat/lon coordinates within a BBOX (loosely) around Alaska. This BBOX covers a lot of both land & ocean, so it's a good way to test land- and ocean-based datasets at the same time.
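The throwaway script itself isn't reproduced here, but a minimal sketch of the approach it describes might look like the following. The endpoint template, hostnames, and Alaska BBOX values are all assumptions:

```python
import random
import urllib.error
import urllib.request

# Rough BBOX around Alaska: (min lon, min lat, max lon, max lat).
AK_BBOX = (-170.0, 50.0, -130.0, 72.0)
# One example endpoint template; the real script would list all affected ones.
ENDPOINTS = ["/tas2km/point/{lat}/{lon}"]
PROD = "https://earthmaps.io"
DEV = "http://localhost:5000"  # assumed local development server

def random_point(bbox, rng=random):
    """Return a random (lat, lon) inside the given bounding box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return rng.uniform(min_lat, max_lat), rng.uniform(min_lon, max_lon)

def status_of(url):
    """GET a URL and return its HTTP status code, even for error responses."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code

if __name__ == "__main__":
    for _ in range(100):
        lat, lon = random_point(AK_BBOX)
        for template in ENDPOINTS:
            path = template.format(lat=round(lat, 4), lon=round(lon, 4))
            prod, dev = status_of(PROD + path), status_of(DEV + path)
            if prod != dev:  # stay silent unless the status codes differ
                print(f"{path}: prod={prod} dev={dev}")
```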
The script runs silently until it encounters an endpoint where the HTTP status code differs between the local (development) API and the current production API, since any such difference may be worth investigating further. While running this script, I've found a few interesting discrepancies between the local and production APIs:
Some of the discrepancies seem to come down to a `nullify_and_prune` call or two, but I think `null`s are also getting wrapped in quotation marks when they weren't before (i.e., before vs. after a recent Rasdaman upgrade) and aren't detected as proper nodata for this reason. Example: https://earthmaps.io/tas2km/point/57.59/-151.81 The local GeoTIFF check solves this in a different way, returning the HTTP 404 code/page as expected.

But in general, any lat/lon endpoint that returns HTTP 200 (data returned) on the production server and a non-200 code for the same endpoint on the development server is worth paying attention to. If the production server response is full of nothing but nulls, this is an improvement. If the production server response is full of actual data, this is a problem I'll need to fix.