
Perform local lat/lon check against GeoTIFF coverage masks to check for nodata before fetching from Rasdaman #553

Open · wants to merge 6 commits into main

Conversation

cstephen (Contributor) commented Mar 10, 2025

Closes #532.

This PR adds a GeoTIFF mask file for nearly every Rasdaman coverage used by the Data API. The API uses these files to do a local lat/lon check against the GeoTIFF to verify that data will be available before sending a data fetch to Rasdaman (a sketch of the check follows the list below). This has a couple of big benefits:

  • Lat/lon coordinates within a coverage's BBOX, but with no data, now tend to return in a fraction of a second (instead of up to 60+ seconds in some cases). See benchmarks here.
  • Filters out wasteful nodata requests before they reach Rasdaman, reducing its load.
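
As an illustration of how such a check can work, here is a minimal sketch. It is not the PR's actual implementation: it assumes rasterio, single-band integer-typed masks in EPSG:4326 with a defined nodata value, and a hypothetical has_data helper name.

import rasterio
from rasterio.windows import Window

def has_data(lat, lon, mask_path):
    """Return True if the coverage mask holds data at (lat, lon)."""
    with rasterio.open(mask_path) as src:
        # Guard against points outside the raster extent.
        if not (src.bounds.left <= lon <= src.bounds.right
                and src.bounds.bottom <= lat <= src.bounds.top):
            return False
        row, col = src.index(lon, lat)  # map coordinates -> pixel indices
        pixel = src.read(1, window=Window(col, row, 1, 1))
        return pixel.size > 0 and pixel[0, 0] != src.nodata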

This PR only checks GeoTIFFs for single lat/lon points, not areas: deciding what to do about an area with partial nodata isn't obvious, and performing pixel intersection operations against a local GeoTIFF could add (I think) a non-trivial amount of extra work to each area request. We can revisit this later if needed.

I've omitted the ALFRESCO endpoints from the local GeoTIFF check for the reason mentioned above. Since all ALFRESCO point queries get turned into HUC-12 area queries, it's not clear how to handle this or whether it's worth it.

I've also omitted local GeoTIFF checks for the CMIP6 monthly and CMIP6 indicators coverages, since Rasdaman handles the antimeridian nodata "ray of hope" issues better than a local GeoTIFF check does, often returning valid data where the local check reports nodata. Plus, the CMIP6 data footprint is essentially a rectangular BBOX that is better handled via BBOX lat/lon validation.
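
For reference, point-in-BBOX validation is just a bounds comparison; a minimal sketch, with placeholder bounds rather than the real CMIP6 extent:

def within_bbox(lat, lon, bbox):
    """Return True if (lat, lon) falls inside bbox = (min lon, min lat, max lon, max lat)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon

print(within_bbox(64.84, -147.72, (-170, 52, -134, 72)))  # True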

I've also had to tweak every impacted endpoint to recognize the new HTTP 404 status that results when the local GeoTIFF check finds nodata.
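
Roughly, the endpoint-side integration looks like this. This is a hedged sketch with hypothetical names, assuming a Flask app and the has_data helper sketched above; the real PR wires this up per endpoint.

from flask import Flask, abort

app = Flask(__name__)

@app.route("/tas2km/point/<lat>/<lon>")
def tas2km_point(lat, lon):
    # Consult the local coverage mask before querying Rasdaman.
    if not has_data(float(lat), float(lon), "masks/tas2km.tif"):
        abort(404)  # nodata at this point; skip the Rasdaman fetch entirely
    ...  # proceed with the usual WCS fetch and response packaging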

Testing this PR is tricky because there are so many endpoints to test, and deciding which lat/lon coordinate to check depends largely on the endpoint/coverage being tested. To make this easier, I've whipped up the following throwaway Python script that tests all affected endpoints with random lat/lon coordinates within a BBOX (loosely) around Alaska. This BBOX covers a lot of both land & ocean, so it's a good way to test land- and ocean-based datasets at the same time.

#!/usr/bin/env python
"""Throwaway benchmark: hit each endpoint with random lat/lon points and
report any HTTP status code mismatch between the local dev API and production."""
import sys
import time

import numpy as np
import requests

domain = "http://127.0.0.1:5000"
prod_domain = "https://earthmaps.io"
requests_per_endpoint = int(sys.argv[1])
# (min lon, min lat, max lon, max lat), loosely around Alaska
bbox = [-170, 52, -134, 72]

endpoints = [
    "/beetles/point",
    "/degree_days/heating",
    "/degree_days/below_zero",
    "/degree_days/freezing_index",
    "/degree_days/thawing_index",
    "/hydrology/point",
    "/landfastice/point",
    "/indicators/base/point",
    "/permafrost/point/gipl",
    "/precipitation",
    "/precipitation/frequency/point",
    "/seaice/point",
    "/snow/snowfallequivalent",
    "/tas2km/point",
    "/temperature/jan",
    "/temperature/july",
    "/temperature",
    "/taspr/point",
    "/wet_days_per_year/hp/point",
]


def timed_get(url):
    """Fetch a URL, returning its HTTP status code (or "Failure") and elapsed seconds."""
    start_time = time.time()
    try:
        status = requests.get(url).status_code
    except requests.exceptions.RequestException:
        status = "Failure"
    return status, round(time.time() - start_time, 2)


for endpoint in endpoints:
    for i in range(requests_per_endpoint):
        # Pick a random point inside the test BBOX.
        lat = round(np.random.uniform(bbox[1], bbox[3]), 2)
        lon = round(np.random.uniform(bbox[0], bbox[2]), 2)
        latlon_subpath = f"/{lat}/{lon}"

        dev_status, dev_time = timed_get(domain + endpoint + latlon_subpath)
        prod_status, prod_time = timed_get(prod_domain + endpoint + latlon_subpath)

        # Only report mismatches; matching status codes run silently.
        if dev_status != prod_status:
            print(f"{endpoint}{latlon_subpath}")
            print(f"dev: HTTP {dev_status}, {dev_time} seconds")
            print(f"prod: HTTP {prod_status}, {prod_time} seconds")
            print("----------")

        time.sleep(3)  # be gentle with the production server
Usage: python benchmark.py <number of requests per endpoint>
Example: python benchmark.py 5

The script runs silently until it encounters an endpoint where the HTTP status code differs between the local (development) API and the production API, since those cases may be worth investigating further. Running this script has surfaced a few interesting local-vs.-production discrepancies:

  • The production API will sometimes return a large JSON response full of nulls (nodata). This might be due to a missing nullify_and_prune call or two, but I suspect nulls are also now getting wrapped in quotation marks (i.e., after a recent Rasdaman upgrade, where they weren't before) and so aren't detected as proper nodata; see the snippet after this list. Example: https://earthmaps.io/tas2km/point/57.59/-151.81. The local GeoTIFF check solves this a different way, returning the expected HTTP 404 code/page.
  • Sometimes a production endpoint returns an HTTP 503 error, e.g., when the WCS fetch to Rasdaman times out or when a nodata response isn't properly handled in the API code. The local GeoTIFF check improves some of these cases by returning an HTTP 404 code/page for nodata requests.
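
To illustrate the quoted-null issue from the first bullet (my guess at the mechanism, with made-up keys, not a confirmed diagnosis): a value serialized as the string "null" slips past a check for JSON null.

import json

# Nodata serialized as the *string* "null" rather than a bare JSON null.
resp = json.loads('{"CCSM4": {"tas": "null"}}')
value = resp["CCSM4"]["tas"]
print(value is None)    # False: it's the string "null", so a None check misses it
print(value == "null")  # True: pruning would need an explicit string comparison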

But in general, any lat/lon endpoint that returns HTTP 200 (data returned) on the production server but a non-200 code on the development server is worth paying attention to. If the production response is nothing but nulls, this is an improvement. If the production response contains actual data, this is a problem I'll need to fix.

cstephen requested review from charparr and Joshdpaul on March 10, 2025 23:03