
Perform local lat/lon check against GeoTIFF coverage masks to check for nodata before fetching from Rasdaman #553

Open · wants to merge 6 commits into main

Conversation

cstephen (Contributor) commented Mar 10, 2025

Closes #532.

This PR adds a GeoTIFF mask file for nearly every Rasdaman coverage used by the Data API. The API uses these files to do a local lat/lon check against the GeoTIFF to verify that data will be available before sending a data fetch to Rasdaman (a sketch of the check follows the list below). This has a couple of big benefits:

  • Lat/lon coordinates within a coverage's BBOX, but with no data, now tend to return in a fraction of a second (instead of up to 60+ seconds in some cases). See benchmarks here.
  • Filters out wasteful nodata requests before they reach Rasdaman, reducing its load.
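
As an illustration of how such a check can work, here is a minimal sketch. It is not the PR's actual implementation: it assumes rasterio, single-band integer-typed masks in EPSG:4326 with a defined nodata value, and a hypothetical has_data helper name.

import rasterio
from rasterio.windows import Window

def has_data(lat, lon, mask_path):
    """Return True if the coverage mask holds data at (lat, lon)."""
    with rasterio.open(mask_path) as src:
        # Guard against points outside the raster extent.
        if not (src.bounds.left <= lon <= src.bounds.right
                and src.bounds.bottom <= lat <= src.bounds.top):
            return False
        row, col = src.index(lon, lat)  # map coordinates -> pixel indices
        pixel = src.read(1, window=Window(col, row, 1, 1))
        return pixel.size > 0 and pixel[0, 0] != src.nodata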

This PR only checks GeoTIFFs for single lat/lon points, not areas: deciding what to do about an area with partial nodata isn't obvious, and performing pixel intersection operations against a local GeoTIFF could add (I think) a non-trivial amount of extra work to each area request. We can revisit this later if needed.

I've omitted the ALFRESCO endpoints from the local GeoTIFF check for the reason mentioned above. Since all ALFRESCO point queries get turned into HUC-12 area queries, it's not clear how to handle this or whether it's worth it.

I've also omitted local GeoTIFF checks for the CMIP6 monthly and CMIP6 indicators coverages, since Rasdaman handles the antimeridian nodata "ray of hope" issues better than a local GeoTIFF check does, often returning valid data where the local check reports nodata. Plus, the CMIP6 data footprint is essentially a rectangular BBOX that is better handled via BBOX lat/lon validation.
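
For reference, point-in-BBOX validation is just a bounds comparison; a minimal sketch, with placeholder bounds rather than the real CMIP6 extent:

def within_bbox(lat, lon, bbox):
    """Return True if (lat, lon) falls inside bbox = (min lon, min lat, max lon, max lat)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon

print(within_bbox(64.84, -147.72, (-170, 52, -134, 72)))  # True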

I've also had to tweak every impacted endpoint to recognize the new HTTP 404 status that results when the local GeoTIFF check finds nodata.
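
Roughly, the endpoint-side integration looks like this. This is a hedged sketch with hypothetical names, assuming a Flask app and the has_data helper sketched above; the real PR wires this up per endpoint.

from flask import Flask, abort

app = Flask(__name__)

@app.route("/tas2km/point/<lat>/<lon>")
def tas2km_point(lat, lon):
    # Consult the local coverage mask before querying Rasdaman.
    if not has_data(float(lat), float(lon), "masks/tas2km.tif"):
        abort(404)  # nodata at this point; skip the Rasdaman fetch entirely
    ...  # proceed with the usual WCS fetch and response packaging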

Testing this PR is tricky because there are so many endpoints to test, and deciding which lat/lon coordinate to check depends largely on the endpoint/coverage being tested. To make this easier, I've whipped up the following throwaway Python script that tests all affected endpoints with random lat/lon coordinates within a BBOX (loosely) around Alaska. This BBOX covers a lot of both land & ocean, so it's a good way to test land- and ocean-based datasets at the same time.

#!/usr/bin/env python
"""Throwaway benchmark: hit each endpoint with random lat/lon points and
report any HTTP status code mismatch between the local dev API and production."""
import sys
import time

import numpy as np
import requests

domain = "http://127.0.0.1:5000"
prod_domain = "https://earthmaps.io"
requests_per_endpoint = int(sys.argv[1])
# (min lon, min lat, max lon, max lat), loosely around Alaska
bbox = [-170, 52, -134, 72]

endpoints = [
    "/beetles/point",
    "/degree_days/heating",
    "/degree_days/below_zero",
    "/degree_days/freezing_index",
    "/degree_days/thawing_index",
    "/hydrology/point",
    "/landfastice/point",
    "/indicators/base/point",
    "/permafrost/point/gipl",
    "/precipitation",
    "/precipitation/frequency/point",
    "/seaice/point",
    "/snow/snowfallequivalent",
    "/tas2km/point",
    "/temperature/jan",
    "/temperature/july",
    "/temperature",
    "/taspr/point",
    "/wet_days_per_year/hp/point",
]


def timed_get(url):
    """Fetch a URL, returning its HTTP status code (or "Failure") and elapsed seconds."""
    start_time = time.time()
    try:
        status = requests.get(url).status_code
    except requests.exceptions.RequestException:
        status = "Failure"
    return status, round(time.time() - start_time, 2)


for endpoint in endpoints:
    for i in range(requests_per_endpoint):
        # Pick a random point inside the test BBOX.
        lat = round(np.random.uniform(bbox[1], bbox[3]), 2)
        lon = round(np.random.uniform(bbox[0], bbox[2]), 2)
        latlon_subpath = f"/{lat}/{lon}"

        dev_status, dev_time = timed_get(domain + endpoint + latlon_subpath)
        prod_status, prod_time = timed_get(prod_domain + endpoint + latlon_subpath)

        # Only report mismatches; matching status codes run silently.
        if dev_status != prod_status:
            print(f"{endpoint}{latlon_subpath}")
            print(f"dev: HTTP {dev_status}, {dev_time} seconds")
            print(f"prod: HTTP {prod_status}, {prod_time} seconds")
            print("----------")

        time.sleep(3)  # be gentle with the production server
Usage: python benchmark.py <number of requests per endpoint>
Example: python benchmark.py 5

The script runs silently until it encounters an endpoint where the HTTP status code differs between the local (development) API and the production API, since those cases may be worth investigating further. Running this script has surfaced a few interesting local-vs.-production discrepancies:

  • The production API will sometimes return a large JSON response full of nulls (nodata). This might be due to a missing nullify_and_prune call or two, but I suspect nulls are also now getting wrapped in quotation marks (i.e., after a recent Rasdaman upgrade, where they weren't before) and so aren't detected as proper nodata; see the snippet after this list. Example: https://earthmaps.io/tas2km/point/57.59/-151.81. The local GeoTIFF check solves this a different way, returning the expected HTTP 404 code/page.
  • Sometimes a production endpoint returns an HTTP 503 error, e.g., when the WCS fetch to Rasdaman times out or when a nodata response isn't properly handled in the API code. The local GeoTIFF check improves some of these cases by returning an HTTP 404 code/page for nodata requests.
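
To illustrate the quoted-null issue from the first bullet (my guess at the mechanism, with made-up keys, not a confirmed diagnosis): a value serialized as the string "null" slips past a check for JSON null.

import json

# Nodata serialized as the *string* "null" rather than a bare JSON null.
resp = json.loads('{"CCSM4": {"tas": "null"}}')
value = resp["CCSM4"]["tas"]
print(value is None)    # False: it's the string "null", so a None check misses it
print(value == "null")  # True: pruning would need an explicit string comparison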

But in general, any lat/lon endpoint that returns HTTP 200 (data returned) on the production server but a non-200 code on the development server is worth paying attention to. If the production response is nothing but nulls, this is an improvement. If the production response contains actual data, this is a problem I'll need to fix.

cstephen requested review from charparr and Joshdpaul on March 10, 2025 23:03