Use percentiles in /rgb bands stretch range #320

Closed
atanas-balevsky opened this issue Oct 5, 2023 · 12 comments

Comments

@atanas-balevsky
Contributor

Dear terracotta team,
Many thanks for the great product!

I've a small request: For the /rgb endpoint's range stretching, instead of using absolute values in the *_range parameters, I'd like to use the band's percentiles.

Right now I'm handling this by calling the /metadata endpoint per band and fetching the percentiles, which I later add as the *_range parameters.

Instead of making 3 HTTP calls to generate a URL, would it be possible to add *_percentiles parameters to the /rgb endpoint and calculate the range values internally?

P.S. I would love to submit a PR for this, if you'd welcome it.

thanks in advance
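
For reference, the workaround described above could look roughly like the following. This is a minimal sketch, not Terracotta's client code: it assumes the /metadata response carries a `percentiles` list holding the 1st to 99th percentiles in order, and that the last dataset key is the band. The names `percentile_range`, `build_rgb_url`, and the injected `fetch_metadata` callable are hypothetical.

```python
import json
from urllib.parse import urlencode


def percentile_range(metadata, low, high):
    """Pick the [low, high] percentile values from one band's metadata.

    Assumption: metadata["percentiles"] lists the 1st..99th percentiles
    in order, so the p-th percentile sits at index p - 1.
    """
    pcts = metadata["percentiles"]
    return [pcts[low - 1], pcts[high - 1]]


def build_rgb_url(base_url, shared_keys, bands, fetch_metadata, low=2, high=98):
    """Build an /rgb tile URL template, with one metadata lookup per band.

    fetch_metadata is any callable that takes a URL and returns the parsed
    JSON body (for example a thin wrapper around an HTTP client).
    """
    key_path = "/".join(shared_keys)
    params = {}
    for channel, band in zip("rgb", bands):
        # One round trip per band: this is the cost discussed in the thread.
        meta = fetch_metadata(f"{base_url}/metadata/{key_path}/{band}")
        params[channel] = band
        params[f"{channel}_range"] = json.dumps(
            percentile_range(meta, low, high), separators=(",", ":")
        )
    query = urlencode(params, safe="[],")
    return f"{base_url}/rgb/{key_path}/{{z}}/{{x}}/{{y}}.png?{query}"
```

Injecting `fetch_metadata` keeps the URL logic separate from the HTTP client, so the three lookups per asset are easy to cache or batch.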

@dionhaefner
Collaborator

Right now I'm handling this by calling the /metadata endpoint per band and fetching the percentiles, which I later add as the *_range parameters.

This is the correct solution.

I can see how firing 3 requests for this may seem wasteful, but it's important to keep in mind that serving up tiles fires hundreds of requests within seconds. I'd rather not complicate the API by introducing additional parameters.

I'm going to close this for now, but I might be swayed in the future by additional evidence that this is a major pain point for people.

@dionhaefner closed this as not planned on Oct 5, 2023
@j08lue
Collaborator

j08lue commented Oct 5, 2023

This is the correct solution.

Agree - you will probably want all rgb requests to use the same percentiles, too. As XYZ/WMTS works, tile requests are independent from each other and the API is stateless. Also, each request ideally only touches a small portion of the data, for efficiency (e.g. with cloud-optimized GeoTIFF). With this setup, the common information (percentiles) has to come from the client that makes all the requests. Would be different for WMS, where the whole domain is rendered in one call.

@atanas-balevsky
Contributor Author

Dear @dionhaefner and @j08lue,
I can see where you stand on the rgb params. The thing is that I'm working with a large list of images, where I have to prepare all the Terracotta URLs in advance; the user can then preview the assets if they want (one, another, or none).

Calling /metadata 3x per asset becomes quite heavy on the user path, and I'd rather not spend the extra time. With this approach I have to flood the metadata endpoint without an actual need for it.

I can tolerate some delay while fetching the tiles, but I'd prefer to generate these URLs without third-party calls.

Do you think it's feature creep to extend the band/range functionality in this or any other way?

thanks in advance

@j08lue
Collaborator

j08lue commented Oct 5, 2023

We did just add functionality to query metadata for many datasets in bulk. Would that help?

@atanas-balevsky
Contributor Author

Dear @j08lue
Thanks for proposing this approach. As I understand it, it can simplify the solution network-wise, but I don't see how it would help me get out of the current 'prepare the URL' situation. I'll work with what I've got.

thanks

@mrpgraae
Collaborator

mrpgraae commented Oct 6, 2023

This is the correct solution.

Agree - you will probably want all rgb requests to use the same percentiles, too. As XYZ/WMTS works, tile requests are independent from each other and the API is stateless. Also, each request ideally only touches a small portion of the data, for efficiency (e.g. with cloud-optimized GeoTIFF). With this setup, the common information (percentiles) has to come from the client that makes all the requests. Would be different for WMS, where the whole domain is rendered in one call.

@j08lue I'm not sure I understand what you mean here. Terracotta already has the percentiles pre-computed in the metadata table, so the information is already coming from Terracotta, not the client. It's just that right now the information has to be extracted from Terracotta with one HTTP call to /metadata, then passed back to Terracotta in another HTTP call to /rgb, which seems silly when the information is already there.

It seems that you assume the percentiles would be computed dynamically, for just 1 tile? That's not the case. The dataset-wide percentiles are already computed and available in the database.

@mrpgraae
Collaborator

mrpgraae commented Oct 6, 2023

Something like a 2%/98% stretch seems to be an obvious use-case for the metadata, so to me, it seems that it would be a simplification of the API to allow it to use this information rather than forcing the user to do several API calls.
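
To make the 2%/98% stretch concrete, here is a small illustrative sketch of a linear stretch driven by dataset-wide percentiles. It assumes the stored percentiles are a list of the 1st to 99th percentiles in order; the function names are made up for this example, and Terracotta's own scaling may differ in details such as how nodata is encoded.

```python
def stretch_to_uint8(values, low, high):
    """Linearly map [low, high] onto [0, 255], clipping values outside it."""
    if high <= low:
        raise ValueError("high must be greater than low")
    scale = 255.0 / (high - low)
    # Clip first, then shift and scale, then round to the nearest integer.
    return [round((min(max(v, low), high) - low) * scale) for v in values]


def percentile_stretch(values, percentiles, low_pct=2, high_pct=98):
    """Stretch using dataset-wide percentiles (index p - 1 = p-th percentile)."""
    low = percentiles[low_pct - 1]
    high = percentiles[high_pct - 1]
    return stretch_to_uint8(values, low, high)
```

Because `low` and `high` come from dataset-wide statistics rather than from the tile being rendered, every tile of a dataset gets the same mapping.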

Terracotta is built around a database, a design choice with some downsides, but it also theoretically enables Terracotta to do a lot of cool stuff because it can actually know things about the datasets besides what is available in the file format. Taking advantage of the database to do things like this could be a way to set Terracotta apart from other tile servers like TiTiler.

@j08lue
Collaborator

j08lue commented Oct 6, 2023

Ah, you are right, @mrpgraae, we already store the global stats in the db. 🤦 Sorry @atanas-balevsky I got that wrong.

That makes percentile-based stretch ranges a lot more interesting as a feature, of course. You could think of pretty easy ways to encode that in the URL parameters, like

r_range=[p25,p75]

and maybe

r_range=[-s1,s1]

for standard deviation.

We could still keep it simple - no need to go custom script crazy like Sentinel Hub. 😉

But it would introduce a new internal dependency on the database and its contents, I guess.
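
A rough sketch of how such range tokens might be resolved against stored metadata is shown below. The `pNN` and `sN` syntax is only the proposal from this comment, not an existing Terracotta feature, and the sketch assumes metadata with `percentiles` (1st to 99th, in order), `mean`, and `stdev` fields. `resolve_bound` and `resolve_range` are hypothetical names.

```python
import re

# Hypothetical token grammar: "p25" (percentile), "s1" / "-s1" (mean ± N·stdev).
_TOKEN = re.compile(r"^(?P<sign>-?)(?P<kind>[ps])(?P<num>\d+)$")


def resolve_bound(token, metadata):
    """Turn one bound into a number: a literal, a percentile, or a stdev offset."""
    token = token.strip()
    try:
        return float(token)  # plain numbers keep working as before
    except ValueError:
        pass
    match = _TOKEN.match(token)
    if match is None:
        raise ValueError(f"unrecognized range bound: {token!r}")
    num = int(match["num"])
    if match["kind"] == "p":
        if match["sign"] or not 1 <= num <= 99:
            raise ValueError(f"percentile must be p1..p99, got {token!r}")
        return float(metadata["percentiles"][num - 1])
    offset = num * metadata["stdev"]
    return metadata["mean"] - offset if match["sign"] else metadata["mean"] + offset


def resolve_range(raw, metadata):
    """Resolve a bracketed pair such as "[p25,p75]" or "[-s1,s1]"."""
    raw = raw.strip()
    if not (raw.startswith("[") and raw.endswith("]")):
        raise ValueError("range must look like [low,high]")
    parts = raw[1:-1].split(",")
    if len(parts) != 2:
        raise ValueError("range must have exactly two bounds")
    return tuple(resolve_bound(part, metadata) for part in parts)
```

Numeric bounds pass through unchanged, so existing URLs like `r_range=[0.572,1.948]` would stay valid under this scheme.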

@mrpgraae
Collaborator

mrpgraae commented Oct 6, 2023

@j08lue Race condition 😄 you posted at the same time as me, but

That would introduce a new internal dependency on the database and its contents, I guess.

seems to go well with my last paragraph.

@atanas-balevsky
Contributor Author

r_range=[p25,p75]

Frankly that's all i'm hoping for :)

Sorry for not being clear enough that /metadata would be fetched only to send the percentile values back to /rgb as the *_range parameters.

@dionhaefner
Collaborator

Is this:

GET /datasets
# build urls
GET /rgb?r_range=[p25,p75]

really so much simpler than this:

GET /datasets
POST /metadata
# build urls
GET /rgb?r_range=[0.572,1.948]

that it requires dedicated syntax?

@atanas-balevsky
Contributor Author

Dear @dionhaefner
Currently our system knows which datasets exist without a round trip to Terracotta, so it would be perfect if generating the Terracotta /rgb URLs didn't depend on external calls. In this exact use case, the pattern would be:

1) build the /rgb urls internally
2) ... later, and optionally, only on some of the generated urls:
GET /rgb?r_range=[p25,p75]

whereas now it is:

3x POST /metadata internally for each dataset
# build urls (only get the percentile values for each band)
GET /rgb?r_range=[0.572,1.948]...

I hope you can see how this can make the /rgb endpoint more convenient to use.
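
As an illustration of the first pattern, URL generation would need no network access at all if /rgb accepted percentile tokens. The sketch below assumes the `[pNN,pNN]` syntax proposed earlier in this thread, which Terracotta does not currently support, and that the last dataset key is the band; `build_percentile_rgb_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_percentile_rgb_url(base_url, shared_keys, bands, low=2, high=98):
    """Build an /rgb tile URL template offline, using hypothetical pNN tokens."""
    token = f"[p{low},p{high}]"  # proposed syntax, not an existing feature
    params = {}
    for channel, band in zip("rgb", bands):
        params[channel] = band
        params[f"{channel}_range"] = token
    key_path = "/".join(shared_keys)
    query = urlencode(params, safe="[],")
    return f"{base_url}/rgb/{key_path}/{{z}}/{{x}}/{{y}}.png?{query}"
```

The server would resolve the tokens from its metadata table only for the URLs that actually get requested.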

P.S. I'll repeat this in case it got lost in the convo: I'd propose a PR if you'd welcome the effort.

thanks
atanas
