Deferred execution #1

Open
wants to merge 19 commits into
base: master


lopezvoliver

Hi, I worked on a major, but purely technical, update to the geeSEBAL Python API. The revision changes only the way geeSEBAL communicates with the Earth Engine API; nothing was changed in the SEBAL algorithm itself. The goal was to defer the GEE processing as much as possible (e.g., by getting rid of .getInfo() calls). The improvement in runtime far outweighs some minor breaks in compatibility with the current version.

Here's the breakdown of the changes:

tools.fexp_sensible_heat_flux:

The iterative process was updated so that it leverages ee.ImageCollection.iterate. By doing this, and removing any .getInfo() calls, this function can be fully asynchronous (defers the execution until requested).

Additionally, a max_iterations parameter (default: 15) was added.
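As a rough local analogy for the iterate-based loop described above, the sketch below threads an accumulator through a sequence of known length using functools.reduce. It is not the geeSEBAL code: refine is a made-up stand-in (a Newton step for a square root) for one pass of the sensible heat flux correction, and iterate_fixed is a hypothetical name. One difference to keep in mind: reduce passes the accumulator first, whereas the callback given to ee.ImageCollection.iterate receives the current element first and the previous result second.

```python
from functools import reduce


def refine(estimate, _step, target=2.0):
    """One update of a fixed-point scheme (Newton step for sqrt(target)).

    Stand-in only: the real function would update the heat flux terms.
    """
    return 0.5 * (estimate + target / estimate)


def iterate_fixed(first, max_iterations=15):
    # The number of passes is fixed up front by the length of the sequence,
    # so no client-side convergence check (and no getInfo-style round trip)
    # is needed between passes.
    return reduce(refine, range(max_iterations), first)


result = iterate_fixed(1.0)
print(result)
```

Because the pass count is set before anything runs, the whole loop can be described once and evaluated later, which is presumably what makes a max_iterations parameter necessary in the deferred version.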

image.py

Defined a new function sebal that constitutes the SEBAL algorithm to be applied over one ee.Image, assuming all the necessary inputs are included as bands within the image.

Image class

Revised the code so that it builds the ee.Image with all the Landsat inputs (including T_RAD) and then calls the sebal function.

Note that because I also removed the calls to .getInfo(), most items in the Image object are now deferred and thus return ee objects. For example, Image.landsat_version now returns an ee.String. The user may call .getInfo() on it to get the result when needed. This is a break in compatibility, but it is justified because the improvement in runtime far outweighs the disadvantage.
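To illustrate what "deferred" means here, the following toy class mimics the pattern in plain Python. It is only an analogy and not part of geeSEBAL or the earthengine-api: Deferred, its map method, and the "landsat_8" value are all made up for the example. The only point carried over from the real library is that building and chaining objects is cheap, and work happens when getInfo() is called.

```python
class Deferred:
    """Toy stand-in for a server-side object: holds a recipe, not a value."""

    evaluations = 0  # number of getInfo() calls made so far

    def __init__(self, recipe):
        self._recipe = recipe

    def map(self, fn):
        # Chaining only wraps the recipe; nothing is computed here.
        return Deferred(lambda: fn(self._recipe()))

    def getInfo(self):
        # The only place where the recipe is actually run.
        Deferred.evaluations += 1
        return self._recipe()


version = Deferred(lambda: "landsat_8").map(str.upper)
built_without_running = Deferred.evaluations == 0

value = version.getInfo()  # evaluation is requested explicitly
print(built_without_running, value)
```

Code written against the previous version that expected a plain Python value from such an attribute would need one explicit getInfo() call at the point where the value is really needed.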

Runtime improvement for Image

Here's the comparison using timeit on a simple instance of Image:

%timeit foo=Image("LANDSAT/LC08/C01/T1_SR/LC08_221071_20190714")

serveronly branch:

25.2 ms ± 208 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

master branch:

12.7 s ± 1.97 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

That is, about 500 times faster.
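For reference, the ratio can be reproduced from the two timeit means reported above (the helper name speedup is just for this example):

```python
def speedup(slow_seconds, fast_seconds):
    """Ratio of two mean runtimes given in seconds."""
    return slow_seconds / fast_seconds


# master: 12.7 s per loop; serveronly: 25.2 ms per loop
image_speedup = speedup(12.7, 25.2e-3)
print(round(image_speedup))
```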

When used in a simple script that starts an Export task for a single ee.Image, the improvement is not as impressive (11 s vs. 20 s), because of the overhead of initializing the ee library and creating the Export task. However, as you will see below, the improvement matters much more for Collection and TimeSeries.

Comparison

The LANDSAT/LC08/C01/T1_SR/LC08_221071_20190714 image was exported using the master and serveronly branches (only the R, GR, B, NDVI, and ET_24h bands were exported). They are publicly available here:

This gee code snapshot was prepared to compare the results, which are identical.

landsatcollection.py

Additions

set_landsat_index: This simple function is necessary to keep the original index from a Landsat image, when collections are merged or joined.

fexp_trad_8, fexp_trad_7, and fexp_trad_5: These new functions return an ee.ImageCollection where each image has the corresponding T_RAD band.

fexp_collection_filter: This new function filters a given ee.ImageCollection by a user-defined cloud cover threshold and by start and end dates, and optionally by path, row, and an ee.Geometry (e.g., a coordinate).
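The filtering logic described for fexp_collection_filter can be pictured with plain Python records. This is a sketch under assumptions, not the actual implementation: the function name collection_filter, its signature, the strict "less than" cloud threshold, and the end-exclusive date range are guesses, the scene records are invented, and the optional geometry filter is left out. CLOUD_COVER, WRS_PATH, and WRS_ROW are the usual Landsat metadata property names.

```python
from datetime import date


def collection_filter(scenes, start, end, cloud_cover, path=None, row=None):
    """Keep scenes dated in [start, end) with cloud cover below the threshold.

    path and row are only applied when they are given.
    """
    kept = []
    for scene in scenes:
        if not (start <= scene["date"] < end):
            continue
        if scene["CLOUD_COVER"] >= cloud_cover:
            continue
        if path is not None and scene["WRS_PATH"] != path:
            continue
        if row is not None and scene["WRS_ROW"] != row:
            continue
        kept.append(scene)
    return kept


# Invented metadata records, for illustration only.
scenes = [
    {"id": "a", "date": date(2019, 7, 14), "CLOUD_COVER": 3.0, "WRS_PATH": 221, "WRS_ROW": 71},
    {"id": "b", "date": date(2019, 7, 30), "CLOUD_COVER": 40.0, "WRS_PATH": 221, "WRS_ROW": 71},
    {"id": "c", "date": date(2019, 7, 20), "CLOUD_COVER": 1.0, "WRS_PATH": 220, "WRS_ROW": 71},
]

kept = collection_filter(scenes, date(2019, 7, 1), date(2019, 8, 1), 15, path=221, row=71)
kept_ids = [scene["id"] for scene in kept]
print(kept_ids)
```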

Changes

For each N in {5, 7, 8}, the fexp_landsat_NPathRow and fexp_landsat_NCoordinate functions were replaced by a single fexp_landsat_N function. These functions return the corresponding C01/T1_SR collection, filtered using fexp_collection_filter and with the bands renamed as in the original code.

Collection class

The init method for this class was modified to leverage the fexp_landsat_N and fexp_trad_N functions, which are then joined into a single collection using ee.Join.inner. Furthermore, the python for loop was replaced by ee.ImageCollection.map, making use of the image.sebal function.

For compatibility, the Collection_ET item is given as an ee.Image (the ET_24h collection is cast into ee.Image as bands).
As was the case with the Image class, it was inevitable to break compatibility with some items in the Collection object.
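The join-then-map structure described for the Collection class can be sketched locally as follows. This is an analogy in plain Python, not the PR's code: inner_join and sebal_stub are hypothetical helpers, the records and band values are invented, and using LANDSAT_INDEX as the join key is an assumption based on the set_landsat_index description. In Earth Engine, ee.Join.inner yields primary/secondary pairs that still have to be combined into one image; the dictionary merge below stands in for that step.

```python
def inner_join(primary, secondary, key="LANDSAT_INDEX"):
    """Pair records sharing the same key; unmatched records are dropped."""
    lookup = {record[key]: record for record in secondary}
    return [
        {**record, **lookup[record[key]]}
        for record in primary
        if record[key] in lookup
    ]


def sebal_stub(image):
    # Placeholder for image.sebal: only marks the record as processed.
    return {**image, "processed": True}


# Invented records, for illustration only.
landsat = [
    {"LANDSAT_INDEX": "scene_1", "NDVI": 0.4},
    {"LANDSAT_INDEX": "scene_2", "NDVI": 0.5},
]
trad = [{"LANDSAT_INDEX": "scene_1", "T_RAD": 300.0}]

joined = inner_join(landsat, trad)
results = list(map(sebal_stub, joined))
print(results)
```

The sketch also suggests why preserving the original index matters: without a stable key on both sides, the reflectance and T_RAD images could not be paired after the collections are filtered or merged.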

Runtime improvement for Collection

The following short test (3 images only) was used:

%timeit f=Collection(2019,7,1,2019,8,1,15,path=221,row=71)

serveronly branch:

87.4 ms ± 4.67 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

master branch:

32.1 s ± 5.02 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

That is, about 367 times faster. Moreover, the time to generate a longer collection barely increases for the serveronly branch:

%timeit f=Collection(2000,1,1,2010,5,6,15,path=221,row=71)
87.1 ms ± 527 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

whereas on the master branch it does increase. For comparison, three images (only the ET_24h band) were exported using the master and serveronly branches. They are available in these image collections:

TimeSeriesAsync

The TimeSeries class was largely untouched (except for minor adjustments to use the fexp_landsat_N functions). The reason is that I feel the user would expect this class to simply get the time series on demand. However, a new TimeSeriesAsync class was defined.

This new class makes use of the Collection class at a given point, then selects the ET_24h band and performs reduceRegion on it. The et_collection item contains the result of this operation, while the Collection item contains the result of the call to the Collection class.

Additionally, three lists were defined that somewhat mimic the behavior of the TimeSeries class. However, these are returned as ee.Lists, so the user has the option to use .getInfo() on them:

  • List_ET is the result of et_collection.aggregate_array("ET_24h")
  • List_Date is the result of et_collection.aggregate_array("date")
  • List_index is the result of et_collection.aggregate_array("LANDSAT_INDEX")

Finally, two methods were prepared to export the ET table (date, LANDSAT_INDEX, ET_24h columns) as a CSV file. This should be the recommended way to export the table.

  • toDrive
  • toCloudStorage

The following example demonstrates the use of TimeSeriesAsync:

import ee
from etbrasil.geesebal import TimeSeriesAsync
ee.Initialize()
point=ee.Geometry.Point([-47.4522, -16.240119])
geesebal_timeseries=TimeSeriesAsync(2000,1,1,2010,5,6,15,coordinate=point)
geesebal_timeseries.toDrive("sebal-time-series-async")  

This generates a sebal-time-series-async.csv file in Google Drive. The total process (python runtime + earthengine task) took about 2 minutes.

The following example generates the same csv file but synchronously (note the getInfo()s):

import pandas as pd
import ee
from etbrasil.geesebal import TimeSeriesAsync
ee.Initialize()
point=ee.Geometry.Point([-47.4522, -16.240119])
geesebal_timeseries=TimeSeriesAsync(2000,1,1,2010,5,6,15,coordinate=point)

et_list = geesebal_timeseries.List_ET.getInfo()       
date_list = geesebal_timeseries.List_Date.getInfo()  
landsat_index_list = geesebal_timeseries.List_index.getInfo() 

pd.DataFrame({
    "date": date_list,
    "LANDSAT_INDEX": landsat_index_list,
    "ET_24h": et_list
}).to_csv("sebal-time-series-sync.csv", index=False)

This example took about 3 minutes to run, and the resulting csv file was identical to the previous one.

However, the preferred method should be the asynchronous one, especially for long collections, as described here ("Too many concurrent aggregations" error).

As was the case for Collection, the TimeSeriesAsync runtime is fast and does not depend on the image collection size. Here is my result using timeit:

90.3 ms ± 431 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Meanwhile, the current version of geesebal took about 25 minutes ⚠️ on this rather short test (19 images):

import ee
from etbrasil.geesebal import TimeSeries
ee.Initialize()
point=ee.Geometry.Point([-50.161317, -9.824870])
geeSEBAL_Collection=TimeSeries(2019,1,1,2019,12,31,15,point)

That is all for now. I hope I haven't left any of my changes undescribed and that my explanations were clear.

Cheers,

Oliver.
