Deferred execution #1
Open: lopezvoliver wants to merge 19 commits into gee-hydro:master from lopezvoliver:serveronly
This makes it possible to install geesebal using pip: "git+https://github.com/lopezvoliver/geeSEBAL@serveronly#subdirectory=etbrasil"
Hi, I worked on a major, but purely technical, update to the geeSEBAL Python API. That is, the revision changes only the way in which geeSEBAL communicates with the Earth Engine API; nothing was changed in the SEBAL algorithm itself. The goal was to defer the GEE processing as much as possible (e.g. by getting rid of `.getInfo()` calls). The improvement in runtime far outweighs some minor breaks in compatibility with the current version.

Here's the breakdown of the changes:
`tools.fexp_sensible_heat_flux`: The iterative process was updated so that it leverages `ee.ImageCollection.iterate`. By doing this, and removing all `.getInfo()` calls, this function can be fully asynchronous (execution is deferred until a result is requested). Additionally, a `max_iterations` parameter (defaults to 15) was added.

image.py
Defined a new function `sebal` that constitutes the SEBAL algorithm to be applied over one `ee.Image`, assuming all the necessary inputs are included as bands within the image.

Image class

Revised the code so that it builds the `ee.Image` with all the Landsat inputs (including `T_RAD`) and then calls the `sebal` function.

Note that because I also removed the calls to `.getInfo()`, most items in the `Image` object are now deferred and thus return ee objects. For example, `Image.landsat_version` now returns an `ee.String`. The user may call `.getInfo()` to get the result when needed. This is a break in compatibility, but a justified one, as the improvement in runtime far outweighs this disadvantage.

Runtime improvement for Image
Here's the comparison using timeit on a simple instance of Image:
serveronly branch:
master branch:
i.e., 500 times faster.
When used in a simple script that starts an Export task for a single ee.Image, the improvement is less impressive (11 s vs. 20 s) because of the overhead of initializing the ee library and creating the Export task. However, as you will see below, the difference matters much more for Collection and TimeSeries.
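Because items such as `Image.landsat_version` are now deferred, code written against the master branch may receive an ee object where it used to receive a plain value. The following is a minimal sketch of a compatibility shim, assuming only that deferred objects expose a `.getInfo()` method, as Earth Engine objects do. The `resolve` helper and the `FakeDeferred` stand-in are hypothetical and are not part of geeSEBAL; the payload string is made up.

```python
def resolve(value):
    """Return a plain Python value, calling .getInfo() only if it exists."""
    get_info = getattr(value, "getInfo", None)
    return get_info() if callable(get_info) else value


class FakeDeferred:
    """Hypothetical stand-in for a deferred ee object (no server call is made)."""

    def __init__(self, payload):
        self._payload = payload
        self.calls = 0  # counts simulated round trips

    def getInfo(self):
        self.calls += 1
        return self._payload


# Made-up payload; a real deferred item would come from the Image object.
deferred_item = FakeDeferred("example-value")  # serveronly-style: deferred
plain_item = "example-value"                   # master-style: already resolved

print(resolve(deferred_item))
print(resolve(plain_item))
```

Each `.getInfo()` call is a blocking round trip to the server, so a shim like this is best applied once, at the point where the value is actually needed.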
Comparison
The `LANDSAT/LC08/C01/T1_SR/LC08_221071_20190714` image was exported using the master and serveronly branches (only the R, GR, B, NDVI, and ET_24h bands were exported). They are publicly available here:

This gee code snapshot was prepared to compare the results, which are identical.
landsatcollection.py
Additions
`set_landsat_index`: This simple function is necessary to keep the original index from a Landsat image when collections are merged or joined.
`fexp_trad_8`, `fexp_trad_7`, and `fexp_trad_5`: These new functions return an `ee.ImageCollection` where each image has the corresponding `T_RAD` band.
`fexp_collection_filter`: This new function filters a given `ee.ImageCollection` by a user-defined cloud cover threshold and start and end dates, and optionally by path, row, and an `ee.Geometry` (e.g. a coordinate).

Changes
All `fexp_landsat_NPathRow` and `fexp_landsat_NCoordinate` functions (where N is one of {5, 7, 8}) were replaced by a single `fexp_landsat_N` function. These functions return the corresponding `C01/T1_SR` collection filtered using `fexp_collection_filter`, with the bands renamed as in the original code.

Collection class

The init method for this class was modified to leverage the `fexp_landsat_N` and `fexp_trad_N` functions, whose results are then joined into a single collection using `ee.Join.inner`. Furthermore, the Python for loop was replaced by `ee.ImageCollection.map`, making use of the `image.sebal` function.

For compatibility, the `Collection_ET` item is given as an `ee.Image` (the `ET_24h` collection is cast into an `ee.Image` as bands).

As was the case with the `Image` class, it was inevitable to break compatibility with some items in the `Collection` object.

Runtime improvement for Collection
The following short test (3 images only) was used:
serveronly branch:
master branch:
That is, 367 times faster. However, the time to generate a longer collection barely increases for the `serveronly` branch:

while for the master branch it does. For comparison, three images (only the ET_24h band) were exported using the master and serveronly branches. They are available in these image collections:
TimeseriesAsync
The `TimeSeries` class was largely untouched (except for minor adjustments to use the `fexp_landsat_N` functions). The reason is that I feel the user would expect this class to simply get the time series on demand. However, a new `TimeSeriesAsync` class was defined.

This new class makes use of the `Collection` class at a given point, then selects the `ET_24h` band and performs `reduceRegion` on it. The `et_collection` item contains the result of this operation, while the `Collection` item contains the result of the call to the `Collection` class.

Additionally, three lists were defined that somewhat mimic the behavior of the `TimeSeries` class. However, these are returned as `ee.List` objects, so the user has the option to use `.getInfo()` on them:
`List_ET` is the result of `et_collection.aggregate_array("ET_24h")`
`List_Date` is the result of `et_collection.aggregate_array("date")`
`List_index` is the result of `et_collection.aggregate_array("LANDSAT_INDEX")`
Finally, two methods were prepared to export the ET table (date, LANDSAT_INDEX, ET_24h columns) as a CSV file. This should be the recommended way to export the table.
toDrive
toCloudStorage
The following example demonstrates the use of `TimeSeriesAsync`:

This generates a `sebal-time-series-async.csv` file in Google Drive. The total process (Python runtime + Earth Engine task) took about 2 minutes.

The following example generates the same CSV file but synchronously (note the `getInfo()` calls):

This example took about 3 minutes to run, and the resulting CSV file was identical to the previous one.
However, the preferred method should be the asynchronous one, especially for long collections, as described here ("Too many concurrent aggregations" error).
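For short series where the synchronous route is acceptable, the three lists can be resolved and written out locally. The following is a minimal sketch, assuming the lists have already been turned into plain Python lists (for example by calling `.getInfo()` on `List_Date`, `List_index`, and `List_ET`). `write_et_csv` is a hypothetical helper, not part of geeSEBAL and not the example originally posted here, and the sample values are made up.

```python
import csv
import io


def write_et_csv(stream, dates, indices, et_values):
    """Write date, LANDSAT_INDEX, ET_24h rows to a text stream; return the row count."""
    if not (len(dates) == len(indices) == len(et_values)):
        raise ValueError("all three lists must have the same length")
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(["date", "LANDSAT_INDEX", "ET_24h"])  # column names from the ET table
    rows = list(zip(dates, indices, et_values))
    writer.writerows(rows)
    return len(rows)


# Made-up sample values standing in for the resolved lists.
buffer = io.StringIO()
row_count = write_et_csv(buffer, ["2019-07-14"], ["LC08_221071_20190714"], [3.2])
print(row_count)
print(buffer.getvalue())
```

Writing to a stream rather than a path keeps the helper easy to test; passing an open file handle (opened with `newline=""`) would write the same rows to disk.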
As was the case for `Collection`, the `TimeSeriesAsync` runtime is fast and does not depend on the image collection size. Here is my result using `timeit`:

Meanwhile, the current version of geesebal took about 25 minutes ⚠️ on this rather short test (19 images):
That is all for now. I hope I haven't left any of my changes undescribed and that my explanations were clear.
Cheers,
Oliver.