#213 added the Backend class `CachingBackend` and adjusted `JDBCBackend` to use it.

1. Caching is always on in `JDBCBackend`.
2. When an item is loaded with filters, e.g. `scen.par('demand', filters=dict(year=[1234]))`, then:
   a. only the filtered rows are retrieved from Java/ixmp_source,
   b. the pd.DataFrame containing these rows is cached, and
   c. the cache is used for future `scen.par()` calls with identical filters.
3. When an item is loaded with different filters, a different pd.DataFrame is cached.
   a. For instance, `scen.par('demand', filters=dict(year=[1234, 4321]))` results in a distinct value in the cache.
4. When an item is loaded without filters, the entire item is cached.
5. If an entire item is cached, subsequent filtered requests (like 2. or 3.) are met by:
   a. taking the cached pd.DataFrame for the entire item,
   b. filtering it in Python, and
   c. returning it,
   …i.e. without database access (a minimal sketch of this behaviour follows the list).
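A minimal sketch of how such filter-keyed caching with Python-side filtering can work; the names here (`SimpleCache`, `fetch`, `get_par`) are illustrative stand-ins, not the actual `CachingBackend` API:

```python
def _key(name, filters):
    # Hashable cache key: item name plus a frozen view of the filters dict
    frozen = tuple(sorted((k, tuple(v)) for k, v in (filters or {}).items()))
    return (name, frozen)


class SimpleCache:
    """Illustrative stand-in for the behaviour described in 1.–5. above."""

    def __init__(self, fetch):
        # fetch(name, filters) -> pd.DataFrame, e.g. a call into Java/ixmp_source
        self._fetch = fetch
        self._cache = {}

    def get_par(self, name, filters=None):
        key = _key(name, filters)
        if key in self._cache:
            # Identical filters seen before → reuse the cached frame (2.c)
            return self._cache[key]
        full = self._cache.get(_key(name, None))
        if full is not None and filters:
            # Entire item already cached (4.) → filter in Python, no DB access (5.)
            df = full
            for col, values in filters.items():
                df = df[df[col].isin(values)]
            return df
        # Otherwise retrieve (only the matching rows) from the backend and cache (2.a–b, 3.)
        df = self._fetch(name, filters)
        self._cache[key] = df
        return df
```

With this shape, the cache holds one entry per distinct (item, filters) pair, which is why the two `scen.par('demand', …)` calls above occupy separate cache slots.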
A different caching behaviour could be called *greedy* or *prefetch* caching. In this case, to meet 2. or 3., the entire contents (all rows) are first cached (as in 4.) and then filtered on the Python side (as in 5.a–c).
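A sketch of that prefetch variant, under the same illustrative assumptions as above (a plain dict cache and a `fetch` callable standing in for the backend call):

```python
def get_par_prefetch(cache, fetch, name, filters=None):
    # Prefetch/greedy variant: always fetch and cache the *entire* item first
    # (as in 4.), then answer any filtered request on the Python side (5.a–c).
    # `cache` is a plain dict and fetch(name) returns the full pd.DataFrame;
    # both are illustrative stand-ins, not ixmp API.
    if name not in cache:
        cache[name] = fetch(name)  # one full retrieval, regardless of filters
    df = cache[name]
    for col, values in (filters or {}).items():
        df = df[df[col].isin(values)]
    return df
```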
This issue is to discuss:

- What should the default behaviour be, i.e. prefetch on or off?
- How should the behaviour be controlled?
  - E.g. a `prefetch=True` keyword argument to Platform, passed to `JDBCBackend` (a hypothetical usage is sketched after this list).
  - Or, a method or attribute on `JDBCBackend` to change the behaviour.
- Consider different use cases:
  - Alice is working with a small set of changes to a MESSAGE parameter like `land_out` that is very large. If prefetch is on, her code loads the entire item, making it slower than it could be.
  - Bob is working with an extensive script that performs adjustments to the entire parameter; his code works faster if the whole item is prefetched and then repeatedly filtered in Python.
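For concreteness, a hypothetical usage of the two options above; neither the `prefetch` keyword argument nor the backend attribute exists — they are exactly what this issue asks about:

```python
import ixmp

# (a) Hypothetical keyword argument at Platform creation, forwarded to
#     JDBCBackend; 'prefetch' is the proposal under discussion, not a real argument.
mp = ixmp.Platform(prefetch=True)

# (b) Hypothetical attribute (or method) on the backend to toggle the
#     behaviour mid-session; the name 'prefetch' is likewise illustrative.
mp._backend.prefetch = False
```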