Poor performance of iris with latest dask #4736
@bjlittle explained to me last week that constraints during FF/PP
As for Dask: for me this is just more support for #4572 and similar. Dask's default chunk handling changes frequently, AFAICT based on the assumption that users can flexibly adapt the chunking in their scripts. As we know, Iris currently buries chunking control inside the loading automation, so each Dask improvement risks becoming a 'breaking performance change'.
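Until Iris exposes that control at load time, a possible user-level workaround is to steer Dask's default chunk size or rechunk the lazy array yourself. This is only a sketch with a made-up file name; `array.chunk-size` is a standard Dask config key and `lazy_data()` / `copy(data=...)` / `has_lazy_data()` are public Iris API, but whether this actually helps any particular file is untested here:

```python
import dask.config
import iris

# Illustrative only: cap Dask's default target chunk size before loading.
dask.config.set({"array.chunk-size": "64MiB"})

cube = iris.load_cube("example.nc")  # hypothetical file

# Alternatively, rechunk the lazy array and wrap it in a new cube; this keeps
# the data lazy (assigning to cube.data would realise it instead).
lazy = cube.lazy_data().rechunk("auto")
rechunked = cube.copy(data=lazy)
print(rechunked.has_lazy_data())  # expected: True
```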
That PP load&extract gets a lot quicker if you simplify the latitude constraint:

```python
lat_point = c.coord('latitude').cell(45)
lat_const = iris.Constraint(latitude=lat_point.point)
```

Possibly because of Lines 308 to 315 in caae450.
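Reading that suggestion, the "simplification" seems to be passing the Cell's scalar `.point` to the constraint instead of the Cell itself. A minimal sketch of the two forms, assuming a hypothetical PP file and using `extract` purely to show the constraints (the report above applies them at load time):

```python
import iris

# Hypothetical file; `c` stands in for the cube from the user's script.
c = iris.load_cube("example.pp")

lat_point = c.coord('latitude').cell(45)

# Constraining on the whole Cell compares point and bounds for every candidate cell.
cell_const = iris.Constraint(latitude=lat_point)

# Simplified form reported to be much faster: compare on the scalar point only.
point_const = iris.Constraint(latitude=lat_point.point)

slow = c.extract(cell_const)
fast = c.extract(point_const)
```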
@jamesp where are you seeing this? When I run your code and then ask if the cube is lazy, it claims to be.
I'm seeing it in the recorded memory footprint. From our recent dash on MeshCoords (#4749) I understand a bit more about how the real/lazy mechanics of Iris work: it seems very possible that I could realise some/all of the data in a cube and yet it would still claim to be lazy.
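For reference, a minimal sketch of checking both things side by side - the laziness flag and the traced memory - with the file name and the collapse purely illustrative. `has_lazy_data()` only reports whether a cube's own array has been realised; it says nothing about memory used while Dask evaluates (or partially evaluates) the graph:

```python
import tracemalloc

import iris
import iris.analysis

tracemalloc.start()

cube = iris.load_cube("example.nc")  # hypothetical file
print("lazy after load:", cube.has_lazy_data())

mean = cube.collapsed("longitude", iris.analysis.MEAN)
print("lazy after collapse:", mean.has_lazy_data())

# Peak memory is measured independently of whether the cubes report lazy data,
# so "still lazy" and "low memory" stay separate questions.
current, peak = tracemalloc.get_traced_memory()
print(f"peak traced memory: {peak / 1e6:.1f} MB")
tracemalloc.stop()
```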
PP load with various environments, using the user-provided file that prompted @jamesp's concern:
We also worked out earlier that the netCDF file isn't realising everything; it's just pulling the entire chunk into memory (i.e. with a 2GB file it still pulls in about 200MB, but doesn't pull the full 2GB).
Oh that's fine then, it's only the worst of both worlds 😆
I don't know, I thought it would have to pull things in at least at the chunk level to read them, so it seems kind of alright? (At least it doesn't pull 2GB into memory - I reckon that's a win! :p)
Oh you're right, I misinterpreted this - I'm too ready to hear "poor chunking means the whole file is loaded".
Well yes, of course it must load "some few" chunks at some time.
In practice, I've seen that you can expect about 3× this.
In order to maintain a backlog of relevant issues, we automatically label them as stale after 500 days of inactivity. If this issue is still important to you, then please comment on this issue and the stale label will be removed. Otherwise this issue will be automatically closed in 28 days' time.
🐛 Bug Report
Reading netCDF and PP files in Iris with the latest version of Dask is slower and uses a lot more memory.
I'm not familiar enough with the mechanics of Iris loading to understand whether this is a Dask issue or an Iris issue. I suspect it may be Dask; if so, I'd appreciate any help in pinpointing the specific thing I need to raise a bug report with Dask about.
The attached script creates a 4D cube full of random numbers and saves it to PP and netCDF. It then loads the file and performs the following operations (a rough sketch of both appears after the list):
load&extract: constrain to a single time and latitude when loading, then take the longitude mean
load,extract: take the longitude mean, then constrain to a single time and latitude
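A rough sketch of the two code paths (the attached script is authoritative; the constraint values and file name below are placeholders):

```python
import iris
import iris.analysis

# Placeholder constraints - the real script picks one time and one latitude value.
time_const = iris.Constraint(time=lambda cell: cell.point.hour == 0)
lat_const = iris.Constraint(latitude=45.0)

# load&extract: constrain while loading, then take the longitude mean.
cube = iris.load_cube("example.nc", time_const & lat_const)
result_a = cube.collapsed("longitude", iris.analysis.MEAN)

# load,extract: load everything, take the longitude mean, then extract.
cube = iris.load_cube("example.nc")
result_b = cube.collapsed("longitude", iris.analysis.MEAN).extract(time_const & lat_const)
```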
Both should end up with identical results. Here is the time taken and peak memory usage for Iris 3.2 on my machine:
netCDF file:
PP file:
How To Reproduce
Steps to reproduce the behaviour:
```
mamba create -y -c conda-forge -n iris-3.2-dask-2021.6 iris=3.2 dask=2021.6.0
mamba create -y -c conda-forge -n iris-3.2-dask-2022.2 iris=3.2 dask=2022.2.0
```
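For anyone reproducing this, a minimal sketch of timing one case and recording peak RSS using only the standard library (Unix-only; the file name is a placeholder and the attached script's own measurements take precedence):

```python
import resource
import time

import iris
import iris.analysis

start = time.perf_counter()

cube = iris.load_cube("example.nc")  # or the PP file
result = cube.collapsed("longitude", iris.analysis.MEAN)
_ = result.data  # force the lazy computation so the cost is actually paid

elapsed = time.perf_counter() - start
peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss  # KiB on Linux
print(f"time: {elapsed:.1f} s, peak RSS: {peak_kb / 1024:.0f} MiB")
```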
Expected behaviour
Performance should be better in later versions of Iris and Dask.
Environment
Analysis script