Transforms (or preprocessors) are callables that convert input data into a form that is ready to be consumed by deep learning models. In general, a transform can also have internal states, so that calling it with different data inputs will give consistently processed outputs.

In MONAI, the transform takes the following pattern:

class Transform:

  def __init__(self, system_params):
    # set states using system parameters
    self.some_states = ...  # from system_params

  def __call__(self, input_data, data_params):
    # using self.some_states and data_params
    #   process data and return output_data
    output_data = ...  # computed from input_data
    return output_data

A typical usage of Transform is:

transform = Transform(system_params)  # construct a transform instance
output_data = transform(input_data, data_params)  # apply the transform

Users and developers interact directly with these interfaces.
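As a concrete illustration of the pattern above, here is a minimal sketch built around a hypothetical `ScaleAndShift` transform (it is not part of MONAI). The `scale` argument plays the role of `system_params` and is fixed at construction; `shift` plays the role of `data_params` and is supplied with each input.

```python
import numpy as np


class ScaleAndShift:
    """Hypothetical transform, for illustration only (not a MONAI class)."""

    def __init__(self, scale):
        # system parameter: known before any data is seen
        self.scale = float(scale)

    def __call__(self, input_data, shift=0.0):
        # data parameter `shift` is supplied at call time, per input
        return np.asarray(input_data, dtype=float) * self.scale + shift


transform = ScaleAndShift(scale=2.0)  # construct a transform instance
output_data = transform(np.array([1.0, 2.0, 3.0]), shift=1.0)  # apply it
print(output_data)  # [3. 5. 7.]
```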

With the goal of readability and flexibility, the following sections discuss assumptions and designs of the interfaces.


system_params

System parameters are "static" information that is not data-dependent. They are known and fixed before input_data becomes available.

The parameters are stored as instance variables (self.some_states) in each transform.

In a multi-processing context, the transform's instance variables are not shared among different workers. Once constructed, each worker process operates on its own states.
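The per-worker isolation of states can be sketched in a single process. In the example below, `CountingTransform` is a hypothetical class, and `copy.deepcopy` stands in for the copy of the transform that each worker process would receive (an assumption made to keep the sketch deterministic; real workers are separate processes).

```python
import copy


class CountingTransform:
    """Hypothetical transform that keeps a call counter as internal state."""

    def __init__(self, offset):
        self.offset = offset  # system parameter
        self.num_calls = 0    # internal state, local to this instance

    def __call__(self, input_data):
        self.num_calls += 1
        return input_data + self.offset


original = CountingTransform(offset=10)

# Each simulated worker gets its own independent copy of the transform.
worker_a = copy.deepcopy(original)
worker_b = copy.deepcopy(original)

worker_a(1)
worker_a(2)
worker_b(3)

# The counters diverge because no state is shared between the copies.
print(original.num_calls, worker_a.num_calls, worker_b.num_calls)  # 0 2 1
```

One consequence is that state accumulated inside a worker (such as the counter here) is not visible from the parent process's instance.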

input_data, data_params

MONAI provides both vanilla transforms and dictionary-based transforms.

The rationale is described here. The main differences are in the assumptions on input_data and data_params:

vanilla transform

A vanilla transform should work seamlessly with numpy ndarrays. Its call method has the form:

def __call__(self, input_data, data_params):
  # process input_data

  • input_data: a multi-dimensional array,
  • data_params: additional information from the data, to be used when processing input_data. The data_params are runtime parameters; any static parameters should go into system_params in the transform's constructor.

For example, a vanilla RandRotate90 transform works as follows:

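import numpy as np
# RandRotate90 is assumed to be already imported from MONAI

img = np.array((1, 2, 3, 4)).reshape((1, 2, 2))
rotator = RandRotate90(prob=0.0, max_k=3, axes=(1, 2))
img_result = rotator(img)
print(type(img_result))
print(img_result)

# output (prob=0.0, so the image is returned unrotated):
# <class 'numpy.ndarray'>
# [[[1 2]
#   [3 4]]]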

The vanilla transforms are located in monai/transforms/ in the codebase.

dictionary-based transform

A dictionary-based transform assumes input_data is a dictionary containing ndarrays, and that data_params is also part of the dictionary. The transform's call method therefore has the form:

def __call__(self, input_data):
  # process dict(input_data)

For example, a dictionary-based RandRotate90d transform works as follows:

data = {
    'img': np.array((1, 2, 3, 4)).reshape((1, 2, 2)),
    'seg': np.array((1, 2, 3, 4)).reshape((1, 2, 2)),
    'unused': 5,
}
rotator = RandRotate90d(keys=('img', 'seg'), prob=0.8)
data_result = rotator(data)
print(data_result)

# example output (varies with the random state):
# {'unused': 5, 'img': array([[[4, 3],
#         [2, 1]]]), 'seg': array([[[4, 3],
#         [2, 1]]])}

These transforms are adaptors on top of the vanilla transforms, and they make it easier to compose multiple transforms:

composed = monai.transforms.Compose([Transform1d(system_params), ...])  # "..." stands for further dictionary-based transforms
output_data = composed(input_data)  # input_data is a dictionary
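The reason dictionaries make composition convenient is that every transform takes a dictionary and returns a dictionary, so transforms can be chained without adapting signatures. The sketch below illustrates this idea only. `SimpleCompose` and `AddToKeyd` are hypothetical names, and this is not how `monai.transforms.Compose` is implemented.

```python
import numpy as np


class AddToKeyd:
    """Hypothetical dictionary-based transform: adds `value` to data[key]."""

    def __init__(self, key, value):
        self.key = key
        self.value = value

    def __call__(self, data):
        out = dict(data)  # shallow copy; leave the caller's dict untouched
        out[self.key] = out[self.key] + self.value
        return out


class SimpleCompose:
    """Hypothetical stand-in for a compose utility: applies transforms in order."""

    def __init__(self, transforms):
        self.transforms = list(transforms)

    def __call__(self, data):
        for transform in self.transforms:
            data = transform(data)
        return data


composed = SimpleCompose([AddToKeyd("img", 1), AddToKeyd("img", 10)])
result = composed({"img": np.zeros((1, 2, 2)), "unused": 5})
print(result["img"][0, 0, 0], result["unused"])  # 11.0 5
```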

The dictionary-based transforms are located in monai/transforms/ in the codebase.

These transforms are named [TransformClassName]d, indicating that each is a dictionary-based adaptor for the vanilla transform monai.transforms.transforms.TransformClassName.
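The adaptor relationship can be sketched as follows. Both `Rotate90` and `Rotate90d` below are hypothetical, simplified classes written for this illustration; they only show how a `...d` class might delegate to its vanilla counterpart for each selected key, and they make no claim about MONAI's actual implementation.

```python
import numpy as np


class Rotate90:
    """Hypothetical vanilla transform: rotates by 90 degrees, k times, in `axes`."""

    def __init__(self, k=1, axes=(1, 2)):
        self.k = k
        self.axes = axes

    def __call__(self, img):
        return np.rot90(img, self.k, self.axes)


class Rotate90d:
    """Hypothetical dictionary-based adaptor around Rotate90."""

    def __init__(self, keys, k=1, axes=(1, 2)):
        self.keys = keys
        self.rotator = Rotate90(k=k, axes=axes)  # reuse the vanilla transform

    def __call__(self, data):
        out = dict(data)  # entries not listed in `keys` pass through unchanged
        for key in self.keys:
            out[key] = self.rotator(data[key])
        return out


data = {"img": np.array((1, 2, 3, 4)).reshape((1, 2, 2)), "unused": 5}
result = Rotate90d(keys=("img",), k=2)(data)
print(result["img"].tolist(), result["unused"])  # [[[4, 3], [2, 1]]] 5
```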

shape convention

All transforms assume the input ndarrays have the shape [num_channels, spatial_dims], where

  • spatial_dims may have
    • 0 element ([num_channels], e.g. classification labels),
    • 1 element ([num_channels, w], spatially 1D),
    • 2 elements ([num_channels, h, w], spatially 2D),
    • ...
    • N elements ([num_channels, d, h, w, ...], spatially ND).
  • num_channels must be greater than or equal to 1 (input data with shape [spatial_dims] has to be reshaped into [1, spatial_dims] beforehand).
  • each transform may or may not support all spatially ND inputs.
  • the returned ndarrays from a transform should take the same shape convention.
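The reshaping step mentioned above can be done with plain numpy. The following minimal sketch (array values are arbitrary) adds a leading channel axis to a spatially 2D array, and also shows the zero-spatial-element case.

```python
import numpy as np

image_2d = np.arange(6).reshape((2, 3))    # shape [h, w], no channel axis
channel_first = image_2d[np.newaxis, ...]  # shape [1, h, w]
print(image_2d.shape, channel_first.shape)  # (2, 3) (1, 2, 3)

label = np.array([1])  # zero spatial elements: shape [num_channels]
print(label.shape)  # (1,)
```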

randomized transforms

MONAI provides a randomizable interface so that each transform can generate processed data subject to some random factors (often used in training data augmentation).

The interface has:

  • an R variable to store the random number generator container np.random.RandomState. All derived classes should use self.R instead of np.random to generate random factors, e.g., np.random.rand() should be replaced by self.R.rand().
  • a randomize() method, where all self.R related random factors are generated.
  • a set_random_state() method, to set the random number generator container's state.

The interface is located at monai/transforms/ in the codebase.

These transforms are named Rand[TransformClassName][d], indicating that each is a randomized transform for monai.transforms.transforms.TransformClassName[d].
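A minimal sketch of the three pieces of the interface described above is given below. The `Randomizable` base class and the `RandFlipLastAxis` transform here are hypothetical and simplified (for instance, `set_random_state` accepts only a seed); they are not MONAI's actual code.

```python
import numpy as np


class Randomizable:
    """Hypothetical sketch of a randomizable interface."""

    def __init__(self):
        self.R = np.random.RandomState()

    def set_random_state(self, seed=None):
        # replace the generator so that results can be reproduced
        self.R = np.random.RandomState(seed)
        return self

    def randomize(self):
        raise NotImplementedError


class RandFlipLastAxis(Randomizable):
    """Hypothetical randomized transform: flips the last axis with probability prob."""

    def __init__(self, prob=0.5):
        super().__init__()
        self.prob = prob
        self._do_flip = False

    def randomize(self):
        # all random factors are drawn here, from self.R only
        self._do_flip = self.R.rand() < self.prob

    def __call__(self, img):
        self.randomize()
        return np.flip(img, axis=-1) if self._do_flip else img


img = np.array((1, 2, 3, 4)).reshape((1, 2, 2))
flip_a = RandFlipLastAxis(prob=0.5).set_random_state(seed=0)
flip_b = RandFlipLastAxis(prob=0.5).set_random_state(seed=0)

# With the same seed, both instances make the same sequence of decisions.
same = all(np.array_equal(flip_a(img), flip_b(img)) for _ in range(5))
print(same)  # True
```

Drawing every random factor inside randomize() from self.R is what makes the transform reproducible: seeding np.random globally would have no effect on it, while set_random_state() controls it fully.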

universal adaptors

(WIP) generic adaptors that harmonize various interfaces across:

  • vanilla transform
  • dictionary-based transform
  • data transform in other packages (such as torchvision)
