Tensor size mismatch issue #309

berkut0 · 2024-05-03T13:00:43Z

When running on a database after preprocessing, the following error occurs:
The size of tensor a (118) must match the size of tensor b (119) at non-singleton dimension 2

Changing the architecture from v2_small to v1 changes the size of tensor b from 119 to 121.

To be honest, I'm not familiar with neural networks and can't even guess what this is about. Any ideas on how to solve this would be greatly appreciated. I'm running the training on a local machine.
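For readers unfamiliar with this kind of error: it is raised when two tensors are combined element-wise and their shapes differ in a dimension where neither size is 1 (a "singleton"). The sketch below re-implements that compatibility check in plain Python. The shapes, including the 1025 frequency bins, are hypothetical and only chosen to reproduce the wording of the message above; the helper name is made up for this illustration.

```python
def broadcast_mismatch(shape_a, shape_b):
    """Return (dim, size_a, size_b) for an incompatible dimension, or None.

    Shapes are right-aligned, as in NumPy/PyTorch broadcasting. Two sizes are
    compatible when they are equal or when one of them is 1 ("singleton").
    """
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    # Scan from the trailing dimension backwards.
    for dim in reversed(range(ndim)):
        if a[dim] != b[dim] and a[dim] != 1 and b[dim] != 1:
            return dim, a[dim], b[dim]
    return None


# Hypothetical spectrogram shapes: (batch, frequency bins, time frames).
target_shape = (7, 1025, 118)
value_shape = (7, 1025, 119)

mismatch = broadcast_mismatch(target_shape, value_shape)
if mismatch is not None:
    dim, size_a, size_b = mismatch
    print(f"The size of tensor a ({size_a}) must match the size of tensor b "
          f"({size_b}) at non-singleton dimension {dim}")
```

In other words, the two inputs to the subtraction differ by one element along their last axis, so the question is why they have different lengths.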

I think this is the same issue as #157.

preprocessing
rave preprocess --channels 2 -v 1 --input_path .\ --output_path .\dataset --sampling_rate 96000

training
rave train --config v2_small --db_path .\dataset --out_path .\model --name electron --channels 2


berkut0 · 2024-05-03T13:14:56Z

If I change --sampling_rate to 48000 at the preprocessing stage, the error changes too:
The size of tensor a (236) must match the size of tensor b (237) at non-singleton dimension 2

These numbers are not affected by changing the number of channels.

detailed output

(base) PS F:\_ircam\electromagnetic recs> rave train --config v2_small --db_path .\dataset --out_path .\model --name electron --channels 1
I0503 16:29:45.640041 12732 resource_reader.py:50] system_path_file_exists:v2_small.gin
E0503 16:29:45.640041 12732 resource_reader.py:55] Path not found: v2_small.gin
I0503 16:29:45.640041 12732 resource_reader.py:50] system_path_file_exists:C:\Program Files\Python311\Lib\site-packages\rave\v2_small.gin
E0503 16:29:45.649347 12732 resource_reader.py:55] Path not found: C:\Program Files\Python311\Lib\site-packages\rave\v2_small.gin
I0503 16:29:45.649347 12732 resource_reader.py:50] system_path_file_exists:configs/v1.gin
E0503 16:29:45.649347 12732 resource_reader.py:55] Path not found: configs/v1.gin
C:\Program Files\Python311\Lib\site-packages\torch\nn\utils\weight_norm.py:28: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
train set: 2348 examples
val set: 48 examples
selected gpu: []
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs

  | Name                     | Type                  | Params
0 | pqmf                     | CachedPQMF            | 16.7 K
1 | encoder                  | VariationalEncoder    | 3.9 M
2 | decoder                  | GeneratorV2           | 3.8 M
3 | discriminator            | CombineDiscriminators | 6.8 M
4 | audio_distance           | AudioDistanceV1       | 0
5 | multiband_audio_distance | AudioDistanceV1       | 0

14.6 M Trainable params
0 Non-trainable params
14.6 M Total params
58.284 Total estimated model params size (MB)
Sanity Checking: 0it [00:00, ?it/s]
C:\Program Files\Python311\Lib\site-packages\pytorch_lightning\trainer\connectors\data_connector.py:224: PossibleUserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument (try 8 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
rank_zero_warn(
Sanity Checking DataLoader 0: 0%| | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\Program Files\Python311\Scripts\rave.exe\__main__.py", line 7, in <module>
File "C:\Program Files\Python311\Lib\site-packages\scripts\main_cli.py", line 30, in main
app.run(train.main)
File "C:\Program Files\Python311\Lib\site-packages\absl\app.py", line 308, in run
_run_main(main, args)
File "C:\Program Files\Python311\Lib\site-packages\absl\app.py", line 254, in _run_main
sys.exit(main(argv))
^^^^^^^^^^
File "C:\Program Files\Python311\Lib\site-packages\scripts\train.py", line 268, in main
trainer.fit(model, train, val, ckpt_path=run)
File "C:\Program Files\Python311\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 608, in fit
call._call_and_handle_interrupt(
File "C:\Program Files\Python311\Lib\site-packages\pytorch_lightning\trainer\call.py", line 38, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Python311\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 650, in _fit_impl
self._run(model, ckpt_path=self.ckpt_path)
File "C:\Program Files\Python311\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1103, in _run
results = self._run_stage()
^^^^^^^^^^^^^^^^^
File "C:\Program Files\Python311\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1182, in _run_stage
self._run_train()
File "C:\Program Files\Python311\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1195, in _run_train
self._run_sanity_check()
File "C:\Program Files\Python311\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1267, in _run_sanity_check
val_loop.run()
File "C:\Program Files\Python311\Lib\site-packages\pytorch_lightning\loops\loop.py", line 199, in run
self.advance(*args, **kwargs)
File "C:\Program Files\Python311\Lib\site-packages\pytorch_lightning\loops\dataloader\evaluation_loop.py", line 152, in advance
dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Python311\Lib\site-packages\pytorch_lightning\loops\loop.py", line 199, in run
self.advance(*args, **kwargs)
File "C:\Program Files\Python311\Lib\site-packages\pytorch_lightning\loops\epoch\evaluation_epoch_loop.py", line 137, in advance
output = self._evaluation_step(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Python311\Lib\site-packages\pytorch_lightning\loops\epoch\evaluation_epoch_loop.py", line 234, in _evaluation_step
output = self.trainer._call_strategy_hook(hook_name, *kwargs.values())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Python311\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1485, in _call_strategy_hook
output = fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Python311\Lib\site-packages\pytorch_lightning\strategies\strategy.py", line 390, in validation_step
return self.model.validation_step(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Python311\Lib\site-packages\rave\model.py", line 437, in validation_step
distance = self.audio_distance(x, y)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Python311\Lib\site-packages\rave\core.py", line 339, in forward
lin_distance = mean_difference(x, y, norm='L2', relative=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Python311\Lib\site-packages\rave\core.py", line 240, in mean_difference
diff = target - value
~~~~~~~^~~~~~~
RuntimeError: The size of tensor a (118) must match the size of tensor b (119) at non-singleton dimension 2

patrickgates · 2024-05-04T17:13:13Z

+1, I'm experiencing the same issue whenever I change the sampling_rate and/or num_signal/n_signal parameters. RAVE version 2.3.1.

gwendal-lv · 2024-05-28T18:16:28Z

I have the same issue even when preprocessing with the "default" parameters: --sampling_rate 48000 --channels 1 --num_signal 131072
The issue appears during validation only (as in your case); it seems that the output signals are longer than the inputs.
This propagates to the MSS loss computation, where the spectrograms don't have the same number of time frames, so their difference can't be computed.
In the RAVE class (rave/model.py), I have updated this function:

    def validation_step(self, x, batch_idx):

        z = self.encode(x)
        if isinstance(self.encoder, blocks.VariationalEncoder):
            mean = torch.split(z, z.shape[1] // 2, 1)[0]
        else:
            mean = None

        z = self.encoder.reparametrize(z)[0]
        y = self.decode(z)

        # Quick-and-dirty fix for the shape mismatch between the MSS loss inputs.
        # Requires `import warnings` at the top of rave/model.py.
        if x.shape[2] < y.shape[2]:
            # Output is longer than input: crop the end of the output.
            warnings.warn("Cropping output y for MSS loss")
            # TODO: should we crop the beginning instead of the end, or center the crop?
            y = y[:, :, :x.shape[2]]
        elif x.shape[2] > y.shape[2]:
            raise AssertionError("Output is shorter than input")
        # End of quick-and-dirty fix.

        distance = self.audio_distance(x, y)
        full_distance = sum(distance.values())

        if self.trainer is not None:
            self.log('validation', full_distance)

        return torch.cat([x, y], -1), mean

For instance, with my dataset, x (input) and y (output) had different lengths before the crop:

In[2]: x.shape, y.shape
Out[2]: (torch.Size([7, 1, 120423]), torch.Size([7, 1, 120832]))
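These two lengths are enough to reproduce the frame-count mismatch by hand. The sketch below assumes a centered STFT, for which `torch.stft` documents `n_frames = 1 + L // hop_length`. The hop of 512 samples and the block size of 2048 samples are assumptions picked because they fit the numbers in this thread; they are not values read from the RAVE source.

```python
def stft_frames(n_samples, hop_length):
    """Frame count of a centered STFT (torch.stft with center=True)."""
    return 1 + n_samples // hop_length


def round_up(n, multiple):
    """Smallest multiple of `multiple` that is >= n."""
    return -(-n // multiple) * multiple


x_len, y_len = 120423, 120832  # lengths reported above
HOP = 512      # assumed hop length (hypothetical)
BLOCK = 2048   # assumed block size of the model output (hypothetical)

print(stft_frames(x_len, HOP), stft_frames(y_len, HOP))  # 236 237
print(round_up(x_len, BLOCK) == y_len)                   # True
```

Under these assumptions the frame counts are 236 and 237, the same pair reported earlier in the thread for a 48000 Hz dataset, and the output length equals the input length rounded up to a multiple of 2048 (59 × 2048). That would be consistent with a model that can only emit whole blocks, but nothing in this thread confirms it.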

I have just started using RAVE today, so I don't know if this is a proper fix.
In the worst case, it should only influence the validation scores, not the training itself.
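The TODO in the patch asks whether the surplus samples should be dropped at the end, at the beginning, or on both sides. A small sketch of the three options, computing only the slice bounds; `crop_bounds` is a hypothetical helper, not part of RAVE.

```python
def crop_bounds(out_len, target_len, mode="end"):
    """Return (start, stop) indices that crop `out_len` samples to `target_len`.

    mode="end"    drops the surplus at the end (what the patch does),
    mode="start"  drops it at the beginning,
    mode="center" splits it between both sides.
    """
    surplus = out_len - target_len
    if surplus < 0:
        raise ValueError("output is shorter than the target")
    if mode == "end":
        start = 0
    elif mode == "start":
        start = surplus
    elif mode == "center":
        start = surplus // 2
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return start, start + target_len


for mode in ("end", "start", "center"):
    print(mode, crop_bounds(120832, 120423, mode))
```

The bounds would be applied as `y = y[:, :, start:stop]`. Which mode keeps x and y aligned depends on where the model adds its padding, which this thread does not establish.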

Hope this helps!

ddgg-el · 2024-09-03T10:03:41Z

Same problem here:

RuntimeError: The size of tensor a (26) must match the size of tensor b (29) at non-singleton dimension 2

...and @gwendal-lv's fix does not work.

I am working with part of the Audio MNIST dataset (6500 of the 30000 files). Some files are pretty short, so my arguments are:

preprocessing

rave preprocess \
    --input_path $input_path \
    --output_path $output_path \
    --channels 1 \
    --sampling_rate 48000 \
    --num_signal 14400

resulting in:

channels: 1
lazy: false
n_seconds: 2032.8
sr: 48000
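As a quick sanity check, the metadata can be related back to the preprocessing arguments. The sketch assumes that n_seconds is the number of stored chunks times num_signal divided by the sampling rate; that relation is an assumption, not something confirmed from the preprocessing script.

```python
def estimated_chunks(n_seconds, sampling_rate, num_signal):
    """Number of fixed-length chunks implied by the metadata (assumed relation)."""
    return round(n_seconds * sampling_rate / num_signal)


n_chunks = estimated_chunks(n_seconds=2032.8, sampling_rate=48000, num_signal=14400)
chunk_seconds = 14400 / 48000

print(n_chunks)       # 6776
print(chunk_seconds)  # 0.3
```

With these values each chunk lasts 0.3 s and the dataset would hold 6776 of them.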

training

rave train \
    --config v2_small \
    --db_path $db_path \
    --name $name \
    --val_every 2500 \
    --gpu -1 \
    --channels 1 \
    --n_signal 14400 \
    --workers $workers

Tried with v1, v2_small and v2, using acids-rave==2.3.1 on an M1 Pro.

By the way, there is no --sampling_rate argument for training, right?

ddgg-el · 2024-09-08T18:08:30Z

The problem seems to be related to the sampling rate. Changing the --sampling_rate flag to 44100 works, even though the files are all at 48000 Hz.
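The numbers reported in this thread fit one simple hypothesis: the training pipeline resamples each chunk from the dataset rate to 44100 Hz, and the model output is then a whole number of blocks. The sketch below tests that arithmetic. Everything in it is an assumption made for illustration: the 44100 Hz model rate, rounding the resampled length up, the 512-sample hop, and the block sizes of 1024 and 2048. None of these values is taken from the RAVE source.

```python
import math


def resampled_length(n_samples, sr_in, sr_out):
    """Length after resampling from sr_in to sr_out, rounded up (assumed)."""
    return math.ceil(n_samples * sr_out / sr_in)


def round_up(n, multiple):
    """Smallest multiple of `multiple` that is >= n."""
    return -(-n // multiple) * multiple


def stft_frames(n_samples, hop_length=512):
    """Frame count of a centered STFT (torch.stft with center=True)."""
    return 1 + n_samples // hop_length


MODEL_SR = 44100  # assumed rate the model runs at (hypothetical)

# (num_signal, dataset sampling rate, assumed block size)
cases = [
    (131072, 96000, 1024),
    (131072, 96000, 2048),
    (131072, 48000, 2048),
    (14400, 48000, 2048),
]
for num_signal, dataset_sr, block in cases:
    x_len = resampled_length(num_signal, dataset_sr, MODEL_SR)
    y_len = round_up(x_len, block)
    print(x_len, y_len, stft_frames(x_len), stft_frames(y_len))
```

This prints frame pairs of 118 and 119, 118 and 121, 236 and 237, and 26 and 29, which are the sizes quoted in the error messages above, and the third case gives the 120423 and 120832 sample lengths reported earlier. It would also suggest why preprocessing at 44100 changes the behaviour, since no resampling would be needed. It does not explain why --n_signal 14400 then trains without error, because 14400 is not a multiple of either assumed block size, so treat this as a lead to check against the source rather than a diagnosis.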

gwendal-lv mentioned this issue May 28, 2024

Colab tensor size mismatch issue #157

Open
