This repository has been archived by the owner on Nov 21, 2022. It is now read-only.
TransformerDataModule.setup() is run more than once unnecessarily #299
🐛 Bug
`TransformerDataModule.setup()` is run more than once unnecessarily. For example, when running the code included below, it runs `setup()` when calling `dm.num_classes` and then again when calling `trainer.fit(model, dm)`. `setup()` then calls `self.load_dataset()`, `self.split_dataset(dataset)` and `self.process_data(dataset, stage=stage)`. Calling `self.load_dataset()` several times is not a big deal because the dataset is loaded from the cache, but the other two methods are expensive, and I don't think it makes sense to run them again, since they just overwrite whatever `self.ds` held before.

To Reproduce
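The call pattern can also be illustrated without downloading any data. The minimal sketch below assumes, as reported above, that reading `num_classes` triggers `setup("fit")` and that `trainer.fit(model, dm)` calls `setup("fit")` once more. `UnguardedDataModule` and its toy data are hypothetical stand-ins, not the Lightning Transformers classes, and the direct `dm.setup("fit")` call only mimics what the trainer does.

```python
from collections import Counter


class UnguardedDataModule:
    """Hypothetical stand-in for TransformerDataModule that counts each step."""

    def __init__(self) -> None:
        self.calls = Counter()
        self.ds = None

    def load_dataset(self):
        self.calls["load_dataset"] += 1  # cheap in practice: served from the cache
        return {"train": [0, 1, 0], "validation": [1]}

    def split_dataset(self, dataset):
        self.calls["split_dataset"] += 1  # expensive in the real module
        return dataset

    def process_data(self, dataset, stage=None):
        self.calls["process_data"] += 1  # expensive in the real module
        return dataset

    def setup(self, stage=None):
        # Same sequence as described in the issue: load, split, then process.
        dataset = self.load_dataset()
        dataset = self.split_dataset(dataset)
        self.ds = self.process_data(dataset, stage=stage)

    @property
    def num_classes(self):
        # Assumption: reading num_classes triggers setup, as reported above.
        self.setup("fit")
        return len(set(self.ds["train"]))


dm = UnguardedDataModule()
num_labels = dm.num_classes  # first setup() run
dm.setup("fit")              # mimics the second run triggered by trainer.fit(model, dm)
print(dict(dm.calls))
# → {'load_dataset': 2, 'split_dataset': 2, 'process_data': 2}
```

Every step runs twice, and the second `process_data()` result simply replaces the first one stored in `self.ds`.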
Take the example below from the docs and check the console output, or run it in debug mode with a breakpoint. You can see that `TransformerDataModule.setup()` and the methods it calls (`load_dataset()`, `split_dataset()` and `process_data()`) are run more than once.

Expected behavior
Given that `TransformerDataModule.setup()` currently does the following:

Perhaps one way to avoid running it again would be to create the instance attribute `self.setup_stages_run = []` when the class is initialized and then define the `setup` method as:

I can create a PR if you think this makes sense.
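A minimal sketch of the stage-tracking guard described above, using a hypothetical stand-in class rather than the real `TransformerDataModule`. It records each completed stage in `self.setup_stages_run` and returns early when `setup()` is called again for a stage that is already in the list. This is one possible shape of the idea, not the exact code proposed in this issue.

```python
from collections import Counter


class GuardedDataModule:
    """Hypothetical stand-in showing a per-stage guard around setup()."""

    def __init__(self) -> None:
        self.calls = Counter()
        self.ds = None
        self.setup_stages_run = []  # stages whose setup() has already completed

    def load_dataset(self):
        self.calls["load_dataset"] += 1
        return {"train": [0, 1, 0], "validation": [1]}

    def split_dataset(self, dataset):
        self.calls["split_dataset"] += 1
        return dataset

    def process_data(self, dataset, stage=None):
        self.calls["process_data"] += 1
        return dataset

    def setup(self, stage=None):
        # Skip the expensive work if this stage has already been set up.
        if stage in self.setup_stages_run:
            return
        dataset = self.load_dataset()
        dataset = self.split_dataset(dataset)
        self.ds = self.process_data(dataset, stage=stage)
        self.setup_stages_run.append(stage)

    @property
    def num_classes(self):
        # Assumption: reading num_classes triggers setup("fit"), as in the issue.
        self.setup("fit")
        return len(set(self.ds["train"]))


dm = GuardedDataModule()
num_labels = dm.num_classes  # runs setup("fit") once
dm.setup("fit")              # second call (as from trainer.fit) returns early
print(dict(dm.calls))
# → {'load_dataset': 1, 'split_dataset': 1, 'process_data': 1}
```

Tracking stages rather than a single boolean keeps a later `setup("test")` working after `setup("fit")`, since `process_data()` receives `stage` and may do different work per stage. A `set` would give faster membership checks, but a list matches the `self.setup_stages_run = []` suggested above.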
Thanks!