Having seen a few recent bugs where distributed training produces incorrect results, I'd like to add that in the present case the fit function does not produce any results at all. It simply fails during the first invocation.
Hello
For quite a while I have been trying to train my model with Keras 3 and the TensorFlow backend (TF 2.18) using distributed training with MirroredStrategy. Since I could not get training with Model.fit to run successfully, I tried one of the examples from the Keras documentation instead. The training example runs fine under Keras 2/TF 2.18 with MirroredStrategy on 1 and 2 devices, and it also runs fine under Keras 3 on one device. With 2 devices, however, the fit function fails with this error under TF 2.17
and with this error under TF2.18
I've put the code, with results, for running under Keras 2 here, and the version configured for running with Keras 3 here.
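For readers who only want the shape of the setup, a rough sketch follows. It is not the code behind the links above: the tiny model, the random data, and the helper name `build_and_fit` are made up for illustration, and it assumes `import keras` resolves to Keras 3 with the TensorFlow backend. On a machine with a single device it should train normally; the failure described here concerns the 2-device case.

```python
import os

# Assumption: select the TensorFlow backend; this must happen before importing keras.
os.environ.setdefault("KERAS_BACKEND", "tensorflow")

import numpy as np
import tensorflow as tf
import keras


def build_and_fit(num_samples=256, batch_size=32, epochs=1):
    """Hypothetical helper: train a toy model under MirroredStrategy."""
    # MirroredStrategy picks up all visible GPUs, or falls back to one CPU replica.
    strategy = tf.distribute.MirroredStrategy()

    # Model creation and compilation happen inside the strategy scope.
    with strategy.scope():
        model = keras.Sequential(
            [
                keras.layers.Input(shape=(8,)),
                keras.layers.Dense(16, activation="relu"),
                keras.layers.Dense(1),
            ]
        )
        model.compile(optimizer="adam", loss="mse")

    # Random placeholder data; batch_size here is the global batch size.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(num_samples, 8)).astype("float32")
    y = rng.normal(size=(num_samples, 1)).astype("float32")
    dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(batch_size)

    # This is the call that is reported to fail when two devices are in use.
    history = model.fit(dataset, epochs=epochs, verbose=0)
    return strategy.num_replicas_in_sync, history.history["loss"]


if __name__ == "__main__":
    replicas, losses = build_and_fit()
    print("replicas:", replicas, "losses:", losses)
```

Comparing the reported replica count between a 1-device and a 2-device run is a quick way to confirm which configuration is actually being exercised.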
Many thanks for your help.