Allow defining a training schedule, i.e. shifting the weights of datasets during training, per step.
One example could be:
```yaml
__module__: megatron.energon
__class__: MetadatasetV2
splits:
  train:
    blend:
      - weight: 1
        path: ds1
      - weight:
          __module__: megatron.energon
          __class__: WeightSchedule
          linear:  # Maybe "linear" or "step"?
            0: 100    # At iteration 0 (i.e. 0 items yielded on each rank), the weight is 100
            100: 10   # At iteration 100, the weight is 10
            1000: 0   # At iteration 1000 (and onwards), the weight is 0
        path: ds2
```
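To pin down the proposed semantics, here is a minimal Python sketch (not existing energon code; `resolve_weight` and its signature are made up for illustration) of how such a keypoint mapping could be evaluated at a given iteration, either with linear interpolation or as a step function:

```python
from bisect import bisect_right

# Hypothetical helper (not existing energon code): resolve a dataset's weight at a
# given iteration from the keypoints above, either by linear interpolation between
# keypoints or as a step function that holds the last passed keypoint's value.
def resolve_weight(keypoints, iteration, mode="linear"):
    points = sorted(keypoints.items())
    iters = [it for it, _ in points]
    # Clamp outside the defined range (before the first / after the last keypoint).
    if iteration <= iters[0]:
        return float(points[0][1])
    if iteration >= iters[-1]:
        return float(points[-1][1])
    idx = bisect_right(iters, iteration)
    (it0, w0), (it1, w1) = points[idx - 1], points[idx]
    if mode == "step":
        return float(w0)
    # Linear interpolation between the two surrounding keypoints.
    return w0 + (w1 - w0) * (iteration - it0) / (it1 - it0)

# Keypoints from the example above: weight 100 at iteration 0, 10 at 100, 0 from 1000 on.
schedule = {0: 100, 100: 10, 1000: 0}
assert resolve_weight(schedule, 50) == 55.0       # halfway between 100 and 10
assert resolve_weight(schedule, 50, "step") == 100.0
assert resolve_weight(schedule, 2000) == 0.0      # clamped after the last keypoint
```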
Also for epochized_blend:
```yaml
__module__: megatron.energon
__class__: MetadatasetV2
splits:
  train:
    epochized_blend:
      - repetitions: 1
        path: ds1
      - repetitions: 2
        # Combination with "weight" to fade in / out a dataset? The outer
        # repetitions still hold (except when weight becomes 0).
        weight:
          __module__: megatron.energon
          __class__: WeightSchedule
          linear:  # Maybe "linear" or "step"?
            0: 100    # At iteration 0 (i.e. 0 items yielded on each rank), the weight is 100
            100: 10   # At iteration 100, the weight is 10
            1000: 0   # At iteration 1000 (and onwards), the weight is 0
        path: ds2
```
With an outer schedule:
```yaml
__module__: megatron.energon
__class__: MetadatasetV2
splits:
  train:
    sequential_schedule:  # Or just "schedule" or just "sequential" or "curriculum"?
      # Does this need an option to end iterating at the end of a stage?
      # Otherwise, the shufbuf will mix stages.
      # I guess, inside we cannot handle "blend", but only "blend_epochized"
      # or a dataset directly.
      # This is stage 1 of the training, until the repetitions are done.
      - epochized_blend:  # Blend the first part consisting of these datasets
          - repetitions: 1
            path: ds1
          - repetitions: 2
            path: ds2
      # This is stage 2 of the training, until the repetitions are done.
      - weight: 1
        path: ds3
```
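The intended stage semantics could be sketched roughly as follows (plain Python, purely illustrative, not energon code): each stage is a finite iterable that ends once its repetitions are exhausted, and the next stage only starts afterwards. A shuffle buffer layered on top of the chained stages would mix samples across the boundary, which is exactly the open question in the comment above.

```python
import itertools

# Purely illustrative sketch of the "sequential_schedule" semantics, not energon code:
# each stage is a finite iterable, and stage N+1 only starts once stage N is exhausted.
def sequential_schedule(stages):
    return itertools.chain.from_iterable(stages)

stage1 = ["ds1-sample", "ds2-sample-a", "ds2-sample-b"]  # epochized blend of ds1 + ds2
stage2 = ["ds3-sample-a", "ds3-sample-b"]                # plain ds3 (stage 2)
print(list(sequential_schedule([stage1, stage2])))
# ['ds1-sample', 'ds2-sample-a', 'ds2-sample-b', 'ds3-sample-a', 'ds3-sample-b']
```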
Discussion:
- The schedule depends on the number of dataset iterations. This may not equal the number of gradient updates, e.g. with gradient accumulation. Should we make gradacc / steps_per_iter configurable?
- Maybe rather use `type: linear` instead of `linear:` and `step:`? This should be unified with typical lr-schedulers (a possible shape is sketched below).
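For reference, the `type:`-based variant could look roughly like this (hypothetical syntax; the `points` key name is made up here, the proposal is only to move the interpolation mode into a `type` field):

```yaml
- weight:
    __module__: megatron.energon
    __class__: WeightSchedule
    type: linear   # or "step", mirroring typical lr-scheduler configs
    points:        # hypothetical key name for the iteration -> weight mapping
      0: 100
      100: 10
      1000: 0
  path: ds2
```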