We extend magnitude-preserving techniques from the EDM2 architecture to Diffusion Transformers (DiT), ensuring stable training by maintaining activation magnitudes and controlling weight growth throughout the network. Additionally, we incorporate power-function-based exponential moving averages (EMA), enabling flexible post-training reconstruction of the averaged weights with adjustable decay parameters. Experiments on DiT-XS/2 and DiT-S/4 show significant improvements in FID-10K, highlighting the effectiveness of our approach. Despite increased computational overhead, our methods offer a scalable and modular solution for transformer-based diffusion models.
Fig 1. DiT-S/4 samples without (left) and with (right) magnitude preserving layers.
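The central building block is a magnitude-preserving linear layer in the spirit of EDM2: the stored weights are projected back to unit magnitude during training (forced weight normalization, cf. `--use-forced-weight-normalization` below), and the forward pass rescales them so that unit-variance inputs yield unit-variance outputs. A minimal sketch under these assumptions; `MPLinear` and `normalize` are illustrative names, not necessarily this repo's API:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def normalize(w: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    # Rescale each output row to norm sqrt(fan_in), i.e. unit magnitude per element.
    norm = torch.linalg.vector_norm(w, dim=1, keepdim=True)
    return w * (w.shape[1] ** 0.5) / (norm + eps)

class MPLinear(nn.Module):
    # Magnitude-preserving linear layer (illustrative name, EDM2-style).
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # Unit-magnitude Gaussian init; no bias, since biases break magnitude preservation.
        self.weight = nn.Parameter(torch.randn(out_features, in_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            # Forced weight normalization: project the stored weights back onto
            # the unit-magnitude sphere so optimizer updates cannot grow them.
            with torch.no_grad():
                self.weight.copy_(normalize(self.weight))
        w = normalize(self.weight)   # standard weight normalization for the forward pass
        w = w / (w.shape[1] ** 0.5)  # unit-variance inputs -> unit-variance outputs
        return F.linear(x, w)
```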
This project builds upon key concepts from the following research papers:
- Peebles & Xie (2023) explore the application of transformer architectures to diffusion models, achieving state-of-the-art performance on various generation tasks;
- Karras et al. (2024) introduce the idea of preserving the magnitudes of activations and weights throughout the network during training, enhancing training stability and the quality of generated outputs.
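The power-function EMA from Karras et al. (2024) replaces a constant decay with beta_t = (1 - 1/t)^(gamma + 1), which makes the averaged weights self-similar across training lengths and, together with periodic snapshots, lets the effective decay profile be reconstructed after training. A minimal sketch of the tracking update, assuming 1-indexed training steps and omitting the snapshot and post-hoc reconstruction machinery:

```python
import torch

@torch.no_grad()
def update_power_ema(ema_model: torch.nn.Module, model: torch.nn.Module,
                     step: int, gamma: float = 6.94) -> None:
    # Power-function decay: beta_t = (1 - 1/t)^(gamma + 1). Early in training
    # beta is small (fast forgetting); later it approaches 1 (broad averaging).
    # gamma = 6.94 roughly corresponds to EDM2's sigma_rel = 0.10 profile.
    beta = (1.0 - 1.0 / step) ** (gamma + 1.0)
    for p_ema, p in zip(ema_model.parameters(), model.parameters()):
        p_ema.lerp_(p, 1.0 - beta)  # p_ema <- beta * p_ema + (1 - beta) * p
```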
```bash
python train.py --data-path /path/to/data --results-dir /path/to/results --model DiT-S/2 --num-steps 400_000 <map feature flags>
```
Customize the training process by enabling the following flags (a sketch of two of the underlying operations follows the list):
- `--use-cosine-attention`: Controls weight growth in attention layers.
- `--use-weight-normalization`: Applies magnitude preservation in linear layers.
- `--use-forced-weight-normalization`: Controls weight growth in linear layers.
- `--use-mp-residual`: Enables magnitude preservation in residual connections.
- `--use-mp-silu`: Uses a magnitude-preserving version of the SiLU nonlinearity.
- `--use-no-layernorm`: Disables transformer layer normalization.
- `--use-mp-pos-enc`: Activates magnitude-preserving positional encoding.
- `--use-mp-embedding`: Uses magnitude-preserving embeddings.
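To make two of these concrete, the sketch below follows the magnitude-preserving primitives defined by Karras et al. (2024): `mp_silu` for `--use-mp-silu` and `mp_sum` for `--use-mp-residual`. The constants come from that paper; the actual implementations in this repo may differ:

```python
import torch
import torch.nn.functional as F

def mp_silu(x: torch.Tensor) -> torch.Tensor:
    # SiLU divided by its output RMS (~0.596) under unit-variance Gaussian
    # input, so expected activation magnitudes stay at 1.
    return F.silu(x) / 0.596

def mp_sum(a: torch.Tensor, b: torch.Tensor, t: float = 0.3) -> torch.Tensor:
    # Magnitude-preserving residual sum: blend the skip path `a` with the
    # residual branch `b` at weight t, then renormalize so unit-variance
    # inputs yield a unit-variance output.
    return a.lerp(b, t) / ((1.0 - t) ** 2 + t ** 2) ** 0.5
```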
```bash
python sample.py --result-dir /path/to/results/<dir> --class-label <class label>
```
```bibtex
@misc{bill_jensen_2025,
  title        = {The Art of Balance: Magnitude Preservation in Diffusion Transformers},
  author       = {Bill, Eric Tillmann and Jensen, Cristian Perez},
  howpublished = {\url{https://github.com/ericbill21/map-dit}},
  year         = {2025}
}
```