
The Art of Balance: Magnitude Preservation in Diffusion Transformers

We extend magnitude-preserving techniques from the EDM2 architecture to Diffusion Transformers (DiT), ensuring stable training by maintaining activation magnitudes and controlling weight growth throughout the architecture. Additionally, we incorporate power function-based exponential moving averages, enabling flexible post-training reconstruction with adjustable decay parameters. Experiments on DiT-XS/2 and DiT-S/4 show significant improvements in FID-10K, highlighting the effectiveness of our approach. Despite increased computational overhead, our methods offer a scalable and modular solution for transformer-based diffusion models.
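
As a rough illustration of the power-function EMA referred to above, the sketch below shows one update step in PyTorch. It is a simplified reading of Karras et al. (2024), not the code used in this repository; the function name and the default value of gamma are placeholders chosen for illustration.

import torch

def power_ema_update(ema_model, model, step, gamma=6.94):
    # One step of the power-function EMA (illustrative sketch after Karras et al., 2024).
    # Unlike a constant-decay EMA, the decay grows with the 0-based step count, so early,
    # noisy weights are forgotten and the averaging window widens with training length.
    # gamma controls the relative width of the averaging profile (value chosen for illustration).
    beta = (1.0 - 1.0 / (step + 1)) ** (gamma + 1)
    with torch.no_grad():
        for p_ema, p in zip(ema_model.parameters(), model.parameters()):
            p_ema.lerp_(p, 1.0 - beta)  # p_ema = beta * p_ema + (1 - beta) * p

Roughly speaking, keeping such EMAs at a few different gamma values during training is what allows other decay profiles to be reconstructed after training, as mentioned above.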

Fig 1. DiT-S/4 samples without (left) and with (right) magnitude-preserving layers.

This project builds upon key concepts from the following research papers:

  • Peebles & Xie (2023) explore the application of transformer architectures to diffusion models, achieving state-of-the-art performance on various generation tasks;
  • Karras et al. (2024) introduce the idea of preserving the magnitude of features during the diffusion process, enhancing the stability and quality of generated outputs.

Training

python train.py --data-path /path/to/data --results-dir /path/to/results --model DiT-S/2 --num-steps 400_000 <map feature flags>

Magnitude Preservation Flags

Customize the training process by enabling any combination of the following flags; a short illustrative sketch of these components and an example invocation follow the list:

  • --use-cosine-attention - Controls weight growth in attention layers.
  • --use-weight-normalization - Applies magnitude preservation in linear layers.
  • --use-forced-weight-normalization - Controls weight growth in linear layers.
  • --use-mp-residual - Enables magnitude preservation in residual connections.
  • --use-mp-silu - Uses a magnitude-preserving version of SiLU nonlinearity.
  • --use-no-layernorm - Disables transformer layer normalization.
  • --use-mp-pos-enc - Activates magnitude-preserving positional encoding.
  • --use-mp-embedding - Uses magnitude-preserving embeddings.
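
The sketch below illustrates, in simplified form, the kind of building blocks some of these flags refer to: a magnitude-preserving SiLU, a magnitude-preserving sum for residual connections, and a linear layer with (forced) weight normalization. It follows Karras et al. (2024); the names and details are illustrative and may differ from the code in this repository.

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def mp_silu(x):
    # SiLU rescaled so that a unit-variance Gaussian input yields an approximately
    # unit-variance output (0.596 is the empirical std of silu(x) for x ~ N(0, 1)).
    return F.silu(x) / 0.596

def mp_sum(a, b, t=0.5):
    # Magnitude-preserving residual connection: blend a and b and rescale so the
    # result keeps unit variance when a and b are uncorrelated and unit variance.
    return a.lerp(b, t) / math.sqrt((1 - t) ** 2 + t ** 2)

class MPLinear(nn.Module):
    # Linear layer whose weight rows are normalized to unit L2 norm in every forward
    # pass, so unit-variance inputs stay approximately unit variance. During training
    # the stored weight is re-normalized under no_grad ("forced weight normalization"),
    # preventing unbounded weight growth.
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features))

    def forward(self, x):
        if self.training:
            with torch.no_grad():
                self.weight.copy_(self._normalize(self.weight))
        return F.linear(x, self._normalize(self.weight))

    @staticmethod
    def _normalize(w, eps=1e-4):
        # Unit L2 norm along the fan-in dimension, computed in float32 for stability.
        norm = w.float().norm(dim=1, keepdim=True).clamp(min=eps)
        return (w.float() / norm).to(w.dtype)

For example, a training run with all magnitude-preserving components enabled could look like:

python train.py --data-path /path/to/data --results-dir /path/to/results --model DiT-S/2 --num-steps 400_000 --use-cosine-attention --use-weight-normalization --use-forced-weight-normalization --use-mp-residual --use-mp-silu --use-no-layernorm --use-mp-pos-enc --use-mp-embedding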

Sampling

python sample.py --result-dir /path/to/results/<dir> --class-label <class label>
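
For example, with a hypothetical results directory and class index (both placeholders, substitute your own):

python sample.py --result-dir /path/to/results/my_run --class-label 0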

Citation

@misc{bill_jensen_2025,
    title={The Art of Balance: Magnitude Preservation in Diffusion Transformers},
    author={Bill, Eric Tillmann and Jensen, Cristian Perez},
    howpublished={\url{https://github.com/ericbill21/map-dit}},
    year={2025}
}
