| Models | SMDM | LLaDA | AR (Ours) | MDLM (Ours) | Eso-LM (Ours) | Duo (Ours) |
|---|---|---|---|---|---|---|
| Params | 1B | 8B | 1.7B | 1.7B | 1.7B | 1.7B |
| GSM8K (↑) | 58.5 | 70.7 | 62.9 | 58.8 | 33.4 | 65.8 |

@misc{sahoo2026scalingmaskeddiffusionlanguage,
      title={Scaling Beyond Masked Diffusion Language Models},
      author={Subham Sekhar Sahoo and Jean-Marie Lemercier and Zhihan Yang and Justin Deschenaux and Jingyu Liu and John Thickstun and Ante Jukic},
      year={2026},
      eprint={2602.15014},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2602.15014},
}