Uniform-state diffusion beats Masked diffusion on text and image generation!
Concurrently, Sahoo et al. (2026) show that Duo surpasses autoregressive models at the 1.7B scale on math and reasoning (GSM8K).
Standard discrete diffusion uses ancestral sampling: at each step, tokens are updated using the reverse posterior.
We introduce \(\Psi\)-posteriors, a superposition of a Predictor (reverse posterior) and a Corrector (forward process), that preserve the diffusion marginals while enabling error correction.
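To make the predictor/corrector superposition concrete, here is a minimal, purely illustrative sketch in numpy. All names (`psi_step`, `forward_noise`, the mixing weight `psi`) are hypothetical, and the per-token mixing below is a simplification: the actual \(\Psi\)-posterior construction is designed so that the diffusion marginals are preserved, which this toy does not guarantee.

```python
import numpy as np

def forward_noise(tokens, alpha, vocab_size, rng):
    """Uniform-state forward kernel: keep each token w.p. alpha,
    otherwise resample it uniformly from the vocabulary."""
    keep = rng.random(tokens.shape) < alpha
    uniform = rng.integers(0, vocab_size, size=tokens.shape)
    return np.where(keep, tokens, uniform)

def psi_step(tokens, predictor_probs, alpha_corr, psi, vocab_size, rng):
    """One toy Psi-style update (hypothetical API): with probability `psi`
    a token takes a Corrector move (forward-process noise, enabling error
    correction), otherwise a Predictor move (a sample from the model's
    reverse posterior `predictor_probs`, shape [batch, vocab])."""
    use_corrector = rng.random(tokens.shape[0]) < psi
    predicted = np.array([rng.choice(vocab_size, p=p) for p in predictor_probs])
    corrected = forward_noise(tokens, alpha_corr, vocab_size, rng)
    return np.where(use_corrector, corrected, predicted)

rng = np.random.default_rng(0)
tokens = np.array([1, 2, 3])
probs = np.full((3, 5), 0.2)  # placeholder model posterior
new_tokens = psi_step(tokens, probs, alpha_corr=0.9, psi=0.5,
                      vocab_size=5, rng=rng)
```

Setting `psi=0` recovers plain ancestral sampling (predictor only); a nonzero corrector weight is what lets extra sampling steps revisit and fix earlier mistakes.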
As shown in Fig. 1, \(\Psi\)-samplers consistently improve with more sampling steps (unlike ancestral sampling which plateaus), and Duo++ outperforms MDLM on both text (OpenWebText) and image (CIFAR-10) generation.
Duo++ exploits the sparsity of the low-temperature softmax used by Duo. By simulating only the top-\(k\) entries using order statistics (\(k\) as small as 2), we reduce peak memory by 33% (94 GiB → 63 GiB) and end-to-end training time by 25%, while matching the perplexity and downstream accuracy of Duo.
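The sparsity being exploited can be seen in a deliberately simplified numpy sketch: at low temperature, essentially all of the softmax mass sits on the top-\(k\) logits, so normalizing over just those \(k\) entries is nearly lossless. The function name and signature here are hypothetical, and note the paper's curriculum goes further, simulating the top-\(k\) entries directly via order statistics without ever materializing the dense vocabulary-sized softmax.

```python
import numpy as np

def topk_low_temp_softmax(logits, k=2, temperature=0.05):
    """Sparse approximation of a low-temperature softmax: keep only the
    top-k logits and renormalize over them, treating the remaining
    entries as (near) zero mass. Returns (indices, probs)."""
    idx = np.argpartition(logits, -k, axis=-1)[..., -k:]  # top-k indices
    top = np.take_along_axis(logits, idx, axis=-1) / temperature
    top -= top.max(axis=-1, keepdims=True)                # numerical stability
    probs = np.exp(top)
    probs /= probs.sum(axis=-1, keepdims=True)
    return idx, probs

idx, probs = topk_low_temp_softmax(np.array([3.0, 0.1, 2.9, -1.0]))
```

Only `k` probabilities per token are stored instead of a full vocabulary-sized row, which is where the memory saving comes from.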
@inproceedings{deschenaux2026the,
  title={The Diffusion Duality, Chapter {II}: $\Psi$-Samplers and Efficient Curriculum},
  author={Justin Deschenaux and Caglar Gulcehre and Subham Sekhar Sahoo},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=RSIoYWIzaP}
}