The Diffusion Duality, Chapter II:
\( \Psi \)-Samplers and Efficient Curriculum

1EPFL, Lausanne     2Cornell Tech, NY
ICLR 2026
Psi-Samplers schematic: Predictor step followed by Corrector step for discrete diffusion
Duo++ generation animation showing parallel token refinement
MDLM generation animation showing masked token unmasking
GPT-2 autoregressive generation animation

Key Contributions

  1. \(\Psi\)-Samplers: A family of Predictor-Corrector samplers for discrete diffusion that generalize prior methods and apply to any noise process. With these, we outperform MDLM on both text and image generation.
  2. Inference-time scaling: Unlike ancestral sampling which plateaus, \(\Psi\)-samplers continue to improve with more sampling steps.
  3. Efficient Curriculum: 33% less memory and 25% faster training by exploiting softmax sparsity: only \(k\) embeddings needed (as few as 2).

Concurrently, Sahoo et al. (2026) show that Duo surpasses autoregressive models at the 1.7B scale on math reasoning (GSM8K).

\(\Psi\)-Samplers

Standard discrete diffusion uses ancestral sampling: at each step, tokens are updated using the reverse posterior.

  • For Masked diffusion, once a token is unmasked it can never be corrected.
  • For Uniform-state diffusion, quality plateaus in the high NFE regime.

We introduce \(\Psi\)-posteriors, superpositions of a Predictor (the reverse posterior) and a Corrector (the forward process), which preserve the diffusion marginals while enabling error correction.
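To make the predictor-corrector alternation concrete, here is a minimal toy sketch for uniform-state discrete diffusion. Everything in it is an illustrative stand-in rather than the paper's \(\Psi\)-posterior: `toy_denoiser` replaces a trained network, and the keep probability \((t - \Delta t)/t\) and corrector `strength` are hypothetical choices. The point is the structure: a predictor step toward the data, followed by a forward-process corrector that re-noises a few tokens so earlier errors can be revisited.

```python
import numpy as np

rng = np.random.default_rng(0)
V, L = 8, 16          # toy vocabulary size and sequence length

def toy_denoiser(x_t, t):
    """Stand-in for a trained denoiser: per-token logits over the
    vocabulary. A real model would condition on x_t and t."""
    return rng.normal(size=(len(x_t), V))

def predictor_step(x_t, t, dt):
    """Predictor: sample a clean-token estimate from the model, and
    keep each token noisy with probability ~ remaining noise level."""
    logits = toy_denoiser(x_t, t)
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)
    x0_hat = np.array([rng.choice(V, p=p) for p in probs])
    keep_noisy = rng.random(len(x_t)) < (t - dt) / t
    return np.where(keep_noisy, x_t, x0_hat)

def corrector_step(x, t, strength=0.1):
    """Corrector: apply the forward (uniform) process to a small
    fraction of tokens so the predictor can revisit earlier errors."""
    renoise = rng.random(len(x)) < strength * t
    return np.where(renoise, rng.integers(0, V, size=len(x)), x)

# Alternate predictor and corrector from pure noise (t = 1) down to t = 0.
steps = 8
x = rng.integers(0, V, size=L)
for i in range(steps):
    t, dt = 1.0 - i / steps, 1.0 / steps
    x = predictor_step(x, t, dt)
    x = corrector_step(x, t - dt)
```

In pure ancestral sampling the corrector step is absent, which is why a token finalized early can never be revisited; the corrector is what lets extra sampling steps keep paying off.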

As shown in Fig. 1, \(\Psi\)-samplers consistently improve with more sampling steps (unlike ancestral sampling which plateaus), and Duo++ outperforms MDLM on both text (OpenWebText) and image (CIFAR-10) generation.

Efficient Curriculum

Efficient curriculum diagram
(Top) Duo uses linear combinations of all \(K\) embeddings. (Bottom) Duo++ exploits softmax sparsity, simulating only the top-\(k\) entries.

Duo++ exploits the sparsity of the low-temperature softmax used by Duo. By simulating only the top-\(k\) entries using order statistics (\(k\) as small as 2), we reduce peak memory by 33% (94 GiB → 63 GiB) and end-to-end training time by 25%, while matching the perplexity and downstream accuracy of Duo.
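The sparsity being exploited can be sketched numerically. This toy example (not the paper's implementation: the paper samples the top-\(k\) entries via order statistics, whereas here a couple of dominant logits are simply hand-set) shows that at low temperature, mixing only the top-\(k\) embeddings reproduces the full \(K\)-way softmax combination almost exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
K, d, tau, k = 1000, 64, 0.05, 2   # vocab size, embed dim, temperature, top-k

E = rng.normal(size=(K, d))        # toy token-embedding table
logits = rng.normal(scale=0.1, size=K)
logits[42], logits[7] = 3.0, 2.8   # a few entries dominate, as under light noise

# Dense path (Duo-style): low-temperature softmax over all K entries,
# then a convex combination of all K embeddings.
z = logits / tau
p = np.exp(z - z.max())
p /= p.sum()
x_dense = p @ E

# Sparse path (Duo++-style): at low temperature nearly all softmax mass
# sits on the top-k logits, so mix only those k embeddings.
top = np.argpartition(logits, -k)[-k:]
zk = logits[top] / tau
pk = np.exp(zk - zk.max())
pk /= pk.sum()
x_sparse = pk @ E[top]

err = np.linalg.norm(x_dense - x_sparse)   # negligible at this temperature
```

Since only \(k\) embedding rows are ever gathered and mixed, the memory and compute of the curriculum scale with \(k\) rather than the vocabulary size \(K\), which is the source of the reported savings.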

BibTeX

@inproceedings{
  deschenaux2026the,
  title={The Diffusion Duality, Chapter {II}: $\Psi$-Samplers and Efficient Curriculum},
  author={Justin Deschenaux and Caglar Gulcehre and Subham Sekhar Sahoo},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=RSIoYWIzaP}
}