Discrete Diffusion Reading Group

Exploring diffusion-based generative models on discrete spaces.

Meets Mondays at 1pm ET / 10am PT / 7pm CET / 10:30pm IST.

Latest Sessions

S18 | Language Modeling with Spherical Geometry
1:26:34
May 25, 2026

Justin Deschenaux (EPFL) and Jannis Chemseddine (TU Berlin) present their recent work on hyperspherical language modeling. By lifting tokens onto the sphere, they define language flows along spherical linear interpolation (SLERP) and von Mises-Fisher (vMF) paths. The vMF path admits a closed-form score, hyperspherical flows improve code generation over prior flow language models, and, at a matched number of function evaluations (NFE), the predictor-corrector (PC) sampler with vMF paths improves accuracy on Sudoku.

Discrete diffusion samples from a factorized distribution that is strictly less expressive than autoregressive models, while Flow Language Models (FLMs) avoid factorized sampling but add Gaussian noise on one-hot vectors, a noise process with no clear semantic interpretation for text. Both papers explore the sphere as a natural geometry for language flows. By lifting tokens onto the hypersphere, the authors develop spherical language modeling via SLERP and vMF paths. The vMF path has a closed-form conditional score, enabling predictor-corrector samplers on the sphere. Training with rotations avoids materializing one-hot vectors, making it cheaper than standard FLM training. On TinyGSM (code generation), prior FLMs reach roughly 0% accuracy while hyperspherical flows reach 12-18%. At matched NFE, a PC sampler with vMF paths improves accuracy on Sudoku. Joint work with Caglar Gulcehre, Gregor Kornhardt, and Gabriele Steidl.
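For readers new to SLERP, the sketch below shows generic spherical linear interpolation between two unit vectors. It is a minimal illustration, not the authors' implementation: the function names `normalize` and `slerp`, and the idea of interpolating between a noise point and a token embedding, are assumptions made for this example. How the papers actually lift tokens onto the sphere is not specified here.

```python
import math

def normalize(v):
    """Project a nonzero vector onto the unit sphere."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def slerp(x0, x1, t, eps=1e-8):
    """Spherical linear interpolation between unit vectors x0 and x1.

    t = 0 returns x0, t = 1 returns x1, and intermediate points stay
    on the unit sphere along the great-circle arc between them.
    """
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(x0, x1))))
    omega = math.acos(dot)  # angle between the endpoints
    if omega < eps:
        return list(x0)  # endpoints coincide, nothing to interpolate
    if math.pi - omega < eps:
        # Antipodal points have no unique great-circle arc.
        raise ValueError("slerp is undefined for antipodal points")
    s = math.sin(omega)
    w0 = math.sin((1.0 - t) * omega) / s
    w1 = math.sin(t * omega) / s
    return [w0 * a + w1 * b for a, b in zip(x0, x1)]

# Hypothetical usage: a noise point and a token embedding, both on the sphere.
noise = normalize([0.3, -1.2, 0.5])
token = normalize([1.0, 0.2, 0.1])
halfway = slerp(noise, token, 0.5)
```

Unlike adding Gaussian noise to one-hot vectors, every intermediate point here has unit norm, which is the geometric property the spherical formulation relies on.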

S17 | IDLM: Inverse-distilled Diffusion Language Models
1:26:26
May 18, 2026

IDLM extends Inverse Distillation to discrete diffusion language models, with a uniqueness theorem and gradient-stable relaxations that enable effective training, significantly reducing the number of inference steps while preserving the teacher model's entropy and generative perplexity.

Diffusion Language Models (DLMs) have recently achieved strong results in text generation, but their multi-step sampling makes inference slow and limits practical use. This work extends Inverse Distillation, a technique originally developed for accelerating continuous diffusion models, to the discrete setting. The extension raises two challenges: the inverse distillation objective lacks uniqueness guarantees, which can yield suboptimal solutions, and backpropagation through discrete sampling is non-trivial and often unstable. The authors first prove that their inverse formulation admits a unique solution, ensuring valid optimization, and then introduce gradient-stable relaxations that make training effective. Experiments across multiple DLMs show that IDLM significantly reduces the number of inference steps while preserving the teacher model's entropy and generative perplexity.

S16 | Unifying Masked Diffusion Models with Various Generation Orders and Beyond
38:56
May 11, 2026

Order-expressive masked diffusion (OeMDM) unifies masked diffusion, autoregressive, and block diffusion models in a single framework, and its extension LoMDM jointly learns the generation order and diffusion backbone end-to-end, outperforming prior discrete diffusion baselines on language modeling benchmarks.

Masked diffusion models (MDMs) are a promising alternative to autoregressive models for language generation, but their quality depends critically on the generation order. Prior work either hard-codes an ordering (e.g., blockwise left-to-right) or learns an ordering policy on top of a pretrained MDM, incurring extra cost and suboptimal two-stage optimization. This paper introduces the order-expressive masked diffusion model (OeMDM), a framework spanning a broad class of diffusion processes with various generation orders that subsumes MDMs, autoregressive models, and block diffusion as special cases. Building on OeMDM, the learnable-order masked diffusion model (LoMDM) jointly learns the generation ordering and the diffusion backbone from scratch with a single objective, enabling context-dependent token ordering at sampling time. Empirically, LoMDM outperforms a range of discrete diffusion baselines across multiple language modeling benchmarks.
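To make the notion of a generation order concrete, the toy sketch below lists the positions of a sequence in the order they would be unmasked under three fixed schemes. It is not the OeMDM or LoMDM formulation: the function `unmasking_order`, its `mode` strings, and the simplification that a vanilla MDM reveals one position at a time in a uniformly random order are all assumptions made for illustration.

```python
import random

def unmasking_order(seq_len, mode, block_size=1, seed=0):
    """Return the order in which positions 0..seq_len-1 are unmasked.

    mode = "autoregressive": strictly left to right.
    mode = "masked": a uniformly random permutation (simplified MDM).
    mode = "block": blocks left to right, random order inside each block.
    """
    rng = random.Random(seed)
    positions = list(range(seq_len))
    if mode == "autoregressive":
        return positions
    if mode == "masked":
        rng.shuffle(positions)
        return positions
    if mode == "block":
        order = []
        for start in range(0, seq_len, block_size):
            block = positions[start:start + block_size]
            rng.shuffle(block)  # order within a block is unconstrained
            order.extend(block)
        return order
    raise ValueError(f"unknown mode: {mode}")

ar_order = unmasking_order(6, "autoregressive")
mdm_order = unmasking_order(6, "masked")
block_order = unmasking_order(6, "block", block_size=2)
```

In this toy setting, `block_size=1` recovers the left-to-right order and `block_size=seq_len` recovers a fully random order, which mirrors how block diffusion sits between the two extremes. A learned, context-dependent order would replace the fixed rules above with a policy that depends on the tokens generated so far.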

Featured Videos

How did diffusion LLMs get so fast?
22:14
February 9, 2026

Techniques for accelerating diffusion LLMs, from self-distillation and curriculum learning to KV caching and block diffusion

This video discusses techniques for making diffusion LLMs faster, including self-distillation through time, curriculum learning, confidence scores for unmasking, guided diffusion (FlashDLM), approximate KV caching (dLLM-Cache, dKV-Cache), and block diffusion.
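As one illustration of the confidence-score idea, the sketch below performs a single unmasking step that commits only the masked positions where the model is most confident. The exact rule discussed in the video is not specified here; using the maximum predicted probability as the confidence score, and the names `MASK` and `unmask_most_confident`, are assumptions for this example.

```python
MASK = None  # hypothetical sentinel for a masked position

def unmask_most_confident(tokens, probs, k):
    """Fill in the k masked positions with the highest confidence.

    tokens: list of token ids, with MASK at masked positions.
    probs:  probs[i] is the predicted distribution over token ids at
            position i (ignored for positions that are already filled).
    k:      number of positions to commit in this step.
    """
    candidates = []
    for i, tok in enumerate(tokens):
        if tok is MASK:
            best = max(range(len(probs[i])), key=lambda v: probs[i][v])
            # Confidence is taken to be the top predicted probability.
            candidates.append((probs[i][best], i, best))
    candidates.sort(reverse=True)  # most confident first
    out = list(tokens)
    for _, i, best in candidates[:k]:
        out[i] = best
    return out

tokens = [MASK, 2, MASK, MASK]
probs = [
    [0.1, 0.6, 0.3],     # position 0: moderately confident in token 1
    [0.0, 0.0, 1.0],     # position 1: already filled, ignored
    [0.05, 0.05, 0.9],   # position 2: very confident in token 2
    [0.4, 0.3, 0.3],     # position 3: uncertain, stays masked
]
step = unmask_most_confident(tokens, probs, k=2)
```

Committing several confident positions per step is what lets a sampler use fewer model calls than one token per call, at the risk of errors when the confident predictions are not actually independent.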

But How Do Diffusion Language Models Actually Work?
12:27
August 3, 2025

Jia-Bin Huang explores several ideas for applying diffusion models to language modeling

Most large language models (LLMs) today are autoregressive (i.e., they predict text left to right). Diffusion models, by contrast, offer iterative refinement, flexible control, and faster sampling. In this video, we explore several ideas for applying diffusion models to language modeling.

Simple Diffusion Language Models
15:07
July 3, 2024

Quick introduction to Masked Diffusion Language Models (MDLM) by Alexander Rush

About the Reading Group

Diffusion LLMs are faster, more controllable successors to traditional LLMs and are rapidly gaining adoption. This reading group builds a community for exchanging and debating emerging ideas in this space. While our primary focus is discrete diffusion models for language, we also welcome work on other modalities and applications, such as molecular design, drug discovery, and beyond.

Meet the Organizers

Subham Sekhar Sahoo

Holds a PhD from Cornell Tech, where he specialized in diffusion language models. He has made foundational contributions to the field, with his work deployed at scale by Google, NVIDIA, and ByteDance across language generation and drug discovery.

Justin Deschenaux

PhD student in Machine Learning at EPFL, advised by Prof. Caglar Gulcehre. Previously interned at Apple MLR. His research interests include diffusion language models, fast generative models, and generalization.

Zhihan Yang

PhD student at Cornell CS. Previously completed Bachelor's degrees in Mathematics and Statistics at Carleton College. He is a winner of the CRA Outstanding Undergraduate Researcher Award, and his research focuses on principled, controllable, and efficient generative models.