Preprint, Working Paper. Year: 2025

GaussianDiffusion: Learning Image Generation Process in Gaussian Representation Space

Abstract

Diffusion models have become a leading approach in generative image modeling, but many still operate in dense pixel space, a representation that is computationally intensive and lacks geometric structure. We propose GaussianDiffusion, a framework that performs the denoising process entirely in a latent space composed of 2D Gaussians. Each image is encoded as a set of 150 anisotropic Gaussian splats, parameterized by position, covariance, and color. To model their dynamics, we introduce GaussianTransformer, a permutation-equivariant transformer that serves as the denoising network. Evaluated on the MNIST and Sprites datasets, our method achieves visual quality comparable to a pixel-space U-Net baseline, while reducing the number of sampling steps from 1000 to 200 and the per-step cost from 11.4 GFLOPs to 4 GFLOPs, resulting in an overall 22× improvement in generation time on an A100 GPU. In contrast to latent diffusion models, our approach does not require an auxiliary autoencoder and preserves full editability of the latent representation. These findings suggest that structured geometric representations can offer efficient and interpretable alternatives to latent- and pixel-based diffusion.
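The abstract describes each image as a set of anisotropic 2D Gaussian splats, each carrying a position, a covariance, and a color. The paper's actual renderer is not reproduced on this page, so the following is only an illustrative sketch of what decoding such a representation back to pixels could look like: the function name `render_gaussians`, the additive compositing, and all parameter shapes are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def render_gaussians(positions, covariances, colors, height, width):
    """Splat a set of anisotropic 2D Gaussians onto an RGB canvas.

    positions:   (N, 2) array of (x, y) centers in pixel coordinates
    covariances: (N, 2, 2) array of 2x2 covariance matrices
    colors:      (N, 3) array of RGB values in [0, 1]

    NOTE: this is a hypothetical sketch; the paper's renderer,
    normalization, and compositing rule may differ.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    grid = np.stack([xs, ys], axis=-1).astype(np.float64)  # (H, W, 2)
    image = np.zeros((height, width, 3))
    for mu, sigma, rgb in zip(positions, covariances, colors):
        diff = grid - mu                      # offset of each pixel from center
        inv = np.linalg.inv(sigma)
        # squared Mahalanobis distance -> unnormalized Gaussian weight per pixel
        m = np.einsum('hwi,ij,hwj->hw', diff, inv, diff)
        weight = np.exp(-0.5 * m)             # (H, W)
        image += weight[..., None] * rgb      # simple additive compositing
    return np.clip(image, 0.0, 1.0)
```

A single splat with identity covariance, for instance, contributes full color at its center pixel and decays smoothly with distance; the covariance matrix controls the elongation and orientation of each splat.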


Dates and versions

hal-05522830, version 1 (23-02-2026)

Identifiers

  • HAL Id: hal-05522830, version 1

Cite

Simon Coessens, Arijit Samal, Akash Malhotra, Nacéra Bennacer Seghouani. GaussianDiffusion: Learning Image Generation Process in Gaussian Representation Space. 2025. ⟨hal-05522830⟩
