Preprint, Working Paper. Year: 2025

GaussianDiffusion: Learning Image Generation Process in Gaussian Representation Space

Abstract

Diffusion models have become a leading approach in generative image modeling, but many still operate in dense pixel space, a representation that is computationally intensive and lacks geometric structure. We propose GaussianDiffusion, a framework that performs the denoising process entirely in a latent space composed of 2D Gaussians. Each image is encoded as a set of 150 anisotropic Gaussian splats, parameterized by position, covariance, and color. To model their dynamics, we introduce GaussianTransformer, a permutation-equivariant transformer that serves as the denoising network. Evaluated on the MNIST and Sprites datasets, our method achieves visual quality comparable to a pixel-space U-Net baseline, while reducing the number of sampling steps from 1000 to 200 and the per-step cost from 11.4 GFLOPs to 4 GFLOPs, resulting in an overall 22× improvement in generation time on an A100 GPU. In contrast to latent diffusion models, our approach does not require an auxiliary autoencoder and preserves full editability of the latent representation. These findings suggest that structured geometric representations can offer efficient and interpretable alternatives to latent- and pixel-based diffusion.
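The abstract describes each image as a set of anisotropic 2D Gaussian splats, each carrying a position, a covariance, and a color. The paper's actual renderer is not reproduced on this page, so the following is only an illustrative sketch of what decoding such a representation back to pixels could look like: the function name `render_gaussians`, the additive compositing, and all parameter shapes are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def render_gaussians(positions, covariances, colors, height, width):
    """Splat a set of anisotropic 2D Gaussians onto an RGB canvas.

    positions:   (N, 2) array of (x, y) centers in pixel coordinates
    covariances: (N, 2, 2) array of 2x2 covariance matrices
    colors:      (N, 3) array of RGB values in [0, 1]

    NOTE: this is a hypothetical sketch; the paper's renderer,
    normalization, and compositing rule may differ.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    grid = np.stack([xs, ys], axis=-1).astype(np.float64)  # (H, W, 2)
    image = np.zeros((height, width, 3))
    for mu, sigma, rgb in zip(positions, covariances, colors):
        diff = grid - mu                      # offset of each pixel from center
        inv = np.linalg.inv(sigma)
        # squared Mahalanobis distance -> unnormalized Gaussian weight per pixel
        m = np.einsum('hwi,ij,hwj->hw', diff, inv, diff)
        weight = np.exp(-0.5 * m)             # (H, W)
        image += weight[..., None] * rgb      # simple additive compositing
    return np.clip(image, 0.0, 1.0)
```

A single splat with identity covariance, for instance, contributes full color at its center pixel and decays smoothly with distance; the covariance matrix controls the elongation and orientation of each splat.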


Dates and versions

hal-05522830, version 1 (23-02-2026)

Identifiers

  • HAL Id: hal-05522830, version 1

Cite

Simon Coessens, Arijit Samal, Akash Malhotra, Nacéra Bennacer Seghouani. GaussianDiffusion: Learning Image Generation Process in Gaussian Representation Space. 2025. ⟨hal-05522830⟩
