Poster
Addressing Representation Collapse in Vector Quantized Models with One Linear Layer
Yongxin Zhu · Bocheng Li · Yifei Xin · Zhihua Xia · Linli Xu
Vector Quantization (VQ) is a widely used method for converting continuous representations into discrete codes, and it has become fundamental in unsupervised representation learning. However, VQ models are often hindered by representation collapse in the latent space, which leads to low codebook utilization and limits the scalability of the codebook for large-scale training. Existing methods for mitigating representation collapse typically rely on complex optimization strategies or reduce the dimensionality of the latent space at the expense of model capacity, neither of which fully resolves the core issue. In this study, we analyze representation collapse in VQ models and identify its primary cause as the disjoint optimization of the codebook, where only a small subset of code vectors is updated through gradient descent. To address this issue, we propose SimVQ, a novel method that reparameterizes the code vectors through a linear transformation layer applied to a learnable latent basis. This reparameterization optimizes the entire linear space spanned by the codebook, rather than merely updating the individual code vectors selected by nearest-neighbor search in vanilla VQ models. Although composing two linear layers is mathematically equivalent to a single linear layer, this approach works surprisingly well at resolving the collapse issue in VQ models with just one linear layer. We validate the efficacy of SimVQ through extensive experiments across various modalities, including image and audio data, with different model architectures. The results show that SimVQ not only effectively addresses representation collapse but is also highly adaptable and easy to implement, suggesting broad applicability in diverse machine learning contexts.
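The following is a minimal PyTorch sketch of the reparameterization idea the abstract describes, not the authors' released implementation: the codebook is never stored directly, but is computed as a latent basis passed through one linear layer, so a gradient on any selected code updates the linear map and thereby the whole spanned codebook. Class and parameter names (`SimVQSketch`, `basis`, `linear`) and details such as the loss terms and whether the basis is frozen are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimVQSketch(nn.Module):
    """Illustrative sketch: codebook reparameterized via one linear layer."""

    def __init__(self, num_codes: int, dim: int):
        super().__init__()
        # Latent basis C: one row per code (learnable here; freezing it
        # is a possible design choice not confirmed by the abstract).
        self.basis = nn.Parameter(torch.randn(num_codes, dim))
        # The single linear layer W that reparameterizes the codebook.
        self.linear = nn.Linear(dim, dim, bias=False)

    def forward(self, z: torch.Tensor):
        # z: (batch, dim) encoder outputs.
        codebook = self.linear(self.basis)          # (num_codes, dim)
        # Nearest-neighbor search, as in vanilla VQ.
        dists = torch.cdist(z, codebook)            # (batch, num_codes)
        idx = dists.argmin(dim=-1)
        z_q = codebook[idx]
        # Straight-through estimator: gradients copy back to the encoder,
        # while the codebook side trains through `self.linear`, moving
        # every code vector in the spanned space, not just the selected ones.
        z_st = z + (z_q - z).detach()
        # Standard VQ-style commitment/codebook losses (assumed form).
        loss = F.mse_loss(z, z_q.detach()) + F.mse_loss(z_q, z.detach())
        return z_st, idx, loss
```

In vanilla VQ, only the rows gathered by `idx` would receive gradients; here the gradient flows through `self.linear`, so a single update shifts the entire linear space the codebook spans, which is the mechanism the abstract credits for avoiding collapse.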