Poster
SCFlow: Implicitly Learning Style and Content Disentanglement with Flow Models
Pingchuan Ma · Xiaopei Yang · Ming Gui · Yusong Li · Felix Krause · Johannes Schusterbauer · Björn Ommer
Abstract:
The human perception of style and content is inherently subjective and varies widely. Likewise, computer vision models learn diverse latent representations of these attributes. While generative models focus on stylization and content transfer, discriminative approaches aim to capture effective representations of style and content. However, explicitly defining these attributes remains inherently difficult. To address this, we propose a method that implicitly discovers style and content representations within a semantically rich, compact space, avoiding spatial token constraints. Leveraging flow matching, our framework effectively separates style and content without predefined definitions, offering a structured yet flexible representation that can be applied directly to any precomputed CLIP embeddings. To further facilitate this, we have curated a dataset of $510{,}000$ samples ($51$ styles $\times$ $10{,}000$ content samples) for training and evaluating our model. While our method provides a strong foundation for representation learning, it is also adaptable to controllable generation tasks. We demonstrate that our implicitly learned style and content representations generalize well to ImageNet-1k and WikiArt in a zero-shot fashion, and we showcase promising visual results across various styles and contents. We will release the code and the curated dataset.
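The abstract mentions applying flow matching to precomputed CLIP embeddings. As a rough illustration only (not the authors' architecture or training setup), the following sketch shows the standard conditional flow-matching target on an embedding vector: sample a noise endpoint and a time, interpolate linearly, and regress the constant velocity of that path. The 512-dimensional vector here is a stand-in for a real CLIP embedding, and `flow_matching_pair` is a hypothetical helper name.

```python
# Hypothetical sketch of a flow-matching regression target on a precomputed
# embedding (NOT the paper's method; a generic conditional flow-matching step).
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_pair(x1, rng):
    """Return (x_t, t, target velocity) for one data embedding x1."""
    x0 = rng.standard_normal(x1.shape)   # noise endpoint of the path
    t = rng.uniform()                    # time sampled uniformly in [0, 1]
    x_t = (1.0 - t) * x0 + t * x1        # linear interpolation path
    v_target = x1 - x0                   # constant velocity of that path
    return x_t, t, v_target

# A random 512-d vector stands in for a real CLIP image embedding.
x1 = rng.standard_normal(512)
x_t, t, v = flow_matching_pair(x1, rng)
```

A model trained on such pairs would regress `v` from `(x_t, t)`; note the identity `x_t + (1 - t) * v == x1`, i.e. following the predicted velocity to time 1 recovers the data point.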