Poster
CycleVAR: Repurposing Autoregressive Model for Unsupervised One-Step Image Translation
Yi Liu · Shengqian Li · Zuzeng Lin · Feng Wang · Si Liu
Current conditional autoregressive image generation methods have shown promising results, yet their potential remains largely unexplored in the practical unsupervised image translation domain, which operates without explicit cross-domain correspondences. A critical limitation stems from the discrete quantization inherent in traditional Vector Quantization-based frameworks, which disrupts gradient flow between the Variational Autoencoder decoder and the causal Transformer, impeding end-to-end optimization during adversarial training in image space. To tackle this issue, we propose Softmax Relaxed Quantization, a novel approach that reformulates codebook selection as a continuous probability-mixing process via Softmax, thereby preserving gradient propagation. Building upon this differentiable foundation, we introduce CycleVAR, which reformulates image-to-image translation as image-conditional visual autoregressive generation by injecting multi-scale source image tokens as contextual prompts, analogous to prefix-based conditioning in language models. CycleVAR exploits two modes to generate the target image tokens: (1) serial multi-step generation, enabling iterative refinement across scales, and (2) parallel one-step generation, synthesizing outputs at all resolutions in a single forward pass. Experimental findings indicate that the parallel one-step generation mode attains superior translation quality with faster inference than the serial multi-step mode in unsupervised scenarios. Furthermore, both quantitative and qualitative results indicate that CycleVAR surpasses previous state-of-the-art unsupervised image translation models, e.g., CycleGAN-Turbo.
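To make the core idea concrete, below is a minimal PyTorch-style sketch of what a Softmax-relaxed quantizer could look like: instead of the non-differentiable nearest-neighbor codebook lookup of standard VQ, every codebook entry is mixed with softmax weights derived from negative distances, so gradients can flow through the selection. The class name, temperature `tau`, codebook size, and distance-based logits are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftmaxRelaxedQuantizer(nn.Module):
    """Differentiable stand-in for hard VQ codebook lookup (illustrative sketch).

    Rather than selecting the single closest codebook entry (a non-differentiable
    argmin), it forms a softmax-weighted mixture of all entries, so gradients
    reach both the encoder features and the codebook during adversarial training.
    """

    def __init__(self, num_codes: int = 1024, dim: int = 256, tau: float = 1.0):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)  # hypothetical size
        self.tau = tau  # softmax temperature; lower -> closer to hard selection

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, tokens, dim) continuous encoder features.
        # Squared Euclidean distance to every codebook entry via the
        # standard expansion ||z||^2 - 2 z.e + ||e||^2.
        z2 = (z ** 2).sum(-1, keepdim=True)                 # (batch, tokens, 1)
        e2 = (self.codebook.weight ** 2).sum(-1)            # (num_codes,)
        d = z2 - 2 * z @ self.codebook.weight.t() + e2      # (batch, tokens, num_codes)
        probs = F.softmax(-d / self.tau, dim=-1)            # selection probabilities
        return probs @ self.codebook.weight                 # soft codebook mixture

# Usage: quantize tokens while keeping the whole graph differentiable.
quant = SoftmaxRelaxedQuantizer()
tokens = torch.randn(2, 16, 256, requires_grad=True)
mixed = quant(tokens)        # same shape as tokens
mixed.sum().backward()       # gradients reach tokens and the codebook
```

In the hard-VQ limit (tau → 0) the mixture collapses to nearest-neighbor selection, which is why a moderate temperature is what keeps the decoder-to-Transformer gradient path open.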