

Poster

YOLO-Count: Differentiable Object Counting for Text-to-Image Generation

Guanning Zeng · Xiang Zhang · Zirui Wang · Haiyang Xu · Zeyuan Chen · Bingnan Li · Zhuowen Tu


Abstract:

We propose YOLO-Count, a differentiable open-vocabulary object counting model that addresses general counting challenges and enables training-free quantity control for text-to-image (T2I) generation. A key contribution is the "cardinality" map, a novel regression target designed to account for variations in object size and location. By employing representation alignment and a hybrid supervision scheme, YOLO-Count minimizes the discrepancy between open-vocabulary counting and T2I generation control. The model's differentiable architecture enables gradient-based optimization for accurate object counts, leading to enhanced controllability and transparency in T2I systems. Our empirical evaluation demonstrates state-of-the-art counting accuracy and effective quantity control for T2I generation tasks.
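To illustrate the idea of using a differentiable counter for training-free quantity control, here is a minimal sketch of one gradient-guided update. It assumes a hypothetical `yolo_count(image, prompt)` that returns a differentiable cardinality map whose sum approximates the object count, and a hypothetical differentiable `decode_fn` from latents to an image; neither name comes from the paper, and the actual YOLO-Count interface and guidance procedure may differ.

```python
import torch

def quantity_guided_step(latents, decode_fn, yolo_count, prompt, target_count, lr=0.05):
    """One gradient step nudging latents toward an image with `target_count` objects.

    decode_fn: differentiable decoder mapping latents -> image tensor (assumed interface).
    yolo_count: differentiable counter returning a cardinality map (assumed interface).
    """
    latents = latents.detach().requires_grad_(True)
    image = decode_fn(latents)                    # differentiable render of the current sample
    cardinality_map = yolo_count(image, prompt)   # per-location cardinality predictions
    pred_count = cardinality_map.sum()            # summing the map gives the estimated count
    loss = (pred_count - target_count) ** 2       # penalize deviation from the desired count
    loss.backward()                               # gradients flow through the counter and decoder
    with torch.no_grad():
        latents = latents - lr * latents.grad     # move latents toward the target count
    return latents.detach(), pred_count.item()
```

In this sketch the only requirement on the counter is that its count estimate be differentiable with respect to the image, which is the property the abstract highlights; the update could be applied at selected steps of a diffusion sampling loop rather than to a single static latent.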
