

Poster

YOLO-Count: Differentiable Object Counting for Text-to-Image Generation

Guanning Zeng · Xiang Zhang · Zirui Wang · Haiyang Xu · Zeyuan Chen · Bingnan Li · Zhuowen Tu


Abstract:

We propose YOLO-Count, a differentiable open-vocabulary object counting model that addresses general counting challenges and enables training-free quantity control for text-to-image (T2I) generation. A key contribution is the "cardinality" map, a novel regression target designed to account for variations in object size and location. By employing representation alignment and a hybrid supervision scheme, YOLO-Count minimizes the discrepancy between open-vocabulary counting and T2I generation control. The model's differentiable architecture enables gradient-based optimization for accurate object counts, leading to enhanced controllability and transparency in T2I systems. Our empirical evaluation demonstrates state-of-the-art counting accuracy and effective quantity control for T2I generation tasks.
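To illustrate the idea of using a differentiable counter for training-free quantity control, here is a minimal sketch of one gradient-guided update. It assumes a hypothetical `yolo_count(image, prompt)` that returns a differentiable cardinality map whose sum approximates the object count, and a hypothetical differentiable `decode_fn` from latents to an image; neither name comes from the paper, and the actual YOLO-Count interface and guidance procedure may differ.

```python
import torch

def quantity_guided_step(latents, decode_fn, yolo_count, prompt, target_count, lr=0.05):
    """One gradient step nudging latents toward an image with `target_count` objects.

    decode_fn: differentiable decoder mapping latents -> image tensor (assumed interface).
    yolo_count: differentiable counter returning a cardinality map (assumed interface).
    """
    latents = latents.detach().requires_grad_(True)
    image = decode_fn(latents)                    # differentiable render of the current sample
    cardinality_map = yolo_count(image, prompt)   # per-location cardinality predictions
    pred_count = cardinality_map.sum()            # summing the map gives the estimated count
    loss = (pred_count - target_count) ** 2       # penalize deviation from the desired count
    loss.backward()                               # gradients flow through the counter and decoder
    with torch.no_grad():
        latents = latents - lr * latents.grad     # move latents toward the target count
    return latents.detach(), pred_count.item()
```

In this sketch the only requirement on the counter is that its count estimate be differentiable with respect to the image, which is the property the abstract highlights; the update could be applied at selected steps of a diffusion sampling loop rather than to a single static latent.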
