ICCV 2025 Accepted Papers
Privacy-centric Deep Motion Retargeting for Anonymization of Skeleton-Based Motion Visualization
Thomas Carr · Depeng Xu · Shuhan Yuan · Aidong Lu
|
||
Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics
Taowen Wang · Cheng Han · James Liang · Wenhao Yang · Dongfang Liu · Luna Zhang · Qifan Wang · Jiebo Luo · Ruixiang Tang
|
||
Voyaging into Unbounded Dynamic Scenes from a Single View
Fengrui Tian · Tianjiao Ding · Jinqi Luo · Hancheng Min · Rene Vidal
|
||
AdvDreamer Unveils: Are Vision-Language Models Truly Ready for Real-World 3D Variations?
Shouwei Ruan · Hanqing Liu · Yao Huang · Xiaoqi Wang · Caixin KANG · Hang Su · Yinpeng Dong · Xingxing Wei
|
||
NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models
Sung-Yeon Park · Can Cui · Yunsheng Ma · Ahmadreza Moradipari · Rohit Gupta · Kyungtae Han · Ziran Wang
|
||
TRCE: Towards Reliable Malicious Concept Erasure in Text-to-Image Diffusion Models
Ruidong Chen · honglin guo · Lanjun Wang · Chenyu Zhang · Weizhi Nie · Anan Liu
|
||
Training-Free Text-Guided Image Editing with Visual Autoregressive Model
Yufei Wang · Lanqing Guo · Zhihao Li · Jiaxing Huang · Pichao WANG · Bihan Wen · Jian Wang
|
||
When Schrödinger Bridge Meets Real-World Image Dehazing with Unpaired Training
Yunwei Lan · Zhigao Cui · Xin Luo · Chang Liu · Nian Wang · Menglin Zhang · Yanzhao Su · Dong Liu
|
||
MetaMorph: Multimodal Understanding and Generation via Instruction Tuning
Shengbang Tong · David Fan · Jiachen Zhu · Yunyang Xiong · Xinlei Chen · Koustuv Sinha · Michael Rabbat · Yann LeCun · Saining Xie · Zhuang Liu
|
||
Text Embedding Knows How to Quantize Text-Guided Diffusion Models
Hongjae Lee · Myungjun Son · Dongjea Kang · Seung-Won Jung
|
||
Feature Decomposition-Recomposition in Large Vision-Language Model for Few-Shot Class-Incremental Learning
Zongyao Xue · Meina Kan · Shiguang Shan · Xilin Chen
|
||
Training-Free Industrial Defect Generation with Diffusion Models
Ruyi Xu · Yen-Tzu Chiu · Tai-I Chen · Oscar Chew · Yung-Yu Chuang · Wen-Huang Cheng
|
||
Auto-Regressive Transformation for Image Alignment
Kanggeon Lee · Soochahn Lee · Kyoung Mu Lee
|
||
UniOcc: A Unified Benchmark for Occupancy Forecasting and Prediction in Autonomous Driving
Yuping Wang · Xiangyu Huang · Xiaokang Sun · Mingxuan Yan · Shuo Xing · Zhengzhong Tu · Jiachen Li
|
||
HyperGCT: A Dynamic Hyper-GNN-Learned Geometric Constraint for 3D Registration
Xiyu Zhang · Jiayi Ma · Jianwei Guo · Wei Hu · Zhaoshuai Qi · Fei HUI · Jiaqi Yang · Yanning Zhang
|
||
Reverse Convolution and Its Applications to Image Restoration
Xuhong Huang · Shiqi Liu · Kai Zhang · Ying Tai · Jian Yang · Hui Zeng · Lei Zhang
|
||
SAMO: A Lightweight Sharpness-Aware Approach for Multi-Task Optimization with Joint Global-Local Perturbation
Hao Ban · Gokul Ram Subramani · Kaiyi Ji
|
||
Retinex-MEF: Retinex-based Glare Effects Aware Unsupervised Multi-Exposure Image Fusion
Haowen Bai · Jiangshe Zhang · Zixiang Zhao · Lilun Deng · Yukun Cui · Shuang Xu
|
||
GeoFormer: Geometry Point Encoder for 3D Object Detection with Graph-based Transformer
Xin Jin · Haisheng Su · Cong Ma · Kai Liu · Wei Wu · Fei HUI · Junchi Yan
|
||
PriOr-Flow: Enhancing Primitive Panoramic Optical Flow with Orthogonal View
Longliang Liu · Miaojie Feng · Junda Cheng · Jijun Xiang · Xuan Zhu · Xin Yang
|
||
Understanding Personal Concept in Open-Vocabulary Semantic Segmentation
Sunghyun Park · Jungsoo Lee · Shubhankar Borse · Munawar Hayat · Sungha Choi · Kyuwoong Hwang · Fatih Porikli
|
||
Lark: Low-Rank updates after knowledge localization for Few-shot Class-Incremental Learning
Jinxin Shi · Jiabao Zhao · Yifan Yang · Xingjiao Wu · Jiawen Li · Liang He
|
||
FreeFlux: Understanding and Exploiting Layer-Specific Roles in RoPE-Based MMDiT for Versatile Image Editing
Tianyi Wei · Yifan Zhou · Dongdong Chen · Xingang Pan
|
||
INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance
Chenwei Lin · Hanjia Lyu · Xian Xu · Jiebo Luo
|
||
Statistical Confidence Rescoring for Robust 3D Scene Graph Generation from Multi-View Images
Qi Xun Yeo · Yanyan Li · Gim Hee Lee
|
||
Learning Large Motion Estimation from Intermediate Representations with a High-Resolution Optical Flow Dataset Featuring Long-Range Dynamic Motion
Hoonhee Cho · Yuhwan Jeong · Kuk-Jin Yoon
|
||
VSP: Diagnosing the Dual Challenges of Perception and Reasoning in Spatial Planning Tasks for MLLMs
Qiucheng Wu · Handong Zhao · Michael Saxon · Trung Bui · William Yang Wang · Yang Zhang · Shiyu Chang
|
||
Rethinking Multi-modal Object Detection from the Perspective of Mono-Modality Feature Learning
Tianyi Zhao · Boyang Liu · Yanglei Gao · Yiming Sun · Maoxun Yuan · Xingxing Wei
|
||
Scaling and Taming Adversarial Training with Synthetic Data
Juntao Wu · Xianting Huang · Yu Chen · Shuai Pang · Ke Wang
|
||
Beyond the Destination: A Novel Benchmark for Exploration-Aware Embodied Question Answering
Kaixuan Jiang · Yang Liu · Weixing Chen · Jingzhou Luo · Ziliang Chen · Ling Pan · Guanbin Li · Liang Lin
|
||
DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving
Xuemeng Yang · Licheng Wen · Tiantian Wei · Yukai Ma · Jianbiao Mei · Xin Li · Wenjie Lei · Daocheng Fu · Pinlong Cai · Min Dou · Liang He · Yong Liu · Botian Shi · Yu Qiao
|
||
MVQA: Mamba with Unified Sampling for Efficient Video Quality Assessment
Yachun Mi · Yu Li · Weicheng Meng · Chaofeng Chen · Chen Hui · Shaohui Liu
|
||
Omni-scene Perception-oriented Point Cloud Geometry Enhancement for Coordinate Quantization
Wang Liu · Wei Gao
|
||
MorphoGen: Efficient Unconditional Generation of Long-Range Projection Neuronal Morphology via a Global-to-Local Framework
Tianfang Zhu · Hongyang Zhou · Anan LI
|
||
SAUCE: Selective Concept Unlearning in Vision-Language Models with Sparse Autoencoders
Jiahui Geng · Qing Li
|
||
TikZero: Zero-Shot Text-Guided Graphics Program Synthesis
Jonas Belouadi · Eddy Ilg · Margret Keuper · Hideki Tanaka · Masao Utiyama · Raj Dabre · Steffen Eger · Simone Paolo Ponzetto
|
||
DiffSim: Taming Diffusion Models for Evaluating Visual Similarity
Yiren Song · Xiaokang Liu · Mike Zheng Shou
|
||
Semantic Equitable Clustering: A Simple and Effective Strategy for Clustering Vision Tokens
Qihang Fan · Huaibo Huang · Mingrui Chen · Ran He
|
||
FaceXFormer: A Unified Transformer for Facial Analysis
Kartik Narayan · Vibashan VS · Rama Chellappa · Vishal Patel
|
||
Learning to Generalize without Bias for Open-Vocabulary Action Recognition
Yating Yu · Congqi Cao · Yifan Zhang · Yanning Zhang
|
||
Perceive, Understand and Restore: Real-World Image Super-Resolution with Autoregressive Multimodal Generative Models
Hongyang Wei · Shuaizheng Liu · Chun Yuan · Lei Zhang
|
||
IMoRe: Implicit Program-Guided Reasoning for Human Motion Q&A
Chen Li · Chinthani Sugandhika · Ee Yeo Keat · Eric Peh · Hao Zhang · HONG YANG · Deepu Rajan · Basura Fernando
|
||
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation
Wenhao Wang · Yi Yang
|
||
VideoVAE+: Large Motion Video Autoencoding with Cross-modal Video VAE
Yazhou Xing · Yang Fei · Yingqing He · Jingye Chen · Jiaxin Xie · Xiaowei Chi · Qifeng Chen
|
||
Exploring View Consistency for Scene-Adaptive Low-Light Light Field Image Enhancement
Shuo Zhang · Chen Gao · Youfang Lin
|
||
Unleashing Vectset Diffusion Model for Fast Shape Generation
Zeqiang Lai · Zhao Yunfei · Zibo Zhao · Haolin Liu · Fu-Yun Wang · Huiwen Shi · Xianghui Yang · Qingxiang Lin · Jingwei Huang · Lliu Yuhong · Jie Jiang · Chunchao Guo · Xiangyu Yue
|
||
Task-Oriented Human Grasp Synthesis via Context- and Task-Aware Diffusers
An Lun Liu · Yu-Wei Chao · Yi-Ting Chen
|
||
Visual Intention Grounding for Egocentric Assistant
Pengzhan Sun · Junbin Xiao · Tze Ho Elden Tse · Yicong Li · Arjun Akula · Angela Yao
|
||
GSRecon: Efficient Generalizable Gaussian Splatting for Surface Reconstruction from Sparse Views
Hang Yang · Le Hui · Jianjun Qian · Jin Xie · Jian Yang
|
||
VoteSplat: Hough Voting Gaussian Splatting for 3D Scene Understanding
Minchao Jiang · Shunyu Jia · Jiaming Gu · Xiaoyuan Lu · Guangming Zhu · Anqi Dong · zhang liang
|
||
VTimeCoT: Thinking by Drawing for Video Temporal Grounding and Reasoning
Jinglei Zhang · Yuanfan Guo · Rolandos Alexandros Potamias · Jiankang Deng · Hang Xu · Chao Ma
|
||
EgoMusic-driven Human Dance Motion Estimation with Skeleton Mamba
Quang Nguyen · Nhat Le · Baoru Huang · Minh VU · Chengcheng Tang · Van Nguyen · Ngan Le · Thieu Vo · Anh Nguyen
|
||
Automated Model Evaluation for Object Detection via Prediction Consistency and Reliability
Seungju Yoo · Hyuk Kwon · Joong-Won Hwang · Kibok Lee
|
||
LazyMAR: Accelerating Masked Autoregressive Models via Feature Caching
Feihong Yan · qingyan wei · Jiayi Tang · Jiajun Li · Yulin Wang · Xuming Hu · Huiqi Li · Linfeng Zhang
|
||
RIPE: Reinforcement Learning on Unlabeled Image Pairs for Robust Keypoint Extraction
Johannes Künzel · Anna Hilsmann · Peter Eisert
|
||
PropVG: End-to-End Proposal-Driven Visual Grounding with Multi-Granularity Discrimination
Ming Dai · Wenxuan Cheng · Jiedong Zhuang · Jiang-Jiang Liu · Hongshen Zhao · Zhenhua Feng · Wankou Yang
|
||
Unified Video Generation via Next-Set Prediction in Continuous Domain
Zhanzhou Feng · Qingpei Guo · Xinyu Xiao · Ruihan Xu · Ming Yang · Shiliang Zhang
|
||
$G^{2}D$: Boosting Multimodal Learning with Gradient-Guided Distillation
Mohammed Rakib · Arunkumar Bagavathi
|
||
Text-guided Visual Prompt DINO for Generic Segmentation
Yuchen Guan · Chong Sun · Canmiao Fu · Zhipeng Huang · Chun Yuan · Chen Li
|
||
Low-Light Image Enhancement using Event-Based Illumination Estimation
Lei Sun · Yuhan Bao · Jiajun Zhai · Jingyun Liang · YULUN ZHANG · Kaiwei Wang · Danda Pani Paudel · Luc Van Gool
|
||
SAS: Segment Any 3D Scene with Integrated 2D Priors
Zhuoyuan Li · Jiahao Lu · Jiacheng Deng · Hanzhi Chang · Lifan Wu · Yanzhe Liang · Tianzhu Zhang
|
||
MP-HSIR: A Multi-Prompt Framework for Universal Hyperspectral Image Restoration
Zhehui Wu · Yong Chen · Naoto Yokoya · Wei He
|
||
FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors
Yabo Zhang · xinpeng zhou · Yihan Zeng · Hang Xu · Hui Li · Wangmeng Zuo
|
||
MMAD: Multi-label Micro-Action Detection in Videos
Kun Li · pengyu Liu · Dan Guo · Fei Wang · zhiliang wu · Hehe Fan · Meng Wang
|
||
UniCombine: Unified Multi-Conditional Combination with Diffusion Transformer
Haoxuan Wang · Jinlong Peng · Qingdong He · Hao Yang · Ying Jin · Jiafu Wu · Xiaobin Hu · Yanjie Pan · Zhenye Gan · Mingmin Chi · Bo Peng · Yabiao Wang
|
||
HyPiDecoder: Hybrid Pixel Decoder for Efficient Segmentation and Detection
Fengzhe Zhou · Humphrey Shi
|
||
MAVias: Mitigate any Visual Bias
Ioannis Sarridis · Christos Koutlis · Symeon Papadopoulos · Christos Diou
|
||
Joint Diffusion Models in Continual Learning
Paweł Skierś · Kamil Deja
|
||
Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations
Xiang Xu · Lingdong Kong · Song Wang · Chuanwei Zhou · Qingshan Liu
|
||
T2I-Copilot: A Training-Free Multi-Agent Text-to-Image System for Enhanced Prompt Interpretation and Interactive Generation
Chieh-Yun Chen · Min Shi · Gong Zhang · Humphrey Shi
|
||
ProSAM: Enhancing the Robustness of SAM-based Visual Reference Segmentation with Probabilistic Prompts
Xiaoqi Wang · Clint Sebastian · Wenbin He · Liu Ren
|
||
CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models
Quang-Binh Nguyen · Minh Luu · Quang Nguyen · Anh Tran · Khoi Nguyen
|
||
Partial Forward Blocking: A Novel Data Pruning Paradigm for Lossless Training Acceleration
Dongyue Wu · Zilin Guo · Jialong Zuo · Nong Sang · Changxin Gao
|
||
Less is More: Empowering GUI Agent with Context-Aware Simplification
Gongwei Chen · Xurui Zhou · Rui Shao · Yibo Lyu · Kaiwen Zhou · Shuai Wang · WenTao Li · Yinchuan Li · Zhongang Qi · Liqiang Nie
|
||
Background Invariance Testing According to Semantic Proximity
Zukang Liao · Min Chen
|
||
Bridging the Skeleton-Text Modality Gap: Diffusion-Powered Modality Alignment for Zero-shot Skeleton-based Action Recognition
Jeonghyeok Do · Munchurl Kim
|
||
Optical Model-Driven Sharpness Mapping for Autofocus in Small Depth-of-Field and Severe Defocus Scenarios
ChenLiang Fan · Mingpei Cao · Chih Hung · Yuesheng Zhu
|
||
ScanEdit: Hierarchically-Guided Functional 3D Scan Editing
Mohamed El Amine Boudjoghra · Ivan Laptev · Angela Dai
|
||
MotionAgent: Fine-grained Controllable Video Generation via Motion Field Agent
Xinyao Liao · Xianfang Zeng · Liao Wang · Gang YU · Guosheng Lin · Chi Zhang
|
||
MBTI: Masked Blending Transformers with Implicit Positional Encoding for Frame-rate Agnostic Motion Estimation
Jungwoo Huh · Yeseung Park · Seongjean Kim · Jungsu Kim · Sanghoon Lee
|
||
DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion
Maksim Siniukov · Di Chang · Minh Tran · Hongkun Gong · Ashutosh Chaubey · Mohammad Soleymani
|
||
PUMPS: Skeleton-Agnostic Point-based Universal Motion Pre-Training for Synthesis in Human Motion Tasks
Clinton A Mo · Kun Hu · Chengjiang Long · Dong Yuan · Wan-Chi Siu · Zhiyong Wang
|
||
Find Any Part in 3D
Ziqi Ma · Yisong Yue · Georgia Gkioxari
|
||
Inpaint4Drag: Drag-based Image Editing via Bidirectional Warping and Inpainting
Jingyi Lu · Kai Han
|
||
AutoComPose: Automatic Generation of Pose Transition Descriptions for Composed Pose Retrieval Using Multimodal LLMs
Yi-Ting Shen · Sungmin Eum · Doheon Lee · Rohit Shete · Chiao-Yi Wang · Heesung Kwon · Shuvra Bhattacharyya
|
||
$\textit{Revelio}$: Interpreting and leveraging semantic information in diffusion models
Dahye Kim · Xavier Thomas · Deepti Ghadiyaram
|
||
From Enhancement to Understanding: Build a Generalized Bridge for Low-light Vision via Semantically Consistent Unsupervised Fine-tuning
Sen Wang · Shao Zeng · Tianjun Gu · zhizhong zhang · Ruixin Zhang · Shouhong Ding · Jingyun Zhang · Jun Wang · Xin TAN · Yuan Xie · Lizhuang Ma
|
||
Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding
Yue Fan · Xiaojian Ma · Rongpeng Su · Jun Guo · Rujie Wu · Xi Chen · Qing Li
|
||
DreamCube: RGB-D Panorama Generation via Multi-plane Synchronization
Yukun Huang · Yanning Zhou · Jianan Wang · Kaiyi Huang · Xihui Liu
|
||
Latent Diffusion Models with Masked AutoEncoders
Junho Lee · Jeongwoo Shin · Hyungwook Choi · Joonseok Lee
|
||
ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Predictions
Dubing Chen · Jin Fang · Wencheng Han · Xinjing Cheng · Junbo Yin · Cheng-zhong Xu · Fahad Khan · Jianbing Shen
|
||
GameFactory: Creating New Games with Generative Interactive Videos
Jiwen Yu · Yiran Qin · Xintao Wang · Pengfei Wan · Di ZHANG · Xihui Liu
|
||
From Gaze to Movement: Predicting Visual Attention for Autonomous Driving Human-Machine Interaction based on Programmatic Imitation Learning
Yexin Huang · Yongbin Lin · Lishengsa Yue · Zhihong Yao · Jie Wang
|
||
Generative Zoo
Tomasz Niewiadomski · Anastasios Yiannakidis · Hanz Cuevas Velasquez · Soubhik Sanyal · Michael Black · Silvia Zuffi · Peter Kulits
|
||
Event-Driven Storytelling with Multiple Lifelike Humans in a 3D scene
Donggeun Lim · Jinseok Bae · Inwoo Hwang · Seungmin Lee · Hwanhee Lee · Young Kim Kim
|
||
DreamFuse: Adaptive Image Fusion with Diffusion Transformer
Junjia Huang · Pengxiang Yan · Jiyang Liu · Jie Wu · Zhao Wang · Yitong Wang · Liang Lin · Guanbin Li
|
||
Secure On-Device Video OOD Detection Without Backpropagation
Li Li · Peilin Cai · Yuxiao Zhou · Zhiyu Ni · Renjie Liang · QIN YOU · Yi Nian · Zhengzhong Tu · Xiyang Hu · Yue Zhao
|
||
Towards Fine-grained Interactive Segmentation in Images and Videos
Yuan Yao · Qiushi Yang · Miaomiao Cui · Liefeng Bo
|
||
PhysRig: Differentiable Physics-Based Skinning and Rigging Framework for Realistic Articulated Object Modeling
Hao Zhang · Haolan Xu · Chun Feng · Varun Jampani · Narendra Ahuja
|
||
From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning
Le Zhuo · Liangbing Zhao · Sayak Paul · Yue Liao · Renrui Zhang · Yi Xin · Peng Gao · Mohamed Elhoseiny · Hongsheng Li
|
||
PathDiff: Histopathology Image Synthesis with Unpaired Text and Mask Conditions
Mahesh Bhosale · Abdul Wasi · Yuanhao Zhai · Yunjie Tian · Samuel Border · Nan Xi · Pinaki Sarder · Junsong Yuan · David Doermann · Xuan Gong
|
||
From Prompt to Progression: Taming Video Diffusion Models for Seamless Attribute Transition
Ling Lo · Kelvin Chan · Wen-Huang Cheng · Ming-Hsuan Yang
|
||
Multi-Modal Few-Shot Temporal Action Segmentation
Zijia Lu · Ehsan Elhamifar
|
||
Future-Aware Interaction Network For Motion Forecasting
Shijie Li · Chunyu Liu · Xun Xu · Si Yong Yeo · Xulei Yang
|
||
VideoSetBench: Identifying and Reasoning Similarities and Differences in Similar Videos
YUE QIU · Yanjun Sun · Takuma Yagi · Shusaku Egami · Natsuki Miyata · Ken Fukuda · Kensho Hara · Ryusuke Sagawa
|
||
HazeFlow: Revisit Haze Physical Model as ODE and Realistic Non-Homogeneous Haze Generation for Real-World Dehazing
Junseong Shin · Seungwoo Chung · Yunjeong Yang · Tae Hyun Kim
|
||
Online Generic Event Boundary Detection
Hyung Rok Jung · Daneul Kim · Seunggyun Lim · Jeany Son · Jonghyun Choi
|
||
Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance
Li Hu · wang yuan · Zhen Shen · Xin Gao · Dechao Meng · Li'an Zhuo · Peng Zhang · Bang Zhang · Liefeng Bo
|
||
CHROME: Clothed Human Reconstruction with Occlusion-Resilience and Multiview-Consistency from a Single Image
Arindam Dutta · Meng Zheng · Zhongpai Gao · Benjamin Planche · Anwesa Choudhuri · Terrence Chen · Amit Roy-Chowdhury · Ziyan Wu
|
||
SAFER: Sharpness Aware layer-selective Finetuning for Enhanced Robustness in vision transformers
Bhavna Gopal · Huanrui Yang · Mark Horton · Yiran Chen
|
||
Flow Stochastic Segmentation Networks
Fabio De Sousa Ribeiro · Omar Todd · Charles Jones · Avinash Kori · Raghav Mehta · Ben Glocker
|
||
Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers
Weiming Ren · Wentao Ma · Huan Yang · Cong Wei · Ge Zhang · Wenhu Chen
|
||
VLDrive: Vision-Augmented Lightweight MLLMs for Efficient Language-grounded Autonomous Driving
Ruifei Zhang · Wei Zhang · Xiao Tan · Sibei Yang · Xiang Wan · Xiaonan Luo · Guanbin Li
|
||
Instance-Level Video Depth in Groups Beyond Occlusions
Yuan Liang · Yang Zhou · Ziming Sun · Tianyi Xiang · Guiqing Li · Shengfeng He
|
||
Wasserstein Style Distribution Analysis and Transform for Stylized Image Generation
Xi Yu · Xiang Gu · Zhihao Shi · Jian Sun
|
||
SynAD: Enhancing Real-World End-to-End Autonomous Driving Models through Synthetic Data Integration
Jongsuk Kim · Jae Young Lee · Gyojin Han · Dong-Jae Lee · Minki Jeong · Junmo Kim
|
||
Adversarial Robust Memory-Based Continual Learner
Xiaoyue Mi · Fan Tang · Zonghan Yang · Danding Wang · Juan Cao · Peng Li · Yang Liu
|
||
Open-World Skill Discovery from Unsegmented Demonstration Videos
Jingwen Deng · Zihao Wang · Shaofei Cai · Anji Liu · Yitao Liang
|
||
InsViE-1M: Effective Instruction-based Video Editing with Elaborate Dataset Construction
Yuhui WU · Liyi Chen · Ruibin Li · Shihao Wang · Chenxi Xie · Lei Zhang
|
||
Towards Open-World Generation of Stereo Images and Unsupervised Matching
Feng Qiao · Zhexiao Xiong · Eric Xing · Nathan Jacobs
|
||
Certifiably Optimal Anisotropic Rotation Averaging
Carl Olsson · Yaroslava Lochman · Johan Malmport · Christopher Zach
|
||
OpenRSD: Towards Open-prompts for Object Detection in Remote Sensing Images
Ziyue Huang · Yongchao Feng · Ziqi Liu · Shuai Yang · Qingjie Liu · Yunhong Wang
|
||
Calibrating MLLM-as-a-judge via Multimodal Bayesian Prompt Ensembles
Eric Slyman · Mehrab Tanjim · Kushal Kafle · Stefan Lee
|
||
Client2Vec: Improving Federated Learning by Distribution Shifts Aware Client Indexing
Yongxin Guo · Lin Wang · Xiaoying Tang · Tao Lin
|
||
Monocular Semantic Scene Completion via Masked Recurrent Networks
Xuzhi Wang · Xinran Wu · Song Wang · Lingdong Kong · Ziping Zhao
|
||
RayZer: A Self-supervised Large View Synthesis Model
Hanwen Jiang · Hao Tan · Peng Wang · Haian Jin · Yue Zhao · Sai Bi · Kai Zhang · Fujun Luan · Kalyan Sunkavalli · Qixing Huang · Georgios Pavlakos
|
||
Enhanced Pansharpening via Quaternion Spatial-Spectral Interactions
Dong Li · Chunhui Luo · Yuanfei Bao · Gang Yang · Jie Xiao · Xueyang Fu · Zheng-Jun Zha
|
||
InfiniDreamer: Arbitrarily Long Human Motion Generation via Segment Score Distillation
Wenjie Zhuo · Fan Ma · Hehe Fan
|
||
Focal Plane Visual Feature Generation and Matching on a Pixel Processor Array
Hongyi Zhang · Laurie Bose · Jianing Chen · Piotr Dudek · Walterio Mayol-Cuevas
|
||
Measuring the Impact of Rotation Equivariance on Aerial Object Detection
Xiuyu Wu · Xinhao Wang · Xiubin Zhu · Lan Yang · Jiyuan Liu · Xingchen Hu
|
||
MonoMVSNet: Monocular Priors Guided Multi-View Stereo Network
Jianfei Jiang · Qiankun Liu · Haochen Yu · Hongyuan Liu · Liyong Wang · Jiansheng Chen · Huimin Ma
|
||
LACONIC: A 3D Layout Adapter for Controllable Image Creation
Léopold Maillard · Tom Durand · Adrien RAHARY · Maks Ovsjanikov
|
||
Learning 3D Object Spatial Relationships from Pre-trained 2D Diffusion Models
Sangwon Baik · Hyeonwoo Kim · Hanbyul Joo
|
||
CT-ScanGaze: A Dataset and Baselines for 3D Volumetric Scanpath Modeling
Trong-Thang Pham · AKASH AWASTHI · Saba Khan · Esteban Marti · Tien-Phat Nguyen · Khoa Vo · Minh Tran · Ngoc Nguyen · Cuong Van · Yuki Ikebe · Anh Nguyen · Anh Nguyen · Zhigang Deng · Carol Wu · Hien Nguyen · Ngan Le
|
||
X-Capture: An Open-Source Portable Device for Multi-Sensory Learning
Samuel Clarke · Suzannah Wistreich · Yanjie Ze · Jiajun Wu
|
||
SAGI: Semantically Aligned and Uncertainty Guided AI Image Inpainting
Paschalis Giakoumoglou · Dimitrios Karageorgiou · Symeon Papadopoulos · Panagiotis Petrantonakis
|
||
Enhancing Image Restoration Transformer via Adaptive Translation Equivariance
JiaKui Hu · Zhengjian Yao · Lujia Jin · Hangzhou He · Yanye Lu
|
||
Region-aware Anchoring Mechanism for Efficient Referring Visual Grounding
Shuyi Ouyang · Ziwei Niu · Hongyi Wang · Yen-wei Chen · Lanfen Lin
|
||
Turbo2K: Towards Ultra-Efficient and High-Quality 2K Video Synthesis
Jingjing Ren · Wenbo Li · Zhongdao Wang · Haoze Sun · Bangzhen Liu · Haoyu Chen · Jiaqi Xu · Aoxue Li · Shifeng Zhang · Bin Shao · Yong Guo · Lei Zhu
|
||
A Token-level Text Image Foundation Model for Document Understanding
Tongkun Guan · Zining Wang · Pei Fu · Zhentao Guo · Wei Shen · Kai zhou · Tiezhu Yue · Chen Duan · Hao Sun · Qianyi Jiang · Junfeng Luo · Xiaokang Yang
|
||
Improving Noise Efficiency in Privacy-preserving Dataset Distillation
Runkai Zheng · Vishnu Dasu · Yinong Wang · Haohan Wang · Fernando De la Torre
|
||
Vision-Language Neural Graph Featurization for Extracting Retinal Lesions
Taimur Hassan · Anabia Sohail · Muzammal Naseer · Naoufel Werghi
|
||
DepthSync: Diffusion Guidance-Based Depth Synchronization for Scale- and Geometry-Consistent Video Depth Estimation
Yue-Jiang Dong · Wang Zhao · Jiale Xu · Ying Shan · Song-Hai Zhang
|
||
VLRMBench: A Comprehensive and Challenging Benchmark for Vision-Language Reward Models
JIACHENG RUAN · Wenzhen Yuan · Xian Gao · Ye Guo · Daoxin Zhang · Zhe Xu · Yao Hu · Ting Liu · yuzhuo fu
|
||
LDIP: Long Distance Information Propagation for Video Super-Resolution
Michael Bernasconi · Abdelaziz Djelouah · Yang Zhang · Markus Gross · Christopher Schroers
|
||
Blind Noisy Image Deblurring Using Residual Guidance Strategy
heyan liu · Jianing Sun · Jun Liu · Xi-Le Zhao · Tingting WU · Tieyong Zeng
|
||
Who is a Better Talker: Subjective and Objective Quality Assessment for AI-Generated Talking Heads
Yingjie Zhou · Jiezhang Cao · Zicheng Zhang · Farong Wen · Jiang Yanwei · Jun Jia · Xiaohong Liu · Xiongkuo Min · Guangtao Zhai
|
||
Hierarchical Variational Test-Time Prompt Generation for Zero-Shot Generalization
Zhaoyang Wu · Fang Liu · Licheng Jiao · Shuo Li · Lingling Li · Xu Liu · Puhua Chen · wenping ma
|
||
End-to-End Entity-Predicate Association Reasoning for Dynamic Scene Graph Generation
LiWei Wang · YanDuo Zhang · Tao Lu · Fang Liu · Huiqin Zhang · Jiayi Ma · Huabing Zhou
|
||
TurboVSR: Fantastic Video Upscalers and Where to Find Them
Zhongdao Wang · Guodongfang Zhao · Jingjing Ren · bailan feng · Shifeng Zhang · Wenbo Li
|
||
STDDNet: Harnessing Mamba for Video Polyp Segmentation via Spatial-aligned Temporal Modeling and Discriminative Dynamic Representation Learning
Guilian Chen · Huisi Wu · Jing Qin
|
||
Where am I? Cross-View Geo-localization with Natural Language Descriptions
Junyan Ye · Honglin Lin · Leyan Ou · Dairong Chen · Zihao Wang · Qi Zhu · Conghui He · Weijia Li
|
||
ReconDreamer++: Harmonizing Generative and Reconstructive Models for Driving Scene Representation
Guosheng Zhao · Xiaofeng Wang · Chaojun Ni · Zheng Zhu · Wenkang Qin · Guan Huang · Xingang Wang
|
||
WorldScore: Unified Evaluation Benchmark for World Generation
Haoyi Duan · Hong-Xing Yu · Sirui Chen · Li Fei-Fei · Jiajun Wu
|
||
Diffusion-based Source-biased Model for Single Domain Generalized Object Detection
Jiang Han · Wenfei Yang · Tianzhu Zhang · Yongdong Zhang
|
||
EventUPS: Uncalibrated Photometric Stereo Using an Event Camera
Jinxiu Liang · Bohan Yu · Siqi Yang · Haotian Zhuang · Jieji Ren · Peiqi Duan · Boxin Shi
|
||
The Curse of Conditions: Analyzing and Improving Optimal Transport for Conditional Flow-Based Generation
Ho Kei Cheng · Alex Schwing
|
||
Riemannian-Geometric Fingerprints of Generative Models
Hae Jin Song · Laurent Itti
|
||
Towards a 3D Transfer-based Black-box Attack via Critical Feature Guidance
Shuchao Pang · Zhenghan Chen · Shen Zhang · Liming Lu · Siyuan Liang · Anan Du · Yongbin Zhou
|
||
Constraint-Aware Feature Learning for Parametric Point Cloud
Xi Cheng · Ruiqi Lei · Di Huang · Zhichao Liao · Fengyuan Piao · Yan Chen · Pingfa Feng · Long ZENG
|
||
PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity
Kwanyoung Kim · Byeongsu Sim
|
||
Taming the Untamed: Graph-Based Knowledge Retrieval and Reasoning for MLLMs to Conquer the Unknown
Bowen Wang · Zhouqiang Jiang · Yasuaki Susumu · Shotaro Miwa · Tianwei Chen · Yuta Nakashima
|
||
CoTMR: Chain-of-Thought Multi-Scale Reasoning for Training-Free Zero-Shot Composed Image Retrieval
Zelong Sun · Dong Jing · Zhiwu Lu
|
||
V2M4: 4D Mesh Animation Reconstruction from a Single Monocular Video
Jianqi Chen · Biao Zhang · Xiangjun Tang · Peter Wonka
|
||
PolarAnything: Diffusion-based Polarimetric Image Synthesis
Kailong Zhang · Youwei Lyu · Heng Guo · Si Li · Zhanyu Ma · Boxin Shi
|
||
Polarimetric Neural Field with Unified Complex-Valued Wavefunction
Chu Zhou · Yixin Yang · Junda Liao · Heng Guo · Boxin Shi · Imari Sato
|
||
RAGDiffusion: Faithful Cloth Generation via External Knowledge Assimilation
Yuhan Li · Xianfeng Tan · Wenxiang Shang · Yubo Wu · Jian Wang · Xuanhong Chen · Yi Zhang · Zhu Hangcheng · Bingbing Ni
|
||
Exploring Multimodal Diffusion Transformers for Enhanced Prompt-based Image Editing
Joonghyuk Shin · Alchan Hwang · Yujin Kim · Daneul Kim · Jaesik Park
|
||
Long-Context State-Space Video World Models
Ryan Po · Yotam Nitzan · Richard Zhang · Berlin Chen · Tri Dao · Eli Shechtman · Gordon Wetzstein · Xun Huang
|
||
Open-ended Hierarchical Streaming Video Understanding with Vision Language Models
Hyolim Kang · YUNSU PARK · Youngbeom Yoo · Yeeun Choi · Seon Joo Kim
|
||
Bilateral Collaboration with Large Vision-Language Models for Open Vocabulary Human-Object Interaction Detection
Yupeng Hu · Changxing Ding · Chang Sun · Shaoli Huang · Xiangmin Xu
|
||
DisenQ: Disentangling Q-Former for Activity-Biometrics
Shehreen Azad · Yogesh Rawat
|
||
InterSyn: Interleaved Learning for Dynamic Motion Synthesis in the Wild
Yiyi Ma · Yuanzhi Liang · Xiu Li · Chi Zhang · Xuelong Li
|
||
Epipolar Consistent Attention Aggregation Network for Unsupervised Light Field Disparity Estimation
Chen Gao · Shuo Zhang · Youfang Lin
|
||
GARF: Learning Generalizable 3D Reassembly for Real-World Fractures
Sihang Li · Zeyu Jiang · Grace Chen · Chenyang Xu · Siqi Tan · Xue Wang · Irving Fang · Kristof Zyskowski · Shannon McPherron · Radu Iovita · Chen Feng · Jing Zhang
|
||
AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations
Junli Liu · Qizhi Chen · Zhigang Wang · Yiwen Tang · Yiting Zhang · Chi Yan · Dong Wang · Xuelong Li · Bin Zhao
|
||
Diving into the Fusion of Monocular Priors for Generalized Stereo Matching
Chengtang Yao · Lidong Yu · Zhidan Liu · Jiaxi Zeng · Yuwei Wu · Yunde Jia
|
||
DICE: Staleness-Centric Optimizations for Parallel Diffusion MoE Inference
Jiajun Luo · Lizhuo Luo · Jianru Xu · Jiajun Song · Rongwei Lu · Chen Tang · Zhi Wang
|
||
Emulating Self-attention with Convolution for Efficient Image Super-Resolution
Dongheon Lee · Seokju Yun · Youngmin Ro
|
||
BlinkTrack: Feature Tracking over 80 FPS via Events and Images
Yichen Shen · Yijin Li · Shuo Chen · Guanglin Li · Zhaoyang Huang · Hujun Bao · Zhaopeng Cui · Guofeng Zhang
|
||
GeoDistill: Geometry-Guided Self-Distillation for Weakly Supervised Cross-View Localization
Shaowen Tong · Zimin Xia · Alexandre Alahi · Xuming He · Yujiao Shi
|
||
Image-Guided Shape-from-Template Using Mesh Inextensibility Constraints
Dinh-Vinh-Thuy Tran · Ruochen Chen · Shaifali Parashar
|
||
Sequential Gaussian Avatars with Hierarchical Motion Context
Wangze Xu · Yifan Zhan · Zhihang Zhong · Xiao Sun
|
||
Attention to the Burstiness in Visual Prompt Tuning!
Yuzhu Wang · Manni Duan · Shu Kong
|
||
Federated Representation Angle Learning
Liping Yi · Han Yu · Gang Wang · xiaoguang Liu · Xiaoxiao Li
|
||
InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models
Cong Wei · Yujie Zhong · yingsen zeng · Haoxian Tan · Yong Liu · Hongfa Wang · Yujiu Yang
|
||
How To Make Your Cell Tracker Say "I dunno!"
Richard D Paul · Johannes Seiffarth · David Rügamer · Hanno Scharr · Katharina Nöh
|
||
Visual Surface Wave Tomography: Revealing Subsurface Physical Properties via Visible Surface Waves
Alexander Ogren · Berthy Feng · Jihoon Ahn · Katherine Bouman · Chiara Daraio
|
||
Mind the Cost of Scaffold! Benign Clients May Even Become Accomplices of Backdoor Attack
Xingshuo Han · Xuanye Zhang · Xiang Lan · Haozhao Wang · Shengmin Xu · Shen Ren · Jason Zeng · Ming Wu · Michael Heinrich · Tianwei Zhang
|
||
YOLOE: Real-Time Seeing Anything
Ao Wang · Lihao Liu · Hui Chen · Zijia Lin · Jungong Han · Guiguang Ding
|
||
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation
Shiqi Huang · Shuting He · Huaiyuan Qin · Bihan Wen
|
||
MinCD-PnP: Learning 2D-3D Correspondences with Approximate Blind PnP
Pei An · Jiaqi Yang · Muyao Peng · You Yang · Qiong Liu · Xiaolin Wu · Liangliang Nan
|
||
SegmentDreamer: Towards High-fidelity Text-to-3D Synthesis with Segmented Consistency Trajectory Distillation
Jiahao Zhu · Zixuan Chen · Guangcong Wang · Xiaohua Xie · Yi Zhou
|
||
CWNet: Causal Wavelet Network for Low-Light Image Enhancement
Tongshun Zhang · Pingping Liu · Yubing Lu · Mengen Cai · Zijian Zhang · Zhe Zhang · Qiuzhan Zhou
|
||
MGSR: 2D/3D Mutual-boosted Gaussian Splatting for High-fidelity Surface Reconstruction under Various Light Conditions
Qingyuan Zhou · Yuehu Gong · Weidong Yang · Jiaze Li · Yeqi Luo · Baixin Xu · Shuhao Li · Ben Fei · Ying He
|
||
Enhancing Adversarial Transferability by Balancing Exploration and Exploitation with Gradient-Guided Sampling
Zenghao Niu · Weicheng Xie · Siyang Song · Zitong YU · Feng Liu · Linlin Shen
|
||
Streamlining Image Editing with Layered Diffusion Brushes
Peyman Gholami · Robert Xiao
|
||
GroundFlow: A Plug-in Module for Temporal Reasoning on 3D Point Cloud Sequential Grounding
Zijun Lin · Shuting He · Cheston Tan · Bihan Wen
|
||
FedMVP: Federated Multimodal Visual Prompt Tuning for Vision-Language Models
Mainak Singha · Subhankar Roy · Sarthak Mehrotra · Ankit Jha · Moloud Abdar · Biplab Banerjee · Elisa Ricci
|
||
QK-Edit: Revisiting Attention-based Injection in MM-DiT for Image and Video Editing
Tiancheng SHEN · Jun Hao Liew · Zilong Huang · Xiangtai Li · Zhijie Lin · Jiyang Liu · Yitong Wang · Jiashi Feng · Ming-Hsuan Yang
|
||
MotionShot: Adaptive Motion Transfer across Arbitrary Objects for Text-to-Video Generation
Yanchen Liu · Yanan SUN · Zhening Xing · Junyao Gao · Kai Chen · Wenjie Pei
|
||
Sat2City: 3D City Generation from A Single Satellite Image with Cascaded Latent Diffusion
Tongyan Hua · Lutao Jiang · Ying-Cong Chen · Wufan Zhao
|
||
Stable Diffusion Models are Secretly Good at Visual In-Context Learning
Trevine Oorloff · Vishwanath S · Wele Gedara Chaminda Bandara · Ali Shafahi · Amin Ghiasi · Charan Prakash · Reza Ardekani
|
||
MergeOcc: Bridge the Domain Gap between Different LiDARs for Robust Occupancy Prediction
Zikun Xu · Shaobing Xu
|
||
Intra-modal and Cross-modal Synchronization for Audio-visual Deepfake Detection and Temporal Localization
Ashutosh Anshul · Shreyas Gopal · Deepu Rajan · Eng Chng
|
||
Joint Learning of Pose Regression and Denoising Diffusion with Score Scaling Sampling for Category-level 6D Pose Estimation
Seunghyun Lee · Tae-Kyun Kim
|
||
A Linear N-Point Solver for Structure and Motion from Asynchronous Tracks
Hang Su · Yunlong Feng · Daniel Gehrig · Panfeng Jiang · Ling Gao · Xavier Lagorce · Laurent Kneip
|
||
ATCTrack: Aligning Target-Context Cues with Dynamic Target States for Robust Vision-Language Tracking
Xiaokun Feng · Shiyu Hu · Xuchen Li · Dailing Zhang · Meiqi Wu · Jing Zhang · Xiaotang Chen · Kaiqi Huang
|
||
MUNBa: Machine Unlearning via Nash Bargaining
Jing Wu · Mehrtash Harandi
|
||
PASG: A Closed-Loop Framework for Automated Geometric Primitive Extraction and Semantic Anchoring in Robotic Manipulation
Zhihao ZHU · Yifan Zheng · Siyu Pan · Yaohui Jin · Yao Mu
|
||
NegRefine: Refining Negative Label-Based Zero-Shot OOD Detection
Amirhossein Ansari · Ke Wang · Pulei Xiong
|
||
LLaVA-KD: A Framework of Distilling Multimodal Large Language Models
Yuxuan Cai · Jiangning Zhang · Haoyang He · Xinwei He · Ao Tong · Zhenye Gan · Chengjie Wang · Zhucun Xue · Yong Liu · Xiang Bai
|
||
MoGA: 3D Generative Avatar Prior for Monocular Gaussian Avatar Reconstruction
Zijian Dong · Longteng Duan · Jie Song · Michael Black · Andreas Geiger
|
||
MM-IFEngine: Towards Multimodal Instruction Following
Shengyuan Ding · Wu Shenxi · Xiangyu Zhao · Yuhang Zang · Haodong Duan · Xiaoyi Dong · Pan Zhang · Yuhang Cao · Dahua Lin · Jiaqi Wang
|
||
Can Knowledge be Transferred from Unimodal to Multimodal? Investigating the Transitivity of Multimodal Knowledge Editing
Lingyong Fang · Xinzhong Wang · Depeng Wang · Zongru Wu · Ya Guo · Huijia Zhu · Zhuosheng Zhang · Gongshen Liu
|
||
DASH: Detection and Assessment of Systematic Hallucinations of VLMs
Maximilian Augustin · Yannic Neuhaus · Matthias Hein
|
||
FedAGC: Federated Continual Learning with Asymmetric Gradient Correction
Chengchao Zhang · Fanhua Shang · Hongying Liu · Liang Wan · Wei Feng
|
||
PASTA: Part-Aware Sketch-to-3D Shape Generation with Text-Aligned Prior
Seunggwan Lee · Hwanhee Jung · ByoungSoo Koh · Qixing Huang · Sang Yoon · Sangpil Kim
|
||
ViSpeak: Visual Instruction Feedback in Streaming Videos
Shenghao Fu · Qize Yang · Yuan-Ming Li · Yi-Xing Peng · Kun-Yu Lin · Xihan Wei · Jian-Fang Hu · Xiaohua Xie · Wei-Shi Zheng
|
||
Prototype-based Contrastive Learning with Stage-wise Progressive Augmentation for Self-Supervised Fine-Grained Learning
BaoFeng Tan · Xiu-Shen Wei · Lin Zhao
|
||
Extrapolated Urban View Synthesis Benchmark
Xiangyu Han · Zhen Jia · Boyi Li · Yan Wang · Boris Ivanovic · Yurong You · Lingjie Liu · Yue Wang · Marco Pavone · Chen Feng · Yiming Li
|
||
From Imitation to Innovation: The Emergence of AI's Unique Artistic Styles and the Challenge of Copyright Protection
Zexi Jia · Chuanwei Huang · Hongyan Fei · Yeshuang Zhu · Zhiqiang Yuan · Ying Deng · Jiapei Zhang · Jinchao Zhang · Jie Zhou
|
||
RogSplat: Robust Gaussian Splatting via Generative Priors
Hanyang Kong · Xingyi Yang · Xinchao Wang
|
||
Split-and-Combine: Enhancing Style Augmentation for Single Domain Generalization
Lichuan Gu · Shuai Yang · Qianlong Dang · Zhize Wu · LiChuan Gu
|
||
FonTS: Text Rendering With Typography and Style Controls
Wenda SHI · Yiren Song · Dengming Zhang · Jiaming Liu · XINGXING ZOU
|
||
SuperEvent: Cross-Modal Learning of Event-based Keypoint Detection for SLAM
Yannick Burkhardt · Simon Schaefer · Stefan Leutenegger
|
||
SG-LDM: Semantic-Guided LiDAR Generation via Latent-Aligned Diffusion
Zhengkang Xiang · Zizhao Li · Amir Khodabandeh · Kourosh Khoshelham
|
||
Processing and acquisition traces in visual encoders: What does CLIP know about your camera?
Ryan Ramos · Vladan Stojnić · Giorgos Kordopatis-Zilos · Yuta Nakashima · Giorgos Tolias · Noa Garcia
|
||
QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning
Haoxuan Wang · Yuzhang Shang · Zhihang Yuan · Junyi Wu · Junchi Yan · Yan Yan
|
||
ROVI: A VLM-LLM Re-Captioned Dataset for Open-Vocabulary Instance-Grounded Text-to-Image Generation
Cihang Peng · Qiming HOU · Zhong Ren · Kun Zhou
|
||
Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities
Liuyi Wang · Xinyuan Xia · Hui Zhao · Hanqing Wang · Tai Wang · Yilun Chen · Chengju Liu · Qijun Chen · Jiangmiao Pang
|
||
UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence
Jie Feng · Shengyuan Wang · Tianhui Liu · Yanxin Xi · Yong Li
|
||
FOLDER: Accelerating Multi-Modal Large Language Models with Enhanced Performance
Haicheng Wang · Zhemeng Yu · Gabriele Spadaro · Chen Ju · Victor Quétu · Shuai Xiao · Enzo Tartaglione
|
||
Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion
shengyuan zhang · An Zhao · Ling Yang · Zejian Li · Chenye Meng · Haoran Xu · Tianrun Chen · AnYang Wei · Perry GU · Lingyun Sun
|
||
On the Recovery of Cameras from Fundamental Matrices
Rakshith Madhavan · Federica Arrigoni
|
||
UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing
Tsu-Jui Fu · Yusu Qian · Chen Chen · Wenze Hu · Zhe Gan · Yinfei Yang
|
||
LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models
Haiwen Huang · Anpei Chen · Volodymyr Havrylov · Andreas Geiger · Dan Zhang
|
||
LHM: Animatable Human Reconstruction from a Single Image in One Second
Lingteng Qiu · Xiaodong Gu · Peihao Li · Qi Zuo · Weichao Shen · Junfei Zhang · Kejie Qiu · Weihao Yuan · Guanying Chen · Zilong Dong · Liefeng Bo
|
||
AnimateAnyMesh: A Feed-Forward 4D Foundation Model for Text-Driven Universal Mesh Animation
zijie wu · Chaohui Yu · Fan Wang · Xiang Bai
|
||
Prior-aware Dynamic Temporal Modeling Framework for Sequential 3D Hand Pose Estimation
Pengfei Ren · Jingyu Wang · Haifeng Sun · Qi Qi · Xingyu Liu · Menghao Zhang · Lei Zhang · Jing Wang · Jianxin Liao
|
||
LayerAnimate: Layer-level Control for Animation
Yuxue Yang · Lue Fan · Zuzeng Lin · Feng Wang · Zhaoxiang Zhang
|
||
Everything is a Video: Unifying Modalities through Next-Frame Prediction
G Thomas Hudson · Dean Slack · Thomas Winterbottom · Jamie Stirling · Chenghao Xiao · Junjie Shentu · Noura Al Moubayed
|
||
GenM$^3$: Generative Pretrained Multi-path Motion Model for Text Conditional Human Motion Generation
Junyu Shi · Lijiang LIU · Yong Sun · Zhiyuan Zhang · JINNI ZHOU · Qiang Nie
|
||
Beyond Label Semantics: Language-Guided Action Anatomy for Few-shot Action Recognition
Zefeng Qian · Xincheng Yao · Yifei Huang · Chong-Yang Zhang · Jiangyong Ying · Hong Sun
|
||
Outdoor Monocular SLAM with Global Scale-Consistent 3D Gaussian Pointmaps
Chong Cheng · Sicheng Yu · Zijian Wang · Yifan Zhou · Hao Wang
|
||
Neurons: Emulating the Human Visual Cortex Improves Fidelity and Interpretability in fMRI-to-Video Reconstruction
Haonan Wang · Qixiang ZHANG · Lehan Wang · Xuanqi Huang · Xiaomeng Li
|
||
Towards Performance Consistency in Multi-Level Model Collaboration
Qi Li · Runpeng Yu · Xinchao Wang
|
||
Unified Open-World Segmentation with Multi-Modal Prompts
Yang Liu · Yufei Yin · Chenchen Jing · Muzhi Zhu · Hao Chen · Yuling Xi · Bo Feng · Hao Wang · Shiyu Li · Chunhua Shen
|
||
7DGS: Unified Spatial-Temporal-Angular Gaussian Splatting
Zhongpai Gao · Benjamin Planche · Meng Zheng · Anwesa Choudhuri · Terrence Chen · Ziyan Wu
|
||
OrderChain: A General Prompting Paradigm to Improve Ordinal Understanding Ability of MLLM
Jinhong Wang · Shuo Tong · Jintai CHEN · Jian liu · Dongqi Tang · Weiqiang Wang · Wentong Li · Hongxia Xu · Danny Chen · Jian Wu
|
||
Neural Shell Texture Splatting: More Details and Fewer Primitives
Xin Zhang · Anpei Chen · Jincheng Xiong · Pinxuan Dai · Yujun Shen · Weiwei Xu
|
||
Vision-Language Interactive Relation Mining for Open-Vocabulary Scene Graph Generation
Yukuan Min · Muli Yang · Jinhao Zhang · Yuxuan Wang · Aming WU · Cheng Deng
|
||
Synchronizing Task Behavior: Aligning Multiple Tasks during Test-Time Training
Wooseong Jeong · Jegyeong Cho · Youngho Yoon · Kuk-Jin Yoon
|
||
STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution
Rui Xie · Yinhong Liu · Penghao Zhou · Chen Zhao · Jun Zhou · Kai Zhang · Zhenyu Zhang · Jian Yang · Zhenheng Yang · Ying Tai
|
||
CE-FAM: Concept-Based Explanation via Fusion of Activation Maps
Michihiro Kuroki · Toshihiko Yamasaki
|
||
ETA: Energy-based Test-time Adaptation for Depth Completion
Younjoon Chung · Hyoungseob Park · Patrick Rim · Xiaoran Zhang · Jihe He · Ziyao Zeng · Safa Cicek · Byung-Woo Hong · James Duncan · Alex Wong
|
||
A Lesson in Splats: Teacher-Guided Diffusion for 3D Gaussian Splats Generation with 2D Supervision
Chensheng Peng · Ido Sobol · Masayoshi Tomizuka · Kurt Keutzer · Chenfeng Xu · Or Litany
|
||
Golden Noise for Diffusion Models: A Learning Framework
zikai zhou · Shitong Shao · Lichen Bai · Shufei Zhang · zhiqiang xu · Bo Han · Zeke Xie
|
||
PhysTwin: Physics-Informed Reconstruction and Simulation of Deformable Objects from Videos
Hanxiao Jiang · Hao-Yu Hsu · Kaifeng Zhang · Hsin-Ni Yu · Shenlong Wang · Yunzhu Li
|
||
ODP-Bench: Benchmarking Out-of-Distribution Performance Prediction
Han Yu · Kehan Li · Dongbai Li · Yue He · Xingxuan Zhang · Peng Cui
|
||
Partially Matching Submap Helps: Uncertainty Modeling and Propagation for Text to Point Cloud Localization
Mingtao Feng · Longlong Mei · Zijie Wu · Jianqiao Luo · Fenghao Tian · Jie Feng · Weisheng Dong · Yaonan Wang
|
||
Cross-View Isolated Sign Language Recognition via View Synthesis and Feature Disentanglement
Xin Shen · Xinyu Wang · Lei Shen · Kaihao Zhang · Xin Yu
|
||
Unsupervised Joint Learning of Optical Flow and Intensity with Event Cameras
Shuang Guo · Friedhelm Hamann · Guillermo Gallego
|
||
Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis
Bowen Zhang · Sicheng Xu · Chuxin Wang · Jiaolong Yang · Feng Zhao · Dong Chen · Baining Guo
|
||
CompSlider: Compositional Slider for Disentangled Multiple-Attribute Image Generation
Zixin Zhu · Kevin Duarte · Mamshad Nayeem Rizve · Chengyuan Xu · Ratheesh Kalarot · Junsong Yuan
|
||
Dual-Temporal Exemplar Representation Network for Video Semantic Segmentation
Xiaolong Xu · Lei Zhang · Jiayi Li · Lituan Wang · Yifan Guan · Yu Yan · Leyi Zhang · Hao Song
|
||
Closed Loop Optimal Transport for Unsupervised Action Segmentation
Elena Bueno-Benito · Mariella Dimiccoli
|
||
Spectral Image Tokenizer
Carlos Esteves · Mohammed Suhail · Ameesh Makadia
|
||
ConsistentCity: Semantic Flow-guided Occupancy DiT for Temporally Consistent Driving Scene Synthesis
Benjin Zhu · Xiaogang Wang · Hongsheng Li
|
||
Aether: Geometric-Aware Unified World Modeling
Haoyi Zhu · Yifan Wang · Jianjun Zhou · Wenzheng Chang · Yang Zhou · Zizun Li · Junyi Chen · Chunhua Shen · Jiangmiao Pang · Tong He
|
||
Multimodal LLMs as Customized Reward Models for Text-to-Image Generation
Shijie Zhou · Ruiyi Zhang · Huaisheng Zhu · Branislav Kveton · Yufan Zhou · Jiuxiang Gu · Jian Chen · Changyou Chen
|
||
Learning Few-Step Diffusion Models by Trajectory Distribution Matching
Yihong Luo · Tianyang Hu · Jiacheng Sun · Yujun Cai · Jing Tang
|
||
HERA: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance
YUXUAN LUO · Zhengkun Rong · Lizhen Wang · Longhao Zhang · Tianshu Hu
|
||
FakeRadar: Probing Forgery Outliers to Detect Unknown Deepfake Videos
Zhaolun Li · Jichang Li · Yinqi Cai · Junye Chen · Xiaonan Luo · Guanbin Li · Rushi Lan
|
||
CCL-LGS: Contrastive Codebook Learning for 3D Language Gaussian Splatting
Lei Tian · Xiaomin Li · Liqian Ma · Hao Yin · Zirui Zheng · Hefei Huang · Taiqing Li · Huchuan Lu · Xu Jia
|
||
ConstStyle: Robust Domain Generalization with Unified Style Transformation
Nam Duong Tran · Nam Nguyen Phuong · Hieu Pham · Phi Le Nguyen · My Thai
|
||
Adversarial Data Augmentation for Single Domain Generalization via Lyapunov Exponent-Guided Optimization
ZUYU ZHANG · Ning Chen · Yongshan Liu · Qinghua Zhang · Xu Zhang
|
||
LUDVIG: Learning-Free Uplifting of 2D Visual Features to Gaussian Splatting Scenes
Juliette Marrie · Romain Menegaux · Michael Arbel · Diane Larlus · Julien Mairal
|
||
SemGes: Semantics-aware Co-Speech Gesture Generation using Semantic Coherence and Relevance Learning
Lanmiao Liu · Esam Ghaleb · asli ozyurek · Zerrin Yumak
|
||
Compression-Aware One-Step Diffusion Model for JPEG Artifact Removal
Jinpei Guo · Zheng Chen · Wenbo Li · Yong Guo · YULUN ZHANG
|
||
SpatialTrackerV2: Advancing 3D Point Tracking with Explicit Camera Motion
Yuxi Xiao · Jianyuan Wang · Nan Xue · Nikita Karaev · Iurii Makarov · Bingyi Kang · Xing Zhu · Hujun Bao · Yujun Shen · Xiaowei Zhou
|
||
SFUOD: Source-Free Unknown Object Detection
Keon-Hee Park · Seun-An Choe · Gyeong-Moon Park
|
||
E-SAM: Training-Free Segment Every Entity Model
WEIMING ZHANG · Dingwen Xiao · Lei Chen · Lin Wang
|
||
3D Gaussian Map with Open-Set Semantic Grouping for Vision-Language Navigation
Jianzhe Gao · Rui Liu · Wenguan Wang
|
||
PatchScaler: An Efficient Patch-Independent Diffusion Model for Image Super-Resolution
Yong Liu · Hang Dong · Jinshan Pan · Qingji dong · Kai Chen · Rongxiang Zhang · Lean Fu · Fei Wang
|
||
Aligning Constraint Generation with Design Intent in Parametric CAD
Evan Casey · Tianyu Zhang · Shu Ishida · John Thompson · Amir Khasahmadi · Joseph Lambourne · Pradeep Kumar Jayaraman · Karl Willis
|
||
Understanding Museum Exhibits using Vision-Language Reasoning
Ada-Astrid Balauca · Sanjana Garai · Stefan Balauca · Rasesh Shetty · Naitik Agrawal · Dhwanil Shah · Yuqian Fu · Xi Wang · Kristina Toutanova · Danda Pani Paudel · Luc Gool
|
||
A$^3$GS: Arbitrary Artistic Style into Arbitrary 3D Gaussian Splatting
Zhiyuan Fang · Rengan Xie · Xuancheng Jin · Qi Ye · Wei Chen · Wenting Zheng · Rui Wang · Yuchi Huo
|
||
Harnessing Vision Foundation Models for High-Performance, Training-Free Open Vocabulary Segmentation
Yuheng Shi · Minjing Dong · Chang Xu
|
||
Semantic Causality-Aware Vision-Based 3D Occupancy Prediction
Dubing Chen · Huan Zheng · Yucheng Zhou · Xianfei Li · Wenlong Liao · Tao He · Pai Peng · Jianbing Shen
|
||
ARMO: Autoregressive Rigging for Multi-Category Objects
mingze sun · Shiwei Mao · Keyi Chen · Yurun Chen · Shunlin Lu · Jingbo Wang · Junting Dong · Ruqi Huang
|
||
Achieving More with Less: Additive Prompt Tuning for Rehearsal-Free Class-Incremental Learning
Haoran Chen · Ping Wang · Zihan Zhou · Xu Zhang · Zuxuan Wu · Yu-Gang Jiang
|
||
Sparse Fine-Tuning of Transformers for Generative Tasks
Wei Chen · Jingxi Yu · Zichen Miao · Qiang Qiu
|
||
CycleVAR: Repurposing Autoregressive Model for Unsupervised One-Step Image Translation
Yi Liu · Shengqian Li · Zuzeng Lin · Feng Wang · Si Liu
|
||
Auto-Regressively Generating Multi-View Consistent Images
JiaKui Hu · Yuxiao Yang · Jialun Liu · Jinbo Wu · Chen Zhao · Yanye Lu
|
||
Supercharged One-step Text-to-Image Diffusion Models with Negative Prompts
Viet Nguyen · Anh Nguyen · Trung Dao · Khoi Nguyen · Cuong Pham · Toan Tran · Anh Tran
|
||
After the Party: Navigating the Mapping From Color to Ambient Lighting
Florin-Alexandru Vasluianu · Tim Seizinger · Zongwei Wu · Radu Timofte
|
||
Probabilistic Inertial Poser (ProbIP): Uncertainty-aware Human Motion Modeling from Sparse Inertial Sensors
Min Kim · Younho Jeon · Sungho Jo
|
||
Perspective-Invariant 3D Object Detection
Alan Liang · Lingdong Kong · Dongyue Lu · Youquan Liu · Jian Fang · Huaici Zhao · Wei Tsang Ooi
|
||
What If: Understanding Motion Through Sparse Interactions
Stefan A. Baumann · Nick Stracke · Timy Phan · Björn Ommer
|
||
FA: Forced prompt leArning of Vision-Language Models for Out-of-Distribution Detection
Xinhua Lu · Runhe Lai · Yanqi Wu · Kanghao Chen · Wei-Shi Zheng · Ruixuan Wang
|
||
Prior2Former - Evidential Modeling of Mask Transformers for Assumption-Free Open-World Panoptic Segmentation
Sebastian Schmidt · Julius Koerner · Dominik Fuchsgruber · Stefano Gasperini · Federico Tombari · Stephan Günnemann
|
||
ADIEE: Automatic Dataset Creation and Scorer for Instruction-Guided Image Editing Evaluation
Sherry Chen · Yi Wei · Luowei Zhou · Suren Kumar
|
||
Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy
Yiting Yang · Hao Luo · Yuan Sun · Qingsen Yan · Haokui Zhang · Wei Dong · Guoqing Wang · Peng Wang · Yang Yang · Heng Tao Shen
|
||
CoDa-4DGS: Dynamic Gaussian Splatting with Context and Deformation Awareness for Autonomous Driving
Rui Song · Chenwei Liang · Yan Xia · Walter Zimmer · Hu Cao · Holger Caesar · Andreas Festag · Alois Knoll
|
||
ToolVQA: A Dataset for Multi-step Reasoning VQA with External Tools
Shaofeng Yin · Ting Lei · Yang Liu
|
||
Stronger, Steadier & Superior: Geometric Consistency in Depth VFM Forges Domain Generalized Semantic Segmentation
Siyu Chen · Ting Han · Changshe Zhang · Xin Luo · Meiliu Wu · Guorong Cai · Jinhe Su
|
||
AV-Flow: Transforming Text to Audio-Visual Human-like Interactions
Aggelina Chatziagapi · Louis-Philippe Morency · Hongyu Gong · Michael Zollhöfer · Dimitris Samaras · Alexander Richard
|
||
UPP: Unified Point-Level Prompting for Robust Point Cloud Analysis
Zixiang Ai · Zhenyu Cui · Yuxin Peng · Jiahuan Zhou
|
||
Leveraging BEV Paradigm for Ground-to-Aerial Image Synthesis
Junyan Ye · Jun He · Weijia Li · Zhutao Lv · Yi Lin · Jinhua Yu · Haote Yang · Conghui He
|
||
Long-term Traffic Simulation with Interleaved Autoregressive Motion and Scenario Generation
Xiuyu Yang · Shuhan Tan · Philipp Kraehenbuehl
|
||
ModalTune: Fine-Tuning Slide-Level Foundation Models with Multi-Modal Information for Multi-task Learning in Digital Pathology
Vishwesh Ramanathan · Tony Xu · Pushpak Pati · Faruk Ahmed · Maged Goubran · Anne Martel
|
||
Rethinking Bimanual Robotic Manipulation: Learning with Decoupled Interaction Framework
Jian-Jian Jiang · Xiao-Ming Wu · Yi-Xiang He · Ling-An Zeng · Yilin Wei · Dandan Zhang · Wei-Shi Zheng
|
||
PersPose: 3D Human Pose Estimation with Perspective Encoding and Perspective Rotation
Xiaoyang Hao · Han Li
|
||
MAVFlow: Preserving Paralinguistic Elements with Conditional Flow Matching for Zero-Shot AV2AV Multilingual Translation
Sungwoo Cho · Jeongsoo Choi · Sungnyun Kim · Se-Young Yun
|
||
DALIP: Distribution Alignment-based Language-Image Pre-Training for Domain-Specific Data
Junjie Wu · Jiangtao Xie · Zhaolin Zhang · Qilong Wang · Qinghua Hu · Peihua Li · Sen Xu
|
||
StableDepth: Scene-Consistent and Scale-Invariant Monocular Depth
Zheng Zhang · Lihe Yang · Tianyu Yang · Chaohui Yu · Xiaoyang Guo · Yixing Lao · Hengshuang Zhao
|
||
Adaptive Routing of Text-to-Image Generation Requests Between Large Cloud Model and Light-Weight Edge Model
Zewei Xin · Qinya Li · Chaoyue Niu · Fan Wu · Guihai Chen
|
||
GaussianUpdate: Continual 3D Gaussian Splatting Update for Changing Environments
Lin Zeng · Boming Zhao · Jiarui Hu · Xujie Shen · Ziqiang Dang · Hujun Bao · Zhaopeng Cui
|
||
Hints of Prompt: Enhancing Visual Representation for Multimodal LLMs in Autonomous Driving
Hao Zhou · Zhanning Gao · Zhili Chen · Maosheng Ye · Qifeng Chen · Tongyi Cao · Honggang Qi
|
||
Hierarchy-Aware Pseudo Word Learning with Text Adaptation for Zero-Shot Composed Image Retrieval
Zhe Li · Lei Zhang · Zheren Fu · Kun Zhang · Zhendong Mao
|
||
Token-Efficient VLM: High-Resolution Image Understanding via Dynamic Region Proposal
yitong jiang · Jinwei Gu · Tianfan Xue · Ka Chun Cheung · Pavlo Molchanov · Hongxu Yin · Sifei Liu
|
||
Predicting Front & Back Surfaces to Construct Ultra-Dense 2D-3D Correspondences for Pose Estimation
Yulin Wang · Mengting Hu · Hongli Li · Chen LUO
|
||
Social Debiasing for Fair Multi-modal LLMs
Harry Cheng · Yangyang Guo · Qingpei Guo · Ming Yang · Tian Gan · Weili Guan · Liqiang Nie
|
||
Training-Free Class Purification for Open-Vocabulary Semantic Segmentation
Qi Chen · Lingxiao Yang · Yun Chen · Nailong Zhao · Jianhuang Lai · Jie Shao · Xiaohua Xie
|
||
TransiT: Transient Transformer for Non-line-of-sight Videography
Ruiqian Li · Siyuan Shen · Suan Xia · Ziheng Wang · Xingyue Peng · Chengxuan Song · Yingsheng Zhu · Tao Wu · Shiying Li · Jingyi Yu
|
||
Removing Cost Volumes from Optical Flow Estimators
Simon Kiefhaber · Stefan Roth · Simone Schaub-Meyer
|
||
p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay
Jun Zhang · Desen Meng · Zhengming Zhang · Zhenpeng Huang · Tao Wu · Limin Wang
|
||
DuCos: Duality Constrained Depth Super-Resolution via Foundation Model
Zhiqiang Yan · Zhengxue Wang · Haoye Dong · Jun Li · Jian Yang · Gim Hee Lee
|
||
SpikePack: Enhanced Information Flow in Spiking Neural Networks with High Hardware Compatibility
Guobin Shen · Jindong Li · Tenglong Li · Dongcheng Zhao · Yi Zeng
|
||
Towards Effective Foundation Model Adaptation for Extreme Cross-Domain Few-Shot Learning
Fei Zhou · Peng Wang · Lei Zhang · Wei Wei · Chen Ding · Guosheng Lin · Yanning Zhang
|
||
Imperfect Generative Models are Secretly Strong Vision-Centric Enhancers
Shijie Ma · Yuying Ge · Teng Wang · Yuxin Guo · Yixiao Ge · Ying Shan
|
||
Prompt-driven Transferable Adversarial Attack on Person Re-Identification with Attribute-aware Textual Inversion
Yuan Bian · Min Liu · Yunqi Yi · Xueping Wang · Shuai Jiang · Yaonan Wang
|
||
Debiased Teacher for Day-to-Night Domain Adaptive Object Detection
Yiming Cui · Liang Li · Haibing YIN · Yuhan Gao · Yaoqi Sun · Chenggang Yan
|
||
ST-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding?
Yun Li · Yiming Zhang · Tao Lin · Xiangrui Liu · Wenxiao Cai · Zheng Liu · Bo Zhao
|
||
One Object, Multiple Lies: A Benchmark for Cross-task Adversarial Attack on Unified Vision-Language Models
Jiale Zhao · XINYANG JIANG · Junyao Gao · Yuhao Xue · Cairong Zhao
|
||
Trial-Oriented Visual Rearrangement
Yuyi Liu · Xinhang Song · Tianliang Qi · Shuqiang Jiang
|
||
Mixture-of-Scores: Robust Image-Text Data Quality Score via Three Lines of Code
WU Sitong · Haoru Tan · Yukang Chen · Shaofeng Zhang · Jingyao Li · Bei Yu · Xiaojuan Qi · Jiaya Jia
|
||
Learning Yourself: Class-Incremental Semantic Segmentation with Language-Inspired Bootstrapped Disentanglement
Ruitao Wu · Yifan Zhao · Jia Li
|
||
Geometry Distributions
Biao Zhang · Jing Ren · Peter Wonka
|
||
PlugMark: A Plug-in Zero-Watermarking Framework for Diffusion Models
pengzhen chen · Yanwei Liu · Xiaoyan Gu · Enci Liu · Zhuoyi Shang · Xiangyang Ji · Wu Liu
|
||
VCA: Video Curious Agent for Long Video Understanding
Zeyuan Yang · Delin Chen · Xueyang Yu · Maohao Shen · Chuang Gan
|
||
BokehDiff: Neural Lens Blur with One-Step Diffusion
Chengxuan Zhu · Qingnan Fan · Qi Zhang · Jinwei Chen · Huaqi Zhang · Chao Xu · Boxin Shi
|
||
SpinMeRound: Consistent Multi-View Identity Generation Using Diffusion Models
Stathis Galanakis · Alexandros Lattas · Stylianos Moschoglou · Bernhard Kainz · Stefanos Zafeiriou
|
||
LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs
Jiarui Wang · Huiyu Duan · Yu Zhao · Juntong Wang · Guangtao Zhai · Xiongkuo Min
|
||
AU-Blendshape for Fine-grained Stylized 3D Facial Expression Manipulation
Hao Li · Ju Dai · Feng Zhou · Kaida Ning · Lei Li · Junjun Pan
|
||
DecAD: Decoupling Anomalies in Latent Space for Multi-Class Unsupervised Anomaly Detection
Xiaolei Wang · Xiaoyang Wang · Huihui Bai · ENG LIM · Jimin XIAO
|
||
SynCity: Training-Free Generation of 3D Cities
Paul Engstler · Aleksandar Shtedritski · Iro Laina · Christian Rupprecht · Andrea Vedaldi
|
||
HiGarment: Cross-modal Harmony Based Diffusion Model for Flat Sketch to Realistic Garment Image
Junyi Guo · Jingxuan Zhang · Fangyu Wu · Huanda Lu · Qiufeng Wang · Wenmian Yang · ENG LIM · Dongming Lu
|
||
Driving Scene Synthesis on Free-form Trajectories with Generative Prior
Zeyu Yang · Zijie Pan · Yuankun Yang · Xiatian Zhu · Li Zhang
|
||
AJAHR: Amputated Joint Aware 3D Human Mesh Recovery
hyunjin cho · Giyun choi · Jongwon Choi
|
||
Seam360GS: Seamless 360° Gaussian Splatting from Real-World Omnidirectional Images
Changha Shin · Woong Oh Cho · Seon Joo Kim
|
||
ETCH: Generalizing Body Fitting to Clothed Humans via Equivariant Tightness
Boqian Li · Zeyu Cai · Michael Black · Haiwen Feng · Yuliang Xiu
|
||
FaceShield: Defending Facial Image against Deepfake Threats
Jaehwan Jeong · Sumin In · Sieun Kim · Shin yi · Jongheon Jeong · Sang Yoon · Jaewook Chung · Sangpil Kim
|
||
Engage for All: Making Ordinary Image Descriptions Appealing Again!
Yuyan (Yolanda) Chen · Yifan Jiang · Li Zhou · Jinghan Cao · Yu Guan · Ming Yang · Qingpei Guo
|
||
SteerX: Creating Any Camera-Free 3D and 4D Scenes with Geometric Steering
Byeongjun Park · Hyojun Go · Hyelin Nam · Byung-Hoon Kim · Hyungjin Chung · Changick Kim
|
||
PBCAT: Patch-Based Composite Adversarial Training against Physically Realizable Attacks on Object Detection
Xiao Li · Yiming Zhu · Yifan Huang · Wei Zhang · Yingzhe He · Jie Shi · Xiaolin Hu
|
||
Towards a Unified Copernicus Foundation Model for Earth Vision
Yi Wang · Zhitong Xiong · Chenying Liu · Adam Stewart · Thomas Dujardin · Nikolaos Ioannis Bountos · Angelos Zavras · Franziska Gerken · Ioannis Papoutsis · Laura Leal-Taixé · Xiao Xiang Zhu
|
||
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation
Tianwei Xiong · Jun Hao Liew · Zilong Huang · Jiashi Feng · Xihui Liu
|
||
Boundary Probing for Input Privacy Protection When Using LMM Services
Xiaofei Hui · Haoxuan Qu · Ping Hu · Hossein Rahmani · Jun Liu
|
||
How Can Objects Help Video-Language Understanding?
Zitian Tang · Shijie Wang · Junho Cho · Jaewook Yoo · Chen Sun
|
||
Details Matter for Indoor Open-vocabulary 3D Instance Segmentation
Sanghun Jung · Jingjing Zheng · Ke Zhang · Nan Qiao · Albert Y. C. Chen · Lu Xia · Chi Liu · Yuyin Sun · Xiao Zeng · Hsiang-Wei Huang · Byron Boots · Min Sun · Cheng-Hao Kuo
|
||
DH-FaceVid-1K: A Large-Scale High-Quality Dataset for Face Video Generation
Donglin Di · He Feng · Wenzhang SUN · Yongjia Ma · Hao Li · Chen Wei · Lei Fan · Tonghua Su · Xun Yang
|
||
TACO: Taming Diffusion for in-the-wild Video Amodal Completion
Ruijie Lu · Yixin Chen · Yu Liu · Jiaxiang Tang · Junfeng Ni · Diwen Wan · Gang Zeng · Siyuan Huang
|
||
TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models
Mark YU · Wenbo Hu · Jinbo Xing · Ying Shan
|
||
Rethinking Few Shot CLIP Benchmarks: A Critical Analysis in the Inductive Setting
Alexey Kravets · Da Chen · Vinay Namboodiri
|
||
Hierarchical Divide-and-Conquer Grouping for Classification Adaptation of Pre-Trained Models
Ziqian Lu · Yunlong Yu · Qinyue Tong · Jun Liu
|
||
UINavBench: A Framework for Comprehensive Evaluation of Interactive Digital Agents
Harsh Agrawal · Eldon Schoop · Xinlei Pan · Ari Seff · Anuj Mahajan · Di Feng · Ruijia Cheng · Andres Teran · Esteban Gomez · Abhishek Sundararajan · Forrest Huang · Amanda Swearngin · Mohana Moorthy · Jeffrey Nichols · Alexander Toshev
|
||
DiST-4D: Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation
Jiazhe Guo · Yikang Ding · Xiwu Chen · Shuo Chen · Bohan Li · Yingshuang Zou · Xiaoyang Lyu · Feiyang Tan · Xiaojuan Qi · Zhiheng Li · Hao Zhao
|
||
An Empirical Study of Autoregressive Pre-training from Videos
Jathushan Rajasegaran · Ilija Radosavovic · Rahul Ravishankar · Yossi Gandelsman · Christoph Feichtenhofer · Jitendra Malik
|
||
Outlier-Aware Post-Training Quantization for Image Super-Resolution
Hailing Wang · Jianglin Lu · Yitian Zhang · Yun Fu
|
||
TopicGeo: An Efficient Unified Framework for Geolocation
Xin Wang · Xinlin Wang · Shuiping Gou
|
||
GlassWizard: Harvesting Diffusion Priors for Glass Surface Detection
Wenxue Li · Tian Ye · Xinyu Xiong · Jinbin Bai · feilong tang · Wenxuan Song · Zhaohu Xing · Lie Ju · Guanbin Li · Lei Zhu
|
||
MaterialMVP: Illumination-Invariant Material Generation via Multi-view PBR Diffusion
Zebin He · Mx Yang · Shuhui Yang · Yixuan Tang · Tao Wang · Kaihao Zhang · Guanying Chen · Yuhong Liu · Jie Jiang · Chunchao Guo · Wenhan Luo
|
||
Generating Physically Stable and Buildable LEGO Designs from Text
Ava Pun · Kangle Deng · Ruixuan Liu · Deva Ramanan · Changliu Liu · Jun-Yan Zhu
|
||
SRefiner: Soft-Braid Attention for Multi-Agent Trajectory Refinement
Liwen Xiao · Zhiyu Pan · Zhicheng Wang · Zhiguo Cao · Wei Li
|
||
InfoBridge: Balanced Multimodal Integration through Conditional Dependency Modeling
Chenxin Li · Yifan Liu · Panwang Pan · Hengyu Liu · Xinyu Liu · Wuyang Li · Cheng Wang · Weihao Yu · Yiyang LIN · Yixuan Yuan
|
||
TurboTrain: Towards Efficient and Balanced Multi-Task Learning for Multi-Agent Perception and Prediction
Zewei Zhou · Zhihao Zhao · Tianhui Cai · Zhiyu Huang · Bolei Zhou · Jiaqi Ma
|
||
SIMS: Simulating Stylized Human-Scene Interactions with Retrieval-Augmented Script Generation
Wenjia Wang · Liang Pan · Zhiyang Dou · Jidong Mei · Zhouyingcheng Liao · Yifan Wu · Yuke Lou · Jingbo Wang · Lei Yang · Taku Komura
|
||
Generative Gaussian Splatting: Generating 3D Scenes with Video Diffusion Priors
Katja Schwarz · Norman Müller · Peter Kontschieder
|
||
Disrupting Model Merging: A Parameter-Level Defense Without Sacrificing Accuracy
JUNHAO WEI · YU ZHE · Jun Sakuma
|
||
SemTalk: Holistic Co-speech Motion Generation with Frame-level Semantic Emphasis
Xiangyue Zhang · Jianfang Li · Jiaxu Zhang · Ziqiang Dang · Jianqiang Ren · Liefeng Bo · Zhigang Tu
|
||
The Devil is in the Spurious Correlation: Boosting Moment Retrieval with Dynamic Learning
Xinyang Zhou · Fanyue Wei · Lixin Duan · Angela Yao · Wen Li
|
||
Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation and Reconstruction
Yuanhao Cai · He Zhang · Kai Zhang · Yixun Liang · Mengwei Ren · Fujun Luan · Qing Liu · Soo Ye Kim · Jianming Zhang · Zhifei Zhang · Yuqian Zhou · YULUN ZHANG · Xiaokang Yang · Zhe Lin · Alan Yuille
|
||
When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning
Junwei Luo · Yingying Zhang · Xue Yang · Kang Wu · Qi Zhu · Lei Liang · Jingdong Chen · Yansheng Li
|
||
Uncertainty-Driven Expert Control: Enhancing the Reliability of Medical Vision-Language Models
Xiao Liang · Di Wang · Zhicheng Jiao · Ronghan Li · Pengfei Yang · Quan Wang · Tat-Seng Chua
|
||
When and Where do Data Poisons Attack Textual Inversion?
Jeremy Styborski · Mingzhi Lyu · Jiayou Lu · Nupur Kapur · Adams Kong
|
||
MCAM: Multimodal Causal Analysis Model for Ego-Vehicle-Level Driving Video Understanding
Tongtong Cheng · Rongzhen Li · Yixin Xiong · Tao Zhang · Jing Wang · Kai Liu
|
||
Less-to-More Generalization: Unlocking More Controllability by In-Context Generation
shaojin wu · Mengqi Huang · wenxu wu · Yufeng Cheng · Fei Ding · Qian HE
|
||
Adversarial Attention Perturbations for Large Object Detection Transformers
Zachary Yahn · Selim Tekin · Fatih Ilhan · Sihao Hu · Tiansheng Huang · Yichang Xu · Margaret Loper · Ling Liu
|
||
I Am Big, You Are Little; I Am Right, You Are Wrong
David A Kelly · Akchunya Chanchal · Nathan Blake
|
||
SEGS-SLAM: Structure-enhanced 3D Gaussian Splatting SLAM with Appearance Embedding
Tianci Wen · Zhiang Liu · Yongchun Fang
|
||
Sim-DETR: Unlock DETR for Temporal Sentence Grounding
Jiajin Tang · Zhengxuan Wei · Yuchen Zhu · Cheng Shi · Guanbin Li · Liang Lin · Sibei Yang
|
||
Pseudo-Interaction: a Hybrid-Tower Paradigm for Text-to-Video Retrieval
Bangxiang Lan · Ruobing Xie · Ruixiang Zhao · Xingwu Sun · Zhanhui Kang · Gang Yang · Xirong Li
|
||
Recovering Parametric Scenes from Very Few Time-of-Flight Pixels
Carter Sifferman · Yiquan Li · Yiming Li · Fangzhou Mu · Michael Gleicher · Mohit Gupta · Yin Li
|
||
Timestep-Aware Diffusion Model for Extreme Image Rescaling
Ce Wang · Zhenyu Hu · Wanjie Sun · Zhenzhong Chen
|
||
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
Yuzhang Shang · Mu Cai · Bingxin Xu · Yong Jae Lee · Yan Yan
|
||
PointGAC: Geometric-Aware Codebook for Masked Point Modeling
Abiao Li · Chenlei Lv · Guofeng Mei · Yifan Zuo · Jian Zhang · Yuming Fang
|
||
Free-Form Motion Control: Controlling the 6D Poses of Camera and Objects in Video Generation
Xincheng Shuai · Henghui Ding · Zhenyuan Qin · Hao Luo · Xingjun Ma · Dacheng Tao
|
||
Generalized Tensor-based Parameter-Efficient Fine-Tuning via Lie Group Transformations
Chongjie Si · Zhiyi Shi · Xuehui Wang · Yichen Xiao · Xiaokang Yang · Wei Shen
|
||
DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation
Runze Zhang · Guoguang Du · Xiaochuan Li · Qi Jia · Liang Jin · Lu Liu · Jingjing Wang · Cong Xu · Zhenhua Guo · Yaqian Zhao · Xiaoli Gong · Rengang Li · Baoyu Fan
|
||
LVFace: Progressive Cluster Optimization for Large Vision Models in Face Recognition
Jinghan You · Shanglin Li · Yuanrui Sun · Jiangchuanwei · Mingyu Guo · ChaoFeng · Ran Jiao
|
||
SuMa: A Subspace Mapping Approach for Complete and Effective Concept Erasure in Text-to-Image Diffusion Models
Kien Nguyen · Anh Tran · Cuong Pham
|
||
LUSD: Localized Update Score Distillation for Text-Guided Image Editing
Worameth Chinchuthakun · Tossaporn Saengja · Nontawat Tritrong · Pitchaporn Rewatbowornwong · Pramook Khungurn · Supasorn Suwajanakorn
|
||
TITAN-Guide: Taming Inference-Time Alignment for Guided Text-to-Video Diffusion Models
Christian Simon · Masato Ishii · Akio Hayakawa · Zhi Zhong · Shusuke Takahashi · Takashi Shibuya · Yuki Mitsufuji
|
||
Beyond [cls]: Exploring the true potential of Masked Image Modeling representations
Marcin Przewięźlikowski · Randall Balestriero · Wojciech Jasiński · Marek Śmieja · Bartosz Zieliński
|
||
MSQ: Memory-Efficient Bit Sparsification Quantization
Seokho Han · Seo Yoon · Jinhee Kim · Dongwei Wang · Kang Jeon · Huanrui Yang · Jong Hwan Ko
|
||
Augmented and Softened Matching for Unsupervised Visible-Infrared Person Re-Identification
Zhiqi Pang · Chunyu Wang · Lingling Zhao · Junjie Wang
|
||
Error Recognition in Procedural Videos using Generalized Task Graph
Shih-Po Lee · Ehsan Elhamifar
|
||
PanSt3R: Multi-view consistent panoptic segmentation
Lojze Zust · Yohann Cabon · Juliette Marrie · Leonid Antsfeld · Boris Chidlovskii · Jerome Revaud · Gabriela Csurka
|
||
TruthPrInt: Mitigating LVLM Object Hallucination Via Latent Truthful-Guided Pre-Intervention
Jinhao Duan · Fei Kong · Hao Cheng · James Diffenderfer · Bhavya Kailkhura · Lichao Sun · Xiaofeng Zhu · Xiaoshuang Shi · Kaidi Xu
|
||
On the Provable Importance of Gradients for Language-Assisted Image Clustering
Bo Peng · Jie Lu · Guangquan Zhang · Zhen Fang
|
||
CompCap: Improving Multimodal Large Language Models with Composite Captions
Xiaohui Chen · Satya Narayan Shukla · Mahmoud Azab · Aashu Singh · Qifan Wang · David Yang · ShengYun Peng · Hanchao Yu · Shen Yan · Xuewen Zhang · Baosheng He
|
||
Harnessing Input-adaptive Inference for Efficient VLN
Dongwoo Kang · Akhil Perincherry · Zachary Coalson · Aiden Gabriel · Stefan Lee · Sanghyun Hong
|
||
PacGDC: Label-Efficient Generalizable Depth Completion with Projection Ambiguity and Consistency
Haotian Wang · Aoran Xiao · Xiaoqin Zhang · Meng Yang · Shijian Lu
|
||
Gradient Decomposition and Alignment for Incremental Object Detection
Wenlong Luo · Shizhou Zhang · De Cheng · Yinghui Xing · Guoqiang Liang · Peng Wang · Yanning Zhang
|
||
Heatmap Regression without Soft-Argmax for Facial Landmark Detection
Chiao-An Yang · Raymond Yeh
|
||
Quantifying and Narrowing the Unknown: Interactive Text-to-Video Retrieval via Uncertainty Minimization
Bingqing Zhang · Zhuo Cao · Heming Du · Yang Li · Xue Li · Jiajun Liu · Sen Wang
|
||
DADM: Dual Alignment of Domain and Modality for Face Anti-spoofing
Yang JingYi · Xun Lin · Zitong Yu · Liepiao Zhang · Xin Liu · Hui Li · Xiaochen Yuan · Xiaochun Cao
|
||
CABLD: Contrast-Agnostic Brain Landmark Detection with Consistency-Based Regularization
Soorena Salari · Arash Harirpoush · Hassan Rivaz · Yiming Xiao
|
||
LongAnimation: Long Animation Generation with Dynamic Global-Local Memory
Nan Chen · Mengqi Huang · Yihao Meng · Zhendong Mao
|
||
Generating Multi-Image Synthetic Data for Text-to-Image Customization
Nupur Kumari · Xi Yin · Jun-Yan Zhu · Ishan Misra · Samaneh Azadi
|
||
Self-Calibrating Gaussian Splatting for Large Field-of-View Reconstruction
Youming Deng · Wenqi Xian · Guandao Yang · Leonidas Guibas · Gordon Wetzstein · Steve Marschner · Paul Debevec
|
||
FLSeg: Enhancing Privacy and Robustness in Federated Learning under Heterogeneous Data via Model Segmentation
Zichun Su · Zhi Lu · Yutong Wu · Shen Renfei · Songfeng Lu
|
||
Straighten Viscous Rectified Flow via Noise Optimization
Jimin Dai · Jiexi Yan · Jian Yang · Lei Luo
|
||
Is Visual in-Context Learning for Compositional Medical Tasks within Reach?
Simon Reiß · Zdravko Marinov · Alexander Jaus · Constantin Seibold · M. Sarfraz · Erik Rodner · Rainer Stiefelhagen
|
||
Generalized and Efficient 2D Gaussian Splatting for Arbitrary-scale Super-Resolution
Du Chen · Liyi Chen · Zhengqiang Zhang · Lei Zhang
|
||
Looking in the mirror: A faithful counterfactual explanation method for interpreting deep image classification models
Townim Chowdhury · Vu Phan · Kewen Liao · Nanyu Dong · Minh-Son To · Anton Hengel · Johan Verjans · Zhibin Liao
|
||
Boosting Class Representation via Semantically Related Instances for Robust Long-Tailed Learning with Noisy Labels
Yuhang Li · Zhuying Li · Yuheng Jia
|
||
BoxDreamer: Dreaming Box Corners for Generalizable Object Pose Estimation
Yuanhong Yu · Xingyi He · Chen Zhao · Junhao Yu · Jiaqi Yang · Ruizhen Hu · Yujun Shen · Xing Zhu · Xiaowei Zhou · Sida Peng
|
||
Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM
Yatai Ji · Jiacheng Zhang · Jie Wu · Shilong Zhang · Shoufa Chen · Chongjian Ge · Peize Sun · Weifeng Chen · Wenqi Shao · Xuefeng Xiao · Weilin Huang · Ping Luo
|
||
CryoFastAR: Fast Cryo-EM Ab initio Reconstruction Made Easy
Jiakai Zhang · Shouchen Zhou · Haizhao Dai · Xinhang Liu · Peihao Wang · Zhiwen Fan · Yuan Pei · Jingyi Yu
|
||
From Reusing to Forecasting: Accelerating Diffusion Models with Taylor Seers
Jiacheng Liu · Chang Zou · Yuanhuiyi Lyu · Junjie Chen · Linfeng Zhang
|
||
Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation
Yuseung Lee · Jihyeon Je · Chanho Park · Mikaela Uy · Leonidas Guibas · Minhyuk Sung
|
||
LLM-Assisted Semantic Guidance for Sparsely Annotated Remote Sensing Object Detection
Wei 廖伟 · Chunyan Xu · Chenxu Wang · Zhen Cui
|
||
Test-Time Prompt Tuning for Zero-Shot Depth Completion
Chanhwi Jeong · Inhwan Bae · Jin-Hwi Park · Hae-Gon Jeon
|
||
Integrating Biological Knowledge for Robust Microscopy Image Profiling on De Novo Cell Lines
Jiayuan Chen · Thai-Hoang Pham · Yuanlong Wang · Ping Zhang
|
||
Representing 3D Shapes With 64 Latent Vectors for 3D Diffusion Models
In Cho · Youngbeom Yoo · Subin Jeon · Seon Joo Kim
|
||
Head2Body: Body pose generation from Multi-sensory Head-mounted Inputs
Minh Tran · Hongda Mao · Qingshuang Chen · Yelin Kim
|
||
IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance
Jiayi Guo · Chuanhao Yan · Xingqian Xu · Yulin Wang · Kai Wang · Gao Huang · Humphrey Shi
|
||
3D Gaussian Splatting Driven Multi-View Robust Physical Adversarial Camouflage Generation
Tianrui Lou · Xiaojun Jia · Siyuan Liang · Jiawei Liang · Ming Zhang · Yanjun Xiao · Xiaochun Cao
|
||
VoluMe – Authentic 3D Video Calls from Live Gaussian Splat Prediction
Martin De La Gorce · Charlie Hewitt · Tibor Takács · Robert Gerdisch · Zafiirah Hosenie · Givi Meishvili · Marek Kowalski · Thomas J. Cashman · Antonio Criminisi
|
||
Toward Material-Agnostic System Identification from Videos
Yizhou Zhao · Haoyu Chen · Chunjiang Liu · Zhenyang Li · Charles Herrmann · Junhwa Hur · Yinxiao Li · Ming-Hsuan Yang · Bhiksha Raj · Min Xu
|
||
CODA: Repurposing Continuous VAEs for Discrete Tokenization
Zeyu Liu · Zanlin Ni · Yeguo Hua · Xin Deng · Xiao Ma · Cheng Zhong · Gao Huang
|
||
WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation
Zhongyu Yang · Jun Chen · Dannong Xu · Junjie Fei · Xiaoqian Shen · Liangbing Zhao · Chun-Mei Feng · Mohamed Elhoseiny
|
||
AlignDiff: Learning Physically-Grounded Camera Alignment via Diffusion
Liuyue Xie · Jiancong Guo · Ozan Cakmakci · Andre Araujo · Laszlo A. A. Jeni · Zhiheng Jia
|
||
EfficientMT: Efficient Temporal Adaptation for Motion Transfer in Text-to-Video Diffusion Models
Yufei Cai · Hu Han · Yuxiang Wei · Shiguang Shan · Xilin Chen
|
||
DRaM-LHM: A Quaternion Framework for Iterative Camera Pose Estimation
Chen Lin · Weizhi Du · Zhixiang Min · Baochen She · Enrique Dunn · Sonya Hanson
|
||
RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation
Kaidong Zhang · Rongtao Xu · Ren Pengzhen · Junfan Lin · Hefeng Wu · Liang Lin · Xiaodan Liang
|
||
P3Nav: A Unified Framework for Embodied Navigation Integrating Perception, Planning, and Prediction
Yufeng Zhong · Chengjian Feng · Feng Yan · Fanfan Liu · Liming Zheng · Lin Ma
|
||
Do It Yourself: Learning Semantic Correspondence from Pseudo-Labels
Olaf Dünkel · Thomas Wimmer · Christian Theobalt · Christian Rupprecht · Adam Kortylewski
|
||
GVDepth: Zero-shot monocular depth estimation for ground vehicles based on probabilistic cue fusion
Karlo Koledic · Luka Petrovic · Ivan Marković · Ivan Petrovic
|
||
Lightweight Gradient-Aware Upscaling of 3D Gaussian Splatting Images
Simon Niedermayr · Christoph Neuhauser · Rüdiger Westermann
|
||
Efficient Event Camera Data Pretraining with Adaptive Prompt Fusion
Quanmin Liang · Qiang Li · Shuai Liu · Xinzi Cao · Jinyi Lu · Feidiao Yang · Wei Zhang · Kai Huang · Yonghong Tian
|
||
Meta-Learning Dynamic Center Distance: Hard Sample Mining for Learning with Noisy Labels
Chenyu Mu · Yijun Qu · Jiexi Yan · Erkun Yang · Cheng Deng
|
||
OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation
Junyuan Zhang · Qintong Zhang · Bin Wang · Linke Ouyang · Zichen Wen · Ying Li · Ka-Ho Chow · Conghui He · Wentao Zhang
|
||
KinMo: Kinematic-aware Human Motion Understanding and Generation
Pengfei Zhang · Pinxin Liu · Pablo Garrido · Hyeongwoo Kim · Bindita Chaudhuri
|
||
Diorama: Unleashing Zero-shot Single-view 3D Indoor Scene Modeling
Qirui Wu · Denys Iliash · Daniel Ritchie · Manolis Savva · Angel Chang
|
||
Frequency-Aware Autoregressive Modeling for Efficient High-Resolution Image Synthesis
Zhuokun Chen · Jugang Fan · Zhuowei Yu · Bohan Zhuang · Mingkui Tan
|
||
Sliced Wasserstein Bridge for Open-Vocabulary Video Instance Segmentation
Zheyun Qin · Deng Yu · Chuanchen Luo · Zhumin Chen
|
||
One-Step Specular Highlight Removal with Adapted Diffusion Models
Mahir Atmis · Levent Karacan · Mehmet Sarıgül
|
||
D$^2$ST-Adapter: Disentangled-and-Deformable Spatio-Temporal Adapter for Few-shot Action Recognition
Wenjie Pei · Qizhong Tan · Guangming Lu · Jiandong Tian · Jun Yu
|
||
AstroLoc: Robust Space to Ground Image Localizer
Gabriele Berton · Alex Stoken · Carlo Masone
|
||
RetinexMCNet: A Memory Controller Dominated Network for Low-Light Video Enhancement Based on Retinex
Meiao Wang · Xuejing Kang · Yaxi Lu · Jie Xu
|
||
MGSfM: Multi-Camera Geometry Driven Global Structure-from-Motion
Peilin Tao · Hainan Cui · Diantao Tu · Shuhan Shen
|
||
TemCoCo: Temporally Consistent Multi-modal Video Fusion with Visual-Semantic Collaboration
Gong Meiqi · Hao Zhang · Xunpeng Yi · Linfeng Tang · Jiayi Ma
|
||
From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment
Yucheng Suo · Fan Ma · Linchao Zhu · Tianyi Wang · Fengyun Rao · Yi Yang
|
||
MedSegFactory: Text-Guided Generation of Medical Image-Mask Pairs
Jiawei Mao · Yuhan Wang · Yucheng Tang · Daguang Xu · Kang Wang · Yang Yang · Zongwei Zhou · Yuyin Zhou
|
||
Balanced Image Stylization with Style Matching Score
Yuxin Jiang · Liming Jiang · Shuai Yang · Jia-Wei Liu · Ivor Tsang · Mike Zheng Shou
|
||
V.I.P.: Iterative Online Preference Distillation for Efficient Video Diffusion Models
Jisoo Kim · Wooseok Seo · Junwan Kim · Seungho Park · Sooyeon Park · Youngjae Yu
|
||
MCID: Multi-aspect Copyright Infringement Detection for Generated Images
Chuanwei Huang · Zexi Jia · Hongyan Fei · Yeshuang Zhu · Zhiqiang Yuan · Ying Deng · Jiapei Zhang · Xiaoyue Duan · Jinchao Zhang · Jie Zhou
|
||
MV-Adapter: Multi-View Consistent Image Generation Made Easy
Zehuan Huang · Yuan-Chen Guo · Haoran Wang · Ran Yi · Lizhuang Ma · Yan-Pei Cao · Lu Sheng
|
||
MIEB: Massive Image Embedding Benchmark
Chenghao Xiao · Isaac Chung · Imene Kerboua · Jamie Stirling · Xin Zhang · Márton Kardos · Roman Solomatin · Noura Al Moubayed · Kenneth Enevoldsen · Niklas Muennighoff
|
||
Learning Visual Hierarchies in Hyperbolic Space for Image Retrieval
Ziwei Wang · Sameera Ramasinghe · Chenchen Xu · Julien Monteil · Loris Bazzani · Thalaiyasingam Ajanthan
|
||
Modeling Saliency Dataset Bias
Matthias Kuemmerer · Harneet Singh Khanuja · Matthias Bethge
|
||
Not All Degradations Are Equal: A Targeted Feature Denoising Framework for Generalizable Image Super-Resolution
Hongjun Wang · Jiyuan Chen · Zhengwei Yin · Xuan Song · Yinqiang Zheng
|
||
Deep Incomplete Multi-view Clustering with Distribution Dual-Consistency Recovery Guidance
Jiaqi Jin · Siwei Wang · Zhibin Dong · Xihong Yang · Xinwang Liu · En Zhu · Kunlun He
|
||
What Changed and What Could Have Changed? State-Change Counterfactuals for Procedure-Aware Video Representation Learning
Chi-Hsi Kung · Frangil Ramirez · Juhyung Ha · Yi-Hsuan Tsai · Yi-Ting Chen · David Crandall
|
||
FreeDance: Towards Harmonic Free-Number Group Dance Generation via a Unified Framework
Yiwen Zhao · Yang Wang · Liting Wen · Hengyuan Zhang · Xingqun Qi
|
||
ProGait: A Multi-Purpose Video Dataset and Benchmark for Transfemoral Prosthesis Users
Xiangyu Yin · Boyuan Yang · Weichen Liu · Qiyao Xue · Abrar Alamri · Goeran Fiedler · Wei Gao
|
||
Anti-Tamper Protection for Unauthorized Individual Image Generation
Zelin Li · Ruohan Zong · Yifan Liu · Ruichen Yao · Yaokun Liu · Yang Zhang · Dong Wang
|
||
AMD: Adaptive Momentum and Decoupled Contrastive Learning Framework for Robust Long-Tail Trajectory Prediction
Bin Rao · Haicheng Liao · Yanchen Guan · Chengyue Wang · Bonan Wang · Jiaxun Zhang · Zhenning Li
|
||
Efficient Unsupervised Shortcut Learning Detection and Mitigation in Transformers
Lukas Kuhn · Sari Sadiya · Jörg Schlötterer · Florian Buettner · Christin Seifert · Gemma Roig
|
||
Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs
Shaojie Zhang · Jiahui Yang · Jianqin Yin · Zhenbo Luo · Jian Luan
|
||
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Ju He · Qihang Yu · Qihao Liu · Liang-Chieh (Jay) Chen
|
||
AHCPTQ: Accurate and Hardware-Compatible Post-Training Quantization for Segment Anything Model
Wenlun Zhang · Yunshan Zhong · Shimpei Ando · Kentaro Yoshioka
|
||
AutoPrompt: Automated Red-Teaming of Text-to-Image Models via LLM-Driven Adversarial Prompts
Yufan Liu · Wanqian Zhang · Huashan Chen · Lin Wang · Xiaojun Jia · Zheng Lin · Weiping Wang
|
||
MixRI: Mixing Features of Reference Images for Novel Object Pose Estimation
Xinhang Liu · Jiawei Shi · Zheng Dang · Yuchao Dai
|
||
Learning A Unified Template for Gait Recognition
Panjian Huang · Saihui Hou · Junzhou Huang · Yongzhen Huang
|
||
Training-Free Personalization via Retrieval and Reasoning on Fingerprints
Deepayan Das · Davide Talon · Yiming Wang · Massimiliano Mancini · Elisa Ricci
|
||
Seeing and Seeing Through the Glass: Real and Synthetic Data for Multi-Layer Depth Estimation
Hongyu Wen · Yiming Zuo · Venkat Subramanian · Patrick Chen · Jia Deng
|
||
Transparent Vision: A Theory of Hierarchical Invariant Representations
Shuren Qi · Yushu Zhang · Chao Wang · Zhihua Xia · Xiaochun Cao · Fenglei Fan
|
||
WildSeg3D: Segment Any 3D Objects in the Wild from 2D Images
Yansong Guo · Jie Hu · Yansong Qu · Liujuan Cao
|
||
MixA: A Mixed Attention approach with Stable Lightweight Linear Attention to enhance Efficiency of Vision Transformers at the Edge
Sabbir Ahmed · Jingtao Li · Weiming Zhuang · Chen Chen · Lingjuan Lyu
|
||
WonderTurbo: Generating Interactive 3D World in 0.72 Seconds
Chaojun Ni · Xiaofeng Wang · Zheng Zhu · Weijie Wang · Haoyun Li · Guosheng Zhao · Jie Li · Wenkang Qin · Guan Huang · Wenjun Mei
|
||
SignRep: Enhancing Self-Supervised Sign Representations
Ryan Wong · Necati Cihan Camgoz · Richard Bowden
|
||
NGD: Neural Gradient Based Deformation for Monocular Garment Reconstruction
Soham Dasgupta · Shanthika Naik · Preet Savalia · Sujay Kumar Ingle · Avinash Sharma
|
||
GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting
Wanshui Gan · Fang Liu · Hongbin Xu · Ningkai Mo · Naoto Yokoya
|
||
RS-vHeat: Heat Conduction Guided Efficient Remote Sensing Foundation Model
Huiyang Hu · Peijin Wang · Hanbo Bi · Boyuan Tong · Zhaozhi Wang · Wenhui Diao · Hao Chang · Yingchao Feng · Ziqi Zhang · Yaowei Wang · Qixiang Ye · Kun Fu · Xian Sun
|
||
Growing a Twig to Accelerate Large Vision-Language Models
Zhenwei Shao · Mingyang Wang · Zhou Yu · Wenwen Pan · Yan Yang · Tao Wei · Hongyuan Zhang · Ning Mao · Chen Wei · Jun Yu
|
||
DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models
Dewei Zhou · Mingwei Li · Zongxin Yang · Yi Yang
|
||
Invisible Watermarks, Visible Gains: Steering Machine Unlearning with Bi-Level Watermarking Design
Yuhao Sun · Yihua Zhang · Gaowen Liu · Hongtao Xie · Sijia Liu
|
||
Monocular Facial Appearance Capture in the Wild
Yingyan Xu · Kate Gadola · Prashanth Chandran · Sebastian Weiss · Markus Gross · Gaspard Zoss · Derek Bradley
|
||
Guiding Diffusion-Based Articulated Object Generation by Partial Point Cloud Alignment and Physical Plausibility Constraints
Jens Kreber · Joerg Stueckler
|
||
Boost 3D Reconstruction using Diffusion-based Intrinsic Estimation
Junyuan Deng · Wei Yin · Xiaoyang Guo · Qian Zhang · Xiaotao Hu · Weiqiang Ren · Xiaoxiao Long · Ping Tan
|
||
SCFlow: Implicitly Learning Style and Content Disentanglement with Flow Models
Pingchuan Ma · Xiaopei Yang · Ming Gui · Yusong Li · Felix Krause · Johannes Schusterbauer · Björn Ommer
|
||
Training-free Geometric Image Editing on Diffusion Models
Hanshen Zhu · Zhen Zhu · Kaile Zhang · Yiming Gong · Yuliang Liu · Xiang Bai
|
||
A Hidden Stumbling Block in Generalized Category Discovery: Distracted Attention
Qiyu Xu · Zhanxuan Hu · Yu Duan · Ercheng Pei · Yonghang Tai
|
||
Progressive Homeostatic and Plastic Prompt Tuning for Audio-Visual Multi-Task Incremental Learning
Jiong Yin · Liang Li · Jiehua Zhang · Yuhan Gao · Chenggang Yan · Xichun Sheng
|
||
DMQ: Dissecting Outliers of Diffusion Models for Post-Training Quantization
Dongyeun Lee · Jiwan Hur · Hyounguk Shon · Jae Young Lee · Junmo Kim
|
||
Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs
Zitian Wang · Yue Liao · Rong Kang · Fengyun Rao · Yibo Yang · Si Liu
|
||
OminiControl: Minimal and Universal Control for Diffusion Transformer
Zhenxiong Tan · Songhua Liu · Xingyi Yang · Qiaochu Xue · Xinchao Wang
|
||
Inverse 3D microscopy rendering for cell shape inference with active mesh
Sacha Ichbiah · Anshuman Sinha · Fabrice Delbary · Hervé Turlier
|
||
NormalLoc: Visual Localization on Textureless 3D Models using Surface Normals
Jiro Abe · Gaku Nakano · Kazumine Ogura
|
||
Gradient-Reweighted Adversarial Camouflage for Physical Object Detection Evasion
Jiawei Liang · Siyuan Liang · Tianrui Lou · Ming Zhang · Liwenjin · Dunqiu Fan · Xiaochun Cao
|
||
TokensGen: Harnessing Condensed Tokens for Long Video Generation
Wenqi Ouyang · Zeqi Xiao · Danni Yang · Yifan Zhou · Shuai Yang · Lei Yang · Jianlou Si · Xingang Pan
|
||
GECO: Geometrically consistent embedding with lightspeed inference
Regine Hartwig · Dominik Muhle · Riccardo Marin · Daniel Cremers
|
||
TESPEC: Temporally-Enhanced Self-Supervised Pretraining for Event Cameras
Mohammad Mohammadi · Ziyi Wu · Igor Gilitschenski
|
||
DiT4SR: Taming Diffusion Transformer for Real-World Image Super-Resolution
Zheng-Peng Duan · Jiawei Zhang · Xin Jin · Ziheng Zhang · Zheng Xiong · Dongqing Zou · Jimmy Ren · Chun-Le Guo · Chongyi Li
|
||
ChartPoint: Guiding MLLMs with Grounding Reflection for Chart Reasoning
Zhengzhuo Xu · Sinan Du · Yiyan Qi · Siwen Lu · Chengjin Xu · Chun Yuan · Jian Guo
|
||
An Efficient Post-hoc Framework for Reducing Task Discrepancy of Text Encoders for Composed Image Retrieval
Jaeseok Byun · Seokhyeon Jeong · Wonjae Kim · Sanghyuk Chun · Taesup Moon
|
||
Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation
Sucheng Ren · Qihang Yu · Ju He · Xiaohui Shen · Alan Yuille · Liang-Chieh (Jay) Chen
|
||
GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding
Rui Hu · Yuxuan Zhang · Lianghui Zhu · Tianheng Cheng · Lei Liu · Heng Liu · Longjin Ran · Xiaoxin Chen · Wenyu Liu · Xinggang Wang
|
||
SketchSplat: 3D Edge Reconstruction via Differentiable Multi-view Sketch Splatting
Haiyang Ying · Matthias Zwicker
|
||
Phantom: Subject-consistent video generation via cross-modal alignment
Lijie Liu · Tianxiang Ma · Bingchuan Li · Zhuowei Chen · Jiawei Liu · Gen Li · SiYu Zhou · Qian He · Xinglong Wu
|
||
Generalized Few-Shot Point Cloud Segmentation via LLM-Assisted Hyper-Relation Matching
Zhaoyang Li · Yuan Wang · Guoxin Xiong · Wangkai Li · Yuwen Pan · Tianzhu Zhang
|
||
CoA-VLA: Improving Vision-Language-Action Models via Visual-Text Chain-of-Affordance
Jinming Li · Yichen Zhu · Zhibin Tang · Junjie Wen · Minjie Zhu · Xiaoyu Liu · Chengmeng Li · Ran Cheng · Yaxin Peng · Yan Peng · Feifei Feng
|
||
DynFaceRestore: Balancing Fidelity and Quality in Diffusion-Guided Blind Face Restoration with Dynamic Blur-Level Mapping and Guidance
Huu Phu · Yu-Wei Chen · Yi-Cheng Liao · Chi-Wei Hsiao · Han-Yang Wang · Wei-Chen Chiu · Ching-Chun Huang
|
||
GT-Loc: Unifying When and Where in Images through a Joint Embedding Space
David G. Shatwell · Ishan Rajendrakumar Dave · Swetha Sirnam · Mubarak Shah
|
||
LaRender: Training-Free Occlusion Control in Image Generation via Latent Rendering
Xiaohang Zhan · Dingming Liu
|
||
Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for Text-to-Image Diffusion Model Unlearning
Saemi Moon · Minjong Lee · Sangdon Park · Dongwoo Kim
|
||
S$^{2}$M$^{2}$: Scalable Stereo Matching Model for Reliable Depth Estimation
Junhong Min · Youngpil Jeon · Jimin Kim · Minyong Choi
|
||
Effective Training Data Synthesis for Improving MLLM Chart Understanding
Yuwei Yang · Zeyu Zhang · Yunzhong Hou · Zhuowan Li · Gaowen Liu · Ali Payani · Yuan-Sen Ting · Liang Zheng
|
||
TOGA: Temporally Grounded Open-Ended Video QA with Weak Supervision
Ayush Gupta · Anirban Roy · Rama Chellappa · Nathaniel D. Bastian · Alvaro Velasquez · Susmit Jha
|
||
ROADWork: A Dataset and Benchmark for Learning to Recognize, Observe, Analyze and Drive Through Work Zones
Anurag Ghosh · Shen Zheng · Robert Tamburo · Khiem Vuong · Juan Padilla · Hailiang Zhu · Nicholas Dunn · Michael Cardei · Christoph Mertz · Srinivasa Narasimhan
|
||
MEGA: Memory-Efficient 4D Gaussian Splatting for Dynamic Scenes
Xinjie Zhang · Zhening Liu · Yifan Zhang · Xingtong Ge · Dailan He · Tongda Xu · Yan Wang · Zehong Lin · Shuicheng Yan · Jun Zhang
|
||
Perspective-Aware Teaching: Adapting Knowledge for Heterogeneous Distillation
Jhe-Hao Lin · Yi Yao · Chan-Feng Hsu · Hongxia Xie · Hong-Han Shuai · Wen-Huang Cheng
|
||
RobuSTereo: Robust Zero-Shot Stereo Matching under Adverse Weather
Yuran Wang · Yingping Liang · Yutao Hu · Ying Fu
|
||
Unsupervised RGB-D Point Cloud Registration for Scenes with Low Overlap and Photometric Inconsistency
Yejun Shou · Haocheng Wang · Lingfeng Shen · Qian Zheng · Gang Pan · Yanlong Cao
|
||
CleanPose: Category-Level Object Pose Estimation via Causal Learning and Knowledge Distillation
Xiao Lin · Yun Peng · Liuyi Wang · Xianyou Zhong · Minghao Zhu · Jingwei Yang · Yi Feng · Chengju Liu · Qijun Chen
|
||
FairGen: Enhancing Fairness in Text-to-Image Diffusion Models via Self-Discovering Latent Directions
Yilei Jiang · Wei-Hong Li · Yiyuan Zhang · Minghong Cai · Xiangyu Yue
|
||
Randomized Autoregressive Visual Generation
Qihang Yu · Ju He · Xueqing Deng · Xiaohui Shen · Liang-Chieh (Jay) Chen
|
||
Self-Ensembling Gaussian Splatting for Few-Shot Novel View Synthesis
Chen Zhao · Xuan Wang · Tong Zhang · Saqib Javed · Mathieu Salzmann
|
||
MUG: Pseudo Labeling Augmented Audio-Visual Mamba Network for Audio-Visual Video Parsing
Langyu Wang · Yingying Chen · Yiyuan Zhang · Ming Tang · Jinqiao Wang
|
||
Estimating 2D Camera Motion with Hybrid Motion Basis
Haipeng Li · Tianhao Zhou · Zhanglei Yang · WuYi · Chen Yan · Zijing Mao · Shen Cheng · Bing Zeng · Shuaicheng Liu
|
||
Semantic versus Identity: A Divide-and-Conquer Approach towards Adjustable Medical Image De-Identification
Yuan Tian · Shuo Wang · Rongzhao Zhang · Zijian Chen · Yankai Jiang · Chunyi Li · Xiangyang Zhu · Fang Yan · Qiang Hu · Xiaosong Wang · Guangtao Zhai
|
||
Tracking Tiny Drones against Clutter: Large-Scale Infrared Benchmark with Motion-Centric Adaptive Algorithm
Jiahao Zhang · Zongli Jiang · Gang Wang · Jinli Zhang · Yixin Wei · Liang Li · Yizheng Wang
|
||
Exploring Weather-aware Aggregation and Adaptation for Semantic Segmentation under Adverse Conditions
Yuwen Pan · Rui Sun · Wangkai Li · Tianzhu Zhang
|
||
Learning Beyond Still Frames: Scaling Vision-Language Models with Video
Yiyuan Zhang · Handong Li · Jing Liu · Xiangyu Yue
|
||
PlanGen: Towards Unified Layout Planning and Image Generation in Auto-Regressive Vision Language Models
Runze He · Bo Cheng · Yuhang Ma · QingxiangJia · Shanyuan Liu · Ao Ma · Xiaoyu Wu · Liebucha Wu · Dawei Leng · Yuhui Yin
|
||
Knowledge Distillation for Learned Image Compression
Yunuo Chen · Zezheng Lyu · Bing He · Ning Cao · Gang Chen · Guo Lu · Wenjun Zhang
|
||
Iterative Prompt Relocation for Distribution-Adaptive Visual Prompt Tuning
Chikai Shang · Mengke Li · Yiqun Zhang · Zhen Chen · Jinlin Wu · Fangqing Gu · Yang Lu · Yiu-ming Cheung
|
||
Dual Reciprocal Learning of Language-based Human Motion Understanding and Generation
Chen Liang · Zhicheng Shi · Wenguan Wang · Yi Yang
|
||
MixA-Q: Revisiting Activation Sparsity for Vision Transformers from a Mixed-Precision Quantization Perspective
Weitian Wang · Shubham Rai · Cecilia Parra · Akash Kumar
|
||
Breaking Rectangular Shackles: Cross-View Object Segmentation for Fine-Grained Object Geo-Localization
Qingwang Zhang · Yingying Zhu
|
||
CoStoDet-DDPM: Collaborative Training of Stochastic and Deterministic Models Improves Surgical Workflow Anticipation and Recognition
Kaixiang Yang · Xin Li · Qiang Li · Zhiwei Wang
|
||
Q-Norm: Robust Representation Learning via Quality-Adaptive Normalization
Ying Zhou · Lanning Zhang · Fei (Xidian University) · Ziyun (Hangzhou Institute of Technology, Xidian University) · Maoying (KTH Royal Institute of Technology) · Jinlan (University of Technology Sydney) · Nannan (Hangzhou Dianzi University)
|
||
Environment-Agnostic Pose: Generating Environment-independent Object Representations for 6D Pose Estimation
Shaobo Zhang · Yuhang Huang · Wanqing Zhao · Wei Zhao · Ziyu Guan · Jinye Peng
|
||
COSMO: Combination of Selective Memorization for Low-cost Vision-and-Language Navigation
Siqi Zhang · Yanyuan Qiao · Qunbo Wang · Zike Yan · Qi Wu · Zhihua Wei · Jing Liu
|
||
The Silent Assistant: NoiseQuery as Implicit Guidance for Goal-Driven Image Generation
Ruoyu Wang · Huayang Huang · Ye Zhu · Olga Russakovsky · Yu Wu
|
||
OracleFusion: Assisting the Decipherment of Oracle Bone Script with Structurally Constrained Semantic Typography
Li Caoshuo · Zengmao Ding · Xiaobin Hu · Bang Li · Donghao Luo · AndyPianWu · Chaoyang Wang · Chengjie Wang · Taisong Jin · SevenShu · Yunsheng Wu · Yongge Liu · Rongrong Ji
|
||
DLF: Extreme Image Compression with Dual-generative Latent Fusion
Naifu Xue · Zhaoyang Jia · Jiahao Li · Bin Li · Yuan Zhang · Yan Lu
|
||
Beyond Low-Rank Tuning: Model Prior-Guided Rank Allocation for Effective Transfer in Low-Data and Large-Gap Regimes
Chuyan Zhang · Kefan Wang · Yun Gu
|
||
USP: Unified Self-Supervised Pretraining for Image Generation and Understanding
Xiangxiang Chu · Renda Li · Yong Wang
|
||
EVER: Exact Volumetric Ellipsoid Rendering for Real-time View Synthesis
Alexander Mai · Peter Hedman · George Kopanas · Dor Verbin · David Futschik · Qiangeng Xu · Falko Kuester · Jonathan Barron · Yinda Zhang
|
||
Less is More: Improving Motion Diffusion Models with Sparse Keyframes
Jinseok Bae · Inwoo Hwang · Young-Yoon Lee · Ziyu Guo · Joseph Liu · Yizhak Ben-Shabat · Young Kim Kim · Mubbasir Kapadia
|
||
Improved Noise Schedule for Diffusion Training
Tiankai Hang · Shuyang Gu · Jianmin Bao · Fangyun Wei · Dong Chen · Xin Geng · Baining Guo
|
||
Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics
Ruining Li · Chuanxia Zheng · Christian Rupprecht · Andrea Vedaldi
|
||
Towards Visual Localization Interoperability: Cross-Feature for Collaborative Visual Localization and Mapping
Alberto Jaenal · Paula Carbó Cubero · Jose Araujo · André Mateus
|
||
Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM
Han Wang · Yuxiang Nie · Yongjie Ye · Yanjie Wang · Shuai Li · Haiyang Yu · Jinghui Lu · Can Huang
|
||
MoSiC: Optimal-Transport Motion Trajectories for Dense Self-Supervised Learning
Mohammadreza Salehi · Shashanka Venkataramanan · Ioana Simion · Stratis Gavves · Cees Snoek · Yuki Asano
|
||
LightsOut: Diffusion-based Outpainting for Enhanced Lens Flare Removal
Shr-Ruei Tsai · Wei-Cheng Chang · Jie-Ying Lee · Chih-Hai Su · Yu-Lun Liu
|
||
MUSE: Multi-Subject Unified Synthesis via Explicit Layout Semantic Expansion
Fei Peng · Junqiang Wu · Yan Li · Tingting Gao · Di Zhang · Huiyuan Fu
|
||
SeaS: Few-shot Industrial Anomaly Image Generation with Separation and Sharing Fine-tuning
Zhewei Dai · Shilei Zeng · Haotian Liu · Xurui Li · Feng Xue · Yu Zhou
|
||
Inter2Former: Dynamic Hybrid Attention for Efficient High-Precision Interactive Segmentation
You Huang · Lichao Chen · Jiayi Ji · Liujuan Cao · Shengchuan Zhang · Rongrong Ji
|
||
Sparsity Outperforms Low-Rank Projections in Few-Shot Adaptation
Nairouz Mrabah · Nicolas Richet · Ismail Ayed · Eric Granger
|
||
VideoOrion: Tokenizing Object Dynamics in Videos
Yicheng Feng · Yijiang Li · Wanpeng Zhang · Sipeng Zheng · Hao Luo · Zihao Yue · Zongqing Lu
|
||
ETA: Efficiency through Thinking Ahead, A Dual Approach to Self-Driving with Large Models
Shadi Hamdan · Chonghao Sima · Zetong Yang · Hongyang Li · Fatma Guney
|
||
Self-Equilibrated Online Data Balancing for Enhanced Concept Composition in Generation Models
Yukai Shi · Jiarong Ou · Rui Chen · Haotian Yang · Jiahao Wang · Xin Tao · Pengfei Wan · Di Zhang · Kun Gai
|
||
LLM Thought Divergence and Convergence for Dialogue-Based Image Generation Control
Hui Li
|
||
ResidualViT for Efficient Temporally Dense Video Encoding
Mattia Soldan · Fabian Caba Heilbron · Bernard Ghanem · Josef Sivic · Bryan Russell
|
||
Enhancing Few-Shot Vision-Language Classification with Large Multimodal Model Features
Chancharik Mitra · Brandon Huang · Tianning Chai · Zhiqiu Lin · Assaf Arbelle · Rogerio Feris · Leonid Karlinsky · Trevor Darrell · Deva Ramanan · Roei Herzig
|
||
PRISM: Reducing Spurious Implicit Biases in Vision-Language Models with LLM-Guided Embedding Projection
Mahdiyar Molahasani · Azadeh Motamedi · Michael Greenspan · Il-Min Kim · Ali Etemad
|
||
DeRIS: Decoupling Perception and Cognition for Enhanced Referring Image Segmentation through Loopback Synergy
Ming Dai · Wenxuan Cheng · Jiang-Jiang Liu · Sen Yang · Wenxiao Cai · Yanpeng Sun · Wankou Yang
|
||
DiMPLe - Disentangled Multi-Modal Prompt Learning: Enhancing Out-Of-Distribution Alignment with Invariant and Spurious Feature Separation
Umaima Rahman · Mohammad Yaqub · Dwarikanath Mahapatra
|
||
FiVE: A Fine-grained Video Editing Benchmark for Evaluating Emerging Diffusion and Rectified Flow Models
Minghan LI · Chenxi Xie · Yichen Wu · Lei Zhang · Mengyu Wang
|
||
DoppDrive: Doppler-Driven Temporal Aggregation for Improved Radar Object Detection
Yuval Haitman · Oded Bialer
|
||
Synthetic Video Enhances Physical Fidelity in Video Synthesis
Qi Zhao · Xingyu Ni · Ziyu Wang · Feng Cheng · Ziyan Yang · Lu Jiang · Bohan Wang
|
||
Enpowering Your Pansharpening Models with Generalizability: Unified Distribution is All You Need
Yongchuan Cui · Peng Liu · HUI ZHANG
|
||
SUV: Suppressing Undesired Video Content via Semantic Modulation Based on Text Embeddings
Xiang Lv · Mingwen Shao · Lingzhuang Meng · Chang Liu · Yecong Wan · Xinyuan Chen
|
||
FaceLift: Learning Generalizable Single Image 3D Face Reconstruction from Synthetic Heads
Weijie Lyu · Yi Zhou · Ming-Hsuan Yang · Zhixin Shu
|
||
Dual Domain Control via Active Learning for Remote Sensing Domain Incremental Object Detection
Jiachen Sun · De Cheng · Xi Yang · Nannan Wang
|
||
UIP2P: Unsupervised Instruction-based Image Editing via Edit Reversibility Constraint
Enis Simsar · Alessio Tonioni · Yongqin Xian · Thomas Hofmann · Federico Tombari
|
||
I2V3D: Controllable image-to-video generation with 3D guidance
Zhiyuan Zhang · Dongdong Chen · Jing Liao
|
||
StolenLoRA: Exploring LoRA Extraction Attacks via Synthetic Data
Yixu Wang · Yan Teng · Yingchun Wang · Xingjun Ma
|
||
ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models
Guoyizhe Wei · Rama Chellappa
|
||
CAD-Recode: Reverse Engineering CAD Code from Point Clouds
Danila Rukhovich · Elona Dupont · Dimitrios Mallis · Kseniya Cherenkova · Anis Kacem · Djamila Aouada
|
||
PROL: Rehearsal Free Continual Learning in Streaming Data via Prompt Online Learning
Muhammad Anwar Ma'sum · Mahardhika Pratama · Savitha Ramasamy · Lin Liu · H Habibullah · Ryszard Kowalczyk
|
||
DisCo: Towards Distinct and Coherent Visual Encapsulation in Video MLLMs
JIAHE ZHAO · rongkun Zheng · Yi Wang · Helin WANG · Hengshuang Zhao
|
||
Adversarial Purification via Super-Resolution and Diffusion
Mincheol Park · Cheonjun Park · Seungseop Lim · Mijin Koo · Hyunwuk Lee · Won Woo Ro · Suhyun Kim
|
||
Open-Vocabulary HOI Detection with Interaction-aware Prompt and Concept Calibration
Ting Lei · Shaofeng Yin · Qingchao Chen · Yuxin Peng · Yang Liu
|
||
ForeSight: Multi-View Streaming Joint Object Detection and Trajectory Forecasting
Sandro Papais · Letian Wang · Brian Cheong · Steven Waslander
|
||
DialNav: Multi-turn Dialog Navigation with a Remote Guide
Leekyeung Han · Hyunji Min · Gyeom Hwangbo · Jonghyun Choi · Paul Hongsuck Seo
|
||
Dynamic Point Maps: A Versatile Representation for Dynamic 3D Reconstruction
Edgar Sucar · Zihang Lai · Eldar Insafutdinov · Andrea Vedaldi
|
||
Zero-Shot Compositional Video Learning with Coding Rate Reduction
Heeseok Jung · Jun-Hyeon Bak · Yujin Jeong · Gyugeun Lee · Jinwoo Ahn · Eun-Sol Kim
|
||
Feature Coding in the Era of Large Models: Dataset, Test Conditions, and Benchmark
Changsheng Gao · Yifan Ma · Qiaoxi Chen · Xu yenan · Dong Liu · Weisi Lin
|
||
MeasureXpert: Automatic Anthropometric Measurement Extraction from Two Unregistered, Partial, Posed, and Dressed Body Scans
Ran Zhao · Xinxin Dai · Pengpeng Hu · Vasile Palade · Adrian Munteanu
|
||
LongSplat: Robust Unposed 3D Gaussian Splatting for Casual Long Videos
Chin-Yang Lin · Cheng Sun · Fu-En Yang · Min-Hung Chen · Yen-Yu Lin · Yu-Lun Liu
|
||
Bridging Class Imbalance and Partial Labeling via Spectral-Balanced Energy Propagation for Skeleton-based Action Recognition
Yandan Wang · Chenqi Guo · Yinglong Ma · Jiangyan Chen · Yuan Gao · Weiming Dong
|
||
LLaFEA: Frame-Event Complementary Fusion for Fine-Grained Spatiotemporal Understanding in LMMs
Hanyu Zhou · Gim Hee Lee
|
||
Unraveling the Effects of Synthetic Data on End-to-End Autonomous Driving
Junhao Ge · Zuhong Liu · Longteng Fan · Yifan Jiang · Jiaqi Su · Yiming Li · Zhejun Zhang · Siheng Chen
|
||
The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation
Aoxiong Yin · Kai Shen · Yichong Leng · Xu Tan · Xinyu Zhou · Juncheng Li · Siliang Tang
|
||
Agreement aware and dissimilarity oriented GLOM
Ru Zeng · Yan Song · Yang ZHANG · yanlinghu yanlinghu · Hui Yu
|
||
Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers
Divyansh Srivastava · Xiang Zhang · He Wen · Chenru Wen · Zhuowen Tu
|
||
Ultra High-Resolution Image Inpainting with Patch-Based Content Consistency Adapter
JianHui Zhang · Shen Cheng · Qirui Sun · Jia Liu · Wang Luyang · chaoyu feng · Chen Fang · LEI LEI · Jue Wang · Shuaicheng Liu
|
||
VolumetricSMPL: A Neural Volumetric Body Model for Efficient Interactions, Contacts, and Collisions
Marko Mihajlovic · Siwei Zhang · Gen Li · KAIFENG ZHAO · Lea Müller · Siyu Tang
|
||
Towards Annotation-Free Evaluation: KPAScore for Human Keypoint Detection
Xiaoxiao Wang · Chunxiao Li · Peng Sun · Boming Miao · Yunjian Zhang · Yao Zhu
|
||
A Conditional Probability Framework for Compositional Zero-shot Learning
Peng Wu · Qiuxia Lai · Hao Fang · Guo-Sen Xie · Yilong Yin · Xiankai Lu · Wenguan Wang
|
||
Spatial-Temporal Forgery Trace based Forgery Image Identification
Yilin Wang · Zunlei Feng · Jiachi Wang · Hengrui Lou · Binjia Zhou · Jie Lei · Mingli Song · Yijun Bei
|
||
SMP-Attack: Boosting the Transferability of Feature Importance-based Adversarial Attack with Semantics-aware Multi-granularity Patchout
Wen Yang · Guodong Liu · Di Ming
|
||
ContextFace: Generating Facial Expressions from Emotional Contexts
minjung kim · Minsang Kim · Seung Baek
|
||
HVPUNet: Hybrid-Voxel Point-cloud Upsampling Network
Juhyung Ha · Vibhas Vats · Alimoor Reza · Soon-heung Jung · David Crandall
|
||
Multi-identity Human Image Animation with Structural Video Diffusion
Zhenzhi Wang · Yixuan Li · yanhong zeng · Yuwei Guo · Dahua Lin · Tianfan Xue · Bo Dai
|
||
Geminio: Language-Guided Gradient Inversion Attacks in Federated Learning
Junjie Shan · Ziqi Zhao · Jialin Lu · Rui Zhang · SM Yiu · Ka-Ho Chow
|
||
Passing the Driving Knowledge Test
Maolin Wei · Wanzhou Liu · Eshed Ohn-Bar
|
||
MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh
Shuangkang Fang · I-Chao Shen · Yufeng Wang · Yi-Hsuan Tsai · Yi Yang · Shuchang Zhou · Wenrui Ding · Takeo Igarashi · Ming-Hsuan Yang
|
||
IM360: Large-scale Indoor Mapping with 360 Cameras
Dongki Jung · Jaehoon Choi · Yonghan Lee · Dinesh Manocha
|
||
Towards Real Unsupervised Anomaly Detection Via Confident Meta-Learning
Muhammad Aqeel · Shakiba Sharifi · Marco Cristani · Francesco Setti
|
||
IQA-Adapter: Exploring Knowledge Transfer from Image Quality Assessment to Diffusion-based Generative Models
Khaled Abud · Sergey Lavrushkin · Alexey Kirillov · Dmitriy Vatolin
|
||
Images as Noisy Labels: Unleashing the Potential of the Diffusion Model for Open-Vocabulary Semantic Segmentation
Fan Li · Xuanbin Wang · Xuan Wang · Zhaoxiang Zhang · yuelei xu
|
||
CA-I2P: Channel-Adaptive Registration Network with Global Optimal Selection
Zhixin Cheng · Jiacheng Deng · Xinjun Li · Xiaotian Yin · Bohao Liao · Baoqun Yin · Wenfei Yang · Tianzhu Zhang
|
||
The Inter-Intra Modal Measure: A Predictive Lens on Fine-Tuning Outcomes in Vision-Language Models
Laura Niss · Kevin Vogt-Lowell · Theodoros Tsiligkaridis
|
||
Fish2Mesh Transformer: 3D Human Mesh Recovery from Egocentric Vision
Tianma Shen · Aditya Shrish Puranik · James Vong · Vrushabh Deogirikar · Ryan Fell · Julianna Dietrich · Maria Kyrarini · Christopher Kitts · David Jeong
|
||
Beyond Walking: A Large-Scale Image-Text Benchmark for Text-based Person Anomaly Search
Shuyu Yang · Yaxiong Wang · Li Zhu · Zhedong Zheng
|
||
Tree-NeRV: Efficient Non-Uniform Sampling for Neural Video Representation via Tree-Structured Feature Grids
Jiancheng Zhao · Yifan Zhan · Qingtian Zhu · Mingze Ma · Muyao Niu · Zunian Wan · Xiang Ji · Yinqiang Zheng
|
||
Mitigating Object Hallucinations via Sentence-Level Early Intervention
Shangpin Peng · Senqiao Yang · Li Jiang · Zhuotao Tian
|
||
Group-wise Scaling and Orthogonal Decomposition for Domain-Invariant Feature Extraction in Face Anti-Spoofing
Seungjin Jung · Kanghee Lee · Yonghyun Jeong · Haeun Noh · Jungmin Lee · Jongwon Choi
|
||
DriveMM: All-in-One Large Multimodal Model for Autonomous Driving
Zhijian Huang · Chengjian Feng · Baihui Xiao · Feng yan · ZEQUN JIE · Yujie Zhong · Xiaodan Liang · Lin Ma
|
||
PRIMAL: Physically Reactive and Interactive Motor Model for Avatar Learning
Yan Zhang · Yao Feng · Alpár Cseke · Nitin Saini · Nathan Bajandas · Nicolas Heron · Michael Black
|
||
Robust Adverse Weather Removal via Spectral-based Spatial Grouping
Yuhwan Jeong · Yunseo Yang · Youngho Yoon · Kuk-Jin Yoon
|
||
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
Ahmed Nassar · Matteo Omenetti · Maksym Lysak · Nikolaos Livathinos · Christoph Auer · Lucas Morin · Rafael Teixeira de Lima · Yusik Kim · A. Gurbuz · Michele Dolfi · Peter Staar
|
||
PRVQL: Progressive Knowledge-guided Refinement for Robust Egocentric Visual Query Localization
Bing Fan · Yunhe Feng · Yapeng Tian · James Liang · Yuewei Lin · Yan Huang · Heng Fan
|
||
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
Jensen Zhou · Hang Gao · Vikram Voleti · Aaryaman Vasishta · Chun-Han Yao · Mark Boss · Philip Torr · Christian Rupprecht · Varun Jampani
|
||
LEGION: Learning to Ground and Explain for Synthetic Image Detection
Hengrui Kang · Siwei Wen · Zichen Wen · Junyan Ye · Weijia Li · Peilin Feng · Baichuan Zhou · Bin Wang · Dahua Lin · Linfeng Zhang · Conghui He
|
||
Music Grounding by Short Video
Zijie Xin · Minquan Wang · Jingyu Liu · Quan Chen · Ye Ma · Peng Jiang · Xirong Li
|
||
OURO: A Self-Bootstrapped Framework for Enhancing Multimodal Scene Understanding
Tianrun Xu · Guanyu Chen · Ye Li · Xi Yuxin · Zeyu Mu · Ruichen Wang · Tianren Zhang · Haichuan Gao · Feng Chen
|
||
Latte: Collaborative Test-Time Adaptation of Vision-Language Models in Federated Learning
Wenxuan Bao · Ruxi Deng · Ruizhong Qiu · Tianxin Wei · Hanghang Tong · Jingrui He
|
||
LDPose: Towards Inclusive Human Pose Estimation for Limb-Deficient Individuals in the Wild
Jiaying Ying · Heming Du · Kaihao Zhang · Lincheng Li · Xin Yu
|
||
Mind the Gap: Preserving and Compensating for the Modality Gap in CLIP-Based Continual Learning
Linlan Huang · Xusheng Cao · Haori Lu · Yifan Meng · Fei Yang · Xialei Liu
|
||
Enhancing Transferability of Targeted Adversarial Examples via Inverse Target Gradient Competition and Spatial Distance Stretching
Zhankai Li · Weiping Wang · jie li · Shigeng Zhang · Yunan Hu · Song Guo
|
||
Unified Multi-Agent Trajectory Modeling with Masked Trajectory Diffusion
songru Yang · Zhenwei Shi · Zhengxia Zou
|
||
Improving Rectified Flow with Boundary Conditions
Xixi Hu · Runlong Liao · Bo Liu · Keyang Xu · Yeqing Li · Eugene Ie · Hongliang Fei · qiang liu
|
||
Manual-PA: Learning 3D Part Assembly from Instruction Diagrams
Jiahao Zhang · Anoop Cherian · Cristian Rodriguez-Opazo · Weijian Deng · Stephen Gould
|
||
Improving SAM for Camouflaged Object Detection via Dual Stream Adapters
Jiaming Liu · Linghe Kong · Guihai Chen
|
||
Vision-Language Models Can't See the Obvious
YASSER ABDELAZIZ DAHOU DJILALI · Ngoc Huynh · Phúc Lê Khắc · Wamiq Para · Ankit Singh · Sanath Narayan
|
||
Single-Scanline Relative Pose Estimation for Rolling Shutter Cameras
Petr Hruby · Marc Pollefeys
|
||
Motion Synthesis with Sparse and Flexible Keyjoint Control
Inwoo Hwang · Jinseok Bae · Donggeun Lim · Young Kim Kim
|
||
Controllable Latent Space Augmentation for Digital Pathology
Sofiène Boutaj · Marin Scalbert · Pierre Marza · Florent Couzinie-Devy · Maria Vakalopoulou · Stergios Christodoulidis
|
||
From One to More: Contextual Part Latents for 3D Generation
Shaocong Dong · Lihe Ding · Xiao Chen · Yaokun Li · Yuxin WANG · Yucheng Wang · Qi WANG · Jaehyeok Kim · Chenjian Gao · Zhanpeng Huang · Zibin Wang · Tianfan Xue · Dan Xu
|
||
Self-Supervised Speed of Sound Recovery for Aberration-Corrected Photoacoustic Computed Tomography
Tianao Li · Manxiu Cui · Cheng Ma · Emma Alexander
|
||
Uncover Treasures in DCT: Advancing JPEG Quality Enhancement by Exploiting Latent Correlations
jing Yang · Qunliang Xing · Mai Xu · Minglang Qiao
|
||
Scaling 3D Compositional Models for Robust Classification and Pose Estimation
Xiaoding Yuan · Prakhar Kaushik · Guofeng Zhang · Artur Jesslen · Adam Kortylewski · Alan Yuille
|
||
One-Shot Knowledge Transfer for Scalable Person Re-Identification
Longhua Li · Lei Qi · Xin Geng
|
||
EEGMirror: Leveraging EEG data in the wild via Montage-Agnostic Self-Supervision for EEG to Video Decoding
Xuan-Hao Liu · Bao-liang Lu · Wei-Long Zheng
|
||
DGTalker: Disentangled Generative Latent Space Learning for Audio-Driven Gaussian Talking Heads
Xiaoxi Liang · Yanbo Fan · Qiya Yang · Xuan Wang · Wei Gao · Ge Li
|
||
Stylized-Face: A Million-level Stylized Face Dataset for Face Recognition
Zhengyuan Peng · Jianqing Xu · Yuge Huang · Jinkun Hao · Shouhong Ding · zhizhong zhang · Xin TAN · Lizhuang Ma
|
||
Controllable 3D Outdoor Scene Generation via Scene Graphs
Yuheng Liu · Xinke Li · Yuning Zhang · Lu Qi · Xin Li · Wenping Wang · Chongshou Li · Xueting Li · Ming-Hsuan Yang
|
||
Adding Additional Control to One-Step Diffusion with Joint Distribution Matching
Yihong Luo · Tianyang Hu · Yifan Song · Jiacheng Sun · Zhenguo Li · Jing Tang
|
||
Conditional Latent Diffusion Models for Zero-Shot Instance Segmentation
Maximilian Ulmer · Wout Boerdijk · Rudolph Triebel · Maximilian Durner
|
||
EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer
Yuxuan Zhang · Yirui Yuan · Yiren Song · Haofan Wang · Jiaming Liu
|
||
Unified Adversarial Augmentation for Improving Palmprint Recognition
Jianlong Jin · Chenglong Zhao · Ruixin Zhang · Sheng Shang · Yang Zhao · Jun Wang · Jingyun Zhang · Shouhong Ding · Wei Jia · Yunsheng Wu
|
||
Scale Your Instructions: Enhance the Instruction-Following Fidelity of Unified Image Generation Model by Self-Adaptive Attention Scaling
Chao Zhou · Tianyi Wei · Nenghai Yu
|
||
Embodied Image Captioning: Self-supervised Learning Agents for Spatially Coherent Image Descriptions
Tommaso Galliena · Tommaso Apicella · Stefano Rosa · Pietro Morerio · ALESSIO DEL BUE · Lorenzo Natale
|
||
Adversarial Reconstruction Feedback for Robust Fine-grained Generalization
Shijie Wang · Jian Shi · Haojie Li
|
||
CARIM: Caption-Based Autonomous Driving Scene Retrieval via Inclusive Text Matching
Minjoo Ki · Dae Jung Kim · Kisung Kim · Seon Joo Kim · Jinhan Lee
|
||
Thermal Polarimetric Multi-view Stereo
Takahiro Kushida · Kenichiro Tanaka
|
||
MoFRR: Mixture of Diffusion Models for Face Retouching Restoration
Jiaxin Liu · Qichao Ying · Zhenxing Qian · Sheng Li · Runqi Zhang · Jian liu · Xinpeng Zhang
|
||
VideoAds: Where Opensource Foundation Models Beat GPT-4o & Gemini-1.5 Pro
Zheyuan Zhang · Wanying Dou · Linkai Peng · Hongyi Pan · Ulas Bagci · Boqing Gong
|
||
VPO: Aligning Text-to-Video Generation Models with Prompt Optimization
Jiale Cheng · Ruiliang Lyu · Xiaotao Gu · Xiao Liu · Jiazheng Xu · Yida Lu · Jiayan Teng · Zhuoyi Yang · Yuxiao Dong · Jie Tang · Hongning Wang · Minlie Huang
|
||
ACE-G: Improving Generalization of Scene Coordinate Regression Through Query Pre-Training
Leonard Bruns · Axel Barroso-Laguna · Tommaso Cavallari · Áron Monszpart · Sowmya Munukutla · Victor Prisacariu · Eric Brachmann
|
||
Rethink Sparse Signals for Pose-guided Text-to-image Generation
Wenjie Xuan · Jing Zhang · Juhua Liu · Bo Du · Dacheng Tao
|
||
AIComposer: Any Style and Content Image Composition via Feature Integration
Haowen Li · Zhenfeng Fan · Zhang Wen · Zhengzhou Zhu · Yunjin Li
|
||
Competitive Distillation: A Simple Learning Strategy for Improving Visual Classification
Daqian Shi · Xiaolei Diao · Xu Chen · Cedric John
|
||
TimeBooth: Disentangled Facial Invariant Representation for Diverse and Personalized Face Aging
Zepeng Su · zhulin liu · Zongyan Zhang · Tong Zhang · C.L.Philip Chen
|
||
RMultiplex200K: Toward Reliable Multimodal Process Supervision for Visual Language Models on Telecommunications
Sijia Chen · Bin Song
|
||
Visual-Oriented Fine-Grained Knowledge Editing for MultiModal Large Language Models
Zhen Zeng · Leijiang Gu · Xun Yang · Zhangling Duan · Zenglin Shi · Meng Wang
|
||
Cross-modal Ship Re-Identification via Optical and SAR Imagery: A Novel Dataset and Method
Han Wang · Shengyang Li · Jian Yang · Yuxuan Liu · Yixuan Lv · Zhuang Zhou
|
||
DIP: Unsupervised Dense In-Context Post-training of Visual Representations
Sophia Sirko-Galouchenko · Spyros Gidaris · Antonin Vobecky · Andrei Bursuc · Nicolas THOME
|
||
Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining
Zhiqi Ge · Juncheng Li · Xinglei Pang · Minghe Gao · Kaihang Pan · Wang Lin · Hao Fei · Wenqiao Zhang · Siliang Tang · Yueting Zhuang
|
||
DNF-Intrinsic: Deterministic Noise-Free Diffusion for Indoor Inverse Rendering
Rongjia Zheng · Qing Zhang · Chengjiang Long · Wei-Shi Zheng
|
||
Rep-MTL: Unleashing the Power of Representation-level Task Saliency for Multi-Task Learning
Zedong Wang · Siyuan Li · Dan Xu
|
||
IDF: Iterative Dynamic Filtering Networks for Generalizable Image Denoising
Dongjin Kim · Jaekyun Ko · Muhammad Kashif Ali · Tae Hyun Kim
|
||
GAP: Gaussianize Any Point Clouds with Text Guidance
Weiqi Zhang · Junsheng Zhou · Haotian Geng · Wenyuan Zhang · Liang Han
|
||
HarmonySeg: Tubular Structure Segmentation with Deep-Shallow Feature Fusion and Growth-Suppression Balanced Loss
Ke Zhang · Yi Huang · Wei Liu · Yuanyuan Wang · Vishal Patel · Le Lu · Xu Han · Dakai Jin · Ke Yan
|
||
Enhanced Event-based Dense Stereo via Cross-Sensor Knowledge Distillation
haihao zhang · Yunjian Zhang · Jianing Li · Lin Zhu · Meng Lv · Yao Zhu · Yanwei Liu · Xiangyang Ji
|
||
DiffDoctor: Diagnosing Image Diffusion Models Before Treating
Yiyang Wang · Xi Chen · Xiaogang Xu · Sihui Ji · Yu Liu · Yujun Shen · Hengshuang Zhao
|
||
MRGen: Segmentation Data Engine For Underrepresented MRI Modalities
Haoning Wu · Ziheng Zhao · Ya Zhang · Yanfeng Wang · Weidi Xie
|
||
MagicMirror: ID-Preserved Video Generation in Video Diffusion Transformers
Yuechen Zhang · YaoYang Liu · Bin Xia · Bohao PENG · Zexin Yan · Eric Lo · Jiaya Jia
|
||
DISTIL: Data-Free Inversion of Suspicious Trojan Inputs via Latent Diffusion
Hossein Mirzaei · Zeinab Taghavi · Sepehr Rezaee · Masoud Hadi · Moein Madadi · Mackenzie Mathis
|
||
Latest Object Memory Management for Temporally Consistent Video Instance Segmentation
Seunghun Lee · Jiwan Seo · Minwoo Choi · Kiljoon Han · Jaehoon Jeong · Zane Durante · Ehsan Adeli · Sang Hyun Park · Sunghoon Im
|
||
FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers
Renshan Zhang · Rui Shao · Gongwei Chen · Miao Zhang · Kaiwen Zhou · Weili Guan · Liqiang Nie
|
||
STaR: Seamless Spatial-Temporal Aware Motion Retargeting with Penetration and Consistency Constraints
Xiaohang Yang · Qing Wang · Jiahao Yang · Gregory Slabaugh · Shanxin Yuan
|
||
Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
Size Wu · Wenwei Zhang · Lumin Xu · Sheng Jin · Zhonghua Wu · Qingyi Tao · Wentao Liu · Wei Li · Chen Change Loy
|
||
DC-ControlNet: Decoupling Inter- and Intra-Element Conditions in Image Generation with Diffusion Models
hongji yang · Wencheng Han · Yucheng Zhou · Jianbing Shen
|
||
Semantic-guided Camera Ray Regression for Visual Localization
Yesheng Zhang · Xu Zhao
|
||
Implicit Counterfactual Learning for Audio-Visual Segmentation
Mingfeng Zha · Tianyu Li · Guoqing Wang · Peng Wang · Yangyang Wu · Yang Yang · Heng Tao Shen
|
||
SSVQ: Unleashing the Potential of Vector Quantization with Sign-Splitting
Shuaiting Li · Juncan Deng · Chengxuan Wang · Kedong Xu · Rongtao Deng · Hong Gu · Haibin Shen · Kejie Huang
|
||
Towards More Diverse and Challenging Pre-training for Point Cloud Learning: Self-Supervised Cross Reconstruction with Decoupled Views
Xiangdong Zhang · Shaofeng Zhang · Junchi Yan
|
||
Intervening in Black Box: Concept Bottleneck Model for Enhancing Human Neural Network Mutual Understanding
Nuoye Xiong · Anqi Dong · Ning Wang · Cong Hua · Guangming Zhu · Lin Mei · peiyi shen · zhang liang
|
||
CLIP-GS: Unifying Vision-Language Representation with 3D Gaussian Splatting
Siyu Jiao · Haoye Dong · Yuyang Yin · ZEQUN JIE · Yinlong Qian · Yao Zhao · Humphrey Shi · Yunchao Wei
|
||
On the Robustness Tradeoff in Fine-Tuning
Kunyang Li · Jean-Charles Noirot Ferrand · Ryan Sheatsley · Blaine Hoak · Yohan Beugin · Eric Pauley · Patrick McDaniel
|
||
StyleMotif: Multi-Modal Motion Stylization using Style-Content Cross Fusion
Ziyu Guo · Young-Yoon Lee · Joseph Liu · Yizhak Ben-Shabat · Victor Zordan · Mubbasir Kapadia
|
||
SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts
Gengze Zhou · Yicong Hong · Zun Wang · Chongyang Zhao · Mohit Bansal · Qi Wu
|
||
Fair Generation without Unfair Distortions: Debiasing Text-to-Image Generation with Entanglement-Free Attention
Jeonghoon Park · Juyoung Lee · Chaeyeon Chung · Jaeseong Lee · Jaegul Choo · Jindong Gu
|
||
RGE-GS: Reward-Guided Expansive Driving Scene Reconstruction via Diffusion Priors
Sicong Du · Jiarun Liu · Qifeng Chen · Hao-Xiang Chen · Tai-Jiang Mu · Sheng Yang
|
||
Active Membership Inference Test (aMINT): Enhancing Model Auditability with Multi-Task Learning
Daniel DeAlcala · Aythami Morales · Julian Fierrez · Gonzalo Mancera · Ruben Tolosana · Javier Ortega-Garcia
|
||
LOTA: Bit-Planes Guided AI-Generated Image Detection
Renxi Cheng · Hongsong Wang · Yang Zhang · Chaolei Han · Jie Gui
|
||
GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks
Muhammad Danish · Muhammad Akhtar Munir · Syed Shah · Kartik Kuckreja · Fahad Khan · Paolo Fraccaro · Alexandre Lacoste · Salman Khan
|
||
LIRA: Inferring Segmentation in Large Multi-modal Models with Local Interleaved Region Assistance
Zhang Li · Biao Yang · Qiang Liu · Shuo Zhang · Zhiyin Ma · Liang Yin · Deng Linger · Yabo Sun · Yuliang Liu · Xiang Bai
|
||
ETVA: Evaluation of Text-to-Video Alignment via Fine-grained Question Generation and Answering
Kaisi Guan · Zhengfeng Lai · Yuchong Sun · Peng Zhang · Wei Liu · Xiaojiang Liu · Meng Cao · Ruihua Song
|
||
AccidentalGS: 3D Gaussian Splatting from Accidental Camera Motion
Mao Mao · Xujie Shen · Guyuan Chen · Boming Zhao · Jiarui Hu · Hujun Bao · Zhaopeng Cui
|
||
IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves
Ruofan Wang · Juncheng Li · Yixu Wang · Bo Wang · Xiaosen Wang · Yan Teng · Yingchun Wang · Xingjun Ma · Yu-Gang Jiang
|
||
Proactive Scene Decomposition and Reconstruction
Baicheng Li · Zike Yan · Dong Wu · Hongbin Zha
|
||
Joint Asymmetric Loss for Learning with Noisy Labels
Jialiang Wang · Xianming Liu · Xiong Zhou · Gangfeng Hu · Deming Zhai · Junjun Jiang · Xiangyang Ji
|
||
PersonalVideo: High ID-Fidelity Video Customization without Dynamic and Semantic Degradation
Hengjia Li · Haonan Qiu · Shiwei Zhang · Xiang Wang · Yujie Wei · Zekun Li · Yingya Zhang · Boxi Wu · Deng Cai
|
||
Uncalibrated Structure from Motion on a Sphere
Jonathan Ventura · Viktor Larsson · Fredrik Kahl
|
||
LookOut: Real-World Humanoid Egocentric Navigation
Boxiao Pan · Adam Harley · Francis Engelmann · Karen Liu · Leonidas Guibas
|
||
DiGA3D: Coarse-to-Fine Diffusional Propagation of Geometry and Appearance for Versatile 3D Inpainting
Jingyi Pan · Dan Xu · Qiong Luo
|
||
Self-Supervised Sparse Sensor Fusion for Long Range Perception
Edoardo Palladin · Samuel Brucker · Filippo Ghilotti · Praveen Narayanan · Mario Bijelic · Felix Heide
|
||
Accelerating Diffusion Sampling via Exploiting Local Transition Coherence
shangwen zhu · Han Zhang · Zhantao Yang · Qianyu Peng · Zhao Pu · Huangji Wang · Fan Cheng
|
||
DyWA: Dynamics-adaptive World Action Model for Generalizable Non-prehensile Manipulation
Jiangran Lyu · Ziming Li · Xuesong Shi · Chaoyi Xu · Yizhou Wang · He Wang
|
||
Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats
Chen Ziwen · Hao Tan · Kai Zhang · Sai Bi · Fujun Luan · Yicong Hong · Li Fuxin · Zexiang Xu
|
||
MemDistill: Distilling LiDAR Knowledge into Memory for Camera-Only 3D Object Detection
Donghyeon Kwon · Youngseok Yoon · Hyeongseok Son · Suha Kwak
|
||
Unlocking the Potential of Diffusion Priors in Blind Face Restoration
Yunqi Miao · Zhiyu Qu · Mingqi Gao · Changrui Chen · Jifei Song · Jungong Han · Jiankang Deng
|
||
Dynamic Typography: Bringing Text to Life via Video Diffusion Prior
Zichen Liu · Yihao Meng · Hao Ouyang · Yue Yu · Bolin Zhao · Daniel Cohen-Or · Huamin Qu
|
||
Liberated-GS: 3D Gaussian Splatting Independent from SfM Point Clouds
Weihong Pan · Xiaoyu Zhang · Hongjia Zhai · Xiaojun Xiang · Hanqing Jiang · Guofeng Zhang
|
||
HumanOLAT: A Large-Scale Dataset for Full-Body Human Relighting and Novel-View Synthesis
Timo Teufel · xilong zhou · Umar Iqbal · Pramod Rao · Pulkit Gera · Jan Kautz · Vladislav Golyanik · Christian Theobalt
|
||
Mastering Collaborative Multi-modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness
Qifan Yu · Zhebei Shen · Zhongqi Yue · Yang Wu · Bosheng Qin · Wenqiao Zhang · Yunfei Li · Juncheng Li · Siliang Tang · Yueting Zhuang
|
||
Zero-shot Inexact CAD Model Alignment from a Single Image
Pattaramanee Arsomngern · Sasikarn Khwanmuang · Matthias Nießner · Supasorn Suwajanakorn
|
||
Inverse Image-Based Rendering for Light Field Generation from Single Images
Hyunjun Jung · Hae-Gon Jeon
|
||
Enhancing Zero-shot Object Counting via Text-guided Local Ranking and Number-evoked Global Attention
Shiwei Zhang · Qi Zhou · Wei Ke
|
||
PoseAnchor: Robust Root Position Estimation for 3D Human Pose Estimation
Jun-Hee Kim · Jumin Han · Seong-Whan Lee
|
||
No More Sibling Rivalry: Debiasing Human-Object Interaction Detection
Bin Yang · Yulin Zhang · Hong-Yu Zhou · Sibei Yang
|
||
FlowR: Flowing from Sparse to Dense 3D Reconstructions
Tobias Fischer · Samuel Rota Bulò · Yung-Hsu Yang · Nikhil Keetha · Lorenzo Porzi · Norman Müller · Katja Schwarz · Jonathon Luiten · Marc Pollefeys · Peter Kontschieder
|
||
EmbodiedSplat: Personalized Real-to-Sim-to-Real Navigation with Gaussian Splats from a Mobile Device
Gunjan Chhablani · Xiaomeng Ye · Muhammad Zubair Irshad · Zsolt Kira
|
||
SCAN: Bootstrapping Contrastive Pre-training for Data Efficiency
Yangyang Guo · Mohan Kankanhalli
|
||
Fine-Grained 3D Gaussian Head Avatars Modeling from Static Captures via Joint Reconstruction and Registration
Yuan Sun · Xuan Wang · Cong Wang · WeiLi Zhang · Yanbo Fan · Yu Guo · Fei Wang
|
||
Instruction-Grounded Visual Projectors for Continual Learning of Generative Vision-Language Models
Hyundong Jin · Hyung Jin Chang · Eunwoo Kim
|
||
StruMamba3D: Exploring Structural Mamba for Self-supervised Point Cloud Representation Learning
Chuxin Wang · Yixin Zha · Wenfei Yang · Tianzhu Zhang
|
||
Robust Dataset Condensation using Supervised Contrastive Learning
Nicole Kim · Hwanjun Song
|
||
C4D: 4D Made from 3D through Dual Correspondences
Shizun Wang · Zhenxiang Jiang · Xingyi Yang · Xinchao Wang
|
||
Learning Separable Fine-Grained Representation via Dendrogram Construction from Coarse Labels for Fine-grained Visual Recognition
Guanghui Shi · xuefeng liang · WenjieLi WenjieLi · Xiaoyu Lin
|
||
Multidimensional Byte Pair Encoding: Shortened Sequences for Improved Visual Data Generation
Tim Elsner · Paula Usinger · Julius Nehring-Wirxel · Gregor Kobsik · Victor Czech · Yanjiang He · Isaak Lim · Leif Kobbelt
|
||
HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation
Xin Zhou · DINGKANG LIANG · Sifan Tu · Xiwu Chen · Yikang Ding · Dingyuan Zhang · Feiyang Tan · Hengshuang Zhao · Xiang Bai
|
||
QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation
Junyi Wu · Zhiteng Li · Zheng Hui · YULUN ZHANG · Linghe Kong · Xiaokang Yang
|
||
DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization
Wenchuan Wang · Mengqi Huang · Yijing Tu · Zhendong Mao
|
||
MMCR: Benchmarking Cross-Source Reasoning in Scientific Papers
Yang Tian · Zheng Lu · Mingqi Gao · Zheng Liu · Bo Zhao
|
||
RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints
Yiran Qin · Li Kang · Xiufeng Song · Zhenfei Yin · Xiaohong Liu · Xihui Liu · Ruimao Zhang · LEI BAI
|
||
DACoN: DINO for Anime Colorization with Any Number of Reference Images
Kazuma Nagata · Naoshi Kaneko
|
||
Robust Test-Time Adaptation for Single Image Denoising Using Deep Gaussian Prior
Qing Ma · Pengwei Liang · Xiong Zhou · Jiayi Ma · Junjun Jiang · Zhe Peng
|
||
CounterPC: Counterfactual Feature Realignment for Unsupervised Domain Adaptation on Point Clouds
Feng Yang · Yichao Cao · Xiu Su · Dan Niu · Xuanpeng Li
|
||
Backdooring Self-Supervised Contrastive Learning by Noisy Alignment
Tuo Chen · Jie Gui · Minjing Dong · Ju Jia · Lanting Fang · Jian liu
|
||
Hallucinatory Image Tokens: A Training-free EAZY Approach to Detecting and Mitigating Object Hallucinations in LVLMs
Liwei Che · Qingze Liu · Jing Jia · Weiyi Qin · Ruixiang Tang · Vladimir Pavlovic
|
||
DPoser-X: Diffusion Model as Robust 3D Whole-body Human Pose Prior
Junzhe Lu · Jing Lin · Hongkun Dou · Ailing Zeng · Yue Deng · Xian Liu · Zhongang Cai · Lei Yang · YULUN ZHANG · Haoqian Wang · Ziwei Liu
|
||
TrafficLoc: Localizing Traffic Surveillance Cameras in 3D Scenes
Yan Xia · Yunxiang Lu · Rui Song · Oussema Dhaouadi · Joao F. Henriques · Daniel Cremers
|
||
Unleashing High-Quality Image Generation in Diffusion Sampling Using Second-Order Levenberg-Marquardt-Langevin
Fangyikang Wang · Hubery Yin · Lei Qian · Yinan Li · SHAOBIN ZHUANG · Huminhao Zhu · Yilin Zhang · Yanlong Tang · Chao Zhang · Hanbin Zhao · Hui Qian · Chen Li
|
||
LGA-Net: Learning Local and Global Affinities for Sparse Scribble based Image Colorization
Hongjin Lyu · Bo Li · Paul Rosin · Yu-Kun Lai
|
||
WaveMamba: Wavelet-Driven Mamba Fusion for RGB-Infrared Object Detection
Haodong Zhu · Wenhao Dong · Linlin Yang · Hong Li · Yuguang Yang · Yangyang Ren · Qingcheng Zhu · Zichao Feng · CHANGBI LI · Shaohui Lin · Runqi Wang · Xiaoyan Luo · Baochang Zhang
|
||
GEMeX: A Large-Scale, Groundable, and Explainable Medical VQA Benchmark for Chest X-ray Diagnosis
Bo Liu · Ke Zou · Li-Ming Zhan · ZEXIN LU · Xiaoyu DONG · Chengqiang Xie · Yidi Chen · Jiannong Cao · Xiao-Ming Wu · Huazhu Fu
|
||
X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation
jian ma · Qirong Peng · Xu Guo · Chen Chen · Haonan Lu · Zhenyu Yang
|
||
Long Context Tuning for Video Generation
Yuwei Guo · Ceyuan Yang · Ziyan Yang · Zhibei Ma · Zhijie Lin · Zhenheng Yang · Dahua Lin · Lu Jiang
|
||
QR-LoRA: Efficient and Disentangled Fine-tuning via QR Decomposition for Customized Generation
Jiahui Yang · Yongjia Ma · Donglin Di · Hao Li · Chen Wei · Xie Yan · Jianxun Cui · Xun Yang · Wangmeng Zuo
|
||
RoMo: Robust Motion Segmentation Improves Structure from Motion
Lily Goli · Sara Sabour · Mark Matthews · Marcus Brubaker · Dmitry Lagun · Alec Jacobson · David Fleet · Saurabh Saxena · Andrea Tagliasacchi
|
||
CanFields: Consolidating Diffeomorphic Flows for Non-Rigid 4D Interpolation from Arbitrary-Length Sequences
Miaowei Wang · Changjian Li · Amir Vaxman
|
||
LayerLock: Non-collapsing Representation Learning with Progressive Freezing
Goker Erdogan · Nikhil Parthasarathy · Catalin Ionescu · Drew Hudson · Alexander Lerchner · Andrew Zisserman · Mehdi Sajjadi · Joao Carreira
|
||
Hydra-NeXt: Robust Closed-Loop Driving with Open-Loop Training
Zhenxin Li · Shihao Wang · Shiyi Lan · Zhiding Yu · Zuxuan Wu · Jose M. Alvarez
|
||
VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos
Jiashuo Yu · Yue Wu · Meng Chu · Zhifei Ren · Zizheng Huang · Pei Chu · Ruijie Zhang · Yinan He · Qirui Li · Songze Li · Zhenxiang Li · Zhongying Tu · Conghui He · Yu Qiao · Yali Wang · Yi Wang · Limin Wang
|
||
EMatch: A Unified Framework for Event-based Optical Flow and Stereo Matching
Pengjie Zhang · Lin Zhu · Xiao Wang · Lizhi Wang · Hua Huang
|
||
Pretrained Reversible Generation as Unsupervised Visual Representation Learning
Rongkun Xue · Jinouwen Zhang · Yazhe Niu · Dazhong Shen · Bingqi Ma · Yu Liu · Jing Yang
|
||
LOTS of Fashion! Multi-Conditioning for Image Generation via Sketch-Text Pairing
Federico Girella · Davide Talon · Ziyue Liu · Zanxi Ruan · Yiming Wang · Marco Cristani
|
||
LocalDyGS: Multi-view Global Dynamic Scene Modeling through Adaptive Local Feature Decoupling
Jiahao Wu · Rui Peng · Jianbo Jiao · Jiayu Yang · Luyang Tang · Kaiqiang Xiong · Jie Liang · Jinbo Yan · runling liu · Ronggang Wang
|
||
Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos
Yi Chen · Yuying Ge · Weiliang Tang · Yizhuo Li · Yixiao Ge · Mingyu Ding · Ying Shan · Xihui Liu
|
||
HumorDB: Can AI understand graphical humor?
Vedaant V Jain · Gabriel Kreiman · Felipe Feitosa
|
||
Aligning Vision to Language: Text-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning
Junming Liu · Siyuan Meng · Yanting Gao · Song Mao · Pinlong Cai · Guohang Yan · Yirong Chen · Zilin Bian · DING WANG · Botian Shi
|
||
LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents
Boyu Chen · Zhengrong Yue · Siran Chen · Zikang Wang · Yang Liu · Peng Li · Yali Wang
|
||
G$^{2}$SF: Geometry-Guided Score Fusion for Multimodal Industrial Anomaly Detection
Chengyu Tao · Xuanming Cao · Juan Du
|
||
Tensor-aggregated LoRA in Federated Fine-tuning
Zhixuan Li · Binqian Xu · Xiangbo Shu · Jiachao Zhang · Yazhou Yao · Guo-Sen Xie · Jinhui Tang
|
||
Devil is in the Uniformity: Exploring Diverse Learners within Transformer for Image Restoration
Shihao Zhou · Dayu Li · Jinshan Pan · Juncheng Zhou · Jinglei Shi · Jufeng Yang
|
||
Music-Aligned Holistic 3D Dance Generation via Hierarchical Motion Modeling
LI XIAOJIE · Ronghui Li · Shukai Fang · Shuzhao Xie · Xiaoyang Guo · Jiaqing Zhou · Junkun Peng · Zhi Wang
|
||
Bi-Level Optimization for Self-Supervised AI-Generated Face Detection
Mian Zou · Nan Zhong · Baosheng Yu · Yibing Zhan · Kede Ma
|
||
RayPose: Ray Bundling Diffusion for Template Views in Unseen 6D Object Pose Estimation
Junwen Huang · Shishir Reddy Vutukur · Peter Yu · Nassir Navab · Slobodan Ilic · Benjamin Busam
|
||
Denoising Token Prediction in Masked Autoregressive Models
Ting Yao · Yehao Li · Yingwei Pan · Zhaofan Qiu · Tao Mei
|
||
Kaputt: A Large-Scale Dataset for Visual Defect Detection
Sebastian Höfer · Dorian Henning · Artemij Amiranashvili · Douglas Morrison · Mariliza Tzes · Ingmar Posner · Marc Matvienko · Alessandro Rennola · Anton Milan
|
||
Teleportraits: Training-Free People Insertion into Any Scene
Jialu Gao · Joseph K J · Fernando De la Torre
|
||
Reminiscence Attack on Residuals: Exploiting Approximate Machine Unlearning for Privacy
Yaxin Xiao · Qingqing Ye · Li Hu · Huadi Zheng · Haibo Hu · Zi Liang · Haoyang LI · JIAOYIJIE JIAOYIJIE
|
||
Neural Multi-View Uncalibrated Photometric Stereo without Photometric Stereo Cues
Xu Cao · Takafumi Taketomi
|
||
Multimodal LLM Guided Exploration and Active Mapping using Fisher Information
Wen Jiang · BOSHU LEI · Katrina Ashton · Kostas Daniilidis
|
||
Language Decoupling with Fine-grained Knowledge Guidance for Referring Multi-object Tracking
guangyao li · Siping Zhuang · Yajun Jian · Yan Yan · Hanzi Wang
|
||
ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations
Tianming Liang · Kun-Yu Lin · Chaolei Tan · Jianguo Zhang · Wei-Shi Zheng · Jian-Fang Hu
|
||
MeshPad: Interactive Sketch-Conditioned Artist-Designed Mesh Generation and Editing
Haoxuan Li · Ziya Erkoç · Lei Li · Daniele Sirigatti · Vladislav Rosov · Angela Dai · Matthias Nießner
|
||
ConsNoTrainLoRA: Data-driven Weight Initialization of Low-rank Adapters using Constraints
Debasmit Das · Hyoungwoo Park · Munawar Hayat · Seokeon Choi · Sungrack Yun · Fatih Porikli
|
||
DC-AR: Efficient Masked Autoregressive Image Generation with Deep Compression Hybrid Tokenizer
Yecheng Wu · Han Cai · Junyu Chen · Zhuoyang Zhang · Enze Xie · Jincheng YU · Junsong Chen · Jinyi Hu · Yao Lu · Song Han
|
||
2HandedAfforder: Learning Precise Actionable Bimanual Affordances from Human Videos
Marvin Heidinger · Snehal Jauhri · Vignesh Prasad · Georgia Chalvatzaki
|
||
Diffusion-based 3D Hand Motion Recovery with Intuitive Physics
Yufei Zhang · Zijun Cui · Jeffrey Kephart · Qiang Ji
|
||
Large Multi-modal Models Can Interpret Features in Large Multi-modal Models
Kaichen Zhang · Yifei Shen · Bo Li · Ziwei Liu
|
||
Lightweight and Fast Real-time Image Enhancement via Decomposition of the Spatial-aware Lookup Tables
Wontae Kim · Keuntek Lee · Nam Ik Cho
|
||
MDP-Omni: Parameter-free Multimodal Depth Prior-based Sampling for Omnidirectional Stereo Matching
Eunjin Son · HyungGi Jo · Wookyong Kwon · Sang Jun Lee
|
||
Enhancing Reward Models for High-quality Image Generation: Beyond Text-Image Alignment
ying ba · Tianyu Zhang · Yalong Bai · Wenyi Mo · Tao Liang · Bing Su · Ji-Rong Wen
|
||
Multimodal Latent Diffusion Model for Complex Sewing Pattern Generation
Shengqi Liu · Yuhao Cheng · Zhuo Chen · Xingyu Ren · Wenhan Zhu · Lincheng Li · Mengxiao Bi · Xiaokang Yang · Yichao Yan
|
||
PriorMotion: Generative Class-Agnostic Motion Prediction with Raster-Vector Motion Field Priors
Kangan Qian · Jinyu Miao · Xinyu Jiao · Ziang Luo · Zheng Fu · Yining Shi · Yunlong Wang · Kun Jiang · Diange Yang
|
||
Simultaneous Motion And Noise Estimation with Event Cameras
Shintaro Shiba · Yoshimitsu Aoki · Guillermo Gallego
|
||
Enhancing Transformers Through Conditioned Embedded Tokens
Hemanth Saratchandran · Simon Lucey
|
||
Tracing Copied Pixels and Regularizing Patch Affinity in Copy Detection
Yichen Lu · Siwei Nie · Minlong Lu · Xudong Yang · Xiaobo Zhang · Peng Zhang
|
||
Gain-MLP: Improving HDR Gain Map Encoding via a Lightweight MLP
Trevor Canham · SaiKiran Tedla · Michael Murdoch · Michael Brown
|
||
Amodal3R: Amodal 3D Reconstruction from Occluded 2D Images
Tianhao Wu · Chuanxia Zheng · Frank Guan · Andrea Vedaldi · Tat-Jen Cham
|
||
HRScene: How Far Are VLMs from Effective High-Resolution Image Understanding?
Yusen Zhang · Wenliang Zheng · Aashrith Madasu · Peng Shi · Ryo Kamoi · Hao Zhou · Zhuoyang Zou · Shu Zhao · Sarkar Snigdha Sarathi Das · Vipul Gupta · Xiaoxin Lu · Nan Zhang · Ranran Zhang · Avitej Iyer · Renze Lou · Wenpeng Yin · Rui Zhang
|
||
SpecGuard: Spectral Projection-based Advanced Invisible Watermarking
Inzamamul Alam · Md Islam · Simon Woo · Khan Muhammad
|
||
BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation
Ruotong Wang · Mingli Zhu · Jiarong Ou · Rui Chen · Xin Tao · Pengfei Wan · Baoyuan Wu
|
||
CaliMatch: Adaptive Calibration for Improving Safe Semi-supervised Learning
Jinsoo Bae · Seoung Bum Kim · Hyungrok Do
|
||
Highlight What You Want: Weakly-Supervised Instance-Level Controllable Infrared-Visible Image Fusion
Zeyu Wang · Jizheng Zhang · Haiyu Song · Mingyu Ge · Jiayu Wang · Haoran Duan
|
||
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer
Weixian Lei · Jiacong Wang · Haochen Wang · Xiangtai Li · Jun Hao Liew · Jiashi Feng · Zilong Huang
|
||
WildSAT: Learning Satellite Image Representations from Wildlife Observations
Rangel Daroya · Elijah Cole · Oisin Mac Aodha · Grant Horn · Subhransu Maji
|
||
SurfaceSplat: Connecting Surface Reconstruction and Gaussian Splatting
Zihui Gao · Jia-Wang Bian · Guosheng Lin · Hao Chen · Chunhua Shen
|
||
Weakly Supervised Visible-Infrared Person Re-Identification via Heterogeneous Expert Collaborative Consistency Learning
Yafei Zhang · Lingqi Kong · Huafeng Li · Jie Wen
|
||
From Linearity to Non-Linearity: How Masked Autoencoders Capture Spatial Correlations
Anthony Bisulco · Rahul Ramesh · Randall Balestriero · Pratik Chaudhari
|
||
HairCUP: Hair Compositional Universal Prior for 3D Gaussian Avatars
Byungjun Kim · Shunsuke Saito · Giljoo Nam · Tomas Simon · Jason Saragih · Hanbyul Joo · Junxuan Li
|
||
HORT: Monocular Hand-held Objects Reconstruction with Transformers
Zerui Chen · Rolandos Alexandros Potamias · Shizhe Chen · Cordelia Schmid
|
||
TAG-WM: Tamper-Aware Generative Image Watermarking via Diffusion Inversion Sensitivity
Yuzhuo Chen · Zehua Ma · Han Fang · Weiming Zhang · Nenghai Yu
|
||
Temporal-aware Query Routing for Real-time Video Instance Segmentation
Zesen Cheng · Kehan Li · Yian Zhao · Hang Zhang · Chang Liu · Jie Chen
|
||
DanceEditor: Towards Iterative Editable Music-driven Dance Generation with Open-Vocabulary Descriptions
Hengyuan Zhang · Zhe Li · Xingqun Qi · Mengze Li · Muyi Sun · Siye Wang · Man Zhang · Sirui Han
|
||
WAVE: Warp-Based View Guidance for Consistent Novel View Synthesis Using a Single Image
Jiwoo Park · Tae Choi · Youngjun Jun · Seong Jae Hwang
|
||
Toward Better Out-painting: Improving the Image Composition with Initialization Policy Model
Xuan Han · Yihao Zhao · Yanhao Ge · Mingyu You
|
||
Semi-ViM: Bidirectional State Space Model for Mitigating Label Imbalance in Semi-Supervised Learning
Hongyang He · Hongyang Xie · Haochen You · Victor Sanchez
|
||
CATSplat: Context-Aware Transformer with Spatial Guidance for Generalizable 3D Gaussian Splatting from A Single-View Image
Wonseok Roh · Hwanhee Jung · JongWook Kim · Seunggwan Lee · Innfarn Yoo · Andreas Lugmayr · Seunggeun Chi · Karthik Ramani · Sangpil Kim
|
||
Bias-Resilient Weakly Supervised Semantic Segmentation Using Normalizing Flows
Xianglin Qiu · Xiaoyang Wang · Zhen Zhang · Jimin XIAO
|
||
Dynamic-DINO: Fine-Grained Mixture of Experts Tuning for Real-time Open-Vocabulary Object Detection
Yehao Lu · Minghe Weng · Zekang Xiao · Rui Jiang · Wei Su · Guangcong Zheng · Luping Luping · Xi Li
|
||
When Lighting Deceives: Exposing Vision-Language Models' Illumination Vulnerability Through Illumination Transformation Attack
Hanqing Liu · Shouwei Ruan · Yao Huang · Shiji Zhao · Xingxing Wei
|
||
Unfolding-Associative Encoder-Decoder Network with Progressive Alignment for Pansharpening
Shijie Fang · Hongping Gan
|
||
Attention to Neural Plagiarism: Diffusion Models Can Plagiarize Your Copyrighted Images!
zihang zou · Boqing Gong · Liqiang Wang
|
||
Human-Object Interaction from Human-Level Instructions
Zhen Wu · Jiaman Li · Pei Xu · Karen Liu
|
||
Task-Aware Prompt Gradient Projection for Parameter-Efficient Tuning Federated Class-Incremental Learning
Hualong Ke · Yachao Zhang · Jiangming Shi · FangyongWang FangyongWang · Yuan Xie · Yanyun Qu
|
||
Task Vector Quantization for Memory-Efficient Model Merging
Youngeun Kim · seunghwan Lee · Aecheon Jung · Bogon Ryu · Sungeun Hong
|
||
Boosting Multimodal Learning via Disentangled Gradient Learning
Shicai Wei · Chunbo Luo · Yang Luo
|
||
DIH-CLIP: Unleashing the Diversity of Multi-Head Self-Attention for Training-Free Open-Vocabulary Semantic Segmentation
Songsong Duan · Xi Yang · Nannan Wang
|
||
Salvaging the Overlooked: Leveraging Class-Aware Contrastive Learning for Multi-Class Anomaly Detection
Lei Fan · Junjie Huang · Donglin Di · Anyang Su · Tianyou Song · Maurice Pagnucco · Yang Song
|
||
GS-LIVM: Real-Time Photo-Realistic LiDAR-Inertial-Visual Mapping with Gaussian Splatting
Yusen XIE · Zhenmin Huang · Jin Wu · Jun Ma
|
||
Fix-CLIP: Dual-Branch Hierarchical Contrastive Learning via Synthetic Captions for Better Understanding of Long Text
Bingchao Wang · Zhiwei Ning · Jianyu Ding · Xuanang Gao · Yin Li · Dongsheng Jiang · JIE YANG · Wei Liu
|
||
SAM4D: Segment Anything in Camera and LiDAR Streams
Jianyun Xu · Song Wang · Ziqian Ni · Chunyong Hu · Sheng Yang · Jianke Zhu · Qiang Li
|
||
Semantic Alignment and Reinforcement for Data-Free Quantization of Vision Transformers
Yunshan Zhong · Yuyao Zhou · Yuxin Zhang · Wanchen Sui · Shen Li · Yong Li · Fei Chao · Rongrong Ji
|
||
GWM: Towards Scalable Gaussian World Models for Robotic Manipulation
Guanxing Lu · Baoxiong Jia · Puhao Li · Yixin Chen · Ziwei Wang · Yansong Tang · Siyuan Huang
|
||
GSOT3D: Towards Generic 3D Single Object Tracking in the Wild
Yifan Jiao · Yunhao Li · Junhua Ding · Qing Yang · Song Fu · Heng Fan · Libo Zhang
|
||
SuperDec: 3D Scene Decomposition with Superquadrics Primitives
Elisabetta Fedele · Boyang Sun · Francis Engelmann · Marc Pollefeys · Leonidas Guibas
|
||
AnyCalib: On-Manifold Learning for Model-Agnostic Single-View Camera Calibration
Javier Tirado-Garín · Javier Civera
|
||
OCSplats: Observation Completeness Quantification and Label Noise Separation in 3DGS
Han Ling · Yinghui Sun · Xian Xu · Quansen Sun
|
||
Causality-guided Prompt Learning for Vision-language Models via Visual Granulation
Mengyu Gao · Qiulei Dong
|
||
HOMO-Feature: Cross-Arbitrary-Modal Image Matching with Homomorphism of Organized Major Orientation
Chenzhong Gao · Wei Li · Desheng Weng
|
||
Align Your Rhythm: Generating Highly Aligned Dance Poses with Gating-Enhanced Rhythm-Aware Feature Representation
Congyi Fan · Jian Guan · Xuanjia Zhao · Dongli Xu · Youtian Lin · Tong Ye · Pengming Feng · Haiwei Pan
|
||
$\text{CO}_2$-Net: A Physics-Informed Spatio-Temporal Model for Global Surface $\text{CO}_2$ Reconstruction
Hao Zheng · Yuting Zheng · Hanbo Huang · Chaofan Sun · Enhui Liao · Lin Liu · Yi Han · Hao Zhou · Shiyu Liang
|
||
Player-Centric Multimodal Prompt Generation for Large Language Model Based Identity-Aware Basketball Video Captioning
Zeyu Xi · Haoying Sun · Yaofei Wu · Junchi Yan · Haoran Zhang · Lifang Wu · Liang Wang · Chang Wen Chen
|
||
Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection
Shufan Li · Konstantinos Kallidromitis · Akash Gokul · Arsh Koneru · Yusuke Kato · Kazuki Kozuka · Aditya Grover
|
||
Generative Modeling of Shape-Dependent Self-Contact Human Poses
Takehiko Ohkawa · Jihyun Lee · Shunsuke Saito · Jason Saragih · Fabian Prada · Yichen Xu · Shoou-I Yu · Ryosuke Furuta · Yoichi Sato · Takaaki Shiratori
|
||
U-ViLAR: Uncertainty-Aware Visual Localization for Autonomous Driving via Differentiable Association and Registration
Xiaofan Li · Zhihao Xu · Chenming Wu · Zhao Yang · Yumeng Zhang · Jiang-Jiang Liu · Haibao Yu · Xiaoqing Ye · YuAn Wang · Shirui Li · Xun Sun · Ji Wan · Jun Wang
|
||
What's in a Latent? Leveraging Diffusion Latent Space for Domain Generalization
Xavier Thomas · Deepti Ghadiyaram
|
||
Beyond Perspective: Neural 360-Degree Video Compression
Andy Regensky · Marc Windsheimer · Fabian Brand · Andre Kaup
|
||
Progressive Artwork Outpainting via Latent Diffusion Models
Dae-Young Song · Jung-Jae Yu · Donghyeon Cho
|
||
M2SFormer: Multi-Spectral and Multi-Scale Attention with Edge-Aware Difficulty Guidance for Image Forgery Localization
Ju-Hyeon Nam · DongHyun Moon · Sang-Chul Lee
|
||
GUAVA: Generalizable Upper Body 3D Gaussian Avatar
Dongbin Zhang · Yunfei Liu · Lijian Lin · Ye Zhu · Yang Li · Minghan Qin · Yu Li · Haoqian Wang
|
||
Target Bias Is All You Need: Zero-Shot Debiasing of Vision-Language Models with Bias Corpus
Taeuk Jang · Hoin Jung · Xiaoqian Wang
|
||
GaSLight: Gaussian Splats for Spatially-Varying Lighting in HDR
Christophe Bolduc · Yannick Hold-Geoffroy · Jean-Francois Lalonde
|
||
StochasticSplats: Stochastic Rasterization for Sorting-Free 3D Gaussian Splatting
Shakiba Kheradmand · Delio Vicini · George Kopanas · Dmitry Lagun · Kwang Moo Yi · Mark Matthews · Andrea Tagliasacchi
|
||
HQCLIP: Leveraging Vision-Language Models to Create High-Quality Image-Text Datasets and CLIP Models
ZHIXIANG WEI · Guangting Wang · Xiaoxiao Ma · Ke Mei · Fengyun Rao · Huaian Chen · Yi Jin
|
||
Integrating Visual Interpretation and Linguistic Reasoning for Geometric Problem Solving
Zixian Guo · Ming Liu · Qilong Wang · Zhilong Ji · Jinfeng Bai · Lei Zhang · Wangmeng Zuo
|
||
Pruning All-Rounder: Rethinking and Improving Inference Efficiency for Large Vision Language Models
Wei Suo · Ji Ma · Mengyang Sun · Lin Wu · PENG WANG · Yanning Zhang
|
||
Evidential Knowledge Distillation
Liangyu Xiang · Junyu Gao · Changsheng Xu
|
||
From Sharp to Blur: Unsupervised Domain Adaptation for 2D Human Pose Estimation Under Extreme Motion Blur Using Event Cameras
Youngho Kim · Hoonhee Cho · Kuk-Jin Yoon
|
||
VPR-Cloak: A First Look at Privacy Cloak Against Visual Place Recognition
Shuting Dong · Mingzhi Chen · Feng Lu · Hao Yu · Guanghao Li · Zhe Wu · Ming Tang · Chun Yuan
|
||
Physical Degradation Model-Guided Interferometric Hyperspectral Reconstruction with Unfolding Transformer
Yuansheng Li · Yunhao Zou · Linwei Chen · Ying Fu
|
||
LBM: Latent Bridge Matching for Fast Image-to-Image Translation
Clément Chadebec · Onur Tasar · Sanjeev Sreetharan · Benjamin Aubin
|
||
ViewSRD: 3D Visual Grounding via Structured Multi-View Decomposition
Ronggang Huang · Haoxin Yang · Yan Cai · Xuemiao Xu · Huaidong Zhang · Shengfeng He
|
||
SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling
Xianglong He · Zi-Xin Zou · Chia Hao Chen · Yuan-Chen Guo · Ding Liang · Chun Yuan · Wanli Ouyang · Yan-Pei Cao · Yangguang Li
|
||
HoliTracer: Holistic Vectorization of Geographic Objects from Large-Size Remote Sensing Imagery
Yu Wang · Bo Dang · Wanchun Li · Wei Chen · Yansheng Li
|
||
InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
Liming Jiang · Qing Yan · Yumin Jia · Zichuan Liu · Hao Kang · Xin Lu
|
||
VisHall3D: Monocular Semantic Scene Completion from Reconstructing the Visible Regions to Hallucinating the Invisible Regions
Haoang Lu · Yuanqi Su · Xiaoning Zhang · Longjun Gao · Yu Xue · Le Wang
|
||
Leveraging Panoptic Scene Graph for Evaluating Fine-Grained Text-to-Image Generation
Xueqing Deng · Linjie Yang · Qihang Yu · Chenglin Yang · Liang-Chieh (Jay) Chen
|
||
Generic Event Boundary Detection via Denoising Diffusion
Jaejun Hwang · Dayoung Gong · Manjin Kim · Minsu Cho
|
||
Towards Robustness of Person Search against Corruptions
Woojung Son · Yoonki Cho · Guoyuan An · Chanmi Lee · Sung-eui Yoon
|
||
Perceiving and Acting in First-Person: A Dataset and Benchmark for Egocentric Human-Object-Human Interactions
Liang Xu · Chengqun Yang · Zili Lin · Fei Xu · Yifan Liu · Congsheng Xu · Yiyi Zhang · Jie Qin · Xingdong Sheng · Yunhui Liu · Xin Jin · Yichao Yan · Wenjun Zeng · Xiaokang Yang
|
||
GenFlowRL: Shaping Rewards with Generative Object-Centric Flow in Visual Reinforcement Learning
Kelin Yu · Sheng Zhang · Harshit Soora · Furong Huang · Heng Huang · Pratap Tokekar · Ruohan Gao
|
||
Towards Adversarial Robustness via Debiased High-Confidence Logit Alignment
Kejia Zhang · Juanjuan Weng · Zhiming Luo · Shaozi Li
|
||
Time-Aware Auto White Balance in Mobile Photography
Mahmoud Afifi · Luxi Zhao · Abhijith Punnappurath · Mohamed Abdelsalam · Ran Zhang · Michael Brown
|
||
Context Guided Transformer Entropy Modeling for Video Compression
Junlong Tong · Wei Zhang · Yaohui Jin · Xiaoyu Shen
|
||
GeoSplatting: Towards Geometry Guided Gaussian Splatting for Physically-based Inverse Rendering
Kai Ye · Chong Gao · Guanbin Li · Wenzheng Chen · Baoquan Chen
|
||
ForCenNet: Foreground-Centric Network for Document Image Rectification
Peng Cai · liqiang liqiang · Kaicheng Yang · guodong guodong · lijia lijia · zhounan zhounan · Xiang An · Ninghua Yang · Jiankang Deng
|
||
Entropy-Adaptive Diffusion Policy Optimization with Dynamic Step Alignment
Renye Yan · Jikang Cheng · Yaozhong Gan · Shikun Sun · You Wu · Yunfan Yang · Ling Liang · JinLong Lin · Yeshuang Zhu · Jie Zhou · Jinchao Zhang · Junliang Xing · Yimao Cai · Ru Huang
|
||
StrandHead: Text to Hair-Disentangled 3D Head Avatars Using Human-Centric Priors
Xiaokun Sun · Zeyu Cai · Ying Tai · Jian Yang · Zhenyu Zhang
|
||
DynImg: Key Frames with Visual Prompts are Good Representation for Multi-Modal Video Understanding
Xiaoyi Bao · Chen-Wei Xie · Hao Tang · Tingyu Weng · Xiaofeng Wang · Yun Zheng · Xingang Wang
|
||
Importance-Based Token Merging for Efficient Image and Video Generation
Haoyu Wu · Jingyi Xu · Hieu Le · Dimitris Samaras
|
||
UniPhys: Unified Planner and Controller with Diffusion for Flexible Physics-Based Character Control
Yan Wu · Korrawe Karunratanakul · Zhengyi Luo · Siyu Tang
|
||
Dataset Distillation with Feature Matching through the Wasserstein Metric
Haoyang Liu · Peiran Wang · Yijiang Li · Tiancheng Xing · Vibhu Dalal · Luwei LI · Jingrui He · Haohan Wang
|
||
Cultural Gaps in the Long Tail of Text-to-Image Models
Aniket Rege · Zinnia Nie · Unmesh Raskar · Mahesh Ramesh · Zhuoran Yu · Aditya Kusupati · Yong Jae Lee · Ramya Vinayak
|
||
Subjective Camera: Bridging Human Cognition and Visual Reconstruction through Sequence-Aware Sketch-Guided Diffusion
Haoyang Chen · Dongfang Sun · Caoyuan Ma · Shiqin Wang · Kewei Zhang · Zheng Wang · Zhixiang Wang
|
||
Information-Bottleneck Driven Binary Neural Network for Change Detection
Kaijie Yin · Zhiyuan Zhang · Shu Kong · Tian Gao · Cheng-zhong Xu · Hui Kong
|
||
RoboMM: All-in-One Multimodal Large Model for Robotic Manipulation
Feng yan · Fanfan Liu · Yiyang Huang · ZechaoGuan ZechaoGuan · Liming Zheng · Yufeng Zhong · Chengjian Feng · Lin Ma
|
||
Forensic-MoE: Exploring Comprehensive Synthetic Image Detection Traces with Mixture of Experts
Mingqi Fang · Ziguang Li · Lingyun Yu · Quanwei Yang · Hongtao Xie · Yongdong Zhang
|
||
Ensemble Foreground Management for Unsupervised Object Discovery
Ziling Wu · Armaghan Moemeni · Praminda Caleb-Solly
|
||
ERNet: Efficient Non-Rigid Registration Network for Point Sequences
Guangzhao He · Yuxi Xiao · Zhen Xu · Xiaowei Zhou · Sida Peng
|
||
LoD-Loc v2: Aerial Visual Localization over Low Level-of-Detail City Models using Explicit Silhouette Alignment
Juelin Zhu · Shuaibang Peng · Long Wang · Hanlin Tan · Yu Liu · Maojun Zhang · Shen Yan
|
||
CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models
Junho Kim · Hyungjin Chung · Byung-Hoon Kim
|
||
JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers
Kwon Byung-Ki · Qi Dai · Lee Hyoseok · Chong Luo · Tae-Hyun Oh
|
||
Breaking the Encoder Barrier for Seamless Video-Language Understanding
Handong Li · Yiyuan Zhang · Longteng Guo · Xiangyu Yue · Jing Liu
|
||
PlaceIt3D: Language-Guided Object Placement in Real 3D Scenes
Ahmed Abdelreheem · Filippo Aleotti · Jamie Watson · Zawar Qureshi · Abdelrahman Eldesokey · Peter Wonka · Gabriel Brostow · Sara Vicente · Guillermo Garcia-Hernando
|
||
LLaVA-CoT: Let Vision Language Models Reason Step-by-Step
Guowei Xu · Peng Jin · ZiangWu ZiangWu · Li Hao · Yibing Song · Lichao Sun · Li Yuan
|
||
RegionFocus: Visual Test-time Scaling for GUI Agents
Tiange Luo · Lajanugen Logeswaran · Justin Johnson · Honglak Lee
|
||
Advancing Visual Large Language Model for Multi-granular Versatile Perception
WentaoXiang WentaoXiang · Haoxian Tan · Cong Wei · Yujie Zhong · Dengjie Li · Yujiu Yang
|
||
Object-level Correlation for Few-Shot Segmentation
chunlin wen · Yu Zhang · Jie Fan · Hongyuan Zhu · Xiu-Shen Wei · Yijun Wang · Zhiqiang Kou · Shuzhou Sun
|
||
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data and Metric Perspectives
Shaoyuan Xie · Lingdong Kong · Yuhao Dong · Chonghao Sima · Wenwei Zhang · Qi Chen · Ziwei Liu · Liang Pan
|
||
Web Artifact Attacks Disrupt Vision Language Models
Maan Qraitem · Piotr Teterwak · Kate Saenko · Bryan Plummer
|
||
Bolt3D: Generating 3D Scenes in Seconds
Stanislaw Szymanowicz · Jason Y. Zhang · Pratul Srinivasan · Ruiqi Gao · Arthur Brussee · Aleksander Holynski · Ricardo Martin Brualla · Jonathan Barron · Philipp Henzler
|
||
WalkVLM: Aid Visually Impaired People Walking by Vision Language Model
Zhiqiang Yuan · Ting Zhang · Yeshuang Zhu · Jiapei Zhang · Ying Deng · Zexi Jia · Peixiang Luo · Xiaoyue Duan · Jie Zhou · Jinchao Zhang
|
||
Punching Bag vs. Punching Person: Motion Transferability in Videos
Raiyaan Abdullah · Jared Claypoole · Michael Cogswell · Ajay Divakaran · Yogesh Rawat
|
||
PHATNet: A Physics-guided Haze Transfer Network for Domain-adaptive Real-world Image Dehazing
Fu-Jen Tsai · Yan-Tsung Peng · Yen-Yu Lin · Chia-Wen Lin
|
||
Auto-Vocabulary Semantic Segmentation
Osman Ülger · Maksymilian Kulicki · Yuki Asano · Martin Oswald
|
||
Towards Privacy-preserved Pre-training of Remote Sensing Foundation Models with Federated Mutual-guidance Learning
Jieyi Tan · Chengwei Zhang · Bo Dang · Yansheng Li
|
||
Towards Immersive Human-X Interaction: A Real-Time Framework for Physically Plausible Motion Synthesis
Kaiyang Ji · Ye Shi · Zichen Jin · Kangyi Chen · Lan Xu · Yuexin Ma · Jingyi Yu · Jingya Wang
|
||
Debiased Curriculum Adaptation for Safe Transfer Learning in Chest X-ray Classification
Mingyang Liu · Xinyang Chen · Yang Shu · Xiucheng Li · Weili Guan · Liqiang Nie
|
||
Rethinking DPO-style Diffusion Aligning Frameworks
XUN WU · Shaohan Huang · Lingjie Jiang · Furu Wei
|
||
MH-LVC: Multi-Hypothesis Temporal Prediction for Learned Conditional Residual Video Coding
Gao Zong lin · Huu-Tai Phung · Yi-Chen Yao · Kuan-Wei Ho · Yi-Hsin Chen · Yu-Hsiang Lin · Alessandro Gnutti · Wen-Hsiao Peng
|
||
VISION-XL: High Definition Video Inverse Problem Solver using Latent Image Diffusion Models
Taesung Kwon · Jong Ye
|
||
NavQ: Learning a Q-Model for Foresighted Vision-and-Language Navigation
Peiran Xu · Xicheng Gong · Yadong Mu
|
||
Efficient Multi-Person Motion Prediction by Lightweight Spatial and Temporal Interactions
Yuanhong Zheng · Ruixuan Yu · Jian Sun
|
||
DIVE: Taming DINO for Subject-Driven Video Editing
Yi Huang · Wei Xiong · He Zhang · Chaoqi Chen · Jianzhuang Liu · Mingfu Yan · Shifeng Chen
|
||
SPD: Shallow Backdoor Protecting Deep Backdoor Against Backdoor Detection
Shunjie Yuan · Xinghua Li · Xuelin Cao · Haiyan Zhang · Mengyao Zhu · Robert Deng
|
||
Generalizable 4D Human Object Interaction Synthesis by Composing Interaction Primitives
Kai Jia · Tengyu Liu · Mingtao Pei · Yixin Zhu · Siyuan Huang
|
||
INSTINCT: Instance-Level Interaction Architecture for Query-Based Collaborative Perception
yunjiang xu · Yupeng Ouyang · Lingzhi Li · Jin Wang · Benyuan Yang
|
||
Doppler-Aware LiDAR-RADAR Fusion for Weather-Robust 3D Detection
Yujeong Chae · Heejun Park · Hyeonseong Kim · Kuk-Jin Yoon
|
||
SKALD: Learning-Based Shot Assembly for Coherent Multi-Shot Video Creation
Chen Yi Lu · Mehrab Tanjim · Ishita Dasgupta · Somdeb Sarkhel · Gang Wu · Saayan Mitra · Somali Chaterji
|
||
HAMSt3R: Human Aware Multi-view Stereo 3D Reconstruction
Sara Rojas Martinez · Matthieu Armando · Bernard Ghanem · Philippe Weinzaepfel · Vincent Leroy · Grégory Rogez
|
||
NullSwap: Proactive Identity Cloaking Against Deepfake Face Swapping
Tianyi Wang · Shuaicheng Niu · Harry Cheng · xiao zhang · Yinglong Wang
|
||
CULTURE3D: A Large-Scale and Diverse Dataset of Cultural Landmarks and Terrains for Gaussian-Based Scene Rendering
xinyi zheng · Steve Zhang · Weizhe Lin · Fan Zhang · Walterio Mayol-Cuevas · Yunze Liu · Junxiao Shen
|
||
OmniSAM: Omnidirectional Segment Anything Model for UDA in Panoramic Semantic Segmentation
Ding Zhong · Xu Zheng · Chenfei Liao · Yuanhuiyi Lyu · Jialei Chen · Shengyang Wu · Linfeng Zhang · Xuming Hu
|
||
Kaleidoscopic Background Attack: Disrupting Pose Estimation with Multi-Fold Radial Symmetry Textures
Xinlong Ding · Hongwei Yu · Jiawei Li · Feifan Li · Yu Shang · Bochao Zou · Huimin Ma · Jiansheng Chen
|
||
NuiScene: Exploring Efficient Generation of Unbounded Outdoor Scenes
Han-Hung Lee · Qinghong Han · Angel Chang
|
||
Saliency-Aware Quantized Imitation Learning for Efficient Robotic Control
Seongmin Park · Hyungmin Kim · Sangwoo kim · Wonseok Jeon · Juyoung Yang · Byeongwook Jeon · Yoonseon Oh · Jungwook Choi
|
||
What to Distill? Fast Knowledge Distillation with Adaptive Sampling
Byungchul Chae · Seonyeong Heo
|
||
Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation
Luca Barsellotti · Lorenzo Bianchi · Nicola Messina · Fabio Carrara · Marcella Cornia · Lorenzo Baraldi · Fabrizio Falchi · Rita Cucchiara
|
||
FedMeNF: Privacy-Preserving Federated Meta-Learning for Neural Fields
Junhyeog Yun · Minui Hong · Gunhee Kim
|
||
Learning Pixel-adaptive Multi-layer Perceptrons for Real-time Image Enhancement
Junyu Lou · Xiaorui Zhao · Kexuan Shi · Shuhang Gu
|
||
Bayesian-Inspired Space-Time Superpixels
Kent Gauen · Stanley Chan
|
||
AnimalClue: Recognizing Animals by their Traces
Risa Shinoda · Nakamasa Inoue · Iro Laina · Christian Rupprecht · Hirokatsu Kataoka
|
||
ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation
Haoyu Fu · Diankun Zhang · Zongchuang Zhao · Jianfeng Cui · DINGKANG LIANG · Chong Zhang · Dingyuan Zhang · Hongwei Xie · BING WANG · Xiang Bai
|
||
Beyond the Limits: Overcoming Negative Correlation of Activation-Based Training-Free NAS
Haidong Kang · Lianbo Ma · Pengjun Chen · Guo Yu · Xingwei Wang · Min Huang
|
||
Stroke2Sketch: Harnessing Stroke Attributes for Training-Free Sketch Generation
Rui Yang · Huining Li · Yiyi Long · Xiaojun Wu · Shengfeng He
|
||
Learning Normals of Noisy Points by Local Gradient-Aware Surface Filtering
Qing Li · Huifang Feng · Xun Gong · Liang Han
|
||
ROAR: Reducing Inversion Error in Generative Image Watermarking
Hanyi Wang · Han Fang · Shi-Lin Wang · Ee-Chien Chang
|
||
What Makes for Text to 360-degree Panorama Generation with Stable Diffusion?
Jinhong Ni · Chang-Bin Zhang · Qiang Zhang · Jing Zhang
|
||
Federated Continual Instruction Tuning
Haiyang Guo · Fanhu Zeng · Fei Zhu · Wenzhuo Liu · Da-Han Wang · Jian Xu · Xu-Yao Zhang · Cheng-Lin Liu
|
||
4D Visual Pre-training for Robot Learning
Chengkai Hou · Yanjie Ze · Yankai Fu · Zeyu Gao · Songbo Hu · Yue Yu · Shanghang Zhang · Huazhe Xu
|
||
Spatially-Varying Autofocus
Yingsi Qin · Aswin Sankaranarayanan · Matthew O'Toole
|
||
Boosting Adversarial Transferability via Residual Perturbation Attack
Jinjia Peng · Zeze Tao · Huibing Wang · Meng Wang · Yang Wang
|
||
OpenSubstance: A High-quality Measured Dataset of Multi-View and -Lighting Images and Shapes
Fan Pei · jinchen bai · Xiang Feng · Zoubin Bi · Kun Zhou · Hongzhi Wu
|
||
2D Gaussian Splatting-based Sparse-view Transparent Object Depth Reconstruction via Physics Simulation for Scene Update
Jeongyun Kim · Seunghoon Jeong · Giseop Kim · Myung-Hwan Jeon · Eunji Jun · Ayoung Kim
|
||
From Gallery to Wrist: Realistic 3D Bracelet Insertion in Videos
Chenjian Gao · Lihe Ding · Rui Han · Zhanpeng Huang · Zibin Wang · Tianfan Xue
|
||
When Pixel Difference Patterns Meet ViT: PiDiViT for Few-Shot Object Detection
Hongliang hongliang · Yongxiang Liu · Canyu Mo · Weijie Li · Bowen Peng · Li Liu
|
||
DyGS-SLAM: Real-Time Accurate Localization and Gaussian Reconstruction for Dynamic Scenes
Xinggang Hu · Chenyangguang Zhang · Mingyuan Zhao · Yuanze Gui · Xiangkui Zhang · Xiangyang Ji
|
||
WINS: Winograd Structured Pruning for Fast Winograd Convolution
Cheonjun Park · Hyunjae Oh · Mincheol Park · Hyunchan Moon · Minsik Kim · Suhyun Kim · Myung Kuk Yoon · Won Woo Ro
|
||
LightBSR: Towards Lightweight Blind Super-Resolution via Discriminative Implicit Degradation Representation Learning
Jiang Yuan · ji ma · Bo Wang · Guanzhou Ke · Weiming Hu
|
||
Nautilus: Locality-aware Autoencoder for Scalable Mesh Generation
Yuxuan Wang · Xuanyu Yi · Haohan Weng · Qingshan Xu · xiaokang wei · Xianghui Yang · Chunchao Guo · Long Chen · Hanwang Zhang
|
||
Scaling Omni-modal Pretraining with Multimodal Context: Advancing Universal Representation Learning Across Modalities
Yiyuan Zhang · Handong Li · Jing Liu · Xiangyu Yue
|
||
Learning to See Inside Opaque Liquid Containers using Speckle Vibrometry
Matan Kichler · Shai Bagon · Mark Sheinin
|
||
SGAD: Semantic and Geometric-aware Descriptor for Local Feature Matching
Xiangzeng Liu · CHI WANG · GuangluShi GuangluShi · Xiaodong Zhang · Qiguang Miao · Miao Fan
|
||
Adversarial Training for Probabilistic Robustness
YI ZHANG · Yuhang Chen · Zhen Chen · Wenjie Ruan · Xiaowei Huang · Siddartha Khastgir · Xingyu Zhao
|
||
Stealthy Backdoor Attack in Federated Learning via Adaptive Layer-wise Gradient Alignment
Qingqian Yang · Peishen Yan · Xiaoyu Wu · Jiaru Zhang · Tao Song · Yang Hua · Hao Wang · Liangliang Wang · Haibing Guan
|
||
LOCATEdit: Graph Laplacian Optimized Cross Attention for Localized Text-Guided Image Editing
Achint Soni · Meet Soni · Sirisha Rambhatla
|
||
VFlowOpt: A Token Pruning Framework for LMMs with Visual Information Flow-Guided Optimization
Sihan Yang · Runsen Xu · Chenhang Cui · Tai Wang · Dahua Lin · Jiangmiao Pang
|
||
AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs
Sanjoy Chowdhury · Sayan Nag · Subhrajyoti Dasgupta · Yaoting Wang · Mohamed Elhoseiny · Ruohan Gao · Dinesh Manocha
|
||
CF3: Compact and Fast 3D Feature Fields
Hyunjoon Lee · Joonkyu Min · Jaesik Park
|
||
Orchid: Image Latent Diffusion for Joint Appearance and Geometry Generation
Akshay Krishnan · Xinchen Yan · Vincent Casser · Abhijit Kundu
|
||
Text2Outfit: Controllable Outfit Generation with Multimodal Language Models
Yuanhao Zhai · Yen-Liang Lin · Minxu Peng · Larry Davis · Ashwin Chandramouli · Junsong Yuan · David Doermann
|
||
RayGaussX: Accelerating Gaussian-Based Ray Marching for Real-Time and High-Quality Novel View Synthesis
Hugo Blanc · Jean-Emmanuel Deschaud · Alexis Paljic
|
||
CHORDS: Diffusion Sampling Accelerator with Multi-core Hierarchical ODE Solvers
Jiaqi Han · Haotian Ye · Puheng Li · Minkai Xu · James Zou · Stefano Ermon
|
||
Seeing 3D Through 2D Lenses: 3D Few-Shot Class-Incremental Learning via Cross-Modal Geometric Rectification
Tuo Xiang · Xuemiao Xu · Bangzhen Liu · Jinyi Li · Yong Li · Shengfeng He
|
||
GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography
Mengchen Zhang · Tong Wu · Jing Tan · Ziwei Liu · Gordon Wetzstein · Dahua Lin
|
||
Diagnosing Pretrained Models for Out-of-distribution Detection
Haipeng Xiong · Kai Xu · Angela Yao
|
||
Learning Dense Feature Matching via Lifting Single 2D Image to 3D Space
Yingping Liang · Yutao Hu · Wenqi Shao · Ying Fu
|
||
Self-supervised Learning of Hybrid Part-aware 3D Representation of 2D Gaussians and Superquadrics
Zhirui Gao · Renjiao Yi · Yuhang Huang · Wei Chen · Chenyang Zhu · Kai Xu
|
||
CoralSRT: Revisiting Coral Reef Semantic Segmentation by Feature Rectifying via Self-supervised Guidance
Zheng Ziqiang · Wong Kwan · Binh-Son Hua · Jianbo Shi · Sai-Kit Yeung
|
||
ZIUM: Zero-Shot Intent-Aware Adversarial Attack on Unlearned Models
Hyun Jun Yook · Ga Jhun · Cho Hyun · Min Jeon · Donghyun Kim · Tae Kim · Youn Lee
|
||
PAN-Crafter: Learning Modality-Consistent Alignment for PAN-Sharpening
Jeonghyeok Do · Sungpyo Kim · Geunhyuk Youk · Jaehyup Lee · Munchurl Kim
|
||
All in One: Visual-Description-Guided Unified Point Cloud Segmentation
Zongyan Han · Mohamed El Amine Boudjoghra · Jiahua Dong · Jinhong Wang · Rao Anwer
|
||
VACE: All-in-One Video Creation and Editing
Zeyinzi Jiang · Zhen Han · Chaojie Mao · Jingfeng Zhang · Yulin Pan · Yu Liu
|
||
UDC-VIX: A Real-World Video Dataset for Under-Display Cameras
Kyusu Ahn · Jisoo Kim · Sangik Lee · HyunGyu Lee · Byeonghyun Ko · Chanwoo Park · Jaejin Lee
|
||
Super Resolved Imaging with Adaptive Optics
Robin Swanson · Esther Y. H. Lin · Masen Lamb · Suresh Sivanandam · Kyros Kutulakos
|
||
Controllable and Expressive One-Shot Video Head Swapping
Chaonan Ji · Jinwei Qi · Peng Zhang · Bang Zhang · Liefeng Bo
|
||
LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation
Donald Shenaj · Ondrej Bohdal · Mete Ozay · Pietro Zanuttigh · Umberto Michieli
|
||
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning
Xianhang Li · Yanqing Liu · Haoqin Tu · Cihang Xie
|
||
PVMamba: Parallelizing Vision Mamba via Dynamic State Aggregation
Fei Xie · Zhongdao Wang · Weijia Zhang · Chao Ma
|
||
FVGen: Accelerating Novel-View Synthesis with Adversarial Video Diffusion Distillation
Wenbin Teng · Gonglin Chen · Haiwei Chen · Yajie Zhao
|
||
MikuDance: Animating Character Art with Mixed Motion Dynamics
Jiaxu Zhang · Xianfang Zeng · Xin Chen · Wei Zuo · Gang YU · Zhigang Tu
|
||
A₀: An Affordance-Aware Hierarchical Model for General Robotic Manipulation
Rongtao Xu · Jian Zhang · Minghao Guo · Youpeng Wen · Haoting Yang · Min Lin · Jianzheng Huang · Zhe Li · Kaidong Zhang · Liqiong Wang · Yuxuan Kuang · Meng Cao · Feng Zheng · Xiaodan Liang
|
||
InteractAvatar: Modeling Hand-Face Interaction in Photorealistic Avatars with Deformable Gaussians
Kefan Chen · Sergiu Oprea · Justin Theiss · Sreyas Mohan · Srinath Sridhar · Aayush Prakash
|
||
Signs as Tokens: A Retrieval-Enhanced Multilingual Sign Language Generator
Ronglai Zuo · Rolandos Alexandros Potamias · Evangelos Ververas · Jiankang Deng · Stefanos Zafeiriou
|
||
Noise2Score3D: Tweedie's Approach for Unsupervised Point Cloud Denoising
Xiangbin Wei · Yuanfeng Wang · Ao XU · Lingyu Zhu · Dongyong Sun · Keren Li · Yang Li · Qi Qin
|
||
SPA: Efficient User-Preference Alignment against Uncertainty in Medical Image Segmentation
Jiayuan Zhu · Junde Wu · Cheng Ouyang · Konstantinos Kamnitsas · Alison Noble
|
||
AnyBimanual: Transferring Unimanual Policy for General Bimanual Manipulation
Guanxing Lu · Tengbo Yu · Haoyuan Deng · Season Chen · Yansong Tang · Ziwei Wang
|
||
Multi-Modal Multi-Task Unified Embedding Model (M3T-UEM): A Task-Adaptive Representation Learning Framework
Rohan Sharma · Changyou Chen · Feng-Ju Chang · Seongjun Yun · Xiaohu Xie · Rui Meng · Dehong Xu · Alejandro Mottini · qingjun cui
|
||
Clink! Chop! Thud! - Learning Object Sounds from Real-World Interactions
Mengyu Yang · Yiming Chen · Haozheng Pei · Siddhant Agarwal · Arun Vasudevan · James Hays
|
||
Motion-2-to-3: Leveraging 2D Motion Data for 3D Motion Generation
Ruoxi Guo · Huaijin Pi · Zehong Shen · Qing Shuai · zechenhu zechenhu · Zhumei Wang · Yajiao Dong · Ruizhen Hu · Taku Komura · Sida Peng · Xiaowei Zhou
|
||
MaGS: Reconstructing and Simulating Dynamic 3D Objects with Mesh-adsorbed Gaussian Splatting
Shaojie Ma · Yawei Luo · Wei Yang · Yi Yang
|
||
Confound from All Sides, Distill with Resilience: Multi-Objective Adversarial Paths to Zero-Shot Robustness
Junhao Dong · Jiao Liu · Xinghua Qu · YEW-SOON ONG
|
||
Large-scale Pre-training for Grounded Video Caption Generation
Evangelos Kazakos · Cordelia Schmid · Josef Sivic
|
||
Accelerating Diffusion Transformer via Gradient-Optimized Cache
Junxiang Qiu · Lin Liu · Shuo Wang · Jinda Lu · Kezhou Chen · Yanbin Hao
|
||
FRET: Feature Redundancy Elimination for Test Time Adaptation
Linjing You · Jiabao Lu · Xiayuan Huang · Xiangli Nie
|
||
PRE-Mamba: A 4D State Space Model for Ultra-High-Frequent Event Camera Deraining
Ciyu Ruan · Ruishan Guo · Zihang GONG · Jingao Xu · Wenhan Yang · Xinlei Chen
|
||
Geometric Alignment and Prior Modulation for View-Guided Point Cloud Completion on Unseen Categories
Jingqiao Xiu · Yicong Li · Na Zhao · Han Fang · Xiang Wang · Angela Yao
|
||
Stable-Sim2Real: Exploring Simulation of Real-Captured 3D Data with Two-Stage Depth Diffusion
Mutian Xu · Chongjie Ye · Haolin Liu · Yushuang Wu · Jiahao Chang · Xiaoguang Han
|
||
Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models
Xuran Ma · Yexin Liu · Yaofu LIU · Xianfeng Wu · Mingzhe Zheng · Zihao Wang · Ser-Nam Lim · Harry Yang
|
||
DDB: Diffusion Driven Balancing to Address Spurious Correlations
Aryan Yazdan Parast · Basim Azam · Naveed Akhtar
|
||
Is Meta-Learning Out? Rethinking Unsupervised Few-Shot Classification with Limited Entropy
Yunchuan Guan · Yu Liu · Ke Zhou · Zhiqi Shen · Jenq-Neng Hwang · Serge Belongie · Lei Li
|
||
SparseRecon: Neural Implicit Surface Reconstruction from Sparse Views with Feature and Depth Consistencies
Liang Han · Xu Zhang · Haichuan Song · Kanle Shi · Liang Han · Zhizhong Han
|
||
Preserve Anything: Controllable Image Synthesis with Object Preservation
Prasen Kumar Sharma · Neeraj Matiyali · Siddharth Srivastava · Gaurav Sharma
|
||
Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning
Weitai Kang · Haifeng Huang · Yuzhang Shang · Mubarak Shah · Yan Yan
|
||
Visual Modality Prompt for Adapting Vision-Language Object Detectors
Heitor Medeiros · Atif Belal · Srikanth Muralidharan · Eric Granger · Marco Pedersoli
|
||
Addressing Attribute Leakage in Text Embeddings for Image Editing with Diffusion Models
Sunung Mun · Jinhwan Nam · Sunghyun Cho · Jungseul Ok
|
||
Recognizing Actions from Robotic View for Natural Human-Robot Interaction
Ziyi Wang · Peiming Li · Hong Liu · Zhichao Deng · Can Wang · Jun Liu · Junsong Yuan · Mengyuan Liu
|
||
FoundIR: Unleashing Million-scale Training Data to Advance Foundation Models for Image Restoration
Hao Li · Xiang Chen · Jiangxin Dong · Jinhui Tang · Jinshan Pan
|
||
CVPT: Cross Visual Prompt Tuning
Lingyun Huang · Jianxu Mao · Junfei YI · Ziming Tao · Yaonan Wang
|
||
Parametric Shadow Control for Portrait Generation in Text-to-Image Diffusion Models
Haoming Cai · Tsung-Wei Huang · Shiv Gehlot · Brandon Feng · Sachin Shah · Guan-Ming Su · Christopher Metzler
|
||
To Label or Not to Label: PALM – A Predictive Model for Evaluating Sample Efficiency in Active Learning Models
Julia Machnio · Mads Nielsen · Mostafa Mehdipour Ghazi
|
||
Context-Aware Academic Emotion Dataset and Benchmark
Luming Zhao · Jingwen Xuan · Jiamin Lou · Yonghui Yu · Wenwu Yang
|
||
IntrinsicControlNet: Cross-distribution Image Generation with Real and Unreal
Jiayuan Lu · Rengan Xie · Zixuan Xie · Zhizhen Wu · Dianbing Xi · Qi Ye · Rui Wang · Hujun Bao · Yuchi Huo
|
||
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework
Qi Qin · Le Zhuo · Yi Xin · Ruoyi Du · Zhen Li · Bin Fu · Yiting Lu · Xinyue Li · Dongyang Liu · Xiangyang Zhu · Will Beddow · Erwann Millon · Victor Perez · Wenhai Wang · Yu Qiao · Bo Zhang · Xiaohong Liu · Hongsheng Li · Chang Xu · Peng Gao
|
||
Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction
JIXUAN FAN · Wanhua Li · Yifei Han · Tianru Dai · Yansong Tang
|
||
Consistent Time-of-Flight Depth Denoising via Graph-Informed Geometric Attention
Weida Wang · Changyong He · Jin Zeng · Di Qiu
|
||
One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models
Hao Fang · Jiawei Kong · Wenbo Yu · Bin Chen · Jiawei Li · Hao Wu · Shu-Tao Xia · Ke Xu
|
||
PASD: A Pixel-Adaptive Swarm Dynamics Approach for Unsupervised Low-Light Image Enhancement
Shuai Jin · Yuhua Qian · Feijiang Li · Guoqing Liu · Xinyan Liang
|
||
$\Phi$-GAN: Physics-Inspired GAN for Generating SAR Images Under Limited Data
Xidan Zhang · Yihan Zhuang · Qian Guo · Haodong Yang · Xuelin Qian · Gong Cheng · Junwei Han · Zhongling Huang
|
||
A Real-world Display Inverse Rendering Dataset
Seokjun Choi · Hoon-Gyu Chung · Yujin Jeon · Giljoo Nam · Seung-Hwan Baek
|
||
Revisiting Point Cloud Completion: Are We Ready For The Real-World?
Stuti Pathak · Prashant Kumar · Dheeraj Baiju · Nicholus Mboga · Gunther Steenackers · Rudi Penne
|
||
Reinforcement Learning-Guided Data Selection via Redundancy Assessment
Suorong Yang · Peijia Li · Furao Shen · Jian Zhao
|
||
A Recurrence Prior for Object Insertion and Subject-Driven Generation
Daniel Winter · Asaf Shul · Matan Cohen · Dana Berman · Yael Pritch · Alex Rav-Acha · Yedid Hoshen
|
||
Unbiased Missing-modality Multimodal Learning
Raiting Dai · Chenxi Li · Yandong Yan · Lisi Mo · Ke Qin · Tao He
|
||
SimMLM: A Simple Framework for Multi-modal Learning with Missing Modality
Sijie Li · Chen Chen · Jungong Han
|
||
Di$\mathtt{[M]}$O: Distilling Masked Diffusion Models into One-step Generator
Yuanzhi Zhu · Xi WANG · Stéphane Lathuilière · Vicky Kalogeiton
|
||
DC-AE 1.5: Accelerating Diffusion Model Convergence with Structured Latent Space
Junyu Chen · Dongyun Zou · Wenkun He · Junsong Chen · Enze Xie · Song Han · Han Cai
|
||
NeurOp-Diff: Continuous Remote Sensing Image Super-Resolution via Neural Operator Diffusion
Zihao Xu · Yuzhi Tang · Bowen Xu · Qingquan Li
|
||
Amodal Depth Anything: Amodal Depth Estimation in the Wild
Zhenyu Li · Mykola Lavreniuk · Jian Shi · Shariq Bhat · Peter Wonka
|
||
Semantic Discrepancy-aware Detector for Image Forgery Identification
Wang Ziye · Minghang Yu · Chunyan Xu · Zhen Cui
|
||
ESSENTIAL: Episodic and Semantic Memory Integration for Video Class-Incremental Learning
Jongseo Lee · Kyungho Bae · Kyle Min · Gyeong-Moon Park · Jinwoo Choi
|
||
Harnessing Text-to-Image Diffusion Models for Point Cloud Self-Supervised Learning
Yiyang Chen · Shanshan Zhao · Lunhao Duan · Changxing Ding · Dacheng Tao
|
||
Trade-offs in Image Generation: How Do Different Dimensions Interact?
Sicheng Zhang · Binzhu Xie · Zhonghao Yan · Yuli Zhang · Donghao Zhou · Xiaofei Chen · Shi Qiu · Jiaqi Liu · Guoyang Xie · Zhichao Lu
|
||
One Encoder to Rule them All: Representation Learning for Model-free Visual Reinforcement Learning using Fourier Neural Operators
Parag Dutta · Mohd Ayyoob · Shalabh Bhatnagar · Ambedkar Dukkipati
|
||
Scoring, Remember, and Reference: Catching Camouflaged Objects in Videos
Yuang Feng · Shuyong Gao · Fuzhen Yan · Yicheng Song · Lingyi Hong · Junjie Hu · Wenqiang Zhang
|
||
DiTFastAttnV2: Head-wise Attention Compression for Multi-Modality Diffusion Transformers
Hanling Zhang · Rundong Su · Zhihang Yuan · Pengtao Chen · Mingzhu Shen · Yibo Fan · Shengen Yan · Guohao Dai · Yu Wang
|
||
OD-RASE: Ontology-Driven Risk Assessment and Safety Enhancement for Autonomous Driving
Kota Shimomura · Masaki Nambata · Atsuya Ishikawa · Ryota Mimura · Takayuki Kawabuchi · Takayoshi Yamashita · Koki Inoue
|
||
Planar Affine Rectification from Local Changes of Scale and Orientation
Yuval Nissan · Marc Pollefeys · Daniel Barath
|
||
Adaptive Caching for Faster Video Generation with Diffusion Transformers
Kumara Kahatapitiya · Haozhe Liu · Sen He · Ding Liu · Menglin Jia · Chenyang Zhang · Michael Ryoo · Tian Xie
|
||
Deep Adaptive Unfolded Network via Spatial Morphology Stripping and Spectral Filtration for Pan-sharpening
Hebaixu Wang · Jiayi Ma
|
||
How Do Optical Flow and Textual Prompts Collaborate to Assist in Audio-Visual Semantic Segmentation?
Yujian Lee · Peng Gao · Yongqi Xu · Wentao Fan
|
||
MotionLab: Unified Human Motion Generation and Editing via the Motion-Condition-Motion Paradigm
Ziyan Guo · Zeyu HU · Na Zhao · De Wen Soh
|
||
Scalable Ranked Preference Optimization for Text-to-Image Generation
Shyamgopal Karthik · Huseyin Coskun · Zeynep Akata · Sergey Tulyakov · Jian Ren · Anil Kag
|
||
Hi-Gaussian: Hierarchical Gaussians under Normalized Spherical Projection for Single-View 3D Reconstruction
Binjian Xie · Pengju Zhang · Hao Wei · Yihong Wu
|
||
Probabilistic Point Clouds from Single-Photon LiDARs for Robust 3D Inference
Bhavya Goyal · Felipe Gutierrez-Barragan · Wei Lin · Andreas Velten · Yin Li · Mohit Gupta
|
||
Consistency Trajectory Matching for One-Step Generative Super-Resolution
Weiyi You · Mingyang Zhang · Leheng Zhang · Xingyu Zhou · Kexuan Shi · Shuhang Gu
|
||
Fine-Tuning Visual Autoregressive Models for Subject-Driven Generation
Jiwoo Chung · Sangeek Hyun · Hyunjun Kim · Eunseo Koh · Minkyu Lee · Jae-Pil Heo
|
||
Less Static, More Private: Towards Transferable Privacy-Preserving Action Recognition by Generative Decoupled Learning
Zhi-Wei Xia · Kun-Yu Lin · Yuan-Ming Li · Wei-Jin Huang · Xian-Tuo Tan · Wei-Shi Zheng
|
||
Mitigating Catastrophic Overfitting in Fast Adversarial Training via Label Information Elimination
Chao Pan · Ke Tang · Li Qing · Xin Yao
|
||
Predict, Optimize, Distill: A Self-Improving Cycle for 4D Object Understanding
Mingxuan Wu · Huang Huang · Justin Kerr · Chung Min Kim · Anthony Zhang · Brent Yi · Angjoo Kanazawa
|
||
From Easy to Hard: Progressive Active Learning Framework for Infrared Small Target Detection with Single Point Supervision
Chuang Yu · Jinmiao Zhao · Yunpeng Liu · Sicheng Zhao · Yimian Dai · Xiangyu Yue
|
||
EA-ViT: Efficient Adaptation for Elastic Vision Transformer
Chen Zhu · Wangbo Zhao · Huiwen Zhang · Yuhao Zhou · Weidong Tang · Shuo Wang · Zhihang Yuan · Yuzhang Shang · Xiaojiang Peng · Kai Wang · Dawei Yang
|
||
TryOn-Refiner: Conditional Rectified-flow-based TryOn Refiner for More Accurate Detail Reconstruction
Wen Qian
|
||
Derm1M: A Million-scale Vision-Language Dataset Aligned with Clinical Ontology Knowledge for Dermatology
Siyuan Yan · Ming Hu · Yiwen Jiang · Xieji Li · Hao Fei · Philipp Tschandl · Harald Kittler · Zongyuan Ge
|
||
Generative Video Bi-flow
Chen Liu · Tobias Ritschel
|
||
TRACE: Learning 3D Gaussian Physical Dynamics from Multi-view Videos
Jinxi Li · Ziyang Song · Bo Yang
|
||
Generative Active Learning for Long-tail Trajectory Prediction via Controllable Diffusion Model
Daehee Park · Monu Surana · Pranav Desai · Ashish Mehta · Reuben John · Kuk-Jin Yoon
|
||
CompleteMe: Reference-based Human Image Completion
Yu-Ju Tsai · Brian Price · Qing Liu · Luis Figueroa · Daniil Pakhomov · Zhihong Ding · Scott Cohen · Ming-Hsuan Yang
|
||
UnZipLoRA: Separating Content and Style from a Single Image
Chang Liu · Viraj Shah · Aiyu Cui · Svetlana Lazebnik
|
||
Continuous-Time Human Motion Field from Events
Ziyun Wang · Ruijun Zhang · Zi-Yan Liu · Yufu Wang · Kostas Daniilidis
|
||
Serialization based Point Cloud Oversegmentation
chenghui Lu · Dilong Li · Jianlong Kwan · Ziyi Chen · Haiyan Guan
|
||
FixTalk: Taming Identity Leakage for High-Quality Talking Head Generation in Extreme Cases
Shuai Tan · Bill Gong · Bin Ji · Ye Pan
|
||
AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models
Ziyin Zhou · Yunpeng Luo · Yuanchen Wu · Ke Sun · Jiayi Ji · Ke Yan · Shouhong Ding · Xiaoshuai Sun · Yunsheng Wu · Rongrong Ji
|
||
Pi-GPS: Enhancing Geometry Problem Solving by Unleashing the Power of Diagrammatic Information
Junbo Zhao · Ting Zhang · Jiayu Sun · Mi Tian · Hua Huang
|
||
Granular Concept Circuits: Toward a Fine-Grained Circuit Discovery for Concept Representations
Dahee Kwon · Sehyun Lee · Jaesik Choi
|
||
MultiADS: Defect-aware Supervision for Multi-type Anomaly Detection and Segmentation in Zero-Shot Learning
Ylli Sadikaj · Hongkuan Zhou · Lavdim Halilaj · Stefan Schmid · Steffen Staab · Claudia Plant
|
||
Uncertainty-Aware Gradient Stabilization for Small Object Detection
Huixin Sun · Yanjing Li · Linlin Yang · Xianbin Cao · Baochang Zhang
|
||
Learning Precise Affordances from Egocentric Videos for Robotic Manipulation
Gen Li · Nikolaos Tsagkas · Jifei Song · Ruaridh Mon-Williams · Sethu Vijayakumar · Kun Shao · Laura Sevilla-Lara
|
||
X-Fusion: Introducing New Modality to Frozen Large Language Models
Sicheng Mo · Thao Nguyen · Xun Huang · Siddharth Iyer · Yijun Li · Yuchen Liu · Abhishek Tandon · Eli Shechtman · Krishna Kumar Singh · Yong Jae Lee · Bolei Zhou · Yuheng Li
|
||
MobileIE: An Extremely Lightweight and Effective ConvNet for Real-Time Image Enhancement on Mobile Devices
HAILONG YAN · Ao Li · Xiangtao Zhang · Zhe Liu · Zenglin Shi · Ce Zhu · Le Zhang
|
||
Refer to Any Segmentation Mask Group With Vision-Language Prompts
Shengcao Cao · Zijun Wei · Jason Kuen · Kangning Liu · Lingzhi Zhang · Jiuxiang Gu · HyunJoon Jung · Liangyan Gui · Yu-Xiong Wang
|
||
STIV: Scalable Text and Image Conditioned Video Generation
Zongyu Lin · Wei Liu · Chen Chen · Jiasen Lu · Wenze Hu · Tsu-Jui Fu · Jesse Allardice · Zhengfeng Lai · Liangchen Song · Bowen Zhang · cha chen · Yiran Fei · Lezhi Li · Yizhou Sun · Kai-Wei Chang · Yinfei Yang
|
||
ToF-Splatting: Dense SLAM using Sparse Time-of-Flight Depth and Multi-Frame Integration
Andrea Conti · Matteo Poggi · Valerio Cambareri · Martin Oswald · Stefano Mattoccia
|
||
CAFA: a Controllable Automatic Foley Artist
Roi Benita · Michael Finkelson · Tavi Halperin · Gleb Sterkin · Yossi Adi
|
||
Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning
Jun Li · Jinpeng Wang · Chaolei Tan · Niu Lian · Long Chen · Yaowei Wang · Min zhang · Shu-Tao Xia · Bin Chen
|
||
MCOP: Multi-UAV Collaborative Occupancy Prediction
Zefu Lin · Wenbo Chen · Xiaojuan Jin · Yuran Yang · Lue Fan · YIXIN ZHANG · Yufeng Zhang · Zhaoxiang Zhang
|
||
StealthAttack: Robust 3D Gaussian Splatting Poisoning via Density-Guided Illusions
Bo-Hsu Ke · You-Zhe Xie · Yu-Lun Liu · Wei-Chen Chiu
|
||
Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion
Aleksandar Jevtić · Christoph Reich · Felix Wimbauer · Oliver Hahn · Christian Rupprecht · Stefan Roth · Daniel Cremers
|
||
MaTe: Images Are All You Need for Material Transfer via Diffusion Transformer
Nisha Huang · Henglin Liu · Yizhou Lin · Kaer Huang · Chubin Chen · Jie Guo · Tong-Yee Lee · Xiu Li
|
||
High-Resolution Spatiotemporal Modeling with Global-Local State Space Models for Video-Based Human Pose Estimation
Runyang Feng · Hyung Jin Chang · Tze Ho Elden Tse · Boeun Kim · Yi Chang · Yixing Gao
|
||
On the Generalization of Representation Uncertainty in Earth Observation
Spyros Kondylatos · Nikolaos Ioannis Bountos · Dimitrios Michail · Xiao Xiang Zhu · Gustau Camps-Valls · Ioannis Papoutsis
|
||
RobustSplat: Decoupling Densification and Dynamics for Transient-Free 3DGS
Chuanyu Fu · Yuqi Zhang · Kunbin Yao · Guanying Chen · Yuan Xiong · Chuan Huang · Shuguang Cui · Xiaochun Cao
|
||
From Image to Video: An Empirical Study of Diffusion Representations
Pedro Vélez · Luisa Polania Cabrera · Yi Yang · Chuhan Zhang · Rishabh Kabra · Anurag Arnab · Mehdi Sajjadi
|
||
LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity
Walid Bousselham · Angie Boggust · Sofian Chaybouti · Hendrik Strobelt · Hilde Kuehne
|
||
UPRE: Zero-Shot Domain Adaptation for Object Detection via Unified Prompt and Representation Enhancement
Xiao Zhang · Fei Wei · Yong Wang · Wenda Zhao · Feiyi Li · Xiangxiang Chu
|
||
AlignGuard: Scalable Safety Alignment for Text-to-Image Generation
Runtao Liu · I Chen · Jindong Gu · Jipeng Zhang · Renjie Pi · Qifeng Chen · Philip Torr · Ashkan Khakzar · Fabio Pizzati
|
||
Dynamic Dictionary Learning for Remote Sensing Image Segmentation
Xuechao Zou · Yue Li · Shun Zhang · Kai Li · Shiying Wang · Pin Tao · Junliang Xing · congyan lang
|
||
AM-Adapter: Appearance Matching Adapter for Exemplar-based Semantic Image Synthesis in-the-Wild
Siyoon Jin · Jisu Nam · Jiyoung Kim · Dahyun Chung · Yeong-Seok Kim · Joonhyung Park · HeonJeong Chu · Seungryong Kim
|
||
SEGA: A Stepwise Evolution Paradigm for Content-Aware Layout Generation with Design Prior
Bo Zhao · Haoran Wang · Jinghui Wang · Hanzhang Wang · Huan Yang · Wei Ji · Hao Liu · Xinyan Xiao
|
||
Contact-Aware Amodal Completion for Human-Object Interaction via Multi-Regional Inpainting
Seunggeun Chi · Pin-Hao Huang · Enna Sachdeva · Kwonjoon Lee
|
||
C$^2$MIL: Synchronizing Semantic and Topological Causalities in Multiple Instance Learning for Robust and Interpretable Survival Analysis
Min Cen · Zhenfeng Zhuang · Yuzhe Zhang · Min Zeng · Baptiste Magnier · Lequan Yu · Hong Zhang · Liansheng Wang
|
||
QuickSplat: Fast 3D Surface Reconstruction via Learned Gaussian Initialization
Yueh-Cheng Liu · Lukas Hoellein · Matthias Nießner · Angela Dai
|
||
egoPPG: Heart Rate Estimation from Eye-Tracking Cameras in Egocentric Systems to Benefit Downstream Vision Tasks
Björn Braun · Rayan Armani · Manuel Meier · Max Moebus · Christian Holz
|
||
Reusing Computation in Text-to-Image Diffusion for Efficient Generation of Image Sets
Dale Decatur · Thibault Groueix · Wang Yifan · Rana Hanocka · Vladimir Kim · Matheus Gadelha
|
||
${\rm \bf EYE}^{\bf 3}$: Turn Anything into Naked-eye 3D
Yingde Song · Zongyuan Yang · Baolin Liu · yongping xiong · Sai Chen · Lan Yi · Zhaohe Zhang · Xunbo Yu
|
||
Met$^2$Net: A Decoupled Two-Stage Spatio-Temporal Forecasting Model for Complex Meteorological Systems
Shaohan Li · Hao Yang · Min Chen · Xiaolin Qin
|
||
Balancing Conservatism and Aggressiveness: Prototype-Affinity Hybrid Network for Few-Shot Segmentation
Tianyu Zou · Shengwu Xiong · Ruilin Yao · Yi Rong
|
||
Repurposing 2D Diffusion Models with Gaussian Atlas for 3D Generation
Tiange Xiang · Kai Li · Chengjiang Long · Christian Häne · Peihong Guo · Scott Delp · Ehsan Adeli · Li Fei-Fei
|
||
Towards Scalable Spatial Intelligence via 2D-to-3D Data Lifting
Xingyu Miao · Haoran Duan · Quanhao Qian · Jiuniu Wang · Yang Long · Ling Shao · Deli Zhao · Ran Xu · Gongjie Zhang
|
||
SpatialSplat: Efficient Semantic 3D from Sparse Unposed Images
Yu Sheng · Jiajun Deng · Xinran Zhang · Yu Zhang · Bei Hua · Yanyong Zhang · Jianmin Ji
|
||
Occlusion-robust Stylization for Drawing-based 3D Animation
Sunjae Yoon · Gwanhyeong Koo · Younghwan Lee · Ji Woo Hong · Chang Yoo
|
||
What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models
Lorenzo Baraldi · Davide Bucciarelli · Federico Betti · Marcella Cornia · Lorenzo Baraldi · Nicu Sebe · Rita Cucchiara
|
||
ShortFT: Diffusion Model Alignment via Shortcut-based Fine-Tuning
Xiefan Guo · Miaomiao Cui · Liefeng Bo · Di Huang
|
||
Tree Skeletonization from 3D Point Clouds by Denoising Diffusion
Elias Marks · Lucas Nunes · Federico Magistri · Matteo Sodano · Rodrigo Marcuzzi · Lars Zimmermann · Jens Behley · Cyrill Stachniss
|
||
Text-IRSTD: Leveraging Semantic Text to Promote Infrared Small Target Detection in Complex Scenes
Feng Huang · Shuyuan Zheng · Zhaobing Qiu · Huanxian Liu · huanxin Bai · Liqiong Chen
|
||
Generalizable Object Re-Identification via Visual In-Context Prompting
Zhizhong Huang · Xiaoming Liu
|
||
Spherical Epipolar Rectification for Deep Two-View Absolute Depth Estimation
Pierre-André Brousseau · Sébastien Roy
|
||
Conditional Visual Autoregressive Modeling for Pathological Image Restoration
Ziyi Liu · Zhe Xu · Jiabo MA · Wenqiang Li · Ruixuan Wang · Bo Du · Hao Chen
|
||
A Constrained Optimization Approach for Gaussian Splatting from Coarsely-posed Images and Noisy Lidar Point Clouds
Jizong Peng · Tze Ho Elden Tse · Kai Xu · Wenchao Gao · Angela Yao
|
||
MVGBench: a Comprehensive Benchmark for Multi-view Generation Models
Xianghui Xie · Jan Lenssen · Gerard Pons-Moll
|
||
X$^{2}$-Gaussian: 4D Radiative Gaussian Splatting for Continuous-time Tomographic Reconstruction
Weihao Yu · Yuanhao Cai · Ruyi Zha · Zhiwen Fan · Chenxin Li · Yixuan Yuan
|
||
SIGMAN: Scaling 3D Human Gaussian Generation with Millions of Assets
Yuhang Yang · Fengqi Liu · Yixing Lu · Qin Zhao · Pingyu Wu · Wei Zhai · Ran Yi · Yang Cao · Lizhuang Ma · Zheng-Jun Zha · Junting Dong
|
||
IFAdapter: Instance feature control for grounded Text-to-Image Generation
YINWEI WU · Xianpan Zhou · bing ma · Xuefeng Su · Kai Ma · Xinchao Wang
|
||
Zeroth-Order Fine-Tuning of LLMs in Random Subspaces
Ziming Yu · Pan Zhou · Sike Wang · Jia Li · Mi Tian · Hua Huang
|
||
TeEFusion: Blending Text Embeddings to Distill Classifier-Free Guidance
Minghao Fu · Guo-Hua Wang · Xiaohao Chen · Qing-Guo Chen · Zhao Xu · Weihua Luo · Kaifu Zhang
|
||
FuXi-RTM: A Physics-Guided Prediction Framework with Radiative Transfer Modeling
qiusheng huang · Xiaohui Zhong · Xu Fan · Hao Li
|
||
ReFlex: Text-Guided Editing of Real Images in Rectified Flow via Mid-Step Feature Extraction and Attention Adaptation
Jimyeong Kim · Jungwon Park · Yeji Song · Nojun Kwak · Wonjong Rhee
|
||
SparseLaneSTP: Leveraging Spatio-Temporal Priors with Sparse Transformers for 3D Lane Detection
Maximilian Pittner · Joel Janai · Mario Faigle · Alexandru Condurache
|
||
One Last Attention for Your Vision-Language Model
Liang Chen · Ghazi Shazan Ahmad · Tianjun Yao · Lingqiao Liu · Zhiqiang Shen
|
||
MonoFusion: Sparse-View 4D Reconstruction via Monocular Fusion
Zihan Wang · Jeff Tan · Tarasha Khurana · Neehar Peri · Deva Ramanan
|
||
O-MaMa: Learning Object Mask Matching between Egocentric and Exocentric Views
Lorenzo Mur-Labadia · Maria Santos-Villafranca · Jesus Bermudez-cameo · Alejandro Perez-Yus · Ruben Martinez-Cantin · Jose Guerrero
|
||
InsideOut: Integrated RGB-Radiative Gaussian Splatting for Comprehensive 3D Object Representation
Jungmin Lee · Seonghyuk Hong · Juyong Lee · Jaeyoon Lee · Jongwon Choi
|
||
GGTalker: Talking Head Synthesis with Generalizable Gaussian Priors and Identity-Specific Adaptation
Wentao Hu · Shunkai Li · Ziqiao Peng · Haoxian Zhang · Fan Shi · Xiaoqiang Liu · Pengfei Wan · Di ZHANG · Hui Tian
|
||
BANet: Bilateral Aggregation Network for Mobile Stereo Matching
Gangwei Xu · Jiaxin Liu · Xianqi Wang · Junda Cheng · Yong Deng · Jinliang Zang · Yurui Chen · Xin Yang
|
||
From Panels to Prose: Generating Literary Narratives from Comics
Ragav Sachdeva · Andrew Zisserman
|
||
GaussRender: Learning 3D Occupancy with Gaussian Rendering
Loick Chambon · Eloi Zablocki · Alexandre Boulch · Mickael Chen · Matthieu Cord
|
||
Describe, Adapt and Combine: Empowering CLIP Encoders for Open-set 3D Object Retrieval
Zhichuan Wang · Yang Zhou · Zhe Liu · Rui Yu · Song Bai · Yulong Wang · Xinwei He · Xiang Bai
|
||
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
yi yang · Xiaoxuan He · Hongkun Pan · Xiyan Jiang · Yan Deng · Xingtao Yang · Haoyu Lu · Dacheng Yin · Fengyun Rao · Minfeng Zhu · Bo Zhang · Wei Chen
|
||
AIM: Amending Inherent Interpretability via Self-Supervised Masking
Eyad Alshami · Shashank Agnihotri · Bernt Schiele · Margret Keuper
|
||
LayerD: Decomposing Raster Graphic Designs into Layers
Tomoyuki Suzuki · Kang-Jun Liu · Naoto Inoue · Kota Yamaguchi
|
||
Is Tracking really more challenging in First Person Egocentric Vision?
Matteo Dunnhofer · Zaira Manigrasso · Christian Micheloni
|
||
Hybrid-grained Feature Aggregation with Coarse-to-fine Language Guidance for Self-supervised Monocular Depth Estimation
Wenyao Zhang · Hongsi Liu · Bohan Li · Jiawei He · Zekun Qi · Yunnan Wang · Eastern Institute of Technology Shengyang · Ningbo Institute Of Digital Twin XinQiang · Galbot Wenjun · Eastern Institute for Advanced Study Xin
|
||
PVChat: Personalized Video Chat with One-Shot Learning
YUFEI SHI · Weilong Yan · Gang Xu · Yumeng Li · Yucheng Chen · ZhenXi Li · Fei Yu · Ming Li · Si Yong Yeo
|
||
Analyzing Finetuning Representation Shift for Multimodal LLMs Steering
Pegah KHAYATAN · Mustafa Shukor · Jayneel Parekh · Arnaud Dapogny · Matthieu Cord
|
||
GM-MoE: Low-Light Enhancement with Gated-Mechanism Mixture-of-Experts
Minwen Liao · Hao Dong · Xinyi Wang · Kurban Ubul · Ziyang Yan · Yihua Shao
|
||
EvaGaussians: Event Stream Assisted Gaussian Splatting from Blurry Images
Wangbo Yu · Chaoran Feng · Jianing Li · Jiye Tang · Jiashu Yang · Zhenyu Tang · Meng Cao · Xu Jia · Yuchao Yang · Li Yuan · Yonghong Tian
|
||
MagicDrive-V2: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control
Ruiyuan Gao · Kai Chen · Bo Xiao · Lanqing HONG · Zhenguo Li · Qiang Xu
|
||
Autoregressive Denoising Score Matching is a Good Video Anomaly Detector
hanwen Zhang · Congqi Cao · Qinyi Lv · Lingtong Min · Yanning Zhang
|
||
A View-consistent Sampling Method for Regularized Training of Neural Radiance Fields
Aoxiang Fan · Corentin Dumery · Nicolas Talabot · Pascal Fua
|
||
$\bf{D^3}$QE: Learning Discrete Distribution Discrepancy-aware Quantization Error for Autoregressive-Generated Image Detection
Yanran Zhang · Bingyao Yu · Yu Zheng · Wenzhao Zheng · Yueqi Duan · Lei Chen · Jie Zhou · Jiwen Lu
|
||
Structure Matters: Revisiting Boundary Refinement in Video Object Segmentation
Guanyi Qin · Ziyue Wang · Daiyun Shen · Haofeng Liu · Hantao Zhou · Junde Wu · Runze Hu · Yueming Jin
|
||
Revisiting Adversarial Patch Defenses on Object Detectors: Unified Evaluation, Large-Scale Dataset, and New Insights
Junhao Zheng · Jiahao Sun · Chenhao Lin · Zhengyu Zhao · Chen Ma · Chong Zhang · Cong Wang · Qian Wang · Chao Shen
|
||
Forecasting Continuous Non-Conservative Dynamical Systems in $SO(3)$
Lennart Bastian · Mohammad Rashed · Nassir Navab · Tolga Birdal
|
||
Beyond Blur: A Fluid Perspective on Generative Diffusion Models
Grzegorz Gruszczynski · Jakub Meixner · Michał Włodarczyk · Przemyslaw Musialski
|
||
Multi-Cache Enhanced Prototype Learning for Test-Time Generalization of Vision-Language Models
Xinyu Chen · Haotian Zhai · Can Zhang · XIUPENG SHI · Ruirui Li
|
||
LoRAverse: A Submodular Framework to Retrieve Diverse Adapters for Diffusion Models
Mert Sonmezer · Matthew Zheng · Pinar Yanardag
|
||
Laboring on less labors: RPCA Paradigm for Pan-sharpening
honghui xu · Chuangjie Fang · Yibin Wang · Jie Wu · Jianwei Zheng
|
||
GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training
Tong Wei · Yijun Yang · Junliang Xing · Yuanchun Shi · Zongqing Lu · Deheng Ye
|
||
SpikeDiff: Zero-shot High-Quality Video Reconstruction from Sub-millisecond Chromatic Spike Streams
Siqi Yang · Jinxiu Liang · Zhaojun Huang · Yeliduosi Xiaokaiti · Yakun Chang · Zhaofei Yu · Boxin Shi
|
||
VertexRegen: Mesh Generation with Continuous Level of Detail
Xiang Zhang · Yawar Siddiqui · Armen Avetisyan · Chris Xie · Jakob Engel · Henry Howard-Jenkins
|
||
DIMO: Diverse 3D Motion Generation for Arbitrary Objects
Linzhan Mou · Jiahui Lei · Chen Wang · Lingjie Liu · Kostas Daniilidis
|
||
Long-Tailed Classification with Multi-Granularity Semantics
Yuting Liu · Liu Yang · Yu Wang
|
||
Global-Aware Monocular Semantic Scene Completion with State Space Models
Shijie Li · Zhongyao Cheng · Rong Li · Shuai Li · Juergen Gall · Xun Xu · Xulei Yang
|
||
Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts
WANG Yun · Longguang Wang · Chenghao Zhang · Yongjian Zhang · Zhanjie Zhang · Ao Ma · Chenyou Fan · Tin Lun Lam · Junjie Hu
|
||
Integrating Task-Specific and Universal Adapters for Pre-Trained Model-Based Class-Incremental Learning
yan wang · Da-Wei Zhou · Han-Jia Ye
|
||
Optimal Transport for Brain-Image Alignment: Unveiling Redundancy and Synergy in Neural Information Processing
Yang Xiao · Wang Lu · Jie Ji · Ruimeng Ye · Gen Li · Xiaolong Ma · Bo Hui
|
||
ViT-EnsembleAttack: Augmenting Ensemble Models for Stronger Adversarial Transferability in Vision Transformers
Hanwen Cao · Haobo Lu · Xiaosen Wang · Kun He
|
||
Seeing the Trees for the Forest: Rethinking Weakly-Supervised Medical Visual Grounding
Ta Duc Huy · Duy Anh Huynh · Yutong Xie · Yuankai Qi · Qi Chen · Phi Le Nguyen · Sen Tran · Son Lam Phung · Anton Hengel · Zhibin Liao · Minh-Son To · Johan Verjans · Vu Phan
|
||
Communication-Efficient Multi-Vehicle Collaborative Semantic Segmentation via Sparse 3D Gaussian Sharing
Tianyu Hong · Xiaobo Zhou · Wenkai Hu · Qi Xie · Zhihui Ke · Tie Qiu
|
||
VAFlow: Video-to-Audio Generation with Cross-Modality Flow Matching
Xihua Wang · Xin Cheng · Yuyue Wang · Ruihua Song · Yunfeng Wang
|
||
"Principal Components" Enable A New Language of Images
Xin Wen · Bingchen Zhao · Ismail Elezi · Jiankang Deng · Xiaojuan Qi
|
||
Representation Shift: Unifying Token Compression with FlashAttention
Joonmyung Choi · Sanghyeok Lee · Byungoh Ko · Eunseo Kim · Jihyung Kil · Hyunwoo Kim
|
||
EgoMLVM: An Egocentric Multitask Large Video Model
Gen Li · Yutong Chen · Yiqian Wu · KAIFENG ZHAO · Marc Pollefeys · Siyu Tang
|
||
Agent-free Breast Cancer Diagnosis and Prognosis via Latent Diffusion Enhancement
Yuhan Wang · Luyang Luo · Yuyin Zhou
|
||
VoxelKP: A Voxel-based Network Architecture for Human Keypoint Estimation in LiDAR Data
Jian Shi · Peter Wonka
|
||
LVBench: An Extreme Long Video Understanding Benchmark
Weihan Wang · zehai he · Wenyi Hong · Yean Cheng · Xiaohan Zhang · Ji Qi · Ming Ding · Xiaotao Gu · Shiyu Huang · Bin Xu · Yuxiao Dong · Jie Tang
|
||
MPG-SAM 2: Adapting SAM 2 with Mask Priors and Global Context for Referring Video Object Segmentation
Fu Rong · Meng Lan · Qian Zhang · Lefei Zhang
|
||
Grouped Speculative Decoding for Autoregressive Image Generation
Junhyuk So · Juncheol Shin · Hyunho Kook · Eunhyeok Park
|
||
Robust 3D-Masked Part-level Editing in 3D Gaussian Splatting with Regularized Score Distillation Sampling
Hayeon Kim · Ji Jang Jang · Se Young Chun
|
||
Hierarchical-aware Orthogonal Disentanglement Framework for Fine-grained Skeleton-based Action Recognition
Haochen Chang · Pengfei Ren · Haoyang Zhang · Liang Xie · Hongbo Chen · Erwei Yin
|
||
FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait
Taekyung Ki · Dongchan Min · Gyeongsu Chae
|
||
GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs
Xinli Xu · Wenhang Ge · Dicong Qiu · ZhiFei Chen · Dongyu Yan · Zhuoyun LIU · Haoyu Zhao · hanfeng Zhao · Shunsi Zhang · Junwei Liang · Ying-Cong Chen
|
||
Simulating Dual-Pixel Images From Ray Tracing For Depth Estimation
Fengchen He · Dayang Zhao · Hao Xu · Tingwei Quan · Shaoqun zeng
|
||
TARO: Timestep-Adaptive Representation Alignment with Onset-Aware Conditioning for Synchronized Video-to-Audio Synthesis
Tri Ton · Ji Woo Hong · Chang Yoo
|
||
Aligning Global Semantics and Local Textures in Generative Video Enhancement
Zhikai Chen · Fuchen Long · Zhaofan Qiu · Ting Yao · Wengang Zhou · Jiebo Luo · Tao Mei
|
||
iManip: Skill-Incremental Learning for Robotic Manipulation
Zexin Zheng · Jia-Feng Cai · Xiao-Ming Wu · Yilin Wei · Yu-Ming Tang · Wei-Shi Zheng · Ancong Wu
|
||
Completing 3D Partial Assemblies with View-Consistent 2D-3D Correspondence
Weihao Wang · Yu Lan · Mingyu You · Bin He
|
||
Leaps and Bounds: An Improved Point Cloud Winding Number Formulation for Fast Normal Estimation and Surface Reconstruction
Chamin Hewa Koneputugodage · Dylan Campbell · Stephen Gould
|
||
Puzzle Similarity: A Perceptually-guided Cross-Reference Metric for Artifact Detection in 3D Scene Reconstructions
Nicolai Hermann · Jorge Condor · Piotr Didyk
|
||
Text-to-Any-Skeleton Motion Generation Without Retargeting
Qingyuan Liu · Ke Lv · Kun Dong · Jian Xue · Zehai Niu · Jinbao Wang
|
||
GauUpdate: New Object Insertion in 3D Gaussian Fields with Consistent Global Illumination
Chengwei Ren · Fan Zhang · Liangchao Xu · Liang Pan · Ziwei Liu · Wenping Wang · Xiao-Ping Zhang · Yuan Liu
|
||
STEP-DETR: Advancing DETR-based Semi-Supervised Object Detection with Super Teacher and Pseudo-Label Guided Text Queries
Tahira Shehzadi · Khurram Azeem Hashmi · Shalini Sarode · Didier Stricker · Muhammad Zeshan Afzal
|
||
LaneDiffusion: Improving Centerline Graph Learning via Prior Injected BEV Feature Generation
Zijie Wang · Weiming Zhang · Wei Zhang · Xiao Tan · hongxing liu · Yaowei Wang · Guanbin Li
|
||
DM-EFS: Dynamically Multiplexed Expanded Features Set Form for Robust and Efficient Small Object Detection
Aashish Sharma
|
||
VisNumBench: Evaluating Number Sense of Multimodal Large Language Models
Tengjin Weng · Jingyi Wang · Wenhao Jiang · Zhong Ming
|
||
StepGRPO: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
Jingyi Zhang · Jiaxing Huang · Huanjin Yao · Shunyu Liu · Xikun ZHANG · Shijian Lu · Dacheng Tao
|
||
GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors
Tian-Xing Xu · Xiangjun Gao · Wenbo Hu · Xiaoyu Li · Song-Hai Zhang · Ying Shan
|
||
Explaining Human Preferences via Metrics for Structured Reconstruction
Jack Langerman · Denis Rozumny · Yuzhong Huang · Dmytro Mishkin
|
||
KDA: Knowledge Diffusion Alignment with Enhanced Context for Video Temporal Grounding
Ran Ran · Jiwei Wei · Shiyuan He · Zeyu Ma · Chaoning Zhang · Ning Xie · Yang Yang
|
||
Erasing More Than Intended? How Concept Erasure Degrades the Generation of Non-Target Concepts
Ibtihel Amara · Ahmed Imtiaz Humayun · Ivana Kajic · Zarana Parekh · Natalie Harris · Sarah Young · Chirag Nagpal · Najoung Kim · Junfeng He · Cristina Vasconcelos · Deepak Ramachandran · Golnoosh Farnadi · Katherine Heller · Mohammad Havaei · Negar Rostamzadeh
|
||
Unlearning the Noisy Correspondence Makes CLIP More Robust
Haochen Han · Alex Jinpeng Wang · Peijun Ye · Fangming Liu
|
||
Scaling Laws for Native Multimodal Models
Mustafa Shukor · Enrico Fini · Victor Guilherme Turrisi da Costa · Matthieu Cord · Joshua Susskind · Alaaeldin El-Nouby
|
||
HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model
Tao Wang · Changxu Cheng · Lingfeng Wang · Senda Chen · Wuyue Zhao
|
||
Alleviating Textual Reliance in Medical Language-guided Segmentation via Prototype-driven Semantic Approximation
Shuchang Ye · Usman Naseem · Mingyuan Meng · jinman kim
|
||
Dissecting CLIP: Decomposition with a Schur Complement-based Approach
Azim Ospanov · Mohammad Jalali · Farzan Farnia
|
||
TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction
Xuying Zhang · Yutong Liu · Yangguang Li · Renrui Zhang · Yufei Liu · Kai Wang · Wanli Ouyang · Zhiwei Xiong · Peng Gao · Qibin Hou · Ming-Ming Cheng
|
||
EMoTive: Event-guided Trajectory Modeling for 3D Motion Estimation
Zengyu Wan · Wei Zhai · Yang Cao · Zheng-Jun Zha
|
||
AutoScape: Geometry-Consistent Long-Horizon Scene Generation
Jiacheng Chen · Ziyu Jiang · Mingfu Liang · Bingbing Zhuang · Jong-Chyi Su · Sparsh Garg · Ying Wu · Manmohan Chandraker
|
||
Task-Decoupled Bézier Surface Constraint for Uneven Low-Light Image Enhancement
Xingxiang Zhou · Xiangdong Su · Haoran Zhang · Wei Chen · Guanglai Gao
|
||
Bitrate-Controlled Diffusion for Disentangling Motion and Content in Video
Xiao Li · Qi Chen · Xiulian Peng · Kai Yu · Xie Chen · Yan Lu
|
||
3D Test-time Adaptation via Graph Spectral Driven Point Shift
Xin Wei · Qin Yang · Yijie Fang · Mingrui Zhu · Nannan Wang
|
||
MAESTRO: Task-Relevant Optimization via Adaptive Feature Enhancement and Suppression for Multi-task 3D Perception
ChangWon Kang · Jisong Kim · Hongjae Shin · Junseo Park · Jun Won Choi
|
||
Beyond Spatial Frequency: Pixel-wise Temporal Frequency-based Deepfake Video Detection
Taehoon Kim · Jongwook Choi · Yonghyun Jeong · Haeun Noh · Jaejun Yoo · Seungryul Baek · Jongwon Choi
|
||
Adaptive Learning of High-Value Regions for Semi-Supervised Medical Image Segmentation
Tao Lei · Ziyao Yang · Xingwu wang · Yi Wang · Xuan Wang · Feiman Sun · Asoke Nandi
|
||
Transformed Low-rank Adaptation via Tensor Decomposition and Its Applications to Text-to-image Models
Zerui Tao · Yuhta Takida · Naoki Murata · Qibin Zhao · Yuki Mitsufuji
|
||
Wave-MambaAD: Wavelet-driven State Space Model for Multi-class Unsupervised Anomaly Detection
Qiao Zhang · Mingwen Shao · Xinyuan Chen · Xiang Lv · Kai Xu
|
||
Discretized Gaussian Representation for Tomographic Reconstruction
Shaokai Wu · Yuxiang Lu · Yapan Guo · Wei Ji · Suizhi Huang · Fengyu Yang · Shalayiding Sirejiding · Qichen He · Jing Tong · Yanbiao Ji · Yue Ding · Hongtao Lu
|
||
Robust Low-light Scene Restoration via Illumination Transition
Ze Li · Feng Zhang · Xiatian Zhu · Zhang Meng · Yanghong Zhou · P.Y. Mok
|
||
Adaptive Articulated Object Manipulation On The Fly with Foundation Model Reasoning and Part Grounding
Xiaojie Zhang · Yuanfei Wang · Ruihai Wu · Kunqi Xu · Yu Li · Liuyu Xiang · Hao Dong · Zhaofeng He
|
||
Keep Your Friends Close, and Your Enemies Farther: Distance-aware Voxel-wise Contrastive Learning for Semi-supervised Multi-organ Segmentation
Haochen Zhao · Jianwei Niu · Xuefeng Liu · Xiaozheng Xie · Li Kuang · Haotian Yang · Bin Dai · Hui Meng · Yong Wang
|
||
FlowDPS: Flow-Driven Posterior Sampling for Inverse Problems
Jeongsol Kim · Bryan Sangwoo Kim · Jong Ye
|
||
CLIPer: Hierarchically Improving Spatial Representation of CLIP for Open-Vocabulary Semantic Segmentation
Lin Sun · Jiale Cao · Jin Xie · Xiaoheng Jiang · Yanwei Pang
|
||
Failure Cases Are Better Learned But Boundary Says Sorry: Facilitating Smooth Perception Change for Accuracy-Robustness Trade-Off in Adversarial Training
Yanyun Wang · Li Liu
|
||
Who Controls the Authorization? Invertible Networks for Copyright Protection in Text-to-Image Synthesis
Baoyue Hu · Yang Wei · Junhao Xiao · Wendong Huang · Xiuli Bi · Bin Xiao
|
||
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation
Rongyao Fang · Chengqi Duan · Kun Wang · Hao Li · Linjiang Huang · Hao Tian · Xingyu Zeng · Rui Zhao · Jifeng Dai · Hongsheng Li · Xihui Liu
|
||
EVDM: Event-based Real-world Video Deblurring with Mamba
Zhijing Sun · Senyan Xu · Kean Liu · Runze Tian · Xueyang Fu · Zheng-Jun Zha
|
||
Video2BEV: Transforming Drone Videos to BEVs for Video-based Geo-localization
Hao Ju · Shaofei Huang · Si Liu · Zhedong Zheng
|
||
VITAL: More Understandable Feature Visualization through Distribution Alignment and Relevant Information Flow
Ada Görgün · Bernt Schiele · Jonas Fischer
|
||
Exploiting Diffusion Prior for Task-driven Image Restoration
Jaeha Kim · Junghun Oh · Kyoung Mu Lee
|
||
GENMO: A GENeralist Model for Human MOtion
Jiefeng Li · Jinkun Cao · Haotian Zhang · Davis Rempe · Jan Kautz · Umar Iqbal · Ye Yuan
|
||
Learning Efficient and Generalizable Human Representation with Human Gaussian Model
Yifan Liu · Shengjun Zhang · Chensheng Dai · Yang Chen · Hao Liu · Chen Li · Yueqi Duan
|
||
WSI-LLaVA: A Multimodal Large Language Model for Whole Slide Image
Yuci Liang · Xinheng Lyu · Meidan Ding · Wenting Chen · Xiaohan Xing · Jipeng Zhang · Sen Yang · Xiangjian He · Song Wu · Xiyue Wang · Linlin Shen
|
||
Gait-X: Exploring X modality for Generalized Gait Recognition
Zengbin Wang · Saihui Hou · Junjie Li · Xu Liu · Chunshui Cao · Yongzhen Huang · Siye Wang · Man Zhang
|
||
VRM: Knowledge Distillation via Virtual Relation Matching
Weijia Zhang · Fei Xie · Weidong Cai · Chao Ma
|
||
6DOPE-GS: Online 6D Object Pose Estimation using Gaussian Splatting
Yufeng Jin · Vignesh Prasad · Snehal Jauhri · Mathias Franzius · Georgia Chalvatzaki
|
||
Capturing head avatar with hand contacts from a monocular video
Haonan He · Yufeng Zheng · Jie Song
|
||
Learnable Fractional Reaction-Diffusion Dynamics for Under-Display ToF Imaging and Beyond
Xin Qiao · Matteo Poggi · Xing Wei · Pengchao Deng · Yanhui Zhou · Stefano Mattoccia
|
||
Is Less More? Exploring Token Condensation as Training-free Test-time Adaptation
Zixin Wang · Dong Gong · Sen Wang · Zi Huang · Yadan Luo
|
||
MR-FIQA: Face Image Quality Assessment with Multi-Reference Representations from Synthetic Data Generation
Fu-Zhao Ou · Chongyi Li · Shiqi Wang · Sam Kwong
|
||
Flash-VStream: Efficient Real-Time Understanding for Long Video Streams
Haoji Zhang · Yiqin Wang · Yansong Tang · Yong Liu · Jiashi Feng · Xiaojie Jin
|
||
ArtEditor: Learning Customized Instructional Image Editor from Few-Shot Examples
Shijie Huang · Yiren Song · Yuxuan Zhang · Hailong Guo · Xueyin Wang · Jiaming Liu
|
||
Relative Illumination Fields: Learning Medium and Light Independent Underwater Scenes
Mengkun She · Felix Seegräber · David Nakath · Patricia Schöntag · Kevin Köser
|
||
Incremental 3D Gaussian Localization for Image-goal Navigation
Wenxuan Guo · Xiuwei Xu · Hang Yin · Ziwei Wang · Jianjiang Feng · Jie Zhou · Jiwen Lu
|
||
ARIG: Autoregressive Interactive Head Generation for Real-time Conversations
Ying Guo · Xi Liu · Cheng Zhen · Pengfei Yan · Xiaoming Wei
|
||
ArtFlow: Bridging Artworks Through Time With Flow
Pingchuan Ma · Ming Gui · Johannes Schusterbauer · Xiaopei Yang · Olga Grebenkova · Vincent Tao Hu · Björn Ommer
|
||
The Source Image is the Best Attention for Infrared and Visible Image Fusion
Song Wang · Xie Han · Liqun Kuang · Boying Wang · Zhongyu Chen · Zherui Qiao · Fan Yang · Xiaoxia Liu · Bingyu Zhang · Zhixun Wang
|
||
MINERVA: Evaluating Complex Video Reasoning
Arsha Nagrani · Sachit Menon · Ahmet Iscen · Shyamal Buch · Nilpa Jha · Ramin Mehran · Anja Hauth · Mikhail Sirotenko · Yukun Zhu · Carl Vondrick · Cordelia Schmid · Tobias Weyand
|
||
Towards Foundational Models for Single-Chip Radar
Tianshu Huang · Akarsh Prabhakara · Chuhan Chen · Jay Karhade · Deva Ramanan · Matthew O'Toole · Anthony Rowe
|
||
On the Complexity-Faithfulness Trade-off of Gradient-Based Explanations
Amir Mehrpanah · Matteo Gamba · Kevin Smith · Hossein Azizpour
|
||
FedPall: Prototype-based Adversarial and Collaborative Learning for Federated Learning with Feature Drift
yong zhang · Feng Liang · Guanghu Yuan · Min Yang · Chengming Li · Xiping Hu
|
||
Causal Disentanglement and Cross-Modal Alignment for Enhanced Few-Shot Learning
Tianjiao Jiang · Zhen Zhang · Yuhang Liu · Javen Qinfeng Shi
|
||
Breaking Grid Constraints: Dynamic Graph Reconstruction Network for Multi-organ Segmentation
Junhao Xiao · Yang Wei · Jingyu Wang · Yongchao Wang · Xiuli Bi · Bin Xiao
|
||
S3R-GS: Streamlining the Pipeline for Large-Scale Street Scene Reconstruction
Guangting Zheng · Jiajun Deng · Xiaomeng Chu · Yu Yuan · Houqiang Li · Yanyong Zhang
|
||
Cross-Category Subjectivity Generalization for Style-Adaptive Sketch Re-ID
Zechao Hu · Zhengwei Yang · Hao Li · Yixiong Zou · Zheng Wang
|
||
Joint Semantic and Rendering Enhancements in 3D Gaussian Modeling with Anisotropic Local Encoding
Jingming He · Chongyi Li · Shiqi Wang · Sam Kwong
|
||
Towards Human-like Virtual Beings: Simulating Human Behavior in 3D Scenes
CHEN LIANG · Wenguan Wang · Yi Yang
|
||
Feature Purification Matters: Suppressing Outlier Propagation for Training-Free Open-Vocabulary Semantic Segmentation
Shuo Jin · Siyue Yu · Bingfeng Zhang · Mingjie Sun · Yi Dong · Jimin XIAO
|
||
How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game
Ziyue Wang · Yurui Dong · Fuwen Luo · Minyuan Ruan · Zhili Cheng · Chi Chen · Peng Li · Yang Liu
|
||
Frequency-Semantic Enhanced Variational Autoencoder for Zero-Shot Skeleton-based Action Recognition
Wenhan Wu · Zhishuai Guo · Chen Chen · Hongfei Xue · Aidong Lu
|
||
EAMamba - Efficient All-Around Vision State Space Model for Image Restoration
Yu-Cheng Lin · Yu-Syuan Xu · Hao-Wei Chen · Hsien-Kai Kuo · Chun-Yi Lee
|
||
Hierarchical Material Recognition from Local Appearance
Matthew Beveridge · Shree Nayar
|
||
Collaborative Instance Object Navigation: Leveraging Uncertainty-Awareness to Minimize Human-Agent Dialogues
Francesco Taioli · Edoardo Zorzi · Gianni Franchi · Alberto Castellini · Alessandro Farinelli · Marco Cristani · Yiming Wang
|
||
Preacher: Paper-to-Video Agentic System
Jingwei Liu · Ling Yang · Hao Luo · Fan Wang · Hongyan Li · Mengdi Wang
|
||
UniConvNet: Expanding Effective Receptive Field while Maintaining Asymptotically Gaussian Distribution for ConvNets of Any Scale
Yuhao Wang · Wei Xi
|
||
Test-time Adaptation for Foundation Medical Segmentation Model Without Parametric Updates
Kecheng Chen · Xinyu Luo · Tiexin Qin · Jie Liu · Hui Liu · Victor Ho Fun Lee · Hong Yan · Haoliang Li
|
||
Frequency-Guided Diffusion for Training-Free Text-Driven Image Translation
Zheng Gao · Jifei Song · Zhensong Zhang · Jiankang Deng · Ioannis Patras
|
||
Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring
Yufei Zhan · Shurong Zheng · Yousong Zhu · Hongyin Zhao · Fan Yang · Ming Tang · Jinqiao Wang
|
||
StyleSRN: Scene Text Image Super-Resolution with Text Style Embedding
Shengrong Yuan · Runmin Wang · Ke Hao · Xu-Qi Ma · Changxin Gao · Li Liu · Nong Sang
|
||
SVIP: Semantically Contextualized Visual Patches for Zero-Shot Learning
Zhi Chen · Zecheng Zhao · Jingcai Guo · Jingjing Li · Zi Huang
|
||
On-Device Diffusion Transformer Policy for Efficient Robot Manipulation
Yiming Wu · Huan Wang · Zhenghao Chen · Jianxin Pang · Dong Xu
|
||
Few-Shot Image Quality Assessment via Adaptation of Vision-Language Models
Xudong Li · Zihao Huang · Yan Zhang · Yunhang Shen · Ke Li · Xiawu Zheng · Liujuan Cao · Rongrong Ji
|
||
ZeroKey: Point-Level Reasoning and Zero-Shot 3D Keypoint Detection from Large Language Models
Bingchen Gong · Diego Gomez · Abdullah Hamdi · Abdelrahman Eldesokey · Ahmed Abdelreheem · Peter Wonka · Maks Ovsjanikov
|
||
XTrack: Multimodal Training Boosts RGB-X Video Object Trackers
Yuedong Tan · Zongwei Wu · Yuqian Fu · Zhuyun Zhou · Guolei Sun · Eduard Zamfir · Chao Ma · Danda Pani Paudel · Luc Gool · Radu Timofte
|
||
IM-LUT: Interpolation Mixing Look-Up Tables for Image Super-Resolution
Sejin Park · Sangmin Lee · Kyong Hwan Jin · Seung-Won Jung
|
||
GIViC: Generative Implicit Video Compression
Ge Gao · Siyue Teng · Tianhao Peng · Fan Zhang · David Bull
|
||
ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges
Jiaxin Ai · Pengfei Zhou · xu Pan · Ming Li · Fanrui Zhang · Zizhen Li · Jianwen Sun · Yukang Feng · Baojin Huang · Zhongyuan Wang · Kaipeng Zhang
|
||
LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer
Yiren Song · Danze Chen · Mike Zheng Shou
|
||
Incremental Few-Shot Semantic Segmentation via Multi-Level Switchable Visual Prompts
Maoxian Wan · Kaige Li · Qichuan Geng · Weimin Shi · Zhong Zhou
|
||
Factorized Learning for Temporally Grounded Video Language Models
Wenzheng Zeng · Difei Gao · Mike Zheng Shou · Hwee Tou Ng
|
||
Bias in Gender Bias Benchmarks: How Confounding Features Distort Evaluation
Yusuke Hirota · Ryo Hachiuma · Boyi Li · Ximing Lu · Michael Boone · Boris Ivanovic · Yejin Choi · Marco Pavone · Yu-Chiang Frank Wang · Noa Garcia · Yuta Nakashima · Chao-Han Yang
|
||
CarGait: Cross-Attention based Re-ranking for Gait recognition
Gavriel Habib · Noa Barzilay · Or Shimshi · Rami Ben-Ari · Nir Darshan
|
||
Revisiting Pool-based Prompt Learning for Few-shot Class-incremental Learning
Yongwei Jiang · Yixiong Zou · Yuhua Li · Ruixuan Li
|
||
Deeply Supervised Flow-Based Generative Models
Inkyu Shin · Chenglin Yang · Liang-Chieh (Jay) Chen
|
||
LaCoOT: Layer Collapse through Optimal Transport
Victor Quétu · Zhu LIAO · Nour Hezbri · Fabio Pizzati · Enzo Tartaglione
|
||
VLM4D: Towards Spatiotemporal Awareness in Vision Language Models
Shijie Zhou · Alexander Vilesov · Xuehai He · Ziyu Wan · Shuwang Zhang · Aditya Nagachandra · Di Chang · Dongdong Chen · Xin Wang · Achuta Kadambi
|
||
Taming Flow Matching with Unbalanced Optimal Transport into Fast Pansharpening
Zihan Cao · Yu Zhong · Liang-Jian Deng
|
||
DreamLayer: Simultaneous Multi-Layer Generation via Diffusion Model
Junjia Huang · Pengxiang Yan · Jinhang Cai · Jiyang Liu · Zhao Wang · Yitong Wang · Xinglong Wu · Guanbin Li
|
||
AURELIA: Test-time Reasoning Distillation in Audio-Visual LLMs
Sanjoy Chowdhury · Hanan Gani · Nishit Anand · Sayan Nag · Ruohan Gao · Mohamed Elhoseiny · Salman Khan · Dinesh Manocha
|
||
Heavy Labels Out! Dataset Distillation with Label Space Lightening
Ruonan Yu · Songhua Liu · Zigeng Chen · Jingwen Ye · Xinchao Wang
|
||
PROGRESSOR: A Perceptually Guided Reward Estimator with Self-Supervised Online Refinement
Tewodros W. Ayalew · Xiao Zhang · Kevin Y Wu · Tianchong Jiang · Michael Maire · Matthew Walter
|
||
Switch-a-View: View Selection Learned from Unlabeled In-the-wild Videos
Sagnik Majumder · Tushar Nagarajan · Ziad Al-Halah · Kristen Grauman
|
||
Occupancy Learning with Spatiotemporal Memory
Ziyang Leng · Jiawei Yang · Wenlong Yi · Bolei Zhou
|
||
Neuroverse3D: Developing In-Context Learning Universal Model for Neuroimaging in 3D
Jiesi Hu · Hanyang Peng · Yanwu Yang · Xutao Guo · Yang Shang · Pengcheng Shi · Chenfei Ye · Ting Ma
|
||
Bridging Local Inductive Bias and Long-Range Dependencies with Pixel-Mamba for End-to-end Whole Slide Image Analysis
Zhongwei Qiu · Hanqing Chao · Tiancheng Lin · Wanxing Chang · Zijiang Yang · Wenpei Jiao · Yixuan Shen · Yunshuo Zhang · Yelin Yang · Wenbin Liu · Hui Jiang · Yun Bian · Ke Yan · Dakai Jin · Le Lu
|
||
R-LiViT: A LiDAR-Visual-Thermal Dataset Enabling Vulnerable Road User Focused Roadside Perception
Jonas Mirlach · Lei Wan · Andreas Wiedholz · Hannan Keen · Andreas Eich
|
||
PathFinder: A Multi-Modal Multi-Agent System for Medical Diagnostic Decision-Making Applied to Histopathology
Fatemeh Ghezloo · Saygin Seyfioglu · Rustin Soraki · Wisdom Ikezogwo · Beibin Li · Tejoram Vivekanandan · Joann Elmore · Ranjay Krishna · Linda Shapiro
|
||
When Confidence Fails: Revisiting Pseudo-Label Selection in Semi-supervised Semantic Segmentation
Pan Liu · Jinshi Liu
|
||
Flexi-FSCIL: Adaptive Knowledge Retention for Breaking the Stability-Plasticity Dilemma in Few-Shot Class-Incremental Learning
Wufei Xie · Yalin Wang · Chenliang Liu · Zhaohui Jiang · Xue Yang
|
||
From Objects to Events: Unlocking Complex Visual Understanding in Object Detectors via LLM-guided Symbolic Reasoning
Yuhui zeng · Haoxiang Wu · Wenjie Nie · Xiawu Zheng · Guangyao Chen · Yunhang Shen · Jun Peng · Yonghong Tian · Rongrong Ji
|
||
Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models
Yudong Jin · Sida Peng · Xuan Wang · Tao Xie · Zhen Xu · Yifan Yang · Yujun Shen · Hujun Bao · Xiaowei Zhou
|
||
ReasonVQA: A Multi-hop Reasoning Benchmark with Structural Knowledge for Visual Question Answering
Thuy-Duong Tran · Trung-Kien Tran · Manfred Hauswirth · Danh Le-Phuoc
|
||
Pose-Star: Anatomy-Aware Editing for Open-World Fashion Images
Yuran Dong · Mang Ye
|
||
FedVLA: Federated Vision-Language-Action Learning with Dual Gating Mixture-of-Experts for Robotic Manipulation
Cui Miao · Tao Chang · meihan wu · Hongbin Xu · Chun Li · Ming Li · Xiaodong Wang
|
||
Cycle-Consistent Learning for Joint Layout-to-Image Generation and Object Detection
Xinhao Cai · Qiuxia Lai · Gensheng Pei · Xiangbo Shu · Yazhou Yao · Wenguan Wang
|
||
Hierarchical 3D Scene Graphs Construction Outdoors
Jon Nyffeler · Federico Tombari · Daniel Barath
|
||
Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data
Ke Fan · Shunlin Lu · Minyue Dai · Runyi Yu · Lixing Xiao · Zhiyang Dou · Junting Dong · Lizhuang Ma · Jingbo Wang
|
||
Neural Architecture Search Driven by Locally Guided Diffusion for Personalized Federated Learning
PENG LIAO · Xilu Wang · Yaochu Jin · WenLi Du · Han Hu
|
||
Robust and Efficient 3D Gaussian Splatting for Urban Scene Reconstruction
Zhensheng Yuan · Haozhi Huang · Zhen Xiong · Di Wang · Guanghua Yang
|
||
Can3Tok: Canonical 3D Tokenization and Latent Modeling of Scene-Level 3D Gaussians
Quankai Gao · Iliyan Georgiev · Tuanfeng Wang · Krishna Kumar Singh · Ulrich Neumann · Jae Shin Yoon
|
||
VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers
Yating Wang · Haoyi Zhu · Mingyu Liu · Jiange Yang · Hao-Shu Fang · Tong He
|
||
ViT-Split: Unleashing the Power of Vision Foundation Models via Efficient Splitting Heads
Yifan Li · Xin Li · Tianqin Li · Wenbin He · Yu Kong · Liu Ren
|
||
MeshAnything V2: Artist-Created Mesh Generation with Adjacent Mesh Tokenization
Yiwen Chen · Yikai Wang · Yihao Luo · Zhengyi Wang · Zilong Chen · Jun Zhu · Chi Zhang · Guosheng Lin
|
||
StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition
Xin Ding · Hao Wu · Yifan Yang · Shiqi Jiang · Qianxi Zhang · Donglin Bai · Zhibo Chen · Ting Cao
|
||
VideoAuteur: Towards Long Narrative Video Generation - A case study in How-to-Cook Videos
Junfei Xiao · Feng Cheng · Lu Qi · Liangke Gui · Yang Zhao · Shanchuan Lin · Jiepeng Cen · Zhibei Ma · Alan Yuille · Lu Jiang
|
||
Learning Visual Proxy for Compositional Zero-Shot Learning
Shiyu Zhang · Cheng Yan · Yang Liu · Chenchen Jing · Lei Zhou · Wenjun Wang
|
||
Dirichlet-Constrained Variational Codebook Learning for Temporally Coherent Video Face Restoration
Baoyou Chen · Ce Liu · Weihao Yuan · Zilong Dong · Siyu Zhu
|
||
DLFR-Gen: Diffusion-based Video Generation with Dynamic Latent Frame Rate
Zhihang Yuan · Rui Xie · Yuzhang Shang · Hanling Zhang · Siyuan Wang · Shengen Yan · Guohao Dai · Yu Wang
|
||
$\chi$: Symmetry Understanding of 3D Shapes via Chirality Disentanglement
Weikang Wang · Tobias Weißberg · Nafie El Amrani · Florian Bernard
|
||
Keyframe-oriented Vision Token Pruning: Enhancing Efficiency of Large Vision Language Models on Long-Form Video Processing
Yudong Liu · Jingwei Sun · Yueqian Lin · Jingyang Zhang · Ming Yin · Qinsi Wang · Jianyi Zhang · Hai Li · Yiran Chen
|
||
LoRA-FAIR: Federated LoRA Fine-Tuning with Aggregation and Initialization Refinement
Jieming Bian · Lei Wang · Letian Zhang · Jie Xu
|
||
Stereo Any Video: Temporally Consistent Stereo Matching
Junpeng Jing · Weixun Luo · Ye Mao · Krystian Mikolajczyk
|
||
GeoMan: Temporally Consistent Human Geometry Estimation using Image-to-Video Diffusion
Gwanghyun Kim · Xueting Li · Ye Yuan · Koki Nagano · Tianye Li · Jan Kautz · Se Young Chun · Umar Iqbal
|
||
Towards Robust Defense against Customization via Protective Perturbation Resistant to Diffusion-based Purification
Wenkui Yang · Jie Cao · Junxian Duan · Ran He
|
||
RAGNet: Large-scale Reasoning-based Affordance Segmentation Benchmark towards General Grasping
Dongming Wu · Yanping Fu · Saike Huang · Yingfei Liu · Fan Jia · Nian Liu · Feng Dai · Tiancai Wang · Rao Anwer · Fahad Khan · Jianbing Shen
|
||
Overcoming Dual Drift for Continual Long-Tailed Visual Question Answering
Feifei Zhang · Zhihao Wang · Xi Zhang · Changsheng Xu
|
||
GeoDiffusion: A Training-Free Framework for Accurate 3D Geometric Conditioning in Image Generation
Phillip Mueller · Talip Ünlü · Sebastian Schmidt · Marcel Kollovieh · Jiajie Fan · Stephan Günnemann · Lars Mikelsons
|
||
D3: Training-Free AI-Generated Video Detection Using Second-Order Features
Chende Zheng · Ruiqi suo · Chenhao Lin · Zhengyu Zhao · Le Yang · Shuai Liu · Minghui Yang · Cong Wang · Chao Shen
|
||
Humans as Checkerboards: Calibrating Camera Motion Scale for World-Coordinate Human Mesh Recovery
Fengyuan Yang · Kerui Gu · Ha Linh Nguyen · Tze Ho Elden Tse · Angela Yao
|
||
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation
Yuqing Wang · Zhijie Lin · Yao Teng · Yuanzhi Zhu · Shuhuai Ren · Jiashi Feng · Xihui Liu
|
||
A Plug-and-Play Physical Motion Restoration Approach for In-the-Wild High-Difficulty Motions
Youliang Zhang · Ronghui Li · Yachao Zhang · Liang Pan · Jingbo Wang · Yebin Liu · Xiu Li
|
||
MUSE-VL: Modeling Unified VLM through Semantic Discrete Encoding
Rongchang Xie · Chen Du · Ping Song · Chang Liu
|
||
MatchDiffusion: Training-free Generation of Match-Cuts
Alejandro Pardo · Fabio Pizzati · Tong Zhang · Alexander Pondaven · Philip Torr · Juan Perez · Bernard Ghanem
|
||
Scaling Action Detection: AdaTAD++ with Transformer-Enhanced Temporal-Spatial Adaptation
Tanay Agrawal · Abid Ali · Antitza Dantcheva · Francois Bremond
|
||
ATAS: Any-to-Any Self-Distillation for Enhanced Open-Vocabulary Dense Prediction
Soonwoo Cha · Jiwoo Song · Juan Yeo · Hyunbin Jin · Taesup Kim
|
||
InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis
Tao Han · Wanghan Xu · Junchao Gong · Xiaoyu Yue · Song Guo · Luping Zhou · LEI BAI
|
||
Multi-modal Multi-platform Person Re-Identification: Benchmark and Method
Ruiyang Ha · Songyi Jiang · Bin Li · Bikang Pan · Yihang Zhu · Junjie Zhang · Xiatian Zhu · Shaogang Gong · Jingya Wang
|
||
Learning Hierarchical Line Buffer for Image Processing
Jiacheng Li · Feiran Li · Daisuke Iso
|
||
Backdoor Defense via Enhanced Splitting and Trap Isolation
Hongrui Yu · Lu Qi · Wanyu Lin · Jian Chen · Hailong Sun · chengbin sun
|
||
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding
Wenxuan Zhu · Bing Li · Cheng Zheng · Jinjie Mai · Jun Chen · Letian Jiang · Abdullah Hamdi · Sara Rojas Martinez · Chia-Wen Lin · Mohamed Elhoseiny · Bernard Ghanem
|
||
ULTHO: Ultra-Lightweight yet Efficient Hyperparameter Optimization in Deep Reinforcement Learning
Mingqi Yuan · Bo Li · Xin Jin · Wenjun Zeng
|
||
Unified Category-Level Object Detection and Pose Estimation from RGB Images using 3D Prototypes
Tom Fischer · Xiaojie Zhang · Eddy Ilg
|
||
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Wenqi Zhang · Hang Zhang · Xin Li · Jiashuo Sun · Yongliang Shen · Weiming Lu · Deli Zhao · Yueting Zhuang · Lidong Bing
|
||
ConceptSplit: Decoupled Multi-Concept Personalization of Diffusion Models via Token-wise Adaptation and Attention Disentanglement
Habin Lim · Youngseob Won · Juwon Seo · Gyeong-Moon Park
|
||
Dataset Distillation via Vision-Language Category Prototype
YAWEN ZOU · Guang Li · Duo Su · Zi Wang · Jun YU · Chao Zhang
|
||
Elucidating Vision Feature Spaces for Multimodal Neural Decoding
Weihao Xia · Cengiz Oztireli
|
||
DynamicFace: High-Quality and Consistent Face Swapping for Image and Video using Composable 3D Facial Priors
Runqi Wang · Yang Chen · Sijie Xu · Tianyao He · Wei Zhu · Dejia Song · Nemo Chen · Xu Tang · Yao Hu
|
||
ARGUS: Hallucination and Omission Evaluation in Video-LLMs
Ruchit Rawal · Reza Shirkavand · Heng Huang · Gowthami Somepalli · Tom Goldstein
|
||
SC-Lane: Slope-aware and Consistent Road Height Estimation Framework for 3D Lane Detection
Chaesong Park · Eunbin Seo · Jihyeon Hwang · Jongwoo Lim
|
||
FedWSQ: Efficient Federated Learning with Weight Standardization and Distribution-Aware Non-Uniform Quantization
Seung-Wook Kim · Seongyeol Kim · Jiah Kim · Seowon Ji · Se-Ho Lee
|
||
GestureHYDRA: Semantic Co-speech Gesture Synthesis via Hybrid Modality Diffusion Transformer and Cascaded-Synchronized Retrieval-Augmented Generation
Quanwei Yang · Luying Huang · Kaisiyuan Wang · Jiazhi Guan · Shengyi He · Fengguo Li · Hang Zhou · Lingyun Yu · Yingying Li · Haocheng Feng · Hongtao Xie
|
||
Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency
Tianqi Liu · Zihao Huang · Zhaoxi Chen · Guangcong Wang · Shoukang Hu · Liao Shen · Huiqiang Sun · Zhiguo Cao · Wei Li · Ziwei Liu
|
||
VideoLLaMB: Long Streaming Video Understanding with Recurrent Memory Bridges
Yuxuan Wang · Yiqi Song · Cihang Xie · Yang Liu · Zilong Zheng
|
||
Open-set Cross Modal Generalization via Multimodal Unified Representation
Hai Huang · Yan Xia · Shulei Wang · Hanting Wang · Minghui Fang · Shengpeng Ji · Sashuai Zhou · Tao Jin · Zhou Zhao
|
||
FrameFusion: Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models
Tianyu Fu · Tengxuan Liu · Qinghao Han · Guohao Dai · Shengen Yan · Huazhong Yang · Xuefei Ning · Yu Wang
|
||
When Anchors Meet Cold Diffusion: A Multi-Stage Approach to Lane Detection
Bo-Lun Huang · Tzu-Hsiang Ni · Feng-Kai Huang · Hong-Han Shuai · Wen-Huang Cheng
|
||
FullDiT: Video Generative Foundation Models with Multimodal Control via Full Attention
Xuan Ju · Weicai Ye · Quande Liu · Qiulin Wang · Xintao Wang · Pengfei Wan · Di ZHANG · Kun Gai · Qiang Xu
|
||
OV-SCAN: Semantically Consistent Alignment for Novel Object Discovery in Open-Vocabulary 3D Object Detection
Adrian Chow · Evelien Riddell · Yimu Wang · Sean Sedwards · Krzysztof Czarnecki
|
||
Dataset Distillation as Data Compression: A Rate-Utility Perspective
Youneng Bao · Yiping Liu · Zhuo Chen · Yongsheng Liang · Mu Li · Kede Ma
|
||
StyleKeeper: Prevent content leakage via a negative visual query guidance
Jaeseok Jeong · Junho Kim · Youngjung Uh · Gayoung Lee · Yunjey Choi
|
||
MultiVerse: A Multi-Turn Conversation Benchmark for Evaluating Large Vision and Language Models
Young-Jun Lee · Byung-Kwan Lee · Jianshu Zhang · Yechan Hwang · Byungsoo Ko · Han-Gyu Kim · Dongyu Yao · Xuankun Rong · Eojin Joo · Seung-Ho Han · Bowon Ko · Ho-Jin Choi
|
||
Zero-Shot Vision Encoder Grafting via LLM Surrogates
Kaiyu Yue · Vasu Singla · Menglin Jia · John Kirchenbauer · Rifaa Qadri · Zikui Cai · Abhinav Bhatele · Furong Huang · Tom Goldstein
|
||
TWIST & SCOUT: Grounding Multimodal LLM-Experts by Forget-Free Tuning
Aritra Bhowmik · Mohammad Mahdi Derakhshani · Dennis Koelma · Yuki Asano · Martin Oswald · Cees Snoek
|
||
VALLR: Visual ASR Language Model for Lip Reading
Marshall Thomas · Edward Fish · Richard Bowden
|
||
ExploreGS: Explorable 3D Scene Reconstruction with Virtual Camera Samplings and Diffusion Priors
Minsu Kim · Subin Jeon · In Cho · Mijin Yoo · Seon Joo Kim
|
||
SP$^2$T: Sparse Proxy Attention for Dual-stream Point Transformer
Jiaxu Wan · Hong Zhang · Ziqi He · Yangyan Deng · Qishu Wang · Ding Yuan · Yifan Yang
|
||
Multi-scenario Overlapping Text Segmentation with Depth Awareness
Yang Liu · Xudong Xie · Yuliang Liu · Xiang Bai
|
||
RARE: Refine Any Registration of Pairwise Point Clouds via Zero-Shot Learning
Chengyu Zheng · Honghua Chen · Jin Huang · Mingqiang Wei
|
||
Any2AnyTryon: Leveraging Adaptive Position Embeddings for Versatile Virtual Clothing Tasks
Hailong Guo · Bohan Zeng · Yiren Song · Wentao Zhang · Jiaming Liu · Chuang Zhang
|
||
MagicCity: Geometry-Aware 3D City Generation from Satellite Imagery with Multi-View Consistency
Xingbo YAO · xuanmin Wang · Hao WU · Chengliang PING · ZHANG Doudou · Hui Xiong
|
||
SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing
Ming Li · Xin Gu · Fan Chen · Xiaoying Xing · Longyin Wen · Chen Chen · Sijie Zhu
|
||
Controllable-LPMoE: Adapting to Challenging Object Segmentation via Dynamic Local Priors from Mixture-of-Experts
Yanguang Sun · Jiawei Lian · jian Yang · lei luo
|
||
VehicleMAE: View-asymmetry Mutual Learning for Vehicle Re-identification Pre-training via Masked AutoEncoders
Qi Wang · Zeyu Zhang · Dong Wang · Di Gai · Xin Xiong · Jiyang Xu · Ruihua Zhou
|
||
RapVerse: Coherent Vocals and Whole-Body Motion Generation from Text
Jiaben Chen · Xin Yan · Yihang Chen · Siyuan Cen · Zixin Wang · Qinwei Ma · Haoyu Zhen · Kaizhi Qian · Lie Lu · Chuang Gan
|
||
Multi-Object Sketch Animation by Scene Decomposition and Motion Planning
Jingyu Liu · Zijie Xin · Yuhan Fu · Ruixiang Zhao · Bangxiang Lan · Xirong Li
|
||
Robust Machine Unlearning for Quantized Neural Networks via Adaptive Gradient Reweighting with Similar Labels
Yujia Tong · Yuze Wang · Jingling Yuan · Chuang Hu
|
||
STD-GS: Exploring Frame-Event Interaction for SpatioTemporal-Disentangled Gaussian Splatting to Reconstruct High-Dynamic Scene
Hanyu Zhou · Haonan Wang · Haoyue Liu · Yuxing Duan · Luxin Yan · Gim Hee Lee
|
||
HiP-AD: Hierarchical and Multi-Granularity Planning with Deformable Attention for Autonomous Driving in a Single Decoder
YINGQI TANG · Zhuoran Xu · Zhaotie Meng · Erkang Cheng
|
||
Progressive Growing of Video Tokenizers for Temporally Compact Latent Spaces
Aniruddha Mahapatra · Long Mai · David Bourgin · Yitian Zhang · Feng Liu
|
||
LUT-Fuse: Towards Extremely Fast Infrared and Visible Image Fusion via Distillation to Learnable Look-Up Tables
Xunpeng Yi · yibing zhang · Xinyu Xiang · Qinglong Yan · Han Xu · Jiayi Ma
|
||
What's Making That Sound Right Now? Video-centric Audio-Visual Localization
hahyeon choi · Junhoo Lee · Nojun Kwak
|
||
Boosting Domain Generalized and Adaptive Detection with Diffusion Models: Fitness, Generalization, and Transferability
Boyong He · Yuxiang Ji · Zhuoyue Tan · Liaoni Wu
|
||
Does Your Vision-Language Model Get Lost in the Long Video Sampling Dilemma?
Tianyuan Qu · Longxiang Tang · Bohao PENG · Senqiao Yang · Bei Yu · Jiaya Jia
|
||
Where, What, Why: Towards Explainable Driver Attention Prediction
Yuchen Zhou · Jiayu Tang · Xiaoyan Xiao · Yueyao Lin · Linkai Liu · Zipeng Guo · Hao Fei · Xiaobo Xia · Chao Gou
|
||
Exploiting Domain Properties in Language-Driven Domain Generalization for Semantic Segmentation
Seogkyu Jeon · Kibeom Hong · Hyeran Byun
|
||
Multimodal Large Language Model-Guided ISP Hyperparameter Optimization with Dynamic Preference Learning
Xinyu Sun · Zhikun Zhao · congyan lang · Bing Li · Juan Wang
|
||
Edicho: Consistent Image Editing in the Wild
Qingyan Bai · Hao Ouyang · Yinghao Xu · Qiuyu Wang · Ceyuan Yang · Ka Leong Cheng · Yujun Shen · Qifeng Chen
|
||
Photolithography Overlay Map Generation with Implicit Knowledge Distillation Diffusion Transformer
YuanFu Yang · Hsiu-Hui Hsiao
|
||
VAGUE: Visual Contexts Clarify Ambiguous Expressions
Heejeong Nam · Jinwoo Ahn · Keummin Ka · Jiwan Chung · Youngjae Yu
|
||
DeGauss: Dynamic-Static Decomposition with Gaussian Splatting for Distractor-free 3D Reconstruction
Rui Wang · Quentin Lohmeyer · Mirko Meboldt · Siyu Tang
|
||
Ouroboros: Single-step Diffusion Models for Cycle-consistent Forward and Inverse Rendering
shanlin sun · Yifan Wang · Hanwen Zhang · Yifeng Xiong · Qin Ren · Ruogu Fang · Xiaohui Xie · Chenyu You
|
||
FPEM: Face Prior Enhanced Facial Attractiveness Prediction for Live Videos with Face Retouching
Hui Li · Xiaoyu Ren · Hongjiu Yu · Ying Chen · Kai Li · L Wang · Xiongkuo Min · Huiyu Duan · Guangtao Zhai · Xu Liu
|
||
PEFTDiff: Diffusion-Guided Transferability Estimation for Parameter-Efficient Fine-Tuning
PRAFFUL KHOBA · Zijian Wang · Chetan Arora · Mahsa Baktashmotlagh
|
||
MotionDiff: Training-free Zero-shot Interactive Motion Editing via Flow-assisted Multi-view Diffusion
Yikun Ma · Yiqing Li · Jiawei Wu · Xing Luo · Zhi Jin
|
||
TAD-E2E: A Large-scale End-to-end Autonomous Driving Dataset
Chang Liu · mingxuzhu mingxuzhu · Zheyuan Zhang · Linna Song · xiao zhao · Luo Qingliang · Qi Wang · Chufan Guo · Kuifeng Su
|
||
OccluGaussian: Occlusion-Aware Gaussian Splatting for Large Scene Reconstruction and Rendering
Shiyong Liu · Xiao Tang · Zhihao Li · Yingfan He · Chongjie Ye · Jianzhuang Liu · Binxiao Huang · Shunbo Zhou · Xiaofei Wu
|
||
Enrich and Detect: Video Temporal Grounding with Multimodal LLMs
Shraman Pramanick · Effrosyni Mavroudi · Yale Song · Rama Chellappa · Lorenzo Torresani · Triantafyllos Afouras
|
||
LA-MOTR: End-to-End Multi-Object Tracking by Learnable Association
Peng Wang · Yongcai Wang · Hualong Cao · Wang Chen · Deying Li
|
||
Unveiling the Invisible: Reasoning Complex Occlusions Amodally with AURA
Zhixuan Li · Hyunse Yoon · Sanghoon Lee · Weisi Lin
|
||
MVTrajecter: Multi-View Pedestrian Tracking with Trajectory Motion Cost and Trajectory Appearance Cost
Taiga Yamane · Ryo Masumura · Satoshi Suzuki · Shota Orihashi
|
||
Free$^2$Guide: Training-Free Text-to-Video Alignment using Image LVLM
Jaemin Kim · Bryan Sangwoo Kim · Jong Ye
|
||
Feature Extraction and Representation of Pre-training Point Cloud Based on Diffusion Models
Chang Qiu · Feipeng Da · Zilei Zhang
|
||
DeFSS: Image-to-Mask Denoising Learning for Few-shot Segmentation
Zishu Qin · Junhao Xu · Weifeng Ge
|
||
ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance
Chunwei Wang · Guansong Lu · Junwei Yang · Runhui Huang · Jianhua Han · Lu Hou · Wei Zhang · Hang Xu
|
||
Decoupled Diffusion Sparks Adaptive Scene Generation
Yunsong Zhou · Naisheng Ye · William Ljungbergh · Tianyu Li · Jiazhi Yang · Zetong Yang · Hongzi Zhu · Christoffer Petersson · Hongyang Li
|
||
Towards Cross-modal Backward-compatible Representation Learning for Vision-Language Models
Young Kyun Jang · Ser-Nam Lim
|
||
Expressive Talking Human from Single-Image with Imperfect Priors
Jun Xiang · Yudong Guo · Leipeng Hu · Boyang Guo · Yancheng Yuan · Juyong Zhang
|
||
Plug-in Feedback Self-adaptive Attention in CLIP for Training-free Open-Vocabulary Segmentation
Zhixiang Chi · Yanan Wu · Li Gu · Huan Liu · Ziqiang Wang · Yang Zhang · Yang Wang · Konstantinos Plataniotis
|
||
Global Regulation and Excitation via Attention Tuning for Stereo Matching
Jiahao LI · Xinhong Chen · Zhengmin JIANG · Qian Zhou · Yung-Hui Li · Jianping Wang
|
||
GCRayDiffusion: Pose-Free Surface Reconstruction via Geometric Consistent Ray Diffusion
Li-Heng Chen · Zi-Xin Zou · Chang Liu · Tianjiao Jing · Yan-Pei Cao · Shi-Sheng Huang · Hongbo Fu · Hua Huang
|
||
DuoCLR: Dual-Surrogate Contrastive Learning for Skeleton-based Human Action Segmentation
Haitao Tian
|
||
Correspondence-Free Fast and Robust Spherical Point Pattern Registration
Anik Sarker · Alan Asbeck
|
||
MagShield: Towards Better Robustness in Sparse Inertial Motion Capture Under Magnetic Disturbances
Yunzhe Shao · Xinyu Yi · Lu Yin · Shihui Guo · Jun-Hai Yong · Feng Xu
|
||
Color Matching Using Hypernetwork-Based Kolmogorov-Arnold Networks
Artem Nikonorov · Georgy Perevozchikov · Andrei Korepanov · Nancy Mehta · Mahmoud Afifi · Egor Ershov · Radu Timofte
|
||
Contact-Aware Refinement of Human Pose Pseudo-Ground Truth via Bioimpedance Sensing
Maria-Paola Forte · Nikos Athanasiou · Giulia Ballardini · Jan Bartels · Katherine Kuchenbecker · Michael Black
|
||
SA-LUT: Spatial Adaptive 4D Look-Up Table for Photorealistic Style Transfer
Zerui Gong · Zhonghua Wu · Qingyi Tao · Qinyue Li · Chen Change Loy
|
||
ReAL-AD: Towards Human-Like Reasoning in End-to-End Autonomous Driving
Yuhang Lu · Jiadong Tu · Yuexin Ma · Xinge Zhu
|
||
AdaDCP: Learning an Adapter with Discrete Cosine Prior for Clear-to-Adverse Domain Generalization
Qi Bi · Yixian Shen · Jingjun Yi · Gui-Song Xia
|
||
RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control
Teng Li · Guangcong Zheng · Rui Jiang · Shuigenzhan Shuigenzhan · Tao Wu · Yehao Lu · Yining Lin · Chuanyun Deng · Yepan Xiong · Min Chen · Lin Cheng · Xi Li
|
||
Verbalized Representation Learning for Interpretable Few-Shot Generalization
Cheng-Fu Yang · Da Yin · Wenbo Hu · Heng Ji · Nanyun Peng · Bolei Zhou · Kai-Wei Chang
|
||
TopoTTA: Topology-Enhanced Test-Time Adaptation for Tubular Structure Segmentation
Jiale Zhou · Wenhan Wang · Shikun Li · Xiaolei Qu · Xin Guo · Yizhong Liu · Wenzhong Tang · Xun Lin · Yefeng Zheng
|
||
ReassembleNet: Learnable Keypoints and Diffusion for 2D Fresco Reconstruction
ADEELA ISLAM · Stefano Fiorini · Stuart James · Pietro Morerio · ALESSIO DEL BUE
|
||
SA-Occ: Satellite-Assisted 3D Occupancy Prediction in Real World
Chen Chen · Zhirui Wang · Taowei Sheng · Yi Jiang · Yundu Li · Peirui Cheng · Luning Zhang · Kaiqiang Chen · Yanfeng Hu · Xue Yang · Xian Sun
|
||
A Unified Framework for Motion Reasoning and Generation in Human Interaction
Jeongeun Park · Sungjoon Choi · Sangdoo Yun
|
||
DiSCO-3D: Discovering and segmenting Sub-Concepts from Open-vocabulary queries in NeRF
Doriand Petit · Steve Bourgeois · Vincent Gay-Bellile · Florian Chabot · Loïc Barthe
|
||
Enhancing Numerical Prediction of MLLMs with Soft Labeling
Pei Wang · Zhaowei Cai · Hao Yang · Davide Modolo · Ashwin Swaminathan
|
||
SDFormer: Vision-based 3D Semantic Scene Completion via SAM-assisted Dual-channel Voxel Transformer
Yujie Xue · Huilong Pi · Jiapeng Zhang · Qin Yunchuan · Zhuo Tang · Kenli Li · Ruihui Li
|
||
Improving Large Vision and Language Models by Learning from a Panel of Peers
Jefferson Hernandez · Jing Shi · Simon Jenni · Vicente Ordonez · Kushal Kafle
|
||
Beyond RGB: Adaptive Parallel Processing for RAW Object Detection
Shani Gamrian · Hila Barel · Feiran Li · Masakazu Yoshimura · Daisuke Iso
|
||
GestureLSM: Latent Shortcut based Co-Speech Gesture Generation with Spatial-Temporal Modeling
Pinxin Liu · Luchuan Song · Junhua Huang · Haiyang Liu · Chenliang Xu
|
||
FreqPDE: Rethinking Positional Depth Embedding for Multi-View 3D Object Detection Transformers
Junjie Zhang · Haisheng Su · Feixiang Song · Sanping Zhou · Wei Wu · Junchi Yan · Nanning Zheng
|
||
Ponimator: Unfolding Interactive Pose for Versatile Human-human Interaction Animation
Shaowei Liu · chuan guo · Bing Zhou · Jian Wang
|
||
AnnofreeOD: Detecting All Classes at Low Frame Rates Without Human Annotations
Boyi Sun · Yuhang Liu · Houxin He · Yonglin Tian · Fei-Yue Wang
|
||
LightSwitch: Multi-view Relighting with Material-guided Diffusion
Yehonathan Litman · Fernando De la Torre · Shubham Tulsiani
|
||
mmCooper: A Multi-agent Multi-stage Communication-efficient and Collaboration-robust Cooperative Perception Framework
Bingyi Liu · Jian Teng · Hongfei Xue · Enshu Wang · Chuanhui Zhu · Pu Wang · Libing Wu
|
||
Not all Views are Created Equal: Analyzing Viewpoint Instabilities in Vision Foundation Models
Mateusz Michalkiewicz · Xinyue Bai · Mahsa Baktashmotlagh · Varun Jampani · Guha Balakrishnan
|
||
DC-TTA: Divide-and-Conquer Framework for Test-Time Adaptation of Interactive Segmentation
Jihun Kim · Hoyong Kwon · Hyeokjun Kweon · Wooseong Jeong · Kuk-Jin Yoon
|
||
Scaling Tumor Segmentation: Best Lessons from Real and Synthetic Data
Qi Chen · Xinze Zhou · Chen Liu · Hao Chen · Wenxuan Li · Zekun Jiang · Ziyan Huang · Yuxuan Zhao · Dexin Yu · Junjun He · Yefeng Zheng · Ling Shao · Alan Yuille · Zongwei Zhou
|
||
CoMoGaussian: Continuous Motion-Aware Gaussian Splatting from Motion-Blurred Images
Jungho Lee · DongHyeong Kim · Dogyoon Lee · Suhwan Cho · Minhyeok Lee · Wonjoon Lee · Taeoh Kim · Dongyoon Wee · Sangyoun Lee
|
||
CogNav: Cognitive Process Modeling for Object Goal Navigation with LLMs
Yihan Cao · Jiazhao Zhang · Zhinan Yu · Shuzhen Liu · Zheng Qin · Qin Zou · Bo Du · Kai Xu
|
||
CorrCLIP: Reconstructing Patch Correlations in CLIP for Open-Vocabulary Semantic Segmentation
Dengke Zhang · Fagui Liu · Quan Tang
|
||
Auto-Controlled Image Perception in MLLMs via Visual Perception Tokens
Runpeng Yu · Xinyin Ma · Xinchao Wang
|
||
MOVE: Motion-Guided Few-Shot Video Object Segmentation
Kaining Ying · Hengrui Hu · Henghui Ding
|
||
NeRF Is a Valuable Assistant for 3D Gaussian Splatting
Shuangkang Fang · I-Chao Shen · Takeo Igarashi · Yufeng Wang · ZeSheng Wang · Yi Yang · Wenrui Ding · Shuchang Zhou
|
||
EvolvingGrasp: Evolutionary Grasp Generation via Efficient Preference Alignment
Yufei Zhu · Yiming Zhong · Zemin Yang · Peishan Cong · Jingyi Yu · Xinge Zhu · Yuexin Ma
|
||
Noise-Modeled Diffusion Models for Low-Light Spike Image Restoration
Ruonan Liu · Lin Zhu · Xijie Xiang · Lizhi Wang · Hua Huang
|
||
Online Dense Point Tracking with Streaming Memory
Qiaole Dong · Yanwei Fu
|
||
TeethGenerator: A two-stage framework for paired pre- and post-orthodontic 3D dental data generation
Changsong Lei · Yaqian Liang · Shaofeng Wang · Jiajia Dai · Yong-Jin Liu
|
||
Allowing Oscillation Quantization: Overcoming Solution Space Limitation in Low Bit-Width Quantization
Weiying Xie · Zihan Meng · Jitao Ma · Wenjin Guo · Haowei Li · Haonan Qin · Leyuan Fang · Yunsong Li
|
||
TrackAny3D: Transferring Pretrained 3D Models for Category-unified 3D Point Cloud Tracking
Mengmeng Wang · Haonan Wang · Yulong Li · Xiangjie Kong · Jiaxin Du · Feng Xia · Guojiang Shen
|
||
Joint Self-Supervised Video Alignment and Action Segmentation
Ali Shah Ali · Syed Ahmed Mahmood · Mubin Saeed · Andrey Konin · Zeeshan Zia · Quoc-Huy Tran
|
||
Aligning Moments in Time using Video Queries
Yogesh Kumar · Uday Agarwal · Manish Gupta · Anand Mishra
|
||
TokenUnify: Scaling Up Autoregressive Pretraining for Neuron Segmentation
Yinda Chen · Haoyuan Shi · Xiaoyu Liu · Te Shi · Ruobing Zhang · Dong Liu · Zhiwei Xiong · Feng Wu
|
||
Counting Stacked Objects
Corentin Dumery · Noa Ette · Aoxiang Fan · Ren Li · Jingyi Xu · Hieu Le · Pascal Fua
|
||
EditCLIP: Representation Learning for Image Editing
Qian Wang · Aleksandar Cvejic · Abdelrahman Eldesokey · Peter Wonka
|
||
VSSD: Vision Mamba with Non-Causal State Space Duality
Yuheng Shi · Mingjia Li · Minjing Dong · Chang Xu
|
||
MDP$^3$: A Training-free Approach for List-wise Frame Selection in Video-LLMs
Hui Sun · Shiyin Lu · Huanyu Wang · Qing-Guo Chen · Zhao Xu · Weihua Luo · Kaifu Zhang · Ming Li
|
||
Shape of Motion: 4D Reconstruction from a Single Video
Qianqian Wang · Vickie Ye · Hang Gao · Weijia Zeng · Jake Austin · Zhengqi Li · Angjoo Kanazawa
|
||
Anomaly Detection of Integrated Circuits Package Substrates Using the Large Vision Model SAIC: Dataset Construction, Methodology, and Application
Ruiyun Yu · Bingyang Guo · Haoyuan Li
|
||
LINR-PCGC: Lossless Implicit Neural Representations for Point Cloud Geometry Coordinates
Wenjie Huang · Qi Yang · Shuting Xia · He Huang · Yiling Xu · Zhu Li
|
||
Guiding Noisy Condition Diffusion Models with Score-based Discriminator Correction
Dat Cong · Hieu Tran · Hoang Thanh-Tung
|
||
RALoc: Enhancing Outdoor LiDAR Localization via Rotation Awareness
Yuyang Yang · Wen Li · Sheng Ao · Qingshan Xu · Shangshu Yu · guo yu · Yin Zhou · Siqi Shen · Cheng Wang
|
||
VGGSounder: Audio-Visual Evaluations for Foundation Models
Daniil Zverev · Thaddäus Wiedemer · Ameya Prabhu · Matthias Bethge · Wieland Brendel · A. Koepke
|
||
Generate, Transduct, Adapt: Iterative Transduction with VLMs
Oindrila Saha · Logan Lawrence · Grant Horn · Subhransu Maji
|
||
Towards Video Turing Test: Video Comprehension and Reasoning Benchmark with Complex Visual Narratives
Yuanhan Zhang · Yunice Chew · Yuhao Dong · Aria Leo · Bo Hu · Ziwei Liu
|
||
Event-boosted Deformable 3D Gaussians for Dynamic Scene Reconstruction
Wenhao Xu · Wenming Weng · Yueyi Zhang · Ruikang Xu · Zhiwei Xiong
|
||
Robustifying Zero-Shot Vision Language Models by Subspaces Alignment
Junhao Dong · Piotr Koniusz · Liaoyuan Feng · Yifei Zhang · Hao Zhu · Weiming Liu · Xinghua Qu · YEW-SOON ONG
|
||
Acknowledging Focus Ambiguity in Visual Questions
Chongyan Chen · Yu-Yun Tseng · Zhuoheng Li · Anush Venkatesh · Danna Gurari
|
||
Flow4Agent: Long-form Video Understanding via Motion Prior from Optical Flow
Ruyang Liu · Shangkun Sun · Haoran Tang · Wei Gao · Ge Li
|
||
Rethinking Detecting Salient and Camouflaged Objects in Unconstrained Scenes
Zhangjun Zhou · YIPING LI · Chunlin Zhong · Jianuo Huang · Jialun Pei · Hua Li · He Tang
|
||
Think Twice: Test-Time Reasoning for Robust CLIP Zero-Shot Classification
Shenyu Lu · Zhaoying Pan · Xiaoqian Wang
|
||
Activation Subspaces for Out-of-Distribution Detection
Barış Zöngür · Robin Hesse · Stefan Roth
|
||
SHeaP: Self-supervised Head Geometry Predictor Learned via 2D Gaussians
Liam Schoneveld · Zhe Chen · Davide Davoli · Jiapeng Tang · Saimon Terazawa · Ko Nishino · Matthias Nießner
|
||
Neural Inverse Rendering for High-Accuracy 3D Measurement of Moving Objects with Fewer Phase-Shifting Patterns
Yuki Urakawa · Institute of Science Tokyo Yoshihiro
|
||
Adapt Foundational Segmentation Models with Heterogeneous Searching Space
Li Yi · Jie Hu · Songan Zhang · GUANNAN JIANG
|
||
PHD: Personalized 3D Human Body Fitting with Point Diffusion
Hsuan-I Ho · Chen Guo · Po-Chen Wu · Ivan Shugurov · Chengcheng Tang · Abhay Mittal · Sizhe An · Manuel Kaufmann · Linguang Zhang
|
||
ESCNet: Edge-Semantic Collaborative Network for Camouflaged Object Detection
Sheng Ye · Xin Chen · Yan Zhang · Xianming Lin · Liujuan Cao
|
||
Human-in-the-Loop Local Corrections of 3D Scene Layouts via Infilling
Chris Xie · Armen Avetisyan · Henry Howard-Jenkins · Yawar Siddiqui · Julian Straub · Richard Newcombe · Vasileios Balntas · Jakob Engel
|
||
CMB-ML: A Cosmic Microwave Background Dataset for the Oldest Possible Computer Vision Task
James Amato · Yunan Xie · Leonel Medina-Varela · Ammar Aljerwi · Adam McCutcheon · T. Rippentrop · Kristian Gonzalez · Jacques Delabrouille · Mustapha Ishak · Nicholas Ruozzi
|
||
Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization
Kesen Zhao · Beier Zhu · Qianru Sun · Hanwang Zhang
|
||
Scalable Dual Fingerprinting for Hierarchical Attribution of Text-to-Image Models
Jianwei Fei · Yunshu Dai · Peipeng Yu · Zhe Kong · Jiantao Zhou · Zhihua Xia
|
||
Arti-PG: A Toolbox for Procedurally Synthesizing Large-Scale and Diverse Articulated Objects with Rich Annotations
Jianhua Sun · Yuxuan Li · Jiude Wei · Longfei Xu · Wang Nange · Yining Zhang · Cewu Lu
|
||
Region-based Cluster Discrimination for Visual Representation Learning
Yin Xie · Kaicheng Yang · Xiang An · Kun Wu · Yongle Zhao · Weimo Deng · Zimin Ran · Yumeng Wang · Ziyong Feng · Roy Miles · Ismail Elezi · Jiankang Deng
|
||
Two Losses, One Goal: Aligning Conflict Gradients for Semi-supervised Semantic Segmentation
Rui Sun · Huayu Mai · Wangkai Li · Yujia Chen · Yuan Wang
|
||
FEVER-OOD: Free Energy Vulnerability Elimination for Robust Out-of-Distribution Detection
Brian Isaac-Medina · Mauricio Che · Yona Falinie A. Gaus · Samet Akcay · Toby Breckon
|
||
Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks
Jiawei Wang · Yushen Zuo · Yuanjun Chai · Zhendong Liu · Yicheng Fu · Yichun Feng · Kin Man Lam
|
||
CAPTURe: Evaluating Spatial Reasoning in Vision-Language Models through Counting Occluded Objects
Atin Pothiraj · Jaemin Cho · Elias Stengel-Eskin · Mohit Bansal
|
||
MotionStreamer: Streaming Motion Generation via Diffusion-based Autoregressive Model in Causal Latent Space
Lixing Xiao · Shunlin Lu · Huaijin Pi · Ke Fan · Liang Pan · Yueer Zhou · Ziyong Feng · Xiaowei Zhou · Sida Peng · Jingbo Wang
|
||
Unified Multimodal Understanding via Byte-Pair Visual Encoding
Wanpeng Zhang · Yicheng Feng · Hao Luo · Yijiang Li · Zihao Yue · Sipeng Zheng · Zongqing Lu
|
||
Inter Inertial Poser: Multi-Human Motion Tracking from Sparse Inertial Sensors and Pairwise Inter-Sensor Distances
Ying Xue · Jiaxi Jiang · Rayan Armani · Dominik Hollidt · Yi-Chi Liao · Christian Holz
|
||
SMSTracker: Tri-path Score Mask Sigma Fusion for Multi-Modal Tracking
Sixian Chan · Zedong Li · Xiaoqin Zhang · Wenhao Li · Shijian Lu · Chunhua Shen
|
||
Height-Fidelity Dense Global Fusion for Multi-modal 3D Object Detection
Hanshi Wang · Jin Gao · Weiming Hu · Zhipeng Zhang
|
||
Sequential keypoint density estimator: an overlooked baseline of skeleton-based video anomaly detection
Anja Delić · Matej Grcic · Siniša Šegvić
|
||
Diversity-Enhanced Distribution Alignment for Dataset Distillation
Hongcheng Li · Yucan Zhou · Xiaoyan Gu · Bo Li · Weiping Wang
|
||
OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting
Yongsheng Yu · Ziyun Zeng · Haitian Zheng · Jiebo Luo
|
||
COIN: Confidence Score-Guided Distillation for Annotation-Free Cell Segmentation
Sanghyun Jo · Seo Lee · Seungwoo Lee · Seohyung Hong · Hyungseok Seo · Kyungsu Kim
|
||
AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction
Junhao Cheng · Yuying Ge · Yixiao Ge · Jing Liao · Ying Shan
|
||
St4RTrack: Simultaneous 4D Reconstruction and Tracking in the World
Haiwen Feng · Junyi Zhang · Qianqian Wang · Yufei Ye · Pengcheng Yu · Michael Black · Trevor Darrell · Angjoo Kanazawa
|
||
Versatile Transition Generation with Image-to-Video Diffusion
Zuhao Yang · Jiahui Zhang · Yingchen Yu · Shijian Lu · Song Bai
|
||
Separation for Better Integration: Disentangling Edge and Motion in Event-based Deblurring
Yufei Zhu · Hao Chen · Yongjian Deng · Wei You
|
||
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree
Shuangrui Ding · Rui Qian · Xiaoyi Dong · Pan Zhang · Yuhang Zang · Yuhang Cao · Yuwei Guo · Dahua Lin · Jiaqi Wang
|
||
DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses
Yatian Pang · Bin Zhu · Bin Lin · Mingzhe Zheng · Francis Tay · Ser-Nam Lim · Harry Yang · Li Yuan
|
||
LeanVAE: An Ultra-Efficient Reconstruction VAE for Video Diffusion Models
Yu Cheng · Fajie Yuan
|
||
Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy
Zhi Hou · Tianyi Zhang · Yuwen Xiong · Haonan Duan · Hengjun Pu · Ronglei Tong · Chengyang Zhao · Xizhou Zhu · Yu Qiao · Jifeng Dai · Yuntao Chen
|
||
HFD-Teacher: High-Frequency Depth Distillation from Depth Foundation Models for Enhanced Depth Completion
Zhiyuan Yang · Anqi Cheng · Haiyue Zhu · Tianjiao Li · Pey Tao · Kezhi Mao
|
||
Cracking Instance Jigsaw Puzzles: A Superior Alternative to Multiple Instance Learning for Whole Slide Image Analysis
Xiwen Chen · Peijie Qiu · Wenhui Zhu · Hao Wang · Huayu Li · XUANZHAO DONG · Xiaotong Sun · Xiaobing Yu · Yalin Wang · Abolfazl Razi · Aris Sotiras
|
||
UnrealZoo: Enriching Photo-realistic Virtual Worlds for Embodied AI
Fangwei Zhong · Kui Wu · Churan Wang · Hao Chen · Hai Ci · Zhoujun Li · Yizhou Wang
|
||
Fine-grained Abnormality Prompt Learning for Zero-shot Anomaly Detection
Jiawen Zhu · YEW-SOON ONG · Chunhua Shen · Guansong Pang
|
||
Democratizing High-Fidelity Co-Speech Gesture Video Generation
Xu Yang · Shaoli Huang · Shenbo Xie · Xuelin Chen · Yifei Liu · Changxing Ding
|
||
MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling
Yingyue Li · Bencheng Liao · Wenyu Liu · Xinggang Wang
|
||
Backdoor Mitigation by Distance-Driven Detoxification
Shaokui Wei · Jiayin Liu · Hongyuan Zha
|
||
Local Scale Equivariance with Deep Equilibrium Canonicalizer in the Latent Space
Md Ashiqur Rahman · Chiao-An Yang · Michael Cheng · Lim Hao · Jeremiah Jiang · Teck-Yian Lim · Raymond Yeh
|
||
SciVid: Cross-Domain Evaluation of Video Models in Scientific Applications
Yana Hasson · Pauline Luc · Liliane Momeni · Maks Ovsjanikov · Guillaume Le Moing · Alina Kuznetsova · Ira Ktena · Jennifer J. Sun · Skanda Koppula · Dilara Gokay · Joseph Heyward · Etienne Pot · Andrew Zisserman
|
||
Hierarchy UGP: Hierarchy Unified Gaussian Primitive for Large-Scale Dynamic Scene Reconstruction
Hongyang Sun · Qinglin Yang · Jiawei Wang · Zhen Xu · Chen Liu · Yida Wang · Kun Zhan · Hujun Bao · Xiaowei Zhou · Sida Peng
|
||
EDiT: Efficient Diffusion Transformers with Linear Compressed Attention
Philipp Becker · Abhinav Mehrotra · Ruchika Chavhan · Malcolm Chadwick · Luca Morreale · Mehdi Noroozi · Alberto Gil Couto Pimentel Ramos · Sourav Bhattacharya
|
||
ScenePainter: Semantically Consistent Perpetual 3D Scene Generation with Concept Relation Alignment
Chong Xia · Shengjun Zhang · Fangfu Liu · Chang Liu · Khodchaphun Hirunyaratsameewong · Yueqi Duan
|
||
Temporal Rate Reduction Clustering for Human Motion Segmentation
Xianghan Meng · Zhengyu Tong · Zhiyuan Huang · Chun-Guang Li
|
||
Revisiting Efficient Semantic Segmentation: Learning Offsets for Better Spatial and Class Feature Alignment
Shi-Chen Zhang · Yunheng Li · Yu-Huan Wu · Qibin Hou · Ming-Ming Cheng
|
||
B-VLLM: A Vision Large Language Model with Balanced Spatio-Temporal Tokens
Zhuqiang Lu · Zhenfei Yin · Mengwei He · Zhihui Wang · Zicheng Liu · Zhiyong Wang · Kun Hu
|
||
Guiding Diffusion Models with Adaptive Negative Sampling Without External Resources
Alakh Desai · Nuno Vasconcelos
|
||
Skip-Vision: Efficient and Scalable Acceleration of Vision-Language Models via Adaptive Token Skipping
Weili Zeng · Ziyuan Huang · Kaixiang Ji · Yichao Yan
|
||
Spatio-Spectral Pattern Illumination for Direct and Indirect Separation from a Single Hyperspectral Image
Shin Ishihara · Imari Sato
|
||
MOSCATO: Predicting Multiple Object State Change Through Actions
Parnian Zameni · Yuhan Shen · Ehsan Elhamifar
|
||
Towards Accurate and Efficient 3D Object Detection for Autonomous Driving: A Mixture-of-Experts Computing System on Edge
Linshen Liu · Boyan Su · Junyue Jiang · Guanlin Wu · Cong Guo · Ceyu Xu · Hao Yang
|
||
A Visual Leap in CLIP Compositionality Reasoning through Generation of Counterfactual Sets
Zexi Jia · Chuanwei Huang · Yeshuang Zhu · Hongyan Fei · Ying Deng · Zhiqiang Yuan · Jiapei Zhang · Jinchao Zhang · Jie Zhou
|
||
ICE-Bench: A Unified and Comprehensive Benchmark for Image Creating and Editing
Yulin Pan · Xiangteng He · Chaojie Mao · Zhen Han · Zeyinzi Jiang · Jingfeng Zhang · Yu Liu
|
||
Learning on the Go: A Meta-learning Object Navigation Model
Xiaorong Qin · Xinhang Song · Sixian Zhang · Xinyao Yu · Xinmiao Zhang · Shuqiang Jiang
|
||
RareCLIP: Rarity-aware Online Zero-shot Industrial Anomaly Detection
Jianfang He · Min Cao · Silong Peng · Qiong Xie
|
||
DAViD: Data-efficient and Accurate Vision Models from Synthetic Data
Fatemeh Saleh · Sadegh Aliakbarian · Charlie Hewitt · Lohit Petikam · Xiao-Xian Xiao-Xian · Antonio Criminisi · Thomas J. Cashman · Tadas Baltrusaitis
|
||
SynFER: Towards Boosting Facial Expression Recognition with Synthetic Data
Xilin He · Cheng Luo · Xiaole Xian · Bing Li · Siyang Song · Muhammad Haris Khan · Weicheng Xie · Linlin Shen · Zongyuan Ge · Bernard Ghanem · Xiangyu Yue
|
||
Triad: Empowering LMM-based Anomaly Detection with Expert-guided Region-of-Interest Tokenizer and Manufacturing Process
Yuanze Li · YuanShihao YuanShihao · Haolin Wang · Qizhang Li · Ming Liu · Chen Xu · Guangming Shi · Wangmeng Zuo
|
||
TimeFormer: Capturing Temporal Relationships of Deformable 3D Gaussians for Robust Reconstruction
Dadong Jiang · Zhi Hou · Zhihui Ke · Xianghui Yang · Xiaobo Zhou · Tie Qiu
|
||
GMMamba: Group Masking Mamba for Whole Slide Image Classification
Tingting Zheng · Hongxun Yao · Kui Jiang · Yi Xiao · Sicheng Zhao
|
||
DexVLG: Dexterous Vision-Language-Grasp Model at Scale
Jiawei He · Danshi Li · Xinqiang Yu · Zekun Qi · Wenyao Zhang · Jiayi Chen · Zhaoxiang Zhang · Zhizheng Zhang · Li Yi · He Wang
|
||
Morph: A Motion-free Physics Optimization Framework for Human Motion Generation
Zhuo Li · Mingshuang Luo · RuiBing Hou · XIN ZHAO · Hao Liu · Hong Chang · Zimo Liu · Chen Li
|
||
ReMP-AD: Retrieval-enhanced Multi-modal Prompt Fusion for Few-Shot Industrial Visual Anomaly Detection
Hongchi Ma · Guanglei Yang · Debin Zhao · Yanli Ji · Wangmeng Zuo
|
||
Towards Physically Plausible Video Generation via VLM Planning
Xindi Yang · Baolu Li · Yiming Zhang · Zhenfei Yin · LEI BAI · Liqian Ma · Zhiyong Wang · Jianfei Cai · Tien-Tsin Wong · Huchuan Lu · Xu Jia
|
||
OMNI-DC: Highly Robust Depth Completion with Multiresolution Depth Integration
Yiming Zuo · Willow Yang · Zeyu Ma · Jia Deng
|
||
Scene Graph Guided Generation: Enable Accurate Relations Generation in Text-to-Image Models via Textural Rectification
Guibao SHEN · Luozhou Wang · Jiantao Lin · Wenhang Ge · CHAOZHE ZHANG · Xin Tao · Di ZHANG · Pengfei Wan · Guangyong Chen · Yijun Li · Ying-Cong Chen
|
||
CityNav: A Large-Scale Dataset for Real-World Aerial Navigation
Jungdae Lee · Taiki Miyanishi · Shuhei Kurita · Koya Sakamoto · Daichi Azuma · Yutaka Matsuo · Nakamasa Inoue
|
||
Learnable Retrieval Enhanced Visual-Text Alignment and Fusion for Radiology Report Generation
Qin Zhou · Guoyan Liang · Xindi Li · Jingyuan CHEN · Zhe Wang · Chang Yao · Sai Wu
|
||
Neighboring Autoregressive Modeling for Efficient Visual Generation
Yefei He · Yuanyu He · Shaoxuan He · Feng Chen · Hong Zhou · Kaipeng Zhang · Bohan Zhuang
|
||
Category-Specific Selective Feature Enhancement for Long-Tailed Multi-Label Image Classification
Ruiqi Du · Xu Tang · Xiangrong Zhang · Jingjing Ma
|
||
Omegance: A Single Parameter for Various Granularities in Diffusion-Based Synthesis
Xinyu Hou · Zongsheng Yue · Xiaoming Li · Chen Change Loy
|
||
LANGTRAJ: Diffusion Model and Dataset for Language-Conditioned Trajectory Simulation
WEI-JER Chang · Masayoshi Tomizuka · Wei Zhan · Manmohan Chandraker · Francesco Pittaluga
|
||
TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition
Xingsong Ye · Yongkun Du · Yunbo Tao · Zhineng Chen
|
||
Genflow3D: Generative scene flow estimation and prediction on point cloud sequences
Hanlin Li · Wenming Weng · Yueyi Zhang · Zhiwei Xiong
|
||
ViM-VQ: Efficient Post-Training Vector Quantization for Visual Mamba
Juncan Deng · Shuaiting Li · Zeyu Wang · Kedong Xu · Hong Gu · Kejie Huang
|
||
Dual-S3D: Hierarchical Dual-Path Selective SSM-CNN for High-Fidelity Implicit Reconstruction
Luoxi Zhang · Pragyan Shrestha · Yu Zhou · Chun Xie · Itaru Kitahara
|
||
SD$^2$Actor: Continuous State Decomposition via Diffusion Embeddings for Robotic Manipulation
lijiayi jiayi
|
||
BUFFER-X: Towards Zero-Shot Point Cloud Registration in Diverse Scenes
Minkyun Seo · Hyungtae Lim · Kanghee Lee · Luca Carlone · Jaesik Park
|
||
DocThinker: Explainable Multimodal Large Language Models with Rule-based Reinforcement Learning for Document Understanding
Wenwen Yu · Zhibo Yang · Yuliang Liu · Xiang Bai
|
||
Multi-Schema Proximity Network for Composed Image Retrieval
Jiangming Shi · Xiangbo Yin · yeyunchen yeyunchen · Yachao Zhang · zhizhong zhang · Yuan Xie · Yanyun Qu
|
||
MoMaps: Semantics-Aware Scene Motion Generation with Motion Maps
Jiahui Lei · Kyle Genova · George Kopanas · Noah Snavely · Leonidas Guibas
|
||
Boosting Multi-View Indoor 3D Object Detection via Adaptive 3D Volume Construction
Runmin Zhang · Zhu Yu · Si-Yuan Cao · Lingyu Zhu · Guangyi Zhang · Xiaokai Bai · Hui-liang Shen
|
||
TREAD: Token Routing for Efficient Architecture-agnostic Diffusion Training
Felix Krause · Timy Phan · Ming Gui · Stefan A. Baumann · Vincent Tao Hu · Björn Ommer
|
||
Fine-grained Spatiotemporal Grounding on Egocentric Videos
Shuo LIANG · Yiwu Zhong · Zi-Yuan Hu · Yeyao Tao · Liwei Wang
|
||
EmbodiedOcc: Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding
Yuqi Wu · Wenzhao Zheng · Sicheng Zuo · Yuanhui Huang · Jie Zhou · Jiwen Lu
|
||
Fusion Meets Diverse Conditions: A High-diversity Benchmark and Baseline for UAV-based Multimodal Object Detection with Condition Cues
Chen Chen · Kangcheng Bin · Hu Ting · Jiahao Qi · Xingyue Liu · Tianpeng Liu · Zhen Liu · Yongxiang Liu · Ping Zhong
|
||
Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration
Mark Endo · Xiaohan Wang · Serena Yeung-Levy
|
||
EC-Flow: Enabling Versatile Robotic Manipulation from Action-Unlabeled Videos via Embodiment-Centric Flow
Yixiang Chen · Peiyan Li · Yan Huang · Jiabing Yang · Kehan Chen · Liang Wang
|
||
Interpretable Zero-Shot Learning with Locally-Aligned Vision-Language Model
Shiming Chen · Bowen Duan · Salman Khan · Fahad Khan
|
||
Transformer-based Tooth Alignment Prediction with Occlusion and Collision Constraints
DongZhenXing DongZhenXing · Jiazhou Chen
|
||
ObjectRelator: Enabling Cross-View Object Relation Understanding Across Ego-Centric and Exo-Centric Perspectives
Yuqian Fu · Runze Wang · Bin Ren · Guolei Sun · Biao Gong · Yanwei Fu · Danda Pani Paudel · Xuanjing Huang · Luc Gool
|
||
Marigold-DC: Zero-Shot Monocular Depth Completion with Guided Diffusion
Massimiliano Viola · Kevin Qu · Nando Metzger · Bingxin Ke · Alexander Becker · Konrad Schindler · Anton Obukhov
|
||
DuoLoRA: Cycle-consistent and Rank-disentangled Content-Style Personalization
Aniket Roy · Shubhankar Borse · Shreya Kadambi · Debasmit Das · Shweta Mahajan · Risheek Garrepalli · Hyojin Park · Ankita Nayak · Rama Chellappa · Munawar Hayat · Fatih Porikli
|
||
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception
Sanjoy Chowdhury · Subrata Biswas · Sayan Nag · Tushar Nagarajan · Calvin Murdock · Ishwarya Ananthabhotla · Yijun Qian · Vamsi Ithapu · Dinesh Manocha · Ruohan Gao
|
||
Inference-Time Diffusion Model Distillation
Geon Yeong Park · Sang Wan Lee · Jong Ye
|
||
ZIM: Zero-Shot Image Matting for Anything
Beomyoung Kim · Chanyong Shin · Joonhyun Jeong · Hyungsik Jung · Seyun Lee · Sewhan Chun · Dong-Hyun HWANG · Joonsang Yu
|
||
GUIOdyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
Quanfeng Lu · Wenqi Shao · Zitao Liu · Lingxiao Du · Fanqing Meng · Boxuan Li · Botong Chen · Siyuan Huang · Kaipeng Zhang · Ping Luo
|
||
Deep Space Weather Model: Long-Range Solar Flare Prediction from Multi-Wavelength Images
Shunya Nagashima · Komei Sugiura
|
||
4DSegStreamer: Streaming 4D Panoptic Segmentation via Dual Threads
Ling Liu · Jun Tian · Li Yi
|
||
LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs
Haoran Lou · Chunxiao Fan · Ziyan Liu · Yuexin Wu · Xinliang Wang
|
||
PossLoss: A Reliable and Sensitive Facial Landmark Detection Loss Function
Qikui Zhu
|
||
CharaConsist: Fine-Grained Consistent Character Generation
Mengyu Wang · Henghui Ding · Jianing Peng · Yao Zhao · Yunpeng Chen · Yunchao Wei
|
||
TerraMind: Large-Scale Generative Multimodality for Earth Observation
Johannes Jakubik · Felix Yang · Benedikt Blumenstiel · Erik Scheurer · Rocco Sedona · Stefano Maurogiovanni · Valerio Marsocci · Nikolaos Dionelis · Jente Bosmans · Niklas Kopp · Rahul Ramachandran · Paolo Fraccaro · Thomas Brunschwiler · Gabriele Cavallaro · Juan Moreno · Nicolas Longépé
|
||
LiT: Delving into a Simple Linear Diffusion Transformer for Image Generation
Jiahao Wang · Ning Kang · Lewei Yao · Mengzhao Chen · Chengyue Wu · Songyang Zhang · Shuchen Xue · Yong Liu · Taiqiang Wu · Xihui Liu · Kaipeng Zhang · Shifeng Zhang · Wenqi Shao · Zhenguo Li · Ping Luo
|
||
M-Net: MRI Brain Tumor Sequential Segmentation Network via Mesh-Cast
Jiacheng Lu · Jiacheng Lu · Shiyu Zhang · Guoping Huo
|
||
GFPack++: Attention-Driven Gradient Fields for Optimizing 2D Irregular Packing
Tianyang Xue · Lin Lu · Yang Liu · Mingdong Wu · Hao Dong · Yanbin Zhang · Renmin Han · Baoquan Chen
|
||
ViCTr: Vital Consistency Transfer for Pathology Aware Image Synthesis
Onkar Susladkar · Gayatri Deshmukh · Yalcin Tur · Gorkem Durak · Ulas Bagci
|
||
Self-Supervised Monocular 4D Scene Reconstruction for Egocentric Videos
Chengbo Yuan · Geng Chen · Li Yi · Yang Gao
|
||
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D Capabilities
CHENMING ZHU · Tai Wang · Wenwei Zhang · Jiangmiao Pang · Xihui Liu
|
||
OuroMamba: A Data-Free Quantization Framework for Vision Mamba
Akshat Ramachandran · Mingyu Lee · Huan Xu · Souvik Kundu · Tushar Krishna
|
||
Visual Textualization for Image Prompted Object Detection
Yongjian Wu · Yang Zhou · Jiya Saiyin · Bingzheng Wei · Yan Xu
|
||
Tile-wise vs. Image-wise: Random-Tile Loss and Training Paradigm for Gaussian Splatting
Xiaoyu Zhang · Weihong Pan · Xiaojun Xiang · Hongjia Zhai · Liyang Zhou · Hanqing Jiang · Guofeng Zhang
|
||
PERSONA: Personalized Whole-Body 3D Avatar with Pose-Driven Deformations from a Single Image
Geonhee Sim · Gyeongsik Moon
|
||
FedXDS: Leveraging Model Attribution Methods to counteract Data Heterogeneity in Federated Learning
Maximilian Hoefler · Karsten Mueller · Wojciech Samek
|
||
Understanding Co-speech Gestures in-the-wild
Sindhu Hegde · K R Prajwal · Taein Kwon · Andrew Zisserman
|
||
Textured 3D Regenerative Morphing with 3D Diffusion Prior
Songlin Yang · Yushi LAN · Honghua Chen · Xingang Pan
|
||
Borrowing Eyes for the Blind Spot: Overcoming Data Scarcity in Malicious Video Detection via Cross-Domain Retrieval Augmentation
Rongpei Hong · Jian Lang · Ting Zhong · Fan Zhou
|
||
DreamRelation: Relation-Centric Video Customization
Yujie Wei · Shiwei Zhang · Hangjie Yuan · Biao Gong · Longxiang Tang · Xiang Wang · Haonan Qiu · Hengjia Li · Shuai Tan · Yingya Zhang · Hongming Shan
|
||
AnyPortal: Zero-Shot Consistent Video Background Replacement
Wenshuo Gao · Xicheng Lan · Shuai Yang
|
||
FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model
Yukang Cao · Chenyang Si · Jinghao Wang · Ziwei Liu
|
||
Layer-wise Vision Injection with Disentangled Attention for Efficient LVLMs
Xuange Zhang · Dengjie Li · Bo Liu · Zenghao Bao · Yao Zhou · Baisong Yang · liuzhongying liuzhongying · Yujie Zhong · Tongtong Yuan
|
||
NormalCrafter: Learning Temporally Consistent Video Normal from Video Diffusion Priors
Yanrui Bin · Wenbo Hu · Haoyuan Wang · Xinya Chen · Bing WANG
|
||
Combinative Matching for Geometric Shape Assembly
Nahyuk Lee · Juhong Min · Junhong Lee · Chunghyun Park · Minsu Cho
|
||
Frequency-Aligned Knowledge Distillation for Lightweight Spatiotemporal Forecasting
Yuqi Li · Chuanguang Yang · Hansheng Zeng · Zeyu Dong · Zhulin An · Yongjun Xu · Yingli Tian · Hao Wu
|
||
CAVIS: Context-Aware Video Instance Segmentation
Seunghun Lee · Jiwan Seo · Kiljoon Han · Minwoo Choi · Sunghoon Im
|
||
Early Timestep Zero-Shot Candidate Selection for Instruction-Guided Image Editing
Joowon Kim · Ziseok Lee · Donghyeon Cho · Sanghyun Jo · Yeonsung Jung · Kyungsu Kim · Eunho Yang
|
||
Benchmarking Burst Super-Resolution for Polarization Images: Noise Dataset and Analysis
Inseung Hwang · Kiseok Choi · Hyunho Ha · Min Kim
|
||
Equipping Vision Foundation Model with Mixture of Experts for Out-of-Distribution Detection
Shizhen Zhao · Jiahui Liu · Xin Wen · Haoru Tan · Xiaojuan Qi
|
||
ExCap3D: Expressive 3D Scene Understanding via Object Captioning with Varying Detail
Chandan Yeshwanth · David Rozenberszki · Angela Dai
|
||
Leveraging Spatial Invariance to Boost Adversarial Transferability
Zihan Zhou · LI LI · Yanli Ren · Chuan Qin · Guorui Feng
|
||
Reducing Unimodal Bias in Multi-Modal Semantic Segmentation with Multi-Scale Functional Entropy Regularization
Xu Zheng · Yuanhuiyi Lyu · Lutao Jiang · Danda Pani Paudel · Luc Gool · Xuming Hu
|
||
PLAN: Proactive Low-Rank Allocation for Continual Learning
XIEQUN WANG · Zhan Zhuang · Yu Zhang
|
||
From Easy to Hard: The MIR Benchmark for Progressive Interleaved Multi-Image Reasoning
Hang Du · Jiayang Zhang · Guoshun Nan · Wendi Deng · Zhenyan Chen · Chenyang Zhang · Wang Xiao · Shan Huang · Yuqi Pan · Tao Qi · Sicong Leng
|
||
Hybrid Layout Control for Diffusion Transformer: Fewer Annotations, Superior Aesthetics
Keming Wu · Junwen Chen · Zhanhao Liang · Yinuo Wang · Ji Li · Chao Zhang · Bin Wang · Yuhui Yuan
|
||
PolGS: Polarimetric Gaussian Splatting for Fast Reflective Surface Reconstruction
Yufei Han · Bowen Tie · Heng Guo · Youwei Lyu · Si Li · Boxin Shi · Yunpeng Jia · Zhanyu Ma
|
||
Personalized Federated Learning under Local Supervision
Qiqi Liu · Jiaqiang Li · Yuchen Liu · Yaochu Jin · Lingjuan Lyu · Xiaohu Wu · Han Yu
|
||
Scaling Inference-time Search with Vision Value Model for Improved Visual Comprehension
Xiyao Wang · Zhengyuan Yang · Linjie Li · Hongjin Lu · Yuancheng Xu · Chung-Ching Lin · Kevin Lin · Furong Huang · Lijuan Wang
|
||
MultiverSeg: Scalable Interactive Segmentation of Biomedical Imaging Datasets with In-Context Guidance
Hallee Wong · Jose Javier Gonzalez Ortiz · John Guttag · Adrian Dalca
|
||
Understanding Flatness in Generative Models: Its Role and Benefits
Taehwan Lee · Kyeongkook Seo · Jaejun Yoo · Sung Yoon Yoon
|
||
Spatio-Temporal Control for Masked Motion Synthesis
Ekkasit Pinyoanuntapong · Muhammad Saleem · Korrawe Karunratanakul · Pu Wang · Hongfei Xue · Chen Chen · chuan guo · Junli Cao · Jian Ren · Sergey Tulyakov
|
||
ModSkill: Physical Character Skill Modularization
Yiming Huang · Zhiyang Dou · Lingjie Liu
|
||
Beyond Single Images: Retrieval Self-Augmented Unsupervised Camouflaged Object Detection
Ji Du · Xin WANG · Fangwei Hao · Mingyang Yu · Chunyuan Chen · Jiesheng Wu · Bin Wang · Jing Xu · Ping Li
|
||
DATA: Domain-And-Time Alignment for High-Quality Feature Fusion in Collaborative Perception
Chengchang Tian · Jianwei Ma · Yan Huang · Zhanye Chen · Honghao Wei · Hui Zhang · Wei Hong
|
||
UniDxMD: Towards Unified Representation for Cross-Modal Unsupervised Domain Adaptation in 3D Semantic Segmentation
Zhengyin Liang · Hui Yin · Min Liang · Qianqian Du · Ying Yang · Hua Huang
|
||
Beyond Pixel Uncertainty: Bounding the OoD Objects in Road Scenes
Huachao Zhu · Zelong Liu · Zhichao Sun · Yuda Zou · Gui-Song Xia · Yongchao Xu
|
||
General Compression Framework for Efficient Transformer Object Tracking
Lingyi Hong · Jinglun Li · Xinyu Zhou · Shilin Yan · Pinxue Guo · Kaixun Jiang · Zhaoyu Chen · Shuyong Gao · Runze Li · Xingdong Sheng · Wei Zhang · Hong Lu · Wenqiang Zhang
|
||
MS3D: High-Quality 3D Generation via Multi-Scale Representation Modeling
Guan Luo · Jianfeng Zhang
|
||
MissRAG: Addressing the Missing Modality Challenge in Multimodal Large Language Models
Vittorio Pipoli · Alessia Saporita · Federico Bolelli · Marcella Cornia · Lorenzo Baraldi · Costantino Grana · Rita Cucchiara · Elisa Ficarra
|
||
NATRA: Noise-Agnostic Framework for Trajectory Prediction with Noisy Observations
Rongqing Li · Changsheng Li · Ruilin Lv · Yuhang Li · Yang Gao · Xiaolu Zhang · JUN ZHOU
|
||
AffordDexGrasp: Open-set Language-guided Dexterous Grasp with Generalizable-Instructive Affordance
Yilin Wei · Mu Lin · Yuhao Lin · Jian-Jian Jiang · Xiao-Ming Wu · Ling-An Zeng · Wei-Shi Zheng
|
||
Temporal Overlapping Prediction: A Self-supervised Pre-training Method for Moving Object Segmentation
Ziliang Miao · Runjian Chen · Yixi Cai · Buwei He · Wenquan Zhao · Wenqi Shao · Bo Zhang · Fu Zhang
|
||
MOSAIC: Generating Consistent, Privacy-Preserving Scenes from Multiple Depth Views in Multi-Room Environments
Zhixuan Liu · Haokun Zhu · Rui Chen · Jonathan Francis · Soonmin Hwang · Ji Zhang · Jean Oh
|
||
COME: Dual Structure-Semantic Learning with Collaborative MoE for Universal Lesion Detection Across Heterogeneous Ultrasound Datasets
Lingyu Chen · Yawen Zeng · Yue Wang · Peng Wan · Guo-chen Ning · Hongen Liao · Daoqiang Zhang · Fang Chen
|
||
Trust but Verify: Programmatic VLM Evaluation in the Wild
Viraj Prabhu · Senthil Purushwalkam · An Yan · Caiming Xiong · Ran Xu
|
||
Efficient Visual Place Recognition Through Multimodal Semantic Knowledge Integration
Sitao Zhang · Hongda Mao · Qingshuang Chen · Yelin Kim
|
||
Drawing Developmental Trajectory from Cortical Surface Reconstruction
WENXUAN WU · ruowen qu · Zhongliang Liu · Zhuoyan Dai · Dongzi Shi · Sijin Yu · Tong Xiong · Shiping Liu · Xiangmin Xu · Xiaofen Xing · Xin Zhang
|
||
Snakes and Ladders: Two Steps Up for VideoMamba
Hui Lu · Albert Ali Salah · Ronald Poppe
|
||
Dissecting Generalized Category Discovery: Multiplex Consensus under Self-Deconstruction
Luyao Tang · Kunze Huang · Yuxuan Yuan · Chenxin Li · Xiaotong Tu · Xinghao Ding · Chaoqi Chen · Yue Huang
|
||
ViLU: Learning Vision-Language Uncertainties for Failure Prediction
Marc Lafon · Yannis Karmim · Julio Silva-Rodríguez · Paul Couairon · Clément Rambour · Raphael Fournier-Sniehotta · Ismail Ayed · Jose Dolz · Nicolas THOME
|
||
Scaling Language-Free Visual Representation Learning
David Fan · Shengbang Tong · Jiachen Zhu · Koustuv Sinha · Zhuang Liu · Xinlei Chen · Michael Rabbat · Nicolas Ballas · Yann LeCun · Amir Bar · Saining Xie
|
||
Unsupervised Histopathological Image Semantic Segmentation with Overlapping Patches Consistency Constraint
Wentian Cai · Weizhao Weng · Zihao Huang · Yandan Chen · Siquan Huang · Ping Gao · Victor Leung · Ying Gao
|
||
MDD: A Dataset for Text-and-Music Conditioned Duet Dance Generation
Prerit Gupta · Jason Alexander Fotso-Puepi · Zhengyuan Li · Jay Mehta · Aniket Bera
|
||
DisTime: Distribution-based Time Tokenizer for Temporal Localization with Video Large Language Model
yingsen zeng · Zepeng Huang · Yujie Zhong · Chengjian Feng · Jie Hu · Lin Ma · Yang Liu
|
||
Augmenting Moment Retrieval: Zero-Dependency Two-Stage Learning
Zhengxuan Wei · Jiajin Tang · Sibei Yang
|
||
PosedVideo365 - A Diverse Dataset with Accurate Camera Pose
Karhan Kayan · Stamatis Alexandropoulos · Rishabh Jain · Yiming Zuo · Erich Liang · Jia Deng
|
||
Seeing Through Deepfakes: A Human-Inspired Framework for Multi-Face Detection
Juan Hu · Shaojing Fan · Terence Sim
|
||
CARP: Coarse-to-Fine Autoregressive Prediction for Visuomotor Policy Learning
Zhefei Gong · Pengxiang Ding · Shangke Lyu · Siteng Huang · Mingyang Sun · Wei Zhao · Zhaoxin Fan · Donglin Wang
|
||
Spatial Preference Rewarding for MLLMs Spatial Understanding
Han Qiu · Peng Gao · Lewei Lu · Xiaoqin Zhang · Ling Shao · Shijian Lu
|
||
Large Scale Video Continual Learning with Bootstrapped Compression
Shivani Mall · Joao F. Henriques
|
||
SpatialCrafter: Unleashing the Imagination of Video Diffusion Models for Scene Reconstruction from Limited Observations
Songchun Zhang · Huiyao Xu · Sitong Guo · Zhongwei Xie · Hujun Bao · Weiwei Xu · Changqing Zou
|
||
ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
Jianhong Bai · Menghan Xia · Xiao Fu · Xintao Wang · Lianrui Mu · Jinwen Cao · Zuozhu Liu · Haoji Hu · Xiang Bai · Pengfei Wan · Di ZHANG
|
||
TPG-INR: Target Prior-Guided Implicit 3D CT Reconstruction for Enhanced Sparse-view Imaging
QingleiCao QingleiCao · Ziyao Tang · Xiaoqin Tang
|
||
Constructing Ophthalmic MLLM for Positioning-diagnosis Collaboration Through Clinical Cognitive Chain Reasoning
Xinyao Liu · Diping Song
|
||
FlowSeek: Optical Flow Made Easier with Depth Foundation Models and Motion Bases
Matteo Poggi · Fabio Tosi
|
||
VISO: Accelerating In-orbit Object Detection with Language-Guided Mask Learning and Sparse Inference
Meiqi Wang · Han Qiu
|
||
DEPTHOR: Depth Enhancement from a Practical Light-Weight dToF Sensor and RGB Image
Jijun Xiang · Xuan Zhu · Xianqi Wang · Yu Wang · Hong Zhang · Fei Guo · Xin Yang
|
||
Street Gaussians without 3D Object Tracker
Ruida Zhang · Chengxi Li · Chenyangguang Zhang · Xingyu Liu · Haili Yuan · Yanyan Li · Xiangyang Ji · Gim Hee Lee
|
||
Event-aided Dense and Continuous Point Tracking: Everywhere and Anytime
Zhexiong Wan · Jianqin Luo · Yuchao Dai · Gim Hee Lee
|
||
Revisiting Image Fusion for Multi-Illuminant White-Balance Correction
David Serrano · Aditya Arora · Luis Herranz · Kosta Derpanis · Michael Brown · Javier Vazquez-Corral
|
||
Boosting Dynamic Prototyping via Dual-Knowledge Clustering for Semi-Supervised Lifelong Person Re-Identification
Kunlun Xu · Fan Zhuo · Jiangmeng Li · Xu Zou · Jiahuan Zhou
|
||
Aligning Information Capacity Between Vision and Language via Dense-to-Sparse Feature Distillation for Image-Text Matching
Yang Liu · Wentao Feng · Zhuoyao Liu · Shudong Huang · Jiancheng Lv
|
||
Depth Any Event Stream: Enhancing Event-based Monocular Depth Estimation via Dense-to-Sparse Distillation
Jinjing Zhu · Tianbo Pan · Zidong Cao · Yexin Liu · James Kwok · Hui Xiong
|
||
Mobile Video Diffusion
Haitam Ben Yahia · Denis Korzhenkov · Ioannis Lelekas · Amir Ghodrati · Amir Habibian
|
||
MaskSAM: Auto-prompt SAM with Mask Classification for Volumetric Medical Image Segmentation
Bin Xie · Hao Tang · Bin Duan · Dawen Cai · Yan Yan · Gady Agam
|
||
All Parts Matter: A Unified Mask-Free Virtual Try-On Framework
Chenghu Du · Shengwu Xiong · Yi Rong
|
||
ODDR: Outlier Detection & Dimension Reduction Based Defense Against Adversarial Patches
Nandish Chattopadhyay · Amira Guesmi · Muhammad Abdullah Hanif · Bassem ouni · Muhammad Shafique
|
||
ArgoTweak: Towards Self-Updating HD Maps through Structured Priors
Lena Wild · Rafael Valencia · Patric Jensfelt
|
||
UnMix-NeRF: Spectral Unmixing Meets Neural Radiance Fields
Fabian Perez · Sara Rojas Martinez · Carlos Hinojosa · Hoover Rueda-Chacón · Bernard Ghanem
|
||
IRGPT: Understanding Real-world Infrared Image with Bi-cross-modal Curriculum on Large-scale Benchmark
Zhe Cao · Jin Zhang · Ruiheng Zhang
|
||
Learning Robust Image Watermarking with Lossless Cover Recovery
jiale chen · Wei Wang · Chongyang Shi · Li Dong · Xiping Hu
|
||
OmniDepth: Bridging Monocular and Stereo Reasoning with Latent Alignment
Tongfan Guan · Jiaxin Guo · Chen Wang · Yun-Hui Liu
|
||
ResQ: A Novel Framework to Implement Residual Neural Networks on Analog Rydberg Atom Quantum Computers
Nicholas DiBrita · Jason Han · Tirthak Patel
|
||
Faster and Better 3D Splatting via Group Training
Chengbo Wang · Guozheng Ma · Yizhen Lao · Yifei Xue
|
||
Hierarchical Cross-modal Prompt Learning for Vision-Language Models
Hao Zheng · Shunzhi Yang · Zhuoxin He · Jinfeng Yang · Zhenhua Huang
|
||
GaussianReg: Rapid 2D/3D Registration for Emergency Surgery via Explicit 3D Modeling with Gaussian Primitives
Weihao Yu · Xiaoqing Guo · Xinyu Liu · Yifan Liu · Hao Zheng · Yawen Huang · Yixuan Yuan
|
||
Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency
Shiji Zhao · Ranjie Duan · Fengxiang Wang · Chi Chen · Caixin KANG · Shouwei Ruan · Jialing Tao · YueFeng Chen · Hui Xue · Xingxing Wei
|
||
Spatial-Temporal Aware Visuomotor Diffusion Policy Learning
Zhenyang Liu · Yikai Wang · Kuanning Wang · Longfei Liang · Xiangyang Xue · Yanwei Fu
|
||
A Hyperdimensional One Place Signature to Represent Them All: Stackable Descriptors For Visual Place Recognition
Connor Malone · Somayeh Hussaini · Tobias Fischer · Michael Milford
|
||
Domain-aware Category-level Geometry Learning Segmentation for 3D Point Clouds
Pei He · Lingling Li · Licheng Jiao · Ronghua Shang · Fang Liu · Shuang Wang · Xu Liu · wenping ma
|
||
MeshMamba: State Space Models for articulated 3D mesh generation and reconstruction
Yusuke Yoshiyasu · Leyuan Sun · Ryusuke Sagawa
|
||
GausSim: Foreseeing Reality by Gaussian Simulator for Elastic Objects
Yidi Shao · Mu Huang · Chen Change Loy · Bo Dai
|
||
Hate in Plain Sight: On the Risks of Moderating AI-Generated Hateful Illusions
Yiting Qu · Ziqing Yang · Yihan Ma · Michael Backes · Savvas Zannettou · Yang Zhang
|
||
TARS: Traffic-Aware Radar Scene Flow Estimation
Jialong Wu · Marco Braun · Dominic Spata · Matthias Rottmann
|
||
M-SpecGene: Generalized Foundation Model for RGBT Multispectral Vision
Kailai Zhou · Fuqiang Yang · Shixian Wang · Bihan Wen · Chongde Zi · Linsen Chen · Qiu Shen · Xun Cao
|
||
Learning Interpretable Queries for Explainable Image Classification with Information Pursuit
Stefan Kolek · Aditya Chattopadhyay · Kwan Ho Ryan Chan · Hector Andrade Loarca · Gitta Kutyniok · Rene Vidal
|
||
FaceCraft4D: Animated 3D Facial Avatar Generation from a Single Image
Fei Yin · Mallikarjun Reddy · Chun-Han Yao · Rafal Mantiuk · Varun Jampani
|
||
ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models
Zifu Wan · Ce Zhang · Silong Yong · Martin Ma · Simon Stepputtis · Louis-Philippe Morency · Deva Ramanan · Katia Sycara · Yaqi Xie
|
||
Learnable Feature Patches and Vectors for Boosting Low-light Image Enhancement without External Knowledge
Xiaogang Xu · Jiafei Wu · Qingsen Yan · Jiequan Cui · Richang Hong · Bei Yu
|
||
Steering Guidance for Personalized Text-to-Image Diffusion Models
Sunghyun Park · Seokeon Choi · Hyoungwoo Park · Sungrack Yun
|
||
SummDiff: Generative Modeling of Video Summarization with Diffusion
Kwanseok Kim · Jaehoon Hahm · Sumin Kim · Jinhwan Sul · Byung-Hak Kim · Joonseok Lee
|
||
Decoding Correlation-Induced Misalignment in the Stable Diffusion Workflow for Text-to-Image Generation
Yunze Tong · Fengda Zhang · Didi Zhu · Jun Xiao · Kun Kuang
|
||
Degradation-Modeled Multipath Diffusion for Tunable Metalens Photography
Jianing Zhang · Jiayi Zhu · Feiyu Ji · Xiaokang Yang · Xiaoyun Yuan
|
||
GS-Occ3D: Scaling Vision-only Occupancy Reconstruction for Autonomous Driving with Gaussian Splatting
Baijun Ye · Minghui Qin · Saining Zhang · Moonjun Gong · Shaoting Zhu · Hao Zhao · Hang Zhao
|
||
Advancing Text-to-3D Generation with Linearized Lookahead Variational Score Distillation
Yu Lei · Bingde Liu · Qingsong Xie · Haonan Lu · Zhijie Deng
|
||
MonoSOWA: Scalable monocular 3D Object detector Without human Annotations
Jan Skvrna · Lukas Neumann
|
||
S⁴M: Boosting Semi-Supervised Instance Segmentation with Segment Anything Model
Heeji Yoon · Heeseong Shin · Eunbeen Hong · Hyunwook Choi · Hansang Cho · Daun Jeong · Seungryong Kim
|
||
NeuraLeaf: Neural Parametric Leaf Models with Shape and Deformation Disentanglement
Yang Yang · Mao Dongni · Hiroaki Santo · Yasuyuki Matsushita · Fumio Okura
|
||
SAM Encoder Breach by Adversarial Simplicial Complex Triggers Downstream Model Failures
Yi Qin · Rui Wang · Tao Huang · Tong Xiao · Liping Jing
|
||
Loss Functions for Predictor-based Neural Architecture Search
Han Ji · Yuqi Feng · Jiahao Fan · Yanan Sun
|
||
TrustMark: Robust Watermarking and Watermark Removal for Arbitrary Resolution Images
Tu Bui · Shruti Agarwal · John Collomosse
|
||
FastVAR: Linear Visual Autoregressive Modeling via Cached Token Pruning
Hang Guo · Yawei Li · Taolin Zhang · Jiangshan Wang · Tao Dai · Shu-Tao Xia · Luca Benini
|
||
SeHDR: Single-Exposure HDR Scene Reconstruction via 3D Gaussian Bracketing
Yiyu Li · Haoyuan Wang · Ke Xu · Gerhard Hancke · Rynson W.H. Lau
|
||
SILO: Solving Inverse Problems with Latent Operators
Ron Raphaeli · Sean Man · Michael Elad
|
||
Disentangling Instance and Scene Contexts for 3D Semantic Scene Completion
Enyu Liu · En Yu · Sijia Chen · Wenbing Tao
|
||
HiERO: understanding the hierarchy of human behavior enhances reasoning on egocentric videos
Simone Peirone · Francesca Pistilli · Giuseppe Averta
|
||
Efficient 3D Gaussian Splatting with Compressed Model Training
Sankeerth Durvasula · Sharanshangar Muhunthan · Zain Moustafa · Richard Chen · Ruofan Liang · Yushi Guan · Nilesh Ahuja · Nilesh Jain · Selvakumar Panneer · Nandita Vijaykumar
|
||
SliderSpace: Decomposing the Visual Capabilities of Diffusion Models
Rohit Gandikota · Zongze Wu · Richard Zhang · David Bau · Eli Shechtman · Nicholas Kolkin
|
||
DeSPITE: Exploring Contrastive Deep Skeleton-Pointcloud-IMU-Text Embeddings for Advanced Point Cloud Human Activity Understanding
Thomas Kreutz · Max Mühlhäuser · Alejandro Sanchez Guinea
|
||
3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark
Wufei Ma · Haoyu Chen · Guofeng Zhang · Yu-Cheng Chou · Celso de Melo · Alan Yuille · Jieneng Chen
|
||
Dual-Process Image Generation
Grace Luo · Jonathan Granskog · Aleksander Holynski · Trevor Darrell
|
||
Semantic Watermarking Reinvented: Enhancing Robustness and Generation Quality with Fourier Integrity
Sung Ju Lee · Nam Ik Cho
|
||
ZeroStereo: Zero-shot Stereo Matching from Single Images
Xianqi Wang · Hao Yang · Gangwei Xu · Junda Cheng · Min Lin · Yong Deng · Jinliang Zang · Yurui Chen · Xin Yang
|
||
PINO: Person-Interaction Noise Optimization for Long-Duration and Customizable Motion Generation of Arbitrary-Sized Groups
Sakuya Ota · Qing Yu · Kent Fujiwara · Satoshi Ikehata · Ikuro Sato
|
||
Scene Coordinate Reconstruction Priors
Wenjing Bian · Axel Barroso-Laguna · Tommaso Cavallari · Victor Prisacariu · Eric Brachmann
|
||
ReCoT: Reflective Self-Correction Training for Mitigating Confirmation Bias in Large Vision-Language Models
Mengxue Qu · Yibo Hu · Kunyang Han · Yunchao Wei · Yao Zhao
|
||
UNIS: A Unified Framework for Achieving Unbiased Neural Implicit Surfaces in Volume Rendering
Junkai Deng · Hanting Niu · Jiaze Li · Fei Hou · Ying He
|
||
RIOcc: Efficient Cross-Modal Fusion Transformer with Collaborative Feature Refinement for 3D Semantic Occupancy Prediction
Baojie Fan · Xiaotian Li · Yuhan Zhou · Yuyu Jiang · Jiandong Tian · Huijie Fan
|
||
Continual Adaptation: Environment-Conditional Parameter Generation for Object Detection in Dynamic Scenarios
Deng Li · Aming WU · Yang Li · Yaowei Wang · Yahong Han
|
||
FROSS: Faster-Than-Real-Time Online 3D Semantic Scene Graph Generation from RGB-D Images
Hao-Yu Hou · Chun-Yi Lee · Motoharu Sonogashira · Yasutomo Kawanishi
|
||
Egocentric Action-aware Inertial Localization in Point Clouds
Mingfang Zhang · Ryo Yonetani · Yifei Huang · Liangyang Ouyang · Ruicong Liu · Yoichi Sato
|
||
G-DexGrasp: Generalizable Dexterous Grasping Synthesis Via Part-Aware Prior Retrieval and Prior-Assisted Generation
Juntao Jian · Xiuping Liu · Zixuanchen Zixuanchen · Manyi Li · Jian Liu · Ruizhen Hu
|
||
INTER: Mitigating Hallucination in Large Vision-Language Models by Interaction Guidance Sampling
Xin Dong · Shichao Dong · Jin Wang · Jing Huang · Li Zhou · Zenghui Sun · Lihua Jing · Jinsong Lan · Xiaoyong Zhu · Bo Zheng
|
||
Active Perception Meets Rule-Guided RL: A Two-Phase Approach for Precise Object Navigation in Complex Environments
Liang Qin · Min Wang · Peiwei Li · Wengang Zhou · Houqiang Li
|
||
Translation of Text Embedding via Delta Vector to Suppress Strongly Entangled Content in Text-to-Image Diffusion Models
Eunseo Koh · SeungHoo Hong · Tae-Young Kim · Jae-Pil Heo · Simon Woo
|
||
SparseVILA: Query-Aware Visual Sparsity Should Happen at Decoding
Samir Khaki · Junxian Guo · Jiaming Tang · Shang Yang · Yukang Chen · Konstantinos Plataniotis · Yao Lu · Song Han · Zhijian Liu
|
||
HPSv3: Towards Full-Spectrum Human Preference Score
Yuhang Ma · Keqiang Sun · Xiaoshi Wu · Hongsheng Li
|
||
MonSTeR: a Unified Model for Motion, Scene, Text Retrieval
Luca Collorone · Matteo Gioia · Massimiliano Pappa · Paolo Leoni · Giovanni Ficarra · Or Litany · Indro Spinelli · Fabio Galasso
|
||
Intermediate Connectors and Geometric Priors for Language-Guided Affordance Segmentation on Unseen Object Categories
Yicong Li · Yiyang Chen · Zhenyuan Ma · Junbin Xiao · Xiang Wang · Angela Yao
|
||
VistaDream: Sampling multiview consistent images for single-view scene reconstruction
Haiping Wang · Yuan Liu · Ziwei Liu · Wenping Wang · Zhen Dong · Bisheng Yang
|
||
V2XPnP: Vehicle-to-Everything Spatio-Temporal Fusion for Multi-Agent Perception and Prediction
Zewei Zhou · Hao Xiang · Zhaoliang Zheng · Zhihao Zhao · Mingyue Lei · Yun Zhang · Tianhui Cai · Xinyi Liu · Johnson Liu · Maheswari Bajji · Xin Xia · Zhiyu Huang · Bolei Zhou · Jiaqi Ma
|
||
Adversarial Robustness of Discriminative Self-Supervised Learning in Vision
Ömer Veysel Çağatan · Ömer TAL · M. Emre Gursoy
|
||
Light-A-Video: Training-free Video Relighting via Progressive Light Fusion
Yujie Zhou · Jiazi Bu · Pengyang Ling · Pan Zhang · Tong Wu · Qidong Huang · Jinsong Li · Xiaoyi Dong · Yuhang Zang · Yuhang Cao · Anyi Rao · Jiaqi Wang · Li Niu
|
||
DCHM: Depth-Consistency Human Modeling for Multiview Detection
Jiahao Ma · Tianyu Wang · Miaomiao Liu · David Ahmedt Aristizabal · Chuong Nguyen
|
||
CoTracker3: Simpler and Better Point Tracking by Pseudo-Labelling Real Videos
Nikita Karaev · Iurii Makarov · Jianyuan Wang · Natalia Neverova · Andrea Vedaldi · Christian Rupprecht
|
||
Curve-Aware Gaussian Splatting for 3D Parametric Curve Reconstruction
Zhirui Gao · Renjiao Yi · YaQiao Dai · Xuening Zhu · Wei Chen · Kai Xu · Chenyang Zhu
|
||
Kestrel: 3D Multimodal LLM for Part-Aware Grounded Description
Mahmoud Ahmed · Junjie Fei · Jian Ding · Eslam BAKR · Mohamed Elhoseiny
|
||
SyncDiff: Synchronized Motion Diffusion for Multi-Body Human-Object Interaction Synthesis
Wenkun He · Yun Liu · Ruitao Liu · Li Yi
|
||
TimeExpert: An Expert-Guided Video LLM for Video Temporal Grounding
Zuhao Yang · Yingchen Yu · Yunqing Zhao · Shijian Lu · Song Bai
|
||
SpiLiFormer: Enhancing Spiking Transformers with Lateral Inhibition
Zeqi Zheng · Yanchen Huang · Yingchao Yu · Zizheng Zhu · Junfeng Tang · Zhaofei Yu · Yaochu Jin
|
||
RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model on Referring Expressions
Bimsara Pathiraja · Maitreya Patel · Shivam Singh · Yezhou Yang · Chitta Baral
|
||
More Reliable Pseudo-labels, Better Performance: A Generalized Approach to Single Positive Multi-label Learning
Luong Tran · Thieu Vo · Anh Nguyen · Sang Dinh · Van Nguyen
|
||
Discovering Divergent Representations between Text-to-Image Models
Lisa Dunlap · Trevor Darrell · Joseph Gonzalez · Fabian Caba Heilbron · Josef Sivic · Bryan Russell
|
||
Learned Image Compression with Hierarchical Progressive Context Modeling
Yuqi Li · Haotian Zhang · Li Li · Dong Liu
|
||
PLMP - Point-Line Minimal Problems for Projective SfM
Kim Kiehn · Albin Ahlbäck · Kathlén Kohn
|
||
Latent Swap Joint Diffusion for 2D Long-Form Latent Generation
Yusheng Dai · Chenxi Wang · Chang Li · Chen Wang · Kewei Li · Jun Du · Lei Sun · Jianqing Gao · Ruoyu Wang · Jiefeng Ma
|
||
Toward Long-Tailed Online Anomaly Detection through Class-Agnostic Concepts
Chiao-An Yang · Kuan-Chuan Peng · Raymond Yeh
|
||
OmniDiff: A Comprehensive Benchmark for Fine-grained Image Difference Captioning
Yuan Liu · Saihui Hou · Saijie Hou · Jiabao Du · Shibei Meng · Yongzhen Huang
|
||
DiffuMatch: Category-Agnostic Spectral Diffusion Priors for Robust Non-rigid Shape Matching
Emery Pierson · Lei Li · Angela Dai · Maks Ovsjanikov
|
||
Blind Video Super-Resolution based on Implicit Kernels
Qiang Zhu · Yuxuan Jiang · Shuyuan Zhu · Fan Zhang · David Bull · Bing Zeng
|
||
MMAIF: Multi-task and Multi-degradation All-in-One for Image Fusion with Language Guidance
Zihan Cao · Yu Zhong · Ziqi Wang · Liang-Jian Deng
|
||
SEREP: Semantic Facial Expression Representation for Robust In-the-Wild Capture and Retargeting
Arthur Josi · Luiz Gustavo Hafemann · Abdallah Dib · Emeline Got · Rafael M. O. Cruz · Marc-André Carbonneau
|
||
UniversalBooth: Model-Agnostic Personalized Text-to-Image Generation
Songhua Liu · Ruonan Yu · Xinchao Wang
|
||
ProbRes: Probabilistic Jump Diffusion for Open-World Egocentric Activity Recognition
Sanjoy Kundu · Shanmukha Vellamcheti · Sathyanarayanan Aakur
|
||
MaskHand: Generative Masked Modeling for Robust Hand Mesh Reconstruction in the Wild
Muhammad Saleem · Ekkasit Pinyoanuntapong · Mayur Patel · Hongfei Xue · Ahmed Helmy · Srijan Das · Pu Wang
|
||
Membership Inference Attacks with False Discovery Rate Control
Chenxu Zhao · Wei Qian · Aobo Chen · Mengdi Huai
|
||
Supervised Exploratory Learning for Long-Tailed Visual Recognition
Zhongquan Jian · Yanhao Chen · Wangyancheng Wangyancheng · Junfeng Yao · Meihong Wang · Qingqiang Wu
|
||
An OpenMind for 3D medical vision self-supervised learning
Tassilo Wald · Constantin Ulrich · Jonathan Suprijadi · Sebastian Ziegler · Michal Nohel · Robin Peretzke · Gregor Koehler · Klaus Maier-Hein
|
||
Benchmarking Egocentric Visual-Inertial SLAM at City Scale
Anusha Krishnan · Shaohui Liu · Paul-Edouard Sarlin · Oscar Gentilhomme · David Caruso · Maurizio Monge · Richard Newcombe · Jakob Engel · Marc Pollefeys
|
||
CasP: Improving Semi-Dense Feature Matching Pipeline Leveraging Cascaded Correspondence Priors for Guidance
Peiqi Chen · Lei Yu · Yi Wan · Yingying Pei · Xinyi Liu · Yongxiang Yao · Yingying Zhang · Lixiang Ru · Liheng Zhong · Jingdong Chen · Ming Yang · Yongjun Zhang
|
||
SV4D 2.0: Enhancing Spatio-Temporal Consistency in Multi-View Video Diffusion for High-Quality 4D Generation
Chun-Han Yao · Yiming Xie · Vikram Voleti · Huaizu Jiang · Varun Jampani
|
||
DAA$^\ast$: Deep Angular A Star For Image-Based Path Planning
Zhiwei Xu
|
||
Performing Defocus Deblurring by Modeling its Formation Process
Zhengbo Zhang · Lin Geng Foo · Hossein Rahmani · Jun Liu · De Wen Soh
|
||
Variance-Based Pruning for Accelerating and Compressing Trained Networks
Uranik Berisha · Jens Mehnert · Alexandru Condurache
|
||
PCR-GS: COLMAP-Free 3D Gaussian Splatting via Pose Co-Regularizations
YU WEI · Jiahui Zhang · Xiaoqin Zhang · Ling Shao · Shijian Lu
|
||
MonoMobility: Zero-Shot 3D Mobility Analysis from Monocular Videos
Hongyi Zhou · Xiaogang Wang · Yulan Guo · Kai Xu
|
||
Seeing the Unseen: A Semantic Alignment and Context-Aware Prompt Framework for Open-Vocabulary Camouflaged Object Segmentation
Peng Ren · Tian Bai · Jing Sun · Fuming Sun
|
||
Dual Recursive Feedback on Generation and Appearance Latents for Pose-Robust Text-to-Image Diffusion
Jiwon Kim · Pureum Kim · SeonHwa Kim · Soobin Park · Eunju Cha · Kyong Hwan Jin
|
||
RESCUE: cRowd Evacuation Simulation via Controlling SDM-United charactErs
Xiaolin Liu · Tianyi zhou · Hongbo Kang · Jian Ma · Ziwen Wang · Jing Huang · Wenguo Weng · Yu-Kun Lai · Kun Li
|
||
HouseCrafter: Lifting Floorplans to 3D Scenes with 2D Diffusion Models
YIWEN CHEN · Hieu Nguyen · Vikram Voleti · Varun Jampani · Huaizu Jiang
|
||
A Framework for Double-Blind Federated Adaptation of Foundation Models
Nurbek Tastan · Karthik Nandakumar
|
||
ReME: A Data-Centric Framework for Training-Free Open-Vocabulary Segmentation
Xiwei Xuan · Ziquan Deng · Kwan-Liu Ma
|
||
DOGE: Towards Versatile Visual Document Grounding and Referring
Yinan Zhou · Yuxin Chen · Haokun Lin · Yichen Wu · Shuyu Yang · Zhongang Qi · Chen Ma · Li Zhu
|
||
RobAVA: A Large-scale Dataset and Baseline Towards Video based Robotic Arm Action Understanding
Baoli Sun · Ning Wang · Xinzhu Ma · Anqi Zou · Lu Yihang · Chuixuan Fan · Zhihui Wang · Kun Lu · Zhiyong Wang
|
||
Debiasing Trace Guidance: Top-down Trace Distillation and Bottom-up Velocity Alignment for Unsupervised Anomaly Detection
Xingjian Wang · Li Chai · Jiming Chen
|
||
JPEG Processing Neural Operator for Backward-Compatible Coding
Woo Kyoung Han · Yongjun Lee · Byeonghun Lee · Sang Hyun Park · Sunghoon Im · Kyong Hwan Jin
|
||
3D Mesh Editing using Masked LRMs
Will Gao · Dilin Wang · Yuchen Fan · Aljaz Bozic · Tuur Stuyck · Zhengqin Li · Zhao Dong · Rakesh Ranjan · Nikolaos Sarafianos
|
||
Streaming VideoLLMs for Real-Time Procedural Video Understanding
Dibyadip Chatterjee · Edoardo Remelli · Yale Song · Bugra Tekin · Abhay Mittal · Bharat Bhatnagar · Necati Cihan Camgoz · Shreyas Hampali · Eric Sauser · Shugao Ma · Angela Yao · Fadime Sener
|
||
monoVLN: Bridging the Observation Gap between Monocular and Panoramic Vision and Language Navigation
Ren-Jie Lu · Yu Zhou · hao cheng · Jingke Meng · Wei-Shi Zheng
|
||
GeoExplorer: Active Geo-localization with Curiosity-Driven Exploration
Li Mi · Manon Béchaz · Zeming Chen · Antoine Bosselut · Devis Tuia
|
||
FinMMR: Make Financial Numerical Reasoning More Multimodal, Comprehensive, and Challenging
Zichen Tang · Haihong E · Jiacheng Liu · Zhongjun Yang · Rongjin Li · Zihua Rong · Haoyang He · Zhuodi Hao · Xinyang Hu · Kun Ji · Ziyan Ma · Mengyuan Ji · Jun Zhang · Chenghao Ma · Qianhe Zheng · Yang Liu · Yiling Huang · Xinyi Hu · Qing Huang · Zijian Xie · Shiyao Peng
|
||
FedDifRC: Unlocking the Potential of Text-to-Image Diffusion Models in Heterogeneous Federated Learning
Huan Wang · Haoran Li · Huaming Chen · Jun Yan · Jiahua Shi · Jun Shen
|
||
VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks
shiduo zhang · Zhe Xu · Peiju Liu · Xiaopeng Yu · Qinghui Gao · Yuan Li · Zhaoye Fei · Zhangyue Yin · Zuxuan Wu · Yu-Gang Jiang · Xipeng Qiu
|
||
Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding
Yiming Zhang · Zhuokai Zhao · Zhaorun Chen · Zenghui Ding · Xianjun Yang · Yining Sun
|
||
StreamDiffusion: A Pipeline-level Solution for Real-Time Interactive Generation
Akio Kodaira · Chenfeng Xu · Toshiki Hazama · Takanori Yoshimoto · Kohei Ohno · Shogo Mitsuhori · Soichi Sugano · Hanying Cho · Zhijian Liu · Masayoshi Tomizuka · Kurt Keutzer
|
||
Dense Policy: Bidirectional Autoregressive Learning of Actions
Yue Su · Xinyu Zhan · Hongjie Fang · Han Xue · Hao-Shu Fang · Yong-Lu Li · Cewu Lu · Lixin Yang
|
||
Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval
Dohwan Ko · Ji Soo Lee · Minhyuk Choi · Zihang Meng · Hyunwoo Kim
|
||
DAMap: Distance-aware MapNet for High Quality HD Map Construction
JINPENG DONG · Chen Li · Yutong Lin · Jingwen Fu · Sanping Zhou · Nanning Zheng
|
||
COVTrack: Continuous Open-Vocabulary Multi-Object Tracking via Adaptive Multi-Cue Fusion
Zekun Qian · Ruize Han · Zhixiang Wang · Junhui Hou · Wei Feng
|
||
Sparse-Dense Side-Tuner for efficient Video Temporal Grounding
David Pujol-Perich · Sergio Escalera · Albert Clapés
|
||
You Think, You ACT: The New Task of Arbitrary Text to Motion Generation
Runqi Wang · Caoyuan Ma · Guopeng Li · Hanrui Xu · Yuke Li · Zheng Wang
|
||
Online Reasoning Video Segmentation with Just-in-Time Digital Twins
Yiqing Shen · Bohan Liu · Chenjia Li · Lalithkumar Seenivasan · Mathias Unberath
|
||
LEGO-Maker: A Semantic-Driven Algorithm for Text-to-3D Generation
Yifei Zhang · Lei Chen
|
||
Rethinking Layered Graphic Design Generation with a Top-Down Approach
Jingye Chen · Zhaowen Wang · Nanxuan Zhao · Li Zhang · Difan Liu · Jimei Yang · Qifeng Chen
|
||
Multispectral Demosaicing via Dual Cameras
SaiKiran Tedla · Junyong Lee · Beixuan Yang · Mahmoud Afifi · Michael Brown
|
||
Video-T1: Test-time Scaling for Video Generation
Fangfu Liu · Hanyang Wang · Yimo Cai · Kaiyan Zhang · Xiaohang Zhan · Yueqi Duan
|
||
MoMa-Kitchen: A 100K+ Benchmark for Affordance-Grounded Last-Mile Navigation in Mobile Manipulation
Pingrui Zhang · Xianqiang Gao · Yuhan Wu · Kehui Liu · Dong Wang · Zhigang Wang · Bin Zhao · Yan Ding · Xuelong Li
|
||
Unbiased Region-Language Alignment for Open-Vocabulary Dense Prediction
Yunheng Li · Yuxuan Li · Quan-Sheng Zeng · Wenhai Wang · Qibin Hou · Ming-Ming Cheng
|
||
DADet: Safeguarding Image Conditional Diffusion Models against Adversarial and Backdoor Attacks via Diffusion Anomaly Detection
Hongwei Yu · Xinlong Ding · Jiawei Li · Jinlong Wang · Yudong Zhang · Rongquan Wang · Huimin Ma · Jiansheng Chen
|
||
ASGS: Single-Domain Generalizable Open-Set Object Detection via Adaptive Subgraph Searching
Yuxuan Yuan · Luyao Tang · Chaoqi Chen · Yixin Chen · Yue Huang · Xinghao Ding
|
||
CalliReader: Contextualizing Chinese Calligraphy via an Embedding-Aligned Vision Language Model
Yuxuan Luo · Jiaqi Tang · Chenyi Huang · Feiyang Hao · Zhouhui Lian
|
||
Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos
Rundong Luo · Matthew Wallingford · Ali Farhadi · Noah Snavely · Wei-Chiu Ma
|
||
VSRM: A Robust Mamba-Based Framework for Video Super-Resolution
Phu Tran Dinh · Hung Dao · Daeyoung Kim
|
||
MiDSummer: Multi-Guidance Diffusion for Controllable Zero-Shot Immersive Gaussian Splatting Scene Generation
Anjun Hu · Richard Tomsett · Valentin Gourmet · Massimo Camplani · Jas Kandola · Hanting Xie
|
||
Magic Insert: Style-Aware Drag-and-Drop
Nataniel Ruiz · Yuanzhen Li · Neal Wadhwa · Yael Pritch · Michael Rubinstein · David Jacobs · Shlomi Fruchter
|
||
Adapting In-Domain Few-Shot Segmentation to New Domains without Retraining
Qi Fan · Kaiqi Liu · Nian Liu · Hisham Cholakkal · Rao Anwer · Wenbin Li · Yang Gao
|
||
Power of Cooperative Supervision: Multiple Teachers Framework for Advanced 3D Semi-Supervised Object Detection
Jin-Hee Lee · Jae-keun Lee · Jeseok Kim · Kwon Soon
|
||
PersonaCraft: Personalized Full-body Image Synthesis for Multiple Identities from Single References Using 3D-Model-Conditioned Diffusion
Gwanghyun Kim · Suh Jeon Jeon · Seunggyu Lee · Se Young Chun
|
||
PanoLlama: Generating Endless and Coherent Panoramas with Next-Token-Prediction LLMs
Teng Zhou · Xiaoyu Zhang · Yongchuan Tang
|
||
PS3: A Multimodal Transformer Integrating Pathology Reports with Histology Images and Biological Pathways for Cancer Survival Prediction
Manahil Raza · Ayesha Azam · Talha Qaiser · Nasir Rajpoot
|
||
End-to-End Multi-Modal Diffusion Mamba
Chunhao Lu · Qiang Lu · Meichen Dong · Jake Luo
|
||
SuperMat: Physically Consistent PBR Material Estimation at Interactive Rates
Yijia Hong · Yuan-Chen Guo · Ran Yi · Yulong Chen · Yan-Pei Cao · Lizhuang Ma
|
||
DiffVSR: Revealing an Effective Recipe for Taming Robust Video Super-Resolution Against Complex Degradations
Xiaohui Li · Yihao Liu · Shuo Cao · Chen Ziyan · SHAOBIN ZHUANG · Xiangyu Chen · Yinan He · Yi Wang · Yu Qiao
|
||
FusionPhys: A Flexible Framework for Fusing Complementary Sensing Modalities in Remote Physiological Measurement
Chenhang Ying · Huiyu Yang · Jieyi Ge · Zhaodong Sun · Xu Cheng · Kui Ren · Xiaobai Li
|
||
Zero-Shot Composed Image Retrieval via Dual-Stream Instruction-Aware Distillation
Wenliang Zhong · Rob Barton · Weizhi An · Feng Jiang · Hehuan Ma · Yuzhi Guo · Abhishek Dan · Shioulin Sam · Karim Bouyarmane · Junzhou Huang
|
||
KV-Edit: Training-Free Image Editing for Precise Background Preservation
Tianrui Zhu · Shiyi Zhang · Jiawei Shao · Yansong Tang
|
||
SDMatte: Grafting Diffusion Models for Interactive Matting
Longfei Huang · Yu Liang · Hao Zhang · Jinwei Chen · Wei Dong · Lunde Chen · Wanyu Liu · Bo Li · Peng-Tao Jiang
|
||
Uncertainty-Aware Diffusion-Guided Refinement of 3D Scenes
Sarosij Bose · Arindam Dutta · Sayak Nag · Junge Zhang · Jiachen Li · Konstantinos Karydis · Amit Roy-Chowdhury
|
||
Find a Scapegoat: Poisoning Membership Inference Attack and Defense to Federated Learning
Wenjin Mo · Zhiyuan Li · Minghong Fang · Mingwei Fang
|
||
UMDATrack: Unified Multi-Domain Adaptive Tracking Under Adverse Weather Conditions
Yao Siyuan · Rui Zhu · Ziqi Wang · Wenqi Ren · Yanyang Yan · Xiaochun Cao
|
||
VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling
Hyojun Go · Byeongjun Park · Hyelin Nam · Byung-Hoon Kim · Hyungjin Chung · Changick Kim
|
||
Controllable Feature Whitening for Hyperparameter-Free Bias Mitigation
Yooshin Cho · Hanbyel Cho · Janghyeon Lee · HyeongGwon Hong · Jaesung Ahn · Junmo Kim
|
||
Trokens: Semantic-Aware Relational Trajectory Tokens for Few-Shot Action Recognition
Pulkit Kumar · Shuaiyi Huang · Matthew Walmer · Sai Saketh Rambhatla · Abhinav Shrivastava
|
||
PRM: Photometric Stereo based Large Reconstruction Model
Wenhang Ge · Jiantao Lin · Guibao SHEN · Jiawei Feng · Tao Hu · Xinli Xu · Ying-Cong Chen
|
||
Knowledge-Guided Part Segmentation
Xuejian Gou · Fang Liu · Licheng Jiao · Shuo Li · Lingling Li · Hao Wang · Xu Liu · Puhua Chen · wenping ma
|
||
CNS-Bench: Benchmarking Image Classifier Robustness Under Continuous Nuisance Shifts
Olaf Dünkel · Artur Jesslen · Jiahao Xie · Christian Theobalt · Christian Rupprecht · Adam Kortylewski
|
||
HADES: Human Avatar with Dynamic Explicit Hair Strands
Zhanfeng Liao · Hanzhang Tu · Cheng Peng · Hongwen Zhang · Boyao Zhou · Yebin Liu
|
||
Fewer Denoising Steps or Cheaper Per-Step Inference: Towards Compute-Optimal Diffusion Model Deployment
Zhenbang Du · Yonggan Fu · Lifu Wang · Jiayi Qian · Xiao Luo · Yingyan Celine Lin
|
||
An Information-Theoretic Regularizer for Lossy Neural Image Compression
ZHANG YINGWEN · Meng Wang · Xihua Sheng · Peilin CHEN · Junru Li · Li Zhang · Shiqi Wang
|
||
Referring Expression Comprehension for Small Objects
Kanoko Goto · Takumi Hirose · Mahiro Ukai · Shuhei Kurita · Nakamasa Inoue
|
||
Correspondence as Video: Test-Time Adaption on SAM2 for Reference Segmentation in the Wild
Haoran Wang · Zekun Li · Jian Zhang · Lei Qi · Yinghuan Shi
|
||
Prototype Guided Backdoor Defense
Venkat Adithya Amula · Sunayana Samavedam · Saurabh Saini · Avani Gupta · P Narayanan
|
||
Hyper-Depth: Hypergraph-based Multi-Scale Representation Fusion for Monocular Depth Estimation
Lin Bie · Siqi Li · Yifan Feng · Yue Gao
|
||
Latent-Reframe: Enabling Camera Control for Video Diffusion Model without Training
Zhenghong Zhou · Jie An · Jiebo Luo
|
||
PBFG: A New Physically-Based Dataset and Removal of Lens Flares and Glares
Jie Zhu · Sungkil Lee
|
||
ClearSight: Human Vision-Inspired Solutions for Event-Based Motion Deblurring
Xiaopeng LIN · Yulong Huang · Hongwei Ren · Zunchang Liu · Hongxiang Huang · Yue Zhou · Haotian FU · Bojun Cheng
|
||
Image Intrinsic Scale Assessment: Bridging the Gap Between Quality and Resolution
Vlad Hosu · Lorenzo Agnolucci · Daisuke Iso · Dietmar Saupe
|
||
Model Explainability with Localized Soft Completeness
Ziv Haddad Haddad · Oren Barkan · Yehonatan Elisha · Noam Koenigstein
|
||
VisRL: Intention-Driven Visual Perception via Reinforced Reasoning
Zhangquan Chen · Xufang Luo · Dongsheng Li
|
||
GraspCoT: Integrating Physical Property Reasoning for 6-DoF Grasping under Flexible Language Instructions
Xiaomeng Chu · Jiajun Deng · Guoliang You · Wei Liu · Xingchen Li · Jianmin Ji · Yanyong Zhang
|
||
Anchor Token Matching: Implicit Structure Locking for Training-free AR Image Editing
Taihang Hu · Linxuan Li · Kai Wang · Yaxing Wang · jian Yang · Ming-Ming Cheng
|
||
LLM-assisted Entropy-based Adaptive Distillation for Unsupervised Fine-grained Visual Representation Learning
Jianfeng Dong · Danfeng Luo · Daizong Liu · Jie Sun · Xiaoye Qu · Xun Yang · Dongsheng Liu · Xun Wang
|
||
HIS-GPT: Towards 3D Human-In-Scene Multimodal Understanding
JIAHE ZHAO · RuiBing Hou · zejie tian · Hong Chang · Shiguang Shan
|
||
FW-Merging: Scaling Model Merging with Frank-Wolfe Optimization
Hao Chen · Shell Xu Hu · Wayne Luk · Timothy Hospedales · Hongxiang Fan
|
||
V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
Junqi Ge · Ziyi Chen · Jintao Lin · Jinguo Zhu · Xihui Liu · Jifeng Dai · Xizhou Zhu
|
||
Local Dense Logit Relations for Enhanced Knowledge Distillation
Liuchi Xu · Kang Liu · Jinshuai Liu · Lu Wang · Lisheng XU · Jun Cheng
|
||
MosaicDiff: Training-free Structural Pruning for Diffusion Model Acceleration Reflecting Pretraining Dynamics
Bowei Guo · Shengkun Tang · Cong Zeng · Zhiqiang Shen
|
||
External Knowledge Injection for CLIP-Based Class-Incremental Learning
Da-Wei Zhou · Kai-Wen Li · Jingyi Ning · Han-Jia Ye · Lijun Zhang · De-Chuan Zhan
|
||
MultiModal Representation for MultiSensory Video Simulation
Yichen Li · Antonio Torralba
|
||
Purge-Gate: Efficient Backpropagation-Free Test-Time Adaptation for Point Clouds via Token purging
Moslem Yazdanpanah · Ali Bahri · Mehrdad Noori · Sahar Dastani · Gustavo Vargas Hakim · David OSOWIECHI · Ismail Ayed · Christian Desrosiers
|
||
Stepping Out of Similar Semantic Space for Open-Vocabulary Segmentation
Yong Liu · Song-Li Wu · Sule Bai · Jiahao Wang · Yitong Wang · Yansong Tang
|
||
CCMNet: Leveraging Calibrated Color Correction Matrices for Cross-Camera Color Constancy
Dongyoung Kim · Mahmoud Afifi · Dongyun Kim · Michael Brown · Seon Joo Kim
|
||
ISP2HRNet: Learning to Reconstruct High Resolution Image from Irregularly Sampled Pixels via Hierarchical Gradient Learning
Yuanlin Wang · Ruiqin Xiong · Rui Zhao · Jin Wang · Xiaopeng Fan · Tiejun Huang
|
||
Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning
Qi Wang · Zhipeng Zhang · Baao Xie · Xin Jin · Yunbo Wang · Shiyu Wang · Liaomo Zheng · Xiaokang Yang · Wenjun Zeng
|
||
AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction
Zhen Xing · Qi Dai · Zejia Weng · Zuxuan Wu · Yu-Gang Jiang
|
||
GLEAM: Enhanced Transferable Adversarial Attacks for Vision-Language Pre-training Models via Global-Local Transformations
Yunqi Liu · Xiaohui Cui · Ouyang Xue
|
||
Your Text Encoder Can Be An Object-Level Watermarking Controller
Naresh Kumar Devulapally · Mingzhen Huang · Vishal Asnani · Shruti Agarwal · Siwei Lyu · Vishnu Lokhande
|
||
Fast Image Super-Resolution via Consistency Rectified Flow
Jiaqi Xu · Wenbo Li · Haoze Sun · Fan Li · Zhixin Wang · Long Peng · Jingjing Ren · HAORAN YANG · Xiaowei Hu · Renjing Pei · Pheng-Ann Heng
|
||
Authentic 4D Driving Simulation with a Video Generation Model
Lening Wang · Wenzhao Zheng · Dalong Du · Yunpeng Zhang · Yilong Ren · Han Jiang · Zhiyong Cui · Haiyang Yu · Jie Zhou · Shanghang Zhang
|
||
GCAV: A Global Concept Activation Vector Framework for Cross-Layer Consistency in Interpretability
Zhenghao He · Sanchit Sinha · Guangzhi Xiong · Aidong Zhang
|
||
DiffPCI: Large Motion Point Cloud frame Interpolation with Diffusion Model
tianyu zhang · Haobo Jiang · jian Yang · Jin Xie
|
||
MSA$^2$: Multi-task Framework with Structure-aware and Style-adaptive Character Representation for Open-set Chinese Text Recognition
Yangfu Li · Hongjian Zhan · Qi Liu · Li Sun · Yu-Jie Xiong · Yue Lu
|
||
POMATO: Marrying Pointmap Matching with Temporal Motions for Dynamic 3D Reconstruction
Songyan Zhang · Yongtao Ge · Jinyuan Tian · Guangkai Xu · Hao Chen · Chen Lv · Chunhua Shen
|
||
TriDi: Trilateral Diffusion of 3D Humans, Objects, and Interactions
Ilya Petrov · Riccardo Marin · Julian Chibane · Gerard Pons-Moll
|
||
FICGen: Frequency-Inspired Contextual Disentanglement for Layout-driven Degraded Image Generation
Wenzhuang Wang · Yifan Zhao · Mingcan Ma · Ming Liu · Zhonglin Jiang · Yong Chen · Jia Li
|
||
Structured Policy Optimization: Enhance Large Vision-Language Model via Self-referenced Dialogue
Guohao Sun · Can Qin · Yihao Feng · Zeyuan Chen · Ran Xu · Sohail Dianat · MAJID RABBANI · Raghuveer Rao · Zhiqiang Tao
|
||
Point Cloud Self-supervised Learning via 3D to Multi-view Masked Learner
Zhimin Chen · Xuewei Chen · Xiao Guo · Yingwei Li · Longlong Jing · Liang Yang · Bing Li
|
||
Sculpting Memory: Multi-Concept Forgetting in Diffusion Models via Dynamic Mask and Concept-Aware Optimization
Gen Li · Yang Xiao · Jie Ji · Kaiyuan Deng · Bo Hui · Linke Guo · Xiaolong Ma
|
||
TCFG: Truncated Classifier-Free Guidance for Efficient and Scalable Text-to-Image Acceleration
Xiaomeng Fu · Jia Li
|
||
PlaneRAS: Learning Planar Primitives for 3D Plane Recovery
Fang Zhang · Wenzhao Zheng · Linqing Zhao · Zelan Zhu · Jiwen Lu · Xiuzhuang Zhou
|
||
VideoMiner: Iteratively Grounding Key Frames of Hour-Long Videos via Tree-based Group Relative Policy Optimization
Xinye Cao · Hongcan Guo · Jiawen Qian · Guoshun Nan · Chao Wang · Yuqi Pan · Tianhao Hou · Xiaojuan Wang · Yutong Gao
|
||
From Holistic to Localized: Local Enhanced Adapters for Efficient Visual Instruction Fine-Tuning
Pengkun Jiao · Bin Zhu · Jingjing Chen · Chong-Wah Ngo · Yu-Gang Jiang
|
||
AdaDrive: Self-Adaptive Slow-Fast System for Language-Grounded Autonomous Driving
Ruifei Zhang · Junlin Xie · Wei Zhang · Weikai Chen · Xiao Tan · Xiang Wan · Guanbin Li
|
||
RCTDistill: Cross-Modal Knowledge Distillation Framework for Radar-Camera 3D Object Detection with Temporal Fusion
Geonho Bang · Minjae Seong · Jisong Kim · Geunju Baek · Daye Oh · Junhyung Kim · Junho Koh · Jun Won Choi
|
||
Physics Context Builders: A Modular Framework for Physical Reasoning in Vision-Language Models
Vahid Balazadeh · Mohammadmehdi Ataei · Hyunmin Cheong · Amir Khasahmadi · Rahul Krishnan
|
||
CARL: Causality-guided Architecture Representation Learning for an Interpretable Performance Predictor
Han Ji · Yuqi Feng · Jiahao Fan · Yanan Sun
|
||
Probabilistic Prototype Calibration of Vision-language Models for Generalized Few-shot Semantic Segmentation
Jie Liu · Jiayi Shen · Pan Zhou · Jan-Jakob Sonke · Stratis Gavves
|
||
MetaScope: Optics-Driven Neural Network for Ultra-Micro Metalens Endoscopy
Wuyang Li · Wentao Pan · Xiaoyuan Liu · Zhendong Luo · Chenxin Li · Hengyu Liu · Din Tsai · Mu Chen · Yixuan Yuan
|
||
Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation
Jiaer Xia · Bingkui Tong · Yuhang Zang · Rui Shao · Kaiyang Zhou
|
||
Dataset Ownership Verification for Pre-trained Masked Models
Yuechen Xie · Jie Song · Yicheng Shan · Xiaoyan Zhang · Yuanyu Wan · Shengxuming Zhang · Jiarui Duan · Mingli Song
|
||
SA-MAE: A Sensor-Agnostic Masked Autoencoder for Remote Sensing Image Representation Learning
Gencer Sumbul · Chang Xu · Emanuele Dalsasso · Devis Tuia
|
||
Bridging the Sky and Ground: Towards View-Invariant Feature Learning for Aerial-Ground Person Re-Identification
Wajahat Khalid · Bin Liu · Xulin Li · MUHAMMAD WAQAS · MUHAMMAD AFGAN
|
||
Learnable Logit Adjustment for Imbalanced Semi-Supervised Learning under Class Distribution Mismatch
lee hyuck · Taemin Park · Heeyoung Kim
|
||
CAT: A Unified Click-and-Track Framework for Realistic Tracking
Yongsheng Yuan · Jie Zhao · Dong Wang · Huchuan Lu
|
||
Reangle-A-Video: 4D Video Generation as Video-to-Video Translation
Hyeonho Jeong · Suhyeon Lee · Jong Ye
|
||
RoboPearls: Editable Video Simulation for Robot Manipulation
Tao Tang · Likui Zhang · Youpeng Wen · Kaidong Zhang · Jia-Wang Bian · xia zhou · Tianyi Yan · Kun Zhan · Peng Jia · Hefeng Wu · Liang Lin · Xiaodan Liang
|
||
Benchmarking and Learning Multi-Dimensional Quality Evaluator for Text-to-3D Generation
Yujie Zhang · Bingyang Cui · Qi Yang · Zhu Li · Yiling Xu
|
||
CA2C: A Prior-Knowledge-Free Approach for Robust Label Noise Learning via Asymmetric Co-learning and Co-training
Mengmeng Sheng · Zeren Sun · Tianfei Zhou · Xiangbo Shu · Jinshan Pan · Yazhou Yao
|
||
CopyrightShield: Enhancing Diffusion Model Security against Copyright Infringement Attacks
Zhixiang Guo · Siyuan Liang · Aishan Liu · Dacheng Tao
|
||
Multi-view Gaze Target Estimation
Qiaomu Miao · Vivek Golani · Jingyi Xu · Progga Paromita Dutta · Minh Hoai · Dimitris Samaras
|
||
GLEAM: Learning Generalizable Exploration Policy for Active Mapping in Complex 3D Indoor Scene
Xiao Chen · Tai Wang · Quanyi Li · Tao Huang · Jiangmiao Pang · Tianfan Xue
|
||
Spectral Sensitivity Estimation with an Uncalibrated Diffraction Grating
Lilika Makabe · Hiroaki Santo · Fumio Okura · Michael Brown · Yasuyuki Matsushita
|
||
Federated Prompt-Tuning with Heterogeneous and Incomplete Multimodal Client Data
Hang Phung · Manh Nguyen · Thanh Huynh · Quoc Viet Hung Nguyen · Trong Nghia Hoang · Phi Le Nguyen
|
||
FreeCus: Free Lunch Subject-driven Customization in Diffusion Transformers
Yanbing Zhang · Zhe Wang · Qin Zhou · Mengping Yang
|
||
Progressive Test Time Energy Adaptation for Medical Image Segmentation
Xiaoran Zhang · Byung-Woo Hong · Hyoungseob Park · Daniel Pak · Anne-Marie Rickmann · Lawrence Staib · James Duncan · Alex Wong
|
||
OmniVTON: Training-Free Universal Virtual Try-On
Zhaotong Yang · Yuhui Li · Shengfeng He · Xinzhe Li · Yangyang Xu · Junyu Dong · Yong Du
|
||
3DGS-LM: Faster Gaussian-Splatting Optimization with Levenberg-Marquardt
Lukas Hoellein · Aljaz Bozic · Michael Zollhöfer · Matthias Nießner
|
||
SAC-GNC: SAmple Consensus for adaptive Graduated Non-Convexity
Valter Piedade · Chitturi Sidhartha · José Gaspar · Venu Madhav Govindu · Pedro Miraldo
|
||
FlexGen: Flexible Multi-View Generation from Text and Image Inputs
Xinli Xu · Wenhang Ge · Jiantao Lin · Jiawei Feng · Lie XU · hanfeng Zhao · Shunsi Zhang · Ying-Cong Chen
|
||
UniFuse: A Unified All-in-One Framework for Multi-Modal Medical Image Fusion Under Diverse Degradations and Misalignments
Dayong Su · Yafei Zhang · Huafeng Li · Jinxing Li · Yu Liu
|
||
CIARD: Cyclic Iterative Adversarial Robustness Distillation
Liming Lu · Shuchao Pang · Xu Zheng · Xiang GU · Anan Du · Yunhuai Liu · Yongbin Zhou
|
||
Towards Long-Horizon Vision-Language-Action System: Reasoning, Acting and Memory
Daixun Li · Yusi Zhang · Mingxiang Cao · donglai Liu · Weiying Xie · Tianlin Hui · Lunkai Lin · Zhiqiang Xie · Yunsong Li
|
||
AllGCD: Leveraging All Unlabeled Data for Generalized Category Discovery
Xinzi Cao · Ke Chen · Feidiao Yang · Xiawu Zheng · Yutong Lu · Yonghong Tian
|
||
Draw Your Mind: Personalized Generation via Condition-Level Modeling in Text-to-Image Diffusion Models
Hyungjin Kim · Seokho Ahn · Young-Duk Seo
|
||
DAP-MAE: Domain-Adaptive Point Cloud Masked Autoencoder for Effective Cross-Domain Learning
Ziqi Gao · Qiufu Li · Linlin Shen
|
||
Frequency Domain-Based Diffusion Model for Unpaired Image Dehazing
Chengxu Liu · Lu Qi · Jinshan Pan · Xueming Qian · Ming-Hsuan Yang
|
||
MixANT: Observation-dependent Memory Propagation for Stochastic Dense Action Anticipation
Syed Talal Wasim · Hamid Suleman · Olga Zatsarynna · Muzammal Naseer · Juergen Gall
|
||
LoRD-HOI: Zero-Shot HOI Detection with Low-Rank Decomposed VLM Feature Adaptation
Qinqian Lei · Bo Wang · Robby Tan
|
||
Leveraging 2D Priors and SDF Guidance for Urban Scene Rendering
Siddharth Tourani · Jayarami Gurram · Akash Kumbar · Satyajit Tourani · Nishant Goyal · Madhava Krishna · Dinesh Reddy Narapureddy · Muhammad Haris Khan
|
||
F-Bench: Rethinking Human Preference Evaluation Metrics for Benchmarking Face Generation, Customization, and Restoration
Lu Liu · Huiyu Duan · Qiang Hu · Liu Yang · Chunlei Cai · Tianxiao Ye · Huayu Liu · Xiaoyun Zhang · Guangtao Zhai
|
||
GEOPARD: Geometric Pretraining for Articulation Prediction in 3D Shapes
Pradyumn Goyal · Dmitrii Petrov · Sheldon Andrews · Yizhak Ben-Shabat · Hsueh-Ti Derek Liu · Evangelos Kalogerakis
|
||
FREE-Merging: Fourier Transform for Efficient Model Merging
Shenghe Zheng · Hongzhi Wang
|
||
Robust Multi-View Learning via Representation Fusion of Sample-Level Attention and Alignment of Simulated Perturbation
Jie Xu · Na Zhao · Gang Niu · Masashi Sugiyama · Xiaofeng Zhu
|
||
AD-GS: Object-Aware B-Spline Gaussian Splatting for Self-Supervised Autonomous Driving
Jiawei Xu · Kai Deng · Zexin Fan · Shenlong Wang · Jin Xie · jian Yang
|
||
CODE-CL: Conceptor-Based Gradient Projection for Deep Continual Learning
Marco P. Apolinario · Sakshi Choudhary · Kaushik Roy
|
||
Dynamic Multimodal Prototype Learning in Vision-Language Models
Xingyu Zhu · Shuo Wang · Beier Zhu · Miaoge Li · Yunfan Li · Junfeng Fang · Zhicai Wang · Dongsheng Wang · Hanwang Zhang
|
||
Beyond Isolated Words: Diffusion Brush for Handwritten Text-Line Generation
Gang Dai · Yifan Zhang · Yutao Qin · Qiangya Guo · Shuangping Huang · Shuicheng YAN
|
||
Neural Compression for 3D Geometry Sets
Siyu Ren · Junhui Hou · Weiyao Lin · Wenping Wang
|
||
On Large Multimodal Models as Open-World Image Classifiers
Alessandro Conti · Massimiliano Mancini · Enrico Fini · Yiming Wang · Paolo Rota · Elisa Ricci
|
||
Class Token as Proxy: Optimal Transport-assisted Proxy Learning for Weakly Supervised Semantic Segmentation
Jian Wang · Tianhong Dai · Bingfeng Zhang · Siyue Yu · ENG LIM · Jimin XIAO
|
||
GS-ID: Illumination Decomposition on Gaussian Splatting via Adaptive Light Aggregation and Diffusion-Guided Material Priors
Kang DU · Zhihao Liang · Yulin Shen · Zeyu Wang
|
||
Contrastive Flow Matching
George Stoica · Vivek Ramanujan · Xiang Fan · Ali Farhadi · Ranjay Krishna · Judy Hoffman
|
||
ViLLa: Video Reasoning Segmentation with Large Language Model
rongkun Zheng · Lu Qi · Xi Chen · Yi Wang · Kun Wang · Hengshuang Zhao
|
||
OcRFDet: Object-Centric Radiance Fields for Multi-View 3D Object Detection in Autonomous Driving
Ji mingqian · Jian Yang · Shanshan Zhang
|
||
I2VControl: Disentangled and Unified Video Motion Synthesis Control
Wanquan Feng · Tianhao Qi · Jiawei Liu · Mingzhen Sun · Pengqi Tu · Tianxiang Ma · Fei Dai · Songtao Zhao · SiYu Zhou · Qian HE
|
||
HUMOTO: A 4D Dataset of Mocap Human Object Interactions
Jiaxin Lu · Chun-Hao Huang · Uttaran Bhattacharya · Qixing Huang · Yi Zhou
|
||
HumanSAM: Classifying Human-centric Forgery Videos in Human Spatial, Appearance, and Motion Anomaly
Chang Liu · Yunfan Ye · Fan Zhang · Qingyang Zhou · Yuchuan Luo · Zhiping Cai
|
||
REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers
Xingjian Leng · Jaskirat Singh · Yunzhong Hou · Zhenchang Xing · Saining Xie · Liang Zheng
|
||
DistillDrive: End-to-End Multi-Mode Autonomous Driving Distillation by Isomorphic Hetero-Source Planning Model
Rui Yu · Xianghang Zhang · Runkai Zhao · Huaicheng Yan · Meng Wang
|
||
Learning Normal Flow Directly From Events
Dehao Yuan · Levi Burner · Jiayi Wu · Minghui Liu · Jingxi Chen · Yiannis Aloimonos · Cornelia Fermuller
|
||
Balanced Sharpness-Aware Minimization for Imbalanced Regression
Yahao Liu · Qin Wang · Lixin Duan · Wen Li
|
||
Recover Biological Structure from Sparse-View Diffraction Images with Neural Volumetric Prior
Renzhi He · Haowen Zhou · Yubei Chen · Yi Xue
|
||
Wavelet Policy: Lifting Scheme for Policy Learning in Long-Horizon Tasks
Hao Huang · Shuaihang Yuan · Geeta Chandra Raju Bethala · Congcong Wen · Anthony Tzes · Yi Fang
|
||
DONUT: A Decoder-Only Model for Trajectory Prediction
Markus Knoche · Daan de Geus · Bastian Leibe
|
||
GaussianVideo: Efficient Video Representation via Hierarchical Gaussian Splatting
Andrew Bond · Jui-Hsien Wang · Long Mai · Erkut Erdem · Aykut Erdem
|
||
4D Gaussian Splatting SLAM
Yanyan Li · Youxu Fang · Zunjie Zhu · Kunyi Li · Yong Ding · Federico Tombari
|
||
SL$^{2}$A-INR: Single-Layer Learnable Activation for Implicit Neural Representation
Reza Rezaeian · Moein Heidari · Reza Azad · Dorit Merhof · Hamid Soltanian-Zadeh · Ilker Hacihaliloglu
|
||
DIA: The Adversarial Exposure of Deterministic Inversion in Diffusion Models
SeungHoo Hong · GeonHo Son · Juhun Lee · Simon Woo
|
||
Video Individual Counting for Moving Drones
Yaowu Fan · Jia Wan · Tao Han · Antoni Chan · Jinhua Ma
|
||
Perspective-aware 3D Gaussian Inpainting with Multi-view Consistency
Yuxin CHENG · Binxiao Huang · Taiqiang Wu · Wenyong Zhou · Chenchen Ding · Zhengwu Liu · Graziano Chesi · Ngai Wong
|
||
Neural Solver of Dichromatic Reflection Model for Specular Highlight Removal
Jhon Jhon
|
||
Diffusion-Based Imaginative Coordination for Bimanual Manipulation
huilin xu · Jian Ding · Jiakun Xu · Ruixiang Wang · Jun Chen · Jinjie Mai · Yanwei Fu · Bernard Ghanem · Feng Xu · Mohamed Elhoseiny
|
||
Superpowering Open-Vocabulary Object Detectors for X-ray Vision
Pablo Garcia-Fernandez · Lorenzo Vaquero · Mingxuan Liu · Feng Xue · Daniel Cores · Nicu Sebe · Manuel Mucientes · Elisa Ricci
|
||
REDUCIO! Generating 1K Video within 16 Seconds using Extremely Compressed Motion Latents
Rui Tian · Qi Dai · Jianmin Bao · Kai Qiu · Yifan Yang · Chong Luo · Zuxuan Wu · Yu-Gang Jiang
|
||
Divide-and-Conquer for Enhancing Unlabeled Learning, Stability, and Plasticity in Semi-supervised Continual Learning
Yue Duan · Taicai Chen · Lei Qi · Yinghuan Shi
|
||
Bokehlicious: Photorealistic Bokeh Rendering with Controllable Apertures
Tim Seizinger · Florin-Alexandru Vasluianu · Marcos Conde · Zongwei Wu · Radu Timofte
|
||
Att-Adapter: A Robust and Precise Domain-Specific Multi-Attributes T2I Diffusion Adapter via Conditional Variational Autoencoder
Wonwoong Cho · Yan-Ying Chen · Matthew Klenk · David I. Inouye · Yanxia Zhang
|
||
Bridging the Gap between Brain and Machine in Interpreting Visual Semantics: Towards Self-adaptive Brain-to-Text Decoding
Jiaxuan Chen · Yu Qi · Yueming Wang · Gang Pan
|
||
Event-guided HDR Reconstruction with Diffusion Priors
Yixin Yang · jiawei zhang · Yang Zhang · Yunxuan Wei · Dongqing Zou · Jimmy Ren · Boxin Shi
|
||
GRAB: A Challenging GRaph Analysis Benchmark for Large Multimodal Models
Jonathan Roberts · Kai Han · Samuel Albanie
|
||
Zero-Shot Depth Aware Image Editing with Diffusion Models
Rishubh Parihar · Sachidanand VS · Venkatesh Babu Radhakrishnan
|
||
RegGS: Unposed Sparse Views Gaussian Splatting with 3DGS Registration
Chong Cheng · Yu Hu · Sicheng Yu · Beizhen ZHAO · Zijian Wang · Hao Wang
|
||
Beyond Simple Edits: Composed Video Retrieval with Dense Modifications
Omkar Thawakar · Dmitry Demidov · Ritesh Thawkar · Rao Anwer · Mubarak Shah · Fahad Khan · Salman Khan
|
||
Foresight in Motion: Reinforcing Trajectory Prediction with Reward Heuristics
Muleilan Pei · Shaoshuai Shi · Xuesong Chen · Xu Liu · Shaojie Shen
|
||
SweetTok: Semantic-Aware Spatial-Temporal Tokenizer for Compact Video Discretization
Zhentao Tan · Ben Xue · Jian Jia · Junhao Wang · Wencai Ye · Shaoyun Shi · Sun Mingjie · Wenjin Wu · Quan Chen · Peng Jiang
|
||
Global Motion Corresponder for 3D Point-Based Scene Interpolation under Large Motion
Junru Lin · Chirag Vashist · Mikaela Uy · Colton Stearns · Xuan Luo · Leonidas Guibas · Ke Li
|
||
ChartCap: Mitigating Hallucination of Dense Chart Captioning
Junyoung Lim · Jaewoo Ahn · Gunhee Kim
|
||
Face Retouching with Diffusion Data Generation and Spectral Restorement
Zhidan Xu · Xiaoqin Zhang · Shijian Lu
|
||
Precise Action-to-Video Generation Through Visual Action Prompts
Yuang Wang · Chao Wen · Haoyu Guo · Sida Peng · Minghan Qin · Hujun Bao · Ruizhen Hu · Xiaowei Zhou
|
||
HyTIP: Hybrid Temporal Information Propagation for Masked Conditional Residual Video Coding
Yi-Hsin Chen · Yi-Chen Yao · Kuan-Wei Ho · Chun-Hung Wu · Huu-Tai Phung · Martin Benjak · Jörn Ostermann · Wen-Hsiao Peng
|
||
TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation
Zonglin Lyu · Chen Chen
|
||
UniGS: Modeling Unitary 3D Gaussians for Novel View Synthesis from Sparse-view Images
Jiamin WU · Kenkun Liu · Xiaoke Jiang · Yuan Yao · Lei Zhang
|
||
An Inversion-based Measure of Memorization for Diffusion Models
Zhe Ma · Qingming Li · Xuhong Zhang · Tianyu Du · Ruixiao Lin · Zonghui Wang · Shouling Ji · Wenzhi CHEN
|
||
SeqGrowGraph: Learning Lane Topology as a Chain of Graph Expansions
Mengwei Xie · Shuang Zeng · Xinyuan Chang · Xinran Liu · Zheng Pan · Mu Xu · Xing Wei
|
||
Visual Interestingness Decoded: How GPT-4o Mirrors Human Interests
Fitim Abdullahu · Helmut Grabner
|
||
EVT: Efficient View Transformation for Multi-Modal 3D Object Detection
Yongjin Lee · Hyeon-Mun Jeong · Yurim Jeon · Sanghyun Kim
|
||
Accelerate 3D Object Detection Models via Zero-Shot Attention Key Pruning
Lizhen Xu · Xiuxiu Bai · Xiaojun Jia · Jianwu Fang · Shanmin Pang
|
||
UniMLVG: Unified Framework for Multi-view Long Video Generation with Comprehensive Control Capabilities for Autonomous Driving
Rui Chen · Zehuan Wu · Yichen Liu · Yuxin Guo · Jingcheng Ni · Haifeng Xia · Siyu Xia
|
||
IAP: Invisible Adversarial Patch Attack through Perceptibility-Aware Localization and Perturbation Optimization
SUBRAT KISHORE DUTTA · Xiao Zhang
|
||
MEMFOF: High-Resolution Training for Memory-Efficient Multi-Frame Optical Flow Estimation
Vladislav Bargatin · Egor Chistov · Alexander Yakovenko · Dmitriy Vatolin
|
||
Embodied Navigation with Auxiliary Task of Action Description Prediction
Haru Kondoh · Asako Kanezaki
|
||
CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation
Hui Zhang · Dexiang Hong · Yitong Wang · Jie Shao · Xinglong Wu · Zuxuan Wu · Yu-Gang Jiang
|
||
Robust Unfolding Network for HDR imaging with Modulo Cameras
Zhile Chen · Hui Ji
|
||
Articulate3D: Holistic Understanding of 3D Scenes as Universal Scene Description
Anna-Maria Halacheva · Yang Miao · Jan-Nico Zaech · Xi Wang · Luc Gool · Danda Pani Paudel
|
||
AIRA: Activation-Informed Low-Rank Adaptation for Large Models
Lujun Li · Dezhi Li · Cheng Lin · Wei Li · Wei Xue · Sirui Han · Yike Guo
|
||
CaptionSmiths: Flexibly Controlling Language Pattern in Image Captioning
Kuniaki Saito · Donghyun Kim · Kwanyong Park · Atsushi Hashimoto · Yoshitaka Ushiku
|
||
CityGS-X: A Scalable Architecture for Efficient and Geometrically Accurate Large-Scale Scene Reconstruction
Yuanyuan Gao · Hao Li · Jiaqi Chen · Zhihang Zhong · Zhengyu Zou · Dingwen Zhang · Xiao Sun · Junwei Han
|
||
You Share Beliefs, I Adapt: Progressive Heterogeneous Collaborative Perception
hao si · Ehsan Javanmardi · Manabu Tsukada
|
||
Bridging the Gap Between Ideal and Real-world Evaluation: Benchmarking AI-Generated Image Detection in Challenging Scenarios
Chunxiao Li · Xiaoxiao Wang · Meiling Li · Boming Miao · Peng Sun · Yunjian Zhang · Xiangyang Ji · Yao Zhu
|
||
Exploring Probabilistic Modeling Beyond Domain Generalization for Semantic Segmentation
I-Hsiang Chen · Hua-En Chang · Wei-Ting Chen · Jenq-Neng Hwang · Sy-Yen Kuo
|
||
SynTag: Enhancing the Geometric Robustness of Inversion-based Generative Image Watermarking
Han Fang · Kejiang Chen · Zehua Ma · Jiajun Deng · Yicong Li · Weiming Zhang · Ee-Chien Chang
|
||
DASH: 4D Hash Encoding with Self-Supervised Decomposition for Real-Time Dynamic Scene Rendering
Jie Chen · Zhangchi Hu · Peixi Wu · Huyue Zhu · Hebei Li · Xiaoyan Sun
|
||
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion
Haonan Qiu · Shiwei Zhang · Yujie Wei · Ruihang Chu · Hangjie Yuan · Xiang Wang · Yingya Zhang · Ziwei Liu
|
||
ResGS: Residual Densification of 3D Gaussian for Efficient Detail Recovery
Yanzhe Lyu · Kai Cheng · Kang Xin · Xuejin Chen
|
||
GAS: Generative Avatar Synthesis from a Single Image
Yixing Lu · Junting Dong · YoungJoong Kwon · Qin Zhao · Bo Dai · Fernando De la Torre
|
||
Rectifying Magnitude Neglect in Linear Attention
Qihang Fan · Huaibo Huang · Yuang Ai · Ran He
|
||
Global and Local Entailment Learning for Natural World Imagery
Srikumar Sastry · Aayush Dhakal · Eric Xing · Subash Khanal · Nathan Jacobs
|
||
Parameter-Efficient Adaptation of Geospatial Foundation Models through Embedding Deflection
Romain Thoreau · Valerio Marsocci · Dawa Derksen
|
||
ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning
Jiaqi Liao · Zhengyuan Yang · Linjie Li · Dianqi Li · Kevin Lin · Yu Cheng · Lijuan Wang
|
||
$I^{2}$-World: Intra-Inter Tokenization for Efficient Dynamic 4D Scene Forecasting
Zhimin Liao · Ping Wei · Ruijie Zhang · Shuaijia Chen · Haoxuan Wang · Ziyang Ren
|
||
Deciphering Cross-Modal Alignment in Large Vision-Language Models via Modality Integration Rate
Qidong Huang · Xiaoyi Dong · Pan Zhang · Yuhang Zang · Yuhang Cao · Jiaqi Wang · Weiming Zhang · Nenghai Yu
|
||
Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation
Kaining Ying · Henghui Ding · Guangquan Jie · Yu-Gang Jiang
|
||
ChatReID: Open-ended Interactive Person Retrieval via Hierarchical Progressive Tuning for Vision Language Models
Ke Niu · Haiyang Yu · Mengyang Zhao · Teng Fu · Siyang Yi · Wei Lu · Bin Li · Xuelin Qian · Xiangyang Xue
|
||
UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization
Junjie He · Yifeng Geng · Liefeng Bo
|
||
CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models
Gaoyang Zhang · Bingtao Fu · Qingnan Fan · Qi Zhang · Runxing Liu · Hong Gu · Huaqi Zhang · Xinguo Liu
|
||
LIFT: Latent Implicit Functions for Task- and Data-Agnostic Encoding
Amirhossein Kazerouni · Soroush Mehraban · Michael Brudno · Babak Taati
|
||
ADCD-Net: Robust Document Image Forgery Localization via Adaptive DCT Feature and Hierarchical Content Disentanglement
KA WONG · Jicheng Zhou · Haiwei Wu · Yain-Whar Si · Jiantao Zhou
|
||
Semi-supervised Deep Transfer for Regression without Domain Alignment
Mainak Biswas · Ambedkar Dukkipati · Devarajan Sridharan
|
||
SAFT: Shape and Appearance of Fabrics from Template via Differentiable Physical Simulations from Monocular Video
David Stotko · Reinhard Klein
|
||
Benefit From Seen: Enhancing Open-Vocabulary Object Detection by Bridging Visual and Textual Co-Occurrence Knowledge
Yanqi Li · Jianwei Niu · Tao Ren
|
||
Ask and Remember: A Questions-Only Replay Strategy for Continual Visual Question Answering
Imad Eddine MAROUF · Enzo Tartaglione · Stéphane Lathuilière · Joost van de Weijer
|
||
Fine-structure Preserved Real-world Image Super-resolution via Transfer VAE Training
Qiaosi Yi · Shuai Li · Rongyuan Wu · Lingchen Sun · Yuhui WU · Lei Zhang
|
||
IGD: Instructional Graphic Design with Multimodal Layer Generation
Yadong Qu · Shancheng Fang · Yuxin Wang · Xiaorui Wang · Zhineng Chen · Hongtao Xie · Yongdong Zhang
|
||
OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining
Ming Hu · Kun yuan · Yaling Shen · feilong tang · Xiaohao Xu · Lin Zhou · Wei Li · Ying Chen · Zhongxing Xu · Zelin Peng · Siyuan Yan · Vinkle Srivastav · Diping Song · Tianbin Li · Danli Shi · Jin Ye · Nicolas Padoy · Nassir Navab · Junjun He · Zongyuan Ge
|
||
ArgMatch: Adaptive Refinement Gathering for Efficient Dense Matching
Yuxin Deng · Kaining Zhang · Linfeng Tang · Jiaqi Yang · Jiayi Ma
|
||
Structure-aware Semantic Discrepancy and Consistency for 3D Medical Image Self-supervised Learning
Tan Pan · Zhaorui Tan · Kaiyu Guo · Dongli Xu · Weidi Xu · Chen Jiang · Xin Guo · Yuan Qi · Yuan Cheng
|
||
LangBridge: Interpreting Image as a Combination of Language Embeddings
Jiaqi Liao · Yuwei Niu · Fanqing Meng · Hao Li · Changyao Tian · Yinuo Du · Yuwen Xiong · Dianqi Li · Xizhou Zhu · Li Yuan · Jifeng Dai · Yu Cheng
|
||
Open-Vocabulary Octree-Graph for 3D Scene Understanding
Zhigang Wang · Yifei Su · Chenhui Li · Dong Wang · Yan Huang · Xuelong Li · Bin Zhao
|
||
EVEv2: Improved Baselines for Encoder-Free Vision-Language Models
Haiwen Diao · Xiaotong Li · Yufeng Cui · Yueze Wang · Haoge Deng · Ting Pan · Wenxuan Wang · Huchuan Lu · Xinlong Wang
|
||
Generalization-Preserved Learning: Closing the Backdoor to Catastrophic Forgetting in Continual Deepfake Detection
Xueyi Zhang · Peiyin Zhu · Chengwei Zhang · Zhiyuan Yan · Jikang Cheng · Mingrui Lao · Siqi Cai · Yanming Guo
|
||
TAB: Transformer Attention Bottlenecks enable User Intervention and Debugging in Vision-Language Models
Pooyan Rahmanzadehgervi · Hung Nguyen · Rosanne Liu · Long Mai · Anh Nguyen
|
||
SegAnyPET: Universal Promptable Segmentation from Positron Emission Tomography Images
Yichi Zhang · Le Xue · Wenbo zhang · Lanlan Li · Yuchen Liu · Chen Jiang · Yuan Cheng · Yuan Qi
|
||
Make Your Training Flexible: Towards Deployment-Efficient Video Models
Chenting Wang · Kunchang Li · Tianxiang Jiang · Xiangyu Zeng · Yi Wang · Limin Wang
|
||
MolParser: End-to-end Visual Recognition of Molecule Structures in the Wild
Xi Fang · Jiankun Wang · Xiaochen Cai · Shang Chien · Shuwen Yang · Haoyi Tao · Nan wang · Lin Yao · Linfeng Zhang · Guolin Ke
|
||
LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion
Fangfu Liu · Hao Li · Jiawei Chi · Hanyang Wang · Minghui Yang · Fudong Wang · Yueqi Duan
|
||
Identity-aware Language Gaussian Splatting for Open-vocabulary 3D Semantic Segmentation
SungMin Jang · Wonjun Kim
|
||
Trans-Adapter: A Plug-and-Play Framework for Transparent Image Inpainting
Yuekun Dai · Haitian Li · Shangchen Zhou · Chen Change Loy
|
||
Teeth Reconstruction and Performance Capture Using a Phone Camera
Weixi Zheng · Jingwang Ling · Zhibo Wang · Quan Wang · Feng Xu
|
||
NoiseController: Towards Consistent Multi-view Video Generation via Noise Decomposition and Collaboration
Haotian Dong · Xin WANG · Di Lin · Yipeng Wu · Qin Chen · Ruonan Liu · Kairui Yang · Ping Li · Qing Guo
|
||
Region-Level Data Attribution for Text-to-Image Generative Models
Trong Bang Nguyen · Phi Le Nguyen · Simon Lucey · Minh Hoai
|
||
MuGS: Multi-Baseline Generalizable Gaussian Splatting Reconstruction
Yaopeng Lou · Liao Shen · Tianqi Liu · Jiaqi Li · Zihao Huang · Huiqiang Sun · Zhiguo Cao
|
||
Graph Domain Adaptation with Dual-branch Encoder and Two-level Alignment for Whole Slide Image-based Survival Prediction
Yuntao Shou · Xiangyong Cao · PeiqiangYan PeiqiangYan · Qiaohui Qiaohui · Qian Zhao · Deyu Meng
|
||
Underwater Visual SLAM with Depth Uncertainty and Medium Modeling
Rui Liu · Sheng Fan · Wenguan Wang · Yi Yang
|
||
ConformalSAM: Unlocking the Potential of Foundational Segmentation Models in Semi-Supervised Semantic Segmentation with Conformal Prediction
Danhui Chen · Ziquan Liu · Chuxi Yang · Dan Wang · Yan Yan · Yi Xu · Xiangyang Ji
|
||
PromptDresser: Improving the Quality and Controllability of Virtual Try-On via Generative Textual Prompt and Prompt-aware Mask
Jeongho Kim · Hoiyeong Jin · Sunghyun Park · Jaegul Choo
|
||
UniEgoMotion: A Unified Model for Egocentric Motion Reconstruction, Forecasting, and Generation
Chaitanya Patel · Hiroki Nakamura · Yuta Kyuragi · Kazuki Kozuka · Juan Carlos Niebles · Ehsan Adeli
|
||
Top2Pano: Learning to Generate Indoor Panoramas from Top-Down View
Zitong Zhang · Suranjan Gautam · Rui Yu
|
||
GloPER: Unsupervised Animal Pattern Extraction from Local Reconstruction
Bowen Chen · Yun Sing Koh · Gillian Dobbie
|
||
Learning Implicit Features with Flow-Infused Transformations for Realistic Virtual Try-On
Delong Zhang · Qiwei Huang · Yang Sun · Yuanliu Liu · Wei-Shi Zheng · Pengfei Xiong · Wei Zhang
|
||
TAViS: Text-bridged Audio-Visual Segmentation with Foundation Models
Ziyang Luo · Nian Liu · Xuguang Yang · Salman Khan · Rao Anwer · Hisham Cholakkal · Fahad Khan · Junwei Han
|
||
Towards Safer and Understandable Driver Intention Prediction
Mukilan Karuppasamy · Shankar Gangisetty · Shyam Nandan Rai · Carlo Masone · C.V. Jawahar
|
||
Adversarial Exploitation of Data Diversity Improves Visual Localization
Sihang Li · Siqi Tan · Bowen Chang · Jing Zhang · Chen Feng · Yiming Li
|
||
NeuralSVG: An Implicit Representation for Text-to-Vector Generation
Sagi Polaczek · Yuval Alaluf · Elad Richardson · Yael Vinker · Daniel Cohen-Or
|
||
Adaptive Hyper-Graph Convolution Network for Skeleton-based Human Action Recognition with Virtual Connections
Youwei Zhou · Tianyang Xu · Cong Wu · Xiaojun Wu · Josef Kittler
|
||
Learning Deblurring Texture Prior from Unpaired Data with Diffusion Model
Chengxu Liu · Lu Qi · Jinshan Pan · Xueming Qian · Ming-Hsuan Yang
|
||
MedVSR: Medical Video Super-Resolution with Cross State-Space Propagation
Xinyu Liu · Guolei Sun · Cheng Wang · Yixuan Yuan · Ender Konukoglu
|
||
GenieBlue: Integrating both Linguistic and Multimodal Capabilities for Large Language Models on Mobile Devices
Xudong LU · Yinghao Chen · Renshou Wu · Haohao Gao · Xi Chen · Xue Yang · Xiangyu Zhao · Aojun Zhou · Fangyuan Li · Yafei Wen · Xiaoxin Chen · shuai ren · Hongsheng Li
|
||
Adapting Vehicle Detectors for Aerial Imagery to Unseen Domains with Weak Supervision
Xiao Fang · Minhyek Jeon · Zheyang Qin · Stanislav Panev · Celso de Melo · Shuowen Hu · Shayok Chakraborty · Fernando De la Torre
|
||
NAVER: A Neuro-Symbolic Compositional Automaton for Visual Grounding with Explicit Logic Reasoning
Zhixi Cai · Fucai Ke · Simindokht Jahangard · Maria Banda · Gholamreza Haffari · Peter Stuckey · Hamid Rezatofighi
|
||
Learning 4D Embodied World Models
Haoyu Zhen · Qiao Sun · Hongxin Zhang · Junyan Li · Siyuan Zhou · Yilun Du · Chuang Gan
|
||
Real3D: Towards Scaling Large Reconstruction Models with Real Images
Hanwen Jiang · Qixing Huang · Georgios Pavlakos
|
||
WeaveSeg: Iterative Contrast-weaving and Spectral Feature-refining for Nuclei Instance Segmentation
Jiajia Li · Huisi Wu · Jing Qin
|
||
MIORe & VAR-MIORe: Benchmarks to Push the Boundaries of Restoration
George Ciubotariu · Zhuyun Zhou · Zongwei Wu · Radu Timofte
|
||
CHARM3R: Towards Unseen Camera Height Robust Monocular 3D Detector
Abhinav Kumar · Yuliang Guo · Zhihao Zhang · Xinyu Huang · Liu Ren · Xiaoming Liu
|
||
GaussianFlowOcc: Sparse and Weakly Supervised Occupancy Estimation using Gaussian Splatting and Temporal Flow
Simon Boeder · Fabian Gigengack · Benjamin Risse
|
||
Sparfels: Fast Reconstruction from Sparse Unposed Imagery
Shubhendu Jena · Amine Ouasfi · Mae Younes · Adnane Boukhayma
|
||
Differentially Private Fine-Tuning of Diffusion Models
Yu-Lin Tsai · Yizhe Li · Zekai Chen · Po-Yu Chen · Francois Buet-Golfouse · Chia-Mu Yu · Xuebin Ren
|
||
SplatTalk: 3D VQA with Gaussian Splatting
Anh Thai · Kyle Genova · Songyou Peng · Leonidas Guibas · Thomas Funkhouser
|
||
Gaussian-based World Model: Gaussian Priors for Voxel-Based Occupancy Prediction and Future Motion Prediction
Tuo Feng · Wenguan Wang · Yi Yang
|
||
Rethinking the Upsampling Process in Light Field Super-Resolution with Spatial-Epipolar Implicit Image Function
Ruixuan Cong · Yu Wang · Mingyuan Zhao · Da Yang · Rongshan Chen · Hao Sheng
|
||
Why LVLMs Are More Prone to Hallucinations in Longer Responses: The Role of Context
Ge Zheng · Jiaye Qian · Jiajin Tang · Sibei Yang
|
||
CountSE: Soft Exemplar Open-set Object Counting
Shuai Liu · Peng Zhang · Shiwei Zhang · Wei Ke
|
||
Discontinuity-aware Normal Integration for Generic Central Camera Models
Francesco Milano · Manuel Lopez-Antequera · Naina Dhingra · Roland Siegwart · Robert Thiel
|
||
Free-running vs Synchronous: Single-Photon Lidar for High-flux 3D Imaging
Ruangrawee Kitichotkul · Shashwath Bharadwaj · Joshua Rapp · Yanting Ma · Alexander Mehta · Vivek Goyal
|
||
Synchronization of Multiple Videos in-the-wild
Avihai Naaman · Ron Shapira Weber · Oren Freifeld
|
||
CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy
Zhibo Yang · Jun Tang · Zhaohai Li · Pengfei Wang · Jianqiang Wan · Humen Zhong · Xuejing Liu · Mingkun Yang · Peng Wang · Shuai Bai · Lianwen Jin · Junyang Lin
|
||
A Tiny Change, A Giant Leap: Long-Tailed Class-Incremental Learning via Geometric Prototype Alignment
xinyi lai · Luojun Lin · Weijie Chen · yuanlong yu
|
||
Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens
Dongwon Kim · Ju He · Qihang Yu · Chenglin Yang · Xiaohui Shen · Suha Kwak · Liang-Chieh (Jay) Chen
|
||
Free-MoRef: Instantly Multiplexing Context Perception Capabilities of Video-MLLMs within Single Inference
KUO WANG · Quanlong Zheng · Junlin Xie · Yanhao Zhang · Jinguo Luo · Haonan Lu · Liang Lin · Fan Zhou · Guanbin Li
|
||
CanonSwap: High-Fidelity and Consistent Video Face Swapping via Canonical Space Modulation
Xiangyang Luo · Ye Zhu · Yunfei Liu · Lijian Lin · Cong Wan · Zijian Cai · Yu Li · Shao-Lun Huang
|
||
Mitigating Geometric Degradation in Fast DownSampling via FastAdapter for Point Cloud Segmentation
Shuofeng Sun · Haibin Yan
|
||
Dynamic Group Detection using VLM-augmented Temporal Groupness Graph
Kaname Yokoyama · Chihiro Nakatani · Norimichi Ukita
|
||
CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning
Duo Wu · Jinghe Wang · Yuan Meng · Yanning Zhang · Le Sun · Zhi Wang
|
||
IntroStyle: Training-Free Introspective Style Attribution using Diffusion Features
Anand Kumar · Jiteng Mu · Nuno Vasconcelos
|
||
Blind2Sound: Self-Supervised Image Denoising without Residual Noise
Jiazheng Liu · Zejin Wang · Bohao Chen · Hua Han
|
||
GeoAvatar: Adaptive Geometrical Gaussian Splatting for 3D Head Avatar
SeungJun Moon · Hah Min Lew · Seungeun Lee · Ji-Su Kang · Gyeong-Moon Park
|
||
E-NeMF: Event-based Neural Motion Field for Novel Space-time View Synthesis of Dynamic Scenes
Yan Liu · Zehao Chen · Haojie Yan · De Ma · Huajin Tang · Qian Zheng · Gang Pan
|
||
Skeleton Motion Words for Unsupervised Skeleton-based Temporal Action Segmentation
Uzay Hüsnü Gökay · Federico Spurio · Dominik Bach · Juergen Gall
|
||
Leveraging the Power of MLLMs for Gloss-Free Sign Language Translation
Jungeun Kim · Hyeongwoo Jeon · Jongseong Bae · Ha Young Kim
|
||
AVAM: a Universal Training-free Adaptive Visual Anchoring Embedded into Multimodal Large Language Model for Multi-image Question Answering
Kang Zeng · Guojin Zhong · Jintao Cheng · Jin Yuan · Zhiyong Li
|
||
SAMora: Enhancing SAM through Hierarchical Self-Supervised Pre-Training for Medical Images
Shuhang Chen · Hangjie Yuan · Pengwei Liu · Hanxue Gu · Tao Feng · Dong Ni
|
||
FDPT: Federated Discrete Prompt Tuning for Black-Box Visual-Language Models
Jiaqi Wu · Simin Chen · Jing Tang · Yuzhe YANG · Yiming Chen · Lixu Wang · Song Lin · Zehua Wang · Wei Chen · Zijian Tian
|
||
ClaraVid: A Holistic Scene Reconstruction Benchmark from Aerial Perspective with Delentropy-Based Complexity Profiling
Radu Beche · Sergiu Nedevschi
|
||
ScoreHOI: Physically Plausible Reconstruction of Human-Object Interaction via Score-Guided Diffusion
AO LI · Jinpeng Liu · Yixuan Zhu · Yansong Tang
|
||
DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers
Yuntao Chen · Yuqi Wang · Zhaoxiang Zhang
|
||
DiffRefine: Diffusion-based Proposal Specific Densification for Point Cloud Object Detection
Sangyun Shin · Yuhang He · Xinyu Hou · Samuel Hodgson · Andrew Markham · Niki Trigoni
|
||
A Quality-Guided Mixture of Score-fusion Experts Framework for Human Recognition
Jie Zhu · Yiyang Su · Minchul Kim · Anil Jain · Xiaoming Liu
|
||
Fuzzy Contrastive Decoding to Alleviate Object Hallucination in Large Vision-Language Models
Jieun Kim · Jinmyeong Kim · Yoonji Kim · Sung-Bae Cho
|
||
Information Density Principle for MLLM Benchmarks
Chunyi Li · Xiaozhe Li · Zicheng Zhang · Yuan Tian · Ziheng Jia · Xiaohong Liu · Xiongkuo Min · Jia Wang · Haodong Duan · Kai Chen · Guangtao Zhai
|
||
SDFit: 3D Object Pose and Shape by Fitting a Morphable SDF to a Single Image
Dimitrije Antić · Georgios Paschalidis · Shashank Tripathi · Theo Gevers · Sai Kumar Dwivedi · Dimitrios Tzionas
|
||
CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games
Peng Chen · Pi Bu · Yingyao Wang · Xinyi Wang · Ziming Wang · Jie Guo · Yingxiu Zhao · Qi Zhu · Jun Song · Siran Yang · Jiamang Wang · Bo Zheng
|
||
GT-Mean Loss: A Simple Yet Effective Solution for Brightness Mismatch in Low-Light Image Enhancement
Jingxi Liao · Shijie Hao · Richang Hong · Meng Wang
|
||
Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective
Yingyu Liang · Zhizhou Sha · Zhenmei Shi · Zhao Song · Mingda Wan · Yufa Zhou
|
||
$\mathcal{D}$-Attn: Decomposed Attention for Large Vision-and-Language Model
Chia-Wen Kuo · Sijie Zhu · Fan Chen · Xiaohui Shen · Longyin Wen
|
||
You Are Your Own Best Teacher: Achieving Centralized-level Performance in Federated Learning under Heterogeneous and Long-tailed Data
Shanshan Yan · Zexi Li · Chao Wu · Meng Pang · Yang Lu · Yan Yan · Hanzi Wang
|
||
ProbMed: A Probabilistic Framework for Medical Multimodal Binding
Yuan Gao · Sangwook Kim · Jianzhong You · Chris Mcintosh
|
||
Towards Higher Effective Rank in Parameter-Efficient Fine-tuning using Khatri-Rao Product
Paul Albert · Frederic Zhang · Hemanth Saratchandran · Anton Hengel · Ehsan Abbasnejad
|
||
FreeDNA: Endowing Domain Adaptation of Diffusion-Based Dense Prediction with Training-Free Domain Noise Alignment
Hang Xu · Jie Huang · Linjiang Huang · Dong Li · Yidi Liu · Feng Zhao
|
||
Neuromanifold-Regularized KANs for Shape-fair Feature Representations
Mazlum Arslan · Weihong Guo · Shuo Li
|
||
LawDIS: Language-Window-based Controllable Dichotomous Image Segmentation
Xinyu Yan · Meijun Sun · Ge-Peng Ji · Fahad Khan · Salman Khan · Deng-Ping Fan
|
||
Structure-Guided Diffusion Models for High-Fidelity Portrait Shadow Removal
wanchang Yu · Qing Zhang · Rongjia Zheng · Wei-Shi Zheng
|
||
Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation
Fating Hong · Zunnan Xu · Zixiang Zhou · Jun Zhou · Xiu Li · Qin Lin · Qinglin Lu · Dan Xu
|
||
Video Color Grading via Look-Up Table Generation
Seunghyun Shin · Dongmin Shin · Jisu Shin · Hae-Gon Jeon · Joon-Young Lee
|
||
CoMatch: Dynamic Covisibility-Aware Transformer for Bilateral Subpixel-Level Semi-Dense Image Matching
Zizhuo Li · Yifan Lu · Linfeng Tang · Shihua Zhang · Jiayi Ma
|
||
Know Your Attention Maps: Class-specific Token Masking for Weakly Supervised Semantic Segmentation
Joëlle Hanna · Damian Borth
|
||
HDR Image Generation via Gain Map Decomposed Diffusion
Yuanshen Guan · Ruikang Xu · Yinuo Liao · Mingde Yao · Lizhi Wang · Zhiwei Xiong
|
||
Identity Preserving 3D Head Stylization with Multiview Score Distillation
Bahri Batuhan Bilecen · Ahmet Berke Gokmen · Furkan Güzelant · Aysegul Dundar
|
||
One Polyp Identifies All: One-Shot Polyp Segmentation with SAM via Cascaded Priors and Iterative Prompt Evolution
Xinyu Mao · Xiaohan Xing · Fei MENG · Jianbang LIU · Fan BAI · Qiang Nie · Max Meng
|
||
MEH: A Multi-Style Dataset and Toolkit for Advancing Egyptian Hieroglyph Recognition
Maksim Golyadkin · Rubanova Alexandrovna · Aleksandr Utkov · Dmitry Nikolotov · Ilya Makarov
|
||
Boosting Generative Adversarial Transferability with Self-supervised Vision Transformer Features
Shangbo Wu · Yu-an Tan · Ruinan Ma · Wencong Ma · Dehua Zhu · Yuanzhang Li
|
||
REGEN: Learning Compact Video Embedding with (Re-)Generative Decoder
Yitian Zhang · Long Mai · Aniruddha Mahapatra · David Bourgin · Yicong Hong · Jonah Casebeer · Feng Liu · Yun Fu
|
||
SMoLoRA: Exploring and Defying Dual Catastrophic Forgetting in Continual Visual Instruction Tuning
Ziqi Wang · Chang Che · Qi Wang · Yangyang Li · Zenglin Shi · Meng Wang
|
||
HUST: High-Fidelity Unbiased Skin Tone Estimation via Texture Quantization
Zimin Ran · Xingyu Ren · Xiang An · Kaicheng Yang · Ziyong Feng · Jing Yang · Rolandos Alexandros Potamias · Linchao Zhu · Jiankang Deng
|
||
Intra-view and Inter-view Correlation Guided Multi-view Novel Class Discovery
Xinhang Wan · Jiyuan Liu · Qian Qu · Suyuan Liu · Chuyu Zhang · Fangdi Wang · Xinwang Liu · En Zhu · Kunlun He
|
||
Gigapixel Vision-Concept Contrastive Pretraining in Histopathology
Saarthak Kapse · Pushpak Pati · Srikar Yellapragada · Srijan Das · Rajarsi Gupta · Joel Saltz · Dimitris Samaras · Prateek Prasanna
|
||
Diffusion Epistemic Uncertainty with Asymmetric Learning for Diffusion-Generated Image Detection
Yingsong Huang · Hui Guo · Jing Huang · Bing Bai · Qi Xiong
|
||
Towards a Universal Image Degradation Model via Content-Degradation Disentanglement
Wenbo Yang · Zhongling Wang · Zhou Wang
|
||
Coupling the Generator with Teacher for Effective Data-Free Knowledge Distillation
Xu Chen · Yang Li · Yahong Han · Guangquan Xu · Jialie Shen
|
||
BVINet: Unlocking Blind Video Inpainting with Zero Annotations
zhiliang wu · Kerui Chen · Kun Li · Hehe Fan · Yi Yang
|
||
Attention to Trajectory: Trajectory-Aware Open-Vocabulary Tracking
Yunhao Li · Yifan Jiao · Dan Meng · Heng Fan · Libo Zhang
|
||
Scalable Image Tokenization with Index Backpropagation Quantization
Fengyuan Shi · Zhuoyan Luo · Yixiao Ge · Yujiu Yang · Ying Shan · Limin Wang
|
||
SMGDiff: Soccer Motion Generation using diffusion probabilistic models
Hongdi Yang · Chengyang Li · Zhenxuan Wu · Gaozheng Li · Jingya Wang · Jingyi Yu · Zhuo Su · Lan Xu
|
||
ReferEverything: Towards Segmenting Everything We Can Speak of in Videos
Anurag Bagchi · Zhipeng Bao · Yu-Xiong Wang · Pavel Tokmakov · Martial Hebert
|
||
Instant GaussianImage: A Generalizable and Self-Adaptive Image Representation via 2D Gaussian Splatting
Zhaojie Zeng · Yuesong Wang · Chao Yang · Tao Guan · Lili Ju
|
||
DiffIP: Representation Fingerprints for Robust IP Protection of Diffusion Models
Zhuoling Li · Haoxuan Qu · Jason Kuen · Jiuxiang Gu · Qiuhong Ke · Jun Liu · Hossein Rahmani
|
||
Prompt Guidance and Human Proximal Perception for HOT Prediction with Regional Joint Loss
Yuxiao Wang · Yu Lei · Zhenao WEI · WeiYing Xue · Xinyu Jiang · Nan Zhuang · Qi Liu
|
||
BabyVLM: Data-Efficient Pretraining of VLMs Inspired by Infant Learning
Shengao Wang · Arjun Chandra · Aoming Liu · Boqing Gong · Venkatesh Saligrama
|
||
S$^3$E: Self-Supervised State Estimation for Radar-Inertial System
Shengpeng Wang · Yulong Xie · Qing Liao · Wei Wang
|
||
OCK: Unsupervised Dynamic Video Prediction with Object-Centric Kinematics
Yeon-Ji Song · Jaein Kim · Suhyung Choi · Jin-Hwa Kim · Byoung-Tak Zhang
|
||
Lay2Story: Extending Diffusion Transformers for Layout-Togglable Story Generation
Ao Ma · Jiasong Feng · Ke Cao · Jing Wang · WANG Yun · Quanwei Zhang · Zhanjie Zhang
|
||
GDKVM: Echocardiography Video Segmentation via Spatiotemporal Key-Value Memory with Gated Delta Rule
Rui Wang · Yimu Sun · Jingxing Guo · Huisi Wu · Jing Qin
|
||
AMDANet: Attention-Driven Multi-Perspective Discrepancy Alignment for RGB-Infrared Image Fusion and Segmentation
Haifeng Zhong · Fan Tang · Zhuo Chen · Hyung Jin Chang · Yixing Gao
|
||
DISTA-Net: Dynamic Closely-Spaced Infrared Small Target Unmixing
Shengdong Han · Shangdong Yang · Yuxuan Li · Xin Zhang · Xiang Li · Jian Yang · Ming-Ming Cheng · Yimian Dai
|
||
Doodle Your Keypoints: Sketch-Based Few-Shot Keypoint Detection
Subhajit Maity · Ayan Bhunia · Subhadeep Koley · Pinaki Chowdhury · Aneeshan Sain · Yi-Zhe Song
|
||
Bring Your Rear Cameras for Egocentric 3D Human Pose Estimation
Hiroyasu Akada · Jian Wang · Vladislav Golyanik · Christian Theobalt
|
||
GaussianSpeech: Audio-Driven Personalized 3D Gaussian Avatars
Shivangi Aneja · Artem Sevastopolsky · Tobias Kirschstein · Justus Thies · Angela Dai · Matthias Nießner
|
||
A Differentiable Wave Optics Model for End-to-End Computational Imaging System Optimization
Chi-Jui Ho · Yash Belhe · Steve Rotenberg · Ravi Ramamoorthi · Tzu-Mao Li · Nicholas Antipa
|
||
RomanTex: Decoupling 3D-aware Rotary Positional Embedded Multi-Attention Network for Texture Synthesis
Yifei Feng · Mx Yang · Shuhui Yang · Sheng Zhang · Jiaao Yu · Zibo Zhao · Lliu Yuhong · Jie Jiang · Chunchao Guo
|
||
PixelStitch: Structure-Preserving Pixel-Wise Bidirectional Warps for Unsupervised Image Stitching
Hengzhe Jin · Lang Nie · Chunyu Lin · Xiaomei Feng · Yao Zhao
|
||
RealGeneral: Unifying Visual Generation via Temporal In-Context Learning with Video Models
Yijing Lin · Mengqi Huang · Shuhan Zhuang · Zhendong Mao
|
||
Real-time Streaming Depth Estimation at 2K Resolution
Gene Chou · Wenqi Xian · Guandao Yang · Mohamed Abdelfattah · Bharath Hariharan · Noah Snavely · Ning Yu · Paul Debevec
|
||
T2Bs: Text-to-Character Blendshapes via Video Generation
Jiahao Luo · Chaoyang Wang · Michael Vasilkovsky · Vladislav Shakhrai · Di Liu · Peiye Zhuang · Sergey Tulyakov · Peter Wonka · Hsin-Ying Lee · James Davis · Jian Wang
|
||
A Good Teacher Adapts Their Knowledge for Distillation
Chengyao Qian · Trung Le · Mehrtash Harandi
|
||
Vulnerability-Aware Spatio-Temporal Learning for Generalizable Deepfake Video Detection
Dat NGUYEN · Marcella Astrid · Anis Kacem · Enjie Ghorbel · Djamila Aouada
|
||
Edit360: 2D Image Edits to 3D Assets from Any Angle
Junchao Huang · Xinting Hu · Shaoshuai Shi · Zhuotao Tian · Li Jiang
|
||
Ultra-Precision 6DoF Pose Estimation Using 2-D Interpolated Discrete Fourier Transform
Guowei Shi · Zian Mao · Peisen Huang
|
||
TOTP: Transferable Online Pedestrian Trajectory Prediction with Temporal-Adaptive Mamba Latent Diffusion
Ziyang Ren · Ping Wei · Shangqi Deng · Haowen Tang · Jiapeng Li · Huan Li
|
||
Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs
Qizhe Zhang · Aosong Cheng · Ming Lu · Renrui Zhang · Zhiyong Zhuo · Jiajun Cao · Shaobo Guo · Qi She · Shanghang Zhang
|
||
Can We Achieve Efficient Diffusion Without Self-Attention? Distilling Self-Attention into Convolutions
ZiYi Dong · Chengxing Zhou · Weijian Deng · Pengxu Wei · Xiangyang Ji · Liang Lin
|
||
Knowledge Distillation with Refined Logits
Wujie Sun · Defang Chen · Siwei Lyu · Genlang Chen · Chun Chen · Can Wang
|
||
DynamicID: Zero-Shot Multi-ID Image Personalization with Flexible Facial Editability
Xirui Hu · Jiahao Wang · Hao Chen · Weizhan Zhang · Benqi Wang · Yikun Li · Haishun Nan
|
||
Adaptive Prompt Learning via Gaussian Outlier Synthesis for Out-of-distribution Detection
Yongkang Zhang · Dongyu She · Zhong Zhou
|
||
Unsupervised Visible-Infrared Person Re-identification under Unpaired Settings
Haoyu Yao · Bin Yang · Wenke Huang · Mang Ye · Bo Du
|
||
Progressive Distribution Bridging: Unsupervised Adaptation for Large-scale Pre-trained Models via Adaptive Auxiliary Data
Weinan He · Yixin Zhang · Zilei Wang
|
||
SimBoost: Improving Real-World Driving via Simulated Hard-Case
Baihui Xiao · Chengjian Feng · Zhijian Huang · Feng Yan · Yujie Zhong · Lin Ma
|
||
V2XScenes: A Multiple Challenging Traffic Conditions Dataset for Large-Range Vehicle-Infrastructure Collaborative Perception
Bowen Wang · Yafei Wang · Wei Gong · Siheng Chen · Genjia Liu · Minhao Xiong · Chin Ng
|
||
Cooperative Pseudo Labeling for Unsupervised Federated Classification
Kuangpu Guo · Lijun Sheng · Yongcan Yu · Jian Liang · Zilei Wang · Ran He
|
||
METEOR: Multi-Encoder Collaborative Token Pruning for Efficient Vision Language Models
Yuchen Liu · Yaoming Wang · Bowen Shi · Xiaopeng Zhang · Wenrui Dai · Chenglin Li · Hongkai Xiong · Qi Tian
|
||
MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs
Yunqiu Xu · Linchao Zhu · Yi Yang
|
||
What You Have is What You Track: Adaptive and Robust Multimodal Tracking
Yuedong Tan · Jiawei Shao · Eduard Zamfir · Ruanjun Li · Zhaochong An · Chao Ma · Danda Pani Paudel · Luc Gool · Radu Timofte · Zongwei Wu
|
||
DWIM: Towards Tool-aware Visual Reasoning via Discrepancy-aware Workflow Generation & Instruct-Masking Tuning
Fucai Ke · Vijay Kumar b g · Xingjian Leng · Zhixi Cai · Zaid Khan · Weiqing Wang · Pari Delir Haghighi · Hamid Rezatofighi · Manmohan Chandraker
|
||
RA-BUSSeg: Relation-aware Semi-supervised Breast Ultrasound Image Segmentation via Adjacent Propagation and Cross-layer Alignment
Wanting ZHANG · Zhenhui Ding · Guilian Chen · Huisi Wu · Jing Qin
|
||
CMAD: Correlation-Aware and Modalities-Aware Distillation for Multimodal Sentiment Analysis with Missing Modalities
Yan Zhuang · Minhao Liu · Wei Bai · Yanru Zhang · Xiaoyue Zhang · Jiawen Deng · Fuji Ren
|
||
Stochastic Gradient Estimation for Higher-Order Differentiable Rendering
Zican Wang · Michael Fischer · Tobias Ritschel
|
||
OneGT: One-Shot Geometry-Texture Neural Rendering for Head Avatars
Jinshu Chen · Bingchuan Li · Fan Zhang · Songtao Zhao · Qian HE
|
||
Frequency-Dynamic Attention Modulation For Dense Prediction
Linwei Chen · Lin Gu · Ying Fu
|
||
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning
Zhong-Yu Li · Ruoyi Du · Juncheng Yan · Le Zhuo · Zhen Li · Peng Gao · Zhanyu Ma · Ming-Ming Cheng
|
||
Event-guided Unified Framework for Low-light Video Enhancement, Frame Interpolation, and Deblurring
Taewoo Kim · Kuk-Jin Yoon
|
||
AcZeroTS: Active Learning for Zero-shot Tissue Segmentation in Pathology Images
Jiao Tang · Junjie Zhou · Bo Qian · Peng Wan · Yingli Zuo · Wei Shao · Daoqiang Zhang
|
||
Class-Wise Federated Averaging for Efficient Personalization
Gyuejeong Lee · Daeyoung Choi
|
||
Boosting Adversarial Transferability via Negative Hessian Trace Regularization
Yunfei Long · Zilin Tian · Liguo Zhang · Huosheng Xu
|
||
Contrastive Test-Time Composition of Multiple LoRA Models for Image Generation
Tuna Meral · Enis Simsar · Federico Tombari · Pinar Yanardag
|
||
Supercharging Floorplan Localization with Semantic Rays
Yuval Grader · Hadar Averbuch-Elor
|
||
MMOne: Representing Multiple Modalities in One Scene
Zhifeng Gu · Bing WANG
|
||
SpectralAR: Spectral Autoregressive Visual Generation
Yuanhui Huang · Weiliang Chen · Wenzhao Zheng · Yueqi Duan · Jie Zhou · Jiwen Lu
|
||
Learn2Synth: Learning Optimal Data Synthesis Using Hypergradients for Brain Image Segmentation
Xiaoling Hu · Xiangrui Zeng · Oula Puonti · Juan Iglesias · Bruce Fischl · Yaël Balbastre
|
||
Efficient Autoregressive Shape Generation via Octree-Based Adaptive Tokenization
Kangle Deng · Hsueh-Ti Derek Liu · Yiheng Zhu · Xiaoxia Sun · Chong Shang · Kiran Bhat · Deva Ramanan · Jun-Yan Zhu · Maneesh Agrawala · Tinghui Zhou
|
||
Cassic: Towards Content-Adaptive State-Space Models for Learned Image Compression
Shiyu Qin · Jinpeng Wang · Yimin Zhou · Bin Chen · Tianci Luo · Baoyi An · Tao Dai · Shu-Tao Xia · Yaowei Wang
|
||
Synthesizing Near-Boundary OOD Samples for Out-of-Distribution Detection
Jinglun Li · Kaixun Jiang · Zhaoyu Chen · Bo Lin · Yao Tang · Weifeng Ge · Wenqiang Zhang
|
||
Towards a Universal 3D Medical Multi-modality Generalization via Learning Personalized Invariant Representation
Zhaorui Tan · Xi Yang · Tan Pan · Tianyi Liu · Chen Jiang · Xin Guo · Qiufeng Wang · Anh Nguyen · Yuan Qi · Kaizhu Huang · Yuan Cheng
|
||
MagicColor: Multi-instance Sketch Colorization
Yinhan Zhang · Yue Ma · Bingyuan Wang · Qifeng Chen · Zeyu Wang
|
||
Shot by Shot: Film-Grammar-Aware Training-Free Audio Description Generation
Junyu Xie · Tengda Han · Max Bain · Arsha Nagrani · Eshika Khandelwal · Gül Varol · Weidi Xie · Andrew Zisserman
|
||
Multi-modal Segment Anything Model for Camouflaged Scene Segmentation
Guangyu Ren · Hengyan Liu · Michalis Lazarou · Tania Stathaki
|
||
Generative Adversarial Diffusion
U-Chae Jun · Jaeeun Ko · Jiwoo Kang
|
||
Memory-Efficient 4-bit Preconditioned Stochastic Optimization
Jingyang Li · Kuangyu Ding · Kim-chuan Toh · Pan Zhou
|
||
GSV3D: Gaussian Splatting-based Geometric Distillation with Stable Video Diffusion for Single-Image 3D Object Generation
Ye Tao · Jiawei Zhang · Yahao Shi · Dongqing Zou · Bin Zhou
|
||
CoLMDriver: LLM-based Negotiation Benefits Cooperative Autonomous Driving
Changxing Liu · Genjia Liu · Zijun Wang · Jinchang Yang · Siheng Chen
|
||
CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models
Hao He · Ceyuan Yang · Shanchuan Lin · Yinghao Xu · Meng Wei · Liangke Gui · Qi Zhao · Gordon Wetzstein · Lu Jiang · Hongsheng Li
|
||
Jigsaw++: Imagining Complete Shape Prior for Object Reassembly
Jiaxin Lu · Gang Hua · Qixing Huang
|
||
Temperature in Cosine-based Softmax Loss
Takumi Kobayashi
|
||
Unlocking Constraints: Source-Free Occlusion-Aware Seamless Segmentation
Yihong Cao · Jiaming Zhang · Xu Zheng · Hao Shi · Kunyu Peng · Hang Liu · Kailun Yang · Hui Zhang
|
||
Large Learning Rates Simultaneously Achieve Robustness to Spurious Correlations and Compressibility
Melih Barsbey · Lucas Prieto · Stefanos Zafeiriou · Tolga Birdal
|
||
WIR3D: Semantic and Geometric-Aware 3D Shape Abstraction
Richard Liu · Daniel Fu · Noah Tan · Itai Lang · Rana Hanocka
|
||
RePoseD: Efficient Relative Pose Estimation With Known Depth Information
Yaqing Ding · Viktor Kocur · Vaclav Vavra · Zuzana Berger Haladova · Jian Yang · Torsten Sattler · Zuzana Kukelova
|
||
Knowledge Transfer from Interactions Learning
Yilin Gao · Kangyi Chen · Zhongxing Peng · Hengjie Lu · Shugong Xu
|
||
Reference-based Super-Resolution via Image-based Retrieval-Augmented Generation Diffusion
Byeonghun Lee · Hyunmin Cho · Honggyu Choi · Soo Min Kang · Iljun Ahn · Kyong Hwan Jin
|
||
Visual-RFT: Visual Reinforcement Fine-Tuning
Ziyu Liu · Zeyi Sun · Yuhang Zang · Xiaoyi Dong · Yuhang Cao · Haodong Duan · Dahua Lin · Jiaqi Wang
|
||
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models
Gaojie Lin · Jianwen Jiang · Jiaqi Yang · Zerong Zheng · Chao Liang · Zhang Yuan · Jingtu Li
|
||
UniGlyph: Unified Segmentation-Conditioned Diffusion for Precise Visual Text Synthesis
Yuanrui Wang · Cong Han · Yafei Li · Zhipeng Jin · Xiawei Li · Sinan Du · Wen Tao · Yi Yang · Shuanglong Li · Chun Yuan · Liu Lin
|
||
Vid-Group: Temporal Video Grounding Pretraining from Unlabeled Videos in the Wild
Peijun Bao · Chenqi Kong · Siyuan Yang · Zihao Shao · Xinghao Jiang · Boon Ng · Meng Er · Alex Kot
|
||
Aligning Effective Tokens with Video Anomaly in Large Language Models
Carol Chen · Jiahui Liu · Ruidi Fan · Yanwei Li · Chirui CHANG · Shizhen Zhao · Wilton W.T. Fok · Xiaojuan Qi · Yik WU
|
||
VLR-Driver: Large Vision-Language-Reasoning Models for Embodied Autonomous Driving
Fanjie Kong · Yitong Li · Weihuang Chen · Chen Min · Yizhe Li · Zhiqiang Gao · Haoyang Li · Zhongyu Guo · Hongbin Sun
|
||
RainbowPrompt: Diversity-Enhanced Prompt-Evolving for Continual Learning
Kiseong Hong · Gyeong-Hyeon Kim · Eunwoo Kim
|
||
Pinco: Position-induced Consistent Adapter for Diffusion Transformer in Foreground-conditioned Inpainting
Guangben Lu · Yuzhen N/A · Zhimin Sun · Ran Yi · Yifan Qi · Yizhe Tang · Tianyi Wang · Lizhuang Ma · FangYuan Zou
|
||
DeepShield: Fortifying Deepfake Video Detection with Local and Global Forgery Analysis
Yinqi Cai · Jichang Li · Zhaolun Li · Weikai Chen · Rushi Lan · Xi Xie · Xiaonan Luo · Guanbin Li
|
||
DisCoPatch: Taming Adversarially-driven Batch Statistics for Improved Out-of-Distribution Detection
Francisco Caetano · Christiaan Viviers · Luis Zavala-Mondragón · Peter H.N. De With · Fons van der Sommen
|
||
FLOSS: Free Lunch in Open-vocabulary Semantic Segmentation
Yasser Benigmim · Mohammad Fahes · Tuan-Hung Vu · Andrei Bursuc · Raoul de Charette
|
||
Visual Relation Diffusion for Human-Object Interaction Detection
Ping Cao · Yepeng Tang · Chunjie Zhang · Xiaolong Zheng · Chao Liang · Yunchao Wei · Yao Zhao
|
||
Teaching VLMs to Localize Specific Objects from In-context Examples
Sivan Doveh · Nimrod Shabtay · Eli Schwartz · Leonid Karlinsky · Raja Giryes · Hilde Kuehne · Rogerio Feris · James Glass · Assaf Arbelle · Shimon Ullman · Muhammad Mirza
|
||
SIMView: Long-term Autoregressive Scene Generation with Surfel-Indexed Memory of Views
Runjia Li · Philip Torr · Andrea Vedaldi · Tomas Jakab
|
||
A Unified Framework to BRIDGE Complete and Incomplete Deep Multi-View Clustering under Non-IID Missing Patterns
Xiaorui Jiang · Buyun He · Peng Yuan Zhou · Xinyue Chen · Jingcai Guo · Jie Xu · Yong Liao
|
||
RoboAnnotatorX: A Comprehensive and Universal Annotation Framework for Accurate Understanding of Long-horizon Robot Demonstration
Longxin Kou · Fei Ni · Jianye HAO · Han Peilong · Jinyi Liu · Haiqin Cui · Rui Liu · Yan Zheng
|
||
Perception-as-Control: Fine-grained Controllable Image Animation with 3D-aware Motion Representation
Yingjie Chen · Yifang Men · Yuan Yao · Miaomiao Cui · Liefeng Bo
|
||
Learning Neural Scene Representation from iToF Imaging
Wenjie Chang · Hanzhi Chang · Yueyi Zhang · Wenfei Yang · Tianzhu Zhang
|
||
FiffDepth: Feed-forward Transformation of Diffusion-Based Generators for Detailed Depth Estimation
Yunpeng Bai · Qixing Huang
|
||
EA-KD: Entropy-based Adaptive Knowledge Distillation
Chi-Ping Su · Ching-Hsun Tseng · Bin Pu · Lei Zhao · Jiewen Yang · Zhuangzhuang Chen · Shin-Jye Lee
|
||
Generalized Deep Multi-view Clustering via Causal Learning with Partially Aligned Cross-view Correspondence
Xihong Yang · Siwei Wang · Jiaqi Jin · Fangdi Wang · Tianrui Liu · Yueming Jin · Xinwang Liu · En Zhu · Kunlun He
|
||
SViM3D: Stable Video Material Diffusion for Single Image 3D Generation
Andreas Engelhardt · Mark Boss · Vikram Voleti · Chun-Han Yao · Hendrik Lensch · Varun Jampani
|
||
FineMotion: A Dataset and Benchmark with both Spatial and Temporal Annotation for Fine-grained Motion Generation and Editing
Bizhu Wu · Jinheng Xie · Meidan Ding · Zhe Kong · Jianfeng Ren · Ruibin Bai · Rong Qu · Linlin Shen
|
||
SALAD -- Semantics-Aware Logical Anomaly Detection
Matic Fučka · Vitjan Zavrtanik · Danijel Skocaj
|
||
Token Activation Map to Visually Explain Multimodal LLMs
Yi Li · Hualiang Wang · Xinpeng Ding · Haonan Wang · Xiaomeng Li
|
||
UIPro: Unleashing Superior Interaction Capability For GUI Agents
Hongxin Li · Jingran Su · Jingfan CHEN · Zheng Ju · Yuntao Chen · Li Qing · Zhaoxiang Zhang
|
||
Easi3R: Estimating Disentangled Motion from DUSt3R Without Training
Xingyu Chen · Yue Chen · Yuliang Xiu · Andreas Geiger · Anpei Chen
|
||
AGO: Adaptive Grounding for Open World 3D Occupancy Prediction
Peizheng Li · Shuxiao Ding · You Zhou · Qingwen Zhang · Onat Inak · Larissa Triess · Niklas Hanselmann · Marius Cordts · Andreas Zell
|
||
Generating, Fast and Slow: Scalable Parallel Video Generation with Video Interface Networks
Bhishma Dedhia · David Bourgin · Krishna Kumar Singh · Yuheng Li · Yan Kang · Zhan Xu · Niraj Jha · Yuchen Liu
|
||
PhysSplat: Efficient Physics Simulation for 3D Scenes via MLLM-Guided Gaussian Splatting
Haoyu Zhao · Hao Wang · Xingyue Zhao · Hao Fei · Hongqiu Wang · Chengjiang Long · Hua Zou
|
||
MotionCtrl: A Real-time Controllable Vision-Language-Motion Model
Bin Cao · Sipeng Zheng · Ye Wang · Lujie Xia · Qianshan Wei · Qin Jin · Jing Liu · Zongqing Lu
|
||
Know "No" Better: A Data-Driven Approach for Enhancing Negation Awareness in CLIP
Junsung Park · Jungbeom Lee · Jongyoon Song · Sangwon Yu · Dahuin Jung · Sungroh Yoon
|
||
Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program
Minghe Gao · Xuqi Liu · Zhongqi Yue · Yang Wu · Shuang Chen · Juncheng Li · Siliang Tang · Fei Wu · Tat-Seng Chua · Yueting Zhuang
|
||
NETracer: A Topology-Aware Iterative Tracing Approach for Tubular Structure Extraction
Chao Liu · Yangbo Jiang · Nenggan Zheng
|
||
PseudoMapTrainer: Learning Online Mapping without HD Maps
Christian Löwens · Thorben Funke · Jingchao Xie · Alexandru Condurache
|
||
Unsupervised Part Discovery via Descriptor-Based Masked Image Restoration with Optimized Constraints
Jiahao Xia · Yike Wu · Wenjian Huang · Jianguo Zhang · Jian Zhang
|
||
MMAT-1M: A Large CoT Dataset for Multimodal Agent Tuning
Tianhong Gao · Yannian Fu · Weiqun Wu · Haixiao Yue · Shanshan Liu · Gang Zhang
|
||
KOEnsAttack: Towards Efficient Data-Free Black-Box Adversarial Attacks via Knowledge-Orthogonalized Substitute Ensembles
Chaoyong Yang · Jia-Li Yin · Bin Chen · Zhaozhe Hu · Xiaolei Liu · Wei Lin
|
||
Cross-Architecture Distillation Made Simple with Redundancy Suppression
Weijia Zhang · Yuehao Liu · Wu Ran · Chao Ma
|
||
GReg: Geometry-Aware Region Refinement for Sign Language Video Generation
Tongkai Shi · Lianyu Hu · Fanhua Shang · Liqing Gao · Wei Feng
|
||
Resonance: Learning to Predict Social-Aware Pedestrian Trajectories as Co-Vibrations
Conghao Wong · Ziqian Zou · Beihao Xia
|
||
ACAM-KD: Adaptive and Cooperative Attention Masking for Knowledge Distillation
Qizhen Lan · Qing Tian
|
||
REPARO: Compositional 3D Assets Generation with Differentiable 3D Layout Alignment
Haonan Han · Rui Yang · Huan Liao · Jiankai Xing · Zunnan Xu · Xiaoming Yu · Junwei Zha · Xiu Li · Wanhua Li
|
||
Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle
Miroslav Purkrábek · Jiri Matas
|
||
Multi-View 3D Point Tracking
Frano Rajič · Haofei Xu · Marko Mihajlovic · Siyuan Li · Irem Demir · Emircan Gündoğdu · Lei Ke · Sergey Prokudin · Marc Pollefeys · Siyu Tang
|
||
Decoupled Multi-Predictor Optimization for Inference-Efficient Model Tuning
Liwei Luo · 帅滕远 李 · Dongwei Ren · Qilong Wang · Pengfei Zhu · Qinghua Hu
|
||
Auxiliary Prompt Tuning of Vision-Language Models for Out-of-Distribution Detection
Wenjun Miao · Guansong Pang · Zihan Wang · Jin Zheng · Xiao Bai
|
||
MMGeo: Multimodal Compositional Geo-Localization for UAVs
Yuxiang Ji · Boyong He · Zhuoyue Tan · Liaoni Wu
|
||
Evading Data Provenance in Deep Neural Networks
Hongyu Zhu · Sichu Liang · Wenwen Wang · Zhuomeng Zhang · Fangqi Li · Shi-Lin Wang
|
||
EFTViT: Efficient Federated Training of Vision Transformers with Masked Images on Resource-Constrained Clients
Meihan Wu · Tao Chang · Cui Miao · Jie Zhou · Chun Li · Xiangyu Xu · Ming Li · Xiaodong Wang
|
||
Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction
Giuseppe Cartella · Vittorio Cuculo · Alessandro D'Amelio · Marcella Cornia · Giuseppe Boccignone · Rita Cucchiara
|
||
NaviDet: Efficient Input-level Backdoor Detection on Text-to-Image Synthesis via Neuron Activation Variation
Shengfang Zhai · Jiajun Li · Yue Liu · Huanran Chen · Zhihua Tian · Wenjie Qu · Qingni Shen · Ruoxi Jia · Yinpeng Dong · Jiaheng Zhang
|
||
Not Only Vision: Evolve Visual Speech Recognition via Peripheral Information
Zhaoxin Yuan · Shuang Yang · Shiguang Shan · Xilin Chen
|
||
Wide2Long: Learning Lens Compression and Perspective Adjustment for Wide-Angle to Telephoto Translation
Soumyadipta Banerjee · Jiaul Paik · Debashis Sen
|
||
JailbreakDiffBench: A Comprehensive Benchmark for Jailbreaking Diffusion Models
Xiaolong Jin · Zixuan Weng · Hanxi Guo · Chenlong Yin · Siyuan Cheng · Guangyu Shen · Xiangyu Zhang
|
||
Online Language Splatting
Saimouli Katragadda · Cho-Ying Wu · Yuliang Guo · Xinyu Huang · Guoquan Huang · Liu Ren
|
||
Learning Counterfactually Decoupled Attention for Open-world Model Attribution
Yu Zheng · Boyang Gong · Fanye Kong · Yueqi Duan · Bingyao Yu · Wenzhao Zheng · Lei Chen · Jiwen Lu · Jie Zhou
|
||
Language Driven Occupancy Prediction
Zhu Yu · Bowen Pang · Lizhe Liu · Runmin Zhang · Qiang Li · Si-Yuan Cao · Maochun Luo · Mingxia Chen · Sheng Yang · Hui-liang Shen
|
||
Deterministic Object Pose Confidence Region Estimation
Jinghao Wang · Zhang Li · Zi Wang · Banglei Guan · Yang Shang · Qifeng Yu
|
||
Hierarchical Event Memory for Accurate and Low-latency Online Video Temporal Grounding
Minghang Zheng · Yuxin Peng · Benyuan Sun · Yi Yang · Yang Liu
|
||
EDM: Efficient Deep Feature Matching
Xi Li · Tong Rao · Cihui Pan
|
||
Bridging Diffusion Models and 3D Representations: A 3D Consistent Super-Resolution Framework
Yi-Ting Chen · Ting-Hsuan Liao · Pengsheng Guo · Alex Schwing · Jia-Bin Huang
|
||
StreamGS: Online Generalizable Gaussian Splatting Reconstruction for Unposed Image Streams
Yang LI · Jinglu Wang · Lei Chu · Xiao Li · Shiu-hong Kao · Ying-Cong Chen · Yan Lu
|
||
BASIC: Boosting Visual Alignment with Intrinsic Refined Embeddings in Multimodal Large Language Models
Jianting Tang · Yubo Wang · Haoyu Cao · Linli Xu
|
||
Multi-turn Consistent Image Editing
Zijun Zhou · Yingying Deng · Xiangyu He · Weiming Dong · Fan Tang
|
||
TeRA: Rethinking Text-driven Realistic 3D Avatar Generation
Yanwen Wang · Yiyu Zhuang · Jiawei Zhang · Li Wang · Yifei Zeng · Xun Cao · Xinxin Zuo · Hao Zhu
|
||
Dynamic Reconstruction of Hand-Object Interaction with Distributed Force-aware Contact Representation
Zhenjun Yu · Wenqiang Xu · Pengfei Xie · Yutong Li · Brian Anthony · Zhuorui Zhang · Cewu Lu
|
||
Penalizing Boundary Activation for Object Completeness in Diffusion Models
Haoyang Xu · Tianhao Zhao · Sibei Yang · Yutian Lin
|
||
Dual-level Prototype Learning for Composite Degraded Image Restoration
Zhongze Wang · Haitao Zhao · Lujian Yao · Jingchao Peng · Kaijie Zhao
|
||
Efficient Fine-Tuning of Large Models via Nested Low-Rank Adaptation
Lujun Li · Cheng Lin · Dezhi Li · You-Liang Huang · Wei Li · Tianyu Wu · Jie Zou · Wei Xue · Sirui Han · Yike Guo
|
||
ATLAS: Decoupling Skeletal and Shape Parameters for Expressive Parametric Human Modeling
Jinhyung Park · Javier Romero · Shunsuke Saito · Fabian Prada · Takaaki Shiratori · Yichen Xu · Federica Bogo · Shoou-I Yu · Kris Kitani · Rawal Khirodkar
|
||
Efficient Spiking Point Mamba for Point Cloud Analysis
Peixi Wu · Bosong Chai · Menghua Zheng · Wei Li · Zhangchi Hu · Jie Chen · Zheyu Zhang · Hebei Li · Xiaoyan Sun
|
||
Dream-to-Real: Leveraging Image Generation for Single-View Volumetric Reconstruction
Philipp Wulff · Felix Wimbauer · Dominik Muhle · Daniel Cremers
|
||
InstaScene: Towards Complete 3D Instance Decomposition and Reconstruction from Cluttered Scenes
Zesong Yang · Bangbang Yang · Wenqi Dong · Chenxuan Cao · Liyuan Cui · Yuewen Ma · Zhaopeng Cui · Hujun Bao
|
||
Frequency-Guided Posterior Sampling for Diffusion-Based Image Restoration
Darshan Thaker · Abhishek Goyal · Rene Vidal
|
||
Detect Anything 3D in the Wild
Hanxue Zhang · Haoran Jiang · Qingsong Yao · Yanan SUN · Renrui Zhang · Hao Zhao · Hongyang Li · Hongzi Zhu · Zetong Yang
|
||
Gradient Short-Circuit: Efficient Out-of-Distribution Detection via Feature Intervention
Jiawei Gu · Ziyue Qiao · Zechao Li
|
||
Lidar Waveforms are Worth 40x128x33 Words
Dominik Scheuble · Hanno Holzhüter · Steven Peters · Mario Bijelic · Felix Heide
|
||
LiON-LoRA: Rethinking LoRA Fusion to Unify Controllable Spatial and Temporal Generation for Video Diffusion
Yisu Zhang · Chenjie Cao · Chaohui Yu · Jianke Zhu
|
||
EMD: Explicit Motion Modeling for High-Quality Street Gaussian Splatting
Xiaobao Wei · Qingpo Wuwu · Zhongyu Zhao · Zhuangzhe Wu · Nan Huang · Ming Lu · Ningning Ma · Shanghang Zhang
|
||
Latent Expression Generation for Referring Image Segmentation and Grounding
Seonghoon Yu · Junbeom Hong · Joonseok Lee · Jeany Son
|
||
RadarSplat: Radar Gaussian Splatting for High-Fidelity Data Synthesis and 3D Reconstruction of Autonomous Driving Scenes
Pou-Chun Kung · Skanda Harisha · Ram Vasudevan · Aline Eid · Katherine A. Skinner
|
||
DOLLAR: Few-Step Video Generation via Distillation and Latent Reward Optimization
Zihan Ding · Chi Jin · Difan Liu · Haitian Zheng · Krishna Kumar Singh · Qiang Zhang · Yan Kang · Zhe Lin · Yuchen Liu
|
||
HouseTour: A Virtual Real Estate A(I)gent
Ata Çelen · Iro Armeni · Daniel Barath · Marc Pollefeys
|
||
DexH2R: A Benchmark for Dynamic Dexterous Grasping in Human-to-Robot Handover
Youzhuo Wang · Jiayi Ye · Chuyang Xiao · Yiming Zhong · Heng Tao · Hang Yu · Yumeng Liu · Jingyi Yu · Yuexin Ma
|
||
Axis-level Symmetry Detection with Group-Equivariant Representation
Wongyun Yu · Ahyun Seo · Minsu Cho
|
||
Hi3DGen: High-fidelity 3D Geometry Generation from Images via Normal Bridging
Chongjie Ye · Yushuang Wu · Ziteng Lu · Jiahao Chang · Xiaoyang Guo · Jiaqing Zhou · Hao Zhao · Xiaoguang Han
|
||
Harnessing Uncertainty-aware Bounding Boxes for Unsupervised 3D Object Detection
Ruiyang Zhang · Hu Zhang · Zhedong Zheng
|
||
Benchmarking Multimodal Large Language Models Against Image Corruptions
Xinkuan Qiu · Meina Kan · Yongbin Zhou · Shiguang Shan
|
||
FastPoint: Accelerating 3D Point Cloud Model Inference via Sample Point Distance Prediction
Donghyun Lee · Dawoon Jeong · Jae W. Lee · Hongil Yoon
|
||
Differential-informed Sample Selection Accelerates Multimodal Contrastive Learning
Zihua Zhao · Feng Hong · Mengxi Chen · Pengyi Chen · Benyuan Liu · Jiangchao Yao · Ya Zhang · Yanfeng Wang
|
||
CObL: Toward Zero-Shot Ordinal Layering without User Prompting
Aneel Damaraju · Dean Hazineh · Todd Zickler
|
||
DCT-Shield: A Robust Frequency Domain Defense against Malicious Image Editing
Aniruddha Bala · Rohit Chowdhury · Rohan Jaiswal · Siddharth Roheda
|
||
CaO$_2$: Rectifying Inconsistencies in Diffusion-Based Dataset Distillation
Haoxuan Wang · Zhenghao Zhao · Junyi Wu · Yuzhang Shang · Gaowen Liu · Yan Yan
|
||
Causal-Entity Reflected Egocentric Traffic Accident Video Synthesis
Lei-lei Li · Jianwu Fang · Junbin Xiao · Shanmin Pang · Hongkai Yu · Chen Lv · Jianru Xue · Tat-Seng Chua
|
||
InterGSEdit: Interactive 3D Gaussian Splatting Editing with 3D Geometry-Consistent Attention Prior
Minghao Wen · Shengjie Wu · Kangkan Wang · Dong Liang
|
||
X-Dancer: Expressive Music to Human Dance Video Generation
Zeyuan Chen · Hongyi Xu · Guoxian Song · You Xie · Chenxu Zhang · Xin Chen · Chao Wang · Di Chang · Linjie Luo
|
||
ZipVL: Accelerating Vision-Language Models through Dynamic Token Sparsity
Yefei He · Feng Chen · Jing Liu · Wenqi Shao · Hong Zhou · Kaipeng Zhang · Bohan Zhuang
|
||
Vector Contrastive Learning For Pixel-Wise Pre-Training In Medical Vision
Yuting He · Shuo Li
|
||
AG$^2$aussian: Anchor-Graph Structured Gaussian Splatting for Instance-Level 3D Scene Understanding and Editing
Zhaonan Wang · Manyi Li · Changhe Tu
|
||
OpenM3D: Open Vocabulary Multi-view Indoor 3D Object Detection without Human Annotations
Peng-Hao Hsu · Ke Zhang · Fu-En Wang · Tao Tu · Ming-Feng Li · Yu-Lun Liu · Albert Y. C. Chen · Min Sun · Cheng-Hao Kuo
|
||
Asynchronous Event Error-Minimizing Noise for Safeguarding Event Dataset
Ruofei WANG · Peiqi Duan · Boxin Shi · Renjie Wan
|
||
IDFace: Face Template Protection for Efficient and Secure Identification
Sunpill Kim · Seunghun Paik · Chanwoo Hwang · Dongsoo Kim · Junbum Shin · Jae Hong Seo
|
||
MATE: Motion-Augmented Temporal Consistency for Event-based Point Tracking
Han Han · Wei Zhai · Yang Cao · Bin Li · Zheng-Jun Zha
|
||
EVOLVE: Event-Guided Deformable Feature Transfer and Dual-Memory Refinement for Low-Light Video Object Segmentation
Jong Hyeon Baek · Jiwon Oh · Yeong Jun Koh
|
||
AdaHuman: Animatable Detailed 3D Human Generation with Compositional Multiview Diffusion
Yangyi Huang · Ye Yuan · Xueting Li · Jan Kautz · Umar Iqbal
|
||
CAD-Assistant: Tool-Augmented VLLMs as Generic CAD Task Solvers
Dimitrios Mallis · Ahmet Karadeniz · Sebastian Cavada · Danila Rukhovich · Niki Foteinopoulou · Kseniya Cherenkova · Anis Kacem · Djamila Aouada
|
||
HAMoBE: Hierarchical and Adaptive Mixture of Biometric Experts for Video-based Person ReID
Yiyang Su · Yunping Shi · Feng Liu · Xiaoming Liu
|
||
GenHaze: Pioneering Controllable One-Step Realistic Haze Generation for Real-World Dehazing
Sixiang Chen · Tian Ye · Yunlong Lin · Yeying Jin · Yijun Yang · Haoyu Chen · Jianyu Lai · Song Fei · Zhaohu Xing · Fugee Tsung · Lei Zhu
|
||
FairHuman: Boosting Hand and Face Quality in Human Image Generation with Minimum Potential Delay Fairness in Diffusion Models
Yuxuan Wang · Tianwei Cao · Huayu Zhang · Zhongjiang He · Kongming Liang · Zhanyu Ma
|
||
MamTiff-CAD: Multi-Scale Latent Diffusion with Mamba+ for Complex Parametric Sequence
Liyuan Deng · Yunpeng Bai · Yongkang Dai · Xiaoshui Huang · Hongping Gan · Dongshuo Huang · Hao Jiacheng · Yilei Shi
|
||
Balancing Task-invariant Interaction and Task-specific Adaptation for Unified Image Fusion
Xingyu Hu · Junjun Jiang · Chenyang Wang · Kui Jiang · Xianming Liu · Jiayi Ma
|
||
Unsupervised Imaging Inverse Problems with Diffusion Distribution Matching
Giacomo Meanti · Thomas Ryckeboer · Michael Arbel · Julien Mairal
|
||
DictAS: A Framework for Class-Generalizable Few-Shot Anomaly Segmentation via Dictionary Lookup
Zhen Qu · Xian Tao · Xinyi Gong · ShiChen Qu · Xiaopei Zhang · Xingang Wang · Fei Shen · Zhengtao Zhang · Mukesh Prasad · Guiguang Ding
|
||
Event-based Visual Vibrometry
Xinyu Zhou · Peiqi Duan · Yeliduosi Xiaokaiti · Chao Xu · Boxin Shi
|
||
Addressing Representation Collapse in Vector Quantized Models with One Linear Layer
Yongxin Zhu · Bocheng Li · Yifei Xin · Zhihua Xia · Linli Xu
|
||
Tiling artifacts and trade-offs of feature normalization in the segmentation of large biological images
Elena Buglakova · Anwai Archit · Edoardo D'Imprima · Julia Mahamid · Constantin Pape · Anna Kreshuk
|
||
Always skip connection
Yiping Ji · Hemanth Saratchandran · Peyman Moghadam · Simon Lucey
|
||
VIGFace: Virtual Identity Generation for Privacy-Free Face Recognition
Minsoo Kim · Min-Cheol Sagong · Gi Nam · Junghyun Cho · Ig-Jae Kim
|
||
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Decoupled Video Diffusion
Wenqiang Sun · Shuo Chen · Fangfu Liu · Zilong Chen · Yueqi Duan · Jun Zhu · Jun Zhang · Yikai Wang
|
||
MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs
Erik Daxberger · Nina Wenzel · David Griffiths · Haiming Gang · Justin Lazarow · Gefen Kohavi · Kai Kang · Marcin Eichner · Yinfei Yang · Afshin Dehghan · Peter Grasch
|
||
VisionMath: Vision-Form Mathematical Problem-Solving
Zongyang Ma · Yuxin Chen · Ziqi Zhang · Zhongang Qi · Chunfeng Yuan · Shaojie Zhu · Chengxiang Zhuo · Bing Li · Ye Liu · Zang Li · Ying Shan · Weiming Hu
|
||
TaxaDiffusion: Progressively Trained Diffusion Model for Fine-Grained Species Generation
Amin Karimi Monsefi · Mridul Khurana · Rajiv Ramnath · Anuj Karpatne · Wei-Lun Chao · Cheng Zhang
|
||
Bridging Domain Generalization to Multimodal Domain Generalization via Unified Representations
Hai Huang · Yan Xia · Sashuai Zhou · Hanting Wang · Shulei Wang · Zhou Zhao
|
||
InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models
Yifan Lu · Xuanchi Ren · Jiawei Yang · Tianchang Shen · Jay Wu · Jun Gao · Yue Wang · Siheng Chen · Mike Chen · Sanja Fidler · Jiahui Huang
|
||
Task-Specific Zero-shot Quantization-Aware Training for Object Detection
Changhao Li · Xinrui Chen · Ji Wang · Kang Zhao · Jianfei Chen
|
||
Can Generative Geospatial Diffusion Models Excel as Discriminative Geospatial Foundation Models?
Yuru Jia · Valerio Marsocci · Ziyang Gong · Xue Yang · Maarten Vergauwen · Andrea Nascetti
|
||
Test-Time Retrieval-Augmented Adaptation for Vision-Language Models
Xinqi Fan · Xueli CHEN · Luoxiao Yang · Chuin Hong Yap · Rizwan Qureshi · Qi Dou · Moi Hoon Yap · Mubarak Shah
|
||
Weakly Supervised Dynamic Scene Graph Generation with Temporal-enhanced In-domain Knowledge Transferring
Zhu Xu · Ting Lei · Zhimin Li · Guan Wang · Qingchao Chen · Yuxin Peng · Yang Liu
|
||
Is CLIP ideal? No. Can we fix it? Yes!
Raphi Kang · Yue Song · Georgia Gkioxari · Pietro Perona
|
||
RayletDF: Raylet Distance Fields for Generalizable 3D Surface Reconstruction from Point Clouds or Gaussians
Shenxing Wei · Jinxi Li · Yafei YANG · Siyuan Zhou · Bo Yang
|
||
Oasis: One Image is All You Need for Multimodal Instruction Data Synthesis
Letian Zhang · Quan Cui · Bingchen Zhao · Cheng Yang
|
||
TITAN: Query-Token based Domain Adaptive Adversarial Learning
Tajamul Ashraf · Janibul Bashir
|
||
MobileViCLIP: An Efficient Video-Text Model for Mobile Devices
Min Yang · Zihan Jia · Zhilin Dai · Sheng Guo · Limin Wang
|
||
M$^2$EIT: Multi-Domain Mixture of Experts for Robust Neural Inertial Tracking
Yan Li · Yang Xu · Changhao Chen · Zhongchen Shi · Wei Chen · Liang Xie · Hongbo Chen · Erwei Yin
|
||
Mind the Gap: Aligning Vision Foundation Models to Image Feature Matching
Yuhan Liu · Jingwen Fu · Yang Wu · Kangyi Wu · Pengna Li · Jiayi Wu · Sanping Zhou · Jingmin Xin
|
||
CoST: Efficient Collaborative Perception From Unified Spatiotemporal Perspective
Zongheng Tang · Yi Liu · Yifan Sun · Yulu Gao · Jinyu Chen · Runsheng Xu · Si Liu
|
||
Selective Contrastive Learning for Weakly Supervised Affordance Grounding
WonJun Moon · Hyun Seok Seong · Jae-Pil Heo
|
||
SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models
Xianfu Cheng · Wei Zhang · Shiwei Zhang · Jian Yang · Xiangyuan Guan · Xianjie Wu · Xiang Li · Ge Zhang · Jiaheng Liu · Yuying Mai · Yutao Zeng · Zhoufutu Wen · JinKe JinKe · Baorui Wang · Weixiao Zhou · Lu Yunhong · Hangyuan Ji · Tongliang Li · Wenhao Huang · Zhoujun Li
|
||
Referring to Any Person
Qing Jiang · Lin Wu · Zhaoyang Zeng · Tianhe Ren · Yuda Xiong · Yihao Chen · Liu Qin · Lei Zhang
|
||
SPADE: Spatial-Aware Denoising Network for Open-vocabulary Panoptic Scene Graph Generation with Long- and Local-range Context Reasoning
XIN Hu · Ke Qin · Guiduo Duan · Ming Li · Yuan-Fang Li · Tao He
|
||
Towards Explicit Exoskeleton for the Reconstruction of Complicated 3D Human Avatars
Yifan Zhan · Qingtian Zhu · Muyao Niu · Mingze Ma · Jiancheng Zhao · Zhihang Zhong · Xiao Sun · Yu Qiao · Yinqiang Zheng
|
||
A Structure-aware and Motion-adaptive Framework for 3D Human Pose Estimation with Mamba
Ye Lu · Jie Wang · Jianjun Gao · Rui Gong · Chen Cai · Kim-Hui Yap
|
||
Embodied Representation Alignment with Mirror Neurons
Wentao Zhu · Zhining Zhang · Yuwei Ren · Yin Huang · Hao Xu · Yizhou Wang
|
||
PartField: Learning 3D Feature Fields for Part Segmentation and Beyond
Minghua Liu · Mikaela Uy · Donglai Xiang · Hao Su · Sanja Fidler · Nicholas Sharp · Jun Gao
|
||
Seal Your Backdoor with Variational Defense
Ivan Sabolic · Matej Grcic · Siniša Šegvić
|
||
SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining
Yue Li · Qi Ma · Runyi Yang · Huapeng Li · Mengjiao Ma · Bin Ren · Nikola Popovic · Nicu Sebe · Ender Konukoglu · Theo Gevers · Luc Gool · Martin Oswald · Danda Pani Paudel
|
||
Multi-View Slot Attention Using Paraphrased Texts for Face Anti-Spoofing
Jeongmin Yu · Susang Kim · Kisu Lee · Taekyoung Kwon · Won-Yong Shin · Ha Young Kim
|
||
Semi-supervised Concept Bottleneck Models
Lijie Hu · Tianhao Huang · Huanyi Xie · Xilin Gong · Chenyang Ren · Zhengyu Hu · Lu Yu · Ping Ma · Di Wang
|
||
Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data
Zeyi Sun · Tong Wu · Pan Zhang · Yuhang Zang · Xiaoyi Dong · Yuanjun Xiong · Dahua Lin · Jiaqi Wang
|
||
MMReason: An Open-Ended Multi-Modal Multi-Step Reasoning Benchmark for MLLMs Toward AGI
Huanjin Yao · Jiaxing Huang · Yawen Qiu · Michael K. Chen · Wenzheng Liu · Wei Zhang · wenjie zeng · Xikun ZHANG · Jingyi Zhang · YuXin Song · Wenhao Wu · Dacheng Tao
|
||
Mamba-3VL: Taming State Space Model for 3D Vision Language Learning
Yuan Wang · Yuxin Chen · Zhongang Qi · Lijun Liu · Jile Jiao · Xuetao Feng · Yujia Liang · Ying Shan · Zhipeng Zhang
|
||
FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction
Jiale Xu · Shenghua Gao · Ying Shan
|
||
Distilling Parallel Gradients for Fast ODE Solvers of Diffusion Models
Beier Zhu · Ruoyu Wang · Tong Zhao · Hanwang Zhang · Chi Zhang
|
||
G2PDiffusion: Cross-species Genotype-to-Phenotype Prediction via Evolutionary Diffusion
Mengdi Liu · Zhangyang Gao · Hong Chang · Stan Li · Shiguang Shan · Xilin Chen
|
||
Open-Unfairness Adversarial Mitigation for Generalized Deepfake Detection
Zhaoyang Li · Zhu Teng · Baopeng Zhang · Jianping Fan
|
||
Improving Multimodal Learning via Imbalanced Learning
Shicai Wei · Chunbo Luo · Yang Luo
|
||
Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences
Hyojin Bahng · Caroline Chan · Fredo Durand · Phillip Isola
|
||
TRNAS: A Training-Free Robust Neural Architecture Search
Yeming Yang · Qingling Zhu · Jianping Luo · Ka-Chun Wong · Qiuzhen Lin · Jianqiang Li
|
||
ForgeLens: Data-Efficient Forgery Focus for Generalizable Forgery Image Detection
Yingjian Chen · Lei Zhang · Yakun Niu
|
||
LATINO-PRO: LAtent consisTency INverse sOlver with PRompt Optimization
Alessio Spagnoletti · Jean Prost · Andres Almansa · Nicolas Papadakis · Marcelo Pereyra
|
||
Removing Out-of-Focus Reflective Flares via Color Alignment
Fengbo Lan · Chang Wen Chen
|
||
VA-MoE: Variables-Adaptive Mixture of Experts for Incremental Weather Forecasting
Hao Chen · Tao Han · Song Guo · Jie ZHANG · Yonghan Dong · Yunlong Yu · LEI BAI
|
||
Federated domain generalization with domain-specific soft prompts generation
Jianhan Wu · Xiaoyang Qu · Zhangcheng Huang · Jianzong Wang
|
||
VQ-SGen: A Vector Quantized Stroke Representation for Creative Sketch Generation
Jiawei Wang · Zhiming Cui · Changjian Li
|
||
High-Precision 3D Measurement of Complex Textured Surfaces Using Multiple Filtering Approach
Yuchong Chen · Jian Yu · Shaoyan Gai · Zeyu Cai · Feipeng Da
|
||
A Unified Interpretation of Training-Time Out-of-Distribution Detection
Xu Cheng · Xin Jiang · Zechao Li
|
||
Dual-Expert Consistency Model for Efficient and High-Quality Video Generation
Zhengyao Lyu · Chenyang Si · Tianlin Pan · Zhaoxi Chen · Kwan-Yee K. Wong · Yu Qiao · Ziwei Liu
|
||
Towards Comprehensive Lecture Slides Understanding: Large-scale Dataset and Effective Method
Yuzhe Li · Yuzhe Li · Yuliang Liu · Yingying Zhu · Xiang Bai
|
||
Spatial Alignment and Temporal Matching Adapter for Video-Radar Remote Physiological Measurement
Qian Liang · Ruixu Geng · Jinbo Chen · Haoyu Wang · Yan Chen · Yang Hu
|
||
HERO: Human Reaction Generation from Videos
Chengjun Yu · Wei Zhai · Yuhang Yang · Yang Cao · Zheng-Jun Zha
|
||
Hipandas: Hyperspectral Image Joint Denoising and Super-Resolution by Image Fusion with the Panchromatic Image
Shuang Xu · Zixiang Zhao · Haowen Bai · Chang Yu · Jiangjun Peng · Xiangyong Cao · Deyu Meng
|
||
Trace3D: Consistent Segmentation Lifting via Gaussian Instance Tracing
Hongyu Shen · Junfeng Ni · Weishuo Li · Mingtao Pei · Yixin Chen · Siyuan Huang
|
||
AgroBench: Vision-Language Model Benchmark in Agriculture
Risa Shinoda · Nakamasa Inoue · Hirokatsu Kataoka · Masaki Onishi · Yoshitaka Ushiku
|
||
Text2VDM: Text to Vector Displacement Maps for Expressive and Interactive 3D Sculpting
Hengyu Meng · Duotun Wang · Zhijing Shao · Ligang Liu · Zeyu Wang
|
||
CAP: Evaluation of Persuasive and Creative Image Generation
Aysan Aghazadeh · Adriana Kovashka
|
||
How Would It Sound? Material-Controlled Multimodal Acoustic Profile Generation for Indoor Scenes
Mahnoor Saad · Ziad Al-Halah
|
||
RANKCLIP: Ranking-Consistent Language-Image Pretraining
Yiming Zhang · Zhuokai Zhao · Zhaorun Chen · Zhili Feng · Zenghui Ding · Yining Sun
|
||
Learning 3D Scene Analogies with Neural Contextual Scene Maps
Junho Kim · Gwangtak Bae · Eun Sun Lee · Young Kim Kim
|
||
Analytic Subspace Routing: How Recursive Least Squares Works in Continual Learning of Large Language Model
Kai Tong · Kang Pan · Xiao Zhang · Erli Meng · Run He · Yawen Cui · Nuoyan Guo · Huiping Zhuang
|
||
Efficient Concertormer for Image Deblurring and Beyond
Pin-Hung Kuo · Jinshan Pan · Shao-Yi Chien · Ming-Hsuan Yang
|
||
Diffusion Image Prior
Hamadi Chihaoui · Paolo Favaro
|
||
PerLDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Model
Jinhua Zhang · Hualian Sheng · Sijia Cai · Bing Deng · Qiao Liang · Wen Li · Ying Fu · Jieping Ye · Shuhang Gu
|
||
MOBIUS: Big-to-Mobile Universal Image Segmentation via Multi-modal Bottleneck Fusion and Calibrated Decoder Pruning
Mattia Segu · Marta Tintore Gazulla · Yongqin Xian · Luc Gool · Federico Tombari
|
||
Augmented Mass-Spring Model for Real-Time Dense Hair Simulation
Jorge Herrera · Yi Zhou · Xin Sun · Zhixin Shu · Chengan He · Soren Pirk · Dominik Michels
|
||
CLIPSym: Delving into Symmetry Detection with CLIP
Tinghan Yang · Md Ashiqur Rahman · Raymond Yeh
|
||
FIND: Few-Shot Anomaly Inspection with Normal-Only Multi-Modal Data
YITING LI · Fayao Liu · Jingyi Liao · Sichao Tian · Chuan-Sheng Foo · Xulei Yang
|
||
Domain Generalizable Portrait Style Transfer
Xinbo Wang · Wenju Xu · Qing Zhang · Wei-Shi Zheng
|
||
BézierGS: Dynamic Urban Scene Reconstruction with Bézier Curve Gaussian Splatting
Zipei Ma · Junzhe Jiang · Yurui Chen · Li Zhang
|
||
AFFECT: Aligning Fisheye Feature Embeddings using Calibration Tokens for Monocular Depth Estimation
Suchisrit Gangopadhyay · Jung Hee Kim · Xien Chen · Patrick Rim · Hyoungseob Park · Alex Wong
|
||
DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning
Ruowen Zhao · James Jun Liang Chen Ye · Zhengyi Wang · Guangce Liu · Yiwen Chen · Yikai Wang · Jun Zhu
|
||
AFUNet: Cross-Iterative Alignment-Fusion Synergy for HDR Reconstruction via Deep Unfolding Paradigm
Xinyue Li · Zhangkai Ni · Wenhan Yang
|
||
Moment Quantization for Video Temporal Grounding
Xiaolong Sun · Le Wang · Sanping Zhou · Liushuai Shi · Kun Xia · Mengnan Liu · Yabing Wang · Gang Hua
|
||
Dark-ISP: Enhancing RAW Image Processing for Low-Light Object Detection
Jiasheng Guo · Xin Gao · Yuxiang Yan · Guanghao Li · Jian Pu
|
||
Similarity Memory Prior is All You Need for Medical Image Segmentation
Hao Tang · Zhiqing Guo · Liejun Wang · Chao Liu
|
||
Corvid: Improving Multimodal Large Language Models Towards Chain-of-Thought Reasoning
JINGJING JIANG · Chao Ma · Xurui Song · Hanwang Zhang · Jun Luo
|
||
OVG-HQ: Online Video Grounding with Hybrid-modal Queries
Runhao Zeng · Jiaqi Mao · Minghao Lai · Vu Phan · Yanjie Dong · Wei Wang · Qi Chen · Xiping Hu
|
||
MagicID: Hybrid Preference Optimization for ID-Consistent and Dynamic-Preserved Video Customization
Hengjia Li · Lifan Jiang · Xi Xiao · Tianyang Wang · Hongwei Yi · Boxi Wu · Deng Cai
|
||
Enhancing Spatial Reasoning in Multimodal Large Language Models through Reasoning-based Segmentation
Zhenhua Ning · Zhuotao Tian · Shaoshuai Shi · Daojing He · Guangming Lu · Wenjie Pei · Li Jiang
|
||
Few-Shot Pattern Detection via Template Matching and Regression
Eunchan Jo · Dahyun Kang · Sanghyun Kim · Yunseon Choi · Minsu Cho
|
||
MistSense: Versatile Online Detection of Procedural and Execution Mistakes
Constantin Patsch · Yuankai Wu · Marsil Zakour · Driton Salihu · Eckehard Steinbach
|
||
Im2Haircut: Single-view Strand-based Hair Reconstruction for Human Avatars
Vanessa Skliarova · Egor Zakharov · Malte Prinzler · Giorgio Becherini · Michael Black · Justus Thies
|
||
NAPPure: Adversarial Purification for Robust Image Classification under Non-Additive Perturbations
Junjie Nan · Jianing Li · Wei Chen · Mingkun Zhang · Xueqi Cheng
|
||
DuET: Dual Incremental Object Detection via Exemplar-Free Task Arithmetic
Munish Monga · Vishal Chudasama · Pankaj Wasnik · Biplab Banerjee
|
||
Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs
Jeongseok Hyun · Minho Shim · Sukjun Hwang · Su Ho Han · Taeoh Kim · Inwoong Lee · Dongyoon Wee · Joon-Young Lee · Seon Joo Kim
|
||
Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations
Jeong Hun Yeo · Minsu Kim · Chae Won Kim · Stavros Petridis · Yong Man Ro
|
||
YOLO-Count: Differentiable Object Counting for Text-to-Image Generation
Guanning Zeng · Xiang Zhang · Zirui Wang · Haiyang Xu · Zeyuan Chen · Bingnan Li · Zhuowen Tu
|
||
Leveraging Prior Knowledge of Diffusion Model for Person Search
Giyeol Kim · Sooyoung Yang · Jihyong Oh · Myungjoo Kang · Chanho Eom
|
||
RAGD: Regional-Aware Diffusion Model for Text-to-Image Generation
Chen Zhennan · Yajie Li · Haofan Wang · Zhibo Chen · Zhengkai Jiang · Jun Li · Qian Wang · Jian Yang · Ying Tai
|
||
Diffusion-Based Extreme High-speed Scenes Reconstruction with the Complementary Vision Sensor
Yapeng Meng · Yihan Lin · Taoyi Wang · Yuguo Chen · Lijian Wang · Rong Zhao
|
||
Automated Red Teaming for Text-to-Image Models through Feedback-Guided Prompt Iteration with Vision-Language Models
Wei Xu · Kangjie Chen · Jiawei Qiu · Yuyang zhang · Run Wang · Jin Mao · Tianwei Zhang · Lina Wang
|
||
ShortV: Freezing Visual Tokens in Ineffective Layers of Multimodal Large Language Models
Qianhao Yuan · Qingyu Zhang · yanjiang liu · Jiawei Chen · Yaojie Lu · Hongyu Lin · Jia Zheng · Xianpei Han · Le Sun
|
||
Hierarchical Visual Prompt Learning for Continual Video Instance Segmentation
Jiahua Dong · Hui Yin · Wenqi Liang · Hanbin Zhao · Henghui Ding · Nicu Sebe · Salman Khan · Fahad Khan
|
||
MPBR: Multimodal Progressive Bidirectional Reasoning for Open-Set Fine-Grained Recognition
Junfu Tan · Peiguang Jing · Yu Zhu · YU LIU
|
||
SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs
Jiahui Wang · Zuyan Liu · Yongming Rao · Jiwen Lu
|
||
3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding
Tatiana Zemskova · Dmitry Yudin
|
||
Synergistic Prompting for Robust Visual Recognition with Missing Modalities
Zhihui Zhang · Luanyuan Dai · Qika Lin · Yunfeng Diao · Guangyin Jin · Yufei Guo · Jing Zhang · Xiaoshuai Hao
|
||
PixTalk: Controlling Photorealistic Image Processing and Editing with Language
Marcos Conde · Zihao Lu · Radu Timofte
|
||
Pseudo-SD: Pseudo Controlled Stable Diffusion for Semi-Supervised and Cross-Domain Semantic Segmentation
Dong Zhao · Qi Zang · Shuang Wang · Nicu Sebe · Zhun Zhong
|
||
Large Scene Generation with Cube-Absorb Discrete Diffusion
Qianjiang Hu · Wei Hu
|
||
Stable Score Distillation
Haiming Zhu · Yangyang Xu · Chenshu Xu · Tingrui Shen · Wenxi Liu · Yong Du · Jun Yu · Shengfeng He
|
||
SHIFT: Smoothing Hallucinations by Information Flow Tuning for Multimodal Large Language Models
Sudong Wang · Yunjian Zhang · Yao Zhu · Enci Liu · Jianing Li · Yanwei Liu · Xiangyang Ji
|
||
SITE: towards Spatial Intelligence Thorough Evaluation
Wenqi Wang · Reuben Tan · Pengyue Zhu · Jianwei Yang · Zhengyuan Yang · Lijuan Wang · Andrey Kolobov · Jianfeng Gao · Boqing Gong
|
||
Teaching AI the Anatomy Behind the Scan: Addressing Anatomical Flaws in Medical Image Segmentation with Learnable Prior
YoungSeok Jeon · Hongfei Yang · Huazhu Fu · Mengling Feng
|
||
UST-SSM: Unified Spatio-Temporal State Space Models for Point Cloud Video Modeling
Peiming Li · Ziyi Wang · Yulin Yuan · Hong Liu · Xiangming Meng · Junsong Yuan · Mengyuan Liu
|
||
VSC: Visual Search Compositional Text-to-Image Diffusion Model
Do Dat · Nam Hyeon-Woo · Po-Yuan Mao · Tae-Hyun Oh
|
||
Staining and locking computer vision models without retraining
Oliver Sutton · Qinghua Zhou · George Leete · Alexander Gorban · Ivan Tyukin
|
||
Leveraging Debiased Cross-modal Attention Maps and Code-based Reasoning for Zero-shot Referring Expression Comprehension
Juntao Chen · Wen Shen · Zhihua Wei · Lijun Sun · Hongyun Zhang
|
||
Principles of Visual Tokens for Efficient Video Understanding
Xinyue Hao · Gen Li · Shreyank Gowda · Robert Fisher · Jonathan Huang · Anurag Arnab · Laura Sevilla-Lara
|
||
Enhancing Mamba Decoder with Bidirectional Interaction in Multi-Task Dense Prediction
Mang Cao · Sanping Zhou · Yizhe Li · Ye Deng · Wenli Huang · Le Wang
|
||
Efficient Track Anything
Yunyang Xiong · Chong Zhou · Xiaoyu Xiang · Lemeng Wu · Chenchen Zhu · Zechun Liu · Saksham Suri · Balakrishnan Varadarajan · Ramya Akula · Forrest Iandola · Raghuraman Krishnamoorthi · Bilge Soran · Vikas Chandra
|
||
VIPerson: Flexibly Generating Virtual Identity for Person Re-Identification
Xiao-Wen Zhang · Delong Zhang · Yi-Xing Peng · Zhi Ouyang · Jingke Meng · Wei-Shi Zheng
|
||
EDFFDNet: Towards Accurate and Efficient Unsupervised Multi-Grid Image Registration
Haokai Zhu · Bo Qu · Si-Yuan Cao · Runmin Zhang · Shujie Chen · Bailin Yang · Hui-liang Shen
|
||
Diffusion Guided Adaptive Augmentation for Generalization in Visual Reinforcement Learning
Jeong Lee · Hyoseok Hwang
|
||
LMM-Det: Make Large Multimodal Models Excel in Object Detection
Jincheng Li · Chunyu Xie · Ji Ao · Dawei Leng · Yuhui Yin
|
||
ART: Adaptive Relation Tuning for Generalized Relation Detection
Gopika Sudhakaran · Hikaru Shindo · Patrick Schramowski · Simone Schaub-Meyer · Kristian Kersting · Stefan Roth
|
||
Learning to Inference Adaptively for Multimodal Large Language Models
Zhuoyan Xu · Khoi Nguyen · Preeti Mukherjee · Saurabh Bagchi · Somali Chaterji · Yingyu Liang · Yin Li
|
||
CVFusion: Cross-View Fusion of 4D Radar and Camera for 3D Object Detection
Hanzhi Zhong · Zhiyu Xiang · Ruoyu Xu · Jingyun Fu · Peng Xu · Shaohong Wang · Zhihao Zhihao · Tianyu Pu · Eryun Liu
|
||
AIM: Adaptive Inference of Multi-modal LLMs via Token Merging and Pruning
Yiwu Zhong · Zhuoming Liu · Yin Li · Liwei Wang
|
||
VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation
Shoubin Yu · Difan Liu · Ziqiao Ma · Yicong Hong · Yang Zhou · Hao Tan · Joyce Chai · Mohit Bansal
|
||
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition
Zhisheng Zhong · Chengyao Wang · Yuqi Liu · Senqiao Yang · Longxiang Tang · Yuechen Zhang · Jingyao Li · Tianyuan Qu · Yanwei Li · Yukang Chen · Shaozuo Yu · WU Sitong · Eric Lo · Shu Liu · Jiaya Jia
|
||
CogCM: Cognition-Inspired Contextual Modeling for Audio-Visual Speech Enhancement
Feixiang Wang · Shuang Yang · Shiguang Shan · Xilin Chen
|
||
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLMs
Xinyu Fang · Zhijian Chen · Kai Lan · Lixin Ma · Shengyuan Ding · Yingji Liang · Xiangyu Zhao · Farong Wen · Zicheng Zhang · Guofeng Zhang · Haodong Duan · Kai Chen · Dahua Lin
|
||
Beyond Brain Decoding: Visual-Semantic Reconstructions to Mental Creation Extension Based on fMRI
Haodong Jing · Dongyao Jiang · Yongqiang Ma · Haibo Hua · Bo Huang · Nanning Zheng
|
||
AllTracker: Efficient Dense Point Tracking at High Resolution
Adam Harley · Yang You · Yang Zheng · Xinglong Sun · Nikhil Raghuraman · Sheldon Liang · Yunqi Gu · Wen-Hsuan Chu · Suya You · Achal Dave · Rares Ambrus · Katerina Fragkiadaki · Leonidas Guibas
|
||
VOccl3D: A Video Benchmark Dataset for 3D Human Pose and Shape Estimation under real Occlusions
Yash Garg · Saketh Bachu · Arindam Dutta · Rohit Lal · Sarosij Bose · Calvin-Khang Ta · M. Salman Asif · Amit Roy-Chowdhury
|
||
Multimodal Prompt Alignment for Facial Expression Recognition
Fuyan Ma · Yiran He · Bin Sun · Shutao Li
|
||
Rethinking Discrete Tokens: Treating Them as Conditions for Continuous Autoregressive Image Synthesis
Peng Zheng · Junke Wang · Yi Chang · Yizhou Yu · Rui Ma · Zuxuan Wu
|
||
Memory-Efficient Generative Models via Product Quantization
Jie Shao · Hanxiao Zhang · Hao Yu · Jianxin Wu
|
||
Diff$^2$I2P: Differentiable Image-to-Point Cloud Registration with Diffusion Prior
Juncheng Mu · Chengwei Ren · Weixiang Zhang · Liang Pan · Xiao-Ping Zhang · Yue Gao
|
||
Radiant Foam: Real-Time Differentiable Ray Tracing
Shrisudhan Govindarajan · Daniel Rebain · Kwang Moo Yi · Andrea Tagliasacchi
|
||
AdsQA: Towards Advertisement Video Understanding
Xinwei Long · Kai Tian · Peng Xu · Guoli Jia · Jingxuan Li · Sa Yang · Yihua Shao · Kaiyan Zhang · Che Jiang · Hao Xu · Yang Liu · Jiaheng Ma · Bowen Zhou
|
||
OV3D-CG: Open-vocabulary 3D Instance Segmentation with Contextual Guidance
Mingquan Zhou · Chen He · Ruiping Wang · Xilin Chen
|
||
OmniCache: A Trajectory-Oriented Global Perspective on Training-Free Cache Reuse for Diffusion Transformer Models
Huanpeng Chu · Wei Wu · Guanyu Feng · Yutao Zhang
|
||
SC-Captioner: Improving Image Captioning with Self-Correction by Reinforcement Learning
Lin Zhang · Xianfang Zeng · Kangcong Li · Gang YU · Tao Chen
|
||
SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition
Yongkun Du · Zhineng Chen · Hongtao Xie · Caiyan Jia · Yu-Gang Jiang
|
||
ForestFormer3D: A Unified Framework for End-to-End Segmentation of Forest LiDAR 3D Point Clouds
Binbin Xiang · Maciej Wielgosz · Stefano Puliti · Kamil Král · Martin Krůček · Azim Missarov · Rasmus Astrup
|
||
Bringing RNNs Back to Efficient Open-Ended Video Understanding
Weili Xu · Enxin Song · Wenhao Chai · Xuexiang Wen · Tian Ye · Gaoang Wang
|
||
Chimera: Improving Generalist Model with Domain-Specific Experts
Tianshuo Peng · Mingsheng Li · Jiakang Yuan · Hongbin Zhou · Renqiu Xia · Renrui Zhang · LEI BAI · Song Mao · Bin Wang · Aojun Zhou · Botian Shi · Tao Chen · Bo Zhang · Xiangyu Yue
|
||
Dual-Rate Dynamic Teacher for Source-Free Domain Adaptive Object Detection
Qi He · Xiao Wu · Jun-Yan He · Shuai Li
|
||
Unknown Text Learning for CLIP-based Few-Shot Open-set Recognition
Rui Ma · Qilong Wang · Bing Cao · Qinghua Hu · Yahong Han
|
||
3D-MOOD: Lifting 2D to 3D for Monocular Open-Set Object Detection
Yung-Hsu Yang · Luigi Piccinelli · Mattia Segu · Siyuan Li · Rui Huang · Yuqian Fu · Marc Pollefeys · Hermann Blum · Zuria Bauer
|
||
ASCENT: Annotation-free Self-supervised Contrastive Embeddings for 3D Neuron Tracking in Fluorescence Microscopy
Haejun Han · Hang Lu
|
||
TR-PTS: Task-Relevant Parameter and Token Selection for Efficient Tuning
Siqi Luo · Haoran Yang · Yi Xin · Mingyang Yi · Guangyang Wu · Guangtao Zhai · Xiaohong Liu
|
||
Moderating the Generalization of Score-based Generative Model
Wan Jiang · He Wang · Xin Zhang · Dan Guo · Zhaoxin Fan · Yunfeng Diao · Richang Hong
|
||
TorchAdapt: Towards Light-Agnostic Real-Time Visual Perception
Khurram Azeem Hashmi · Karthik Suresh · Didier Stricker · Muhammad Zeshan Afzal
|
||
Dynamic Multi-Layer Null Space Projection for Vision-Language Continual Learning
Borui Kang · Lei Wang · Zhiping Wu · Tao Feng · Yawen Li · Yang Gao · Wenbin Li
|
||
AR-1-to-3: Single Image to Consistent 3D Object via Next-View Prediction
Xuying Zhang · Yupeng Zhou · Kai Wang · Yikai Wang · Zhen Li · Daquan Zhou · Shaohui Jiao · Qibin Hou · Ming-Ming Cheng
|
||
Active Learning Meets Foundation Models: Fast Remote Sensing Data Annotation for Object Detection
Marvin Burges · Philipe Dias · Dalton Lunga · Carson Woody · Sarah Walters
|
||
RhythmGaussian: Repurposing Generalizable Gaussian Model For Remote Physiological Measurement
Hao LU · Yuting Zhang · Jiaqi Tang · Bowen Fu · Wenhang Ge · Wei Wei · Kaishun Wu · Ying-Cong Chen
|
||
Splat-LOAM: Gaussian Splatting LiDAR Odometry and Mapping
Emanuele Giacomini · Luca Giammarino · Lorenzo Rebotti · Giorgio Grisetti · Martin Oswald
|
||
RadGPT: Constructing 3D Image-Text Tumor Datasets
Pedro Bassi · Mehmet Yavuz · Ibrahim Ethem Hamamci · Sezgin Er · Xiaoxi Chen · Wenxuan Li · Bjoern Menze · Sergio Decherchi · Andrea Cavalli · Kang Wang · Yang Yang · Alan Yuille · Zongwei Zhou
|
||
Meta-Unlearning on Diffusion Models: Preventing Relearning Unlearned Concepts
Hongcheng Gao · Tianyu Pang · Chao Du · Taihang Hu · Zhijie Deng · Min Lin
|
||
Lightcity: An Urban Dataset for Outdoor Inverse Rendering and Reconstruction under Multi-illumination Conditions
Jingjing Wang · Qirui Hu · Chong Bao · Yuke Zhu · Hujun Bao · Zhaopeng Cui · Guofeng Zhang
|
||
Interpretable point cloud classification using multiple instance learning
Matt De Vries · Reed Naidoo · Olga Fourkioti · Lucas Dent · Nathan Curry · Chris Dunsby · Chris Bakal
|
||
Advancing Textual Prompt Learning with Anchored Attributes
Zheng Li · Yibing Song · Ming-Ming Cheng · Xiang Li · jian Yang
|
||
Registration beyond Points: General Affine Subspace Alignment via Geodesic Distance on Grassmann Manifold
Jaeho Shin · Hyeonjae Gil · Junwoo Jang · Maani Ghaffari · Ayoung Kim
|
||
MOERL: When Mixture-of-Experts Meet Reinforcement Learning for Adverse Weather Image Restoration
Tao Wang · Peiwen Xia · Bo Li · Peng-Tao Jiang · Zhe Kong · Kaihao Zhang · Tong Lu · Wenhan Luo
|
||
DMesh++: An Efficient Differentiable Mesh for Complex Shapes
Sanghyun Son · Matheus Gadelha · Yang Zhou · Matthew Fisher · Zexiang Xu · Yi-Ling Qiao · Ming Lin · Yi Zhou
|
||
From Abyssal Darkness to Blinding Glare: A Benchmark on Extreme Exposure Correction in Real World
Bo Wang · Huiyuan Fu · Zhiye Huang · Siru Zhang · Xin Wang · Huadong Ma
|
||
Bridging 3D Anomaly Localization and Repair via High-Quality Continuous Geometric Representation
Bozhong Zheng · Jinye Gan · Xiaohao Xu · Xintao Chen · Wenqiao Li · Xiaonan Huang · Na Ni · Yingna Wu
|
||
Colors See Colors Ignore: Clothes Changing ReID with Color Disentanglement
Priyank Pathak · Yogesh Rawat
|
||
Gradient Extrapolation for Debiased Representation Learning
Ihab Asaad · Maha Shadaydeh · Joachim Denzler
|
||
Sibai: A Few-Shot Meta-Classifier for Poisoning Detection in Federated Learning
Melanie Götz · Torsten Krauß · Alexandra Dmitrienko
|
||
Scheduling Weight Transitions for Quantization-Aware Training
Junghyup Lee · Jeimin Jeon · Dohyung Kim · Bumsub Ham
|
||
SU-RGS: Relightable 3D Gaussian Splatting from Sparse Views under Unconstrained Illuminations
Qi Zhang · Chi Huang · Qian Zhang · Nan Li · Wei Feng
|
||
Forgetting Through Transforming: Enabling Federated Unlearning via Class-Aware Representation Transformation
Qi Guo · Zhen Tian · Minghao Yao · Saiyu Qi · Yong Qi · Bingyi Liu
|
||
VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models
Kim Sung-Bin · Jeongsoo Choi · Puyuan Peng · Joon Chung Chung · Tae-Hyun Oh · David Harwath
|
||
Training-free and Adaptive Sparse Attention for Efficient Long Video Generation
yifei xia · Suhan Ling · Fangcheng Fu · Yujie Wang · Huixia Li · Xuefeng Xiao · Bin CUI
|
||
Controllable Weather Simulation and Removal with Video Diffusion Models
Chih-Hao Lin · Zian Wang · Ruofan Liang · Yuxuan Zhang · Sanja Fidler · Shenlong Wang · Zan Gojcic
|
||
Flow-MIL: Constructing Highly-expressive Latent Feature Space For Whole Slide Image Classification Using Normalizing Flow
Yingfan MA · Bohan An · Ao Shen · Mingzhi Yuan · Minghong Duan · Manning Wang
|
||
Diffusion Curriculum: Synthetic-to-Real Data Curriculum via Image-Guided Diffusion
Yijun Liang · Shweta Bhardwaj · Tianyi Zhou
|
||
HiNeuS: High-fidelity Neural Surface Mitigating Low-texture and Reflective Ambiguity
Yida Wang · Xueyang Zhang · Kun Zhan · Peng Jia · XianPeng Lang
|
||
No Pose at All: Self-Supervised Pose-Free 3D Gaussian Splatting from Sparse Views
Ranran Huang · Krystian Mikolajczyk
|
||
Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction
Zeren Jiang · Chuanxia Zheng · Iro Laina · Diane Larlus · Andrea Vedaldi
|
||
Leveraging Local Patch Alignment to Seam-cutting for Large Parallax Image Stitching
Tianli Liao · Chenyang Zhao · Lei Li · Heling Cao
|
||
Splat-based 3D Scene Reconstruction with Extreme Motion-blur
Hyeonjoong Jang · Dongyoung Choi · Donggun Kim · Woohyun Kang · Min Kim
|
||
SplArt: Articulation Estimation and Part-Level Reconstruction with 3D Gaussian Splatting
Shengjie Lin · Jiading Fang · Muhammad Zubair Irshad · Vitor Campagnolo Guizilini · Rares Ambrus · Greg Shakhnarovich · Matthew Walter
|
||
Unsupervised Identification of Protein Compositions and Conformations via Implicit Content-Transformation Disentanglement
Mostofa Rafid Uddin · Jana Armouti · Min Xu
|
||
SkySense V2: A Unified Foundation Model for Multi-modal Remote Sensing
Yingying Zhang · Lixiang Ru · Kang Wu · Lei Yu · Lei Liang · Yansheng Li · Jingdong Chen
|
||
PLA: Prompt Learning Attack against Text-to-Image Generative Models
XINQI LYU · Yihao LIU · Yanjie Li · Bin Xiao
|
||
ArchiSet: Benchmarking Editable and Consistent Single-View 3D Reconstruction of Buildings with Specific Window-to-Wall Ratios
Jun Yin · Pengyu Zeng · Licheng Shen · Miao Zhang · Jing Zhong · Yuxing Han · Shuai Lu
|
||
ObjectGS: Object-aware Scene Reconstruction and Scene Understanding via Gaussian Splatting
Ruijie Zhu · Mulin Yu · Linning Xu · Lihan Jiang · Yixuan Li · Tianzhu Zhang · Jiangmiao Pang · Bo Dai
|
||
SEAL: Semantic Aware Image Watermarking
Kasra Arabi · R. Teal Witter · Chinmay Hegde · Niv Cohen
|
||
Weakly-Supervised Learning of Dense Functional Correspondences
Stefan Stojanov · Linan Zhao · Yunzhi Zhang · Daniel Yamins · Jiajun Wu
|
||
Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness
Haochen Wang · Yucheng Zhao · Tiancai Wang · Haoqiang Fan · Xiangyu Zhang · Zhaoxiang Zhang
|
||
MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance
Quanhao Li · Zhen Xing · Rui Wang · Hui Zhang · Qi Dai · Zuxuan Wu
|
||
FED-PsyAU: Privacy-Preserving Micro-Expression Recognition via Psychological AU Coordination and Dynamic Facial Motion Modeling
Jingting Li · Yu Qian · Lin Zhao · Su-Jing Wang
|
||
Normal and Abnormal Pathology Knowledge-Augmented Vision-Language Model for Anomaly Detection in Pathology Images
Jinsol Song · Jiamu Wang · Anh Nguyen · Keunho Byeon · Sangjeong Ahn · Sung Hak Lee · Jin Tae Kwak
|
||
Temporal Unlearnable Examples: Preventing Personal Video Data from Unauthorized Exploitation by Object Tracking
Qiangqiang Wu · Yi Yu · Chenqi Kong · Ziquan Liu · Jia Wan · Haoliang Li · Alex Kot · Antoni Chan
|
||
VOVTrack: Exploring the Potentiality in Raw Videos for Open-Vocabulary Multi-Object Tracking
Zekun Qian · Ruize Han · Junhui Hou · Linqi Song · Wei Feng
|
||
EEdit: Rethinking the Spatial and Temporal Redundancy for Efficient Image Editing
Zexuan Yan · Yue Ma · Chang Zou · Wenteng Chen · Qifeng Chen · Linfeng Zhang
|
||
HypDAE: Hyperbolic Diffusion Autoencoders for Hierarchical Few-shot Image Generation
Lingxiao Li · Kaixuan Fan · Boqing Gong · Xiangyu Yue
|
||
Generate, Refine, and Encode: Leveraging Synthesized Novel Samples for On-the-Fly Fine-Grained Category Discovery
Xiao Liu · Nan Pu · Haiyang Zheng · Wenjing Li · Nicu Sebe · Zhun Zhong
|
||
CoSMIC: Continual Self-supervised Learning for Multi-Domain Medical Imaging via Conditional Mutual Information Maximization
Yihang Liu · Ying Wen · Longzhen Yang · Lianghua He · Heng Tao Shen
|
||
Fuse Before Transfer: Knowledge Fusion for Heterogeneous Distillation
Guopeng Li · Qiang Wang · Ke Yan · Shouhong Ding · Yuan Gao · Gui-Song Xia
|
||
Self-Calibrated Variance-Stabilizing Transformations for Real-World Image Denoising
Sébastien Herbreteau · Michael Unser
|
||
FontAnimate: High Quality Few-shot Font Generation via Animating Font Transfer Process
Bin Fu · Zixuan Wang · Kainan Yan · Shitian Zhao · Qi Qin · Jie Wen · Junjun He · Peng Gao
|
||
Not All Frame Features Are Equal: Video-to-4D Generation via Decoupling Dynamic-Static Features
Liying Yang · Chen Liu · Zhenwei Zhu · Ajian Liu · Hui Ma · Jian Nong · Yanyan Liang
|
||
PS-Mamba: Spatial-Temporal Graph Mamba for Pose Sequence Refinement
Haoye Dong · Gim Hee Lee
|
||
Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints
Guanjie Chen · Xinyu Zhao · Yucheng Zhou · Xiaoye Qu · Tianlong Chen · Yu Cheng
|
||
PoseSyn: Synthesizing Diverse 3D Pose Data from In-the-Wild 2D Data
CHANGHEE YANG · Hyeonseop Song · Seokhun Choi · Seungwoo Lee · Jaechul Kim · Hoseok Do
|
||
AdaptiveAE: An Adaptive Exposure Strategy for HDR Capturing in Dynamic Scenes
Tianyi Xu · Fan Zhang · Boxin Shi · Tianfan Xue · Yujin Wang
|
||
TinyViM: Frequency Decoupling for Tiny Hybrid Vision Mamba
Xiaowen Ma · Zhen-Liang Ni · Xinghao Chen
|
||
Continual Multiple Instance Learning with Enhanced Localization for Histopathological Whole Slide Image Analysis
Byung Lee Lee · Wongi Jeong · Woojae Han · KYOUNGBUN LEE · Se Young Chun
|
||
MagicHOI: Leveraging 3D Priors for Accurate Hand-object Reconstruction from Short Monocular Video Clips
SHIBO WANG · Haonan He · Maria Parelli · Christoph Gebhardt · Zicong Fan · Jie Song
|
||
MambaML: Exploring State Space Models for Multi-Label Image Classification
Xuelin Zhu · Jian Liu · Jiuxin Cao · Bing WANG
|
||
Mixture of Experts Guided by Gaussian Splatters Matters: A new Approach to Weakly-Supervised Video Anomaly Detection
Giacomo D'Amicantonio · Snehashis Majhi · Quan Kong · Lorenzo Garattoni · Gianpiero Francesca · Egor Bondarev · Francois Bremond
|
||
Adversarial Distribution Matching for Diffusion Distillation Towards Efficient Image and Video Synthesis
Yanzuo Lu · Yuxi Ren · Xin Xia · Shanchuan Lin · XING WANG · Xuefeng Xiao · Jinhua Ma · Xiaohua Xie · Jianhuang Lai
|
||
WIPES: Wavelet-based Visual Primitives
Wenhao Zhang · Hao Zhu · Delong Wu · Di Kang · Linchao Bao · Xun Cao · Zhan Ma
|
||
CL-Splats: Continual Learning of Gaussian Splatting with Local Optimization
Jan Ackermann · Jonas Kulhanek · Shengqu Cai · Haofei Xu · Marc Pollefeys · Gordon Wetzstein · Leonidas Guibas · Songyou Peng
|
||
LD-RPS: Zero-Shot Unified Image Restoration via Latent Diffusion Recurrent Posterior Sampling
Li Huaqiu · Yong Wang · Tongwen Huang · Hailang Huang · Haoqian Wang · Xiangxiang Chu
|
||
MA-CIR: A Multimodal Arithmetic Benchmark for Composed Image Retrieval
Jaeseok Byun · Young Kyun Jang · Seokhyeon Jeong · Donghyun Kim · Taesup Moon
|
||
EmotiCrafter: Text-to-Emotional-Image Generation based on Valence-Arousal Model
Shengqi Dang · Yi He · Long Ling · Ziqing Qian · Nanxuan Zhao · Nan Cao
|
||
Diffusion Transformer meets Multi-level Wavelet Spectrum for Single Image Super-Resolution
Peng Du · Hui Li · Han Xu · Paul Jeon · Dongwook Lee · Daehyun Ji · Ran Yang · Feng Zhu
|
||
TextMaster: A Unified Framework for Realistic Text Editing via Glyph-Style Dual-Control
Zhenyu Yan · Jian Wang · Aoqiang Wang · Yuhan Li · Wenxiang Shang · Zhu Hangcheng
|
||
Holistic Tokenizer for Autoregressive Image Generation
Anlin Zheng · Haochen Wang · Yucheng Zhao · Weipeng DENG · Tiancai Wang · Xiangyu Zhang · Xiaojuan Qi
|
||
Depth AnyEvent: A Cross-Modal Distillation Paradigm for Event-Based Monocular Depth Estimation
Luca Bartolomei · Enrico Mannocci · Fabio Tosi · Matteo Poggi · Stefano Mattoccia
|
||
DiffTell: A High-Quality Dataset for Describing Image Manipulation Changes
Zonglin Di · Jing Shi · Yifei Fan · Hao Tan · Alexander Black · John Collomosse · Yang Liu
|
||
MotionFollower: Editing Video Motion via Score-Guided Diffusion
Shuyuan Tu · Qi Dai · Zihao Zhang · Sicheng Xie · Zhi-Qi Cheng · Chong Luo · Xintong Han · Zuxuan Wu · Yu-Gang Jiang
|
||
Event-based Tiny Object Detection: A Benchmark Dataset and Baselines
Nuo Chen · Chao Xiao · Yimian Dai · Shiman He · Miao Li · Wei An
|
||
Describe, Don’t Dictate: Semantic Image Editing with Natural Language Intent
En Ci · Shanyan Guan · Yanhao Ge · Yilin Zhang · Wei Li · Zhenyu Zhang · Jian Yang · Ying Tai
|
||
Quanta Vision: From Photons to Perception
Varun Sundar · Tianyi Zhang · Sacha Jungerman · Mohit Gupta
|
||
AutoOcc: Automatic Open-Ended Semantic Occupancy Annotation via Vision-Language Guided Gaussian Splatting
Xiaoyu Zhou · Jingqi Wang · Yongtao Wang · Yufei Wei · Nan Dong · Ming-Hsuan Yang
|
||
SceneMI: Motion In-betweening for Modeling Human-Scene Interaction
Inwoo Hwang · Bing Zhou · Young Kim Kim · Jian Wang · Chuan Guo
|
||
3DRealCar: An In-the-wild RGB-D Car Dataset with 360-degree Views
Xiaobiao Du · Yida Wang · Haiyang Sun · Zhuojie Wu · Hongwei Sheng · Shuyun Wang · Jiaying Ying · Ming Lu · Tianqing Zhu · Kun Zhan · Xin Yu
|
||
SIC: Similarity-Based Interpretable Image Classification with Neural Networks
Tom Wolf Wolf · Emre Kavak · Fabian Bongratz · Christian Wachinger
|
||
How Far are AI-generated Videos from Simulating the 3D Visual World: A Learned 3D Evaluation Approach
Chirui CHANG · Jiahui Liu · Zhengzhe Liu · Xiaoyang Lyu · Yi-Hua Huang · Xin Tao · Pengfei Wan · Di ZHANG · Xiaojuan Qi
|
||
InvRGB+L: Inverse Rendering of Complex Scenes with Unified Color and LiDAR Reflectance Modeling
Xiaoxue Chen · Bhargav Chandaka · Chih-Hao Lin · Ya-Qin Zhang · David Forsyth · Hao Zhao · Shenlong Wang
|
||
FB-Diff: Fourier Basis-guided Diffusion for Temporal Interpolation of 4D Medical Imaging
Xin You · Runze Yang · Chuyan Zhang · Zhongliang Jiang · JIE YANG · Nassir Navab
|
||
Multi-modal Identity Extraction
Ryan Webster · Teddy Furon
|
||
Avat3r: Large Animatable Gaussian Reconstruction Model for High-fidelity 3D Head Avatars
Tobias Kirschstein · Javier Romero · Artem Sevastopolsky · Matthias Nießner · Shunsuke Saito
|
||
One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory
Chenhao Zheng · Jieyu Zhang · Mohammadreza Salehi · Ziqi Gao · Vishnu Iyengar · Norimasa Kobori · Quan Kong · Ranjay Krishna
|
||
Instruction-based Image Editing with Planning, Reasoning, and Generation
Liya Ji · Chenyang Qi · Qifeng Chen
|
||
Tune-Your-Style: Intensity-tunable 3D Style Transfer with Gaussian Splatting
Yian Zhao · Rushi Ye · Ruochong Zheng · Zesen Cheng · Chaoran Feng · Jiashu Yang · Pengchong Qiao · Chang Liu · Jie Chen
|
||
COSTARR: Consolidated Open Set Technique with Attenuation for Robust Recognition
Ryan Rabinowitz · Steve Cruz · Walter Scheirer · Terrance Boult
|
||
AAA-Gaussians: Anti-Aliased and Artifact-Free 3D Gaussian Rendering
Michael Steiner · Thomas Köhler · Lukas Radl · Felix Windisch · Dieter Schmalstieg · Markus Steinberger
|
||
GazeGaussian: High-Fidelity Gaze Redirection with 3D Gaussian Splatting
Xiaobao Wei · Peng Chen · Guangyu Li · Ming Lu · Hui Chen · Feng Tian
|
||
Demeter: A Parametric Model of Crop Plant Morphology from the Real World
Tianhang Cheng · Albert Zhai · Evan Chen · Rui Zhou · Yawen Deng · Zitong Li · Kejie Zhao · Janice Shiu · Qianyu Zhao · Yide Xu · Xinlei Wang · Yuan Shen · Sheng Wang · Lisa Ainsworth · Kaiyu Guan · Shenlong Wang
|
||
RTMap: Real-Time Recursive Mapping with Change Detection and Localization
Yuheng Du · Sheng Yang · Lingxuan Wang · Zhenghua Hou · Chengying Cai · Zhitao Tan · Mingxia Chen · Shi-Sheng Huang · Qiang Li
|
||
DePR: Depth Guided Single-view Scene Reconstruction with Instance-level Diffusion Priors
Qingcheng Zhao · Xiang Zhang · Haiyang Xu · Zeyuan Chen · Jianwen Xie · Yuan Gao · Zhuowen Tu
|
||
PanoSplatt3R: Leveraging Perspective Pretraining for Generalized Unposed Wide-Baseline Panorama Reconstruction
Jiahui Ren · Mochu Xiang · Jiajun Zhu · Yuchao Dai
|
||
Generalizable Non-Line-of-Sight Imaging with Learnable Physical Priors
Shida Sun · Yue Li · Yueyi Zhang · Zhiwei Xiong
|
||
GeoProg3D: Compositional Visual Reasoning for City-Scale 3D Language Fields
Shunsuke Yasuki · Taiki Miyanishi · Nakamasa Inoue · Shuhei Kurita · Koya Sakamoto · Daichi Azuma · Masato Taki · Yutaka Matsuo
|
||
Fast Globally Optimal and Geometrically Consistent 3D Shape Matching
Paul Roetzer · Florian Bernard
|
||
Gaze-Language Alignment for Zero-Shot Prediction of Visual Search Targets from Human Gaze Scanpaths
Sounak Mondal · Naveen Sendhilnathan · Ting Zhang · Yue Liu · Michael Proulx · Michael Iuzzolino · Chuan Qin · Tanya Jonker
|
||
Compression of 3D Gaussian Splatting with Optimized Feature Planes and Standard Video Codecs
Soonbin Lee · Fangwen Shu · Yago Sanchez de la Fuente · Thomas Schierl · Cornelius Hellge
|
||
RoCo-Sim: Enhancing Roadside Collaborative Perception through Foreground Simulation
Yuwen Du · Anning Hu · Zichen Chao · Yifan Lu · Junhao Ge · Genjia Liu · Wei-Tao Wu · Lanjun Wang · Siheng Chen
|
||
AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation
Moayed Haji-Ali · Willi Menapace · Aliaksandr Siarohin · Ivan Skorokhodov · Alper Canberk · Kwot Sin Lee · Vicente Ordonez · Sergey Tulyakov
|
||
Exploiting Frequency Dynamics for Enhanced Multimodal Event-based Action Recognition
Meiqi Cao · Xiangbo Shu · Xin Jiang · Rui Yan · Yazhou Yao · Jinhui Tang
|
||
Video Motion Graphs
Haiyang Liu · Zhan Xu · Fating Hong · Hsin-Ping Huang · Yi Zhou · Yang Zhou
|
||
FE-CLIP: Frequency Enhanced CLIP Model for Zero-Shot Anomaly Detection and Segmentation
Tao Gong · Qi Chu · Bin Liu · Zhou Wei · Nenghai Yu
|
||
Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization
Kyle Sargent · Kyle Hsu · Justin Johnson · Li Fei-Fei · Jiajun Wu
|
||
Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers
Zhengyao Lyu · Tianlin Pan · Chenyang Si · Zhaoxi Chen · Wangmeng Zuo · Ziwei Liu · Kwan-Yee K. Wong
|
||
Learning an Implicit Physics Model for Image-based Fluid Simulation
Emily Jia · Jiageng Mao · Zhiyuan Gao · Yajie Zhao · Yue Wang
|
||
Backdoor Attacks on Neural Networks via One-Bit Flip
Xiang Li · Lannan Luo · Qiang Zeng
|
||
VGMamba: Attribute-to-Location Clue Reasoning for Quantity-Agnostic 3D Visual Grounding
Zhu Yihang · Jinhao Zhang · Yuxuan Wang · Aming WU · Cheng Deng
|
||
TF-TI2I: Training-Free Text-and-Image-to-Image Generation via Multi-Modal In-Context Learning In Text-to-Image Models
Teng-Fang Hsiao · Bo-Kai Ruan · Yi-Lun Wu · Tzu-Ling Lin · Hong-Han Shuai
|
||
Disentangled Clothed Avatar Generation with Layered Representation
Weitian Zhang · Yichao Yan · Sijing Wu · Manwen Liao · Xiaokang Yang
|
||
NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments
Xuan Yao · Junyu Gao · Changsheng Xu
|
||
SVG-Head: Hybrid Surface-Volumetric Gaussians for High-Fidelity Head Reconstruction and Real-Time Editing
Heyi Sun · Cong Wang · Tian-Xing Xu · Jingwei Huang · Di Kang · Chunchao Guo · Song-Hai Zhang
|
||
AnyI2V: Animating Any Conditional Image with Motion Control
Ziye Li · Xincheng Shuai · Hao Luo · Henghui Ding
|
||
Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation
ZIYU ZHU · Xilin Wang · Yixuan Li · Zhuofan Zhang · Xiaojian Ma · Yixin Chen · Baoxiong Jia · Wei Liang · Qian Yu · Zhidong Deng · Siyuan Huang · Qing Li
|
||
Back on Track: Bundle Adjustment for Dynamic Scene Reconstruction
Weirong Chen · Ganlin Zhang · Felix Wimbauer · Rui Wang · Nikita Araslanov · Andrea Vedaldi · Daniel Cremers
|
||
LIRA: Reasoning Reconstruction via Multimodal Large Language Models
Zhen Zhou · Tong Wang · Yunkai Ma · Xiao Tan · Fengshui Jing
|
||
WarpHE4D: Dense 4D Head Map toward Full Head Reconstruction
Jongseob Yun · Yong-Hoon Kwon · Min-Gyu Park · Ju-Mi Kang · Min-Ho Lee · Inho Chang · Ju Yoon · Kuk-Jin Yoon
|
||
HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics
Gueter Josmy Faure · Jia-Fong Yeh · Min-Hung Chen · Hung-Ting Su · Shang-Hong Lai · Winston Hsu
|
||
UAVScenes: A Multi-Modal Dataset for UAVs
Sijie Wang · Siqi Li · Yawei Zhang · Shangshu Yu · Shenghai Yuan · Rui She · Quanjiang Guo · JinXuan Zheng · Ong Howe · Leonrich Chandra · Shrivarshann Srijeyan · Aditya Sivadas · Toshan Aggarwal · Heyuan Liu · Hongming Zhang · CHEN CHUJIE · JIANG JUNYU · Lihua Xie · Wee Peng Tay
|
||
VMBench: A Benchmark for Perception-Aligned Video Motion Generation
Xinran Ling · Chen Zhu · Meiqi Wu · Hangyu Li · Xiaokun Feng · Cundian Yang · Aiming Hao · Jiashu Zhu · Jiahong Wu · Xiangxiang Chu
|
||
EgoAgent: A Joint Predictive Agent Model in Egocentric Worlds
Lu Chen · Yizhou Wang · SHIXIANG TANG · Qianhong Ma · Tong He · Wanli Ouyang · Xiaowei Zhou · Hujun Bao · Sida Peng
|
||
ReTracker: Exploring Image Matching for Robust Online Any Point Tracking
Dongli Tan · Xingyi He · Sida Peng · Yiqing Gong · Xing Zhu · Jiaming Sun · Ruizhen Hu · Yujun Shen · Hujun Bao · Xiaowei Zhou
|
||
H3R: Hybrid Multi-view Correspondence for Generalizable 3D Reconstruction
Heng Jia · Na Zhao · Linchao Zhu
|
||
DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness
Ruining Li · Chuanxia Zheng · Christian Rupprecht · Andrea Vedaldi
|
||
Object-centric Video Question Answering with Visual Grounding and Referring
Haochen Wang · Qirui Chen · Cilin Yan · Jiayin Cai · Xiaolong Jiang · Yao Hu · Weidi Xie · Stratis Gavves
|
||
ImageGem: In-the-wild Generative Image Interaction Dataset for Generative Model Personalization
Yuanhe Guo · Linxi Xie · Zhuoran Chen · Kangrui Yu · Ryan Po · Guandao Yang · Gordon Wetzstein · Hongyi Wen
|
||
Learning Null Geodesics for Gravitational Lensing Rendering in General Relativity
Mingyuan Sun · Zheng Fang · Jiaxu Wang · Kun-Yi Zhang · Qiang Zhang · Renjing Xu
|
||
SAMPLE: Semantic Alignment through Temporal-Adaptive Multimodal Prompt Learning for Event-Based Open-Vocabulary Action Recognition
Jing Wang · Rui Zhao · Ruiqin Xiong · Xingtao Wang · Xiaopeng Fan · Tiejun Huang
|
||
Boosting Vision Semantic Density with Anatomy Normality Modeling for Medical Vision-language Pre-training
Weiwei Cao · Jianpeng Zhang · Zhongyi Shui · Sinuo Wang · Zeli Chen · Xi Li · Le Lu · Xianghua Ye · Qi Zhang · Tingbo Liang · Ling Zhang
|
||
Easy3D: A Simple Yet Effective Method for 3D Interactive Segmentation
Andrea Simonelli · Norman Müller · Peter Kontschieder
|
||
Hypergraph Clustering Network with Partial Attribute Imputation
Qianqian Wang · Bowen Zhao · Zhengming Ding · Wei Feng · Quanxue Gao
|
||
HUG: Hierarchical Urban Gaussian Splatting with Block-Based Reconstruction for Large-Scale Aerial Scenes
Mai Su · Zhongtao Wang · Huishan Au · Yilong Li · Xizhe Cao · Chengwei Pan · Yisong Chen · Guoping Wang
|
||
Enhancing Prompt Generation with Adaptive Refinement for Camouflaged Object Detection
Xuehan Chen · Guangyu Ren · Tianhong Dai · Tania Stathaki · Hengyan Liu
|
||
OpenAnimals: Revisiting Person Re-Identification for Animals Towards Better Generalization
Saihui Hou · Panjian Huang · Zengbin Wang · Yuan Liu · Zeyu Li · Man Zhang · Yongzhen Huang
|
||
FlowChef: Steering of Rectified Flow Models for Controlled Generations
Maitreya Patel · Song Wen · Dimitris Metaxas · Yezhou Yang
|
||
SemiVisBooster: Boosting Semi-Supervised Learning for Fine-Grained Classification through Pseudo-Label Semantic Guidance
Wenjin Zhang · Xinyu Li · Chenyang Gao · Ivan Marsic
|
||
Medical World Model
Yijun Yang · Zhao-Yang Wang · Qiuping Liu · Shu Wen Sun · Kang Wang · Rama Chellappa · Zongwei Zhou · Alan Yuille · Lei Zhu · Yu-Dong Zhang · Jieneng Chen
|
||
What we need is explicit controllability: Training 3D gaze estimator using only facial images
Tingwei Li · Jun Bao · Zhenzhong Kuang · Buyu Liu
|
||
Resolving Token-Space Gradient Conflicts: Token Space Manipulation for Transformer-Based Multi-Task Learning
Wooseong Jeong · Kuk-Jin Yoon
|
||
X-Prompt: Generalizable Auto-Regressive Visual Learning with In-Context Prompting
Zeyi Sun · Ziyang Chu · Pan Zhang · Tong Wu · Xiaoyi Dong · Yuhang Zang · Yuanjun Xiong · Dahua Lin · Jiaqi Wang
|
||
Proxy-Bridged Game Transformer for Interactive Extreme Motion Prediction
Yanwen Fang · Wenqi Jia · Xu Cao · Peng-Tao Jiang · Guodong Li · Jintai CHEN
|
||
Pretend Benign: A Stealthy Adversarial Attack by Exploiting Vulnerabilities in Cooperative Perception
Hongwei Lin · Dongyu Pan · Qiming Xia · Hai Wu · Cheng Wang · Siqi Shen · Chenglu Wen
|
||
A Recipe for Generating VR Worlds from a Single Image
Katja Schwarz · Denis Rozumny · Samuel Rota Bulò · Lorenzo Porzi · Peter Kontschieder
|
||
Controlling Multimodal LLMs via Reward-guided Decoding
Oscar Mañas · Pierluca D'Oro · Koustuv Sinha · Adriana Romero-Soriano · Michal Drozdzal · Aishwarya Agrawal
|
||
Make Me Happier: Evoking Emotions Through Image Diffusion Models
Qing Lin · Jingfeng Zhang · YEW-SOON ONG · Mengmi Zhang
|
||
Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images
Boyang Deng · Kyle Genova · Songyou Peng · Gordon Wetzstein · Noah Snavely · Leonidas Guibas · Thomas Funkhouser
|
||
Rethinking Key-frame-based Micro-expression Recognition: A Robust and Accurate Framework Against Key-frame Errors
Zheyuan Zhang · Weihao Tang · Hong Chen
|
||
A Unified Framework for Industrial Cel-Animation Colorization with Temporal-Structural Awareness
Xiaoyi Feng · Tao Huang · Peng Wang · Zizhou Huang · Haihang Zhang · Yuntao Zou · Dagang Li · Kaifeng Zou
|
||
DIMCIM: A Quantitative Evaluation Framework for Default-mode Diversity and Generalization in Text-to-Image Generative Models
Revant Teotia · Candace Ross · Karen Ullrich · Sumit Chopra · Adriana Romero-Soriano · Melissa Hall · Matthew Muckley
|
||
BlueNeg: A 35mm Negative Film Dataset for Restoring Channel-Heterogeneous Deterioration
Hanyuan Liu · Chengze Li · Minshan Xie · Wang Zhenni · Jiawen Liang · Chi LEUNG · Tien-Tsin Wong
|
||
TurboReg: TurboClique for Robust and Efficient Point Cloud Registration
Shaocheng Yan · Pengcheng Shi · Zhenjun Zhao · Kaixin Wang · Kuang Cao · Ji Wu · Jiayuan Li
|
||
Decouple and Track: Benchmarking and Improving Video Diffusion Transformers For Motion Transfer
Qingyu Shi · Jianzong Wu · Jinbin Bai · Lu Qi · Jiangning Zhang · Yunhai Tong · Xiangtai Li
|
||
Consensus-Driven Active Model Selection
Justin Kay · Grant Horn · Subhransu Maji · Daniel Sheldon · Sara Beery
|
||
Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs
Yikang Zhou · Tao Zhang · Shilin Xu · Shihao Chen · Qianyu Zhou · Yunhai Tong · Shunping Ji · Jiangning Zhang · Lu Qi · Xiangtai Li
|
||
OVA-Fields: Weakly Supervised Open-Vocabulary Affordance Fields for Robot Operational Part Detection
Heng Su · Mengying Xie · Nieqing Cao · Yan Ding · Beichen Shao · Xianlei Long · Fuqiang Gu · Chao Chen
|
||
Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models
Ma Teng · Xiaojun Jia · Ranjie Duan · Xinfeng Li · Yihao Huang · Xiaoshuang Jia · Zhixuan Chu · Wenqi Ren
|
||
Federated Continuous Category Discovery and Learning
Lixu Wang · Chenxi Liu · Junfeng Guo · Qingqing Ye · Heng Huang · Haibo Hu · Wei Dong
|
||
Continual Personalization for Diffusion Models
Yu-Chien Liao · Jr-Jen Chen · Chi-Pin Huang · Ci-Siang Lin · Meng-Lin Wu · Yu-Chiang Frank Wang
|
||
FG-OrIU: Towards Better Forgetting via Feature-Gradient Orthogonality for Incremental Unlearning
Qian Feng · Jiahang Tu · Mintong Kang · Hanbin Zhao · Chao Zhang · Hui Qian
|
||
Decouple to Reconstruct: High Quality UHD Restoration via Active Feature Disentanglement and Reversible Fusion
Yidi Liu · Dong Li · Yuxin Ma · Jie Huang · Wenlong Zhang · Xueyang Fu · Zheng-Jun Zha
|
||
CLIP-Adapted Region-to-Text Learning for Generative Open-Vocabulary Semantic Segmentation
Jiannan Ge · Lingxi Xie · Hongtao Xie · Pandeng Li · Sun-Ao Liu · XIAOPENG ZHANG · Qi Tian · Yongdong Zhang
|
||
Toward Fair and Accurate Cross-Domain Medical Image Segmentation: A VLM-Driven Active Domain Adaptation Paradigm
Hongqiu Wang · Wu Chen · Xiangde Luo · Zhaohu Xing · Lihao Liu · Jing Qin · Shaozhi Wu · Lei Zhu
|
||
GaRe: Relightable 3D Gaussian Splatting for Outdoor Scenes from Unconstrained Photo Collections
Haiyang Bai · Jiaqi Zhu · Songru Jiang · Wei Huang · Tao Lu · Yuanqi Li · Jie Guo · Runze Fu · Yanwen Guo · Lijun Chen
|
||
FastJSMA: Accelerating Jacobian-based Saliency Map Attacks through Gradient Decoupling
Zhenghao Gao · Shengjie Xu · Zijing Li · Meixi Chen · Chaojian Yu · Yuanjie Shao · Changxin Gao
|
||
DAViD: Modeling Dynamic Affordance of 3D Objects using Pre-trained Video Diffusion Models
Hyeonwoo Kim · Sangwon Baik · Hanbyul Joo
|
||
StableCodec: Taming One-Step Diffusion for Extreme Image Compression
Tianyu Zhang · Xin Luo · Li Li · Dong Liu
|
||
Towards Efficient General Feature Prediction in Masked Skeleton Modeling
Shengkai Sun · Zefan Zhang · Jianfeng Dong · Zhiyong Cheng · Xiaojun Chang · Meng Wang
|
||
Beyond Losses Reweighting: Empowering Multi-Task Learning via the Generalization Perspective
Hoang Phan · Tung Lam Tran · Quyen Tran · Ngoc Tran · Tuan Truong · Qi Lei · Nhat Ho · Dinh Phung · Trung Le
|
||
Humans as a Calibration Pattern: Dynamic 3D Scene Reconstruction from Unsynchronized and Uncalibrated Videos
Changwoon Choi · Jeongjun Kim · Geonho Cha · Minkwan Kim · Dongyoon Wee · Young Kim Kim
|
||
Harnessing Massive Satellite Imagery with Efficient Masked Image Modeling
Fengxiang Wang · Hongzhen Wang · Di Wang · Zonghao Guo · Zhenyu Zhong · Long Lan · Wenjing Yang · Jing Zhang
|
||
An Efficient Hybrid Vision Transformer for TinyML Applications
Fanhong Zeng · Huanan LI · Juntao Guan · Rui Fan · Tong Wu · Xilong Wang · Lai Rui
|
||
EquiCaps: Predictor-Free Pose-Aware Pre-Trained Capsule Networks
Athinoulla Konstantinou · Georgios Leontidis · Mamatha Thota · Aiden Durrant
|
||
Metric Convolutions: A Unifying Theory to Adaptive Image Convolutions
Thomas Dagès · Michael Lindenbaum · Alfred Bruckstein
|
||
Fine-Grained Evaluation of Large Vision-Language Models in Autonomous Driving
Yue Li · Meng Tian · Zhenyu Lin · Jiangtong Zhu · Dechang Zhu · Haiqiang Liu · Yueyi Zhang · Zhiwei Xiong · Xinhai Zhao
|
||
EvRT-DETR: Latent Space Adaptation of Image Detectors for Event-based Vision
Dmitrii Torbunov · Yihui Ren · Animesh Ghose · Odera Dim · Yonggang Cui
|
||
InstaDrive: Instance-Aware Driving World Models for Realistic and Consistent Video Generation
Zhuoran Yang · Xi Guo · Chenjing Ding · Chiyu Wang · Wei Wu · Yanyong Zhang
|
||
WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions
Zizhang Li · Hong-Xing Yu · Wei Liu · Yin Yang · Charles Herrmann · Gordon Wetzstein · Jiajun Wu
|
||
Training-Free Generation of Temporally Consistent Rewards from VLMs
Yinuo Zhao · Jiale Yuan · Zhiyuan Xu · Xiaoshuai Hao · Xinyi Zhang · Kun Wu · Zhengping Che · Chi Liu · Jian Tang
|
||
ShadowHack: Hacking Shadows via Luminance-Color Divide and Conquer
Jin Hu · Mingjia Li · Xiaojie Guo
|
||
FlowStyler: Artistic Video Stylization via Transformation Fields Transports
YuNing Gong · Jiaming Chen · Xiaohua Ren · Yuanjun Liao · Yanci Zhang
|
||
Dense2MoE: Restructuring Diffusion Transformer to MoE for Efficient Text-to-Image Generation
Youwei Zheng · Yuxi Ren · Xin Xia · Xuefeng Xiao · Xiaohua Xie
|
||
UniVerse: Unleashing the Scene Prior of Video Diffusion Models for Robust Radiance Field Reconstruction
Jin Cao · Hongrui Wu · Ziyong Feng · Hujun Bao · Xiaowei Zhou · Sida Peng
|
||
UKBOB: One Billion MRI Labeled Masks for Generalizable 3D Medical Image Segmentation
Emmanuelle Bourigault · Amir Jamaludin · Abdullah Hamdi
|
||
Function-centric Bayesian Network for Zero-Shot Object Goal Navigation
Sixian Zhang · Xinyao Yu · Xinhang Song · Yiyao Wang · Shuqiang Jiang
|
||
Prototypes are Balanced Units for Efficient and Effective Partially Relevant Video Retrieval
WonJun Moon · Cheol-Ho Cho · Woojin Jun · Minho Shim · Taeoh Kim · Inwoong Lee · Dongyoon Wee · Jae-Pil Heo
|
||
Differentiable Room Acoustic Rendering with Multi-View Vision Priors
Derong Jin · Ruohan Gao
|
||
IRASim: A Fine-Grained World Model for Robot Manipulation
Fangqi Zhu · Hongtao Wu · Song Guo · Yuxiao Liu · Chilam Cheang · Tao Kong
|
||
LLM-enhanced Action-aware Multi-modal Prompt Tuning for Image-Text Matching
Meng Tian · Shuo Yang · Xinxiao Wu
|
||
AirCache: Activating Inter-modal Relevancy KV Cache Compression for Efficient Vision-Language Model Inference
Kai Huang · Hao Zou · Bochen Wang · Xi Ye · Zhen Xie · Hao Wang
|
||
SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation
Junsong Chen · Shuchen Xue · Yuyang Zhao · Jincheng YU · Sayak Paul · Junyu Chen · Han Cai · Enze Xie · Song Han
|
||
Learning to Unlearn while Retaining: Combating Gradient Conflicts in Machine Unlearning
Gaurav Patel · Qiang Qiu
|
||
Adaptive Dual Uncertainty Optimization: Boosting Monocular 3D Object Detection under Test-Time Shifts
Zixuan Hu · Dongxiao Li · Xinzhu Ma · SHIXIANG TANG · Xiaotong Li · Wenhan Yang · LINGYU DUAN
|
||
Lifting the Structural Morphing for Wide-Angle Images Rectification: Unified Content and Boundary Modeling
Wenting Luan · Siqi Lu · Yongbin Zheng · Wanying XU · Lang Nie · Zongtan Zhou · Kang Liao
|
||
TrackVerse: A Large-scale Dataset of Object Tracks for Visual Representation Learning
Yibing Wei · Samuel Church · Victor Suciu · Jinhong Lin · Cheng-En Wu · Pedro Morgado
|
||
RnGCam: High-speed video from rolling & global shutter measurements
Kevin Tandi · Xiang Dai · Chinmay Talegaonkar · Gal Mishne · Nicholas Antipa
|
||
Boosting MLLM Reasoning with Text-Debiased Hint-GRPO
Qihan Huang · Weilong Dai · Jinlong Liu · Wanggui He · Hao Jiang · Mingli Song · Jingyuan CHEN · Chang Yao · Jie Song
|
||
AR-VRM: Imitating Human Motions for Visual Robot Manipulation with Analogical Reasoning
Dejie Yang · Zijing Zhao · Yang Liu
|
||
Closed-Loop Transfer for Weakly-supervised Affordance Grounding
Jiajin Tang · Zhengxuan Wei · Ge Zheng · Sibei Yang
|
||
Cross-Subject Mind Decoding from Inaccurate Representations
Yangyang Xu · Bangzhen Liu · Wenqi Shao · Yong Du · Shengfeng He · Tingting Zhu
|
||
NeuFrameQ: Neural Frame Fields for Scalable and Generalizable Anisotropic Quadrangulation
Ying-Tian Liu · Jiajun Li · Yu-Tao Liu · Xin Yu · Yuan-Chen Guo · Yan-Pei Cao · Ding Liang · Ariel Shamir · Song-Hai Zhang
|
||
PARTE: Part-Guided Texturing for 3D Human Reconstruction from a Single Image
Hyeongjin Nam · Donghwan Kim · Gyeongsik Moon · Kyoung Mu Lee
|
||
SUB: Benchmarking CBM Generalization via Synthetic Attribute Substitutions
Jessica Bader · Leander Girrbach · Stephan Alaniz · Zeynep Akata
|
||
Image as an IMU: Estimating Camera Velocity from a Single Motion-Blurred Image
Jerred Chen · Ronald Clark
|
||
Epona: Autoregressive Diffusion World Model for Autonomous Driving
Kaiwen Zhang · Zhenyu Tang · Xiaotao Hu · Xingang Pan · Xiaoyang Guo · Yuan Liu · Jingwei Huang · Li Yuan · Qian Zhang · XIAOXIAO LONG · Xun Cao · Wei Yin
|
||
ZFusion: Efficient Deep Compositional Zero-shot Learning for Blind Image Super-Resolution with Generative Diffusion Prior
Alireza Esmaeilzehi · Hossein Zaredar · Yapeng Tian · Laleh Seyyed-Kalantari
|
||
SiM3D: Single-instance Multiview Multimodal and Multisetup 3D Anomaly Detection Benchmark
Alex Costanzino · Pierluigi Zama Ramirez · Luigi Lella · Matteo Ragaglia · Alessandro Oliva · Giuseppe Lisanti · Luigi Stefano
|
||
FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models
Vladimir Kulikov · Matan Kleiner · Inbar Huberman-Spiegelglas · Tomer Michaeli
|
||
Soft Separation and Distillation: Toward Global Uniformity in Federated Unsupervised Learning
Hung-Chieh Fang · Hsuan-Tien Lin · Irwin King · Yifei Zhang
|
||
A Simple yet Mighty Hartley Diffusion Versatilist for Generalizable Dense Vision Tasks
Qi Bi · Jingjun Yi · Huimin Huang · Hao Zheng · Haolan Zhan · Wei Ji · Yawen Huang · Yuexiang Li · Yefeng Zheng
|
||
TAPNext: Tracking Any Point (TAP) as Next Token Prediction
Artem Zholus · Carl Doersch · Yi Yang · Skanda Koppula · Viorica Patraucean · Xu He · Ignacio Rocco · Mehdi Sajjadi · Sarath Chandar · Ross Goroshin
|
||
Gaussian Splatting with Discretized SDF for Relightable Assets
Zuo-Liang Zhu · Jian Yang · Beibei Wang
|
||
imHead: A large-scale implicit morphable model for localized head modeling
Rolandos Alexandros Potamias · Stathis Galanakis · Jiankang Deng · Athanasios Papaioannou · Stefanos Zafeiriou
|
||
Cross-Granularity Online Optimization with Masked Compensated Information for Learned Image Compression
Haowei Kuang · Wenhan Yang · Zongming Guo · Jiaying Liu
|
||
MamV2XCalib: V2X-based Target-less Infrastructure Camera Calibration with State Space Model
Yaoye Zhu · Zhe Wang · Yan Wang
|
||
Describe Anything: Detailed Localized Image and Video Captioning
Long Lian · Yifan Ding · Yunhao Ge · Sifei Liu · Hanzi Mao · Boyi Li · Marco Pavone · Ming-Yu Liu · Trevor Darrell · Adam Yala · Yin Cui
|
||
Hybrid-TTA: Continual Test-time Adaptation via Dynamic Domain Shift Detection
Hyewon Park · Hyejin Park · Jueun Ko · Dongbo Min
|
||
DriveX: Panoptic Scene Modeling for Learning Generalizable World Knowledge in Autonomous Driving
Chen Shi · Shaoshuai Shi · Kehua Sheng · Bo Zhang · Li Jiang
|
||
MemoryTalker: Personalized Speech-Driven 3D Facial Animation via Audio-Guided Stylization
Hyung Kyu Kim · Sangmin Lee · HAK GU KIM
|
||
CutS3D: Cutting Semantics in 3D for 2D Unsupervised Instance Segmentation
Leon Sick · Dominik Engel · Sebastian Hartwig · Pedro Hermosilla · Timo Ropinski
|
||
CuMPerLay: Learning Cubical Multiparameter Persistence Vectorizations
Caner Korkmaz · Brighton Nuwagira · Baris Coskunuzer · Tolga Birdal
|
||
Exploiting Vision Language Model for Training-Free 3D Point Cloud OOD Detection via Graph Score Propagation
Tiankai Chen · Yushu Li · Adam Goodge · Fei Teng · Xulei Yang · Tianrui Li · Xun Xu
|
||
Vivid4D: Improving 4D Reconstruction from Monocular Video by Video Inpainting
Jiaxin Huang · Sheng Miao · Bangbang Yang · Yuewen Ma · Yiyi Liao
|
||
CoopTrack: Exploring End-to-End Learning for Efficient Cooperative Sequential Perception
Jiaru Zhong · Jiahao Wang · Jiahui Xu · Xiaofan Li · Zaiqing Nie · Haibao Yu
|
||
$\pi$-AVAS: Can Physics-Integrated Audio-Visual Modeling Boost Neural Acoustic Synthesis?
Susan Liang · Chao Huang · Yunlong Tang · Zeliang Zhang · Chenliang Xu
|
||
BillBoard Splatting (BBSplat): Learnable Textured Primitives for Novel View Synthesis
David Svitov · Pietro Morerio · Lourdes Agapito · ALESSIO DEL BUE
|
||
BATCLIP: Bimodal Online Test-Time Adaptation for CLIP
Sarthak Kumar Maharana · Baoming Zhang · Leonid Karlinsky · Rogerio Feris · Yunhui Guo
|
||
LONG3R: Long Sequence Streaming 3D Reconstruction
Zhuoguang Chen · Minghui Qin · Tianyuan Yuan · Zhe Liu · Hang Zhao
|
||
Blended Point Cloud Diffusion for Localized Text-guided Shape Editing
Etai Sella · Noam Atia · Ron Mokady · Hadar Averbuch-Elor
|
||
Quadratic Gaussian Splatting: High Quality Surface Reconstruction with Second-order Geometric Primitives
Ziyu Zhang · Binbin Huang · Hanqing Jiang · Liyang Zhou · Xiaojun Xiang · Shuhan Shen
|
||
Unleashing the Temporal Potential of Stereo Event Cameras for Continuous-Time 3D Object Detection
Jae Young Kang · Hoonhee Cho · Kuk-Jin Yoon
|
||
DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding
Jungbin Cho · Junwan Kim · Jisoo Kim · Minseo Kim · Mingu Kang · Sungeun Hong · Tae-Hyun Oh · Youngjae Yu
|
||
Customizing Domain Adapters for Domain Generalization
Yuyang Ji · Zeyi Huang · Haohan Wang · Yong Jae Lee
|
||
Language-Driven Multi-Label Zero-Shot Learning with Semantic Granularity
Shouwen Wang · Qian Wan · Junbin Gao · Zhigang Zeng
|
||
RI3D: Few-Shot Gaussian Splatting With Repair and Inpainting Diffusion Priors
Avinash Paliwal · Xilong Zhou · Wei Ye · Jinhui Xiong · Rakesh Ranjan · Nima Kalantari
|
||
Learning to See in the Extremely Dark
Hai Jiang · Binhao Guan · Zhen Liu · Xiaohong Liu · Jian Yu · Zheng Liu · Songchen Han · Shuaicheng Liu
|
||
Scaling Transformer-Based Novel View Synthesis Models with Token Disentanglement and Synthetic Data
Nithin Gopalakrishnan Nair · Srinivas Kaza · Xuan Luo · Jungyeon Park · Stephen Lombardi · Vishal Patel
|
||
Motal: Unsupervised 3D Object Detection by Modality and Task-specific Knowledge Transfer
Hai Wu · Hongwei Lin · Xusheng Guo · Xin Li · Mingming Wang · Cheng Wang · Chenglu Wen
|
||
CoHD: A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation
Zhuoyan Luo · Yinghao Wu · Tianheng Cheng · Yong Liu · Yicheng Xiao · Hongfa Wang · Xiao-Ping Zhang · Yujiu Yang
|
||
World4Drive: End-to-End Autonomous Driving via Intention-aware Physical Latent World Model
Yupeng Zheng · Pengxuan Yang · Zebin Xing · Qichao Zhang · Yuhang Zheng · Yinfeng Gao · Pengfei Li · Teng Zhang · Zhongpu Xia · Peng Jia · XianPeng Lang · Dongbin Zhao
|
||
Interaction-Merged Motion Planning: Effectively Leveraging Diverse Motion Datasets for Robust Planning
Giwon Lee · Wooseong Jeong · Daehee Park · Jaewoo Jeong · Kuk-Jin Yoon
|
||
InstantEdit: Text-Guided Few-Step Image Editing with Piecewise Rectified Flow
Yiming Gong · Zhen Zhu · Minjia Zhang
|
||
Co-Painter: Fine-Grained Controllable Image Stylization via Implicit Decoupling and Adaptive Injection
Bowen Fu · Wei Wei · Jiaqi Tang · Jiangtao Nie · Yanyu Ye · Xiaogang Xu · Ying-Cong Chen · Lei Zhang
|
||
CMT: A Cascade MAR with Topology Predictor for Multimodal Conditional CAD Generation
Jianyu Wu · Yizhou Wang · Xiangyu Yue · Xinzhu Ma · Jinyang Guo · Dongzhan Zhou · Wanli Ouyang · Shixiang Tang
|
||
One Look is Enough: Seamless Patchwise Refinement for Zero-Shot Monocular Depth Estimation
Byeongjun Kwon · Munchurl Kim
|
||
Learning Streaming Video Representation via Multitask Training
Yibin Yan · Jilan Xu · Shangzhe Di · Yikun Liu · Yudi Shi · Qirui Chen · Zeqian Li · Yifei Huang · Weidi Xie
|
||
Mixed Signals: A Diverse Point Cloud Dataset for Heterogeneous LiDAR V2X Collaboration
Katie Luo · Minh-Quan Dao · Zhenzhen Liu · Mark Campbell · Wei-Lun Chao · Kilian Weinberger · Ezio Malis · Vincent Fremont · Bharath Hariharan · Mao Shan · Stewart Worrall · Julie Perez
|
||
UniRes: Universal Image Restoration for Complex Degradations
Mo Zhou · Keren Ye · Mauricio Delbracio · Peyman Milanfar · Vishal Patel · Hossein Talebi
|
||
End-to-End Driving with Online Trajectory Evaluation via BEV World Model
Yingyan Li · Yuqi Wang · Yang Liu · Jiawei He · Lue Fan · Zhaoxiang Zhang
|
||
LV-MAE: Learning Long Video Representations through Masked-Embedding Autoencoders
Ilan Naiman · Emanuel Baruch Baruch · Oron Anschel · Alon Shoshan · Igor Kviatkovsky · Manoj Aggarwal · Gerard Medioni
|