Timezone: Pacific/Honolulu
SAT 18 OCT
1 p.m.
(ends 8:00 PM)
SUN 19 OCT
8 a.m.
Workshop:
(ends 12:00 PM)
Workshop:
(ends 12:00 PM)
Workshop:
(ends 12:00 PM)
Workshop:
(ends 12:30 PM)
Workshop:
(ends 5:00 PM)
8:15 a.m.
8:30 a.m.
Workshop:
(ends 12:00 PM)
8:45 a.m.
8:50 a.m.
8:55 a.m.
9 a.m.
Workshop:
(ends 6:00 PM)
Workshop:
(ends 12:30 PM)
Workshop:
(ends 12:00 PM)
Tutorial:
(ends 12:00 PM)
Tutorial:
(ends 5:00 PM)
9:15 a.m.
10 a.m.
noon
1 p.m.
Workshop:
(ends 6:00 PM)
Workshop:
(ends 5:00 PM)
Workshop:
(ends 5:00 PM)
Workshop:
(ends 5:00 PM)
1:30 p.m.
2 p.m.
Workshop:
(ends 6:00 PM)
3 p.m.
MON 20 OCT
8 a.m.
Workshop:
(ends 12:00 PM)
Workshop:
(ends 5:00 PM)
Workshop:
(ends 5:00 PM)
Workshop:
(ends 12:00 PM)
Workshop:
(ends 12:00 PM)
8:10 a.m.
8:15 a.m.
8:25 a.m.
Workshop:
(ends 6:00 PM)
8:30 a.m.
8:45 a.m.
9 a.m.
Workshop:
(ends 5:00 PM)
10 a.m.
noon
1 p.m.
Workshop:
(ends 5:00 PM)
Workshop:
(ends 5:00 PM)
1:30 p.m.
2 p.m.
Workshop:
(ends 6:00 PM)
3 p.m.
TUE 21 OCT
8:45 a.m.
Orals 9:00-10:15
[9:00]
GT-Loc: Unifying When and Where in Images Through a Joint Embedding Space
[9:15]
Scaling Laws for Native Multimodal Models
[9:30]
FixTalk: Taming Identity Leakage for High-Quality Talking Head Generation in Extreme Cases
[9:45]
Differentiable Room Acoustic Rendering with Multi-View Vision Priors
[10:00]
Token Activation Map to Visually Explain Multimodal LLMs
(ends 10:00 AM)
Orals 9:00-10:15
[9:00]
Multi-View 3D Point Tracking
[9:15]
Uncalibrated Structure from Motion on a Sphere
[9:30]
Removing Cost Volumes from Optical Flow Estimators
[9:45]
Image as an IMU: Estimating Camera Motion from a Single Motion-Blurred Image
[10:00]
TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models
(ends 10:00 AM)
10 a.m.
10:15 a.m.
11:45 a.m.
Posters 11:45-1:45
Semi-ViM: Bidirectional State Space Model for Mitigating Label Imbalance in Semi-Supervised Learning
Learnable Logit Adjustment for Imbalanced Semi-Supervised Learning under Class Distribution Mismatch
(ends 1:45 PM)
1:30 p.m.
Orals 1:45-3:00
[1:45]
Variance-Based Pruning for Accelerating and Compressing Trained Networks
[2:00]
Importance-Based Token Merging for Efficient Image and Video Generation
[2:15]
Knowledge Distillation for Learned Image Compression
[2:30]
Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion
[2:45]
Heavy Labels Out! Dataset Distillation with Label Space Lightening
(ends 2:30 PM)
Orals 1:45-3:00
[1:45]
RayZer: A Self-supervised Large View Synthesis Model
[2:00]
EVER: Exact Volumetric Ellipsoid Rendering for Real-time View Synthesis
[2:15]
Self-Ensembling Gaussian Splatting for Few-Shot Novel View Synthesis
[2:30]
Back on Track: Bundle Adjustment for Dynamic Scene Reconstruction
[2:45]
SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining
(ends 2:45 PM)
3 p.m.
Posters 3:15-5:15
GT-Mean Loss: A Simple Yet Effective Solution for Brightness Mismatch in Low-Light Image Enhancement
OVA-Fields: Weakly Supervised Open-Vocabulary Affordance Fields for Robot Operational Part Detection
ScoreHOI: Physically Plausible Reconstruction of Human-Object Interaction via Score-Guided Diffusion
Enhancing Spatial Reasoning in Multimodal Large Language Models through Reasoning-based Segmentation
Unsupervised Part Discovery via Descriptor-Based Masked Image Restoration with Optimized Constraints
monoVLN: Bridging the Observation Gap between Monocular and Panoramic Vision and Language Navigation
(ends 5:00 PM)
WED 22 OCT
8 a.m.
Orals 8:00-9:30
[8:00]
RS-vHeat: Heat Conduction Guided Efficient Remote Sensing Foundation Model
[8:15]
Towards a Unified Copernicus Foundation Model for Earth Vision
[8:30]
Learning Streaming Video Representation via Multitask Training
[8:45]
LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models
[9:00]
Learning Visual Hierarchies in Hyperbolic Space for Image Retrieval
[9:15]
GMMamba: Group Masking Mamba for Whole Slide Image Classification
(ends 9:15 AM)
Orals 8:00-9:30
[8:00]
NullSwap: Proactive Identity Cloaking Against Deepfake Face Swapping
[8:15]
MaskControl: Spatio-Temporal Control for Masked Motion Synthesis
[8:30]
HairCUP: Hair Compositional Universal Prior for 3D Gaussian Avatars
[8:45]
Understanding Co-speech Gestures in-the-wild
[9:00]
DPoser-X: Diffusion Model as Robust 3D Whole-body Human Pose Prior
[9:15]
Teeth Reconstruction and Performance Capture Using a Phone Camera
(ends 9:15 AM)
9:15 a.m.
9:30 a.m.
10:45 a.m.
Posters 10:45-1:15
GGTalker: Talking Head Systhesis with Generalizable Gaussian Priors and Identity-Specific Adaptation
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception
GDKVM: Echocardiography Video Segmentation via Spatiotemporal Key-Value Memory with Gated Delta Rule
Highlight What You Want: Weakly-Supervised Instance-Level Controllable Infrared-Visible Image Fusion
SemGes: Semantics-aware Co-Speech Gesture Generation using Semantic Coherence and Relevance Learning
(ends 12:45 PM)
1 p.m.
Orals 1:15-2:30
[1:15]
Forecasting Continuous Non-Conservative Dynamical Systems in SO(3)
[1:30]
Certifiably Optimal Anisotropic Rotation Averaging
[1:45]
Deterministic Object Pose Confidence Region Estimation
[2:00]
RePoseD: Efficient Relative Pose Estimation With Known Depth Information
[2:15]
Diving into the Fusion of Monocular Priors for Generalized Stereo Matching
(ends 2:15 PM)
Orals 1:15-2:30
[1:15]
Dynamic Typography: Bringing Text to Life via Video Diffusion Prior
[1:30]
Generating Physically Stable and Buildable Brick Structures from Text
[1:45]
WIR3D: Visually-Informed and Geometry-Aware 3D Shape Abstraction
[2:00]
SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling
[2:15]
ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
(ends 2:15 PM)
2:30 p.m.
Posters 2:45-4:45
Dual Recursive Feedback on Generation and Appearance Latents for Pose-Robust Text-to-Image Diffusion
DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation
PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity
FairGen: Enhancing Fairness in Text-to-Image Diffusion Models via Self-Discovering Latent Directions
(ends 4:00 PM)
4:45 p.m.
6:30 p.m.
THU 23 OCT
8 a.m.
Orals 8:00-9:30
[8:00]
ROAR: Reducing Inversion Error in Generative Image Watermarking
[8:15]
Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos
[8:30]
Automated Model Evaluation for Object Detection via Prediction Consistency and Reliability
[8:45]
Counting Stacked Objects
[9:00]
MIORe & VAR-MIORe: Benchmarks to Push the Boundaries of Restoration
[9:15]
Soft Local Completeness: Rethinking Completeness in XAI
(ends 9:15 AM)
Orals 8:00-9:30
[8:00]
LaRender: Training-Free Occlusion Control in Image Generation via Latent Rendering
[8:15]
MikuDance: Animating Character Art with Mixed Motion Dynamics
[8:30]
Diffusion Transformer meets Multi-level Wavelet Spectrum for Single Image Super-Resolution
[8:45]
LOTS of Fashion! Multi-Conditioning for Image Generation via Sketch-Text Pairing
[9:00]
FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models
[9:15]
LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer
(ends 9:15 AM)
10:45 a.m.
Posters 11:15-1:15
Enhancing Zero-shot Object Counting via Text-guided Local Ranking and Number-evoked Global Attention
VISO: Accelerating In-orbit Object Detection with Language-Guided Mask Learning and Sparse Inference
How Do Optical Flow and Textual Prompts Collaborate to Assist in Audio-Visual Semantic Segmentation?
Harnessing Vision Foundation Models for High-Performance, Training-Free Open Vocabulary Segmentation
(ends 12:45 PM)
11 a.m.
1 p.m.
Orals 1:15-2:30
[1:15]
CorrCLIP: Reconstructing Patch Correlations in CLIP for Open-Vocabulary Semantic Segmentation
[1:30]
E-SAM: Training-Free Segment Every Entity Model
[1:45]
Online Reasoning Video Segmentation with Just-in-Time Digital Twins
[2:00]
Easy3D: A Simple Yet Effective Method for 3D Interactive Segmentation
[2:15]
ForestFormer3D: A Unified Framework for End-to-End Segmentation of Forest LiDAR 3D Point Clouds
(ends 2:15 PM)
Orals 1:15-2:30
[1:15]
SuperDec: 3D Scene Decomposition with Superquadrics Primitives
[1:30]
Diffusion Image Prior
[1:45]
Spatially-Varying Autofocus
[2:00]
Towards Foundational Models for Single-Chip Radar
[2:15]
Event-based Visual Vibrometry
(ends 2:15 PM)
2:30 p.m.
Posters 2:30-4:45
EmbodiedSplat: Personalized Real-to-Sim-to-Real Navigation with Gaussian Splats from a Mobile Device
TeethGenerator: A two-stage framework for paired pre- and post-orthodontic 3D dental data generation
ConsistentCity: Semantic Flow-guided Occupancy DiT for Temporally Consistent Driving Scene Synthesis
Visual Surface Wave Elastography: Revealing Subsurface Physical Properties via Visible Surface Waves
Geometric Alignment and Prior Modulation for View-Guided Point Cloud Completion on Unseen Categories
(ends 4:30 PM)