Poster
SciVid: Cross-Domain Evaluation of Video Models in Scientific Applications
Yana Hasson · Pauline Luc · Liliane Momeni · Maks Ovsjanikov · Guillaume Le Moing · Alina Kuznetsova · Ira Ktena · Jennifer J. Sun · Skanda Koppula · Dilara Gokay · Joseph Heyward · Etienne Pot · Andrew Zisserman
In recent years, there has been a proliferation of spatiotemporal foundation models for different scientific domains. While promising, these models are often domain-specific, limiting their applicability. Given that many spatiotemporal tasks can be represented as video modeling problems, video foundation models (ViFMs) hold considerable promise. However, it remains an open question to what extent the knowledge acquired on large-scale but potentially out-of-domain data can be effectively transferred across diverse scientific domains, and whether a single, pretrained ViFM can be competitive with domain-specific baselines. To address this, we introduce SciVid, a comprehensive benchmark comprising five Scientific Video tasks, across medical computer vision, animal behavior, and weather forecasting. We adapt six leading video models to SciVid using simple trainable readout modules, establishing strong baselines and demonstrating the potential for effective transfer learning. Specifically, we show that state-of-the-art results can be obtained in several applications by effectively transferring general-purpose representations from ViFM backbones. Furthermore, our results shed light on limitations of existing ViFMs, and highlight opportunities for the development of generalizable models for high-impact scientific applications. We will release our code to facilitate further research in cross-domain development of ViFMs.