AI research has progressed from perception and generation to modeling dynamic, interactive environments, the central concern of world models. These models are key to advancing general intelligence, enabling agents to perceive, understand, and interact with complex, real-world environments. This capability benefits robotics, autonomous systems, and scientific discovery by improving planning, decision-making, and adaptive interaction. Despite recent advances, many world models prioritize visual fidelity over physical realism and often violate fundamental geometric and physical laws. Physics-based simulators such as MuJoCo do enforce Newtonian priors, but they lack adaptability to real-world stochastic phenomena. Additionally, most models remain passive predictors rather than interactive systems, which limits their ability to support embodied AI in dynamic tasks. Therefore, world models should:

  1. Respect geometric and physical laws in both deterministic and stochastic settings, ensuring accurate and robust simulations;
  2. Support interactivity by enabling agents to explore, intervene, and adapt within their environments;
  3. Generalize beyond simulation by ensuring reliability across diverse and unstructured real-world settings.


The topic of world models is closely related to several research fields, including video generation, 3D generation, spatiotemporal learning, physics engines, and VLA-based embodied AI. Video generation and 3D scene generation methods aim to create high-fidelity videos and static 3D worlds, respectively, but they often fail to strictly comply with external rules (e.g., geometric, physical, chemical, and biological laws). Physics engines can simulate these laws, but they face challenges in modeling complex scenarios and enabling meaningful human interactions. While pioneering works such as Genie, Dreamer, PlaNet, DriveDreamer, and MuDreamer are designed to provide realistic interactive environments and accurate planning and decision-making for agents, the reliability and interactivity of these models are still unsatisfactory. Given ICCV’s strong emphasis on modeling, simulation, and decision-making, this workshop aligns with its mission by addressing physically reliable and interactive world models, which are crucial for advancing reinforcement learning, generative modeling, and embodied AI. The interdisciplinary nature of ICCV will offer a platform for researchers to exchange insights, refine methodologies, and drive impactful advancements in autonomous systems, robotics, and interactive AI, reinforcing ICCV’s role in shaping the future of intelligent agents.


The workshop will focus on physical reliability and effective interactivity in world models for applications requiring precise physical reasoning and dense environmental interactions, such as robotics, autonomous systems, and multi-agent interactions. Beyond generating realistic predictions, world models must enforce physical consistency through differentiable physics, hybrid modeling, and adaptive simulation techniques. By bringing together researchers from machine learning, computer graphics, and physics-based modeling, the workshop will explore classical and cutting-edge approaches to aligning world models with real-world physics and extending them beyond simulation. This workshop aims to address the following fundamental questions:

  1. How can we incorporate geometric and physical priors into generative world modeling to ensure physical plausibility?
  2. How can we facilitate interactions between users and simulated environments, and among agents within these environments?
  3. How can we use multiple modalities for scalable modeling of the real world?
  4. How can we effectively evaluate the quality of world models, particularly with regard to reliability and real-time interactivity?


Topics

We will explore a range of topics in this workshop, including, but not limited to, the following areas:

Benchmarks and datasets:

Establishing standardized metrics and datasets to assess physical reliability, interactivity, and generalization beyond controlled simulations.

Physically reliable world models:

Embedding geometric and physical constraints into world models, including model-based reinforcement learning, sequential modeling, and generative approaches.

Interactive and adaptive world models:

Developing world models that support real-time interactions, allowing agents to explore, intervene, and adapt dynamically within their environments.

Scalable and multi-modal world modeling:

Leveraging vision, language, audio, and tactile modalities to create richer, more comprehensive, and scalable world models that capture diverse real-world complexities.

World models in real-world applications:

Discussing the role of world models in robotics, self-driving, gaming and beyond, to enhance decision-making and real-world adaptability.

Call for papers

We welcome full paper submissions, subject to the following requirements:

  • Paper Length: Minimum of 5 pages, Maximum of 8 pages (excluding references).
  • Format: Submit as PDF following the official ICCV 2025 template and guidelines.
  • Review Policy: Submissions must be anonymous and follow ICCV 2025 double-blind review rules.
  • Dual Submission: Not permitted under ICCV 2025 and RIWM 2025 guidelines.
  • Supplementary Materials: Optional videos, images, etc. can be uploaded as a separate zip file. The deadline matches the paper submission deadline.
  • Presentation Requirement: At least one author of each accepted paper must attend and present the work in person.
  • Presentation Format: Accepted papers will be presented either as oral or poster presentations.
  • Conference Policy: Presentation rules follow the ICCV 2025 main conference policy.
  • Compliance: Failure to meet these rules may result in removal from the workshop program.

Submission Portal

Via OpenReview

  • Submission deadline (archived paper): June 30, 2025, 11:59 PM AOE
  • Notification to authors (archived paper): July 11, 2025
  • Camera ready deadline (archived paper): Aug 18, 2025, 11:59 PM AOE

Via OpenReview (stay tuned)

  • Submission deadline (non-archived papers): Sept 1, 2025, 11:59 PM AOE
  • Notification to authors (non-archived papers): Sept 15, 2025

Speakers and panelists

Yann LeCun

Chief AI Scientist, Meta; Professor, New York University

Ming-Yu Liu

Vice President of Research, NVIDIA; IEEE Fellow

Jiajun Wu

Assistant Professor, Stanford University

Katerina Fragkiadaki

Associate Professor, Carnegie Mellon University

Tali Dekel

Associate Professor, Weizmann Institute of Science; Staff Research Scientist, Google DeepMind

Jack Parker-Holder

Research Scientist, Google DeepMind

Lerrel Pinto

Assistant Professor, NYU

Nicklas Hansen

Ph.D. student, UC San Diego

Sherry Yang

Staff Research Scientist, Google DeepMind

Boyi Li

Research Scientist, NVIDIA

Yilun Du

Senior Research Scientist, Google DeepMind

Prithvijit Chattopadhyay

Research Scientist, NVIDIA

Workshop Schedule

Time | Session | Duration | Details
9:00 - 9:10 | Opening Remarks | 10 min | Welcome and Introduction to the Workshop
9:10 - 9:40 | Invited Talk #1 | 30 min | Talk1
9:50 - 10:20 | Invited Talk #2 | 30 min | Talk2
10:30 - 10:40 | Coffee Socials | 10 min | Coffee Socials
10:40 - 11:10 | Invited Talk #3 | 30 min | Talk3
11:20 - 11:50 | Invited Talk #4 | 30 min | Talk4
12:00 - 13:00 | Poster Session & Coffee Socials #1 | 60 min | Networking and refreshments
13:30 - 14:00 | Invited Talk #5 | 30 min | Talk5
14:10 - 14:40 | Invited Talk #6 | 30 min | Talk6
14:50 - 15:10 | Oral Presentations #1 | 10 min * 2 | Oral Presentations1
15:10 - 15:40 | Invited Talk #7 | 30 min | Talk7
15:50 - 16:20 | Invited Talk #8 | 30 min | Talk8
16:30 - 17:20 | Panel Discussion | 50 min | Interactive session with panelists
17:20 - 17:30 | Awards and Concluding Remarks | 10 min | Concluding the workshop and award announcements

Organization

Workshop Organizers

Organizing Committee

Shixiang Tang

Postdoctoral Researcher, The Chinese University of Hong Kong

Thu Nguyen-Phuoc

Senior Research Scientist, Meta

Zhenfei Yin

PhD at USYD, Visiting Researcher at Oxford

Amir Bar

Postdoctoral Researcher, Meta

Pengyu Zhang

Postdoctoral Fellow, University of Alberta

Xu Jia

Associate Professor, DUT

Yutong Bai

Postdoc Researcher, UC Berkeley

Lian Xu

Research Fellow, UWA

Francesco Ferroni

Principal Researcher, NVIDIA

Flora Salim

Professor, UNSW Sydney

Tinne Tuytelaars

Professor, KU Leuven, Belgium

Hairui Yang

SDE, Shanghai AI Lab

Jiajun Wu

Assistant Professor, Stanford University

Huchuan Lu

Director of the School of AI, DUT

Yanyong Zhang

Professor, IEEE Fellow, USTC

Philip H.S. Torr

Professor, University of Oxford

Trevor Darrell

Professor, UCB