AI research has progressed from perception and generation to modeling dynamic, interactive environments, a capability central to world models. These models are key to advancing general intelligence, enabling agents to perceive, understand, and interact with complex, real-world environments, which in turn benefits robotics, autonomous systems, and scientific discovery through better planning, decision-making, and adaptive interaction. Despite this progress, many world models prioritize visual fidelity over physical realism, often violating fundamental geometric and physical laws. Physics-based simulators such as MuJoCo enforce Newtonian priors but lack adaptability to real-world stochastic phenomena. Moreover, most models remain passive predictors rather than interactive systems, limiting their ability to support embodied AI in dynamic tasks. World models should therefore:
- Respect geometric and physical laws, both deterministic and stochastic, to ensure accurate and robust simulations;
- Support interactivity by enabling agents to explore, intervene, and adapt within their environments;
- Generalize beyond simulation by ensuring reliability across diverse and unstructured real-world settings.
The topic of world models is closely related to a range of research fields, e.g., video generation, 3D generation, spatiotemporal learning, physics engines, and vision-language-action (VLA) models for embodied AI. Video generation and 3D scene generation methods aim to create high-fidelity videos and static 3D worlds, respectively, but they often fail to strictly comply with external rules (e.g., geometric, physical, chemical, and biological laws). Physics engines can simulate these laws, but they struggle to model complex scenarios and to support meaningful human interaction. While pioneering works such as Genie, Dreamer, PlaNet, DriveDreamer, and MuDreamer aim to provide realistic interactive environments and accurate planning and decision-making for agents, the reliability and interactivity of these models remain unsatisfactory. Given ICCV's strong emphasis on modeling, simulation, and decision-making, this workshop aligns with its mission by addressing physically reliable and interactive world models, which are crucial for advancing reinforcement learning, generative modeling, and embodied AI. The interdisciplinary nature of ICCV offers a platform for researchers to exchange insights, refine methodologies, and drive impactful advances in autonomous systems, robotics, and interactive AI, reinforcing ICCV's role in shaping the future of intelligent agents.
The workshop will focus on physical reliability and effective interactivity in world models for applications requiring
precise physical reasoning and dense environmental interactions, such as robotics, autonomous systems, and multi-agent
interactions. Beyond generating realistic predictions, world models must enforce physical consistency through
differentiable physics, hybrid modeling, and adaptive simulation techniques. By bringing together researchers from
machine learning, computer graphics, and physics-based modeling, the workshop will explore classical and cutting-edge
approaches to aligning world models with real-world physics and extending them beyond simulation.
This workshop aims to address the following fundamental questions:
- How can we incorporate geometric and physical priors into generative world modeling to ensure physical plausibility?
- How can we facilitate interactions between users and simulated environments, and among agents within these environments?
- How can we leverage multiple modalities for scalable modeling of the real world?
- How can we effectively evaluate the quality of world models, particularly with regard to reliability and real-time interactivity?
Topics
We will explore a range of topics in this workshop, including, but not limited to, the following areas:
- Establishing standardized metrics and datasets to assess physical reliability, interactivity, and generalization beyond controlled simulations.
- Embedding geometric and physical constraints into world models, including model-based reinforcement learning, sequential modeling, and generative approaches.
- Developing world models that support real-time interactions, allowing agents to explore, intervene, and adapt dynamically within their environments.
- Leveraging vision, language, audio, and tactile modalities to create richer, more comprehensive, and scalable world models that capture diverse real-world complexities.
- Discussing the role of world models in robotics, self-driving, gaming, and beyond, to enhance decision-making and real-world adaptability.
Call for papers
We welcome full paper submissions, subject to the following requirements:
- Paper Length: Minimum of 5 pages, Maximum of 8 pages (excluding references).
- Format: Submit as PDF following the official ICCV 2025 template and guidelines.
- Review Policy: Submissions must be anonymous and follow ICCV 2025 double-blind review rules.
- Dual Submission: Not permitted under ICCV 2025 and RIWM 2025 guidelines.
- Supplementary Materials: Optional videos, images, etc. can be uploaded as a separate zip file. The deadline matches the paper submission deadline.
- Presentation Requirement: At least one author of each accepted paper must attend and present the work in person.
- Presentation Format: Accepted papers will be presented either as oral or poster presentations.
- Conference Policy: Presentation rules follow the ICCV 2025 main conference policy.
- Compliance: Failure to meet these rules may result in removal from the workshop program.
Submission Portal
Via OpenReview
- Submission deadline (archived papers): June 30, 2025, 11:59 PM AoE
- Notification to authors (archived papers): July 11, 2025
- Camera-ready deadline (archived papers): Aug 18, 2025, 11:59 PM AoE
Via OpenReview (stay tuned)
- Submission deadline (non-archived papers): Sept 1, 2025, 11:59 PM AoE
- Notification to authors (non-archived papers): Sept 15, 2025
Speakers and panelists

Yann LeCun
Chief AI Scientist, Meta; Professor, New York University
Ming-Yu Liu
President of Research, NVIDIA; IEEE Fellow
Jiajun Wu
Assistant Professor, Stanford University
Katerina Fragkiadaki
Associate Professor, Carnegie Mellon University.
Tali Dekel
Associate Professor, Weizmann Institute of Science; Staff Research Scientist, Google DeepMind
Jack Parker-Holder
Research Scientist, Google DeepMind
Lerrel Pinto
Assistant Professor, NYU
Nicklas Hansen
Ph.D. student, UC San Diego
Sherry Yang
Staff Research Scientist, Google DeepMind
Boyi Li
Research Scientist, NVIDIA
Yilun Du
Senior Research Scientist, Google DeepMind
Prithvijit Chattopadhyay
Research Scientist, NVIDIA
Workshop Schedule
Time | Session | Duration | Details |
---|---|---|---|
9:00 - 9:10 | Opening Remarks | 10 min | Welcome and introduction to the workshop |
9:10 - 9:40 | Invited Talk #1 | 30 min | Talk 1 |
9:50 - 10:20 | Invited Talk #2 | 30 min | Talk 2 |
10:30 - 10:40 | Coffee Social | 10 min | Coffee social |
10:40 - 11:10 | Invited Talk #3 | 30 min | Talk 3 |
11:20 - 11:50 | Invited Talk #4 | 30 min | Talk 4 |
12:00 - 13:00 | Poster Session & Coffee Social #1 | 60 min | Networking and refreshments |
13:30 - 14:00 | Invited Talk #5 | 30 min | Talk 5 |
14:10 - 14:40 | Invited Talk #6 | 30 min | Talk 6 |
14:50 - 15:10 | Oral Presentations #1 | 10 min × 2 | Oral presentations |
15:10 - 15:40 | Invited Talk #7 | 30 min | Talk 7 |
15:50 - 16:20 | Invited Talk #8 | 30 min | Talk 8 |
16:30 - 17:20 | Panel Discussion | 50 min | Interactive session with panelists |
17:20 - 17:30 | Awards and Concluding Remarks | 10 min | Concluding the workshop and award announcements |
Organization
Workshop Organizers
Organizing Committee

Shixiang Tang
Postdoctoral researcher, the Chinese University of Hong Kong
Thu Nguyen-Phuoc
Senior Research Scientist, Meta
Zhenfei Yin
Ph.D. student, USYD; Visiting Researcher, Oxford
Amir Bar
Postdoctoral Researcher, Meta
Pengyu Zhang
Postdoctoral fellow, University of Alberta
Xu Jia
Associate Professor, DUT
Yutong Bai
Postdoctoral Researcher, UC Berkeley
Lian Xu
Research Fellow, UWA
Francesco Ferroni
Principal Researcher, NVIDIA
Flora Salim
Professor, UNSW Sydney
Tinne Tuytelaars
Professor, KU Leuven, Belgium
Hairui Yang
SDE, Shanghai AI Lab
Jiajun Wu
Assistant Professor, Stanford University
Huchuan Lu
Director of the School of AI, DUT
Yanyong Zhang
Professor, IEEE Fellow, USTC
Philip H.S. Torr
Professor, University of Oxford