
Themes

  1. Generative Video Models as Autonomous Reasoning Engines:
    This theme examines how diffusion-based video models, 3D-aware world models, and large multimodal transformers can serve as cognitive backbones for agentic systems. Participants will explore how generative priors improve temporal coherence, fill in missing frames, simulate potential outcomes, and support high-level reasoning over complex dynamic scenes. When combined with planning modules and feedback loops, generative agents can autonomously interpret human activities, detect anomalies, and predict how a scene is likely to evolve.

  2. Agentic AI for Real-World Environments:
    Agentic AI introduces capabilities such as autonomous task decomposition, context-aware decision-making, and self-optimization. This theme focuses on how these properties enhance video understanding in challenging real-world conditions: crowded surveillance scenes, variable lighting, occlusions, abrupt motion patterns, and multi-agent interactions. Discussions will center on vision-language-action models, tool-use agents, embodied decision systems, and reinforcement learning frameworks designed to operate in unconstrained environments.

  3. Spatiotemporal Representation Learning & Predictive Modeling:
    High-quality video understanding depends on structured spatiotemporal representations. This theme explores world models, graph neural network architectures for video, motion-aware transformers, and predictive generative models that anticipate future states of complex scenes. Use cases include trajectory prediction, pedestrian intent estimation, environmental hazard forecasting, and robot navigation. The theme also covers multimodal fusion across video, LiDAR, IMU, and audio, which enables agents to form a holistic perception of real-world environments.

Call for Papers

We invite submissions and participation under the theme "GenAAI 2026", including but not limited to:

  1. Agentic AI for Video Reasoning and Decision-Making

  2. Video Models and Spatiotemporal Generative Representations

  3. Vision-Language-Action Agents

  4. Generative Trajectory Prediction and Scene Forecasting

  5. Real-World Video Surveillance, Crowd Analytics, and Behavior Understanding

  6. Video Anomaly Detection Using Generative Priors

  7. Self-Supervised and Foundation Models for Long Video Sequences

  8. Generative Augmentation, Reconstruction, and Inpainting in Videos

  9. Few-shot and Zero-shot Video Understanding

  10. Edge and Real-Time Deployment of Generative Agents

Instructions for Authors and Paper Submission

We invite submissions and participation under the theme "GenAAI 2026".

  1. Instructions for Authors

  2. Paper Submission

© 2026 Workshop on Generative and Agentic AI for Real-World Video Understanding (GenAAI 2026) WS30 | All rights reserved.