July 2017

24 Jul - 14 Feb, All Day: Demonstration @ CVPR 2017, Hawaii

Event Details

We present a live demo of the video generator used to create the almost 40,000 video clips that compose our Procedural Human Action Videos (PHAV) dataset, introduced in our paper accepted at the main conference (Procedural Generation of Videos to Train Deep Action Recognition Networks).

Procedural generation and virtual worlds are rapidly gaining momentum as a reliable technique for generating visual training data. This is particularly true for video, where manual labeling is extremely difficult or even impossible. This scarcity of adequately labeled training data is widely accepted as a major bottleneck for deep learning algorithms on important video understanding tasks like action recognition. Our generator is a tentative solution to this issue: it uses modern game technology to generate large-scale, densely labeled, high-quality synthetic video data without any manual intervention. In contrast to approaches that use existing video games to record limited data from human game sessions, we build upon the more powerful approach of “virtual world generation” [1,2], which can be seen as building a kind of serious game (a dynamic virtual environment) played only by (game) AIs in order to generate training data for other (perceptual) AI algorithms.

Pioneering this approach, the recent SYNTHIA [1] and Virtual KITTI [2] datasets are among the largest fully-labeled datasets designed to boost perceptual tasks in the context of autonomous driving and video understanding (including semantic and instance segmentation, 2D and 3D object detection and tracking, optical flow estimation, depth estimation, and structure from motion). With PHAV, we push the limits of this approach further by providing stochastic simulations of human actions, camera paths, and environmental conditions. The key problem we solve is how to automatically generate potentially infinite amounts of varied and realistic training data.
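To give a flavor of what "stochastic simulation" means here, the sketch below samples random scene configurations from small illustrative parameter spaces. It is not the actual PHAV generator (which is Unity-based and uses far richer distributions); the parameter names and value sets are hypothetical placeholders.

```python
import random

# Hypothetical parameter spaces for illustration only; the real generator
# covers many more actions, camera trajectories, and environmental conditions.
ACTIONS = ["walk", "run", "kick", "wave", "push"]
CAMERAS = ["static", "kite", "indoor", "closeup"]
WEATHER = ["clear", "overcast", "rain", "fog"]
PERIODS = ["day", "dawn", "night"]

def sample_scene(rng):
    """Draw one random scene configuration for a synthetic video clip."""
    return {
        "action": rng.choice(ACTIONS),
        "camera": rng.choice(CAMERAS),
        "weather": rng.choice(WEATHER),
        "period": rng.choice(PERIODS),
        "duration_s": round(rng.uniform(1.0, 10.0), 2),
    }

# A fixed seed makes each batch of sampled scenes reproducible.
rng = random.Random(42)
scenes = [sample_scene(rng) for _ in range(5)]
```

Because every clip is rendered from such a sampled configuration, the labels (action class, camera setup, weather) come for free, which is what makes the approach scale without manual annotation.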

The objective of our demo is to show how human action videos can be generated on the fly in a short amount of time, demonstrating our Unity-based generation software and showing how it can be embedded in a website, serving videos through the browser or a REST interface. Attendees will be able to select the main characteristics of the videos they wish to generate, and then visualize all generated data modalities (RGB, semantic and instance segmentation, optical flow, and depth maps) rendered by our generator.
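A REST interface of this kind boils down to validating the attendee's choices and turning them into a JSON generation request. The sketch below shows one hypothetical way to do that; the endpoint schema, field names, and modality identifiers are assumptions for illustration, not the demo's actual API.

```python
import json

# Hypothetical identifiers for the data modalities the generator can render.
MODALITIES = {"rgb", "semantic", "instance", "flow", "depth"}

def build_request(action, modalities, duration_s=3.0):
    """Validate user choices and build the JSON body of a generation request."""
    unknown = set(modalities) - MODALITIES
    if unknown:
        raise ValueError(f"unsupported modalities: {sorted(unknown)}")
    return json.dumps({
        "action": action,
        "modalities": sorted(modalities),
        "duration_s": duration_s,
    })

# Example: request a "kick" clip with RGB frames and depth maps.
body = build_request("kick", ["rgb", "depth"])
```

Such a body would then be POSTed to the generation server, which renders the clip and returns the requested modalities.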

We presented related but different demos (focusing on virtual reality) at NIPS 2016 for PHAV in VR, and at CVPR 2016 for Virtual KITTI and SYNTHIA.


  1. German Ros, Laura Sellart, Joanna Materzynska, David Vázquez, Antonio M. López, “The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes”, CVPR 2016.
  2. Adrien Gaidon, Qiao Wang, Yohann Cabon, Eleonora Vig, “Virtual Worlds as Proxy for Multi-Object Tracking Analysis”, CVPR 2016.



July 24 (Monday) - February 14 (Tuesday) UTC-10:00



Hawaii Convention Center in Honolulu, Hawaii