We are currently observing a strong renewed interest in, and hopes for, Artificial Intelligence (AI), fueled by scientific advances that make it possible to efficiently learn powerful statistical models from large data collections processed on efficient hardware. Computer Vision is the prime example of this modern revolution. Its recent successes in many high-level visual recognition tasks, such as image classification, object detection, and semantic segmentation, are thanks in part to large labeled datasets such as ImageNet and to deep learning algorithms supported by new and more appropriate hardware such as GPUs.

In fact, recent results indicate that the reliability of models might be limited not by the algorithms themselves but by the type and amount of data available. The release of new and more sophisticated datasets has indeed been the trump card for many recent achievements in computer vision and machine learning, e.g., deep convolutional networks trained on ImageNet.

Therefore, in order to tackle more challenging and general Visual AI (VAI) tasks, such as fine-grained global scene and video understanding, progress is needed not only on algorithms, but also on datasets, both for learning and for quantitatively evaluating the generalization performance of visual models. In particular, labeling every pixel of a large set of varied videos with ground truth depth, optical flow, semantic category, or other visual properties is neither scalable nor cost-effective. This is hinted at by the small scale of existing datasets, such as the KITTI Vision Benchmark Suite, which was acquired through an enormous engineering effort. Such a labor-intensive ground truth annotation process is, in addition, prone to errors.

The purpose of this workshop is to provide a forum to gather researchers around the nascent field of Virtual/Augmented Reality (VR/AR or just VAR) used for data generation in order to learn and study VAI algorithms. VAR technologies have made impressive progress recently, in particular in the fields of computer graphics, physics engines, game engines, authoring tools, and hardware, thanks to a strong push from various big players in the industry (including Facebook/Oculus, Google, Sony/PlayStation, Valve, and Unity Technologies). Although VAR platforms are mostly designed for multimedia applications geared towards human entertainment, more and more researchers (cf. references below) have noticed the tremendous potential they hold as data generation tools for algorithm/AI consumption. In light of the long-standing history of synthetic data in computer vision and multimedia, VAR technologies represent the next step of multimedia data generation, vastly improving on the quantity, variety, and realism of the densely and accurately labeled fine-grained data that can be generated and that is needed to push the scientific boundaries of research on AI.


This half-day workshop will include invited talks from researchers at the forefront of modern synthetic data generation with VAR for VAI (cf. below) and invite contributions (with awards, cf. below) from multimedia and computer vision researchers on the following non-exclusive topics:

  • Learning Transferable Multimodal Representations in VAR, e.g., via deep learning
  • Virtual World design for realistic training data generation
  • Augmenting real-world training datasets with renderings of 3D virtual objects
  • Active & reinforcement learning algorithms for effective training data generation and accelerated learning
  • Studies on the gap between VAR and the real world from the point of view of VAI algorithms
  • Hybrid real/virtual data sets to train and benchmark VAI algorithms
  • Large scale virtual (pre-)training of scene and video understanding algorithms for which current data is scarce, including:
    • Tracking, Re-identification
    • Human Pose Estimation, Action Recognition, and Event Detection
    • Object-, instance-, and scene-level segmentation
    • Optical flow, scene flow, depth estimation, and viewpoint estimation
    • Visual Question Answering and spatio-temporal reasoning
    • X-recognition: objects, text, faces, emotions, etc.

The main question underlying the workshop will be: when, how, and how much can realistic virtual/augmented worlds be used to train and evaluate artificial intelligence algorithms for real-world efficiency?

Best Paper Award


The Best Paper prize, sponsored by Xerox Research Centre Europe (XRCE) and Facebook AI Research (FAIR) Paris Lab, goes to the manuscript titled:

“Enhancing Place Recognition using Joint Intensity – Depth Analysis and Synthetic Data” by Elena Sizikova, Vivek K. Singh, Bogdan Georgescu, Maciej Halber, Kai Ma, and Terrence Chen.


This workshop is supported by the Spanish project TRA2014-57088-C2-1-R and the DGT project SPIP2014-01352, by the Secretary for Universities and Research of the Ministry of Economy and Knowledge of the Generalitat of Catalonia (2014-SGR-1506), and by TECNIOspring with the FP7 of the EU and ACCIÓ.