DAVID VÁZQUEZ | Academic Personal Site
.01

ABOUT

PERSONAL DETAILS
Campus UAB, Edifici O, s/n, 08193 Cerdanyola del Vallès, Barcelona
dvazquez@cvc.uab.es
+34 607665417
Hello. I am a researcher, dreamer, programmer, leader, and autonomous driving engineer.
I am passionate about Deep Learning and autonomous vehicles. Welcome to my personal and academic profile.
Available as a researcher

BIO

ABOUT ME

Dr. David Vázquez is a postdoctoral researcher at the Computer Vision Center of Barcelona (CVC) and the Montreal Institute for Learning Algorithms (MILA), and an Assistant Professor in the Department of Computer Science at the Autonomous University of Barcelona (UAB). He received his Ph.D. in Computer Vision (2013), M.Sc. in Computer Vision and Artificial Intelligence (2009) and B.Sc. in Computer Science (2008) from the UAB. Previously, he received a B.Sc. in Software Engineering from the UDC (2006). He has done internships at Daimler AG, UAM and URJC. He is an expert in machine perception for autonomous vehicles. His research interests include deep learning, computer vision, robotics and autonomous driving.

He is a recipient of four awards for his Ph.D. thesis, from the Spanish chapter of the IEEE Intelligent Transportation Systems Society (ITSS), the Spanish chapter of the International Association for Pattern Recognition (IAPR), the UAB, and the Centres de Recerca de Catalunya (CERCA); three best paper awards (GTC 2016, NIPS Workshop 2011, ICMI 2011); and two challenge wins (CVPR challenges 2013 and 2014). David has also participated in industrial projects with companies such as IDIADA Applus+, Samsung and Audi.

David has been an organizer of international workshops at major conferences (e.g., TASK-CV, CVVT, VARVAI), a chair at conferences (e.g., IbPRIA), an editor of the IET Computer Vision journal (IET-CV), and has served on the program committees of multiple machine learning and computer vision conferences and journals (e.g., NIPS, CVPR, ECCV, ICCV, BMVC).

HOBBIES

INTERESTS

I'm passionate about group conditioning training classes such as BodyPump, BodyCombat, and any aerobics class.

I like experimenting with new vegetarian recipes. I'm a Thermomix master.

I like programming for different platforms such as Arduino, Raspberry Pi, or NVIDIA Jetson.

Watching funny TV series allows me to disconnect and fall asleep.

FACTS

NUMBERS ABOUT ME

50
PAPERS PUBLISHED
9
PROJECTS COMPLETED
7
WORKSHOPS ORGANIZED
9
AWARDS
7300
HOURS OF CODING
2M
LINES OF CODE

.02

RESUME

  • EDUCATION
  • 2013
    2010
    Barcelona

    COMPUTER VISION - PHD

    AUTONOMOUS UNIVERSITY OF BARCELONA

    Thesis: Virtual and Real World Adaptation for Pedestrian Detection
    Grant: PIF Autonomous University grant.
  • 2009
    2008
    Barcelona

    COMPUTER VISION & ARTIFICIAL INTELLIGENCE - MASTER

    AUTONOMOUS UNIVERSITY OF BARCELONA

Dissertation: The Effect of Distance in Pedestrian Detection.
    Grant: PIF Autonomous University grant.
  • 2008
    2006
    Barcelona

    COMPUTER SCIENCE - DEGREE

    AUTONOMOUS UNIVERSITY OF BARCELONA

    Dissertation: Intrusion Detection in Intelligent Surveillance Systems.
Grant: SICUE-Séneca mobility grant. Intern at UAB & URJC.
  • 2008
    2006
    A Coruña

    SOFTWARE ENGINEERING - DEGREE

    UNIVERSITY OF A CORUÑA

    Dissertation: Face Recognition in Barajas Airport.
  • ACADEMIC AND PROFESSIONAL POSITIONS
  • 2018
    2016
    Montreal - Barcelona

    POSTDOCTORAL RESEARCHER

MONTREAL INSTITUTE FOR LEARNING ALGORITHMS & COMPUTER VISION CENTER BARCELONA

    Grant: Marie Curie European Grant (Tecniospring)
  • 2017
    2008
    Barcelona

    ASSISTANT PROFESSOR

    AUTONOMOUS UNIVERSITY OF BARCELONA

Subjects: Machine Learning (M.Sc.), Visual Recognition (M.Sc.), Artificial Intelligence I & II (B.Sc.), Software Engineering I & II (B.Sc.)
  • 2016
    2013
    Barcelona

    POSTDOCTORAL RESEARCHER

    COMPUTER VISION CENTER BARCELONA

    Grant: Juan de la Cierva Spanish Grant
  • 2013
    2009
    Barcelona

    GRADUATE STUDENT RESEARCHER

    COMPUTER VISION CENTER BARCELONA

    Grant: PIF Autonomous University grant
  • 2009
    2007
    Barcelona

    LAB ASSISTANT

    COMPUTER VISION CENTER BARCELONA

Grant: Internship at the start-up company Davantis
  • HONORS AND AWARDS
  • 2016
    Santa Clara

    BEST PAPER AWARD

    NVIDIA GTC 2016

    Article: GPU-based pedestrian detection for autonomous driving
  • 2015
    Madrid

    BEST PHD THESIS AWARD

IEEE INTELLIGENT TRANSPORTATION SYSTEMS SOCIETY, SPANISH SECTION (ITSS)

  • 2015
    Santiago de Compostela

    BEST PHD THESIS AWARD

SPANISH ASSOCIATION FOR PATTERN RECOGNITION AND IMAGE ANALYSIS (AERFAI)

  • 2015
    Barcelona

    BEST PHD THESIS AWARD

    AUTONOMOUS UNIVERSITY OF BARCELONA (UAB)

  • 2014
    Barcelona

    BEST PHD THESIS AWARD

    CERCA

  • 2014
    2013
    USA

    RMRC CHALLENGE

    ICCV 2013 & ECCV 2014

    RMRC Challenge (ECCV 2014): Second position in Pedestrian Detection
    RMRC Challenge (ICCV 2013): First position in Pedestrian Detection
  • 2011
    Granada

    BEST PAPER AWARD

    NIPS DOMAIN ADAPTATION WORKSHOP: Theory and Application

    Article: Cool world: domain adaptation of virtual and real worlds for human detection using active learning
  • 2011
    Granada

    BEST PAPER AWARD

    ICMI DOCTORAL CONSORTIUM

    Article: Virtual Worlds and Active Learning for Human Detection
.03

PUBLICATIONS

PUBLICATIONS LIST

2017

GPU-ACCELERATED REAL-TIME STIXEL COMPUTATION

IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2017

Conferences: Daniel Hernandez, Antonio Espinosa, David Vazquez, Antonio M. Lopez and Juan Carlos Moure

The Stixel World is a medium-level, compact representation of road scenes that abstracts millions of disparity pixels into hundreds or thousands of stixels. The goal of this work is to implement and evaluate a complete multi-stixel estimation pipeline on an embedded, energy-efficient, GPU-accelerated device. This work presents a full GPU-accelerated implementation of stixel estimation that produces reliable results at 26 frames per second (real-time) on the Tegra X1 for disparity images of 1024×440 pixels and stixel widths of 5 pixels, and achieves more than 400 frames per second on a high-end Titan X GPU card.

BibTeX citation:
@INPROCEEDINGS {HEV2017b,
   author     = {Daniel Hernandez and Antonio Espinosa and David Vazquez and Antonio Lopez and Juan Carlos Moure},
   title          = {GPU-accelerated real-time stixel computation},
   booktitle = {IEEE Winter Conference on Applications of Computer Vision},
   year         = {2017}
}
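
The pipeline above compresses each disparity-image column into a few vertical segments. As a rough illustration only (not the paper's dynamic-programming formulation, which jointly models ground, object and sky), a toy column-wise grouping could look like this; the jump threshold is an assumed value:

import numpy as np

def column_stixels(disp_col, jump=3.0):
    """Toy stixel extraction: split one disparity-image column into
    (top_row, bottom_row, mean_disparity) segments of near-constant disparity."""
    stixels, start = [], 0
    for r in range(1, len(disp_col)):
        if abs(disp_col[r] - disp_col[r - 1]) > jump:  # disparity discontinuity
            stixels.append((start, r - 1, float(disp_col[start:r].mean())))
            start = r
    stixels.append((start, len(disp_col) - 1, float(disp_col[start:].mean())))
    return stixels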

2017

PIXELVAE: A LATENT VARIABLE MODEL FOR NATURAL IMAGES

5TH INTERNATIONAL CONFERENCE ON LEARNING REPRESENTATIONS (ICLR), 2017

Conferences: Ishaan Gulrajani, Kundan Kumar, Faruk Ahmed, Adrien Ali Taiga, Francesco Visin, David Vazquez and Aaron Courville

Natural image modeling is a landmark challenge of unsupervised learning. Variational Autoencoders (VAEs) learn a useful latent representation and generate samples that preserve global structure but tend to suffer from image blurriness. PixelCNNs model sharp contours and details very well, but lack an explicit latent representation and have difficulty modeling large-scale structure in a computationally efficient way. In this paper, we present PixelVAE, a VAE model with an autoregressive decoder based on PixelCNN. The resulting architecture achieves state-of-the-art log-likelihood on binarized MNIST. We extend PixelVAE to a hierarchy of multiple latent variables at different scales; this hierarchical model achieves competitive likelihood on 64x64 ImageNet and generates high-quality samples on LSUN bedrooms.

BibTeX citation:
@INPROCEEDINGS {GKA2017,
   author     = {Ishaan Gulrajani and Kundan Kumar and Faruk Ahmed and Adrien Ali Taiga and Francesco Visin and David Vazquez and Aaron Courville},
   title          = {PixelVAE: A Latent Variable Model for Natural Images},
   booktitle = {5th International Conference on Learning Representations},
   year         = {2017}
}
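
As a reading aid (a standard formulation, not quoted from the paper), PixelVAE maximizes the usual variational lower bound, with the reconstruction term factorized autoregressively by the PixelCNN decoder:

\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right] \;-\; \mathrm{KL}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right),
\qquad
p_\theta(x \mid z) \;=\; \prod_{i} p_\theta\!\left(x_i \mid x_{<i},\, z\right).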

2017

FROM VIRTUAL TO REAL WORLD VISUAL PERCEPTION USING DOMAIN ADAPTATION -- THE DPM AS EXAMPLE

BOOK CHAPTER: DOMAIN ADAPTATION IN COMPUTER VISION APPLICATIONS, 2017

Book Chapters: Antonio M. Lopez, Jiaolong Xu, Jose L. Gomez, David Vazquez and German Ros

Supervised learning tends to produce more accurate classifiers than unsupervised learning in general. This implies that training data is preferred with annotations. When addressing visual perception challenges, such as localizing certain object classes within an image, the learning of the involved classifiers turns out to be a practical bottleneck. The reason is that, at least, we have to frame object examples with bounding boxes in thousands of images. A priori, the more complex the model is regarding its number of parameters, the more annotated examples are required. This annotation task is performed by human oracles, which ends up in inaccuracies and errors in the annotations (aka ground truth) since the task is inherently very cumbersome and sometimes ambiguous. As an alternative we have pioneered the use of virtual worlds for collecting such annotations automatically and with high precision. However, since the models learned with virtual data must operate in the real world, we still need to perform domain adaptation (DA). In this chapter we revisit the DA of a deformable part-based model (DPM) as an exemplifying case of virtual-to-real-world DA. As a use case, we address the challenge of vehicle detection for driver assistance, using different publicly available virtual-world data. While doing so, we investigate questions such as: how does the domain gap behave due to virtual-vs-real data with respect to dominant object appearance per domain, as well as the role of photo-realism in the virtual world.

BibTeX citation:
@INBOOK {LXG2017,
   author     = {Antonio Lopez and Jiaolong Xu and Jose L. Gomez and David Vazquez and German Ros},
   title          = {From Virtual to Real World Visual Perception using Domain Adaptation -- The DPM as Example},
   booktitle = {Domain Adaptation in Computer Vision Applications},
   year         = {2017}
}

2017

EMBEDDED REAL-TIME STIXEL COMPUTATION

GPU TECHNOLOGY CONFERENCE (GTC), 2017

Conferences: Daniel Hernandez, Antonio Espinosa, David Vazquez, Antonio M. Lopez and Juan Carlos Moure

BibTeX citation:
@INPROCEEDINGS {HEV2017a,
   author     = {Daniel Hernandez and Antonio Espinosa and David Vazquez and Antonio Lopez and Juan Carlos Moure},
   title          = {Embedded Real-time Stixel Computation},
   booktitle = {GPU Technology Conference},
   year         = {2017}
}

2017

A BENCHMARK FOR ENDOLUMINAL SCENE SEGMENTATION OF COLONOSCOPY IMAGES

COMPUTER ASSISTED RADIOLOGY AND SURGERY (CARS), 2017

Demonstrations: David Vazquez, Jorge Bernal, F. Javier Sanchez, Gloria Fernandez-Esparrach, Antonio M. Lopez, Adriana Romero, Michal Drozdzal and Aaron Courville

Colorectal cancer (CRC) is the third cause of cancer death worldwide. Currently, the standard approach to reduce CRC-related mortality is to perform regular screening in search for polyps, and colonoscopy is the screening tool of choice. The main limitations of this screening procedure are polyp miss-rate and inability to perform visual assessment of polyp malignancy. These drawbacks can be reduced by designing Decision Support Systems (DSS) aiming to help clinicians in the different stages of the procedure by providing endoluminal scene segmentation. Thus, in this paper, we introduce an extended benchmark of colonoscopy images, with the hope of establishing a new strong benchmark for colonoscopy image analysis research. We provide new baselines on this dataset by training standard fully convolutional networks (FCN) for semantic segmentation and significantly outperforming, without any further post-processing, prior results in endoluminal scene segmentation.

BibTeX citation:
@INPROCEEDINGS {VBS2017,
   author     = {David Vazquez and Jorge Bernal and F. Javier Sanchez and Gloria Fernandez-Esparrach and Antonio Lopez and Adriana Romero and Michal Drozdzal and Aaron Courville},
   title          = {A Benchmark for Endoluminal Scene Segmentation of Colonoscopy Images},
   booktitle = {Computer Assisted Radiology and Surgery},
   year         = {2017}
}
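
Benchmarks like this one are typically scored with per-class intersection over union; a minimal NumPy sketch of the metric (the benchmark's exact evaluation protocol may differ):

import numpy as np

def mean_iou(pred, gt, n_classes):
    """Mean per-class IoU between two integer label maps of equal shape."""
    ious = []
    for c in range(n_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:                     # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))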

2017

VISION-BASED ADVANCED DRIVER ASSISTANCE SYSTEMS

BOOK CHAPTER: COMPUTER VISION IN VEHICLE TECHNOLOGY: LAND, SEA, AND AIR, 2017

Book Chapters: David Geronimo, David Vazquez and Arturo de la Escalera

BibTeX citation:
@INBOOK {GVE2017,
   author     = {David Geronimo and David Vazquez and Arturo de la Escalera},
   title          = {Vision-Based Advanced Driver Assistance Systems},
   booktitle = {Computer Vision in Vehicle Technology: Land, Sea, and Air},
   year         = {2017}
}

2017

SEMANTIC SEGMENTATION OF URBAN SCENES VIA DOMAIN ADAPTATION OF SYNTHIA

BOOK CHAPTER: DOMAIN ADAPTATION IN COMPUTER VISION APPLICATIONS, 2017

Book Chapters: German Ros, Laura Sellart, Gabriel Villalonga, Elias Maidanik, Francisco Molero, Marc Garcia, Adriana Cedeño, Francisco Perez, Didier Ramirez, Eduardo Escobar, Jose Luis Gomez, David Vazquez and Antonio M. Lopez

Vision-based semantic segmentation in urban scenarios is a key functionality for autonomous driving. Recent revolutionary results of deep convolutional neural networks (DCNNs) foreshadow the advent of reliable classifiers to perform such visual tasks. However, DCNNs require learning of many parameters from raw images; thus, having a sufficient amount of diverse images with class annotations is needed. These annotations are obtained via cumbersome, human labour which is particularly challenging for semantic segmentation since pixel-level annotations are required. In this chapter, we propose to use a combination of a virtual world to automatically generate realistic synthetic images with pixel-level annotations, and domain adaptation to transfer the models learnt to correctly operate in real scenarios. We address the question of how useful synthetic data can be for semantic segmentation – in particular, when using a DCNN paradigm. In order to answer this question we have generated a synthetic collection of diverse urban images, named SYNTHIA, with automatically generated class annotations and object identifiers. We use SYNTHIA in combination with publicly available real-world urban images with manually provided annotations. Then, we conduct experiments with DCNNs that show that combining SYNTHIA with simple domain adaptation techniques in the training stage significantly improves performance on semantic segmentation.

BibTeX citation:
@INBOOK {RSV2017,
   author     = {German Ros and Laura Sellart and Gabriel Villalonga and Elias Maidanik and Francisco Molero and Marc Garcia and Adriana Cedeño and Francisco Perez and Didier Ramirez and Eduardo Escobar and Jose Luis Gomez and David Vazquez and Antonio Lopez},
   title          = {Semantic Segmentation of Urban Scenes via Domain Adaptation of SYNTHIA},
   booktitle = {Domain Adaptation in Computer Vision Applications},
   year         = {2017}
}

2016

HIERARCHICAL ADAPTIVE STRUCTURAL SVM FOR DOMAIN ADAPTATION

INTERNATIONAL JOURNAL OF COMPUTER VISION (IJCV), 119(2), PP. 159-178, 2016

Journal papers: Jiaolong Xu, Sebastian Ramos, David Vazquez and Antonio M. Lopez

A key topic in classification is the accuracy loss produced when the data distribution in the training (source) domain differs from that in the testing (target) domain. This is being recognized as a very relevant problem for many computer vision tasks such as image classification, object detection, and object category recognition. In this paper, we present a novel domain adaptation method that leverages multiple target domains (or sub-domains) in a hierarchical adaptation tree. The core idea is to exploit the commonalities and differences of the jointly considered target domains. Given the relevance of structural SVM (SSVM) classifiers, we apply our idea to the adaptive SSVM (A-SSVM), which only requires the target domain samples together with the existing source-domain classifier for performing the desired adaptation. Altogether, we term our proposal as hierarchical A-SSVM (HA-SSVM). As proof of concept we use HA-SSVM for pedestrian detection, object category recognition and face recognition. In the former we apply HA-SSVM to the deformable part-based model (DPM) while in the rest HA-SSVM is applied to multi-category classifiers. We will show how HA-SSVM is effective in increasing the detection/recognition accuracy with respect to adaptation strategies that ignore the structure of the target data. Since the sub-domains of the target data are not always known a priori, we show how HA-SSVM can incorporate sub-domain discovery for object category recognition.

BibTeX citation:
@ARTICLE {XRV2016,
   author  = {Jiaolong Xu and Sebastian Ramos and David Vazquez and Antonio Lopez},
   title       = {Hierarchical Adaptive Structural SVM for Domain Adaptation},
   journal = {International Journal of Computer Vision},
   year      = {2016},
   volume = {119},
   issue     = {2},
   pages    = {159-178}
}
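
For orientation (a simplified binary-classification form, not quoted from the paper), A-SSVM keeps the source classifier w_s fixed and learns the target classifier by regularizing toward it, so only target-domain samples (x_i, y_i) are needed:

\min_{w}\;\; \tfrac{1}{2}\,\lVert w - w_{s} \rVert^{2} \;+\; C \sum_{i=1}^{N} \max\!\left(0,\; 1 - y_i\, w^{\top} x_i \right)

HA-SSVM applies this idea along a tree of target sub-domains, each node adapting the classifier of its parent.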

2016

HIERARCHICAL ONLINE DOMAIN ADAPTATION OF DEFORMABLE PART-BASED MODELS

IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2016

Conferences: Jiaolong Xu, David Vazquez, Krystian Mikolajczyk and Antonio M. Lopez

We propose an online domain adaptation method for the deformable part-based model (DPM). The online domain adaptation is based on a two-level hierarchical adaptation tree, which consists of instance detectors in the leaf nodes and a category detector at the root node. Moreover, combined with a multiple object tracking procedure (MOT), our proposal neither requires target-domain annotated data nor revisiting the source-domain data for performing the source-to-target domain adaptation of the DPM. From a practical point of view this means that, given a source-domain DPM and a new video for training on a new domain without object annotations, our procedure outputs a new DPM adapted to the domain represented by the video. As proof-of-concept we apply our proposal to the challenging task of pedestrian detection. In this case, each instance detector is an exemplar classifier trained online with only one pedestrian per frame. The pedestrian instances are collected by MOT and the hierarchical model is constructed dynamically according to the pedestrian trajectories. Our experimental results show that the adapted detector achieves the accuracy of recent supervised domain adaptation methods (i.e., requiring manually annotated target-domain data), and improves the source detector by more than 10 percentage points.

BibTeX citation:
@INPROCEEDINGS {XVM2016,
   author     = {Jiaolong Xu and David Vazquez and Krystian Mikolajczyk and Antonio Lopez},
   title          = {Hierarchical online domain adaptation of deformable part-based models},
   booktitle = {IEEE International Conference on Robotics and Automation},
   year         = {2016}
}

2016

GPU-BASED PEDESTRIAN DETECTION FOR AUTONOMOUS DRIVING

GPU TECHNOLOGY CONFERENCE (GTC), 2016

Conferences: Victor Campmany, Sergio Silva, Juan Carlos Moure, Toni Espinosa, David Vazquez and Antonio M. Lopez

Pedestrian detection for autonomous driving is one of the hardest tasks within computer vision, and involves huge computational costs. Obtaining acceptable real-time performance, measured in frames per second (fps), for the most advanced algorithms is nowadays a hard challenge. Taking the work in [1] as our baseline, we propose a CUDA implementation of a pedestrian detection system that includes LBP and HOG as feature descriptors and SVM and Random Forest as classifiers. We introduce significant algorithmic adjustments and optimizations to adapt the problem to the NVIDIA GPU architecture. The aim is to deploy a real-time system providing reliable results.

BibTeX citation:
@INPROCEEDINGS {CSM2016,
   author     = {Victor Campmany and Sergio Silva and Juan Carlos Moure and Toni Espinosa and David Vazquez and Antonio Lopez},
   title          = {GPU-based pedestrian detection for autonomous driving},
   booktitle = {GPU Technology Conference},
   year         = {2016}
}

2016

REAL-TIME 3D RECONSTRUCTION FOR AUTONOMOUS DRIVING VIA SEMI-GLOBAL MATCHING

GPU TECHNOLOGY CONFERENCE (GTC), 2016

Conferences: Daniel Hernandez, Juan Carlos Moure, Toni Espinosa, Alejandro Chacon, David Vazquez and Antonio M. Lopez

Robust and dense computation of depth information from stereo-camera systems is a computationally demanding requirement for real-time autonomous driving. Semi-Global Matching (SGM) [1] approximates the results of computationally heavy global algorithms but with lower computational complexity; therefore, it is a good candidate for a real-time implementation. SGM minimizes energy along several 1D paths across the image. The aim of this work is to provide a real-time system producing reliable results on energy-efficient hardware. Our design runs on an NVIDIA Titan X GPU at 104.62 FPS and on an NVIDIA Drive PX at 6.7 FPS, which is promising for real-time platforms.

BibTeX citation:
@INPROCEEDINGS {HME2016,
   author     = {Daniel Hernandez and Juan Carlos Moure and Toni Espinosa and Alejandro Chacon and David Vazquez and Antonio Lopez},
   title          = {Real-time 3D Reconstruction for Autonomous Driving via Semi-Global Matching},
   booktitle = {GPU Technology Conference},
   year         = {2016}
}

2016

THE SYNTHIA DATASET: A LARGE COLLECTION OF SYNTHETIC IMAGES FOR SEMANTIC SEGMENTATION OF URBAN SCENES

29TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016

Conferences: German Ros, Laura Sellart, Joanna Materzynska, David Vazquez and Antonio M. Lopez

Vision-based semantic segmentation in urban scenarios is a key functionality for autonomous driving. The irruption of deep convolutional neural networks (DCNNs) allows us to foresee obtaining reliable classifiers to perform such a visual task. However, DCNNs require learning many parameters from raw images; thus, having a sufficient amount of diversified images with class annotations is needed. These annotations are obtained by cumbersome human labour, especially challenging for semantic segmentation since pixel-level annotations are required. In this paper, we propose to use a virtual world for automatically generating realistic synthetic images with pixel-level annotations. Then, we address the question of how useful such data can be for the task of semantic segmentation; in particular, when using a DCNN paradigm. In order to answer this question we have generated a synthetic diversified collection of urban images, named SYNTHIA, with automatically generated class annotations. We use SYNTHIA in combination with publicly available real-world urban images with manually provided annotations. Then, we conduct experiments on a DCNN setting that show how the inclusion of SYNTHIA in the training stage significantly improves the performance of the semantic segmentation task.

BibTeX citation:
@INPROCEEDINGS {RSM2016,
   author     = {German Ros and Laura Sellart and Joanna Materzynska and David Vazquez and Antonio Lopez},
   title          = {The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes},
   booktitle = {29th IEEE Conference on Computer Vision and Pattern Recognition},
   year         = {2016}
}
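
The simplest domain adaptation scheme evaluated, training on synthetic and real images together, can be sketched as below (a minimal illustration; the model, loss and 50/50 mixing are assumptions, not the paper's exact setup):

import torch

def mixed_training_step(model, optimizer, criterion, synth_batch, real_batch):
    """One gradient step on a batch that concatenates synthetic (SYNTHIA-like)
    and real images with their per-pixel labels."""
    images = torch.cat([synth_batch[0], real_batch[0]])
    labels = torch.cat([synth_batch[1], real_batch[1]])
    optimizer.zero_grad()
    loss = criterion(model(images), labels)   # e.g. per-pixel cross-entropy
    loss.backward()
    optimizer.step()
    return loss.item()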

2016

EMBEDDED REAL-TIME STEREO ESTIMATION VIA SEMI-GLOBAL MATCHING ON THE GPU

16TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE (ICCS), 2016

Conferences: Daniel Hernandez, Alejandro Chacon, Antonio Espinosa, David Vazquez, Juan Carlos Moure and Antonio M. Lopez

Dense, robust and real-time computation of depth information from stereo-camera systems is a computationally demanding requirement for robotics, advanced driver assistance systems (ADAS) and autonomous vehicles. Semi-Global Matching (SGM) is a widely used algorithm that propagates consistency constraints along several paths across the image. This work presents a real-time system producing reliable disparity estimation results on the new embedded energy-efficient GPU devices. Our design runs on a Tegra X1 at 41 frames per second for an image size of 640x480, 128 disparity levels, and using 4 path directions for the SGM method.

BibTeX citation:
@INPROCEEDINGS {HCE2016a,
   author     = {Daniel Hernandez and Alejandro Chacon and Antonio Espinosa and David Vazquez and Juan Carlos Moure and Antonio Lopez},
   title          = {Embedded real-time stereo estimation via Semi-Global Matching on the GPU},
   booktitle = {16th International Conference on Computational Science},
   year         = {2016}
}
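
The core SGM recurrence is easy to state; here is a minimal single-path NumPy sketch (illustrative only, with assumed penalties P1 and P2; the paper's contribution is the embedded CUDA implementation of several such paths):

import numpy as np

def cost_volume(left, right, max_disp):
    """Absolute-difference matching cost C(y, x, d) for rectified grayscale images."""
    h, w = left.shape
    vol = np.full((h, w, max_disp), 255.0)
    for d in range(max_disp):
        vol[:, d:, d] = np.abs(left[:, d:].astype(np.float32) -
                               right[:, :w - d].astype(np.float32))
    return vol

def aggregate_left_to_right(vol, p1=10.0, p2=120.0):
    """SGM cost aggregation along one horizontal path (left to right)."""
    h, w, D = vol.shape
    agg = np.empty_like(vol)
    agg[:, 0] = vol[:, 0]
    for x in range(1, w):
        prev = agg[:, x - 1]
        best_prev = prev.min(axis=1, keepdims=True)
        # Neighbouring-disparity terms L(d-1) + P1 and L(d+1) + P1.
        minus = np.pad(prev, ((0, 0), (1, 0)), constant_values=np.inf)[:, :D] + p1
        plus = np.pad(prev, ((0, 0), (0, 1)), constant_values=np.inf)[:, 1:] + p1
        agg[:, x] = (vol[:, x] - best_prev +
                     np.minimum(np.minimum(prev, best_prev + p2),
                                np.minimum(minus, plus)))
    return agg

# Full SGM sums aggregations over several path directions and takes the
# per-pixel argmin over d as the disparity estimate.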

2016

GPU-BASED PEDESTRIAN DETECTION FOR AUTONOMOUS DRIVING

16TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE (ICCS), 2016

Conferences: Victor Campmany, Sergio Silva, Antonio Espinosa, Juan Carlos Moure, David Vazquez and Antonio M. Lopez

We propose a real-time pedestrian detection system for the embedded Nvidia Tegra X1 GPU-CPU hybrid platform. The pipeline is composed of the following state-of-the-art algorithms: Histogram of Local Binary Patterns (LBP) and Histograms of Oriented Gradients (HOG) features extracted from the input image; the Pyramidal Sliding Window technique for foreground segmentation; and a Support Vector Machine (SVM) for classification. Results show an 8x speedup on the target Tegra X1 platform and a better performance/watt ratio than the desktop CUDA platforms in the study.

BibTeX citation:
@INPROCEEDINGS {CSE2016,
   author     = {Victor Campmany and Sergio Silva and Antonio Espinosa and Juan Carlos Moure and David Vazquez and Antonio Lopez},
   title          = {GPU-based pedestrian detection for autonomous driving},
   booktitle = {16th International Conference on Computational Science},
   year         = {2016}
}
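
For readers unfamiliar with the pipeline, a CPU-side sketch of the HOG + linear SVM sliding-window stage using scikit-image and scikit-learn (the paper's system is a CUDA implementation; the window size, stride and pyramid scale here are assumptions):

import numpy as np
from skimage.feature import hog
from skimage.transform import pyramid_gaussian
from sklearn.svm import LinearSVC

def detect(image, clf, win=(128, 64), stride=8):
    """Score every window at every pyramid level; keep positive SVM scores."""
    detections = []
    for level, layer in enumerate(pyramid_gaussian(image, downscale=1.2)):
        if layer.shape[0] < win[0] or layer.shape[1] < win[1]:
            break
        for y in range(0, layer.shape[0] - win[0], stride):
            for x in range(0, layer.shape[1] - win[1], stride):
                feat = hog(layer[y:y + win[0], x:x + win[1]],
                           orientations=9, pixels_per_cell=(8, 8),
                           cells_per_block=(2, 2))
                score = clf.decision_function([feat])[0]
                if score > 0:
                    detections.append((level, y, x, score))
    return detections

# clf = LinearSVC().fit(train_features, train_labels) is trained offline on
# HOG descriptors of pedestrian and background crops.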

2016

COMPARISON OF TWO NON-LINEAR MODEL-BASED CONTROL STRATEGIES FOR AUTONOMOUS VEHICLES

24TH MEDITERRANEAN CONFERENCE ON CONTROL AND AUTOMATION (MED), 2016

Conferences: Eugenio Alcala, Laura Sellart, Vicenc Puig, Joseba Quevedo, Jordi Saludes, David Vazquez and Antonio M. Lopez

This paper presents the comparison of two nonlinear model-based control strategies for autonomous cars. A control-oriented vehicle model based on a bicycle model is used. The two control strategies use a model reference approach. Using this approach, the error dynamics model is developed. Both controllers receive as input the longitudinal, lateral and orientation errors, generating as control outputs the steering angle and the velocity of the vehicle. The first control approach is based on a non-linear control law that is designed by means of the Lyapunov direct approach. The second approach is based on a sliding-mode control that defines a set of sliding surfaces over which the error trajectories will converge. The main advantage of the sliding-control technique is the robustness against non-linearities and parametric uncertainties in the model. However, the main drawback of first-order sliding mode is chattering, so a high-order sliding-mode control has been implemented. To test and compare the proposed control strategies, different path following scenarios are used in simulation.

BibTeX citation:
@INPROCEEDINGS {ASP2016,
   author     = {Eugenio Alcala and Laura Sellart and Vicenc Puig and Joseba Quevedo and Jordi Saludes and David Vazquez and Antonio Lopez},
   title          = {Comparison of two non-linear model-based control strategies for autonomous vehicles},
   booktitle = {24th Mediterranean Conference on Control and Automation},
   year         = {2016}
}
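
The control-oriented model both controllers share is the kinematic bicycle model; a minimal Euler-integration sketch (the wheelbase and time step are illustrative values, not the paper's):

import math

def bicycle_step(x, y, theta, v, steer, wheelbase=2.5, dt=0.05):
    """Advance the kinematic bicycle model one time step (Euler integration).
    State: position (x, y) and heading theta; inputs: speed v and steering angle."""
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    theta += v * math.tan(steer) / wheelbase * dt
    return x, y, theta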

2016

PEDESTRIAN DETECTION AT DAY/NIGHT TIME WITH VISIBLE AND FIR CAMERAS: A COMPARISON

SENSORS (SENS), 16(6), PP. 820, 2016

Journal papers: Alejandro Gonzalez Alzate, Zhijie Fang, Yainuvis Socarras, Joan Serrat, David Vazquez, Jiaolong Xu and Antonio M. Lopez

Despite all the significant advances in pedestrian detection brought by computer vision for driving assistance, it is still a challenging problem. One reason is the extremely varying lighting conditions under which such a detector should operate, namely day and night time. Recent research has shown that the combination of visible and non-visible imaging modalities may increase detection accuracy, where the infrared spectrum plays a critical role. The goal of this paper is to assess the accuracy gain of different pedestrian models (holistic, part-based, patch-based) when training with images in the far infrared spectrum. Specifically, we want to compare detection accuracy on test images recorded at day and nighttime if trained (and tested) using (a) plain color images, (b) just infrared images and (c) both of them. In order to obtain results for the last item we propose an early fusion approach to combine features from both modalities. We base the evaluation on a new dataset we have built for this purpose as well as on the publicly available KAIST multispectral dataset.

BibTeX citation:
@ARTICLE {GFS2016,
   author  = {Alejandro Gonzalez Alzate and Zhijie Fang and Yainuvis Socarras and Joan Serrat and David Vazquez and Jiaolong Xu and Antonio Lopez},
   title       = {Pedestrian Detection at Day/Night Time with Visible and FIR Cameras: A Comparison},
   journal = {Sensors},
   year      = {2016},
   volume = {16},
   issue     = {6},
   pages    = {820}
}

2016

STEREO MATCHING USING SGM ON THE GPU

PROGRAMMING AND TUNING MASSIVELY PARALLEL SYSTEMS (PUMPS), 2016

Theses: Daniel Hernandez, Alejandro Chacon, Antonio Espinosa, David Vazquez, Juan Carlos Moure and Antonio M. Lopez

Dense, robust and real-time computation of depth information from stereo-camera systems is a computationally demanding requirement for robotics, advanced driver assistance systems (ADAS) and autonomous vehicles. Semi-Global Matching (SGM) is a widely used algorithm that propagates consistency constraints along several paths across the image. This work presents a real-time system producing reliable disparity estimation results on the new embedded energy-efficient GPU devices. Our design runs on a Tegra X1 at 42 frames per second (fps) for an image size of 640x480, 128 disparity levels, and using 4 path directions for the SGM method.

BibTeX citation:
@INPROCEEDINGS {HCE2016b,
   author     = {Daniel Hernandez and Alejandro Chacon and Antonio Espinosa and David Vazquez and Juan Carlos Moure and Antonio Lopez},
   title          = {Stereo Matching using SGM on the GPU},
   booktitle = {Programming and Tuning Massively Parallel Systems},
   year         = {2016}
}

2016

ON-BOARD OBJECT DETECTION: MULTICUE, MULTIMODAL, AND MULTIVIEW RANDOM FOREST OF LOCAL EXPERTS

IEEE TRANSACTIONS ON CYBERNETICS (CYBER), (99), PP. 1-11, 2016

Journal papers: Alejandro Gonzalez Alzate, David Vazquez, Antonio M. Lopez and Jaume Amores

Despite recent significant advances, object detection continues to be an extremely challenging problem in real scenarios. In order to develop a detector that successfully operates under these conditions, it becomes critical to leverage upon multiple cues, multiple imaging modalities, and a strong multiview (MV) classifier that accounts for different object views and poses. In this paper, we provide an extensive evaluation that gives insight into how each of these aspects (multicue, multimodality, and strong MV classifier) affect accuracy both individually and when integrated together. In the multimodality component, we explore the fusion of RGB and depth maps obtained by high-definition light detection and ranging, a type of modality that is starting to receive increasing attention. As our analysis reveals, although all the aforementioned aspects significantly help in improving the accuracy, the fusion of visible spectrum and depth information allows to boost the accuracy by a much larger margin. The resulting detector not only ranks among the top best performers in the challenging KITTI benchmark, but it is built upon very simple blocks that are easy to implement and computationally efficient.

BibTeX citation:
@ARTICLE {GVL2016,
   author  = {Alejandro Gonzalez Alzate and David Vazquez and Antonio Lopez and Jaume Amores},
   title       = {On-Board Object Detection: Multicue, Multimodal, and Multiview Random Forest of Local Experts},
   journal = {IEEE Transactions on cybernetics},
   year      = {2016},
   volume = {},
   issue     = {99},
   pages    = {1-11}
}

2016

THE ONE HUNDRED LAYERS TIRAMISU: FULLY CONVOLUTIONAL DENSENETS FOR SEMANTIC SEGMENTATION

ARXIV, 2016

Demonstrations: Simon Jégou, Michal Drozdzal, David Vazquez, Adriana Romero and Yoshua Bengio

State-of-the-art approaches for semantic image segmentation are built on Convolutional Neural Networks (CNNs). The typical segmentation architecture is composed of (a) a downsampling path responsible for extracting coarse semantic features, followed by (b) an upsampling path trained to recover the input image resolution at the output of the model and, optionally, (c) a post-processing module (e.g. Conditional Random Fields) to refine the model predictions. Recently, a new CNN architecture, Densely Connected Convolutional Networks (DenseNets), has shown excellent results on image classification tasks. The idea of DenseNets is based on the observation that if each layer is directly connected to every other layer in a feed-forward fashion then the network will be more accurate and easier to train. In this paper, we extend DenseNets to deal with the problem of semantic segmentation. We achieve state-of-the-art results on urban scene benchmark datasets such as CamVid and Gatech, without any further post-processing module nor pretraining. Moreover, due to smart construction of the model, our approach has much less parameters than currently published best entries for these datasets.

BibTeX citation:
@INPROCEEDINGS {JDV2016,
   author     = {Simon Jégou and Michal Drozdzal and David Vazquez and Adriana Romero and Yoshua Bengio},
   title          = {The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation},
   booktitle = {arXiv},
   year         = {2016}
}
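
The building block the paper extends is the DenseNet dense block; a compact PyTorch sketch (the growth rate, dropout and layer composition here are assumptions chosen for brevity, not the paper's exact configuration):

import torch
import torch.nn as nn

class DenseLayer(nn.Sequential):
    """BN -> ReLU -> 3x3 Conv -> Dropout, producing `growth` new feature maps."""
    def __init__(self, in_ch, growth):
        super().__init__(
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, growth, kernel_size=3, padding=1),
            nn.Dropout2d(0.2))

class DenseBlock(nn.Module):
    """Each layer sees the concatenation of the input and all previous outputs."""
    def __init__(self, in_ch, growth, n_layers):
        super().__init__()
        self.layers = nn.ModuleList(
            DenseLayer(in_ch + i * growth, growth) for i in range(n_layers))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        # Return only the newly produced maps; FC-DenseNets concatenate the
        # block input back in on the downsampling path via skip connections.
        return torch.cat(features[1:], dim=1)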

2016

NODE-ADAPT, PATH-ADAPT AND TREE-ADAPT: MODEL-TRANSFER DOMAIN ADAPTATION FOR RANDOM FOREST

ARXIV, 2016

Demonstrations: Azadeh S. Mozafari, David Vazquez, Mansour Jamzad and Antonio M. Lopez

Random Forest (RF) is a successful paradigm for learning classifiers due to its ability to learn from large feature spaces and seamlessly integrate multi-class classification, as well as the achieved accuracy and processing efficiency. However, as many other classifiers, RF requires domain adaptation (DA) provided that there is a mismatch between the training (source) and testing (target) domains which provokes classification degradation. Consequently, different RF-DA methods have been proposed, which not only require target-domain samples but revisiting the source-domain ones, too. As novelty, we propose three inherently different methods (Node-Adapt, Path-Adapt and Tree-Adapt) that only require the learned source-domain RF and a relatively few target-domain samples for DA, i.e. source-domain samples do not need to be available. To assess the performance of our proposals we focus on image-based object detection, using the pedestrian detection problem as challenging proof-of-concept. Moreover, we use the RF with expert nodes because it is a competitive patch-based pedestrian model. We test our Node-, Path- and Tree-Adapt methods in standard benchmarks, showing that DA is largely achieved.

BibTeX citation:
@INPROCEEDINGS {MVJ2016,
   author     = {Azadeh S. Mozafari and David Vazquez and Mansour Jamzad and Antonio M. Lopez},
   title          = {Node-Adapt, Path-Adapt and Tree-Adapt:Model-Transfer Domain Adaptation for Random Forest},
   booktitle = {arXiv},
   year         = {2016}
}

2016

A BENCHMARK FOR ENDOLUMINAL SCENE SEGMENTATION OF COLONOSCOPY IMAGES

ARXIV, 2016

Demonstrations: David Vazquez, Jorge Bernal, F. Javier Sanchez, Gloria Fernandez-Esparrach, Antonio M. Lopez, Adriana Romero, Michal Drozdzal and Aaron Courville

Colorectal cancer (CRC) is the third cause of cancer death worldwide. Currently, the standard approach to reduce CRC-related mortality is to perform regular screening in search for polyps, and colonoscopy is the screening tool of choice. The main limitations of this screening procedure are polyp miss-rate and inability to perform visual assessment of polyp malignancy. These drawbacks can be reduced by designing Decision Support Systems (DSS) aiming to help clinicians in the different stages of the procedure by providing endoluminal scene segmentation. Thus, in this paper, we introduce an extended benchmark of colonoscopy images, with the hope of establishing a new strong benchmark for colonoscopy image analysis research. We provide new baselines on this dataset by training standard fully convolutional networks (FCN) for semantic segmentation and significantly outperforming, without any further post-processing, prior results in endoluminal scene segmentation.

BibTeX citation:
@INPROCEEDINGS {VBS2016,
   author     = {David Vazquez and Jorge Bernal and F. Javier Sanchez and Gloria Fernandez-Esparrach and Antonio Lopez and Adriana Romero and Michal Drozdzal and Aaron Courville},
   title          = {A Benchmark for Endoluminal Scene Segmentation of Colonoscopy Images},
   booktitle = {arXiv},
   year         = {2016}
}

2015

SPATIOTEMPORAL STACKED SEQUENTIAL LEARNING FOR PEDESTRIAN DETECTION

PATTERN RECOGNITION AND IMAGE ANALYSIS, PROCEEDINGS OF THE 7TH IBERIAN CONFERENCE (IBPRIA), 2015

Conferences: Alejandro Gonzalez Alzate, Sebastian Ramos, David Vazquez, Antonio M. Lopez and Jaume Amores

Pedestrian classifiers decide which image windows contain a pedestrian. In practice, such classifiers provide a relatively high response at neighbor windows overlapping a pedestrian, while the responses around potential false positives are expected to be lower. An analogous reasoning applies for image sequences. If there is a pedestrian located within a frame, the same pedestrian is expected to appear close to the same location in neighbor frames. Therefore, such a location has chances of receiving high classification scores during several frames, while false positives are expected to be more spurious. In this paper we propose to exploit such correlations for improving the accuracy of base pedestrian classifiers. In particular, we propose to use two-stage classifiers which not only rely on the image descriptors required by the base classifiers but also on the response of such base classifiers in a given spatiotemporal neighborhood. More specifically, we train pedestrian classifiers using a stacked sequential learning (SSL) paradigm. We use a new pedestrian dataset we have acquired from a car to evaluate our proposal at different frame rates. We also test on a well known dataset: Caltech. The obtained results show that our SSL proposal boosts detection accuracy significantly with a minimal impact on the computational cost. Interestingly, SSL improves the accuracy more in the most dangerous situations, i.e. when a pedestrian is close to the camera.

BibTeX citation:
@INPROCEEDINGS {GRV2015,
   author     = {Alejandro Gonzalez Alzate and Sebastian Ramos and David Vazquez and Antonio Lopez and Jaume Amores},
   title          = {Spatiotemporal Stacked Sequential Learning for Pedestrian Detection},
   booktitle = {Pattern Recognition and Image Analysis, Proceedings of 7th Iberian Conference , ibPRIA 2015},
   year         = {2015}
}
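
The two-stage idea can be sketched as follows (a hedged illustration: the neighbourhood definition and classifiers are simplified stand-ins for the paper's setup):

import numpy as np
from sklearn.svm import LinearSVC

def ssl_features(descriptors, base_scores, offsets):
    """Augment each window descriptor with base-classifier scores from a
    spatiotemporal neighbourhood.
    descriptors: dict (frame, y, x) -> 1D feature array
    base_scores: dict (frame, y, x) -> float score of the stage-1 classifier
    offsets:     list of (dframe, dy, dx) neighbourhood displacements"""
    X = []
    for (f, y, x), desc in descriptors.items():
        context = [base_scores.get((f + df, y + dy, x + dx), 0.0)
                   for df, dy, dx in offsets]
        X.append(np.concatenate([desc, context]))
    return np.array(X)

# Stage 1: base = LinearSVC().fit(X_base, labels); scores via decision_function.
# Stage 2: LinearSVC().fit(ssl_features(descriptors, scores, offsets), labels).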

2015

VISION-BASED OFFLINE-ONLINE PERCEPTION PARADIGM FOR AUTONOMOUS DRIVING

IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2015

Conferences: German Ros, Sebastian Ramos, Manuel Granados, Amir Bakhtiary, David Vazquez and Antonio M. Lopez

Autonomous driving is a key factor for future mobility. Properly perceiving the environment of the vehicles is essential for safe driving, which requires computing accurate geometric and semantic information in real-time. In this paper, we challenge state-of-the-art computer vision algorithms for building a perception system for autonomous driving. An inherent drawback in the computation of visual semantics is the trade-off between accuracy and computational cost. We propose to circumvent this problem by following an offline-online strategy. During the offline stage dense 3D semantic maps are created. In the online stage the current driving area is recognized in the maps via a re-localization process, which allows us to retrieve the pre-computed accurate semantics and 3D geometry in real-time. Then, by detecting the dynamic obstacles we obtain a rich understanding of the current scene. We evaluate our proposal quantitatively on the KITTI dataset and discuss the related open challenges for the computer vision community.

BibTeX citation:
@INPROCEEDINGS {RRG2015,
   author     = {German Ros and Sebastian Ramos and Manuel Granados and Amir Bakhtiary and David Vazquez and Antonio Lopez},
   title          = {Vision-based Offline-Online Perception Paradigm for Autonomous Driving},
   booktitle = {IEEE Winter Conference on Applications of Computer Vision WACV2015},
   year         = {2015}
}

2015

MULTIVIEW RANDOM FOREST OF LOCAL EXPERTS COMBINING RGB AND LIDAR DATA FOR PEDESTRIAN DETECTION

IEEE INTELLIGENT VEHICLES SYMPOSIUM (IV), 2015

Conferences: Alejandro Gonzalez Alzate, Gabriel Villalonga, Jiaolong Xu, David Vazquez, Jaume Amores and Antonio M. Lopez

Despite recent significant advances, pedestrian detection continues to be an extremely challenging problem in real scenarios. In order to develop a detector that successfully operates under these conditions, it becomes critical to leverage upon multiple cues, multiple imaging modalities and a strong multi-view classifier that accounts for different pedestrian views and poses. In this paper we provide an extensive evaluation that gives insight into how each of these aspects (multi-cue, multimodality and strong multi-view classifier) affect performance both individually and when integrated together. In the multimodality component we explore the fusion of RGB and depth maps obtained by high-definition LIDAR, a type of modality that is only recently starting to receive attention. As our analysis reveals, although all the aforementioned aspects significantly help in improving the performance, the fusion of visible spectrum and depth information allows to boost the accuracy by a much larger margin. The resulting detector not only ranks among the top best performers in the challenging KITTI benchmark, but it is built upon very simple blocks that are easy to implement and computationally efficient. These simple blocks can be easily replaced with more sophisticated ones recently proposed, such as the use of convolutional neural networks for feature representation, to further improve the accuracy.

BibTeX citation:
@INPROCEEDINGS {GVX2015,
   author     = {Alejandro Gonzalez Alzate and Gabriel Villalonga and Jiaolong Xu and David Vazquez and Jaume Amores and Antonio Lopez},
   title          = {Multiview Random Forest of Local Experts Combining RGB and LIDAR data for Pedestrian Detection},
   booktitle = {IEEE Intelligent Vehicles Symposium IV2015},
   year         = {2015}
}
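
As a toy illustration of the multimodal component only (not the paper's random forest of local experts; the random features are placeholders standing in for HOG/LBP appearance and LIDAR-derived depth descriptors):

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Placeholder descriptors: 64-D appearance (RGB) and 16-D depth per window.
rgb_train = rng.normal(size=(200, 64))
depth_train = rng.normal(size=(200, 16))
y_train = rng.integers(0, 2, size=200)        # pedestrian / background labels

# Early fusion: concatenate the modalities before training the forest.
X_train = np.hstack([rgb_train, depth_train])
clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)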

2015

3D-GUIDED MULTISCALE SLIDING WINDOW FOR PEDESTRIAN DETECTION

PATTERN RECOGNITION AND IMAGE ANALYSIS, PROCEEDINGS OF THE 7TH IBERIAN CONFERENCE (IBPRIA), 2015

Conferences: Alejandro Gonzalez Alzate, Gabriel Villalonga, German Ros, David Vazquez and Antonio M. Lopez

The most relevant modules of a pedestrian detector are the candidate generation and the candidate classification. The former aims at presenting image windows to the latter so that they are classified as containing a pedestrian or not. Much attention has been paid to the classification module, while candidate generation has mainly relied on the (multiscale) sliding window pyramid. However, candidate generation is critical for achieving real-time performance. In this paper we assume a context of autonomous driving based on stereo vision. Accordingly, we evaluate the effect of taking into account the 3D information (derived from the stereo) in order to prune the hundreds of thousands of windows per image generated by the classical pyramidal sliding window. For our study we use a multimodal (RGB, disparity) and multi-descriptor (HOG, LBP, HOG+LBP) holistic ensemble based on a linear SVM. Evaluation on data from the challenging KITTI benchmark suite shows the effectiveness of using 3D information to dramatically reduce the number of candidate windows, even improving the overall pedestrian detection accuracy.

BibTeX citation:
@INPROCEEDINGS {GVR2015,
   author     = {Alejandro Gonzalez Alzate and Gabriel Villalonga and German Ros and David Vazquez and Antonio Lopez},
   title          = {3D-Guided Multiscale Sliding Window for Pedestrian Detection},
   booktitle = {Pattern Recognition and Image Analysis, Proceedings of 7th Iberian Conference , ibPRIA 2015},
   year         = {2015}
}
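
The geometric intuition is that, under a pinhole stereo model, a pedestrian's expected pixel height follows directly from the disparity at their feet, so most window sizes can be rejected cheaply; a hedged sketch (the baseline and pedestrian height are assumed values):

def expected_height_px(disparity, baseline_m=0.5, ped_height_m=1.7):
    """Pinhole stereo: depth Z = f*B/d, so pixel height f*H/Z = H*d/B
    (the focal length f cancels out)."""
    return ped_height_m * disparity / baseline_m

def keep_window(window_height_px, disparity_at_foot, tol=0.3):
    """Keep a candidate window only if its height roughly matches the
    height a pedestrian would have at that disparity."""
    expected = expected_height_px(disparity_at_foot)
    return abs(window_height_px - expected) <= tol * expected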

2015

GPU-BASED PEDESTRIAN DETECTION FOR AUTONOMOUS DRIVING

PROGRAMMING AND TUNING MASSIVELY PARALLEL SYSTEMS (PUMPS), 2015

Demonstrations: Victor Campmany, Sergio Silva, Juan Carlos Moure, Antoni Espinosa, David Vazquez and Antonio M. Lopez

Pedestrian detection for autonomous driving has gained a lot of prominence during the last few years. Besides the fact that it is one of the hardest tasks within computer vision, it involves huge computational costs. The real-time constraints in the field are tight, and regular processors are not able to handle the workload while obtaining an acceptable rate of frames per second (fps). Moreover, multiple cameras are required to obtain accurate results, so the need to speed up the process is even higher. Taking the work in [1] as our baseline, we propose a CUDA implementation of a pedestrian detection system. Further, we introduce significant algorithmic adjustments and optimizations to adapt the problem to the GPU architecture. The aim is to provide a system capable of running in real-time obtaining reliable results.

BibTeX citation:
@INPROCEEDINGS {CSM2015,
   author     = {Victor Campmany and Sergio Silva and Juan Carlos Moure and Antoni Espinosa and David Vazquez and Antonio Lopez},
   title          = {GPU-based pedestrian detection for autonomous driving},
   booktitle = {Programming and Tunning Massive Parallel Systems},
   year         = {2015}
}
2015

AUTONOMOUS GPU-BASED DRIVING

PROGRAMMING AND TUNING MASSIVELY PARALLEL SYSTEMS (PUMPS), 2015

Demonstrations (Selected). Sergio Silva, Victor Campmany, Laura Sellart, Juan Carlos Moure, Antoni Espinosa, David Vazquez and Antonio M. Lopez

Human factors cause most driving accidents, which is why autonomous driving is nowadays commonly discussed as an alternative. Autonomous driving will not only increase safety, but will also enable a system of cooperative self-driving cars that reduces pollution and congestion. Furthermore, it will provide more freedom to handicapped people, the elderly and children. Autonomous driving requires perceiving and understanding the vehicle environment (e.g., road, traffic signs, pedestrians, vehicles) using sensors (e.g., cameras, lidars, sonars and radars), self-localization (requiring GPS, inertial sensors and visual localization in precise maps), controlling the vehicle and planning routes. These algorithms require high computation capability, and thanks to NVIDIA GPU acceleration this starts to become feasible. NVIDIA® is developing a new platform for boosting autonomous driving capabilities that is able to manage the vehicle via CAN-Bus: the Drive™ PX. It has 8 ARM cores with dual accelerated Tegra® X1 chips. It has 12 synchronized camera inputs for 360º vehicle perception, 4G and Wi-Fi capabilities allowing vehicle communications, and GPS and inertial sensor inputs for self-localization. Our research group has been selected to test the Drive™ PX. Accordingly, we are developing a Drive™ PX based autonomous car. Currently, we are porting our previous CPU-based algorithms (e.g., Lane Departure Warning, Collision Warning, Automatic Cruise Control, Pedestrian Protection, or Semantic Segmentation) to run on the GPU.

Latex Bibtex Citation:
@INPROCEEDINGS {SCS2015,
   author     = {Sergio Silva and Victor Campmany and Laura Sellart and Juan Carlos Moure and Antoni Espinosa and David Vazquez and Antonio Lopez},
   title          = {Autonomous GPU-based Driving},
   booktitle = {Programming and Tuning Massively Parallel Systems},
   year         = {2015}
}
2014

DOMAIN ADAPTATION OF DEFORMABLE PART-BASED MODELS

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (TPAMI), 36(12), PP. 2367-2380, 2014

Journal papers (Selected). Jiaolong Xu, Sebastian Ramos, David Vazquez and Antonio M. Lopez

The accuracy of object classifiers can significantly drop when the training data (source domain) and the application scenario (target domain) have inherent differences. Therefore, adapting the classifiers to the scenario in which they must operate is of paramount importance. We present novel domain adaptation (DA) methods for object detection. As proof of concept, we focus on adapting the state-of-the-art deformable part-based model (DPM) for pedestrian detection. We introduce an adaptive structural SVM (A-SSVM) that adapts a pre-learned classifier between different domains. By taking into account the inherent structure in feature space (e.g., the parts in a DPM), we propose a structure-aware A-SSVM (SA-SSVM). Neither A-SSVM nor SA-SSVM needs to revisit the source-domain training data to perform the adaptation. Rather, a low number of target-domain training examples (e.g., pedestrians) are used. To address the scenario where there are no target-domain annotated samples, we propose a self-adaptive DPM based on a self-paced learning (SPL) strategy and a Gaussian Process Regression (GPR). Two types of adaptation tasks are assessed: from both synthetic pedestrians and general persons (PASCAL VOC) to pedestrians imaged from an on-board camera. Results show that our proposals avoid accuracy drops as high as 15 points when comparing adapted and non-adapted detectors.
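
The core A-SSVM idea, stripped of the structured and part-based machinery, can be sketched in a few lines: keep the pre-learned source weights as an anchor and fit the target data while penalizing deviation from them. The following Python sketch is a simplified binary-SVM analogue under that assumption; the function name, learning rate and epoch count are illustrative, not the paper's settings.

import numpy as np

def adapt_linear_svm(w_src, X_tgt, y_tgt, C=1.0, lr=1e-3, epochs=500):
    # Minimize 0.5 * ||w - w_src||^2 + C * sum(max(0, 1 - y * w.x)) by
    # subgradient descent, so the adapted classifier stays close to the
    # source model while fitting the (few) target-domain samples.
    # Labels y_tgt must be in {-1, +1}.
    w = w_src.astype(float).copy()
    for _ in range(epochs):
        margins = y_tgt * (X_tgt @ w)
        viol = margins < 1
        grad = (w - w_src) - C * (y_tgt[viol, None] * X_tgt[viol]).sum(axis=0)
        w -= lr * grad
    return w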

Latex Bibtex Citation:
@ARTICLE {XRV2014b,
   author  = {Jiaolong Xu and Sebastian Ramos and David Vazquez and Antonio Lopez},
   title       = {Domain Adaptation of Deformable Part-Based Models},
   journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
   year      = {2014},
   volume = {36},
   issue     = {12},
   pages    = {2367-2380}
}
2014

VIRTUAL AND REAL WORLD ADAPTATION FOR PEDESTRIAN DETECTION

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (TPAMI), 36(4), PP. 797-809, 2014

Journal papers (Selected). David Vazquez, Javier Marin, Antonio M. Lopez, Daniel Ponsa and David Geronimo

Pedestrian detection is of paramount interest for many applications. Most promising detectors rely on discriminatively learnt classifiers, i.e., trained with annotated samples. However, the annotation step is a human-intensive and subjective task worth minimizing. By using virtual worlds we can automatically obtain precise and rich annotations. Thus, we face the question: can a pedestrian appearance model learnt in realistic virtual worlds work successfully for pedestrian detection in real-world images? Conducted experiments show that virtual-world based training can provide excellent testing accuracy in the real world, but it can also suffer the dataset shift problem, as real-world based training does. Accordingly, we have designed a domain adaptation framework, V-AYLA, in which we have tested different techniques to collect a few pedestrian samples from the target domain (real world) and combine them with the many examples of the source domain (virtual world) in order to train a domain-adapted pedestrian classifier that will operate in the target domain. V-AYLA reports the same detection accuracy as training with many human-provided pedestrian annotations and testing with real-world images of the same domain. To the best of our knowledge, this is the first work demonstrating adaptation of virtual and real worlds for developing an object detector.

Latex Bibtex Citation:
@ARTICLE {VML2014,
   author  = {David Vazquez and Javier Marin and Antonio Lopez and Daniel Ponsa and David Geronimo},
   title       = {Virtual and Real World Adaptation for Pedestrian Detection},
   journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
   year      = {2014},
   volume = {36},
   issue     = {4},
   pages    = {797-809}
}
2014

OCCLUSION HANDLING VIA RANDOM SUBSPACE CLASSIFIERS FOR HUMAN DETECTION

IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS (PART B) (TSMCB), 44(3), PP. 342-354, 2014

Journal papers (Selected). Javier Marin, David Vazquez, Antonio M. Lopez, Jaume Amores and Ludmila I. Kuncheva

This paper describes a general method to address partial occlusions for human detection in still images. The Random Subspace Method (RSM) is chosen for building a classifier ensemble robust against partial occlusions. The component classifiers are chosen on the basis of their individual and combined performance. The main contribution of this work lies in our approach's capability to improve the detection rate when partial occlusions are present without compromising the detection performance on non-occluded data. In contrast to many recent approaches, we propose a method which does not require manual labelling of body parts, defining any semantic spatial components, or using additional data coming from motion or stereo. Moreover, the method can be easily extended to other object classes. The experiments are performed on three large datasets: the INRIA person dataset, the Daimler Multicue dataset, and a new challenging dataset, called PobleSec, in which a considerable number of targets are partially occluded. The different approaches are evaluated at the classification and detection levels for both partially occluded and non-occluded data. The experimental results show that our detector outperforms state-of-the-art approaches in the presence of partial occlusions, while offering performance and reliability similar to those of the holistic approach on non-occluded data. The datasets used in our experiments have been made publicly available for benchmarking purposes.
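
A minimal Python sketch of the Random Subspace Method follows (illustrative; the paper additionally selects members by their individual and combined performance, which is omitted here). Each ensemble member is trained on a random subset of the feature dimensions, so members whose dimensions fall on visible body regions can still respond under partial occlusion.

import numpy as np
from sklearn.svm import LinearSVC

def train_rsm(X, y, n_members=20, subspace_frac=0.3, seed=0):
    # Each member sees a random subset of the feature dimensions.
    rng = np.random.default_rng(seed)
    k = max(1, int(subspace_frac * X.shape[1]))
    ensemble = []
    for _ in range(n_members):
        dims = rng.choice(X.shape[1], size=k, replace=False)
        ensemble.append((dims, LinearSVC(C=0.01).fit(X[:, dims], y)))
    return ensemble

def rsm_score(ensemble, X):
    # Average the members' decision scores.
    return np.mean([clf.decision_function(X[:, dims]) for dims, clf in ensemble], axis=0)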

Latex Bibtex Citation:
@ARTICLE {MVL2014,
   author  = {Javier Marin and David Vazquez and Antonio Lopez and Jaume Amores and Ludmila I. Kuncheva},
   title       = {Occlusion handling via random subspace classifiers for human detection},
   journal = {IEEE Transactions on Systems, Man, and Cybernetics (Part B)},
   year      = {2014},
   volume = {44},
   issue     = {3},
   pages    = {342-354}
}
2014

LEARNING A PART-BASED PEDESTRIAN DETECTOR IN VIRTUAL WORLD

IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS (TITS), 15(5), PP. 2121-2131, 2014

Journal papers (Selected). Jiaolong Xu, David Vazquez, Antonio M. Lopez, Javier Marin and Daniel Ponsa

Detecting pedestrians with on-board vision systems is of paramount interest for assisting drivers to prevent vehicle-to-pedestrian accidents. The core of a pedestrian detector is its classification module, which aims at deciding if a given image window contains a pedestrian. Given the difficulty of this task, many classifiers have been proposed during the last fifteen years. Among them, the so-called (deformable) part-based classifiers including multi-view modeling are usually top ranked in accuracy. Training such classifiers is not trivial since a proper aspect clustering and spatial part alignment of the pedestrian training samples are crucial for obtaining an accurate classifier. In this paper, first we perform automatic aspect clustering and part alignment by using virtual-world pedestrians, i.e., human annotations are not required. Second, we use a mixture-of-parts approach that allows part sharing among different aspects. Third, these proposals are integrated in a learning framework which also allows incorporating real-world training data to perform domain adaptation between virtual- and real-world cameras. Overall, the obtained results on four popular on-board datasets show that our proposal clearly outperforms the state-of-the-art deformable part-based detector known as latent SVM.

Latex Bibtex Citation:
@ARTICLE {XVL2014,
   author  = {Jiaolong Xu and David Vazquez and Antonio Lopez and Javier Marin and Daniel Ponsa},
   title       = {Learning a Part-based Pedestrian Detector in Virtual World},
   journal = {IEEE Transactions on Intelligent Transportation Systems},
   year      = {2014},
   volume = {15},
   issue     = {5},
   pages    = {2121-2131}
}
2014

COST-SENSITIVE STRUCTURED SVM FOR MULTI-CATEGORY DOMAIN ADAPTATION

22ND INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2014

Conferences (Selected). Jiaolong Xu, Sebastian Ramos, David Vazquez and Antonio M. Lopez

Domain adaptation addresses the problem of accuracy drop that a classifier may suffer when the training data (source domain) and the testing data (target domain) are drawn from different distributions. In this work, we focus on domain adaptation for structured SVM (SSVM). We propose a cost-sensitive domain adaptation method for SSVM, namely COSS-SSVM. In particular, during the re-training of an adapted classifier based on target and source data, the idea that we explore consists in introducing a non-zero cost even for correctly classified source-domain samples. Eventually, we aim to learn a more target-oriented classifier by not rewarding (zero loss) properly classified source-domain training samples. We assess the effectiveness of COSS-SSVM on multi-category object recognition.
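
The cost-sensitive idea translates directly into a loss function: source-domain samples are never allowed to reach zero loss. The Python sketch below is a simplified binary (non-structured) analogue of that idea; the eps floor and the function name are assumptions for illustration, not the paper's formulation.

import numpy as np

def coss_hinge_loss(w, X, y, is_source, eps=0.3):
    # Standard hinge for target samples; source samples pay at least `eps`
    # even when correctly classified, biasing learning toward the target domain.
    margins = y * (X @ w)                     # labels y in {-1, +1}
    hinge = np.maximum(0.0, 1.0 - margins)
    floor = np.where(is_source, eps, 0.0)
    return np.maximum(hinge, floor).sum()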

Latex Bibtex Citation:
@INPROCEEDINGS {XRV2014a,
   author     = {Jiaolong Xu and Sebastian Ramos and David Vazquez and Antonio Lopez},
   title          = {Cost-sensitive Structured SVM for Multi-category Domain Adaptation},
   booktitle = {22nd International Conference on Pattern Recognition},
   year         = {2014}
}
2014

INCREMENTAL DOMAIN ADAPTATION OF DEFORMABLE PART-BASED MODELS

25TH BRITISH MACHINE VISION CONFERENCE (BMVC), 2014

Conferences (Selected). Jiaolong Xu, Sebastian Ramos, David Vazquez and Antonio M. Lopez

Nowadays, classifiers play a core role in many computer vision tasks. The underlying assumption for learning classifiers is that the training set and the deployment environment (testing) follow the same probability distribution regarding the features used by the classifiers. However, in practice, there are different reasons that can break this constancy assumption. Accordingly, reusing existing classifiers by adapting them from the previous training environment (source domain) to the new testing one (target domain) is an approach with increasing acceptance in the computer vision community. In this paper we focus on the domain adaptation of deformable part-based models (DPMs) for object detection. In particular, we focus on a relatively unexplored scenario, i.e. incremental domain adaptation for object detection assuming weak-labeling. Therefore, our algorithm is ready to improve existing source-oriented DPM-based detectors as soon as a small amount of labeled target-domain training data is available, and keeps improving as more of such data arrives in a continuous fashion. For achieving this, we follow a multiple instance learning (MIL) paradigm that operates on an incremental per-image basis. As proof of concept, we address the challenging scenario of adapting a DPM-based pedestrian detector trained with synthetic pedestrians to operate in real-world scenarios. The obtained results show that our incremental adaptive models achieve accuracy on par with the batch-learned models, while being more flexible for handling continuously arriving target-domain data.

Latex Bibtex Citation:
@INPROCEEDINGS {xrv2014c,
   author     = {Jiaolong Xu and Sebastian Ramos and David Vazquez and Antonio Lopez},
   title          = {Incremental Domain Adaptation of Deformable Part-based Models},
   booktitle = {25th British Machine Vision Conference},
   year         = {2014}
}
2014

3D PEDESTRIAN DETECTION VIA RANDOM FOREST

EUROPEAN CONFERENCE ON COMPUTER VISION (ECCV-DEMO), 2014

Demonstrations (Selected). G. Villalonga, Sebastian Ramos, German Ros, David Vazquez and Antonio M. Lopez

Our demo focuses on showing the extraordinary performance of our novel 3D pedestrian detector along with its simplicity and real-time capabilities. This detector has been designed for autonomous driving applications, but it can also be applied in other scenarios covering both outdoor and indoor applications. Our pedestrian detector is based on the combination of a random forest classifier with HOG-LBP features and the inclusion of a preprocessing stage based on 3D scene information in order to precisely determine the image regions where the detector should search for pedestrians. This approach results in a highly accurate system that runs in real time, as required by many computer vision and robotics applications.

Latex Bibtex Citation:
@INPROCEEDINGS {VRR2014,
   author     = {G. Villalonga and Sebastian Ramos and German Ros and David Vazquez and Antonio Lopez},
   title          = {3d Pedestrian Detection via Random Forest},
   booktitle = {European Conference on Computer Vision},
   year         = {2014}
}
2013

ADAPTING PEDESTRIAN DETECTION FROM SYNTHETIC TO FAR INFRARED IMAGES

ICCV WORKSHOP ON VISUAL DOMAIN ADAPTATION AND DATASET BIAS (ICCVW-VISDA), 2013

Conferences (Selected). Yainuvis Socarras, Sebastian Ramos, David Vazquez, Antonio M. Lopez and Theo Gevers

We present different techniques to adapt a pedestrian classifier trained with synthetic images and the corresponding automatically generated annotations to operate with far infrared (FIR) images. The information contained in this kind of image allows us to develop a robust pedestrian detector that is invariant to extreme illumination changes.

Latex Bibtex Citation:
@INPROCEEDINGS {SRV2013,
   author     = {Yainuvis Socarras and Sebastian Ramos and David Vazquez and Antonio Lopez and Theo Gevers},
   title          = {Adapting Pedestrian Detection from Synthetic to Far Infrared Images},
   booktitle = {ICCV Workshop on Visual Domain Adaptation and Dataset Bias},
   year         = {2013}
}
2013

COMPUTER VISION TRENDS AND CHALLENGES

BOOK: COMPUTER VISION TRENDS AND CHALLENGES, 2013

Book (Selected). Jorge Bernal and David Vazquez (eds)

This book contains the papers presented at the Eighth CVC Workshop on Computer Vision Trends and Challenges (CVCR&D'2013). The workshop was held at the Computer Vision Center (Universitat Autònoma de Barcelona) on October 25th, 2013. The CVC workshops provide an excellent opportunity for young researchers and project engineers to share new ideas and knowledge about the progress of their work, and also to discuss challenges and future perspectives. In addition, the workshop is the welcome event for new people who have recently joined the institute. The program of CVCR&D is organized as a single-track, single-day workshop. It comprises several sessions dedicated to specific topics. For each session, a doctor working on the topic introduces the general research lines. The PhD students expose their specific research. A poster session is held for open questions. Session topics cover the current research lines and development projects of the CVC: Medical Imaging, Color & Texture Analysis, Object Recognition, Image Sequence Evaluation, Advanced Driver Assistance Systems, Machine Vision, Document Analysis, Pattern Recognition and Applications. We want to thank all paper authors and Program Committee members. Their contribution shows that the CVC has a dynamic, active, and promising scientific community. We hope you all enjoy this Eighth workshop, and we look forward to meeting you and new people next year at the Ninth CVCR&D.

Latex Bibtex Citation:
@BOOK {BeV2013,
   author     = {Jorge Bernal and David Vazquez (eds)},
   title          = {Computer Vision Trends and Challenges},
   booktitle = {Computer Vision Trends and Challenges},
   year         = {2013}
}
2013

MULTI-TASK BILINEAR CLASSIFIERS FOR VISUAL DOMAIN ADAPTATION

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS WORKSHOP (NIPSW), 2013

Conferences (Selected). Jiaolong Xu, Sebastian Ramos, Xu Hu, David Vazquez and Antonio M. Lopez

We propose a method that aims to lessen the significant accuracy degradation that a discriminative classifier can suffer when it is trained in a specific domain (source domain) and applied in a different one (target domain). The principal reason for this degradation is the discrepancy in the distribution of the features that feed the classifier in the different domains. Therefore, we propose a domain adaptation method that maps the features from the different domains into a common subspace and learns a discriminative domain-invariant classifier within it. Our algorithm combines bilinear classifiers and multi-task learning for domain adaptation. The bilinear classifier encodes the feature transformation and classification parameters by a matrix decomposition. In this way, specific feature transformations for multiple domains and a shared classifier are jointly learned in a multi-task learning framework. Focusing on domain adaptation for visual object detection, we apply this method to the state-of-the-art deformable part-based model for cross-domain pedestrian detection. Experimental results show that our method significantly reduces the domain drift and improves the accuracy when compared to several baselines.

Latex Bibtex Citation:
@INPROCEEDINGS {XRH2013,
   author     = {Jiaolong Xu and Sebastian Ramos and Xu Hu and David Vazquez and Antonio Lopez},
   title          = {Multi-task Bilinear Classifiers for Visual Domain Adaptation},
   booktitle = {Advances in Neural Information Processing Systems Workshop},
   year         = {2013}
}
2013

RANDOM FORESTS OF LOCAL EXPERTS FOR PEDESTRIAN DETECTION

15TH IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2013

Conferences (Selected). Javier Marin, David Vazquez, Antonio M. Lopez, Jaume Amores and Bastian Leibe

Pedestrian detection is one of the most challenging tasks in computer vision, and has received a lot of attention in recent years. Recently, some authors have shown the advantages of using combinations of part/patch-based detectors in order to cope with the large variability of poses and the existence of partial occlusions. In this paper, we propose a pedestrian detection method that efficiently combines multiple local experts by means of a Random Forest ensemble. The proposed method works with rich block-based representations such as HOG and LBP, in such a way that the same features are reused by the multiple local experts, so that no extra computational cost is needed with respect to a holistic method. Furthermore, we demonstrate how to integrate the proposed approach with a cascaded architecture in order to achieve not only high accuracy but also acceptable efficiency. In particular, the resulting detector operates at five frames per second on a laptop. We tested the proposed method on well-known challenging datasets such as Caltech, ETH, Daimler, and INRIA. The method proposed in this work consistently ranks among the top performers on all the datasets, being either the best method or within a small margin of the best one.

Latex Bibtex Citation:
@INPROCEEDINGS {MVL2013,
   author     = {Javier Marin and David Vazquez and Antonio Lopez and Jaume Amores and Bastian Leibe},
   title          = {Random Forests of Local Experts for Pedestrian Detection},
   booktitle = {15th IEEE International Conference on Computer Vision},
   year         = {2013}
}
2013

DOMAIN ADAPTATION OF VIRTUAL AND REAL WORLDS FOR PEDESTRIAN DETECTION

BOOK: PHD THESIS, UNIVERSITAT AUTÒNOMA DE BARCELONA-CVC, 2013

Book (Selected). David Vazquez

Pedestrian detection is of paramount interest for many applications, e.g. Advanced Driver Assistance Systems, Intelligent Video Surveillance and Multimedia systems. Most promising pedestrian detectors rely on appearance-based classifiers trained with annotated data. However, the required annotation step represents an intensive and subjective task for humans, which makes it worth minimizing their intervention in this process by using computational tools like realistic virtual worlds. The reason to use this kind of tool is that it allows the automatic generation of precise and rich annotations of visual information. Nevertheless, the use of this kind of data comes with the following question: can a pedestrian appearance model learnt with virtual-world data work successfully for pedestrian detection in real-world scenarios? To answer this question, we conduct different experiments that suggest a positive answer. However, pedestrian classifiers trained with virtual-world data can suffer the so-called dataset shift problem, as real-world based classifiers do. Accordingly, we have designed different domain adaptation techniques to face this problem, all of them integrated in the same framework (V-AYLA). We have explored different methods to train a domain-adapted pedestrian classifier by collecting a few pedestrian samples from the target domain (real world) and combining them with many samples of the source domain (virtual world). The extensive experiments we present show that pedestrian detectors developed within the V-AYLA framework do achieve domain adaptation. Ideally, we would like to adapt our system without any human intervention. Therefore, as a first proof of concept we also propose an unsupervised domain adaptation technique that avoids human intervention during the adaptation process. To the best of our knowledge, this Thesis is the first work demonstrating adaptation of virtual and real worlds for developing an object detector. Last but not least, we also assessed a different strategy to avoid the dataset shift, which consists in collecting real-world samples and retraining with them in such a way that no bounding boxes of real-world pedestrians have to be provided. We show that the generated classifier is competitive with respect to the counterpart trained with samples collected by manually annotating pedestrian bounding boxes. The results presented in this Thesis not only end with a proposal for adapting a virtual-world pedestrian detector to the real world, but also go further by pointing out a new methodology that would allow the system to adapt to different situations, which we hope will provide the foundations for future research in this unexplored area.

Latex Bibtex Citation:
@BOOK {Vaz2013,
   author     = {David Vazquez},
   title          = {Domain Adaptation of Virtual and Real Worlds for Pedestrian Detection},
   booktitle = {PhD Thesis, Universitat Autònoma de Barcelona-CVC},
   year         = {2013}
}
2013

INTERACTIVE TRAINING OF HUMAN DETECTORS

BOOK CHAPTER: MULTIMODAL INTERACTION IN IMAGE AND VIDEO APPLICATIONS, INTELLIGENT SYSTEMS REFERENCE LIBRARY, 2013

Book Chapters (Selected). David Vazquez, Antonio M. Lopez, Daniel Ponsa and David Geronimo

Image based human detection remains a challenging problem. Most promising detectors rely on classifiers trained with labelled samples. However, labelling is a manual, labour-intensive step. To overcome this problem we propose to collect images of pedestrians from a virtual city, i.e., with automatic labels, and train a pedestrian detector with them, which works fine when such virtual-world data are similar to the testing data, i.e., real-world pedestrians in urban areas. When the testing data are acquired under different conditions than the training data, e.g., human detection in personal photo albums, dataset shift appears. In previous work, we cast this problem as one of domain adaptation and solve it with an active learning procedure. In this work, we focus on the same problem but evaluate a different set of faster-to-compute features, i.e., Haar, EOH and their combination. In particular, we train a classifier with virtual-world data, using such features and Real AdaBoost as the learning machine. This classifier is applied to real-world training images. Then, a human oracle interactively corrects the wrong detections, i.e., a few missed detections are manually annotated and some false ones are pointed out too. A low amount of manual annotation is fixed as a restriction. Real- and virtual-world difficult samples are combined within what we call cool world, and we retrain the classifier with these data. Our experiments show that this adapted classifier is equivalent to the one trained with only real-world data but requires 90% fewer manual annotations.
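
The interactive loop itself is simple; the hedged Python skeleton below (the train/detect/oracle callables and the budget handling are placeholders, not the chapter's code) shows the train, detect, correct, retrain cycle under a fixed manual-annotation budget.

def interactive_adaptation(train, detect, oracle, virtual_samples, real_images, budget=100):
    # train(samples) -> model; detect(model, image) -> detections;
    # oracle(image, detections) -> list of corrected samples (annotated misses
    # and flagged false positives). The oracle is queried until the manual
    # annotation budget is spent, then the classifier is retrained on the
    # combined virtual + real ("cool world") training set.
    samples = list(virtual_samples)
    model = train(samples)
    used = 0
    for image in real_images:
        if used >= budget:
            break
        corrections = oracle(image, detect(model, image))
        samples.extend(corrections)
        used += len(corrections)
    return train(samples)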

Latex Bibtex Citation:
@INBOOK {vlp2013,
   author     = {David Vazquez and Antonio Lopez and Daniel Ponsa and David Geronimo},
   title          = {Interactive Training of Human Detectors},
   booktitle = {Multimodal Interaction in Image and Video Applications Intelligent Systems Reference Library},
   year         = {2013}
}
2013

LEARNING A MULTIVIEW PART-BASED MODEL IN VIRTUAL WORLD FOR PEDESTRIAN DETECTION

IEEE INTELLIGENT VEHICLES SYMPOSIUM (IV), 2013

Conferences (Selected). Jiaolong Xu, David Vazquez, Antonio M. Lopez, Javier Marin and Daniel Ponsa

State-of-the-art deformable part-based models based on latent SVM have shown excellent results on human detection. In this paper, we propose to train a multiview deformable part-based model with automatically generated part examples from virtual-world data. The method is efficient as: (i) the part detectors are trained with precisely extracted virtual examples, thus no latent learning is needed; (ii) the multiview pedestrian detector enhances the performance of the pedestrian root model; (iii) a top-down approach is used for part detection, which reduces the search space. We evaluate our model on the Daimler and Karlsruhe Pedestrian Benchmarks with the publicly available Caltech pedestrian detection evaluation framework, and the results outperform the state-of-the-art latent SVM V4.0 in both average miss rate and speed (our detector is ten times faster).

Latex Bibtex Citation:
@INPROCEEDINGS {xvl2013a,
   author     = {Jiaolong Xu and David Vazquez and Antonio Lopez and Javier Marin and Daniel Ponsa},
   title          = {Learning a Multiview Part-based Model in Virtual World for Pedestrian Detection},
   booktitle = {IEEE Intelligent Vehicles Symposium},
   year         = {2013}
}
2013

WEAKLY SUPERVISED AUTOMATIC ANNOTATION OF PEDESTRIAN BOUNDING BOXES

CVPR WORKSHOP ON GROUND TRUTH - WHAT IS A GOOD DATASET? (CVPRW), 2013

Conferences (Selected). David Vazquez, Jiaolong Xu, Sebastian Ramos, Antonio M. Lopez and Daniel Ponsa

Among the components of a pedestrian detector, its trained pedestrian classifier is crucial for achieving the desired performance. The initial task of the training process consists in collecting samples of pedestrians and background, which involves tiresome manual annotation of pedestrian bounding boxes (BBs). Thus, recent works have assessed the use of automatically collected samples from photo-realistic virtual worlds. However, learning from virtual-world samples and testing on real-world images may suffer the dataset shift problem. Accordingly, in this paper we assess a strategy to collect samples from the real world and retrain with them, thus avoiding the dataset shift, but in such a way that no BBs of real-world pedestrians have to be provided. In particular, we train a pedestrian classifier based on virtual-world samples (no human annotation required). Then, using such a classifier, we collect pedestrian samples from real-world images by detection. Afterwards, a human oracle rejects the false detections efficiently (weak annotation). Finally, a new classifier is trained with the accepted detections. We show that this classifier is competitive with respect to the counterpart trained with samples collected by manually annotating hundreds of pedestrian BBs.

Latex Bibtex Citation:
@INPROCEEDINGS {VXR2013a,
   author     = {David Vazquez and Jiaolong Xu and Sebastian Ramos and Antonio Lopez and Daniel Ponsa},
   title          = {Weakly Supervised Automatic Annotation of Pedestrian Bounding Boxes},
   booktitle = {CVPR Workshop on Ground Truth - What is a good dataset?},
   year         = {2013}
}
2013

ADAPTING A PEDESTRIAN DETECTOR BY BOOSTING LDA EXEMPLAR CLASSIFIERS

CVPR WORKSHOP ON GROUND TRUTH - WHAT IS A GOOD DATASET? (CVPRW), 2013

Conferences (Selected). Jiaolong Xu, David Vazquez, Sebastian Ramos, Antonio M. Lopez and Daniel Ponsa

Training vision-based pedestrian detectors using synthetic datasets (virtual world) is a useful technique to automatically collect the training examples with their pixel-wise ground truth. However, as is often the case, these detectors must operate in real-world images, experiencing a significant drop in performance. In fact, this effect also occurs among different real-world datasets, i.e. detectors' accuracy drops when the training data (source domain) and the application scenario (target domain) have inherent differences. Therefore, in order to avoid this problem, it is necessary to adapt the detector trained with synthetic data to operate in the real-world scenario. In this paper, we propose a domain adaptation approach based on boosting LDA exemplar classifiers from both virtual and real worlds. We evaluate our proposal on multiple real-world pedestrian detection datasets. The results show that our method can efficiently adapt the exemplar classifiers from virtual to real world, avoiding drops in average precision of over 15%.
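
LDA exemplar classifiers are attractive here because each one has a closed form: with background statistics (mean and covariance) estimated once, a per-exemplar linear classifier is w = Sigma^{-1} (x_pos - mu_bg), in the spirit of the exemplar-LDA of Hariharan et al. The Python sketch below shows only that building block (the regularization value is an assumption); the paper then boosts a pool of such classifiers built from virtual- and real-world exemplars.

import numpy as np

def lda_exemplar(x_pos, mu_bg, cov_bg, reg=1e-3):
    # One linear classifier per positive exemplar: w = Sigma^{-1} (x_pos - mu_bg).
    # mu_bg / cov_bg are background feature statistics estimated once, so
    # training each exemplar classifier costs only one linear solve.
    d = cov_bg.shape[0]
    return np.linalg.solve(cov_bg + reg * np.eye(d), x_pos - mu_bg)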

Latex Bibtex Citation:
@INPROCEEDINGS {xvr2013a,
   author     = {Jiaolong Xu and David Vazquez and Sebastian Ramos and Antonio Lopez and Daniel Ponsa},
   title          = {Adapting a Pedestrian Detector by Boosting LDA Exemplar Classifiers},
   booktitle = {CVPR Workshop on Ground Truth - What is a good dataset?},
   year         = {2013}
}
2013

DA-DPM PEDESTRIAN DETECTION

ICCV WORKSHOP ON RECONSTRUCTION MEETS RECOGNITION (ICCVW-RR), 2013

Conferences (Selected). Jiaolong Xu, Sebastian Ramos, David Vazquez and Antonio M. Lopez

Latex Bibtex Citation:
@INPROCEEDINGS {XRV2013,
   author     = {Jiaolong Xu and Sebastian Ramos and David Vazquez and Antonio Lopez},
   title          = {DA-DPM Pedestrian Detection},
   booktitle = {ICCV Workshop on Reconstruction meets Recognition},
   year         = {2013}
}
2012

PEDESTRIAN DETECTION: EXPLORING VIRTUAL WORLDS

BOOK CHAPTER: HANDBOOK OF PATTERN RECOGNITION: METHODS AND APPLICATION, 2012

Book Chapters (Selected). Javier Marin, David Geronimo, David Vazquez and Antonio M. Lopez

The Handbook of Pattern Recognition includes contributions from university educators and active research experts. It is intended to serve as a basic reference on methods and applications of pattern recognition. The primary aim of this handbook is to provide the pattern recognition community with a readable, easy-to-understand resource that covers introductory, intermediate and advanced topics with equal clarity. Therefore, the Handbook of Pattern Recognition can serve equally well as a reference resource and as a classroom textbook. Contributions cover all methods, techniques and applications of pattern recognition. A tentative list of relevant topics includes: 1. Statistical, structural, syntactic pattern recognition. 2. Neural networks, machine learning, data mining. 3. Discrete geometry, algebraic, graph-based techniques for pattern recognition. 4. Face recognition, signal analysis, image coding and processing, shape and texture analysis. 5. Document processing, text and graphics recognition, digital libraries. 6. Speech recognition, music analysis, multimedia systems. 7. Natural language analysis, information retrieval. 8. Biometrics, biomedical pattern analysis and information systems. 9. Other scientific, engineering, social and economical applications of pattern recognition. 10. Special hardware architectures, software packages for pattern recognition.

Latex Bibtex Citation:
@INBOOK {MGV2012,
   author     = {Javier Marin and David Geronimo and David Vazquez and Antonio Lopez},
   title          = {Pedestrian Detection: Exploring Virtual Worlds},
   booktitle = {Handbook of Pattern Recognition: Methods and Application},
   year         = {2012}
}
2012

IMPROVING HOG WITH IMAGE SEGMENTATION: APPLICATION TO HUMAN DETECTION

11TH INTERNATIONAL CONFERENCE ON ADVANCED CONCEPTS FOR INTELLIGENT VISION SYSTEMS (ACIVS), 2012

Conferences (Selected). Yainuvis Socarras, David Vazquez, Antonio M. Lopez, David Geronimo and Theo Gevers

In this paper we improve the histogram of oriented gradients (HOG), a core descriptor of state-of-the-art object detection, by the use of higher-level information coming from image segmentation. The idea is to re-weight the descriptor while computing it, without increasing its size. The benefits of the proposal are two-fold: (i) to improve the performance of the detector by enriching the descriptor information, and (ii) to take advantage of the information from image segmentation, which is in fact likely to be used in other stages of the detection system, such as candidate generation or refinement. We test our technique on the INRIA person dataset, which was originally developed to test HOG, embedding it in a human detection system. The well-known mean-shift segmentation method (from smaller to larger super-pixels) and different methods to re-weight the original descriptor (constant, region-luminance, color- or texture-dependent) have been evaluated. We achieve a performance improvement of 4.47% in detection rate through the use of differences of color between contour pixel neighborhoods as the re-weighting function.
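
The re-weighting can be sketched compactly: the descriptor layout is untouched, only each pixel's gradient magnitude is multiplied by a segmentation-derived weight before the orientation histograms are accumulated. The Python sketch below assumes a precomputed per-pixel weight map (how it is derived from mean-shift super-pixels is the part the paper evaluates) and a simplified HOG without block normalization.

import numpy as np

def weighted_hog_cells(gray, weight, cell=8, bins=9):
    # Simplified HOG cell histograms; `weight` has the image's shape and
    # re-weights gradient magnitudes without changing the descriptor size.
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy) * weight                       # the re-weighting step
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bin_idx = np.minimum((ang * bins / 180.0).astype(int), bins - 1)
    cy, cx = gray.shape[0] // cell, gray.shape[1] // cell
    hist = np.zeros((cy, cx, bins))
    for i in range(cy):
        for j in range(cx):
            m = mag[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            b = bin_idx[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            for k in range(bins):
                hist[i, j, k] = m[b == k].sum()
    return hist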

Latex Bibtex Citation:
@INPROCEEDINGS {SLV2012,
   author     = {Yainuvis Socarras and David Vazquez and Antonio Lopez and David Geronimo and Theo Gevers},
   title          = {Improving HOG with Image Segmentation: Application to Human Detection},
   booktitle = {11th International Conference on Advanced Concepts for Intelligent Vision Systems},
   year         = {2012}
}
2012

UNSUPERVISED DOMAIN ADAPTATION OF VIRTUAL AND REAL WORLDS FOR PEDESTRIAN DETECTION

21ST INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2012

Conferences (Selected). David Vazquez, Antonio M. Lopez and Daniel Ponsa

Vision-based object detectors are crucial for different applications. They rely on learnt object models. Ideally, we would like to deploy our vision system in the scenario where it must operate and lead it to self-learn how to distinguish the objects of interest, i.e., without human intervention. However, learning each object model requires labelled samples collected through a tiresome manual process. For instance, we are interested in exploring the self-training of a pedestrian detector for driver assistance systems. Our first approach to avoid manual labelling consisted in the use of samples coming from realistic computer graphics, so that their labels are automatically available [12]. This would make possible the desired self-training of our pedestrian detector. However, as we showed in [14], there may be a dataset shift between virtual and real worlds. In order to overcome it, we propose the use of unsupervised domain adaptation techniques that avoid human intervention during the adaptation process. In particular, this paper explores the use of the transductive SVM (T-SVM) learning algorithm in order to adapt virtual and real worlds for pedestrian detection (Fig. 1).
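
As a rough intuition for the transductive step, the self-training loop below (a simplified Python stand-in, not the T-SVM solver used in the paper) trains on labelled virtual-world samples, pseudo-labels the most confident unlabelled real-world samples, and retrains with them; the round count and confidence fraction are illustrative assumptions.

import numpy as np
from sklearn.svm import LinearSVC

def self_training_adaptation(X_src, y_src, X_tgt, rounds=5, top_frac=0.2):
    # X_src / y_src: labelled virtual-world data, y in {-1, +1};
    # X_tgt: unlabelled real-world data.
    X, y = X_src, y_src
    clf = LinearSVC(C=0.01).fit(X, y)
    for _ in range(rounds):
        scores = clf.decision_function(X_tgt)
        k = max(1, int(top_frac * len(X_tgt)))
        conf = np.argsort(-np.abs(scores))[:k]      # most confident target samples
        X = np.vstack([X_src, X_tgt[conf]])
        y = np.concatenate([y_src, np.where(scores[conf] > 0, 1, -1)])
        clf = LinearSVC(C=0.01).fit(X, y)
    return clf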

Latex Bibtex Citation:
@INPROCEEDINGS {VLP2012,
   author     = {David Vazquez and Antonio Lopez and Daniel Ponsa},
   title          = {Unsupervised Domain Adaptation of Virtual and Real Worlds for Pedestrian Detection},
   booktitle = {21st International Conference on Pattern Recognition},
   year         = {2012}
}
2011

OPPONENT COLORS FOR HUMAN DETECTION

5TH IBERIAN CONFERENCE ON PATTERN RECOGNITION AND IMAGE ANALYSIS (IBPRIA), 2011

Conferences (Selected). Muhammad Anwer Rao, David Vazquez and Antonio M. Lopez

Human detection is a key component in fields such as advanced driving assistance and video surveillance. However, even detecting non-occluded standing humans remains a challenge of intensive research. Finding good features to build human models for further detection is probably one of the most important issues to face. Currently, shape, texture and motion features have received extensive attention in the literature. However, color-based features, which are important in other domains (e.g., image categorization), have received much less attention. In fact, the use of the RGB color space has become the default choice. The focus has been put on developing first- and second-order features on top of RGB space (e.g., HOG and co-occurrence matrices, respectively). In this paper we evaluate the opponent colors (OPP) space as a biologically inspired alternative for human detection. In particular, by feeding the OPP space into the baseline framework of Dalal et al. for human detection (based on RGB, HOG and linear SVM), we obtain better detection performance than by using RGB space. This is a relevant result since, to the best of our knowledge, the OPP space has not been previously used for human detection. This suggests that in the future it could be worthwhile to compute co-occurrence matrices, self-similarity features, etc., also on top of the OPP space, i.e., as we have done with HOG in this paper.
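
The OPP transform itself is a fixed linear map of RGB; a minimal Python version is below, following the formulas commonly used in the color-descriptor literature (O1 and O2 encode chromatic opponency, O3 intensity). HOG would then be computed per channel.

import numpy as np

def rgb_to_opponent(rgb):
    # O1 = (R - G)/sqrt(2), O2 = (R + G - 2B)/sqrt(6), O3 = (R + G + B)/sqrt(3).
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    b = rgb[..., 2].astype(float)
    o1 = (r - g) / np.sqrt(2.0)
    o2 = (r + g - 2.0 * b) / np.sqrt(6.0)
    o3 = (r + g + b) / np.sqrt(3.0)
    return np.stack([o1, o2, o3], axis=-1)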

Latex Bibtex Citation:
@INPROCEEDINGS {RVL2011a,
   author     = {Muhammad Anwer Rao and David Vazquez and Antonio Lopez},
   title          = {Opponent Colors for Human Detection},
   booktitle = {5th Iberian Conference on Pattern Recognition and Image Analysis},
   year         = {2011}
}
2011

COLOR CONTRIBUTION TO PART-BASED PERSON DETECTION IN DIFFERENT TYPES OF SCENARIOS

14TH INTERNATIONAL CONFERENCE ON COMPUTER ANALYSIS OF IMAGES AND PATTERNS (CAIP), 2011

Conferences (Selected). Muhammad Anwer Rao, David Vazquez and Antonio M. Lopez

Camera-based person detection is of paramount interest due to its potential applications. The task is difficult because of the great variety of backgrounds (scenarios, illumination) in which persons are present, as well as their intra-class variability (pose, clothing, occlusion). In fact, the person class is one of those included in the popular PASCAL visual object classes (VOC) challenge. A breakthrough for this challenge, regarding person detection, is due to Felzenszwalb et al. These authors proposed a part-based detector that relies on histograms of oriented gradients (HOG) and latent support vector machines (LatSVM) to learn a model of the whole human body and its constitutive parts, as well as their relative positions. Since the approach of Felzenszwalb et al. appeared, new variants have been proposed, usually giving rise to more complex models. In this paper, we focus on an issue that has not attracted sufficient interest up to now. In particular, we refer to the fact that HOG is usually computed from the RGB color space, but other possibilities exist and deserve the corresponding investigation. In this paper we challenge RGB space with the opponent color space (OPP), which is inspired by the human vision system. We compute the HOG on top of OPP, then we train and test the part-based human classifier by Felzenszwalb et al. using the PASCAL VOC challenge protocols and person database. Our experiments demonstrate that OPP outperforms RGB. We also investigate possible differences among types of scenarios: indoor, urban and countryside. Interestingly, our experiments suggest that the benefits of OPP with respect to RGB mainly come from indoor and countryside scenarios, those in which the human visual system was shaped by evolution.

Latex Bibtex Citation:
@INPROCEEDINGS {RVL2011b,
   author     = {Muhammad Anwer Rao and David Vazquez and Antonio Lopez},
   title          = {Color Contribution to Part-Based Person Detection in Different Types of Scenarios},
   booktitle = {14th International Conference on Computer Analysis of Images and Patterns},
   year         = {2011}
}
2011

VIRTUAL WORLDS AND ACTIVE LEARNING FOR HUMAN DETECTION

13TH INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION (ICMI), 2011

Conferences (Selected). David Vazquez, Antonio M. Lopez, Daniel Ponsa and Javier Marin

Image based human detection is of paramount interest due to its potential applications in fields such as advanced driving assistance, surveillance and media analysis. However, even detecting non-occluded standing humans remains a challenge of intensive research. The most promising human detectors rely on classifiers developed in the discriminative paradigm, i.e., trained with labelled samples. However, labelling is a manually intensive step, especially in cases like human detection where it is necessary to provide at least bounding boxes framing the humans for training. To overcome this problem, some authors have proposed the use of a virtual world where the labels of the different objects are obtained automatically. This means that the human models (classifiers) are learnt using the appearance of rendered images, i.e., using realistic computer graphics. Later, these models are used for human detection in images of the real world. The results of this technique are surprisingly good. However, they are not always as good as those of the classical approach of training and testing with data coming from the same camera, or similar ones. Accordingly, in this paper we address the challenge of using a virtual world for gathering (while playing a videogame) a large amount of automatically labelled samples (virtual humans and background) and then training a classifier that performs as well, on real-world images, as the one obtained by training on manually labelled real-world samples. To do so, we cast the problem as one of domain adaptation. In doing so, we assume that a small amount of manually labelled samples from real-world images is required. To collect these labelled samples we propose a non-standard active learning technique. Therefore, ultimately our human model is learnt from the combination of virtual- and real-world labelled samples (Fig. 1), which has not been done before. We present quantitative results showing that this approach is valid.

Latex Bibtex Citation:
@INPROCEEDINGS {VLP2011a,
   author     = {David Vazquez and Antonio Lopez and Daniel Ponsa and Javier Marin},
   title          = {Virtual Worlds and Active Learning for Human Detection},
   booktitle = {13th International Conference on Multimodal Interaction},
   year         = {2011}
}
2011

COOL WORLD: DOMAIN ADAPTATION OF VIRTUAL AND REAL WORLDS FOR HUMAN DETECTION USING ACTIVE LEARNING

NIPS DOMAIN ADAPTATION WORKSHOP: THEORY AND APPLICATION (DA-NIPS), 2011

Conferences (Selected). David Vazquez, Antonio M. Lopez, Daniel Ponsa and Javier Marin

Image based human detection is of paramount interest for different applications. The most promising human detectors rely on discriminatively learnt classifiers, i.e., trained with labelled samples. However, labelling is a manually intensive task, especially in cases like human detection where it is necessary to provide at least bounding boxes framing the humans for training. To overcome this problem, in Marin et al. we proposed the use of a virtual world where the labels of the different objects are obtained automatically. This means that the human models (classifiers) are learnt using the appearance of realistic computer graphics. Later, these models are used for human detection in images of the real world. The results of this technique are surprisingly good. However, they are not always as good as those of the classical approach of training and testing with data coming from the same camera and the same type of scenario. Accordingly, in Vazquez et al. we cast the problem as one of supervised domain adaptation. In doing so, we assume that a small amount of manually labelled samples from real-world images is required. To collect these labelled samples we use an active learning technique. Thus, ultimately our human model is learnt from the combination of virtual- and real-world labelled samples which, to the best of our knowledge, was not done before. Here, we term such a combined space cool world. In this extended abstract we summarize our proposal and include quantitative results from Vazquez et al. showing its validity.

Latex Bibtex Citation:
@INPROCEEDINGS {VLP2011b,
   author     = {David Vazquez and Antonio Lopez and Daniel Ponsa and Javier Marin},
   title          = {Cool world: domain adaptation of virtual and real worlds for human detection using active learning},
   booktitle = {NIPS Domain Adaptation Workshop: Theory and Application},
   year         = {2011}
}
2010

LEARNING APPEARANCE IN VIRTUAL SCENARIOS FOR PEDESTRIAN DETECTION

23RD IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2010

Conferences (Selected). Javier Marin, David Vazquez, David Geronimo and Antonio M. Lopez

Detecting pedestrians in images is a key functionality to avoid vehicle-to-pedestrian collisions. The most promising detectors rely on appearance-based pedestrian classifiers trained with labelled samples. This paper addresses the following question: can a pedestrian appearance model learnt in virtual scenarios work successfully for pedestrian detection in real images? (Fig. 1). Our experiments suggest a positive answer, which is a new and relevant conclusion for research in pedestrian detection. More specifically, we record training sequences in virtual scenarios and then appearance-based pedestrian classifiers are learnt using HOG and linear SVM. We test such classifiers in a publicly available dataset provided by Daimler AG for pedestrian detection benchmarking. This dataset contains real world images acquired from a moving car. The obtained result is compared with the one given by a classifier learnt using samples coming from real images. The comparison reveals that, although virtual samples were not specially selected, both virtual and real based training give rise to classifiers of similar performance.
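
The training pipeline named here (HOG features plus a linear SVM) is easy to reproduce in outline. The following Python sketch assumes scikit-image and scikit-learn and uses common HOG settings; the exact parameters of the paper may differ, and the function name is an assumption. Positive crops would come from the virtual scenarios, negatives from background windows.

import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def train_pedestrian_classifier(pos_crops, neg_crops):
    # Crops are equally sized grayscale windows (e.g. 64x128 pixels),
    # given as two Python lists of 2D arrays.
    X = np.array([hog(c, orientations=9, pixels_per_cell=(8, 8),
                      cells_per_block=(2, 2)) for c in pos_crops + neg_crops])
    y = np.array([1] * len(pos_crops) + [-1] * len(neg_crops))
    return LinearSVC(C=0.01).fit(X, y)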

LaTeX BibTeX Citation:
@INPROCEEDINGS{MVG2010,
  author    = {Javier Marin and David Vazquez and David Geronimo and Antonio Lopez},
  title     = {Learning Appearance in Virtual Scenarios for Pedestrian Detection},
  booktitle = {23rd IEEE Conference on Computer Vision and Pattern Recognition},
  year      = {2010}
}
2009

THE EFFECT OF THE DISTANCE IN PEDESTRIAN DETECTION

CVC TECHNICAL REPORT (M.SC.), 2009

David Vazquez, David Geronimo and Antonio M. Lopez. Theses, Selected.

Pedestrian accidents are one of the leading preventable causes of death. In order to reduce the number of accidents, in the last decade pedestrian protection systems have been introduced: a special type of advanced driver assistance system in which an on-board camera explores the road ahead for possible collisions with pedestrians in order to warn the driver or perform braking actions. As a result of the variability in appearance, pose and size, pedestrian detection is a very challenging task, and many techniques, models and features have been proposed to solve the problem. As the appearance of pedestrians varies significantly as a function of distance, a system based on multiple classifiers specialized in different depths is likely to improve the overall performance with respect to a typical system based on a general detector. Accordingly, the main aim of this work is to explore the effect of the distance in pedestrian detection. We have evaluated three pedestrian detectors (HOG, HAAR and EOH) on two different databases (INRIA and Daimler09) for two different sizes (small and big). Through an extensive set of experiments we answer questions such as which datasets and evaluation methods are the most adequate, which is the best method for each pedestrian size and why, and how the optimum parameters of each method vary with respect to the distance.
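As a small illustration of the distance-dependent analysis, the helper below splits annotated pedestrians into near (big) and far (small) groups by bounding-box height so that detectors can be trained and evaluated per depth range; the 80-pixel cut-off and the annotation format are assumptions for this example, not the report's settings.

def split_by_size(annotations, near_height=80):
    # Each annotation is assumed to be a dict with a pixel 'height' field.
    near = [a for a in annotations if a["height"] >= near_height]  # big / close
    far = [a for a in annotations if a["height"] < near_height]    # small / distant
    return near, far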

LaTeX BibTeX Citation:
@INPROCEEDINGS{VGL2009,
  author    = {David Vazquez and David Geronimo and Antonio Lopez},
  title     = {The effect of the distance in pedestrian detection},
  booktitle = {CVC Technical Report},
  year      = {2009}
}
2008

INTRUSION CLASSIFICATION IN INTELLIGENT VIDEO SURVEILLANCE SYSTEMS

ESTUDIS D'ENGINYERIA SUPERIOR EN INFORMÁTICA (PFC), 2008

David Vazquez, Antonio M. Lopez. Theses, Selected.

An intelligent video surveillance (IVS) system is a camera-based installation able to process in real time the images coming from the cameras. The aim is to automatically warn about different events of interest at the moment they happen. The Daview system by Davantis is a commercial example of an IVS system. The problems addressed by any IVS system, including Daview, are so challenging that no IVS system is perfect; thus, they need continuous improvement. Accordingly, this project aims to study different approaches to outperform the current Daview performance, in particular by improving its classification core. We present an in-depth study of the state of the art in IVS systems, as well as of how Daview works. Based on that knowledge, we propose four possibilities for improving Daview's classification capabilities: improving existing classifiers; improving the combination of existing classifiers; creating new classifiers; and creating new classifier-based architectures. Our main contribution has been the incorporation of state-of-the-art feature selection and machine learning techniques for the classification tasks, a viewpoint not fully addressed in the current Daview system. After a comprehensive quantitative evaluation we show how one of our proposals clearly outperforms the overall performance of the current Daview system. In particular, the classification core that we finally propose consists of an AdaBoost one-against-all architecture that uses appearance and motion features already present in the current Daview system.
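A minimal scikit-learn sketch of an AdaBoost one-against-all classification core like the one proposed; the hyperparameters are illustrative, and X and y stand for the appearance and motion features already computed by the system and the event class of each sample.

from sklearn.ensemble import AdaBoostClassifier
from sklearn.multiclass import OneVsRestClassifier

# One AdaBoost classifier per event class, combined one-against-all.
clf = OneVsRestClassifier(AdaBoostClassifier(n_estimators=200))
# clf.fit(X, y) trains the ensemble; clf.predict(X_new) classifies new events.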

LaTeX BibTeX Citation:
@INPROCEEDINGS{VL2008a,
  author    = {David Vazquez and Antonio Lopez},
  title     = {Intrusion Classification in Intelligent Video Surveillance Systems},
  booktitle = {Estudis d'Enginyeria Superior en Informática},
  year      = {2008}
}
2007

EMPLEO DE SISTEMAS BIOMÉTRICOS FACIALES APLICADOS AL RECONOCIMIENTO DE PERSONAS EN AEROPUERTOS

INGENIERÍA TÉCNICA EN INFORMÁTICA DE SISTEMAS, 2007

David Vazquez, Enrique Cabello. Theses, Selected.

This project was carried out during 2005 and 2006, testing a prototype facial verification system with images extracted from the video-surveillance cameras of Barajas airport. Several experiments were designed, grouped into two classes. In the first type, the system is trained with images obtained under laboratory conditions and then tested with images extracted from the video-surveillance cameras of Barajas airport. In the second case, both the training and the test images are extracted from Barajas. A complete system was developed, including image acquisition and digitization, localization and cropping of the faces in the scene, subject verification and collection of results. The results show that, in general, an image-based facial verification system can be a valuable aid to an operator who must monitor large areas.
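As an illustration of the face localization and cropping stage described above, here is a sketch using OpenCV's stock Haar cascade; this is a modern stand-in for demonstration purposes only, not the system that was actually built for the project.

import cv2

# OpenCV ships pre-trained Haar cascades; this one detects frontal faces.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def crop_faces(frame_bgr):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    # Return one grayscale crop per detected face, ready for verification.
    return [gray[y:y + h, x:x + w] for (x, y, w, h) in boxes]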

LaTeX BibTeX Citation:
@INPROCEEDINGS{VC2007a,
  author    = {David Vazquez and Enrique Cabello},
  title     = {Empleo de sistemas biométricos faciales aplicados al reconocimiento de personas en aeropuertos},
  booktitle = {Ingeniería Técnica en Informática de Sistemas},
  year      = {2007}
}
2006

EMPLEO DE SISTEMAS BIOMÉTRICOS PARA EL RECONOCIMIENTO DE PERSONAS EN AEROPUERTOS

INSTITUTO UNIVERSITARIO DE INVESTIGACIÓN SOBRE SEGURIDAD INTERIOR (IUSI), 2006

Enrique Cabello, Cristina Conde, Angel Serrano, Licesio Rodriguez and David Vazquez. Journal papers, Selected.

This project was carried out during 2005, testing a prototype facial verification system with images extracted from the video-surveillance cameras of Barajas airport. Several experiments were designed, grouped into two classes. In the first type, the system is trained with images obtained under laboratory conditions and then tested with images extracted from the video-surveillance cameras of Barajas airport. In the second case, both the training and the test images are extracted from Barajas. A complete system was developed, including image acquisition and digitization, localization and cropping of the faces in the scene, subject verification and collection of results. The results show that, in general, an image-based facial verification system can be an aid to an operator who must monitor large areas.

LaTeX BibTeX Citation:
@ARTICLE{CCS2006a,
  author  = {Enrique Cabello and Cristina Conde and Angel Serrano and Licesio Rodriguez and David Vazquez},
  title   = {Empleo de sistemas biométricos para el reconocimiento de personas en aeropuertos},
  journal = {Instituto Universitario de Investigación sobre Seguridad Interior (IUSI 2006)},
  year    = {2006}
}
.04

RESEARCH

PUBLIC RESEARCH PROJECTS

ELEKTRA

AUTONOMOUS VEHICLE


Elektra is an autonomous driving platform formed by more than 20 professionals from different backgrounds: CVC-UAB (Environment perception), CAOS-UAB (Embedded hardware), UPC-Tarrasa (Control & planning), CTTC-UPC (Positioning), UAB-DEIC (Communications), UAB-CEPHIS (Electronics), CT Ingenieros (Vehicle engineering) and the municipality of Sant Quirze - Barcelona (vehicle testing). Elektra aims to be the Catalan hub of autonomous driving, with a pool of professionals with whom to carry out research applied to intelligent mobility and technology transfer.

The project relies heavily on computer vision techniques for perception (stereo, stixels, obstacle detection, scene understanding), which tend to be computationally demanding, as well as on localization (GPS + IMU and vision) and navigation (control and planning).

For a car to drive itself, several components are needed. First, accurate pedestrian (obstacle) detection. Second, free navigable space detection, i.e., detecting the part of the lane free of obstacles or interference. Third, localization: the car needs to know where it is and where it is heading. Fourth, planning: the car has to plan its way from point A to point B as smoothly as possible and thus define a global trajectory. And last but not least, control: executing the motion plan and performing the necessary manoeuvres.
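Purely as an illustration of this modular decomposition, the sketch below wires the five components into a single perception-to-control loop; every class and method name is invented for this example and does not correspond to Elektra's actual software.

from dataclasses import dataclass

@dataclass
class Pose:
    x: float
    y: float
    heading: float

class DrivingLoop:
    """Toy perceive -> localize -> plan -> control loop (hypothetical API)."""

    def __init__(self, detector, free_space, localizer, planner, controller):
        self.detector = detector      # pedestrian / obstacle detection
        self.free_space = free_space  # navigable free-space estimation
        self.localizer = localizer    # GPS + IMU + vision fusion
        self.planner = planner        # global / local trajectory planning
        self.controller = controller  # steering, throttle and brake commands

    def step(self, frame, gps_fix, imu_sample, goal: Pose):
        obstacles = self.detector.detect(frame)
        drivable = self.free_space.estimate(frame, obstacles)
        pose = self.localizer.update(gps_fix, imu_sample, frame)
        trajectory = self.planner.plan(pose, goal, drivable, obstacles)
        return self.controller.follow(trajectory, pose)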

SYNTHIA

AUTONOMOUS DRIVING SIMULATOR


SYNTHIA has been one of our latest projects: a driving simulator created to teach driverless cars how to drive. The simulator has been licensed to various international companies.

SYNTHIA makes it possible to generate datasets aimed at scene understanding problems in the context of driving scenarios. The datasets consist of photo-realistic images rendered from a virtual city and come with precise pixel-level semantic class and instance annotations (currently Cityscapes-compatible), depth and optical flow. The CVC team is constantly extending SYNTHIA's functionality to generate more types of ground truth. Different conditions can be forced, such as illumination, weather, season and camera locations (multi-camera calibrated settings are possible). The current version of SYNTHIA can generate more than 10,000 images per hour with such ground truth, using a regular gaming-grade GPU.
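As a sketch of how such a synthetic dataset might be consumed, the snippet below loads an RGB frame together with its label and depth maps; the directory layout and file naming here are assumptions for illustration, so check the actual SYNTHIA release for the exact structure.

import os
import numpy as np
from PIL import Image

def load_synthetic_frame(root, frame_id):
    # Hypothetical SYNTHIA-style layout: RGB images, per-pixel class labels
    # and depth stored as aligned PNGs under parallel directories.
    rgb = np.array(Image.open(os.path.join(root, "RGB", f"{frame_id}.png")))
    labels = np.array(Image.open(os.path.join(root, "GT", "LABELS", f"{frame_id}.png")))
    depth = np.array(Image.open(os.path.join(root, "Depth", f"{frame_id}.png")))
    return rgb, labels, depth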

ACDC

AUTONOMOUS COOPERATIVE DRIVING IN THE CITY


The massive use of automobiles has been a major benefit in terms of personal mobility and sense of freedom, but at the expense of significant drawbacks: traffic accidents and environmental pollution; both factors ending up in health issues and in a huge economic cost. These considerations lead us to think that personal automobiles may not fit in the future cities that industrialized countries have been designing for years.

Accordingly, in addition to improving the current public transport system focused on moving masses at once (trams, metro), we imagined a centralized system in the city receiving mobility requests from users all around. The city would control a fleet of automated (driverless) vehicles that would cooperate among themselves and with other elements, such as infrastructure surveillance cameras and traffic lights, to move safely and comfortably, thereby saving energy and minimizing congestion. The “Automated and Cooperative Driving in the City (ACDC)” proposal was our first step towards this dream. The project started in 2015 after receiving funding from the Spanish Ministry of Economy and is planned to end in 2018.

ACDC has focused on developing advanced software for processing data coming from relatively cheap sensors, rather than assuming the use of expensive sensors just to reduce the complexity of interpreting the data. Thus, from the scientific point of view, ACDC is in the realm of artificial intelligence, machine learning, planning and control, as well as computer vision and general sensor information processing and fusion.

ECO-DRIVERS

COOPERATIVE DRIVING


The aim of eCo-Drivers (Ecologic Cooperative Driver and Road Intelligent Visual Exploration for Route Safety) was to deepen research into technologies for bringing advanced driver assistance systems (ADAS) to urban-oriented electric vehicles. The proposal had two main features: the use of vision as an “eco-sensor” and a driver-centric approach. Rather than treating road monitoring and driver monitoring as standalone ADAS, we made both technologies cooperate so as to assist the driver only when necessary, working as actual co-drivers.

To accomplish our goals within this project, we decided to acquire our own electric car and thus obtain a prototype on which to test our research. The project was, in fact, the fusion of three complementary and collaborative subprojects: (1) “Vision-based Driver Assistance Systems for Urban Environments (ViDAS-UrbE)”; (2) “Driver Distraction Detection System (D3System)”; and (3) “Intelligent Agent-based Driver Decision Support (i-Support)”. The core research of subprojects (1) and (2) focused on computer vision, while (3) addressed machine learning and reasoning under uncertain and incomplete data. Based on this research, the project aimed to develop two urban-oriented, vision-based co-drivers: one for driver-centric obstacle detection and one for driver-centric pedestrian detection.

MAPEA2

MAPPING PEDESTRIAN BEHAVIOUR


Reducing vehicle accidents involving pedestrians is one of the aims of vehicle safety. In fact, it is known that, in order to obtain a 5-star EuroNCAP rating, it will be necessary to incorporate pedestrian detectors (PDs) in vehicles, with the aim of avoiding as many accidents as possible. In this project, ‘Mapping pedestrian behaviour’, we proposed to adapt our pedestrian detector to create accident risk maps.

These maps will be crucial information for more advanced pedestrian detectors and for future autonomous cars operating in cities with door-to-door trajectories. The system we wanted to develop was intended to be compact, easy to install in any car in a non-invasive way, and free of complex calibration processes. This restriction is crucial for creating risk maps more quickly and at lower cost. As a proof of concept, within MAPEA2 we planned to create the pedestrian accident risk map of the Bellaterra campus of the Autonomous University of Barcelona (where the CVC is located).
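As a toy illustration of the risk-map idea, the sketch below bins geo-referenced pedestrian detections into a grid and normalizes the counts into a relative risk score; the input format, grid resolution and normalization are invented for this example.

import numpy as np

def accumulate_risk_map(detections, lat_range, lon_range, cells=(100, 100)):
    # `detections` is an iterable of (latitude, longitude) pairs.
    grid = np.zeros(cells)
    lats = np.linspace(lat_range[0], lat_range[1], cells[0] + 1)
    lons = np.linspace(lon_range[0], lon_range[1], cells[1] + 1)
    for lat, lon in detections:
        i = np.searchsorted(lats, lat, side="right") - 1
        j = np.searchsorted(lons, lon, side="right") - 1
        if 0 <= i < cells[0] and 0 <= j < cells[1]:
            grid[i, j] += 1
    return grid / max(grid.max(), 1)  # normalize counts to a [0, 1] risk score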

.05

TEACHING

  • CURRENT
  • NOW
    2008

    ASSISTANT PROFESSOR

    AUTONOMOUS UNIVERSITY OF BARCELONA

    Master of Computer Vision. Coordinator of the Deep Learning modules.
  • TEACHING HISTORY
  • 2010
    2009

    ASSISTANT PROFESSOR

    AUTONOMOUS UNIVERSITY OF BARCELONA

Teaching in the subjects: Artificial Intelligence (I & II), Master of Computer Vision, Software Design & Software Engineering.
.06

SKILLS

PROGRAMMING SKILLS
I started programming as a child. I'm passionate about software engineering. My favourite languages and frameworks are C++, Python, TensorFlow and Keras.
LEVEL: EXPERT EXPERIENCE: 10 YEARS
C/C++ OpenCV Matlab

LEVEL: ADVANCED EXPERIENCE: 3 YEARS
Python Numpy Java CUDA GitHub

LEVEL: INTERMEDIATE EXPERIENCE: 2 YEARS
Theano TensorFlow Lasagne Keras Caffe
COMPUTER VISION AND MACHINE LEARNING SKILLS
I'm proficient at solving computer vision problems using machine learning techniques.
LEVEL: EXPERT EXPERIENCE: 10 YEARS
3D Reconstruction Recognition Detection Semantic Segmentation Image Generation


LEVEL: ADVANCED EXPERIENCE: 10 YEARS
Deep Learning SVM RF AdaBoost
DESIGN AND OFFICE SKILLS
I enjoy designing websites and using collaborative tools such as GitHub or Overleaf.
LEVEL: INTERMEDIATE EXPERIENCE: 5 YEARS
Html Php Css Wordpress

LEVEL: ADVANCED EXPERIENCE: 10 YEARS
Office LaTeX Overleaf
SOCIAL SKILLS
I'm focused on research & development. I have a strong sense of leadership (I lead the Elektra autonomous vehicle project). Highly organized and dynamic, with excellent planning skills, great attention to detail and the ability to prioritize work.
LEVEL: INTERMEDIATE EXPERIENCE: 5 YEARS
Research Analytical thinking Open-minded Adaptability Leadership Communication

.07

WORKS

Autonomous Driving

Elektra Autonomous Vehicle

Elektra is an autonomous driving platform formed by more than 20 professionals from different backgrounds: CVC-UAB (Environment perception), CAOS-UAB (Embedded hardware), UPC-Tarrasa (Control & planning), CTTC-UPC (Positioning), UAB-DEIC (Communications), UAB-CEPHIS (Electronics), CT Ingenieros (Vehicle engineering) and the municipality of Sant Quirze - Barcelona (vehicle testing). Elektra aims to be the Catalan hub of autonomous driving, with a pool of professionals with whom to carry out research applied to intelligent mobility and technology transfer.

The project relies heavily on computer vision techniques for perception (stereo, stixels, obstacle detection, scene understanding), which tend to be computationally demanding, as well as on localization (GPS + IMU and vision) and navigation (control and planning).

For a car to drive itself, several components are needed. First, accurate pedestrian (obstacle) detection. Second, free navigable space detection, i.e., detecting the part of the lane free of obstacles or interference. Third, localization: the car needs to know where it is and where it is heading. Fourth, planning: the car has to plan its way from point A to point B as smoothly as possible and thus define a global trajectory. And last but not least, control: executing the motion plan and performing the necessary manoeuvres.

3D Reconstruction

Stereo Disparity

Semi-Global Matching

Dense, robust and real-time computation of depth information from stereo-camera systems is a computationally demanding requirement for robotics, advanced driver assistance systems (ADAS) and autonomous vehicles. Semi-Global Matching (SGM) is a widely used algorithm that propagates consistency constraints along several paths across the image. This work presents a real-time system producing reliable disparity estimation results on the new embedded energy-efficient GPU devices. Our design runs on a Tegra X1 at 42 frames per second (fps) for an image size of 640×480, 128 disparity levels, and using 4 path directions for the SGM method.
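OpenCV ships a CPU semi-global matcher that is convenient for getting a feel for the algorithm's inputs and outputs; the sketch below uses 128 disparity levels as in this work, but the remaining parameters are illustrative and this is not the paper's GPU implementation.

import cv2

# Load a rectified stereo pair in grayscale; the file names are placeholders.
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# P1 and P2 are the usual SGM smoothness penalties for small and large
# disparity jumps between neighbouring pixels.
block_size = 5
sgbm = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,   # must be divisible by 16
    blockSize=block_size,
    P1=8 * block_size ** 2,
    P2=32 * block_size ** 2,
)

# OpenCV returns fixed-point disparities scaled by 16.
disparity = sgbm.compute(left, right).astype("float32") / 16.0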

Stixel Representation

Stixel Representation in the GPU

GPU-accelerated real-time stixel computation

The Stixel World is a medium-level, compact representation of road scenes that abstracts millions of disparity pixels into hundreds or thousands of stixels. The goal of this work is to implement and evaluate a complete multi-stixel estimation pipeline on an embedded, energy-efficient, GPU-accelerated device. This work presents a full GPU-accelerated implementation of stixel estimation that produces reliable results at 26 frames per second (real-time) on the Tegra X1 for disparity images of 1024×440 pixels and a stixel width of 5 pixels, and achieves more than 400 frames per second on a high-end Titan X GPU card.
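The toy sketch below conveys only the compression idea behind stixels, extracting at most one obstacle stixel per 5-pixel column band with a simple disparity threshold; real stixel estimation, including this work, relies on dynamic programming over ground, object and sky models, so treat this as an intuition aid with invented parameters.

import numpy as np

def naive_stixels(disparity, stixel_width=5, obstacle_disp=10.0):
    h, w = disparity.shape
    stixels = []
    for x0 in range(0, w - stixel_width + 1, stixel_width):
        # Median disparity profile of this column band.
        band = np.median(disparity[:, x0:x0 + stixel_width], axis=1)
        rows = np.flatnonzero(band > obstacle_disp)
        if rows.size:
            # (column, top row, bottom row, representative disparity)
            stixels.append((x0, int(rows[0]), int(rows[-1]), float(band[rows].mean())))
    return stixels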

Polyp Segmentation

Image Semantic Segmentation


A Benchmark for Endoluminal Scene Segmentation of Colonoscopy Images

Colorectal cancer (CRC) is the third leading cause of cancer death worldwide. Currently, the standard approach to reduce CRC-related mortality is to perform regular screening in search of polyps, and colonoscopy is the screening tool of choice. The main limitations of this screening procedure are the polyp miss-rate and the inability to perform a visual assessment of polyp malignancy. These drawbacks can be reduced by designing decision support systems (DSS) that help clinicians in the different stages of the procedure by providing endoluminal scene segmentation. Thus, in this paper we introduce an extended benchmark of colonoscopy images, with the hope of establishing a new strong benchmark for colonoscopy image analysis research. We provide new baselines on this dataset by training standard fully convolutional networks (FCN) for semantic segmentation, significantly outperforming, without any further post-processing, prior results in endoluminal scene segmentation.
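For illustration, a deliberately small FCN-style segmentation network in Keras is sketched below; the baselines in the paper are standard, much deeper FCN architectures, so this is only a minimal stand-in for the idea of dense per-pixel prediction.

from tensorflow.keras import layers, models

def tiny_fcn(num_classes, input_shape=(None, None, 3)):
    # Encoder: two downsampling stages (input sides should be multiples of 4).
    inputs = layers.Input(shape=input_shape)
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(128, 3, padding="same", activation="relu")(x)
    # Decoder: upsample back to input resolution, then predict a class per pixel.
    x = layers.Conv2DTranspose(64, 4, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2DTranspose(32, 4, strides=2, padding="same", activation="relu")(x)
    outputs = layers.Conv2D(num_classes, 1, activation="softmax")(x)
    return models.Model(inputs, outputs)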

Image Generation

PixelVAE: A Latent Variable Model for Natural Images

Natural image modeling is a landmark challenge of unsupervised learning. Variational Autoencoders (VAEs) learn a useful latent representation and generate samples that preserve global structure but tend to suffer from image blurriness. PixelCNNs model sharp contours and details very well, but lack an explicit latent representation and have difficulty modeling large-scale structure in a computationally efficient way. In this paper, we present PixelVAE, a VAE model with an autoregressive decoder based on PixelCNN. The resulting architecture achieves state-of-the-art log-likelihood on binarized MNIST. We extend PixelVAE to a hierarchy of multiple latent variables at different scales; this hierarchical model achieves competitive likelihood on 64x64 ImageNet and generates high-quality samples on LSUN bedrooms.
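The autoregressive decoder inherits PixelCNN's key ingredient, the masked convolution, which prevents a pixel from conditioning on itself or on any pixel after it in raster order. A minimal numpy sketch of the mask construction (the two mask types follow the usual PixelCNN convention):

import numpy as np

def pixelcnn_mask(kh, kw, mask_type="A"):
    # Rows above the centre and pixels left of the centre are visible;
    # type 'A' (first layer) also hides the centre pixel so that a pixel
    # never sees itself, while type 'B' (later layers) allows it.
    mask = np.zeros((kh, kw), dtype=np.float32)
    ch, cw = kh // 2, kw // 2
    mask[:ch, :] = 1.0
    mask[ch, :cw] = 1.0
    if mask_type == "B":
        mask[ch, cw] = 1.0
    return mask

print(pixelcnn_mask(3, 3, "A"))
# [[1. 1. 1.]
#  [1. 0. 0.]
#  [0. 0. 0.]]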

SYNTHIA

Dataset

SYNTHIA dataset

SYNTHIA has been one of our latest projects: a driving simulator created to teach driverless cars how to drive. The simulator has been licensed to various international companies.

SYNTHIA makes it possible to generate datasets aimed at scene understanding problems in the context of driving scenarios. The datasets consist of photo-realistic images rendered from a virtual city and come with precise pixel-level semantic class and instance annotations (currently Cityscapes-compatible), depth and optical flow. The CVC team is constantly extending SYNTHIA's functionality to generate more types of ground truth. Different conditions can be forced, such as illumination, weather, season and camera locations (multi-camera calibrated settings are possible). The current version of SYNTHIA can generate more than 10,000 images per hour with such ground truth, using a regular gaming-grade GPU.

.08

CONTACT

Get in touch


I'm always open to new opportunities
Simply use the form below to get in touch
