Dr Ferran Diego (July 2011)
Title Probabilistic alignment of video sequences recorded by moving cameras
Advisors Dr Joan Serrat
Tribunal Josep Lladós (Universitat Autònoma de Barcelona & Computer Vision Center, Spain)
David Marimon (Telefónica I+D, Spain)
Christian Bauckhage (Fraunhofer Institute Intelligent Analysis and Information Systems, Germany)
Keywords video, synchronization, spatio-temporal, registration, Bayesian network, road detection, geolocation
Abstract Video alignment consists of integrating multiple video sequences recorded independently into a single video sequence. This means registering them both in time (frame synchronization) and in space (image registration) so that the two video sequences can be fused or compared pixel-wise. Although the problem is relatively unknown, many applications today may benefit from the availability of robust and efficient video alignment methods. For instance, video surveillance requires integrating video sequences of the same scene recorded at different times in order to detect changes. The problem of aligning videos has been addressed before, but in the relatively simple cases of fixed or rigidly attached cameras and simultaneous acquisition. In addition, most works rely on restrictive assumptions that reduce the difficulty of the problem, such as linear time correspondence or knowledge of the complete trajectories of corresponding scene points on the images; to some extent, these assumptions limit the practical applicability of the solutions developed until now. In this thesis, we focus on the challenging problem of aligning sequences recorded at different times from independent moving cameras following similar but not coincident trajectories. More precisely, this thesis covers four studies that advance the state of the art in video alignment. First, we focus on analyzing and developing a probabilistic framework for video alignment, that is, a principled way to integrate multiple observations and prior information. In this way, two different approaches are presented to exploit the combination of several purely visual features (image intensities, visual words and a dense motion field descriptor) and global positioning system (GPS) information. Second, we focus on reformulating the problem into a single alignment framework, since previous works on video alignment adopt a divide-and-conquer strategy, i.e., first solve the synchronization, and then register corresponding frames.
This also generalizes the ‘classic’ case of fixed geometric transform and linear time mapping. Third, we focus on directly exploiting the time domain of the video sequences in order to avoid an exhaustive cross-frame search. This provides relevant information used for learning the temporal mapping between pairs of video sequences. Finally, we focus on adapting these methods to the online setting for road detection and vehicle geolocation. The qualitative and quantitative results presented in this thesis on a variety of real-world pairs of video sequences show that the proposed method is robust to varying imaging conditions, different image content (e.g., incoming and outgoing vehicles), variations in camera velocity, and different scenarios (indoor and outdoor), going beyond the state of the art. Since it is difficult to present the results for the investigated datasets completely in a written document, they can be viewed at



Dr Jose M. Álvarez (October 2010)
Title Combining Context and Appearance for Road Detection
Advisors Dr Antonio M. López and Dr Theo Gevers
Tribunal Urbano Nunes (Universidade de Coimbra, Portugal)
Joost Van De Weijer (Universitat Autònoma de Barcelona & Computer Vision Center, Spain)
Jan-Mark Geusebroek (University of Amsterdam, The Netherlands)
Abstract Road traffic crashes have become a major cause of death and injury throughout the world. Hence, in order to improve road safety, the automotive industry is moving towards the development of vehicles with autonomous functionalities such as keeping in the right lane, keeping a safe distance between vehicles, or regulating the speed of the vehicle according to the traffic conditions. A key component of these systems is vision-based road detection, which aims to detect the free road surface ahead of the moving vehicle. Detecting the road using a monocular vision system is very challenging since the road is an outdoor scenario imaged from a mobile platform. Hence, the detection algorithm must be able to deal with continuously changing imaging conditions such as the presence of different objects (vehicles, pedestrians), different environments (urban, highways, off-road), different road types (shape, color), and different imaging conditions (varying illumination, different viewpoints and changing weather conditions). Therefore, in this thesis, we focus on vision-based road detection using a single color camera. More precisely, we first focus on analyzing and grouping pixels according to their low-level properties. In this way, two different approaches are presented to exploit color and photometric invariance. Then, we focus the research of the thesis on exploiting context information. This information provides relevant knowledge about the road, using not pixel features from road regions but semantic information from the analysis of the scene. In this way, we present two different approaches to infer the geometry of the road ahead of the moving vehicle. Finally, we focus on combining these context and appearance (color) approaches to improve the overall performance of road detection algorithms.
The qualitative and quantitative results presented in this thesis on real-world driving sequences show that the proposed method is robust to varying imaging conditions, road types and scenarios going beyond the state-of-the-art.
Download Not available



Dr David Gerónimo (February 2010)
Title A Global Approach To Vision-Based Pedestrian Detection for Advanced Driver Assistance Systems
Advisors Dr Antonio M. López
Tribunal Krystian Mikolajczyk (University of Surrey, United Kingdom)
Jaume Amores (Computer Vision Center & Universitat Autònoma de Barcelona, Spain)
Dariu M. Gavrila (University of Amsterdam and Daimler AG, The Netherlands)
Keywords ADAS, Computer vision, Pedestrian detection
Abstract At the beginning of the 21st century, traffic accidents have become a major problem not only for developed countries but also for emerging ones. As in other scientific areas in which Artificial Intelligence is becoming a key actor, advanced driver assistance systems, and specifically pedestrian protection systems based on Computer Vision, are becoming a strong topic of research aimed at improving the safety of pedestrians. However, the challenge is of considerable complexity due to the varying appearance of humans (e.g., clothes, size, aspect ratio, shape, etc.), the dynamic nature of on-board systems and the unstructured moving environments that urban scenarios represent. In addition, the required performance is demanding both in terms of computational time and detection rates. In this thesis, instead of focusing on improving specific tasks, as is frequent in the literature, we present a global approach to the problem. Such a global overview starts with the proposal of a generic architecture to be used as a framework both to review the literature and to organize the techniques studied throughout the thesis. We then focus the research on tasks such as foreground segmentation, object classification and refinement, following a general viewpoint and exploring aspects that are not usually analyzed. In order to perform the experiments, we also present a novel pedestrian dataset that consists of three subsets, each one aimed at the evaluation of a different specific task in the system. The results presented in this thesis not only end with a proposal of a pedestrian detection system but also go one step beyond by pointing out new insights, formalizing existing and proposed algorithms, introducing new techniques and evaluating their performance, which we hope will provide new foundations for future research in the area.



Dr Carme Julià (February 2008)
Title Missing Data Matrix Factorization Addressing the Structure from Motion Problem
Advisors Dr Angel D. Sappa and Dr Felipe Lumbreras
Tribunal Pedro Aguiar (Institute for Systems and Robotics, Technical University of Lisbon, Portugal)
Xavier Lladó (Institut d’Informàtica i Aplicacions, Universitat de Girona, Spain)
Xavier Binefa (Universitat Pompeu Fabra, Spain)
Enrique Cabello (Face Recognition and Artificial Vision Group, Universidad Rey Juan Carlos, Madrid, Spain)
Antonio M. López (Centre de Visió per Computador, Universitat Autònoma de Barcelona, Spain)
Keywords Alternation techniques, Matrix factorization, Structure from motion
Abstract This work is focused on the missing data matrix factorization addressing the Structure from Motion (SFM) problem. The aim is to decompose a matrix of feature point trajectories into the motion and shape matrices, which contain the relative camera-object motion and the 3D positions of tracked feature points, respectively. This decomposition can be found by using the fact that the matrix of trajectories has a reduced rank. Although several techniques have been proposed to tackle this problem, they may give undesirable results when the percentage of missing data is high. An iterative multiresolution scheme is presented to deal with matrices with high percentages of missing data. Experimental results show the viability of the proposed approach.
In the multiple objects case, factorization techniques cannot be directly applied to obtain the SFM of every object, since trajectories are not sorted into different objects. Furthermore, another problem must be faced: the estimation of the rank of the matrix of trajectories. The problem is that, in this case, the rank of the matrix of trajectories is not bounded, since no prior knowledge about the number of objects or about their motion is used. This problem becomes more difficult with missing data, since singular values cannot be computed to estimate the rank. A technique to estimate the rank of a missing data matrix of trajectories is presented. The good performance of the proposed technique is shown empirically on sequences with both synthetic and real data. Once the rank is estimated and the matrix of trajectories is full, the motion segmentation of trajectories is computed. Finally, any factorization technique for the single object case gives the shape and motion of every object.
In addition to the SFM problem, this thesis also shows other applications that can be addressed by means of factorization techniques. Specifically, the Alternation technique, which is used throughout the thesis, is adapted to address each particular problem. The first proposed application is photometric stereo: the goal is to recover the reflectance and surface normals and the light source direction at each frame, from a set of images taken under different lighting conditions. In a second application, the aim is to fill in missing data in gene expression matrices by using the Alternation technique. Finally, the Alternation technique is adapted to be applied in recommender systems, widely considered in e-commerce. For each application, experimental results are given in order to show the good performance of the proposed Alternation-based strategy.
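To make the idea of alternation with missing data concrete, the following is a minimal sketch and not the thesis's algorithm. It assumes a rank-1 toy matrix for brevity, marks missing entries with None, and alternately solves for one factor while the other is held fixed, using only the observed entries. The function name alternation_rank1 and the toy data are illustrative assumptions; for higher ranks each update would become a small least-squares solve instead of a scalar division.

```python
def alternation_rank1(M, iters=200):
    """Fit M[i][j] ~ a[i] * b[j] using only observed entries (None = missing).

    Assumes every row and every column has at least one observed entry.
    """
    rows, cols = len(M), len(M[0])
    a = [1.0] * rows
    b = [1.0] * cols
    for _ in range(iters):
        # Hold b fixed: each a[i] is a 1-D least-squares fit over row i.
        for i in range(rows):
            obs = [j for j in range(cols) if M[i][j] is not None]
            a[i] = sum(M[i][j] * b[j] for j in obs) / sum(b[j] ** 2 for j in obs)
        # Hold a fixed: each b[j] is a 1-D least-squares fit over column j.
        for j in range(cols):
            obs = [i for i in range(rows) if M[i][j] is not None]
            b[j] = sum(M[i][j] * a[i] for i in obs) / sum(a[i] ** 2 for i in obs)
    return a, b


# Toy data (hypothetical): an exact rank-1 matrix with two entries removed.
true_a = [1.0, 2.0, 3.0, 4.0]
true_b = [1.0, 0.5, 2.0]
M = [[x * y for y in true_b] for x in true_a]
M[0][2] = None  # true value is 1.0 * 2.0
M[3][1] = None  # true value is 4.0 * 0.5

a, b = alternation_rank1(M)
filled_02 = a[0] * b[2]  # estimate of the first missing entry
filled_31 = a[3] * b[1]  # estimate of the second missing entry
```

The factors themselves are only determined up to a scale (a can be multiplied by any nonzero constant if b is divided by it), so the sketch compares products a[i] * b[j] rather than the individual factors.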



Dr Daniel Ponsa (June 2007)
Title Model-Based Visual Localisation Of Contours And Vehicles
Advisors Dr Antonio M. López and Dr Xavier Roca
Tribunal Juan José Villanueva (Computer Vision Center and Universitat Autònoma de Barcelona)
Juan Andrade (Institut de Robòtica i Informàtica Industrial, Universitat Politècnica de Catalunya)
Thorsten Graf (Volkswagen AG)
Georg Schneider (Audi Electronics Venture GmbH)
Joan Serrat (Computer Vision Center and Universitat Autònoma de Barcelona)
Keywords Bayesian Tracking, Contour Tracking, Vehicle Detection and Tracking
Abstract This thesis focuses on the analysis of video sequences, applying model-based techniques for extracting quantitative information. In particular, we make several proposals in two application areas: shape tracking based on contour models, and detection and tracking of vehicles in images acquired by a camera installed on a mobile platform.
The work devoted to shape tracking follows the paradigm of active contours, for which we present a review of the existing approaches. First, we measure the performance of the most common algorithms (Kalman-based filters and particle filters), and then we evaluate their implementation aspects through an extensive experimental study, where several synthetic sequences are considered, distorted with different degrees of noise. Thus, we determine the best way to implement these classical tracking algorithms in practice, and we identify their benefits and drawbacks.
Next, the work is oriented towards the improvement of contour tracking algorithms based on particle filters. These algorithms reach good results provided that the number of particles is high enough, but unfortunately the required number of particles grows exponentially with the number of parameters to be estimated. Therefore, and in the context of contour tracking, we present three variants of the classical particle filter, corresponding to three new strategies to deal with this problem. First, we propose to improve contour tracking by propagating the particles more accurately from one image to the next. This is done by using a linear approximation of the optimal propagation function. The second proposed strategy is based on estimating part of the parameters analytically. Thus, we aim to make more productive use of the particles, reducing the number of model parameters that must be estimated through them. The third proposed method aims to exploit the fact that, in contour tracking applications, the parameters related to the rigid transform can be estimated accurately enough independently of the local deformation presented by the contour. This is used to perform a better propagation of the particles, concentrating them more densely in the zone where the tracked contour is located. These three proposals are validated extensively on sequences with different noise levels, on which the improvement achieved is evaluated.
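For readers unfamiliar with the classical particle filter that these variants start from, here is a minimal sketch of the standard bootstrap form (predict, weight, estimate, resample) on a one-dimensional toy state. It is not any of the three variants proposed in the thesis. The random-walk motion model, the Gaussian observation model, the noise levels and the function name particle_filter are all assumptions chosen for illustration.

```python
import math
import random


def particle_filter(observations, n_particles=500, motion_std=0.3,
                    obs_std=0.5, seed=0):
    """Bootstrap particle filter for a scalar state (illustrative toy model)."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        # Predict: propagate each particle with the assumed random-walk model.
        particles = [p + rng.gauss(0.0, motion_std) for p in particles]
        # Weight: unnormalized Gaussian likelihood of the observation z.
        weights = [math.exp(-0.5 * ((z - p) / obs_std) ** 2) for p in particles]
        total = sum(weights)
        if total == 0.0:
            # All weights underflowed: fall back to uniform weights.
            weights = [1.0] * n_particles
            total = float(n_particles)
        # Estimate: weighted mean of the particles.
        estimates.append(sum(w * p for w, p in zip(weights, particles)) / total)
        # Resample: draw particles with probability proportional to weight.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


# Toy data (hypothetical): a slowly drifting position observed with noise.
data_rng = random.Random(1)
truth = [0.1 * t for t in range(50)]
observations = [x + data_rng.gauss(0.0, 0.5) for x in truth]

estimates = particle_filter(observations)
mean_abs_error = sum(abs(e - x) for e, x in zip(estimates, truth)) / len(truth)
```

With a single scalar parameter a few hundred particles are plenty; the difficulty described above arises because covering a higher-dimensional parameter space with the same density requires far more particles.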
After this study, we propose to deal directly with the origin of the previous problem by reducing the number of parameters to be estimated in order to follow a given shape of interest. To achieve this, we propose to model the shape using multiple models, each of which requires fewer parameters than a single model would. We propose a new method to learn these models from a training set, and a new algorithm to use the obtained models for tracking the contours. The experimental results confirm the validity of this proposal.
Finally, the thesis focuses on the development of a system for the detection and tracking of vehicles. The proposals include: a vehicle detection module, a module devoted to the determination of the three-dimensional position and velocity of the detected vehicles, and a tracking module for updating the location of vehicles on the road in a precise and efficient manner. Several original contributions are made in these three areas, and their performance is evaluated empirically.