Adaptive Skew-Sensitive Ensembles for Video-to-Video Face Recognition
Video-based face recognition (FR) is increasingly employed to assist operators of intelligent video surveillance (VS) systems in industry and the public sector, due in large part to low-cost camera technologies and to advances in biometrics, pattern recognition and computer vision. Decision support systems are employed in crowded scenes (airports, shopping centers, stadiums, etc.), where a human operator monitors live or archived videos to analyze a scene (Hampapur et al., 2005). VS systems perform a growing number of functions, ranging from real-time recognition and video footage analysis to fusion of video data from different sources (Gouaillier, 2009). FR in VS (FRiVS) can be employed in a range of still-to-video (as found in, e.g., watchlist screening) and video-to-video (as found in, e.g., face re-identification) applications. In still-to-video FR, a gallery of still images is employed in the construction of facial models, whereas in video-to-video FR facial models are designed from video streams.
Of special interest in this Thesis is the automatic detection of a target individual of interest enrolled in a video-to-video FR system. In this human-centric scenario, live or archived videos are analyzed, and the operator receives an alarm if the system detects the presence of an enrolled target individual. Because of the large number of non-target individuals appearing in crowded scenes, avoiding false alarms while maintaining a high detection rate is challenging for such a system. The design of an FR system for real-world applications raises many challenges.
Objective and contributions
In this Thesis, a new framework for adaptive multiple classifier systems (MCSs) is proposed for partially-supervised learning of facial models over time based on facial trajectories. This framework is designed to implement systems for video-to-video FR, as needed for face re-identification applications, where gradual or abrupt environmental changes occur over time. In Bayesian decision theory, these changes correspond to changes in the probability density function of the faces (e.g., the appearance of the face), or in the prior probabilities (class proportions). The main contribution of this Thesis is an adaptive MCS for video-to-video FR in video surveillance, capable of spatio-temporal recognition and of self-updating based on highly confident facial trajectories captured in the scene. The system is also capable of adapting the fusion function of individual-specific classifiers to the operational imbalance in video-to-video FR. This contribution is divided into three parts.
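The Bayesian view referred to in this section can be made explicit with the posterior probability of a class given a face pattern. The notation below ($\omega_i$ for a class, $\mathbf{x}$ for a facial feature vector) is an assumption chosen for illustration and is not taken from the Thesis.

```latex
P(\omega_i \mid \mathbf{x}) =
  \frac{p(\mathbf{x} \mid \omega_i)\, P(\omega_i)}
       {\sum_{j} p(\mathbf{x} \mid \omega_j)\, P(\omega_j)}
```

Under this reading, a change in facial appearance alters the class-conditional density $p(\mathbf{x} \mid \omega_i)$, whereas a change in class proportions alters the prior $P(\omega_i)$. Either change shifts the posterior, and with it the decision boundary that a fixed classifier would need to track.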
A REVIEW OF TECHNIQUES FOR ADAPTIVE FACE RECOGNITION IN VIDEO
Intelligent video surveillance systems that employ face recognition (FR) for decision support are important in many private-sector and, above all, public-sector applications. The extensive use of FR systems is due in part to the universality of the human face as a biometric trait that can be covertly captured, to the availability of low-cost cameras, and to advances in biometrics, pattern recognition and image/video processing. These systems are being considered for video surveillance in crowded scenes (airports, shopping centers, stadiums, etc.). In these scenes, an operator observes the scene through surveillance cameras and monitors who or what is present (Hampapur et al., 2005). Although many decision support systems exist, there are still many functions to be developed or improved. These areas of opportunity for researchers range from real-time recognition to fusion of video data from different sources, and include the design of compact biometric models and the preservation of performance over time (Gouaillier, 2009; Ahmad et al., 2008). Of special interest in this Thesis is the automatic detection of individuals of interest enrolled in a system, based on the appearance of their face, and the preservation of the system's performance regardless of variations over time in a target individual's appearance.
Challenges of FRiVS
Many challenges in FRiVS remain open research problems. As stated by Zhao et al. (2003), FR from outdoor images of dense scenes, under unconstrained conditions, is still a research problem. This problem has been addressed by considering time information in video-based approaches (Matta and Dugelay, 2009). However, noisy sensed data from the complex, changing environment may lead to a biometric model that does not correspond to the true biometric samples, which directly affects the accuracy of the matching algorithm.
Overlapping class distributions due to inter-class similarity also increase the number of false alarms produced by the system. Facial models designed with a limited set of training data from the complex data distribution of faces in feature space are scarcely representative. Even if the facial models are representative, most FR systems assume that face samples in operation are acquired by the same sensor as the one used to acquire training data, which is not necessarily true and affects accuracy. Factors such as an inappropriate interaction of the biometric system with the sensor, and inherent scene properties such as environmental or temporal changes of the true distribution of faces in feature space, may also degrade the accuracy of the system (Rattani, 2010; Poh et al., 2009). The quality of facial models is therefore a critical issue in the overall performance of the biometric application. The recognition problem becomes more challenging when we consider that faces do not remain static over time, and present either gradual (e.g., aging) or abrupt (e.g., pose, illumination) changes during the system's operation.
Adaptive Face Recognition
Many researchers have recently focused on updating biometric models over time using newly acquired data. These adaptive biometric systems can be categorized according to the way class labels are obtained. Unsupervised approaches do not require class labels to update biometric models, and recognition and update are performed simultaneously. Supervised approaches, on the other hand, use only labeled data previously acquired, in an offline update. Approaches in which biometric models are built in a supervised manner and unsupervised adaptation is performed online are also called partially-supervised or semi-supervised. Table 1.2 shows different approaches to adapt facial models as new data becomes available, either from daily operations or security reports.
A Self-Updating System for Spatio-Temporal Face Recognition
The structure of the adaptive MCS for video-to-video FR is shown in Fig. 3.2. It is composed of 7 subsystems: 5 used in normal operation and 2 used in the design/self-update phase. The segmentation module performs face detection, and the feature extraction/selection module together with the matcher, which holds one EoD per enrolled individual, produces classification predictions. The IVT face tracker follows faces in the scene, allowing the spatio-temporal fusion system to regroup and accumulate target predictions over a fixed-size window for enhanced spatio-temporal FR. Detection ($\gamma^d_k$) and update ($\gamma^u_k$) thresholds for spatio-temporal fusion are estimated using validation trajectories, and the design/update module avoids knowledge corruption by using a learn-and-combine strategy. Individual-specific EoDs are designed by the design/update module, which trains a pool of PFAM 2-class classifiers using a DPSO training strategy and estimates the fusion function with BC. The sample selection system makes it possible to reduce the negative bias of the training and validation sets using the OSS and random selection strategies.
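The accumulation of target predictions over a fixed-size window, followed by comparison against a detection and an update threshold, can be sketched as follows. This is a minimal illustration under stated assumptions, not the Thesis implementation: the function name `fuse_trajectory`, the use of binary per-frame predictions, and the rule of comparing the maximum accumulated count to each threshold are all hypothetical choices made for this example.

```python
from collections import deque


def fuse_trajectory(predictions, window_size, detect_threshold, update_threshold):
    """Accumulate per-frame target predictions along one facial trajectory.

    predictions: iterable of 0/1 values (1 = the ensemble predicted "target").
    Returns (max_accumulated, detected, selected_for_update).
    Assumption: decisions are based on the maximum windowed count.
    """
    window = deque(maxlen=window_size)  # keeps only the most recent predictions
    max_accumulated = 0
    for p in predictions:
        window.append(p)
        # Number of positive predictions inside the current window
        max_accumulated = max(max_accumulated, sum(window))

    detected = max_accumulated >= detect_threshold
    # Self-update is meant to require higher confidence than detection,
    # so the update threshold is expected to be at least the detection one.
    selected_for_update = max_accumulated >= update_threshold
    return max_accumulated, detected, selected_for_update


if __name__ == "__main__":
    trajectory = [0, 1, 1, 1, 0, 1, 1, 1, 1, 0]
    print(fuse_trajectory(trajectory, window_size=4,
                          detect_threshold=3, update_threshold=4))
```

Keeping two separate thresholds reflects the idea that a trajectory may be confident enough to raise an alarm without being confident enough to be used as new training data.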
The analysis of simulation results has been divided into three levels. First, a transaction-based analysis shows the performance of the system based on the classification decisions made on each ROI. Then, a subject-based analysis focuses on specific individuals, which reveals how performance varies with their particular characteristics. Finally, a trajectory-based analysis shows the overall performance of the system after the decision fusion accumulates predictions for complete input trajectories (shown in Fig. 3.5).
Systems for face recognition (FR) in video surveillance are applied in a range of scenarios such as watchlist screening, face re-identification, and search and retrieval. Several challenges are present in these applications, including the common assumptions that the facial appearance of target individuals does not change over time, and that the proportions of faces captured for target and non-target individuals are balanced, known a priori and fixed. However, faces captured during operations vary due to capture conditions, the proportions of target and non-target individuals change continuously during operations, and the facial models used for matching are commonly not representative, since they are designed a priori with a limited amount of reference samples that are collected and labeled at a high cost.
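The effect of changing class proportions on a fixed operating point can be illustrated with a short calculation. The sketch below is only an illustration: the function name `precision_at_skew` and the numeric values are hypothetical, and it relies solely on the standard definition of precision in terms of the true positive rate, the false positive rate and the proportion of target samples.

```python
def precision_at_skew(tpr, fpr, target_prior):
    """Precision of a detector with fixed TPR/FPR at a given target proportion.

    tpr: true positive rate, fpr: false positive rate,
    target_prior: fraction of operational samples that belong to the target.
    """
    true_pos = tpr * target_prior            # expected share of true alarms
    false_pos = fpr * (1.0 - target_prior)   # expected share of false alarms
    if true_pos + false_pos == 0.0:
        return 0.0
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical operating point, evaluated at several target proportions
    for prior in (0.5, 0.1, 0.01):
        print(prior, round(precision_at_skew(0.9, 0.1, prior), 3))
```

With these hypothetical values, the same operating point yields far fewer correct alarms per alarm raised as targets become rarer, which is why a system tuned under a balanced assumption can perform poorly in operation.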
In this Thesis, a framework for adaptive systems for video-to-video face recognition (FR) in video surveillance is proposed, contributing new techniques to adapt the facial models of enrolled individuals of interest. This framework allows trajectory-based self-updating systems to automatically update facial models, considering gradual and abrupt changes in the classification environment. In addition, with a modification to SSBC, the systems are capable of adapting the individual-specific ensembles to the operational imbalance.
Table of contents
CHAPTER 1 A REVIEW OF TECHNIQUES FOR ADAPTIVE FACE
RECOGNITION IN VIDEO SURVEILLANCE
1.1 Face Recognition in Video-Surveillance
1.1.1 Specialized Architectures for FRiVS
1.1.2 Challenges of FRiVS
1.2 Adaptive Face Recognition
1.2.1 Semi-Supervised Learning
1.2.2 Adaptive Biometrics
1.2.3 Challenges of Adaptive FR Systems
1.3 Incremental and On-Line Learning of Classifiers
1.3.1 Fuzzy ARTMAP
1.3.2 PFAM Neural Classifier
1.4 Adaptive Ensembles
1.4.1 Generation of Pools
1.4.2 Selection and Fusion
1.4.2.1 Iterative Boolean Combination
1.4.3 Ensembles for Class Imbalance
1.4.3.1 Passive Approaches
1.4.3.2 Active Approaches
1.4.3.3 Skew-Sensitive Boolean Combination
1.4.4 Challenges on Adaptive Ensembles for Class Imbalance
1.5 Measuring Classification Performance
1.6 Summary of Overall Challenges
CHAPTER 2 PARTIALLY-SUPERVISED LEARNING FROM FACIAL TRAJECTORIES
FOR FACE RECOGNITION IN VIDEO SURVEILLANCE
2.2 Video-to-video Face Recognition
2.2.1 Face Tracking
2.2.2 Specialized Classification Architectures
2.2.3 Decision Fusion
2.2.4 Challenges of Facial Modeling
2.3 Adaptive Biometric Systems
2.3.1 Selection of Representative Samples
2.3.2 Update of Biometric Systems
2.3.3 Adaptive Face Recognition
2.4 A Self-Updating System for Face Recognition in Video Surveillance
2.4.1 Modular Classification System
2.4.2 Tracking System
2.4.3 Decision Fusion System
2.4.4 Design/Update System
2.4.5 Sample Selection
2.5 Experimental Methodology
2.5.1 Video Surveillance Database
2.5.2 Implementation of the Proposed MCS
2.5.3 Experimental Protocol
2.5.4 Performance Analysis
2.6.1 Transaction-Based Analysis
2.6.2 Subject-Based Analysis
2.6.3 Trajectory-Based Analysis
CHAPTER 3 AN ADAPTIVE ENSEMBLE-BASED SYSTEM FOR FACE
RECOGNITION IN PERSON RE-IDENTIFICATION
3.2 Video-to-Video Face Recognition in Person Re-identification
3.2.1 Face Tracking
3.2.2 Face Matching
3.2.3 Spatio-Temporal Fusion
3.2.4 Key Challenges in Person Re-Identification
3.3 Update of Facial Models
3.3.1 Adaptive Biometrics
3.3.2 Adaptive Face Recognition Systems
3.4 A Self-Updating System for Spatio-Temporal Face Recognition
3.4.1 Modular Classification System
3.4.2 Tracking System
3.4.3 Spatio-Temporal Fusion System
3.4.4 Design/Update System
3.4.5 Sample Selection
3.5 Experimental Methodology
3.5.1 Database for Face Re-Identification
3.5.2 Experimental Protocol
3.5.3 Performance Analysis
3.6.1 Subject-Based Analysis
3.6.2 LTM management
3.6.3 Trajectory-Based Analysis
CHAPTER 4 ADAPTIVE SKEW-SENSITIVE ENSEMBLES FOR FACE
RECOGNITION IN VIDEO SURVEILLANCE
4.2 Ensemble Methods for Class Imbalance
4.2.1 Passive Approaches
4.2.2 Active Approaches
4.2.3 Estimation of Class Imbalance
4.3 Adaptive Skew-Sensitive Ensembles for Video-to-Video Face Recognition
4.3.1 Approximation of Operational Imbalance
4.3.2 Design and Adaptation of Ensembles
4.4 Synthetic Experiments
4.4.1 Experimental Protocol
4.4.2.1 Classification on Imbalanced Problems
4.4.2.2 Ensemble Generation
4.4.2.3 Using Several Classifiers per Imbalance
4.4.2.4 Approximation of Imbalance Through Quantification
4.5 Experiments on Video Data
4.5.1 Experimental Protocol
4.5.2 Video Surveillance Data
4.5.3 Experimental Protocol
4.5.4.1 Transaction-Based Analysis
4.5.4.2 Individual-Specific Analysis
4.5.4.3 Approximation of Operational Imbalance
4.5.5 Trajectory-Level Analysis