People Tracking and Re-Identification from Multiple Cameras
In many surveillance and monitoring applications, one or more cameras observe several people moving through an environment. Multi-person tracking uses the videos from these cameras to determine who is where at all times. The problem is challenging both computationally and conceptually. On the one hand, the volume of video to process is enormous, yet near real-time performance is desired. On the other hand, a person's appearance varies with lighting, occlusions, and viewpoint, and people move unpredictably through blind spots. Both factors make person re-identification difficult.
This dissertation makes several contributions to person re-identification and multi-person tracking from multiple cameras. We present a weighted triplet loss for learning appearance descriptors. It addresses both problems uniformly, does not suffer from the imbalance between positive and negative examples, and remains robust to outliers. We introduce DukeMTMC, the largest tracking benchmark to date, together with performance measures that emphasize correct person identification. We then introduce a formulation for associating person observations that maximizes agreements on the evidence graph. Finally, we assemble a tracker called DeepCC that combines an existing person detector, hierarchical and online reasoning, our appearance features, and correlation clustering for association. DeepCC achieves improved performance on two challenging sequences from the DukeMTMC benchmark, and ablation experiments demonstrate the merits of its individual components.
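To make the idea of a weighted triplet loss concrete, the following is a minimal NumPy sketch for a single anchor. It assumes one plausible form of such a loss: positive distances are weighted by a softmax, negative distances by a softmin, and the result passes through a soft margin log(1 + exp(x)). The function name and arguments are illustrative assumptions and are not taken from the dissertation's code. Because each set of weights sums to one, the loss does not depend on how many positives or negatives are supplied. This is one way to avoid the positive/negative imbalance the abstract mentions.

```python
import numpy as np


def weighted_triplet_loss(anchor, positives, negatives):
    """Soft-margin weighted triplet loss for one anchor (assumed form, illustrative).

    anchor:    (D,) embedding of the anchor observation
    positives: (P, D) embeddings of the same identity
    negatives: (N, D) embeddings of other identities
    """
    # Euclidean distances from the anchor to every positive and negative.
    d_pos = np.linalg.norm(positives - anchor, axis=1)
    d_neg = np.linalg.norm(negatives - anchor, axis=1)

    # Softmax over positive distances: the farthest (hardest) positives weigh most.
    # Subtracting the max keeps exp() from overflowing.
    w_pos = np.exp(d_pos - d_pos.max())
    w_pos /= w_pos.sum()

    # Softmin over negative distances: the closest (hardest) negatives weigh most.
    w_neg = np.exp(d_neg.min() - d_neg)
    w_neg /= w_neg.sum()

    # Both weight vectors sum to one, so P and N do not need to be balanced.
    margin = np.dot(w_pos, d_pos) - np.dot(w_neg, d_neg)

    # Soft margin log(1 + exp(margin)), computed stably with logaddexp.
    return float(np.logaddexp(0.0, margin))


# Usage: close positives and far negatives should give a small loss.
a = np.zeros(2)
near = np.array([[0.1, 0.0], [0.0, 0.2]])
far = np.array([[3.0, 0.0], [0.0, 4.0]])
print(weighted_triplet_loss(a, near, far))  # small: embedding already separates identities
print(weighted_triplet_loss(a, far, near))  # large: identities are confused
```

The soft margin replaces a hard hinge with a fixed margin. The loss therefore stays smooth and keeps giving a small gradient even for triplets that are already well separated.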