Multi-camera tracking; Structure from Motion; Computer vision; Machine learning
Lettry L., Vanhoey K., Van Gool L. (2018), DARN: a Deep Adversarial Residual Network for Intrinsic Image Decomposition, in
Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV).
Jose C., Cisse M., Fleuret F. (2018), Kronecker Recurrent Units, in
Proceedings of the Workshop Track of the International Conference on Learning Representations (ICLR).
Chavdarova T., Fleuret F. (2018), SGAN: An Alternative Training of Generative Adversarial Networks, in
Proceedings of the IEEE international conference on Computer Vision and Pattern Recognition (CVPR).
Chavdarova T., Baqué P., Bouquet S., Maksai A., Jose C., Bagautdinov T., Lettry L., Fua P., Van Gool L., Fleuret F. (2018), WILDTRACK: A Multi-camera HD Dataset for Dense Unscripted Pedestrian Detection, in
Proceedings of the IEEE international conference on Computer Vision and Pattern Recognition (CVPR).
Chavdarova T., Fleuret F. (2017), Deep Multi-Camera People Detection, in
Proceedings of the IEEE International Conference on Machine Learning and Applications (ICMLA), 848-853.
Maksai A., Wang X., Fleuret F., Fua P. (2017), Non-Markovian Globally Consistent Multi-Object Tracking, in
Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2563-2573.
Perdoch M., Vanhoey K., Van Gool L., Lettry L. (2017), Repeated Pattern Detection using CNN activations, in
Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV).
Kroeger T., Timofte R., Dai D., Van Gool L. (2016), Fast Optical Flow using Dense Inverse Search, in
Proceedings of the European Conference on Computer Vision (ECCV), 9908, 471-488, Springer.
Canévet O., Jose C., Fleuret F. (2016), Importance Sampling Tree for Large-scale Empirical Expectation, in
Proceedings of the International Conference on Machine Learning (ICML), 1454-1462.
Lettry L., Dragon R., Van Gool L. (2016), Markov Chain Monte Carlo Cascade for Camera Network Calibration based on Unconstrained Pedestrian Tracklets, in
Proceedings of the Asian Conference on Computer Vision (ACCV).
Baqué P., Bagautdinov T., Fleuret F., Fua P. (2016), Principled Parallel Mean-Field Inference for Discrete Random Fields, in
Proceedings of the IEEE international conference on Computer Vision and Pattern Recognition (CVPR).
Jose C., Fleuret F. (2016), Scalable Metric Learning via Weighted Approximate Rank Component Analysis, in
Proceedings of the European Conference on Computer Vision (ECCV), 875-890.
Khan M. E., Baqué P., Fleuret F., Fua P. (2015), Kullback-Leibler Proximal Variational Inference, in
Proceedings of the International Conference on Neural Information Processing Systems (NIPS).
People tracking is central to many applications, ranging from surveillance in complex urban environments to behavioral analysis in cluttered work spaces. However, in spite of years of sustained research, existing approaches can still only operate successfully in constrained environments, such as a sports arena, or for restricted subsets of human activities, such as walking along a city street. The goal of this project is to dramatically broaden the scope of current methods so that the resulting algorithms can be "taken into the wild", that is, applied in far more unconstrained and generic settings.

To this end, we will build on the multi-camera multi-target approach we have developed jointly over many years (Fleuret et al., 2008; Berclaz et al., 2011). Recently, we have focused on tracking basketball and soccer players (Ben Shitrit et al., 2011, 2012) and outperform state-of-the-art approaches. However, this is only true in controlled environments, and our current system cannot operate in cluttered real-world public spaces for the following three reasons. First, it requires multiple cameras, each carefully calibrated, and assumes a planar area of interest in which the only moving things are the people. Second, it relies on background subtraction, which is sensitive to global illumination changes, and on appearance models that have to be learned prior to using the system. Third, it cannot leverage sophisticated motion models, either at the individual target level or at the group level.

The objective of WildTrack is to eliminate these weaknesses, which will require the joint expertise of all three partners. The research will be organized around a joint benchmarking platform and will be decomposed into the following three sub-projects.

Sub-project 1 - Environment Modeling and Camera Calibration: Our existing tracking system relies on camera calibration, a planar ground, and entrances and exits restricted to the edges of the area of interest. It also ignores potential occluders, such as pillars that limit the fields of view of some cameras. This sub-project will rely on Structure-from-Motion (SfM) techniques combined with object class detections and tracking results to produce a more refined 3D model of the environment. This includes relaxing the planarity restriction on the ground geometry and obtaining knowledge about typical trajectories, as well as about probable sources and sinks for the moving objects. We will consider both cases where traditional uncalibrated SfM can be applied (sufficiently many cameras with enough field-of-view overlap) and cases where it cannot, in which information about object classes and probable trajectories is all the more important to compensate for the failure of standard SfM.

Sub-project 2 - Large-Scale Learning for Detection and Recognition: Our current implementation relies on background subtraction to detect humans and on crude color-based appearance models to disambiguate difficult tracking situations, neither of which is robust to changes in imaging conditions. This sub-project will tackle this weakness by learning from large training sets. We will first collect videos with multiple cameras and keep trajectories for which we have high prediction confidence. Data gathered along these trajectories will be used to train predictors to detect moving objects visible in a limited number of views, potentially corrupted by noise. It will also enable transfer learning, making it possible to learn a person's appearance from a handful of exemplar images.

Sub-project 3 - Convex High-dimensional Tracking: At present, we characterize people solely by their 2D ground positions, thus ignoring their 3D poses and their interactions with each other and with inanimate objects. This is sufficient when tracking pedestrians whose range of motion is small, but it is limiting when dealing with more complex behaviors, such as those of people sitting, standing, or reclining in the course of their daily lives. Removing these limitations will require performing our multi-target tracking in much higher-dimensional state spaces than the ones we have worked with so far, and will be the focus of this sub-project.

These objectives require in-depth expertise in many core areas of Computer Vision and Machine Learning, which the three research groups involved in this project possess collectively. As a result, the project will advance the state of the art and yield a people-tracking approach that can truly be deployed in the wild. Two simplified sketches below illustrate the kind of ground-plane parameterization and frame-to-frame association described above.
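The first sketch concerns the planar-ground, calibrated-camera setting that the current system assumes. It is purely illustrative and not the project's code: candidate ground-plane cells are projected into each view through per-camera ground-to-image homographies and scored against person detections. The function names, homographies, grid cells, detection coordinates, and the gating radius are all invented for this example.

```python
"""
Illustrative sketch (not the project's actual pipeline): scoring ground-plane
occupancy from several calibrated views. Each camera is modelled by a 3x3
homography H mapping ground coordinates (x, y, 1) to image pixels; detections
are given as pixel centres. All values below are toy data.
"""
import numpy as np


def project_ground_point(H, xy):
    """Map a ground-plane point (x, y) into image pixel coordinates via H."""
    p = H @ np.array([xy[0], xy[1], 1.0])
    return p[:2] / p[2]


def occupancy_scores(grid_xy, homographies, detections, radius=40.0):
    """
    For every candidate ground cell, count in how many views its projection
    falls close to a person detection.
    grid_xy: (N, 2) ground positions; homographies: list of 3x3 arrays;
    detections: per-camera list of (u, v) detection centres.
    """
    scores = np.zeros(len(grid_xy))
    for H, dets in zip(homographies, detections):
        if len(dets) == 0:
            continue
        dets = np.asarray(dets, dtype=float)
        for i, xy in enumerate(grid_xy):
            uv = project_ground_point(H, xy)
            # The cell is "supported" in this view if some detection is nearby.
            if np.min(np.linalg.norm(dets - uv, axis=1)) < radius:
                scores[i] += 1.0
    return scores


if __name__ == "__main__":
    # Two toy cameras: translation-only homographies, for illustration only.
    H1 = np.array([[1.0, 0.0, 100.0], [0.0, 1.0, 50.0], [0.0, 0.0, 1.0]])
    H2 = np.array([[1.0, 0.0, -20.0], [0.0, 1.0, 10.0], [0.0, 0.0, 1.0]])
    grid = np.array([[0.0, 0.0], [200.0, 100.0]])    # candidate ground cells
    dets1 = [(100.0, 50.0)]                          # camera 1 sees cell 0
    dets2 = [(-20.0, 10.0), (180.0, 110.0)]          # camera 2 sees both cells
    print(occupancy_scores(grid, [H1, H2], [dets1, dets2]))  # -> [2. 1.]
```

Substituting real calibration data and actual detector output for the toy values would yield a crude occupancy map of the kind the planar-ground assumption makes possible; the project aims precisely at going beyond this restricted setting.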
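The second sketch makes the data-association step concrete. It is a toy stand-in, not the project's tracker: per-frame ground-plane detections are linked into tracklets with a greedy frame-to-frame assignment using the Hungarian algorithm (SciPy's linear_sum_assignment), with a simple distance gate. The gating threshold and the toy detections are assumptions made for illustration only.

```python
"""
Illustrative sketch (not the project's tracker): greedy frame-to-frame linking
of 2D ground-plane detections into tracklets via the Hungarian algorithm.
"""
import numpy as np
from scipy.optimize import linear_sum_assignment


def link_frames(tracks, detections, max_dist=1.0):
    """
    Extend each track (a list of 2D points) with at most one detection from
    the current frame; unmatched detections start new tracks.
    """
    if tracks and len(detections):
        last = np.array([t[-1] for t in tracks])        # (T, 2) last positions
        dets = np.asarray(detections, dtype=float)      # (D, 2) new detections
        cost = np.linalg.norm(last[:, None, :] - dets[None, :, :], axis=2)
        rows, cols = linear_sum_assignment(cost)
        used = set()
        for r, c in zip(rows, cols):
            if cost[r, c] <= max_dist:                  # gate implausible jumps
                tracks[r].append(tuple(dets[c]))
                used.add(c)
        for c in range(len(dets)):
            if c not in used:
                tracks.append([tuple(dets[c])])
    else:
        tracks.extend([tuple(d)] for d in detections)
    return tracks


if __name__ == "__main__":
    frames = [
        [(0.0, 0.0), (5.0, 5.0)],
        [(0.3, 0.1), (5.2, 4.9)],
        [(0.6, 0.2), (5.4, 4.8), (9.0, 9.0)],   # a new person appears
    ]
    tracks = []
    for dets in frames:
        tracks = link_frames(tracks, dets)
    for i, t in enumerate(tracks):
        print(f"track {i}: {t}")
```

In practice the project's formulations reason jointly over longer temporal horizons and multiple cameras; this per-frame matching is only meant to illustrate the association problem that the higher-dimensional tracking of Sub-project 3 generalizes.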