Deep learning; Computer vision; Machine learning; Pattern recognition
Katharopoulos Angelos, Vyas Apoorv, Pappas Nikolaos, Fleuret François (2020), Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention, in Proceedings of the 37th International Conference on Machine Learning (ICML), PMLR 119.
Srinivas Suraj, Fleuret François (2019), Full-Gradient Representation for Neural Network Visualization, in Advances in Neural Information Processing Systems 32, 4124-4133, Curran Associates, Inc.
Katharopoulos Angelos, Fleuret François (2019), Processing Megapixel Images with Deep Attention-Sampling Models, in Proceedings of the 36th International Conference on Machine Learning, PMLR 97, 3282-3291.
Srinivas Suraj, Fleuret François (2018), Knowledge Transfer with Jacobian Matching, in Proceedings of the 35th International Conference on Machine Learning, PMLR 80, 4723-4731.
Katharopoulos Angelos, Fleuret François (2018), Not All Samples Are Created Equal: Deep Learning with Importance Sampling, in Proceedings of the 35th International Conference on Machine Learning, PMLR 80, 2525-2534.
This project investigates two key issues in the training of large neural networks on large-scale training sets. All the developed techniques will be benchmarked on standard image-classification and object-detection datasets, on pedestrian detection and re-identification, and on controlled datasets produced over the course of the project.

We structure this proposal in two sub-projects. The first will investigate a general strategy based on importance sampling to cope with very large training sets. While stochastic gradient descent is the standard approach to learning, very little effort has been invested in the choice of the sampling strategy: virtually every state-of-the-art method visits the samples uniformly during learning, without prioritizing them according to computation done previously. Our objective is to develop a general framework for applying importance sampling to gradient descent and other optimization schemes, so that they concentrate the computational effort on problematic and informative samples (see the sketch at the end of this summary).

The second sub-project will investigate a central, and currently poorly addressed, practical issue in the use of large neural networks: the meta-optimization of the network structure itself. We are interested in approaches that avoid the standard and computationally intensive combination of meta-parameter grid search and hand-tuning.
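The project's own algorithms are described in the publications listed above (e.g. Katharopoulos and Fleuret, 2018). Purely as an illustration of the general idea, the following is a minimal NumPy sketch of importance-sampled SGD that draws mini-batches with probability proportional to each sample's last-seen loss and reweights gradients by 1/(N p_i) so the estimator stays unbiased; the callables `loss_fn` and `grad_fn` and all names below are hypothetical, and losses are assumed non-negative.

```python
import numpy as np

def importance_sgd_epoch(params, loss_fn, grad_fn, X, y,
                         lr=0.1, batch_size=32, eps=1e-8):
    """One epoch of SGD with loss-proportional importance sampling.

    loss_fn(params, x, y) -> scalar per-sample loss       (hypothetical)
    grad_fn(params, x, y) -> gradient w.r.t. params       (hypothetical)
    """
    N = len(X)
    # Initial importance scores: one full pass of per-sample losses.
    scores = np.array([loss_fn(params, X[i], y[i]) for i in range(N)])
    for _ in range(N // batch_size):
        # Draw indices with probability proportional to their score.
        p = (scores + eps) / (scores + eps).sum()
        idx = np.random.choice(N, size=batch_size, p=p)
        # Reweight each gradient by 1/(N p_i) so the batch average
        # remains an unbiased estimate of the mean gradient over all N.
        g = sum((1.0 / (N * p[i])) * grad_fn(params, X[i], y[i]) for i in idx)
        params = params - lr * g / batch_size
        # Refresh the scores of the samples just visited.
        for i in idx:
            scores[i] = loss_fn(params, X[i], y[i])
    return params
```

The reweighting step is what distinguishes this from simply oversampling hard examples: without the 1/(N p_i) factor, the sampling distribution would bias the gradient estimate toward high-loss samples.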