Project

Probabilistic Motifs for Video Action Recognition (PROMOVAR)

English title Probabilistic Motifs for Video Action Recognition (PROMOVAR)
Applicant Odobez Jean-Marc
Number 138107
Funding scheme Project funding (Div. I-III)
Research institution IDIAP Institut de Recherche
Institution of higher education Idiap Research Institute - IDIAP
Main discipline Information Technology
Start/End 01.02.2012 - 31.07.2013
Approved amount 128'775.00

Keywords (4)

Activity recognition; Probabilistic models; Temporal sequence processing; Topic models

Lay Summary (English)

Lead
Lay summary

Action recognition is key for many tasks such as automatic annotation of videos, improved human-computer interaction and guidance in monitoring public spaces. As the amount of available videos from different sources (from raw personal videos to more professional content) has dramatically increased in the last few years, new methodologies are needed to organize these datasets.


Most recent state-of-the-art techniques for action recognition in naturalistic and unconstrained video documents rely on Bag-of-Words (BoW) representations built from Spatio-Temporal Interest Point (STIP) descriptors collected over video segments (a toy sketch of this pipeline follows the list below). Such methods, however, often suffer from two severe and related drawbacks:

(i) the time information is discarded, although actions are often characterized by strong temporal components; alternatively, fixed temporal grid schemes are used, assuming that the video clip is already temporally segmented.

(ii) activities in the same video segment are mixed in the representation, which plagues recognition algorithms built on top of it.
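
As a toy illustration of the BoW pipeline above (a minimal Python sketch, assuming STIP descriptors have already been extracted; the array shapes and vocabulary size are illustrative, not the project's actual settings):

    import numpy as np
    from sklearn.cluster import KMeans

    # Each video yields a set of STIP descriptors (e.g. HOG/HOF),
    # simulated here as random vectors for illustration.
    rng = np.random.default_rng(0)
    train_descriptors = rng.normal(size=(5000, 162))  # 162-d is a typical HOG/HOF size

    # 1. Learn a visual vocabulary by clustering the training descriptors.
    vocab_size = 100
    kmeans = KMeans(n_clusters=vocab_size, n_init=10, random_state=0)
    kmeans.fit(train_descriptors)

    # 2. Represent one video segment as a histogram of visual-word counts.
    def bow_histogram(descriptors):
        words = kmeans.predict(descriptors)
        hist = np.bincount(words, minlength=vocab_size).astype(float)
        return hist / max(hist.sum(), 1.0)  # L1-normalize

    segment = rng.normal(size=(300, 162))  # descriptors from one video segment
    h = bow_histogram(segment)
    # Note: the histogram discards WHEN each word occurred -- drawback (i) above.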

To address these issues, we will investigate novel techniques relying on principled probabilistic techniques (topic models) and symbolic pattern mining to capture the information lying in the temporal relationships between recognized "action" units. To this end, we will extend our previous work on the automatic extraction of temporal motifs from word × time documents, which not only captures the co-occurrence between words, but also the order in which they occur, and can handle interleaved activities. The investigated techniques will be organized around three main axes.
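
To make the motif idea concrete, the following toy NumPy sketch (illustrative sizes and values, not the actual model) shows the generative view behind such approaches: a word × time document is reconstructed as a superposition of temporal motifs, each a word × relative-time pattern shifted to its start times, so that word order is preserved and interleaved occurrences simply add up:

    import numpy as np

    V, T, L = 4, 20, 3          # vocabulary size, document length, motif duration
    # A motif is a distribution over (word, relative time): shape (V, L).
    motif = np.zeros((V, L))
    motif[0, 0] = motif[1, 1] = motif[2, 2] = 1.0  # word 0, then 1, then 2
    motif /= motif.sum()

    # Occurrences: the motif starts at t=2 and t=5 (overlapping in time).
    starts = np.zeros(T)
    starts[2] = starts[5] = 1.0

    # Generative reconstruction of the word x time document:
    # doc[w, t] = sum over start times ts of starts[ts] * motif[w, t - ts]
    doc = np.zeros((V, T))
    for ts in range(T):
        if starts[ts] > 0:
            end = min(T, ts + L)
            doc[:, ts:end] += starts[ts] * motif[:, :end - ts]

    print(np.argwhere(doc > 0))  # word order is preserved at each occurrence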

A. Motif representation will investigate models with a hierarchical structure relying on recurring sequences of lower-level temporal motifs, and will improve the robustness of the motif representation to handle the usually small amount of annotated data available in supervised action classification.
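
As a hypothetical illustration of this axis (a simple stand-in, not the model itself): once occurrences of low-level motifs have been detected, recurring ordered sequences of their labels can be mined as candidate higher-level motifs, e.g. by plain n-gram counting:

    from collections import Counter

    # Hypothetical output of a low-level motif detector: a time-ordered
    # sequence of motif labels for one video.
    detections = ["reach", "grasp", "lift", "reach", "grasp", "lift", "idle",
                  "reach", "grasp", "lift"]

    def frequent_ngrams(seq, n, min_count=2):
        """Count ordered n-grams of motif labels; frequent ones are
        candidate higher-level (hierarchical) motifs."""
        grams = Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
        return {g: c for g, c in grams.items() if c >= min_count}

    print(frequent_ngrams(detections, 3))
    # {('reach', 'grasp', 'lift'): 3} -- a recurring sequence of low-level motifs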

B. Action recognition in unconstrained video documents will investigate the use of motifs extracted from STIP BoW representations, leveraging our modeling to identify meaningful and interleaved temporal patterns with longer temporal support than that of STIPs, and addressing the corresponding challenges (generative vs. discriminative modeling, vocabulary size, complexity).
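
On the discriminative side of this axis, a hedged sketch (synthetic data; the feature construction is an assumption, not taken from the project): once each video segment is summarized by how strongly each motif occurs in it, a standard discriminative classifier can be trained on these motif-activation vectors:

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n_videos, n_motifs = 200, 15

    # Hypothetical motif-activation features: total weight of each motif's
    # occurrences in a video (e.g. output of a generative motif model).
    X = rng.random((n_videos, n_motifs))
    y = rng.integers(0, 2, size=n_videos)  # synthetic action labels
    X[y == 1, 3] += 1.0                    # make motif 3 discriminative

    clf = SVC(kernel="rbf", C=1.0)
    scores = cross_val_score(clf, X, y, cv=5)
    print(scores.mean())  # well above chance on this toy data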

C. Joint temporal and spatial action learning and recognition will address the learning of action motifs while jointly inferring where these motifs occur in the images, in addition to when they occur as currently performed by our model, allowing us to address weakly supervised action recognition tasks.
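
One simple way to picture this axis (purely illustrative; not the project's actual formulation): if each quantized STIP is indexed by both its visual word and a coarse spatial cell, the same word × time machinery can infer where a motif occurs in addition to when:

    import numpy as np

    n_words, grid_w, grid_h, T = 50, 4, 3, 100

    def spatial_word(word, x, y, frame_w=320, frame_h=240):
        """Map a (visual word, image position) pair to one index in an
        augmented vocabulary of size n_words * grid_w * grid_h."""
        cx = min(int(x * grid_w / frame_w), grid_w - 1)
        cy = min(int(y * grid_h / frame_h), grid_h - 1)
        return (word * grid_h + cy) * grid_w + cx

    # Build a word x time count matrix over the augmented vocabulary.
    doc = np.zeros((n_words * grid_w * grid_h, T))
    for word, x, y, t in [(7, 60, 100, 10), (7, 300, 20, 10)]:  # toy STIPs
        doc[spatial_word(word, x, y), t] += 1

    # Motifs learned on such documents carry a spatial footprint, so their
    # occurrences localize the action in the image as well as in time.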

Evaluation on standard human action, movie, and sports databases from the literature will be conducted to assess the performance of our algorithms.

Last update: 21.02.2013

Responsible applicant and co-applicants

Employees

Publications

Tavenard Romain, Emonet Rémi, Odobez Jean-Marc (2013), "Time-sensitive topic models for action recognition in videos", in IEEE International Conference on Image Processing, Melbourne.

Tavenard R., Emonet R., Odobez J.-M. (2013), "Investigating time-sensitive topic model approaches for action recognition".

Aubert A., Tavenard R., Emonet R., Malinowski S., Guyet T., Quiniou R., Odobez J.-M., Gascuel-Odoux C. (2013), "Discovering temporal patterns in water quality time series, focusing on floods with the LDA method", in European Geosciences Union (EGU) General Assembly, Vienna.

Collaboration

Institut National de Recherche en Agronomie (INRA), France (Europe)
Types of collaboration:
- in-depth/constructive exchanges on approaches, methods or results
- Publication

Scientific events

Active participation

European Geosciences Union General Assembly, 07.04.2013, Vienna


Abstract

Action recognition is key for many tasks such as automatic annotation of videos, improved human-computer interaction and guidance in monitoring public spaces. As the amount of available videos from different sources (from raw personal videos to more professional content) has dramatically increased in the last few years, new methodologies are needed to organize this data.

Most recent state-of-the-art techniques for action recognition in naturalistic and unconstrained video documents such as movies or broadcast data rely on Bag-of-Words representations built from Spatio-Temporal Interest Point descriptors collected over long video segments. Such methods, however, often suffer from two severe and related drawbacks:

1. the time information is discarded, although actions are often characterized by strong temporal components;

2. activities in the same video segments are mixed in the representation, which plagues recognition algorithms based on it.

To address these issues, we will investigate novel techniques relying on principled probabilistic techniques (the so-called topic models) and symbolic pattern mining to capture the information lying in the temporal relationships between recognized "action" units, in order to enhance the performance of action recognition algorithms. To this end, we will rely on and greatly extend our previous work on the automatic extraction of temporal motifs from word × time documents as a basis to investigate video-based action recognition. This method, applied to large amounts of surveillance data, not only captures the co-occurrence between words, but also the order in which they occur, and can handle interleaved activities. The investigated techniques will be organized around three main axes.

Motif representation will address the development of models with a hierarchical structure allowing the identification of recurring sequences of lower-level temporal motifs, and improving the robustness of the motif representation to handle the usually small amount of data available in supervised action classification.

Action recognition in unconstrained video documents will investigate the recognition of actions in videos using motifs extracted from spatio-temporal interest point (STIP) descriptor BoW representations, leveraging our modeling to identify meaningful and interleaved temporal patterns with longer temporal support than that of STIPs, and addressing the corresponding challenges (generative vs. discriminative modeling, vocabulary size, complexity).

Joint temporal and spatial action learning and recognition will address the learning of action motifs while jointly inferring where these motifs occur in the images, in addition to when they occur as currently performed by our model, allowing us to address weakly supervised action recognition tasks.

Evaluation on standard human action, movie, and sports databases from the literature will be conducted to assess the performance of our algorithms.