Visual Reasoning on Complex Events in Soccer Videos Using Answer Set Programming
Bozzato, Loris;
2019-01-01
Abstract
In computer vision, most traditional action recognition techniques assign a single label to a video after analyzing it in its entirety. We believe that understanding the visual world is not limited to recognizing a specific action class or individual object instances, but also extends to how those objects interact in the scene, which implies recognizing the events happening in it. In this paper we present an approach for identifying complex events in videos, starting from the detection of objects and simple events using a state-of-the-art object detector (YOLO). We provide a logic-based representation of events using a realization of the Event Calculus, which allows us to define complex events in terms of logical rules. The axioms of the calculus are encoded in a logic program under answer set semantics in order to reason and formulate queries over the extracted events. The applicability of the framework is demonstrated on the scenario of recognizing different kinds of kick events in soccer videos.