In this proposal, Applied Research LLC (ARLLC) and Arizona State University (ASU) propose to investigate the problem of recognizing long, complex events with varying action rhythms, a practical challenge that has not been considered in the literature. Our work is inspired in part by how humans identify events with varying rhythms: by quickly picking out the frames that contribute most to a specific event. We propose a two-stage end-to-end framework, in which the first stage selects the most significant frames and the second stage recognizes the event from the selected frames. Our model requires only event-level labels during training, making it practical when sub-activity labels are missing or difficult to obtain. Extensive experiments show that our model achieves significant improvements in event recognition from long videos and maintains high accuracy even when the test videos exhibit severe rhythm changes. This demonstrates the potential of our method for real-world video-based applications, where test and training videos can differ drastically in the rhythms of their sub-activities.
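For concreteness, the following is a minimal sketch of how such a two-stage pipeline could be wired in PyTorch. It is an illustrative assumption, not the actual proposed architecture: the module names, feature dimensions, and the soft top-k frame selection are all hypothetical choices made for the example.

import torch
import torch.nn as nn

class TwoStageEventRecognizer(nn.Module):
    """Illustrative sketch: stage 1 scores per-frame features and
    selects the most significant frames; stage 2 recognizes the
    event from the selected frames with a recurrent network."""

    def __init__(self, feat_dim=512, hidden_dim=256, num_events=10, k=16):
        super().__init__()
        self.k = k  # number of frames kept by stage 1 (assumed k <= num_frames)
        # Stage 1: small MLP that assigns a significance score to each frame.
        self.scorer = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1))
        # Stage 2: LSTM event recognizer over the selected frames.
        self.rnn = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_events)

    def forward(self, frame_feats):
        # frame_feats: (batch, num_frames, feat_dim) precomputed CNN features.
        scores = self.scorer(frame_feats).squeeze(-1)   # (batch, num_frames)
        weights = torch.softmax(scores, dim=1)
        topk = torch.topk(weights, self.k, dim=1)       # pick top-k frames
        idx = topk.indices.sort(dim=1).values           # restore temporal order
        sel = torch.gather(
            frame_feats, 1,
            idx.unsqueeze(-1).expand(-1, -1, frame_feats.size(-1)))
        w = torch.gather(weights, 1, idx).unsqueeze(-1)
        # Multiplying by the selection weights lets the event-level loss
        # backpropagate into the frame scorer, so only event labels are
        # needed for training (the hard indices themselves carry no gradient).
        _, (h, _) = self.rnn(sel * w)
        return self.classifier(h[-1])                   # event logits

Under this sketch, training with a standard cross-entropy loss on the event label alone updates both stages jointly, which matches the claim that no sub-activity annotations are required.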
Benefit: Our proposed system can be used for surveillance and situational awareness monitoring in diverse environments, including Army operations in desert environments where sandstorms occur frequently. The system consists of hardware (imager and processor) and software. We expect a unit price of $25k per device (which may change depending on the market cost of the imager). With estimated sales of more than 200 units over the next decade, the military market potential exceeds $5 million.
Keywords: deep learning, recurrent neural network, varying rhythm, event recognition, video anomaly detection