Yan Yan, Multi-task and Multi-modal Learning for Image and Video Scene Understanding

Yan Yan, Illinois Institute of Technology, Computer Science.
November 23, 2021. 12:45 – 1:45 pm.
John T. Rettaliata Engineering Center, Room 106.

Image and video scene understanding aims to acquire information about the what, when, where, who, how, and why of a situation, capturing the attributes and structure of a scene in images and videos. It is a challenging task in computer vision and an important step toward realizing artificial intelligence. Multi-task learning, an important branch of machine learning, has developed rapidly over the past decade. Multi-task learning methods simultaneously learn classification or regression models for a set of related tasks, which typically yields better models than a learner that ignores task relationships. In this talk, we will investigate a multi-task learning framework for image/video scene understanding across low-level to high-level tasks, including human pose estimation, action/activity recognition, semantic segmentation, and event detection. Moreover, extracting useful information from images/videos requires robustness, which depends on different sensors. Humans explore the world through multiple perceptions and multi-modal signals such as audio, vision, and language. We will therefore also investigate multi-modal learning approaches to improve image/video scene understanding.
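The core idea of multi-task learning described above, learning related tasks jointly through a shared representation rather than training each task in isolation, can be illustrated with a minimal sketch. The example below is not from the talk; it is a toy NumPy illustration with illustrative names, where two related regression tasks share one linear feature extractor and each keeps its own task head, trained on a joint objective.

```python
import numpy as np

# Toy multi-task learning sketch (illustrative, not the speaker's method):
# two related regression tasks share a linear feature extractor W,
# and each task has its own head (v1, v2). Training minimizes the
# summed mean-squared error of both tasks.

rng = np.random.default_rng(0)

n, d, h = 200, 8, 4                     # samples, input dim, shared feature dim
X = rng.normal(size=(n, d))

# Two related tasks: targets come from overlapping input directions.
w_true = rng.normal(size=d)
y1 = X @ w_true + 0.1 * rng.normal(size=n)                      # task 1
y2 = X @ (w_true + 0.2 * rng.normal(size=d)) \
     + 0.1 * rng.normal(size=n)                                 # task 2

# Parameters: shared projection W, per-task heads v1 and v2.
W = 0.1 * rng.normal(size=(d, h))
v1 = np.zeros(h)
v2 = np.zeros(h)

lr = 0.01
for step in range(500):
    Z = X @ Z_W if False else X @ W     # shared features used by both tasks
    e1 = Z @ v1 - y1                    # per-task residuals
    e2 = Z @ v2 - y2
    # Gradients of the joint objective (sum of per-task MSEs).
    g_v1 = Z.T @ e1 / n
    g_v2 = Z.T @ e2 / n
    g_W = X.T @ (np.outer(e1, v1) + np.outer(e2, v2)) / n
    v1 -= lr * g_v1
    v2 -= lr * g_v2
    W -= lr * g_W                       # shared parameters get both tasks' signal

mse1 = np.mean((X @ W @ v1 - y1) ** 2)
mse2 = np.mean((X @ W @ v2 - y2) ** 2)
```

Because the two tasks are related, the gradient of the shared extractor `W` accumulates signal from both, which is the mechanism by which multi-task learning can outperform independent per-task learners.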
