1st SEMINAR – PROF. NICU SEBE

The ELLIS Unit Barcelona organizes the first seminar in its seminar series!

The series aims to bring leading ELLIS researchers in machine learning and AI to Catalonia to share their latest findings and to give the audience the chance to connect and interact with the invited speakers.

Our first seminar will feature a talk by Prof. Nicu Sebe, who will speak on “Cross-modal understanding and generation of multimodal content”.


DETAILS

📆 Wednesday 18th September, 2024

🕙️ 17:00 h – 19:00 h

📍 Ateneu Barcelonès (Sala Oriol Bohigas), Barcelona

SEMINAR SCHEDULE

Prof. Nicu Sebe
Univ. of Trento; Co-director of ELLIS Unit Trento; Co-director of the Multimodal Learning Systems ELLIS Program

“Cross-modal understanding and generation of multimodal content”

Nicu Sebe is a professor at the University of Trento, Italy, where he leads research in multimedia information retrieval and human-computer interaction in computer vision applications. He received his PhD from the University of Leiden, The Netherlands, and previously held positions at the University of Amsterdam, The Netherlands, and the University of Illinois at Urbana-Champaign, USA. He has been involved in organizing major conferences and workshops addressing the computer vision and human-centered aspects of multimedia information retrieval, serving among others as General Co-Chair of the IEEE Automatic Face and Gesture Recognition Conference (FG 2008), the ACM International Conference on Multimedia Retrieval (ICMR 2017) and ACM Multimedia 2013. He was a program chair of ACM Multimedia 2007 and 2011, ECCV 2016, ICCV 2017 and ICPR 2020, and a general chair of ACM Multimedia 2022. He is a fellow of ELLIS and IAPR, and a senior member of ACM and IEEE.

Video generation consists of generating a video sequence in which an object in a source image is animated according to some external information (a conditioning label, a driving video, a piece of text). In this talk I will present some of our recent achievements in generating videos without using any annotation or prior information about the specific object to animate. Once trained on a set of videos depicting objects of the same category (e.g. faces, human bodies), our method can be applied to any object of this class.

Building on this, I will present our framework for training game-engine-like neural models solely from monocular annotated videos. The result, a Learnable Game Engine (LGE), maintains the states of the scene and of the objects and agents in it, and enables rendering the environment from a controllable viewpoint. Like a game engine, it models the logic of the game and the underlying rules of physics, making it possible for a user to play the game by specifying both high- and low-level action sequences. Our LGE can also unlock the director's mode, where the game is played by plotting behind the scenes, specifying high-level actions and goals for the agents in the form of language and desired states. This requires learning "game AI", encapsulated by our animation model, to navigate the scene using high-level constraints, play against an adversary and devise a strategy to win a point.