2nd International Ego4D Workshop @ ECCV 2022

For details about the Ego4D project and data, please refer to the dataset's webpage.

Challenge Winners and Validation Reports

Ego4D Audio-Visual Diarization Challenge

Place Team Validation Report Code
1st Place Intel Labs
  • Kyle Min, Intel Labs
Validation Report Code

Audio-Only Diarization Challenge

Place Team Validation Report Code
1st Place pyannote
  • Hervé Bredin, IRIT, Université de Toulouse, CNRS
Validation Report Code
2nd Place diart
  • Juan M. Coria, Université Paris-Saclay, CNRS, LISN
  • Sahar Ghannay, Université Paris-Saclay, CNRS, LISN
Validation Report Code

AV Transcription Challenge

Place Team Validation Report Code
1st Place AVATAR-Google
  • Paul Hongsuck Seo, Google Research
  • Arsha Nagrani, Google Research
  • Cordelia Schmid, Google Research
Validation Report Code

Visual Queries 2D Localization

Place Team Validation Report Code
1st Place Thereisnospoon
  • Mengmeng Xu, KAUST
  • Juan-Manuel Perez-Rua, Facebook
  • Cheng-Yang Fu, Facebook
  • Yanghao Li, Facebook
  • Bernard Ghanem, KAUST
  • Tao Xiang, Facebook
Validation Report Code

Visual Queries 3D Localization

Natural Language Queries

Place Team Validation Report Code
1st Place IVUL
  • Jinjie Mai, KAUST
  • Chen Zhao, KAUST
  • Abdullah Hamdi, KAUST
  • Silvio Giancola, KAUST
  • Bernard Ghanem, KAUST
Validation Report Code

Moments Queries

Place Team Validation Report Code
1st Place VideoIntern
  • Guo Chen, Shanghai AI Laboratory, Nanjing University
  • Jiahao Wang, Nanjing University
  • Yi Liu, SIAT, Shanghai AI Laboratory
  • Yifei Huang, University of Tokyo
  • Jiashuo Yu, Shanghai AI Laboratory
  • Yi Wang, Shanghai AI Laboratory
  • Yali Wang, SIAT, Shanghai AI Laboratory
  • Tong Lu, Nanjing University
  • Limin Wang, Nanjing University, Shanghai AI Laboratory
  • Yu Qiao, Shanghai AI Laboratory
Validation Report Code
2nd Place University of Wisconsin-Madison
  • Sicheng Mo, University of Wisconsin-Madison
  • Fangzhou Mu, University of Wisconsin-Madison
  • Yin Li, University of Wisconsin-Madison
Validation Report Forthcoming

Looking at Me

Place Team Validation Report Code
1st Place VideoIntern
  • Guo Chen, Shanghai AI Laboratory, Nanjing University
  • Jiahao Wang, Nanjing University
  • Yi Liu, SIAT, Shanghai AI Laboratory
  • Yifei Huang, University of Tokyo
  • Jiashuo Yu, Shanghai AI Laboratory
  • Yi Wang, Shanghai AI Laboratory
  • Yali Wang, SIAT, Shanghai AI Laboratory
  • Tong Lu, Nanjing University
  • Limin Wang, Nanjing University, Shanghai AI Laboratory
  • Yu Qiao, Shanghai AI Laboratory
Validation Report Code
2nd Place University of Wisconsin-Madison
  • Fangzhou Mu, University of Wisconsin-Madison
  • Sicheng Mo, University of Wisconsin-Madison
  • Gillian Wang, University of Wisconsin-Madison
  • Yin Li, University of Wisconsin-Madison
Validation Report Code

Talking to Me

Place Team Validation Report Code
1st Place PKU-WICT-MIPL
  • Xiyu Wei, Wangxuan Institute of Computer Technology, Peking University
  • Dejie Yang, Wangxuan Institute of Computer Technology, Peking University
  • Minghang Zheng, Wangxuan Institute of Computer Technology, Peking University
  • Qingchao Chen, National Institute of Health Data Science, Peking University
  • Yuxin Peng, Wangxuan Institute of Computer Technology, Peking University
  • Yang Liu, Wangxuan Institute of Computer Technology, Peking University; Beijing Institute for General Artificial Intelligence
Forthcoming Code
2nd Place KeioEgo
  • Haowen Hu, Graduate School of Science and Technology, Keio University
  • Ryo Hachiuma, Graduate School of Science and Technology, Keio University
  • Hideo Saito, Graduate School of Science and Technology, Keio University
  • Research code: https://github.com/Huhaowen0130/EgoFlow
Validation Report Forthcoming
Place Team Validation Report Code
1st Place University of Texas at Austin & Meta AI
  • Zihui Xue, The University of Texas at Austin and Meta AI
  • Yale Song, Meta AI
  • Lorenzo Torresani, Meta AI
  • Kristen Grauman, The University of Texas at Austin and Meta AI
Validation Report Forthcoming

Long Term Action Anticipation

Place Team Validation Report Code
1st Place Autonomous Systems
  • Esteve Valls Mascaro, Autonomous Systems, Technische Universität Wien (TU Wien)
  • Hyemin Ahn, Ulsan National Institute of Science and Technology (UNIST)
  • Dongheui Lee, Autonomous Systems, Technische Universität Wien (TU Wien) & Institute of Robotics and Mechatronics, German Aerospace Center (DLR)
Validation Report Forthcoming

Future Hand Prediction

Place Team Validation Report Code
1st Place VideoIntern
  • Guo Chen, Shanghai AI Laboratory, Nanjing University
  • Yizhuo Li, The University of Hong Kong, Shanghai AI Laboratory
  • Kunchang Li, SIAT, Shanghai AI Laboratory
  • Yinan He, Shanghai AI Laboratory
  • Bingkun Huang, Nanjing University, Shanghai AI Laboratory
  • Yifei Huang, University of Tokyo
  • Yi Wang, Shanghai AI Laboratory
  • Yali Wang, SIAT, Shanghai AI Laboratory
  • Tong Lu, Nanjing University
  • Limin Wang, Nanjing University, Shanghai AI Laboratory
  • Yu Qiao, Shanghai AI Laboratory
Validation Report Code
2nd Place HVRL
  • Masashi Hatano, Graduate School of Science and Technology, Keio University
  • Ryo Hachiuma, Graduate School of Science and Technology, Keio University
  • Hideo Saito, Graduate School of Science and Technology, Keio University
Forthcoming Forthcoming

Short Term Object Interaction Anticipation

Place Team Validation Report Code
1st Place VideoIntern
  • Sen Xing, Shanghai AI Laboratory, Tsinghua University
  • Guo Chen, Nanjing University, Shanghai AI Laboratory
  • Zhe Chen, Nanjing University, Shanghai AI Laboratory
  • Junting Pan, Chinese University of Hong Kong, Shanghai AI Laboratory
  • Yifei Huang, University of Tokyo
  • Yi Wang, Shanghai AI Laboratory
  • Yali Wang, SIAT, Shanghai AI Laboratory
  • Limin Wang, Nanjing University, Shanghai AI Laboratory
  • Yu Qiao, Shanghai AI Laboratory
Validation Report Code

State Change Object Detection

Place Team Validation Report Code
1st Place VideoIntern
  • Guo Chen, Shanghai AI Laboratory, Nanjing University
  • Zhe Chen, Nanjing University, Shanghai AI Laboratory
  • Yi Wang, Shanghai AI Laboratory
  • Wenhai Wang, Shanghai AI Laboratory
  • Yali Wang, SIAT, Shanghai AI Laboratory
  • Limin Wang, Nanjing University, Shanghai AI Laboratory
  • Yu Qiao, Shanghai AI Laboratory
Validation Report Code

Object State Change Classification

Place Team Validation Report Code
1st Place Red Panda@IMAGINE
  • Yin-Dong Zheng, Nanjing University
  • Guo Chen, Nanjing University, Shanghai AI Laboratory
  • Jiahao Wang, Nanjing University
  • Tong Lu, Nanjing University
  • Limin Wang, Nanjing University, Shanghai AI Laboratory
Validation Report Code
2nd Place EgoMotion-COMPASS
  • Jianchen Lei, Zhejiang University
  • Shuang Ma, Microsoft
  • Zhongjie Ba, Zhejiang University
  • Kui Ren, Zhejiang University
Validation Report Code

PNR Temporal Localization

Place Team Validation Report Code
1st Place Red Panda@IMAGINE
  • Yin-Dong Zheng, Nanjing University
  • Guo Chen, Nanjing University, Shanghai AI Laboratory
  • Jiahao Wang, Nanjing University
  • Tong Lu, Nanjing University
  • Limin Wang, Nanjing University, Shanghai AI Laboratory
Validation Report Code
2nd Place EgoMotion-COMPASS
  • Jianchen Lei, Zhejiang University
  • Shuang Ma, Microsoft
  • Zhongjie Ba, Zhejiang University
  • Kui Ren, Zhejiang University
Validation Report Code
3rd Place University of Texas at Austin & Meta AI
  • Zihui Xue, The University of Texas at Austin and Meta AI
  • Yale Song, Meta AI
  • Lorenzo Torresani, Meta AI
  • Kristen Grauman, The University of Texas at Austin and Meta AI
Validation Report Forthcoming

Call for Extended Abstracts

You are invited to submit extended abstracts to the second edition of the International Ego4D Workshop, which will be held alongside ECCV 2022 in Tel Aviv.

These abstracts represent existing or ongoing work and will not be published as part of any proceedings. We welcome all work within the egocentric domain; it is not necessary to use the Ego4D dataset in your work. We expect a submission may cover one or more of the following topics (this is a non-exhaustive list):

Format

Extended abstracts should be 2-4 pages, including figures and tables but excluding references. We invite submissions of ongoing or already published work, as well as reports on demonstrations and prototypes. The 2nd International Ego4D Workshop gives authors the opportunity to present their work to the egocentric community and to gather discussion and feedback. Accepted work will be presented as either an oral presentation (virtual or in-person) or a poster presentation. Reviewing will be single-blind, so there is no need to anonymize your work; otherwise, submissions should follow the format of ECCV submissions, for which information can be found here. Accepted abstracts will not be published as part of a proceedings, so they can be uploaded to arXiv etc., and links will be provided on the workshop's webpage. Submissions will be managed with the Ego4D@ECCV2022 CMT website.

Important Dates

Challenge Deadline 18 September 2022
Challenge Report Deadline 25 September 2022
Extended Abstract Deadline 30 September 2022
Notification to Authors 7 October 2022
Workshop Date 24 October 2022

Invited Speakers

We have several invited talks scheduled for the workshop.


Lihi Zelnik-Manor

ECE, Technion

Imagine being able to touch virtual objects, interact physically with computer games, or feel items that are located elsewhere on the globe. The applications of such haptic technology would be broad and diverse. Interestingly, while excellent visual and auditory feedback devices exist, cutaneous feedback devices are still in their infancy. In this talk I will present a brief introduction to the world of haptic feedback devices and the challenges it poses. Then I will present HUGO, a device designed in a human-centered process that triggers the mechanoreceptors in our skin, enabling people to experience the touch of digitized surfaces “in the wild”. This talk is likely to leave us with many open questions that require research to answer.


Michal Irani

Weizmann Institute of Science

Can we reconstruct natural images and videos that a person saw, directly from his/her fMRI brain recordings? This is a particularly difficult problem, given the very few “paired” training examples available (images/videos with their corresponding fMRI recordings). In this talk I will show how such image/video reconstruction can be performed, despite the few training examples, by exploiting self-supervised training on large amounts of “unpaired” data, i.e., images and videos without any fMRI recordings. I will further show how large-scale image classification (to more than 1000 classes!) can be performed on sparse fMRI data.


Abhinav Gupta

Carnegie Mellon University


Christoph Feichtenhofer

Meta AI

This talk discusses recent research on self-supervised learning from video. I first present a conceptually simple extension of Masked Autoencoders (MAE) to spatiotemporal representation learning from videos. We randomly mask out spacetime patches in videos and learn an autoencoder to reconstruct them in pixels. A high masking ratio leads to a large speedup, e.g., >4x in wall-clock time or even more. Then I will present Masked Feature Prediction (MaskFeat) for self-supervised pre-training of video models. Our approach first randomly masks out a portion of the input sequence and then predicts the features of the masked regions. We study five different types of features and find that Histograms of Oriented Gradients, a hand-crafted feature descriptor, works particularly well in terms of both performance and efficiency. Finally, I will talk about a simple extension of MAE to self-supervised representation learning from audio spectrograms. Following the Transformer encoder-decoder design in MAE, our Audio-MAE first encodes audio spectrogram patches with a high masking ratio, feeding only the non-masked tokens through encoder layers. We observe that masked pre-training can outperform supervised pre-training by large margins. We further report encouraging results of training on real-world, uncurated Instagram data. Our studies suggest that the general framework of masked autoencoding (BERT, MAE, etc.) can be a unified methodology for representation learning with minimal domain knowledge.

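To make the masking idea concrete, here is a minimal illustrative sketch in Python (not the MAE/MaskFeat code from the talk): it randomly masks spacetime patches of a video clip at a high masking ratio, so that only the visible patches would be fed to an encoder. The helper name mask_spacetime_patches, the 2x16x16 patch size, and the 90% ratio are assumptions chosen for the example.

# Minimal sketch (assumed shapes and helper name, not the talk's actual code):
# random masking of video spacetime patches at a high masking ratio.
import torch

def mask_spacetime_patches(video, patch_size=(2, 16, 16), mask_ratio=0.9):
    """Split a (C, T, H, W) clip into spacetime patches and keep a random subset."""
    C, T, H, W = video.shape
    pt, ph, pw = patch_size
    # Tokenize the clip into non-overlapping spacetime patches.
    patches = (
        video.reshape(C, T // pt, pt, H // ph, ph, W // pw, pw)
        .permute(1, 3, 5, 0, 2, 4, 6)
        .reshape(-1, C * pt * ph * pw)
    )
    num_patches = patches.shape[0]
    num_keep = int(num_patches * (1 - mask_ratio))
    # A random permutation decides which patches remain visible.
    perm = torch.randperm(num_patches)
    keep_idx = perm[:num_keep]
    mask = torch.ones(num_patches, dtype=torch.bool)
    mask[keep_idx] = False
    # Only the ~10% visible tokens are fed to the encoder; a decoder would later
    # reconstruct the masked patches in pixel space. Processing so few tokens is
    # the source of the reported wall-clock speedup.
    return patches[keep_idx], mask

# Example: a 16-frame 224x224 RGB clip.
clip = torch.randn(3, 16, 224, 224)
visible, mask = mask_spacetime_patches(clip)
print(visible.shape, int(mask.sum()), "of", mask.numel(), "patches masked")

MaskFeat uses the same kind of masking but predicts features (e.g., HOG) of the masked regions rather than raw pixels, and Audio-MAE applies the recipe to audio spectrogram patches.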

João Carreira

Google DeepMind

Schedule

All times are local to Tel Aviv (GMT+3).

Start Time End Time Title Speaker
9:00 AM 9:15 AM Welcome Remarks Giovanni Maria Farinella
9:15 AM 9:45 AM Invited Talk -- Digitizing Touch Lihi Zelnik-Manor
9:45 AM 10:00 AM Ego4D Challenge Results: Insights from the Winning Approaches across Five Benchmarks Rohit Girdhar
10:00 AM 10:30 AM Break
10:30 AM 11:00 AM Invited Talk -- Masked Video Representation Learning Christoph Feichtenhofer
11:00 AM 11:15 AM Episodic Memory for Egocentric Perception: Sharing the Leading Approaches to Moments, Visual and Natural Language Queries Satwik Kottur
11:15 AM 11:30 AM Hand + Object Interactions: Examining Leading Approaches to Temporal Localization, Active Object Detection and State-Change Classification Siddhant Bansal
11:30 AM 11:45 AM Forecasting Activities in First-Person Videos: What Works for Action, Object Interaction, and Hand Position Anticipation Antonino Furnari
11:45 AM 12:00 PM Understanding Interaction: Sharing Insights from Social Understanding and AV Diarization in the Context of Egocentric Videos Mike Z. Shou
12:00 PM 12:15 PM Winners felicitation/certificate Andrew Westbury
12:15 PM 12:20 PM Spotlight Talk -- SOS! Self-supervised Learning Over Sets Of Handled Objects In Egocentric Action Recognition Victor Escorcia
12:20 PM 12:25 PM Spotlight Talk -- UnrealEgo: A New Dataset for Robust Egocentric 3D Human Motion Capture Hiroyasu Akada
12:25 PM 12:30 PM Spotlight Talk -- EgoBody: Human Body Shape and Motion of Interacting People from Head-Mounted Devices Siwei Zhang
12:30 PM 12:35 PM Spotlight Talk -- OWL (Observe, Watch, Listen): Audiovisual Temporal Context for Localizing Actions in Egocentric Videos Victor Escorcia
12:35 PM 12:40 PM Spotlight Talk -- Students taught by multimodal teachers are superior action recognizers (VIRTUAL) Gorjan Radevski
12:40 PM 12:45 PM Spotlight Talk -- Event-based Stereo Depth Estimation from Ego-motion using Ray Density Fusion (VIRTUAL) Suman Ghosh
12:45 PM 12:50 PM Spotlight Talk -- Hand and Object Detection in Egocentric Videos with Color Local Features and Random Forest (VIRTUAL) María Elena Buemi
12:50 PM 12:55 PM Spotlight Talk -- Egocentric Activity Recognition and Localization on a 3D Map James Rehg
1:00 PM 2:00 PM Lunch Break
2:00 PM 2:30 PM Invited Talk -- “Mind Reading”: Self-supervised Decoding of Visual Data from Brain Activity Michal Irani
2:30 PM 3:30 PM Project Aria: Devices and Machine Perception Services Supporting Academic Research in Egocentric Perception Prince Gupta
3:30 PM 4:30 PM Break, poster session + invited papers/posters
4:30 PM 5:00 PM Invited Talk João Carreira
5:00 PM 5:30 PM Ego4D Battle Royale: Who Knows the Dataset Best? Devansh Kukreja
5:30 PM 6:00 PM Invited Talk Abhinav Gupta
6:00 PM 6:10 PM Closing Remarks Rohit Girdhar

Instructions for Presentation and Poster session

In-person/Virtual Presentation Information

Poster Preparation Information

FAQs

Does the workshop have any proceedings?

No. We only accept extended abstracts, which will not be published in any proceedings.

When will the challenges open/close?

The Ego4D challenges are open - please see the challenge documentation here.

Workshop Organisers


Rohit Girdhar

Meta AI


Andrew Westbury

Meta AI


Michael Wray

University of Bristol


Antonino Furnari

University of Catania


Siddhant Bansal

IIIT Hyderabad


Devansh Kukreja

Meta AI


Kristen Grauman

UT Austin


Jitendra Malik

UC Berkeley


Dima Damen

University of Bristol


Giovanni Maria Farinella

University of Catania


James Rehg

Georgia Institute of Technology


David Crandall

Indiana University


Hyun Soo Park

University of Minnesota


C.V. Jawahar

IIIT Hyderabad


Jianbo Shi

University of Pennsylvania


Yoichi Sato

University of Tokyo


Pablo Arbelaez

Universidad de los Andes