Does the workshop have any proceedings?
No. We will accept only extended abstracts, which will not be published as part of any proceedings.
Place | Team | Validation Report | Code |
---|---|---|---|
1st Place | Intel Labs | Validation Report | Code |

Place | Team | Validation Report | Code |
---|---|---|---|
1st Place | pyannote | Validation Report | Code |
2nd Place | diart | Validation Report | Code |

Place | Team | Validation Report | Code |
---|---|---|---|
1st Place | AVATAR-Google | Validation Report | Code |

Place | Team | Validation Report | Code |
---|---|---|---|
1st Place | Thereisnospoon | Validation Report | Code |

Place | Team | Validation Report | Code |
---|---|---|---|
1st Place | IVUL | Validation Report | Code |

Place | Team | Validation Report | Code |
---|---|---|---|
1st Place | VideoIntern | Validation Report | Code |
2nd Place | University of Wisconsin-Madison | Validation Report | Forthcoming |

Place | Team | Validation Report | Code |
---|---|---|---|
1st Place | VideoIntern | Validation Report | Code |
2nd Place | University of Wisconsin-Madison | Validation Report | Code |

Place | Team | Validation Report | Code |
---|---|---|---|
1st Place | PKU-WICT-MIPL | Forthcoming | Code |
2nd Place | KeioEgo | Validation Report | Forthcoming |

Place | Team | Validation Report | Code |
---|---|---|---|
1st Place | University of Texas at Austin & Meta AI | Validation Report | Forthcoming |

Place | Team | Validation Report | Code |
---|---|---|---|
1st Place | Autonomous Systems | Validation Report | Forthcoming |

Place | Team | Validation Report | Code |
---|---|---|---|
1st Place | VideoIntern | Validation Report | Code |
2nd Place | HVRL | Forthcoming | Forthcoming |

Place | Team | Validation Report | Code |
---|---|---|---|
1st Place | Video Intern | Validation Report | Code |

Place | Team | Validation Report | Code |
---|---|---|---|
1st Place | VideoIntern | Validation Report | Code |

Place | Team | Validation Report | Code |
---|---|---|---|
1st Place | Red Panda@IMAGINE | Validation Report | Code |
2nd Place | EgoMotion-COMPASS | Validation Report | Code |

Place | Team | Validation Report | Code |
---|---|---|---|
1st Place | Red Panda@IMAGINE | Validation Report | Code |
2nd Place | EgoMotion-COMPASS | Validation Report | Code |
3rd Place | University of Texas at Austin & Meta AI | Validation Report | Forthcoming |
You are invited to submit extended abstracts to the second edition of the International Ego4D Workshop, which will be held alongside ECCV 2022 in Tel Aviv.

These abstracts represent existing or ongoing work and will not be published as part of any proceedings. We welcome all work within the egocentric domain; it is not necessary to use the Ego4D dataset in your work. We expect a submission to address one or more of the following topics (this is a non-exhaustive list):

Extended abstracts should be 2-4 pages long, including figures and tables but excluding references. We invite submissions of ongoing or already published work, as well as reports on demonstrations and prototypes. The 2nd International Ego4D Workshop gives authors the opportunity to present their work to the egocentric community and to invite discussion and feedback. Accepted work will be presented either as an oral presentation (virtual or in-person) or as a poster presentation. The review will be single-blind, so there is no need to anonymize your work; submissions should otherwise follow the format of ECCV submissions (information can be found here). Accepted abstracts will not be published as part of any proceedings, so they can be uploaded to arXiv etc., and links will be provided on the workshop's webpage. Submissions will be managed through the Ego4D@ECCV2022 CMT website.
Event | Date |
---|---|
Challenge Deadline | 18 September 2022 |
Challenge Report Deadline | 25 September 2022 |
Extended Abstract Deadline | 30 September 2022 |
Notification to Authors | 7 October 2022 |
Workshop Date | 24 October 2022 |
Invited Talk -- Digitizing Touch (Lihi Zelnik-Manor)

Imagine being able to touch virtual objects, interact physically with computer games, or feel items located elsewhere on the globe. The applications of such haptic technology would be diverse and broad. Interestingly, while excellent visual and auditory feedback devices exist, cutaneous feedback devices are still in their infancy. In this talk I will give a brief introduction to the world of haptic feedback devices and the challenges it poses. I will then present HUGO, a device designed through a human-centered process that triggers the mechanoreceptors in our skin, enabling people to experience the touch of digitized surfaces "in the wild". This talk is likely to leave us with many open questions that require research to answer.
Invited Talk -- “Mind Reading”: Self-supervised decoding of visual data from brain activity (Michal Irani)

Can we reconstruct the natural images and videos that a person saw directly from their fMRI brain recordings? This is a particularly difficult problem, given the very few “paired” training examples available (images/videos with their corresponding fMRI recordings). In this talk I will show how such image/video reconstruction can be performed, despite the few training examples, by exploiting self-supervised training on many “unpaired” data, i.e., images and videos without any fMRI recordings. I will further show how large-scale image classification (to more than 1000 classes!) can be performed on sparse fMRI data.
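As a concrete illustration of the paired-plus-unpaired training idea described in this abstract, here is a minimal sketch of one way such a setup could be wired together. The linear encoder/decoder, voxel count, image size, and loss weighting are placeholder assumptions for the sketch, not the speaker's actual models.

```python
# Sketch only: an encoder E (image -> fMRI) and decoder D (fMRI -> image) trained
# jointly with (a) a supervised loss on the scarce paired examples and (b) a
# self-supervised round-trip loss D(E(x)) ~ x on many unpaired images, which
# require no fMRI recordings. All sizes below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_VOXELS = 4000        # assumed voxel count, for illustration only
IMG_DIM = 3 * 64 * 64  # assumed 64x64 RGB images

encoder = nn.Sequential(nn.Flatten(), nn.Linear(IMG_DIM, N_VOXELS))                  # image -> fMRI
decoder = nn.Sequential(nn.Linear(N_VOXELS, IMG_DIM), nn.Unflatten(1, (3, 64, 64)))  # fMRI -> image
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-4)

def training_step(paired_img, paired_fmri, unpaired_img, w_unpaired=1.0):
    # Supervised term: use the few (image, fMRI) pairs in both directions.
    loss_sup = F.mse_loss(encoder(paired_img), paired_fmri) + \
               F.mse_loss(decoder(paired_fmri), paired_img)
    # Self-supervised term: unpaired images only have to survive the
    # image -> fMRI -> image round trip, so no recordings are needed.
    loss_self = F.mse_loss(decoder(encoder(unpaired_img)), unpaired_img)
    loss = loss_sup + w_unpaired * loss_self
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```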
"This talk discusses recent research for self-supervised learning from video. I first present a conceptually simple extension of Masked Autoencoders (MAE) to spatiotemporal representation learning from videos. We randomly mask out spacetime patches in videos and learn an autoencoder to reconstruct them in pixels. A high masking ratio leads to a large speedup, e.g., > 4x in wall-clock time or even more. Then I will present Masked Feature Prediction (MaskFeat) for self-supervised pre-training of video models. Our approach first randomly masks out a portion of the input sequence and then predicts the feature of the masked regions. We study five different types of features and find Histograms of Oriented Gradients, a hand-crafted feature descriptor, works particularly well in terms of both performance and efficiency. Finally, I will talk about a simple extension of MAE to self-supervised representation learning from audio spectrograms. Following the Transformer encoder-decoder design in MAE, our Audio-MAE first encodes audio spectrogram patches with a high masking ratio, feeding only the non-masked tokens through encoder layers. We observe that masked pre-training can outperform supervised pre-training by large margins. We further report encouraging results of training on real-world, uncurated Instagram data. Our studies suggests that the general framework of masked autoencoding (BERT, MAE, etc.) can be a unified methodology for representation learning with minimal domain knowledge. "
All times are given in Tel Aviv local time (GMT+3).
Start Time | End Time | Title | Speaker |
---|---|---|---|
9:00 AM | 9:15 AM | Welcome Remarks | Giovanni Maria Farinella |
9:15 AM | 9:45 AM | Invited Talk -- Digitizing Touch | Lihi Zelnik-Manor |
9:45 AM | 10:00 AM | Ego4D Challenge Results: Insights from the Winning Approaches across Five Benchmarks | Rohit Girdhar |
10:00 AM | 10:30 AM | Break | |
10:30 AM | 11:00 AM | Invited Talk -- Masked Video Representation Learning | Christoph Feichtenhofer |
11:00 AM | 11:15 AM | Episodic Memory for Egocentric Perception: Sharing the Leading Approaches to Moments, Visual and Natural Language Queries | Satwik Kottur |
11:15 AM | 11:30 AM | Hand + Object Interactions: Examining Leading Approaches to Temporal Localization, Active Object Detection and State-Change Classification | Siddhant Bansal |
11:30 AM | 11:45 AM | Forecasting Activities in First-Person Videos: What Works for Action, Object Interaction, and Hand Position Anticipation | Antonino Furnari |
11:45 AM | 12:00 PM | Understanding Interaction: Sharing Insights from Social Understanding and AV Diarization in the Context of Egocentric Videos | Mike Z. Shou |
12:00 PM | 12:15 PM | Winners Felicitation and Certificates | Andrew Westbury |
12:15 PM | 12:20 PM | Spotlight Talk -- SOS! Self-supervised Learning Over Sets Of Handled Objects In Egocentric Action Recognition | Victor Escorcia |
12:20 PM | 12:25 PM | Spotlight Talk -- UnrealEgo: A New Dataset for Robust Egocentric 3D Human Motion Capture | Hiroyasu Akada |
12:25 PM | 12:30 PM | Spotlight Talk -- EgoBody: Human Body Shape and Motion of Interacting People from Head-Mounted Devices | Siwei Zhang |
12:30 PM | 12:35 PM | Spotlight Talk -- OWL (Observe, Watch, Listen): Audiovisual Temporal Context for Localizing Actions in Egocentric Videos | Victor Escorcia |
12:35 PM | 12:40 PM | Spotlight Talk -- Students taught by multimodal teachers are superior action recognizers (VIRTUAL) | Gorjan Radevski |
12:40 PM | 12:45 PM | Spotlight Talk -- Event-based Stereo Depth Estimation from Ego-motion using Ray Density Fusion (VIRTUAL) | Suman Ghosh |
12:45 PM | 12:50 PM | Spotlight Talk -- Hand and Object Detection in Egocentric Videos with Color Local Features and Random Forest (VIRTUAL) | María Elena Buemi |
12:50 PM | 12:55 PM | Spotlight Talk -- Egocentric Activity Recognition and Localization on a 3D Map | James Rehg |
1:00 PM | 2:00 PM | Lunch Break | |
2:00 PM | 2:30 PM | Invited Talk -- “Mind Reading”: Self-supervised decoding of visual data from brain activity | Michal Irani |
2:30 PM | 3:30 PM | Project Aria: Devices and Machine Perception Services Supporting Academic Research in Egocentric Perception | Prince Gupta |
3:30 PM | 4:30 PM | Break, poster session, and invited papers/posters | |
4:30 PM | 5:00 PM | Invited Talk | João Carreira |
5:00 PM | 5:30 PM | Ego4D Battle Royale: Who Knows the Dataset Best? | Devansh Kukreja |
5:30 PM | 6:00 PM | Invited Talk | Abhinav Gupta |
6:00 PM | 6:10 PM | Closing Remarks | Rohit Girdhar |
The Ego4D challenges are open; please see the challenge documentation here.