Benchmarks Overview
Episodic Memory
The Episodic Memory task aims to make past video queryable and requires localizing where the answer can be seen within the user’s past video.
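Here is one way to picture "localizing where the answer can be seen". Treat the answer as a time window in the past video. A predicted window can then be compared with the ground-truth window by temporal intersection-over-union (tIoU), a common measure for temporal localization. This is a minimal sketch under that assumption. `Window` and `temporal_iou` are hypothetical names, not part of any benchmark toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Hypothetical temporal window in a video, in seconds (start <= end)."""
    start: float
    end: float


def temporal_iou(pred: Window, gt: Window) -> float:
    """Temporal intersection-over-union of two windows; 0.0 if they are disjoint."""
    # Overlap length, clamped at zero when the windows do not intersect.
    inter = max(0.0, min(pred.end, gt.end) - max(pred.start, gt.start))
    # Union length = sum of both durations minus the shared part.
    union = (pred.end - pred.start) + (gt.end - gt.start) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # A prediction covering 10-20 s against a ground truth of 15-25 s overlaps by 5 s.
    print(temporal_iou(Window(10.0, 20.0), Window(15.0, 25.0)))
```

A localization is then typically counted as correct when its tIoU with the ground truth exceeds a chosen threshold.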
Hands and Objects
Hands & Objects aims to understand the camera wearer's present activity in terms of their interactions with objects.
Forecasting
Forecasting movements and interactions requires comprehending the camera wearer’s intention.
Audio-Visual Diarization
The Audio-Visual Diarization tasks involve localizing and tracking the participants, detecting each speaker's voice activity, and transcribing all speech content.
Social Interactions
The Social Interactions benchmark focuses on multimodal understanding of conversational interactions through attention and speech.