Benchmarks Overview

Episodic Memory

The Episodic Memory task aims to make past video queryable: given a query, the system must localize where the answer can be seen within the user’s past video.

Hands and Objects

The Hands and Objects benchmark aims to understand the camera wearer’s present activity in terms of interactions with objects.

Forecasting

Forecasting movements and interactions requires comprehending the camera wearer’s intention.

Audio-Visual Diarization

The Audio-Visual Diarization tasks involve localizing and tracking the participants, detecting each speaker's activity, and transcribing all speech content.

Social Interactions

The Social benchmark focuses on multimodal understanding of conversational interactions via attention and speech.
