
Start Here

Forum/Visualization Release + v1.1 Update Pending

The Ego4D forum and visualization tool are now available. A dataset update (v1.1) is also pending; it addresses a number of small annotation issues. Full details will be posted here and on the forum after the current challenge round locks on June 1.

Let's first walk through downloading the core dataset, the options available, and how we suggest you get started with the data.

The rough flow for a typical researcher will be:

  1. Review and accept the terms of our license agreement. (Credentials take ~48 hours to arrive, so do this first.)
  2. Browse the dataset.
  3. Download the CLI.
  4. Select your subset of interest.
  5. Download the data.

EGO4D License Agreement

Obtaining the dataset or any annotations requires that you first review and accept the terms of our license agreement. Go to ego4ddataset.com to review and execute the agreement. Once it is approved (this takes ~48 hours), you will be emailed a set of AWS access credentials. In the meantime, you can review the data overview and sample notebooks to get familiar with the dataset, and download the CLI and dataloaders to get set up in advance.

Note that licensees may execute our license agreement either as an individual or on behalf of their institution. You will most likely sign as an individual: typically, only institutional signatories at a director or executive level can agree to license terms on behalf of an entire organization.

Also note that, once approved, your access credentials expire after 14 days. You're expected to download the data locally, not to consume it directly from AWS.

Browse The Dataset

Refer to the benchmark and annotation documentation for details of what's available.

Use the visualization tool to browse the dataset.

Download The CLI

Download the CLI from the EGO4D CLI page.

Select Your Subset Of Interest

For most purposes, you'll want to choose the baseline, benchmark, scenario, or data type of interest first and download only that subset, rather than the full dataset (~7 TB for the primary dataset alone). See the EGO4D CLI documentation for the dataset commands, the approximate sizes below, and the benchmark and annotation documentation for details of what's available.

Download The Data

Run the CLI to download the dataset (likely to a network share if you're downloading the full-scale videos):

python -m ego4d.cli.cli --output_directory="~/ego4d_data" --datasets full_scale annotations
Visualization First

The command above downloads the full-scale videos (~7 TB) plus annotations. To explore the dataset before choosing a subset, use --datasets viz and follow the visualization setup.
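If you script your downloads, the invocation above can be assembled from Python. The sketch below uses a hypothetical helper, `build_download_command` (it is not part of the EGO4D CLI), and relies only on the `--output_directory` and `--datasets` flags and the dataset names (`full_scale`, `annotations`, `viz`) that appear on this page.

```python
import os
import subprocess
import sys
from typing import List, Sequence


def build_download_command(datasets: Sequence[str], output_directory: str) -> List[str]:
    """Return the argv list for the EGO4D CLI call shown above (hypothetical helper)."""
    if not datasets:
        raise ValueError("select at least one dataset, e.g. 'viz' or 'annotations'")
    # subprocess.run with an argument list involves no shell, so "~" would be
    # passed through literally; expand it here to be explicit.
    out_dir = os.path.expanduser(output_directory)
    return [
        sys.executable, "-m", "ego4d.cli.cli",
        f"--output_directory={out_dir}",
        "--datasets", *datasets,
    ]


def run_download(cmd: List[str]) -> None:
    """Run the CLI; raises CalledProcessError if it exits with an error."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Visualization data first; swap in ["full_scale", "annotations"] for the full download.
    cmd = build_download_command(["viz"], "~/ego4d_data")
    print(" ".join(cmd))
```

Passing an argument list rather than a single shell string avoids quoting problems; calling `run_download(cmd)` would then start the actual download.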

Note that this is a video dataset, so the downloads are large. Approximate download sizes:

Dataset                                 Size
Full Primary Dataset                    ~7.1 TB
Entire Dataset                          30+ TB

Data Type                               Size
Full Scale Videos (canonical videos)    ~7 TB
Annotations                             ~2 GB
Benchmark Clips                         ~1 TB
Visualization Data                      ~500 MB
Video Components                        ~20 TB
Features                                ~220 GB

Benchmark Subset                        Size
Narrations Only                         ~350 MB
Forecasting Only                        ~
Hands & Objects Only                    ~
Episodic Memory Only                    ~
AV/Social Only                          ~
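To estimate how much disk space a selection needs, you can add up the approximate sizes from the Data Type table. The sketch below is only as precise as those rough figures; the dictionary keys are illustrative labels, not EGO4D CLI dataset names, and it assumes decimal units (1 TB = 1000 GB).

```python
from typing import Dict, Iterable

# Rough sizes in GB from the "Data Type" table above (1 TB = 1000 GB).
# Keys are illustrative labels, not EGO4D CLI dataset names.
APPROX_SIZE_GB: Dict[str, float] = {
    "full_scale_videos": 7000.0,
    "annotations": 2.0,
    "benchmark_clips": 1000.0,
    "visualization_data": 0.5,
    "video_components": 20000.0,
    "features": 220.0,
}


def estimate_disk_gb(selection: Iterable[str]) -> float:
    """Sum the approximate sizes (GB) of the selected data types."""
    total = 0.0
    for name in selection:
        if name not in APPROX_SIZE_GB:
            raise KeyError(f"no size listed for {name!r}")
        total += APPROX_SIZE_GB[name]
    return total


if __name__ == "__main__":
    subset = ["annotations", "features"]
    print(f"{subset}: ~{estimate_disk_gb(subset):.0f} GB")
```

For example, annotations plus features come to roughly 222 GB, a small fraction of the full-scale videos.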
Download Time

Average broadband speeds are around 100 Mbps. At that rate, 7 TB takes roughly 6-7 days to download (7 TB × 8 bits/byte ÷ 100 Mbps ≈ 560,000 s ≈ 6.5 days).
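The estimate above is simple arithmetic: size in bits divided by bandwidth in bits per second. The sketch below makes it reusable for other subsets and link speeds, and compares the result with the 14-day credential lifetime mentioned in the license section. The `efficiency` factor is an assumption meant to model real-world throughput falling below the nominal link speed.

```python
SECONDS_PER_DAY = 86_400
CREDENTIAL_WINDOW_DAYS = 14  # credentials expire 14 days after approval


def download_days(size_tb: float, bandwidth_mbps: float, efficiency: float = 1.0) -> float:
    """Estimate download time in days.

    size_tb: decimal terabytes (1 TB = 10**12 bytes).
    bandwidth_mbps: nominal link speed in megabits per second (10**6 bit/s).
    efficiency: assumed fraction of the nominal speed actually achieved.
    """
    if bandwidth_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth must be positive and efficiency in (0, 1]")
    bits = size_tb * 1e12 * 8
    seconds = bits / (bandwidth_mbps * 1e6 * efficiency)
    return seconds / SECONDS_PER_DAY


def fits_credential_window(size_tb: float, bandwidth_mbps: float, efficiency: float = 1.0) -> bool:
    """True if the download should finish before the credentials expire."""
    return download_days(size_tb, bandwidth_mbps, efficiency) < CREDENTIAL_WINDOW_DAYS


if __name__ == "__main__":
    days = download_days(7, 100)
    print(f"7 TB at 100 Mbps: ~{days:.1f} days")  # ~6.5 days
    print("fits in 14-day window:", fits_credential_window(7, 100))
```

At 100 Mbps the full-scale videos fit comfortably within the credential window, but at 40 Mbps the same 7 TB would take over 16 days, so slower links should start early or download a smaller subset.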