Forum/Visualization Release + v1.1 Update Pending
Let's first walk through downloading the core dataset: what options are available and how we suggest you get started with the data.
The rough flow for a typical researcher will be:
- Review and accept the terms of our license agreement. (It takes ~48 hours to receive credentials, so do this first.)
- Browse the dataset
- Download the CLI
- Select your subset of interest
- Download the data
EGO4D License Agreement
Obtaining the dataset or any annotations requires that you first review our license agreement and accept the terms. Go here (ego4ddataset.com) to review and execute this agreement. Once your license agreement is approved, which takes ~48 hours, you will be emailed a set of AWS access credentials. In the meantime, you can check out the data overview & sample notebooks here to get familiar with the dataset, and download the CLI & dataloaders to get set up in advance.
Note that licensees have the option to execute our license agreement either as an individual or on behalf of their institution. You will likely sign the license as an individual. Typically, only institutional signatories at a director or executive level can agree to license terms on behalf of an entire organization.
Also note that, once approved, your access credentials will expire in 14 days - you're expected to download the data locally, not to consume it from AWS.
Browse The Dataset
Use the visualization tool to browse the dataset.
Download The CLI
Install via pip (conda support coming):
pip install ego4d
Alternatively, or in addition (for utilities, examples, etc.), download the CLI: EGO4D CLI
Select Your Subset Of Interest
For most purposes, you'll want to select the baseline, benchmark, scenario, or data type of interest first, and then specify the subset accordingly rather than downloading the full (> 5TB) dataset. Please refer to the EGO4D CLI documentation for the dataset commands, the approximate sizing below, and the benchmark and annotation documentation for details of what's available.
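Before committing to a subset, it can help to check that the target disk has room for it. The following is a minimal sketch, not part of the EGO4D tooling: the sizes are copied from the approximate sizing table in this guide, the dictionary keys are descriptive labels (not confirmed CLI dataset names), and `required_bytes` and `fits_on_disk` are hypothetical helper names.

```python
import shutil

# Approximate sizes in decimal megabytes, copied from this guide's sizing table.
# Keys are descriptive labels only; they are NOT confirmed CLI dataset names.
APPROX_SIZE_MB = {
    "Full Scale Videos": 7_000_000,
    "Benchmark Clips": 1_000_000,
    "Features": 220_000,
    "Annotations": 2_000,
    "Visualization Data": 500,
}


def required_bytes(labels, headroom=1.1):
    """Sum the approximate sizes of the chosen labels, padded by `headroom`."""
    total_mb = sum(APPROX_SIZE_MB[label] for label in labels)
    return int(total_mb * 10**6 * headroom)


def fits_on_disk(labels, path="."):
    """Return True if free space at `path` covers the selected subsets."""
    free = shutil.disk_usage(path).free
    return free >= required_bytes(labels)


if __name__ == "__main__":
    chosen = ["Annotations", "Visualization Data"]
    needed_gb = required_bytes(chosen) / 1e9
    print(f"Need ~{needed_gb:.2f} GB; fits here: {fits_on_disk(chosen)}")
```

The 10% headroom is an arbitrary safety margin, since the published sizes are approximate.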
Download The Data
Run the CLI to download the dataset (likely to a network share if you're downloading the full-scale videos) via python/pip:
ego4d --output_directory="~/ego4d_data" --datasets full_scale annotations
Alternatively, if you downloaded the repo:
python -m ego4d.cli.cli --output_directory="~/ego4d_data" --datasets full_scale annotations
This would download the full-scale videos (~7 TB) + annotations. Use --datasets viz and follow the visualization setup to explore the dataset before selecting which subset you're interested in.
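If you drive the download from a script, the invocation can be assembled programmatically. The sketch below is an illustration only: it uses just the `--output_directory` and `--datasets` flags shown above, and `build_ego4d_command` is a hypothetical helper name. It expands `~` in Python because a tilde inside quotes is not expanded by the shell; whether the CLI expands it on its own is not stated here.

```python
import os
import shlex


def build_ego4d_command(output_directory, datasets):
    """Build the argument list for the `ego4d` CLI from the flags shown in this guide."""
    # Expand "~" explicitly so the path does not depend on shell quoting.
    out_dir = os.path.expanduser(output_directory)
    return ["ego4d", f"--output_directory={out_dir}", "--datasets", *datasets]


if __name__ == "__main__":
    cmd = build_ego4d_command("~/ego4d_data", ["viz"])
    print(shlex.join(cmd))
    # To actually run it (requires the ego4d package and valid credentials):
    # import subprocess
    # subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids quoting problems if the output path contains spaces.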
Note this is a video dataset and as such the downloads are large! Approximate download size:
|Dataset|Approximate size|
|---|---|
|Full Primary Dataset|~7.1 TB|
|Entire Dataset|30+ TB|
|Full Scale Videos (canonical videos)|~7 TB|
|Annotations|~2 GB|
|Benchmark Clips|~1 TB|
|Visualization Data|~500 MB|
|Video Components|~20 TB|
|Features|~220 GB|
|Narrations Only|~350 MB|
|Hands & Objects Only|~|
|Episodic Memory Only|~|
Average broadband speeds are in the ~100 Mbps ballpark. For 7 TB, you're looking at ~6-7 days to download.
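The estimate above can be reproduced for your own connection. This is a rough sketch assuming decimal units (1 TB = 10^12 bytes) and a sustained transfer at the quoted speed, which real links rarely achieve; `download_days` is a hypothetical helper name. It also compares the result with the 14-day credential lifetime mentioned in the license section.

```python
def download_days(size_tb, speed_mbps):
    """Estimate days needed to download `size_tb` terabytes at `speed_mbps` megabits/s."""
    bits = size_tb * 1e12 * 8            # decimal terabytes -> bits
    seconds = bits / (speed_mbps * 1e6)  # megabits/s -> bits/s
    return seconds / 86_400              # seconds per day


# Credential lifetime stated in the license section of this guide.
CREDENTIAL_WINDOW_DAYS = 14

if __name__ == "__main__":
    days = download_days(7, 100)
    within = days < CREDENTIAL_WINDOW_DAYS
    print(f"~{days:.1f} days; within credential window: {within}")
    # prints "~6.5 days; within credential window: True"
```

At half that speed, the same 7 TB would take roughly 13 days, which leaves very little margin before the credentials expire.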