
Features

Pre-extracted feature vectors are available for every video in the dataset and can be downloaded with the Ego4D CLI. Consult the table below for the appropriate `--datasets` value.
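
For reference, here is a minimal sketch of downloading and loading one feature set. The output directory, the `v1/<feature_type>` layout, and the per-video `.pt` file naming are assumptions and may differ by release:

```python
# Download a feature set with the Ego4D CLI (run from a shell); the dataset name is
# taken from the "Feature Type" column of the table below, e.g.:
#   python -m ego4d.cli.cli --output_directory ~/ego4d_data --datasets omnivore_video_swinl
import torch
from pathlib import Path

video_uid = "0000-example-uid"  # hypothetical uid, for illustration only
feature_file = (
    Path("~/ego4d_data/v1/omnivore_video_swinl").expanduser() / f"{video_uid}.pt"
)  # assumed download layout

features = torch.load(feature_file)
print(features.shape)  # expected: one feature vector per extraction window
```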

Want to Add a Model?

Refer to the features README on the Ego4D GitHub.

If you need support running the feature-extraction job, please open an issue on the GitHub repository.

Description

The table below lists the features pre-extracted from Ego4D. All features are extracted from the canonical videos, which are 30 FPS.

Window Size and Stride are in frames.

| Feature Type | Dataset(s) Trained On | Model Arch | Window Size | Stride | Model Weights Location |
| --- | --- | --- | --- | --- | --- |
| slowfast8x8_r101_k400 | Kinetics 400 | SlowFast 8x8 (R101 backbone) | 32 | 16 | torchhub path: `facebookresearch/pytorchvideo/slowfast_r101` |
| omnivore_video_swinl | Kinetics 400 / ImageNet-1K | Omnivore (Swin-L); video head | 32 | 16 | https://github.com/facebookresearch/omnivore#model-zoo |
| omnivore_image_swinl | Kinetics 400 / ImageNet-1K | Omnivore (Swin-L); image head | 1 | 5 | https://github.com/facebookresearch/omnivore#model-zoo |
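
For the SlowFast entry, the torchhub path in the table can be loaded directly with `torch.hub`; a minimal sketch, assuming the pretrained weights are available through PyTorchVideo's hub:

```python
import torch

# Load SlowFast 8x8 (R101 backbone) from the torchhub path listed above.
model = torch.hub.load("facebookresearch/pytorchvideo", "slowfast_r101", pretrained=True)
model.eval()  # feature extraction runs in eval mode
```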

Features are extracted in a moving-window fashion. At every extraction point the model sees the next Window Size (W) frames, i.e. at extraction point i the model sees frames [i, i + W). The first window starts at frame 0, and each subsequent window is offset by the stride until the end of the video is reached.

There is a boundary condition where the last window would extend past the end of the video. In this case, the extraction point is backed up so that the window still covers W frames from the video. This occurs whenever the number of frames in the canonical video, minus the window size, is not a multiple of the stride.
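
As a minimal sketch (not the repository's implementation), the window start frames can be computed as follows; the handling of videos shorter than one window is an assumption:

```python
def window_starts(num_frames: int, window: int = 32, stride: int = 16) -> list[int]:
    """Start frames of each extraction window, backing up the last one if needed."""
    if num_frames <= window:
        return [0]  # assumption: a single window anchored at frame 0
    starts = list(range(0, num_frames - window + 1, stride))
    if starts[-1] + window < num_frames:
        # Back up the final extraction point so the window ends at the last frame.
        starts.append(num_frames - window)
    return starts

# The 39-frame example below: windows [0, 31] and [7, 38].
print(window_starts(39))  # [0, 7]
```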

Example Window Stride

Let's say a video has 39 frames, with W = 32 and a stride of 16. The extraction windows will be (in frame numbers):

  • [0, 31]
  • [7, 38] which is “back-padded” from [16, 47] to fit the last window

Implementation