The dataset visualization tool is publicly available at: https://visualize.ego4d-data.org. You will need an approved license for access.
One of the first things you'll likely want to do is filter by benchmark.
Filter queries use a simple syntax: property operator value expressions, chained together with AND, OR, and parentheses. Any value that contains a space or parentheses must be surrounded by double quotes. For example:
- benchmarks include moments
- video_uid == a37f501d-5cc1-4cc2-8ac2-1ec4e66a86d2
- benchmarks include fho_hands AND modalities include imu
- duration > 5000
- moments.activities include "cut_open_a_package_(e.g._with_scissors)"
Autocomplete helps you type these queries. Once you have entered your query, click anywhere outside the autocomplete dropdown to finish, instead of pressing Enter.
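If you build query strings programmatically before pasting them into the filter box, the quoting rule above can be sketched in Python as follows. The helper names (`quote_value`, `condition`, `all_of`) are hypothetical and not part of the tool. The sketch assumes values never contain double quotes themselves.

```python
import re

# Characters that force quoting under the rule above: whitespace and parentheses.
_NEEDS_QUOTES = re.compile(r"[\s()]")


def quote_value(value: str) -> str:
    """Wrap a value in double quotes if it contains a space or a parenthesis."""
    if _NEEDS_QUOTES.search(value):
        return f'"{value}"'
    return value


def condition(prop: str, operator: str, value) -> str:
    """Build a single 'property operator value' expression."""
    return f"{prop} {operator} {quote_value(str(value))}"


def all_of(*conditions: str) -> str:
    """Chain expressions with AND."""
    return " AND ".join(conditions)


print(condition("moments.activities", "include",
                "cut_open_a_package_(e.g._with_scissors)"))
print(all_of(condition("benchmarks", "include", "fho_hands"),
             condition("modalities", "include", "imu")))
```

The two printed queries match the examples listed above.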
Once you click into a video, you'll see it with all its annotations. Many annotations are interactive:
- Video Frames
- Video Times
- Response Tracks
- Visual Crops
Any annotation that shows an underline on hover is clickable and will usually take you to that moment in the video.
You'll also notice many annotations have custom labels based on their context, e.g. scod object state changes show their pre/pnr/post times before expansion.
Each benchmark has modules to visualize its data types.
Time Segments
These show interactive start/end segments. Click a block to jump to its start, or shift+click it to jump to its end. The black line indicates the streaming video's current timestamp. A greedy algorithm assigns segments to tracks so that no two segments on the same track overlap and the number of tracks is kept minimal. Each label always uses the same color, but a color can be reused across multiple labels.
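The tool's own implementation is not shown here, but a greedy track assignment of this kind can be sketched with the standard interval-partitioning approach: process segments in order of start time and reuse the track that frees up earliest. The function name `assign_tracks` is hypothetical. The sketch assumes that a segment starting exactly where another ends does not overlap it.

```python
import heapq
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end), e.g. in seconds


def assign_tracks(segments: List[Segment]) -> List[int]:
    """Return one track index per segment so that segments on a track never overlap."""
    # Visit segments in order of start time, but remember their original positions.
    order = sorted(range(len(segments)), key=lambda i: segments[i])
    tracks = [0] * len(segments)
    # Min-heap of (end_time, track_index): one entry per track, for its latest segment.
    busy: List[Tuple[float, int]] = []
    for i in order:
        start, end = segments[i]
        if busy and busy[0][0] <= start:
            # The track that frees up earliest is already free: reuse it.
            _, track = heapq.heappop(busy)
        else:
            # Every existing track is still occupied: open a new one.
            track = len(busy)
        tracks[i] = track
        heapq.heappush(busy, (end, track))
    return tracks


print(assign_tracks([(0, 5), (2, 8), (5, 9), (8, 12)]))  # [0, 1, 0, 1]
```

Because a new track is opened only when every existing track is occupied at that start time, the number of tracks equals the maximum number of simultaneously active segments, which is the minimum possible.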
Labeled Timestamps
Some annotations, like narrations, are labeled times across the video. This module shows any timestamp labels near the current video time.
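As a rough illustration of "near the current video time", the lookup can be sketched as a binary search over sorted timestamps. The function name `labels_near`, the 2-second default window, and the sample labels are all assumptions made for this example. The tool's actual window size is not stated here.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def labels_near(times: List[float], labels: List[str],
                current: float, window: float = 2.0) -> List[Tuple[float, str]]:
    """Return (time, label) pairs within `window` seconds of `current`.

    `times` must be sorted ascending and aligned index-by-index with `labels`.
    """
    lo = bisect_left(times, current - window)
    hi = bisect_right(times, current + window)
    return list(zip(times[lo:hi], labels[lo:hi]))


# Made-up timestamps and labels, for illustration only.
times = [1.0, 4.5, 5.2, 9.0]
labels = ["label a", "label b", "label c", "label d"]
print(labels_near(times, labels, current=5.0, window=1.0))
# [(4.5, 'label b'), (5.2, 'label c')]
```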
Bounding Boxes
Benchmarks with bounding boxes, like visual queries, are visualized directly on the video. Bounding boxes that track an object across frames (e.g. response tracks) are interpolated between the annotated frames. The FHO hands benchmark uses points, not areas, to represent hands, so these are shown as fixed-size circles instead of bounding boxes.
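The interpolation method is not specified above. A minimal sketch, assuming simple linear interpolation between the two nearest annotated frames, could look like this. The function name `interpolate_box` and the `(x, y, width, height)` box layout are assumptions for the example, not the tool's data format.

```python
from bisect import bisect_right
from typing import Dict, Optional, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x, y, width, height)


def interpolate_box(keyframes: Dict[int, Box], frame: int) -> Optional[Box]:
    """Linearly interpolate a box at `frame` from annotated keyframes.

    Returns None when `frame` lies outside the annotated range.
    """
    frames = sorted(keyframes)
    if not frames or frame < frames[0] or frame > frames[-1]:
        return None
    if frame in keyframes:
        return keyframes[frame]
    # Find the annotated frames immediately before and after `frame`.
    idx = bisect_right(frames, frame)
    f0, f1 = frames[idx - 1], frames[idx]
    t = (frame - f0) / (f1 - f0)  # fraction of the way from f0 to f1
    b0, b1 = keyframes[f0], keyframes[f1]
    return tuple(a + t * (b - a) for a, b in zip(b0, b1))


track = {10: (0.0, 0.0, 10.0, 10.0), 20: (10.0, 20.0, 20.0, 10.0)}
print(interpolate_box(track, 15))  # (5.0, 10.0, 15.0, 10.0)
```

Frames outside the annotated range return `None` here rather than extrapolating, so no box is drawn before the first or after the last annotated frame.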