1 Resources & Tools

This section provides an overview of the libraries, tools, and documentation used in this project, all of which stem from the Project Aria ecosystem.


1.1 Papers

The papers underpinning this project are listed in the References: VIN-NBV [1] and GenNBV [2] for next-best-view selection, EFM3D [3] and SceneScript [4] for egocentric 3D scene understanding and reconstruction, Project Aria [5] for the underlying data platform, and human-in-the-loop layout correction via infilling [6].

1.2 Project Aria

Project Aria [5] is Meta's glasses-based platform for egocentric, multi-modal AI research. The tools and libraries documented in the following sections (ATEK, EFM3D, Project Aria Tools, and SceneScript) all build on its data formats (VRS) and Machine Perception Services (MPS) outputs.

1.3 Aria Training and Evaluation Toolkit (ATEK)

The primary toolkit for ML training and evaluation on Aria datasets. ATEK provides a complete pipeline from raw VRS data to PyTorch-ready datasets, standardized evaluation metrics, and pre-trained model support.

Follow the EFM3D Installation Instructions (external/efm3d/INSTALL.md), which already include ATEK as a dependency.

1.3.1 Repository Structure

external/ATEK
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── atek
│   ├── __init__.py
│   ├── __pycache__
│   ├── configs
│   │   └── __init__.py
│   ├── data_download
│   │   ├── __init__.py
│   │   ├── __pycache__
│   │   └── atek_data_store_download.py
│   ├── data_loaders
│   │   ├── __init__.py
│   │   ├── atek_raw_dataloader_as_cubercnn.py
│   │   ├── atek_wds_dataloader.py
│   │   ├── cubercnn_model_adaptor.py
│   │   ├── sam2_model_adaptor.py
│   │   └── test
│   │       ├── __init__.py
│   │       └── atek_wds_dataloader_test.py
│   ├── data_preprocess
│   │   ├── __init__.py
│   │   ├── atek_data_sample.py
│   │   ├── atek_wds_writer.py
│   │   ├── genera_atek_preprocessor_factory.py
│   │   ├── general_atek_preprocessor.py
│   │   ├── processors
│   │   │   ├── __init__.py
│   │   │   ├── aria_camera_processor.py
│   │   │   ├── depth_image_processor.py
│   │   │   ├── efm_gt_processor.py
│   │   │   ├── mps_online_calib_processor.py
│   │   │   ├── mps_semidense_processor.py
│   │   │   ├── mps_traj_processor.py
│   │   │   ├── obb2_gt_processor.py
│   │   │   └── obb3_gt_processor.py
│   │   ├── sample_builders
│   │   │   ├── __init__.py
│   │   │   ├── atek_data_paths_provider.py
│   │   │   ├── efm_sample_builder.py
│   │   │   └── obb_sample_builder.py
│   │   ├── subsampling_lib
│   │   │   ├── __init__.py
│   │   │   └── temporal_subsampler.py
│   │   ├── test
│   │   │   ├── __init__.py
│   │   │   ├── aria_camera_processor_test.py
│   │   │   ├── atek_data_sample_test.py
│   │   │   ├── depth_image_processor_test.py
│   │   │   ├── file_io_utils_test.py
│   │   │   ├── mps_processor_test.py
│   │   │   ├── obb2_gt_processor_test.py
│   │   │   ├── obb3_gt_processor_test.py
│   │   │   ├── obb_sample_builder_test.py
│   │   │   └── test_data
│   │   └── util
│   │       └── __init__.py
│   ├── evaluation
│   │   ├── __init__.py
│   │   ├── static_object_detection
│   │   │   ├── __init__.py
│   │   │   ├── eval_obb3.py
│   │   │   ├── eval_obb3_metrics_utils.py
│   │   │   ├── obb3_csv_io.py
│   │   │   └── static_object_detection_metrics.py
│   │   └── surface_reconstruction
│   │       ├── __init__.py
│   │       ├── surface_reconstruction_metrics.py
│   │       └── surface_reconstruction_utils.py
│   ├── util
│   │   ├── __init__.py
│   │   ├── atek_constants.py
│   │   ├── camera_calib_utils.py
│   │   ├── file_io_utils.py
│   │   ├── tensor_utils.py
│   │   └── viz_utils.py
│   └── viz
│       ├── __init__.py
│       ├── atek_visualizer.py
│       └── cubercnn_visualizer.py
├── atek.egg-info
├── data
│   └── atek_data_store_confs
├── docs
│   ├── ATEK_Data_Store.md
│   ├── Install.md
│   ├── ML_task_object_detection.md
│   ├── ML_task_surface_recon.md
│   ├── ModelAdaptors.md
│   ├── data_loading_and_inference.md
│   ├── evaluation.md
│   ├── example_cubercnn_customization.md
│   ├── example_demos.md
│   ├── example_sam2_customization.md
│   ├── example_training.md
│   ├── images
│   ├── preprocessing.md
│   └── preprocessing_configurations.md
├── envs
├── examples
│   └── data
├── readme.md
├── setup.py
├── setup_for_pywheel.py
└── tools
    ├── ase_mesh_downloader.py
    ├── atek_wds_data_downloader.py
    ├── benchmarking_static_object_detection.py
    ├── benchmarking_surface_reconstruction.py
    ├── infer_cubercnn.py
    └── train_cubercnn.py

Quick Start - Download Pre-processed ASE:

# 1. Get download URLs from https://www.projectaria.com/datasets/ase/
#    Click "Access The Dataset" → download the JSON file

# 2. From the NBV repo root, download data with the ATEK downloader
python3 external/ATEK/tools/atek_wds_data_downloader.py \
  --config-name efm \
  --input-json-path .data/aria_download_urls/AriaSyntheticEnvironment_ATEK_download_urls.json \
  --output-folder-path .data/ase_atek \
  --max-num-sequences 2 \
  --download-wds-to-local
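
The downloader writes YAML manifests (e.g., local_train_tars.yaml inside the output folder) listing the downloaded tar shards; the loading example below consumes one of these manifests.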

Quick Start - Load ASE Data in PyTorch:

# 3. Load the downloaded shards in PyTorch
from atek.data_loaders.atek_wds_dataloader import create_native_atek_dataloader
from atek.util.file_io_utils import load_yaml_and_extract_tar_list

urls = load_yaml_and_extract_tar_list(".data/ase_atek/local_train_tars.yaml")
dataloader = create_native_atek_dataloader(
    urls=urls,
    batch_size=4,
    num_workers=4,
    shuffle_flag=True,
)

for batch in dataloader:
    # batch contains: images, camera_calibs, trajectory, 3D bbox annotations, etc.
    pass
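
What each batch contains depends on the preprocessing configuration used to build the WDS shards, so it is worth inspecting a batch before writing model code. A minimal sketch, assuming the dict-style batches produced by the native dataloader:

# Peek at the first batch; keys and shapes vary with the preprocessing config
batch = next(iter(dataloader))
for key, value in batch.items():
    shape = getattr(value, "shape", None)
    print(key, shape if shape is not None else type(value).__name__)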

1.4 EFM3D

EFM3D [3] is a benchmark for 3D egocentric foundation models, covering 3D object detection and surface reconstruction. The repository ships the EVL baseline model together with training (train.py), inference (infer.py), and evaluation (eval.py) entry points.

1.4.1 Repository Structure

external/efm3d
├── INSTALL.md
├── README.md
├── assets
├── benchmark.md
├── ckpt
│   └── README.md
├── data
│   ├── README.md
│   ├── dataverse_url_parser.py
│   └── download_ase_mesh.py
├── efm3d
│   ├── __init__.py
│   ├── aria
│   │   ├── __init__.py
│   │   ├── aria_constants.py
│   │   ├── camera.py
│   │   ├── obb.py
│   │   ├── pose.py
│   │   ├── projection_utils.py
│   │   └── tensor_wrapper.py
│   ├── config
│   │   └── taxonomy
│   ├── dataset
│   │   ├── atek_vrs_dataset.py
│   │   ├── atek_wds_dataset.py
│   │   ├── augmentation.py
│   │   ├── efm_model_adaptor.py
│   │   ├── vrs_dataset.py
│   │   └── wds_dataset.py
│   ├── inference
│   │   ├── __init__.py
│   │   ├── eval.py
│   │   ├── fuse.py
│   │   ├── model.py
│   │   ├── pipeline.py
│   │   ├── track.py
│   │   └── viz.py
│   ├── model
│   │   ├── __init__.py
│   │   ├── cnn.py
│   │   ├── dinov2_utils.py
│   │   ├── dpt.py
│   │   ├── evl.py
│   │   ├── evl_train.py
│   │   ├── image_tokenizer.py
│   │   ├── lifter.py
│   │   └── video_backbone.py
│   ├── thirdparty
│   │   ├── __init__.py
│   │   └── mmdetection3d
│   │       ├── __init__.py
│   │       ├── cuda
│   │       │   └── setup.py
│   │       └── iou3d.py
│   └── utils
│       ├── __init__.py
│       ├── common.py
│       ├── depth.py
│       ├── detection_utils.py
│       ├── evl_loss.py
│       ├── file_utils.py
│       ├── gravity.py
│       ├── image.py
│       ├── image_sampling.py
│       ├── marching_cubes.py
│       ├── mesh_utils.py
│       ├── obb_csv_writer.py
│       ├── obb_io.py
│       ├── obb_matchers.py
│       ├── obb_metrics.py
│       ├── obb_trackers.py
│       ├── obb_utils.py
│       ├── pointcloud.py
│       ├── ray.py
│       ├── reconstruction.py
│       ├── render.py
│       ├── rescale.py
│       ├── viz.py
│       ├── voxel.py
│       └── voxel_sampling.py
├── eval.py
├── infer.py
└── train.py

1.5 Project Aria Tools

Project Aria Tools provides Python and C++ APIs for working with raw Aria data (VRS format) and MPS outputs. Use it for data exploration and custom preprocessing; use ATEK for ML training. Key features are listed below, followed by a short usage sketch.

Key Features:

  • VRS data provider for accessing sensor streams
  • Device calibration and projection utilities
  • MPS (Machine Perception Services) data loaders
  • Visualization tools (Rerun-based)
  • ASE, ADT, and AEA dataset utilities
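
As a taste of the Python API, the sketch below opens a recording and reads one RGB frame plus the device calibration. This is a minimal sketch: recording.vrs is a placeholder path, and "214-1" is the stream ID of the Aria RGB camera.

from projectaria_tools.core import data_provider
from projectaria_tools.core.stream_id import StreamId

# Open a VRS recording (placeholder path)
provider = data_provider.create_vrs_data_provider("recording.vrs")

# "214-1" is the Aria RGB camera stream
rgb_stream = StreamId("214-1")
num_frames = provider.get_num_data(rgb_stream)
image_data, record = provider.get_image_data_by_index(rgb_stream, 0)
print(num_frames, record.capture_timestamp_ns)

# Device calibration exposes per-camera intrinsics and extrinsics
device_calib = provider.get_device_calibration()
rgb_calib = device_calib.get_camera_calib("camera-rgb")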

1.6 SceneScript

SceneScript [4] is an autoregressive structured language model for scene reconstruction; in this project it is trained on ASE.

1.6.1 Repository Structure

external/scenescript/
├── src/
│   ├── data/
│   │   ├── geometries/
│   │   │   ├── base_entity.py    # Base class for scene primitives
│   │   │   ├── wall.py           # Wall representation
│   │   │   ├── door.py           # Door representation
│   │   │   ├── window.py         # Window representation
│   │   │   └── bbox.py           # Bounding box utilities
│   │   ├── language_sequence.py  # SSL tokenization
│   │   └── point_cloud.py        # Point cloud utilities
│   └── networks/
│       ├── encoder.py            # Sparse 3D ResNet encoder
│       ├── decoder.py            # Autoregressive transformer
│       └── scenescript_model.py  # Full model pipeline
├── inference.ipynb               # Demo notebook
└── environment.yaml              # Conda environment

Key Classes:

  • BaseEntity: Abstract base for scene primitives
  • Wall, Door, Window: Geometric entities with SSL serialization
  • LanguageSequence: SSL tokenizer and parser
  • SceneScriptModel: End-to-end encoder-decoder
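
The structured scene language itself is plain text, one command per line. The snippet below is an illustrative parser for commands in this style, not the repository's API; the parameter names follow the ASE language conventions, and LanguageSequence is the authoritative tokenizer/parser.

# Illustrative only: parse one SceneScript-style command of the form
# "make_wall, a_x=0.0, a_y=0.0, ..., height=2.6"
def parse_command(line):
    parts = [p.strip() for p in line.split(",")]
    name, params = parts[0], {}
    for part in parts[1:]:
        key, value = part.split("=")
        params[key.strip()] = float(value)
    return name, params

name, params = parse_command(
    "make_wall, a_x=0.0, a_y=0.0, a_z=0.0, b_x=3.5, b_y=0.0, b_z=0.0, height=2.6"
)
print(name, params["height"])  # -> make_wall 2.6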

References

[1] N. Frahm et al., “VIN-NBV: A view introspection network for next-best-view selection,” 2025. Available: https://arxiv.org/abs/2505.06219
[2] X. Chen, Q. Li, T. Wang, T. Xue, and J. Pang, “GenNBV: Generalizable next-best-view policy for active 3D reconstruction,” 2024. Available: https://arxiv.org/abs/2402.16174
[3] J. Straub, D. DeTone, T. Shen, N. Yang, C. Sweeney, and R. Newcombe, “EFM3D: A benchmark for measuring progress towards 3D egocentric foundation models,” 2024. Available: https://arxiv.org/abs/2406.10224
[4] A. Avetisyan et al., “SceneScript: Reconstructing scenes with an autoregressive structured language model,” 2024. Available: https://arxiv.org/abs/2403.13064
[5] J. Engel et al., “Project Aria: A new tool for egocentric multi-modal AI research,” 2023. Available: https://arxiv.org/abs/2308.13561
[6] C. Xie et al., “Human-in-the-loop local corrections of 3D scene layouts via infilling,” 2025. Available: https://arxiv.org/abs/2503.11806