aria_nbv Package (formerly oracle_rri)

Note: This page is superseded by aria_nbv_overview.qmd, which holds the consolidated package map; keep this page for finer-grained module notes.

1 Name & Scope

  • aria_nbv is the in-repo package that powers our NBV research stack. The on-disk module is still imported as oracle_rri; the docs adopt the new name ahead of the eventual code rename.
  • Purpose: typed, torch-friendly ingestion of ASE/ATEK snippets, candidate-view generation, rendering, fusion, and RRI computation utilities.

2 Module Stack (current code layout)

  • configs/, config.py, utils/: config-as-factory base classes (BaseConfig, SingletonConfig), structured logging (Console).
  • data/: typed snippet views (EfmSnippetView, cameras, trajectories, semidense points, OBBs).
  • data_handling/: ATEK WebDataset loader, metadata resolver, downloader (CLI-capable).
  • pose_generation/: Monte-Carlo candidate sampling with collision and free-space pruning.
  • rendering/: PyTorch3D- and trimesh-based depth rendering for candidates + typed batch wrapper.
  • views/ & viz/: candidate point-cloud rendering helpers and mesh/trajectory viz.
  • analysis/: depth debugger utilities.
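The Monte-Carlo sampling in pose_generation/ can be illustrated with a stand-alone sketch. Everything below (function name, parameters, the brute-force clearance check) is hypothetical, not the module's API; it only shows the sample-then-prune pattern:

```python
import numpy as np

def sample_candidates(extent_min, extent_max, obstacle_pts,
                      n_samples=512, min_clearance=0.3, rng=None):
    """Uniformly sample candidate camera positions inside the scene extent,
    then prune any that fall within `min_clearance` of the obstacle cloud
    (a toy stand-in for collision / free-space pruning)."""
    rng = rng or np.random.default_rng(0)
    cand = rng.uniform(extent_min, extent_max, size=(n_samples, 3))
    # Brute-force nearest-obstacle distance; a KD-tree would scale better.
    d = np.linalg.norm(cand[:, None, :] - obstacle_pts[None, :, :], axis=-1)
    mask = d.min(axis=1) >= min_clearance
    return cand[mask], mask
```

Rejection sampling like this keeps the generator stateless; the real module additionally orients each candidate toward the scene.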

3 Core Modules & Classes

3.1 Config & Logging

  • utils.base_config.BaseConfig: Pydantic-powered factory base with TOML IO and inspect() rendering (Rich).
  • configs.path_config.PathConfig: Singleton paths to data roots, ATEK/ASE URL manifests, mesh directories.
  • utils.console.Console: Rich wrapper with prefixes and optional shared PL logger hook.

3.2 Data Handling (data_handling/)

3.2.1 Dataset loader (dataset.py)

  • External dependencies used:
    • atek.data_loaders.load_atek_wds_dataset, select_and_remap_dict_keys for WebDataset ingest.
    • efm3d.aria.PoseTW, CameraTW for typed geometry conversion (to_camera_tw).
    • trimesh for GT mesh loading/simplification.
  • Batch handling: _explode_batched_dict splits WebDataset B-dim into per-sample dicts; ase_collate returns lists plus EFM-ready dicts.
  • Mesh pairing: scene_to_mesh mapping + optional decimation + caching; tolerant when meshes absent unless require_mesh=True.
  • Key typing: CameraStream, Trajectory, SemiDensePoints mirror ATEK dataclasses; preserve Aria frame (x-left, y-up, z-forward), T_A_B convention.
  • EFM remap: ASESample.to_efm_dict() applies EfmModelAdaptor.get_dict_key_mapping_all() and can include gt_mesh.
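The batch-splitting step can be sketched independently of ATEK; `explode_batched_dict` below is a simplified illustrative stand-in for the module's `_explode_batched_dict`, not its actual implementation:

```python
import numpy as np

def explode_batched_dict(batch):
    """Split a WebDataset-style batched dict (leading B dimension on every
    value) into a list of per-sample dicts."""
    sizes = {len(v) for v in batch.values()}
    assert len(sizes) == 1, "all values must share the batch dimension"
    b = sizes.pop()
    return [{k: v[i] for k, v in batch.items()} for i in range(b)]
```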

3.2.2 Metadata & Download

  • metadata.ASEMetadata: parses ase_mesh_download_urls.json and AriaSyntheticEnvironment_ATEK_download_urls.json into SceneInfo(scene_id, mesh_url, mesh_sha, snippet_ids) records.
  • downloader.ASEDownloaderConfig/ASEDownloader: orchestrates mesh + WDS downloads (uses ATEK download_atek_wds_sequences, requests for meshes, SHA1 verification, unzip). CLI-friendly via pydantic_settings.

3.3 Analysis

  • analysis/depth_debugger.py: utilities to inspect depth/point clouds; useful for validating candidate sampling and fusion (see TODOs).

3.4 Visualisation

  • viz/mesh_viz.py: rendering helpers (Trimesh + matplotlib) for meshes and point clouds; wire into Streamlit dashboard per TODOs.

4 NBV Pipeline at a Glance

Code
flowchart LR
    A[ASE/ATEK shard\n+ GT mesh] --> B(data_handling.dataset\nEfmSnippetView)
    B --> C(pose_generation.\nCandidateViewGenerator)
    C -->|PoseTW batch| D(rendering.\nCandidateDepthRenderer)
    D -->|Depth maps| E["Fuse depth→PC\n(EFM3D pointcloud utils)"]
    E --> F[Metrics: RRI / Chamfer]
    F --> G[Policy: pick NBV]
    G -->|update pose| C
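The loop in the flowchart is one-step greedy. The sketch below uses toy sets as point clouds and injected callables for the rendering and metric stages; `render_fn` and `score_fn` are placeholders, not package APIs:

```python
def greedy_nbv_loop(p_t, candidates, render_fn, score_fn, steps=3):
    """One-step-greedy NBV: each iteration renders every candidate,
    scores it against the current reconstruction, fuses the argmax,
    and repeats with the updated state."""
    chosen = []
    for _ in range(steps):
        scores = [score_fn(p_t, render_fn(c)) for c in candidates]
        best = max(range(len(scores)), key=scores.__getitem__)
        p_t = p_t | render_fn(candidates[best])  # fuse (set union in this toy)
        chosen.append(candidates[best])
    return p_t, chosen
```

With clouds as sets and RRI approximated by "new points gained", the loop visibly prefers unseen candidates first.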

5 RRI Definition (used throughout docs/code)

  • Let \(P_t\) be the current reconstruction point cloud, \(P_q\) the point cloud rendered from a candidate view, and \(M\) the ground-truth mesh surface, sampled to a point set \(M_s\).

  • Bidirectional Chamfer distance: \[ \mathrm{CD}(P, M) = \frac{1}{|P|}\sum_{p\in P}\min_{m\in M_s}\|p-m\|_2^2 + \frac{1}{|M_s|}\sum_{m\in M_s}\min_{p\in P}\|m-p\|_2^2. \]

  • Relative Reconstruction Improvement (higher is better): \[ \mathrm{RRI}(P_t, P_q, M) = \frac{\mathrm{CD}(P_t, M) - \mathrm{CD}(P_t \cup P_q, M)}{\mathrm{CD}(P_t, M) + \varepsilon}, \] with a small \(\varepsilon > 0\) for numerical stability. Positive values indicate improvement after fusing the candidate view.
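Both formulas translate directly to NumPy. This brute-force version is for illustration on small clouds only; a production version would batch the nearest-neighbour terms (e.g. in torch):

```python
import numpy as np

def chamfer(P, M):
    """Bidirectional Chamfer distance between point sets (squared L2 terms)."""
    d2 = ((P[:, None, :] - M[None, :, :]) ** 2).sum(-1)  # (|P|, |M|) pairwise
    return d2.min(1).mean() + d2.min(0).mean()

def rri(P_t, P_q, M, eps=1e-8):
    """Relative Reconstruction Improvement after fusing candidate cloud P_q."""
    cd_before = chamfer(P_t, M)
    cd_after = chamfer(np.vstack([P_t, P_q]), M)
    return (cd_before - cd_after) / (cd_before + eps)
```

A candidate that fills in a missing surface drives the second Chamfer term down, so its RRI approaches 1; a redundant candidate scores near 0.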

6 External Library Hooks (per component)

  • Trimesh: mesh IO, surface sampling, ray–mesh intersections; optional quadric decimation when loading GT meshes.
  • EFM3D:
    • PoseTW, CameraTW: camera/pose typing + operations.
    • utils.ray (ray_grid, transform_rays, ray_obb_intersection) for candidate depth rendering.
    • utils.pointcloud (dist_im_to_point_cloud_im, pointcloud_to_occupancy_snippet) for depth → PC and occupancy.
  • ATEK:
    • load_atek_wds_dataset, process_wds_sample for streaming shards.
    • Key mapping (EfmModelAdaptor.get_dict_key_mapping_all) when exporting to EFM schema.
    • Download helpers (atek_data_store_download.download_atek_wds_sequences) via ASEDownloader.
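The depth → point-cloud step is standard pinhole unprojection. The sketch below is generic camera math, not a wrapper of EFM3D's dist_im_to_point_cloud_im (which works on Aria's typed tensors and camera models):

```python
import numpy as np

def depth_to_pointcloud(depth, fx, fy, cx, cy):
    """Unproject a pinhole depth map (H, W) to camera-frame 3D points (N, 3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]  # drop invalid (zero-depth) pixels
```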

7 Theoretical context (Wikipedia quick refs)

  • Active perception/vision: NBV is an instance of active perception—moving the sensor to harvest more informative observations; active vision systems explicitly reorient cameras to reduce occlusion and improve depth estimates. Wikipedia :: Active perception
  • Point clouds as state: Our reconstruction state \(P_t\) is a point cloud—an unordered set of 3D samples that approximates scene geometry. Wikipedia :: Point cloud
  • Chamfer vs. Hausdorff: The Chamfer distance we use for RRI is a smooth, averaged variant of the symmetric Hausdorff distance between two point sets, trading strict worst-case guarantees for robustness to outliers. Wikipedia :: Chamfer distance; Wikipedia :: Hausdorff distance
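The robustness trade-off is easy to demonstrate numerically: a single outlier dominates the Hausdorff distance but is averaged away by the Chamfer-style mean. Both helpers below are illustrative, not package code:

```python
import numpy as np

def hausdorff(P, Q):
    """Symmetric Hausdorff distance: worst-case nearest-neighbour gap."""
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)
    return max(d.min(1).max(), d.min(0).max())

def chamfer_mean(P, Q):
    """Averaged (Chamfer-style) counterpart: mean nearest-neighbour gaps."""
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)
    return d.min(1).mean() + d.min(0).mean()
```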
Code
sequenceDiagram
    participant Agent
    participant CandidateGen as CandidateGen
    participant Renderer
    participant Metric
    Agent->>CandidateGen: last pose, mesh, extent
    CandidateGen-->>Agent: PoseTW candidates + masks
    Agent->>Renderer: valid poses + mesh + camera
    Renderer-->>Metric: depth maps → point clouds
    Metric-->>Agent: RRI scores (Chamfer/occupancy)
    Agent->>Agent: select next-best view (argmax)

8 Suggested Mermaid for planned processing layer

Code
classDiagram
    class CandidateViewGenerator{
      +generate(last_pose, mesh, extent)->CandidateSamplingResult
    }
    class CandidateDepthRenderer{
      +render(sample, candidates)->CandidateDepthBatch
    }
    class PointCloudFusion{
      +merge(P_t, P_q)->P_fused
    }
    class OracleRRI{
      +score(P_t, P_q, mesh)->float
    }
    CandidateViewGenerator --> CandidateDepthRenderer : PoseTW batch
    CandidateDepthRenderer --> PointCloudFusion : depth->PC
    PointCloudFusion --> OracleRRI : fused PC
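Under the stated assumptions (class and method names taken from the diagram; array shapes are guesses), the planned layer could start from a skeleton like:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class CandidateSamplingResult:
    poses: np.ndarray       # (N, 4, 4) candidate world-from-camera poses (assumed shape)
    valid_mask: np.ndarray  # (N,) collision / free-space pruning result

class CandidateViewGenerator:
    def generate(self, last_pose, mesh, extent) -> CandidateSamplingResult:
        raise NotImplementedError

class CandidateDepthRenderer:
    def render(self, sample, candidates):  # -> CandidateDepthBatch
        raise NotImplementedError

class PointCloudFusion:
    @staticmethod
    def merge(p_t: np.ndarray, p_q: np.ndarray) -> np.ndarray:
        # Naive concatenation; deduplication / voxel merging can come later.
        return np.vstack([p_t, p_q])

class OracleRRI:
    def score(self, p_t, p_q, mesh) -> float:
        raise NotImplementedError
```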

Use this layout to guide module additions under aria_nbv/ (import path oracle_rri for now) while keeping current data-handling and config patterns.