aria_nbv Package (formerly oracle_rri)

Note: This page is superseded by aria_nbv_overview.qmd, which holds the consolidated package map; keep this page for finer-grained module notes.

1 Name & Scope

  • aria_nbv is the in-repo package that powers our NBV research stack. The on-disk module is still imported as oracle_rri; the docs adopt the new name ahead of the eventual code rename.
  • Purpose: typed, torch-friendly ingestion of ASE/ATEK snippets, candidate-view generation, rendering, fusion, and RRI computation utilities.

2 Module Stack (current code layout)

  • configs/, config.py, utils/: config-as-factory base classes (BaseConfig, SingletonConfig), structured logging (Console).
  • data/: typed snippet views (EfmSnippetView, cameras, trajectories, semidense points, OBBs).
  • data_handling/: ATEK WebDataset loader, metadata resolver, downloader (CLI-capable).
  • pose_generation/: Monte-Carlo candidate sampling with collision and free-space pruning.
  • rendering/: PyTorch3D- and trimesh-based depth rendering for candidates + typed batch wrapper.
  • views/ & viz/: candidate point-cloud rendering helpers and mesh/trajectory viz.
  • analysis/: depth debugger utilities.
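The Monte-Carlo sampling in pose_generation/ can be illustrated with a stand-alone sketch. Everything below (function name, parameters, the brute-force clearance check) is hypothetical, not the module's API; it only shows the sample-then-prune pattern:

```python
import numpy as np

def sample_candidates(extent_min, extent_max, obstacle_pts,
                      n_samples=512, min_clearance=0.3, rng=None):
    """Uniformly sample candidate camera positions inside the scene extent,
    then prune any that fall within `min_clearance` of the obstacle cloud
    (a toy stand-in for collision / free-space pruning)."""
    rng = rng or np.random.default_rng(0)
    cand = rng.uniform(extent_min, extent_max, size=(n_samples, 3))
    # Brute-force nearest-obstacle distance; a KD-tree would scale better.
    d = np.linalg.norm(cand[:, None, :] - obstacle_pts[None, :, :], axis=-1)
    mask = d.min(axis=1) >= min_clearance
    return cand[mask], mask
```

Rejection sampling like this keeps the generator stateless; the real module additionally orients each candidate toward the scene.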

3 Core Modules & Classes

3.1 Config & Logging

  • utils.base_config.BaseConfig: Pydantic-powered factory base with TOML IO and inspect() rendering (Rich).
  • configs.path_config.PathConfig: Singleton paths to data roots, ATEK/ASE URL manifests, mesh directories.
  • utils.console.Console: Rich wrapper with prefixes and optional shared PL logger hook.

3.2 Data Handling (data_handling/)

3.2.1 Dataset loader (dataset.py)

  • External dependencies used:
    • atek.data_loaders.load_atek_wds_dataset, select_and_remap_dict_keys for WebDataset ingest.
    • efm3d.aria.PoseTW, CameraTW for typed geometry conversion (to_camera_tw).
    • trimesh for GT mesh loading/simplification.
  • Batch handling: _explode_batched_dict splits WebDataset B-dim into per-sample dicts; ase_collate returns lists plus EFM-ready dicts.
  • Mesh pairing: scene_to_mesh mapping + optional decimation + caching; tolerant when meshes absent unless require_mesh=True.
  • Key typing: CameraStream, Trajectory, SemiDensePoints mirror ATEK dataclasses; preserve Aria frame (x-left, y-up, z-forward), T_A_B convention.
  • EFM remap: ASESample.to_efm_dict() applies EfmModelAdaptor.get_dict_key_mapping_all() and can include gt_mesh.
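The batch-splitting step can be sketched independently of ATEK; `explode_batched_dict` below is a simplified illustrative stand-in for the module's `_explode_batched_dict`, not its actual implementation:

```python
import numpy as np

def explode_batched_dict(batch):
    """Split a WebDataset-style batched dict (leading B dimension on every
    value) into a list of per-sample dicts."""
    sizes = {len(v) for v in batch.values()}
    assert len(sizes) == 1, "all values must share the batch dimension"
    b = sizes.pop()
    return [{k: v[i] for k, v in batch.items()} for i in range(b)]
```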

3.2.2 Metadata & Download

  • metadata.ASEMetadata: parses ase_mesh_download_urls.json and AriaSyntheticEnvironment_ATEK_download_urls.json into SceneInfo(scene_id, mesh_url, mesh_sha, snippet_ids) records.
  • downloader.ASEDownloaderConfig/ASEDownloader: orchestrates mesh + WDS downloads (uses ATEK download_atek_wds_sequences, requests for meshes, SHA1 verification, unzip). CLI-friendly via pydantic_settings.

3.3 Analysis

  • analysis/depth_debugger.py: utilities to inspect depth/point clouds; useful for validating candidate sampling and fusion (see TODOs).

3.4 Visualisation

  • viz/mesh_viz.py: rendering helpers (Trimesh + matplotlib) for meshes and point clouds; wire into Streamlit dashboard per TODOs.

4 NBV Pipeline at a Glance

Code
flowchart LR
    A[ASE/ATEK shard\n+ GT mesh] --> B(data_handling.dataset\nEfmSnippetView)
    B --> C(pose_generation.\nCandidateViewGenerator)
    C -->|PoseTW batch| D(rendering.\nCandidateDepthRenderer)
    D -->|Depth maps| E["Fuse depth→PC\n(EFM3D pointcloud utils)"]
    E --> F[Metrics: RRI / Chamfer]
    F --> G[Policy: pick NBV]
    G -->|update pose| C
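The loop in the flowchart is one-step greedy. The sketch below uses toy sets as point clouds and injected callables for the rendering and metric stages; `render_fn` and `score_fn` are placeholders, not package APIs:

```python
def greedy_nbv_loop(p_t, candidates, render_fn, score_fn, steps=3):
    """One-step-greedy NBV: each iteration renders every candidate,
    scores it against the current reconstruction, fuses the argmax,
    and repeats with the updated state."""
    chosen = []
    for _ in range(steps):
        scores = [score_fn(p_t, render_fn(c)) for c in candidates]
        best = max(range(len(scores)), key=scores.__getitem__)
        p_t = p_t | render_fn(candidates[best])  # fuse (set union in this toy)
        chosen.append(candidates[best])
    return p_t, chosen
```

With clouds as sets and RRI approximated by "new points gained", the loop visibly prefers unseen candidates first.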

5 RRI Definition (used throughout docs/code)

  • Let \(P_t\) be the current reconstruction point cloud, \(P_q\) the point cloud rendered from a candidate view, and \(M\) the ground-truth mesh surface, sampled to a point set \(M_s\).

  • Bidirectional Chamfer distance: \[ \mathrm{CD}(P, M) = \frac{1}{|P|}\sum_{p\in P}\min_{m\in M_s}\|p-m\|_2^2 + \frac{1}{|M_s|}\sum_{m\in M_s}\min_{p\in P}\|m-p\|_2^2. \]

  • Relative Reconstruction Improvement (higher is better): \[ \mathrm{RRI}(P_t, P_q, M) = \frac{\mathrm{CD}(P_t, M) - \mathrm{CD}(P_t \cup P_q, M)}{\mathrm{CD}(P_t, M) + \varepsilon}, \] with a small \(\varepsilon > 0\) for numerical stability. Positive values indicate improvement after fusing the candidate view.
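Both formulas translate directly to NumPy. This brute-force version is for illustration on small clouds only; a production version would batch the nearest-neighbour terms (e.g. in torch):

```python
import numpy as np

def chamfer(P, M):
    """Bidirectional Chamfer distance between point sets (squared L2 terms)."""
    d2 = ((P[:, None, :] - M[None, :, :]) ** 2).sum(-1)  # (|P|, |M|) pairwise
    return d2.min(1).mean() + d2.min(0).mean()

def rri(P_t, P_q, M, eps=1e-8):
    """Relative Reconstruction Improvement after fusing candidate cloud P_q."""
    cd_before = chamfer(P_t, M)
    cd_after = chamfer(np.vstack([P_t, P_q]), M)
    return (cd_before - cd_after) / (cd_before + eps)
```

A candidate that fills in a missing surface drives the second Chamfer term down, so its RRI approaches 1; a redundant candidate scores near 0.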

6 External Library Hooks (per component)

  • Trimesh: mesh IO, surface sampling, ray–mesh intersections; optional quadric decimation when loading GT meshes.
  • EFM3D:
    • PoseTW, CameraTW: camera/pose typing + operations.
    • utils.ray (ray_grid, transform_rays, ray_obb_intersection) for candidate depth rendering.
    • utils.pointcloud (dist_im_to_point_cloud_im, pointcloud_to_occupancy_snippet) for depth → PC and occupancy.
  • ATEK:
    • load_atek_wds_dataset, process_wds_sample for streaming shards.
    • Key mapping (EfmModelAdaptor.get_dict_key_mapping_all) when exporting to EFM schema.
    • Download helpers (atek_data_store_download.download_atek_wds_sequences) via ASEDownloader.
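The depth → point-cloud step is standard pinhole unprojection. The sketch below is generic camera math, not a wrapper of EFM3D's dist_im_to_point_cloud_im (which works on Aria's typed tensors and camera models):

```python
import numpy as np

def depth_to_pointcloud(depth, fx, fy, cx, cy):
    """Unproject a pinhole depth map (H, W) to camera-frame 3D points (N, 3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]  # drop invalid (zero-depth) pixels
```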

7 Theoretical context (Wikipedia quick refs)

  • Active perception/vision: NBV is an instance of active perception—moving the sensor to harvest more informative observations; active vision systems explicitly reorient cameras to reduce occlusion and improve depth estimates. Wikipedia :: Active perception
  • Point clouds as state: Our reconstruction state \(P_t\) is a point cloud—an unordered set of 3D samples that approximates scene geometry. Wikipedia :: Point cloud
  • Chamfer vs. Hausdorff: The Chamfer distance we use for RRI is a smooth, averaged variant of the symmetric Hausdorff distance between two point sets, trading strict worst-case guarantees for robustness to outliers. Wikipedia :: Chamfer distance; Wikipedia :: Hausdorff distance
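The robustness trade-off is easy to demonstrate numerically: a single outlier dominates the Hausdorff distance but is averaged away by the Chamfer-style mean. Both helpers below are illustrative, not package code:

```python
import numpy as np

def hausdorff(P, Q):
    """Symmetric Hausdorff distance: worst-case nearest-neighbour gap."""
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)
    return max(d.min(1).max(), d.min(0).max())

def chamfer_mean(P, Q):
    """Averaged (Chamfer-style) counterpart: mean nearest-neighbour gaps."""
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)
    return d.min(1).mean() + d.min(0).mean()
```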
Code
sequenceDiagram
    participant Agent
    participant CandidateGen as CandidateGen
    participant Renderer
    participant Metric
    Agent->>CandidateGen: last pose, mesh, extent
    CandidateGen-->>Agent: PoseTW candidates + masks
    Agent->>Renderer: valid poses + mesh + camera
    Renderer-->>Metric: depth maps → point clouds
    Metric-->>Agent: RRI scores (Chamfer/occupancy)
    Agent->>Agent: select next-best view (argmax)

8 Suggested Mermaid for planned processing layer

Code
classDiagram
    class CandidateViewGenerator{
      +generate(last_pose, mesh, extent)->CandidateSamplingResult
    }
    class CandidateDepthRenderer{
      +render(sample, candidates)->CandidateDepthBatch
    }
    class PointCloudFusion{
      +merge(P_t, P_q)->P_fused
    }
    class OracleRRI{
      +score(P_t, P_q, mesh)->float
    }
    CandidateViewGenerator --> CandidateDepthRenderer : PoseTW batch
    CandidateDepthRenderer --> PointCloudFusion : depth->PC
    PointCloudFusion --> OracleRRI : fused PC
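Under the stated assumptions (class and method names taken from the diagram; array shapes are guesses), the planned layer could start from a skeleton like:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class CandidateSamplingResult:
    poses: np.ndarray       # (N, 4, 4) candidate world-from-camera poses (assumed shape)
    valid_mask: np.ndarray  # (N,) collision / free-space pruning result

class CandidateViewGenerator:
    def generate(self, last_pose, mesh, extent) -> CandidateSamplingResult:
        raise NotImplementedError

class CandidateDepthRenderer:
    def render(self, sample, candidates):  # -> CandidateDepthBatch
        raise NotImplementedError

class PointCloudFusion:
    @staticmethod
    def merge(p_t: np.ndarray, p_q: np.ndarray) -> np.ndarray:
        # Naive concatenation; deduplication / voxel merging can come later.
        return np.vstack([p_t, p_q])

class OracleRRI:
    def score(self, p_t, p_q, mesh) -> float:
        raise NotImplementedError
```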

Use this layout to guide module additions under aria_nbv/ (import path oracle_rri for now) while keeping current data-handling and config patterns.