aria_nbv Implementation Overview

Note: High-level package navigation now lives in aria_nbv_overview.qmd. This page keeps the detailed function/library pointers for reference.

1 EFM3D & ATEK Function Library Overview

This document provides a comprehensive overview of the EFM3D and ATEK functions relevant to implementing the aria_nbv (formerly oracle_rri) Relative Reconstruction Improvement (RRI) computation for Next-Best-View planning.

1.1 EFM3D Core Utilities

1.1.1 Ray Operations (efm3d.utils.ray)

Purpose: Generate and transform camera rays for 3D reconstruction and novel view synthesis.

  • grid_ray(pixel_grid: torch.Tensor, camera: CameraTW) -> tuple[torch.Tensor, torch.Tensor]
    • Description: Unprojects pixel grid coordinates to 3D ray directions
    • Usage: Converting image coordinates to world-space rays for rendering candidate views
    • Theory: Essential for computing what each camera pixel “sees” in 3D space
  • ray_grid(cam: CameraTW) -> tuple[torch.Tensor, torch.Tensor]
    • Description: Generates rays for all pixels in a camera’s image grid
    • Usage: Batch ray generation for efficient candidate view rendering
    • Theory: Creates the complete ray bundle for a virtual camera
  • transform_rays(rays_old: torch.Tensor, T_new_old: PoseTW) -> torch.Tensor
    • Description: Transforms rays between coordinate systems using pose transformations
    • Usage: Moving rays from candidate camera poses to world coordinates
    • Theory: Enables coordinate system alignment for multi-view geometry
  • ray_obb_intersection(rays_v: torch.Tensor, voxel_extent: torch.Tensor, ...) -> torch.Tensor | tuple[...]
    • Description: Computes ray intersections with oriented bounding boxes
    • Usage: Determining which voxels are intersected by candidate view rays
    • Theory: Critical for efficient ray-voxel intersection tests in 3D reconstruction
  • sample_depths_in_grid(rays_v: torch.Tensor, ds_max: torch.Tensor, ...) -> tuple[...]
    • Description: Samples depth values along rays within voxel grids
    • Usage: Generating sample points for volume rendering and reconstruction
    • Theory: Implements stratified sampling for neural radiance field-style rendering
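
To make the intended workflow concrete, here is a minimal sketch that builds a ray bundle for a candidate camera and moves it into world coordinates. It assumes the signatures listed above; the packing of origins and directions into a single ray tensor is an assumption, not a documented layout.

import torch
from efm3d.utils.ray import ray_grid, transform_rays

def candidate_rays_world(cam, T_world_cam):
    """Generate the full ray bundle for `cam` and express it in the world frame."""
    origins_c, dirs_c = ray_grid(cam)  # per-pixel ray origins and directions
    # Packing convention is an assumption; adjust to the actual ray layout.
    rays_c = torch.cat([origins_c, dirs_c], dim=-1)
    return transform_rays(rays_c, T_world_cam)  # apply T_new_old to the bundle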

1.1.2 Point Cloud Processing (efm3d.utils.pointcloud)

Purpose: Handle point cloud operations for 3D reconstruction and occupancy mapping.

  • get_points_world(batch: dict, batch_idx: int | None = None, ...) -> tuple[torch.Tensor, torch.Tensor]
    • Description: Extracts world-coordinate point clouds from ASE dataset batches
    • Usage: Converting depth images and semi-dense SLAM points to 3D coordinates
    • Theory: Foundation for creating the current reconstruction P_t
  • collapse_pointcloud_time(pc_w: torch.Tensor) -> torch.Tensor
    • Description: Merges point clouds across time, removing duplicates and NaN values
    • Usage: Combining temporal observations into a single reconstruction
    • Theory: Temporal fusion for more complete scene representations
  • pointcloud_to_voxel_ids(pc_v: torch.Tensor, vW: int, vH: int, vD: int, voxel_extent: torch.Tensor) -> tuple[...]
    • Description: Maps 3D points to voxel grid indices with validity checking
    • Usage: Converting continuous point clouds to discrete voxel representations
    • Theory: Spatial discretization for efficient 3D processing
  • pointcloud_occupancy_samples(p3s_w: torch.Tensor, Ts_wc: torch.Tensor, ...) -> tuple[...]
    • Description: Samples occupied, surface, and free space points from point clouds
    • Usage: Creating training data for occupancy field learning
    • Theory: Generates diverse 3D samples for learning scene geometry
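
A minimal sketch of how P_t could be assembled from an ASE batch with the first two functions above; treating the second return value of get_points_world as a validity mask is an assumption, since the tuple contents are not spelled out here.

import torch
from efm3d.utils.pointcloud import get_points_world, collapse_pointcloud_time

def build_current_reconstruction(batch: dict) -> torch.Tensor:
    """Fuse per-frame world points into a single reconstruction P_t."""
    pc_w, _valid = get_points_world(batch)  # per-frame points in world frame
    return collapse_pointcloud_time(pc_w)   # merge over time, drop NaNs/duplicates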

1.1.3 Voxel Operations (efm3d.utils.voxel)

Purpose: Handle 3D voxel grid operations and coordinate transformations.

  • tensor_wrap_voxel_extent(voxel_extent, B=None, device="cpu") -> torch.Tensor
    • Description: Normalizes voxel extent representations across batches
    • Usage: Ensuring consistent voxel coordinate systems
    • Theory: Standardizes 3D bounding box representations
  • create_voxel_grid(vW: int, vH: int, vD: int, voxel_extent, device="cpu") -> torch.Tensor
    • Description: Creates 3D coordinate grids for voxel centers
    • Usage: Generating sampling coordinates for 3D reconstruction
    • Theory: Establishes regular 3D sampling patterns
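
A short sketch of grid construction; the element layout of voxel_extent (min/max per axis) is an assumption to be checked against tensor_wrap_voxel_extent.

import torch
from efm3d.utils.voxel import tensor_wrap_voxel_extent, create_voxel_grid

vW, vH, vD = 64, 64, 64
# A 4 m cube around the origin; the extent layout is assumed, not documented here.
voxel_extent = torch.tensor([-2.0, 2.0, -2.0, 2.0, -2.0, 2.0])
extent = tensor_wrap_voxel_extent(voxel_extent, B=1)
grid_v = create_voxel_grid(vW, vH, vD, extent)  # coordinates of all voxel centers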

1.1.4 Voxel Sampling (efm3d.utils.voxel_sampling)

Purpose: Sample from 3D voxel grids with interpolation and coordinate conversion.

  • pc_to_vox(pc_v: torch.Tensor, vW: int, vH: int, vD: int, voxel_extent) -> tuple[...]
    • Description: Converts point cloud coordinates to voxel grid coordinates
    • Usage: Mapping between continuous and discrete 3D representations
    • Theory: Essential for sampling from learned 3D representations
  • sample_voxels(feat3d: torch.Tensor, pts_v: torch.Tensor, differentiable=False) -> tuple[...]
    • Description: Samples features from 3D voxel grids using trilinear interpolation
    • Usage: Querying learned 3D features at arbitrary points
    • Theory: Enables continuous sampling from discrete 3D representations
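
A sketch of feature querying under assumed shape conventions (a (B, C, D, H, W) feature volume and (B, N, 3) query points); treating the returned tuple as (features, validity) is likewise an assumption.

import torch
from efm3d.utils.voxel_sampling import sample_voxels

feat3d = torch.randn(1, 16, 64, 64, 64)  # (B, C, D, H, W) layout assumed
pts_v = torch.rand(1, 1024, 3) * 63      # query points in voxel coordinates
feats, valid = sample_voxels(feat3d, pts_v, differentiable=True)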

1.1.5 Depth Processing (efm3d.utils.depth)

Purpose: Convert between depth representations and 3D point clouds.

  • dist_im_to_point_cloud_im(dist_m: torch.Tensor, cams: CameraTW) -> tuple[...]
    • Description: Converts distance images to 3D point clouds using camera calibration
    • Usage: Processing ASE dataset depth maps into 3D points
    • Theory: Fundamental unprojection operation for 3D reconstruction
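
A sketch of the unprojection step, assuming the returned tuple is (point-cloud image, validity mask); the inputs would come from an ASE batch.

from efm3d.utils.depth import dist_im_to_point_cloud_im

def unproject_batch_depth(dist_m, cams):
    """dist_m: distance image(s); cams: matching CameraTW calibrations."""
    pc_im, valid = dist_im_to_point_cloud_im(dist_m, cams)  # per-pixel 3D points
    return pc_im[valid]                                     # drop invalid pixels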

1.1.6 Reconstruction Utilities (efm3d.utils.reconstruction)

Purpose: Core functions for 3D scene reconstruction and occupancy field learning.

  • build_gt_occupancy(occ, visible, p3s_w, Ts_wc, cams, T_wv, voxel_extent) -> tuple[...]
    • Description: Creates ground truth occupancy grids from point clouds
    • Usage: Generating supervision signals for 3D reconstruction
    • Theory: Converts sparse points to dense occupancy representations
  • compute_occupancy_loss_subvoxel(occ, visible, p3s_w_all, ...) -> torch.Tensor
    • Description: Computes reconstruction loss using subvoxel sampling
    • Usage: Training occupancy networks with point cloud supervision
    • Theory: Implements differentiable 3D reconstruction loss
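
Since the full argument lists are elided above, here is a plain-torch toy that illustrates the underlying idea of build_gt_occupancy (mark voxels containing at least one point as occupied); it is not the library's actual implementation.

import torch

def occupancy_from_points(pts_v: torch.Tensor, vW: int, vH: int, vD: int) -> torch.Tensor:
    """Toy GT occupancy: 1 where a voxel contains a point, 0 elsewhere.
    pts_v holds points already in voxel coordinates (see pc_to_vox above)."""
    occ = torch.zeros(vD, vH, vW)
    idx = pts_v.long()
    in_bounds = ((idx >= 0) & (idx < torch.tensor([vW, vH, vD]))).all(dim=-1)
    idx = idx[in_bounds]
    occ[idx[:, 2], idx[:, 1], idx[:, 0]] = 1.0  # index as (D, H, W)
    return occ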

1.1.7 Mesh Processing (efm3d.utils.mesh_utils)

Purpose: Evaluate mesh reconstruction quality and compute geometric distances.

  • eval_mesh_to_mesh(pred: str | trimesh.Trimesh, gt: str | trimesh.Trimesh, ...) -> tuple[...]
    • Description: Computes accuracy/completeness metrics between predicted and ground truth meshes
    • Usage: CRITICAL for RRI computation - this is the Chamfer Distance calculation we need (see the sketch after this list)
    • Theory: Implements bidirectional distance evaluation for mesh quality assessment
  • compute_pts_to_mesh_dist(pts: torch.Tensor, faces: torch.Tensor, verts: torch.Tensor, step: int) -> np.ndarray
    • Description: Computes distances from point cloud to mesh surface
    • Usage: Core component of Chamfer Distance computation
    • Theory: Point-to-surface distance for reconstruction evaluation
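
A minimal call sketch; both arguments accept file paths or trimesh.Trimesh objects per the signature above, and the composition of the returned tuple (accuracy/completeness metrics) should be checked against the function's docstring.

from efm3d.utils.mesh_utils import eval_mesh_to_mesh

# Bidirectional accuracy/completeness between predicted and GT meshes.
metrics = eval_mesh_to_mesh("pred_mesh.ply", "gt_mesh.ply")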

1.2 Camera and Pose Systems (efm3d.aria)

1.2.1 Camera Calibration (efm3d.aria.camera)

Purpose: Handle camera calibration, projection, and coordinate transformations.

  • CameraTW class: Wrapper for camera calibrations with projection/unprojection methods
    • project(self, p3d: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]: 3D to 2D projection
    • unproject(self, p2d: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]: 2D to 3D ray generation
    • Usage: Converting between 2D image coordinates and 3D world coordinates
    • Theory: Essential for multi-view geometry and novel view synthesis
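
A round-trip sketch of the two methods; the camera-frame point convention is assumed, and the validity-mask returns follow the signatures above.

import torch

def project_unproject_roundtrip(cam, p3d_cam: torch.Tensor):
    """cam: CameraTW instance; p3d_cam: points in the camera frame (assumed)."""
    p2d, valid = cam.project(p3d_cam)   # 3D -> 2D pixel coordinates
    rays, valid2 = cam.unproject(p2d)   # 2D -> 3D ray directions
    return p2d[valid], rays[valid2]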

1.2.2 Pose Transformations (efm3d.aria.pose)

Purpose: Handle SE(3) pose transformations and coordinate system conversions.

  • PoseTW class: SE(3) pose transformations with composition and inversion
    • transform(self, p3d: torch.Tensor) -> torch.Tensor: Apply pose transformation to 3D points
    • compose(self, other) -> PoseTW: Chain pose transformations
    • inverse(self) -> PoseTW: Invert pose transformation
    • Usage: Managing coordinate transformations between camera poses and world coordinates
    • Theory: Foundation for multi-view geometry and coordinate system alignment
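
A small sketch of pose algebra with these methods, assuming T_ab.compose(T_bc) yields T_ac (the composition order is not documented above).

def relative_pose(T_wa, T_wb):
    """Pose of frame b expressed in frame a: T_ab = T_aw composed with T_wb."""
    return T_wa.inverse().compose(T_wb)

def points_to_world(T_wc, p3d_c):
    """Move camera-frame points into the world frame."""
    return T_wc.transform(p3d_c)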

1.3 ATEK Evaluation Framework

1.3.1 Surface Reconstruction Metrics

Purpose: Standardized evaluation of 3D reconstruction quality.

  • evaluate_single_mesh_pair(pred_mesh_filename, gt_mesh_filename, ...) -> tuple[...]
    • Description: CORE FUNCTION for RRI - computes Chamfer Distance and reconstruction metrics
    • Usage: This is exactly what we need for computing RRI = Chamfer(P_{t∪q}, GT) - Chamfer(P_t, GT); a call sketch follows this list
    • Theory: Implements the standard surface reconstruction evaluation protocol
  • evaluate_mesh_over_a_dataset(input_folder, pred_mesh_filename, gt_mesh_filename, ...) -> dict
    • Description: Batch evaluation across multiple scenes
    • Usage: Evaluating NBV performance across the ASE validation set
    • Theory: Dataset-level performance assessment
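
The oracle RRI then reduces to two calls, sketched below; the import path and the structure of the returned metrics tuple are assumptions to verify against ATEK.

# Import path assumed; locate evaluate_single_mesh_pair in ATEK's
# surface-reconstruction evaluation module.
from atek.evaluation import evaluate_single_mesh_pair

m_t  = evaluate_single_mesh_pair("mesh_P_t.ply", "gt_mesh.ply")
m_tq = evaluate_single_mesh_pair("mesh_P_t_union_q.ply", "gt_mesh.ply")
# RRI = Chamfer(P_{t∪q}, GT) - Chamfer(P_t, GT); extracting the Chamfer value
# from each returned tuple depends on ATEK's return structure.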

1.4 Implementation Plan

1.4.1 Core Classes to Implement

1.4.1.1 OracleRRI Class

import torch

class OracleRRI:
    def __init__(self, gt_mesh_path: str, voxel_extent: torch.Tensor, device: str):
        """Initialize with ground truth mesh for oracle computation"""

    def compute_rri(self, current_pointcloud: torch.Tensor, candidate_pointcloud: torch.Tensor) -> float:
        """
        Compute RRI = Chamfer(P_{t∪q}, GT) - Chamfer(P_t, GT)
        Uses ATEK's evaluate_single_mesh_pair internally
        """

    def batch_compute_rri(self, current_pc: torch.Tensor, candidate_pcs: list[torch.Tensor]) -> torch.Tensor:
        """Efficiently compute RRI for multiple candidate views"""

1.4.1.2 CandidateViewGenerator Class

# Import paths assumed from the module names in sections 1.2.1 and 1.2.2.
from efm3d.aria.camera import CameraTW
from efm3d.aria.pose import PoseTW

class CandidateViewGenerator:
    def __init__(self, camera_calibration: CameraTW, sampling_strategy: str):
        """Generate candidate camera poses around current position"""

    def generate_spherical_candidates(self, center_pose: PoseTW, radius: float, n_samples: int) -> list[PoseTW]:
        """Sample poses on sphere around current position"""

    def generate_hemisphere_candidates(self, center_pose: PoseTW, radius: float, n_samples: int) -> list[PoseTW]:
        """Sample poses on hemisphere (avoiding ground/ceiling)"""

1.4.1.3 CandidateViewRenderer Class

import torch  # CameraTW / PoseTW imports as in the previous block

class CandidateViewRenderer:
    def __init__(self, voxel_extent: torch.Tensor, resolution: tuple[int, int, int]):
        """Render synthetic observations from candidate poses"""

    def render_depth_from_pose(self, pose: PoseTW, camera: CameraTW, current_reconstruction: torch.Tensor) -> torch.Tensor:
        """
        Generate synthetic depth image from candidate viewpoint
        Uses EFM3D ray casting and voxel sampling
        """

    def depth_to_pointcloud(self, depth: torch.Tensor, pose: PoseTW, camera: CameraTW) -> torch.Tensor:
        """Convert rendered depth to world-coordinate point cloud"""

1.4.2 Integration with EFM3D Pipeline

1.4.2.1 Memory-Efficient Implementation

  • Use sample_voxels for efficient point cloud querying
  • Batch candidate view processing to avoid memory crashes
  • Implement progressive sampling for large scenes

1.4.2.2 Data Flow

  1. Input: ASE dataset batch with GT depth and camera poses
  2. Current Reconstruction: Use get_points_world + collapse_pointcloud_time
  3. Candidate Generation: CandidateViewGenerator around latest pose
  4. Synthetic Rendering: Use EFM3D ray casting to simulate candidate observations
  5. Point Cloud Fusion: Merge current + candidate point clouds
  6. RRI Computation: ATEK evaluate_single_mesh_pair for Chamfer Distance
  7. Best View Selection: argmin(RRI scores) for next-best-view - the most negative RRI marks the largest reconstruction improvement
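
Putting the seven steps together, a condensed sketch of the loop (all names refer to the classes planned in section 1.4.1, so this is illustrative rather than runnable against a fixed API):

import torch

p_t = build_current_reconstruction(batch)                   # steps 1-2
candidates = generator.generate_spherical_candidates(
    center_pose=pose_t, radius=1.0, n_samples=100)          # step 3
rri = []
for q in candidates:
    depth_q = renderer.render_depth_from_pose(q, cam, p_t)  # step 4
    p_q = renderer.depth_to_pointcloud(depth_q, q, cam)     # step 5
    rri.append(oracle.compute_rri(p_t, p_q))                # step 6
best_view = candidates[int(torch.tensor(rri).argmin())]     # step 7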

1.4.3 Theoretical Foundation

The core insight is that RRI measures reconstruction improvement:

  • P_t: Current point cloud reconstruction from first t views
  • P_{t∪q}: Enhanced reconstruction after adding candidate view q
  • RRI(q) = Chamfer(P_{t∪q}, GT) - Chamfer(P_t, GT)
  • Negative RRI = improvement (lower Chamfer distance is better)

The challenge is ensuring consistent point cloud sampling between P_t and P_{t∪q} for valid Chamfer Distance comparison. This requires:

  1. Unified voxel discretization using EFM3D utilities
  2. Consistent density sampling to avoid bias
  3. Memory-efficient batching for large scenes
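
Requirements 1 and 2 can be prototyped with a shared voxel-grid downsampling applied to both P_t and P_{t∪q}; the sketch below uses plain torch, though pointcloud_to_voxel_ids (section 1.1.2) could serve the same role within the unified discretization.

import torch

def voxel_downsample(pts: torch.Tensor, voxel_size: float) -> torch.Tensor:
    """Keep one representative point per occupied voxel so that P_t and
    P_{t∪q} enter the Chamfer comparison at comparable densities."""
    keys = torch.floor(pts / voxel_size).long()
    uniq, inv = torch.unique(keys, dim=0, return_inverse=True)
    order = torch.arange(pts.shape[0])
    first = torch.full((uniq.shape[0],), pts.shape[0], dtype=torch.long)
    first = first.scatter_reduce(0, inv, order, reduce="amin", include_self=True)
    return pts[first]  # first point encountered in each voxel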

1.4.4 Key Technical Challenges

  1. Point Cloud Density Consistency: Ensure P_t and P_{t∪q} have comparable point densities
  2. Memory Management: Avoid GPU crashes during large-scale ray casting
  3. Coordinate System Alignment: Proper use of PoseTW transformations
  4. Synthetic View Realism: Generate plausible depth observations from candidate poses

1.4.5 Success Metrics

  • Correctness: RRI correlates with actual reconstruction improvement
  • Efficiency: Can process 100+ candidate views per scene within memory limits
  • Integration: Works end-to-end with ASE dataset and EFM3D inference pipeline
  • Validation: Produces sensible next-best-view selections on validation scenes

This implementation will leverage the mature EFM3D and ATEK libraries while focusing on the specific challenge of consistent point cloud sampling for valid RRI computation.