aria_nbv Implementation Overview
Note: High-level package navigation now lives in
aria_nbv_overview.qmd. This page keeps the detailed function/library pointers for reference.
1 EFM3D & ATEK Function Library Overview
This document provides a comprehensive overview of all EFM3D and ATEK functions that are relevant for implementing the aria_nbv (formerly oracle_rri) Relative Reconstruction Improvement (RRI) computation for Next-Best-View planning.
1.1 EFM3D Core Utilities
1.1.1 Ray Operations (efm3d.utils.ray)
Purpose: Generate and transform camera rays for 3D reconstruction and novel view synthesis.
`grid_ray(pixel_grid: torch.Tensor, camera: CameraTW) -> tuple[torch.Tensor, torch.Tensor]`
- Description: Unprojects pixel grid coordinates to 3D ray directions
- Usage: Converting image coordinates to world-space rays for rendering candidate views
- Theory: Essential for computing what each camera pixel “sees” in 3D space
`ray_grid(cam: CameraTW) -> tuple[torch.Tensor, torch.Tensor]`
- Description: Generates rays for all pixels in a camera’s image grid
- Usage: Batch ray generation for efficient candidate view rendering
- Theory: Creates the complete ray bundle for a virtual camera
`transform_rays(rays_old: torch.Tensor, T_new_old: PoseTW) -> torch.Tensor`
- Description: Transforms rays between coordinate systems using pose transformations
- Usage: Moving rays from candidate camera poses to world coordinates
- Theory: Enables coordinate system alignment for multi-view geometry
`ray_obb_intersection(rays_v: torch.Tensor, voxel_extent: torch.Tensor, ...) -> torch.Tensor | tuple[...]`
- Description: Computes ray intersections with oriented bounding boxes
- Usage: Determining which voxels are intersected by candidate view rays
- Theory: Critical for efficient ray-voxel intersection tests in 3D reconstruction
`sample_depths_in_grid(rays_v: torch.Tensor, ds_max: torch.Tensor, ...) -> tuple[...]`
- Description: Samples depth values along rays within voxel grids
- Usage: Generating sample points for volume rendering and reconstruction
- Theory: Implements stratified sampling for neural radiance field-style rendering
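As a rough illustration of how these fit together for candidate-view rendering, here is a minimal sketch assuming the signatures listed above; the import path and the return layout of `ray_grid` (assumed here to be a ray tensor plus a validity mask) should be verified against `efm3d/utils/ray.py`.

```python
from efm3d.aria import CameraTW, PoseTW
from efm3d.utils.ray import ray_grid, transform_rays

def candidate_rays_world(cam: CameraTW, T_world_cam: PoseTW):
    # Build the full ray bundle for the virtual camera (camera frame).
    rays_cam, valid = ray_grid(cam)  # return order is an assumption, not verified
    # Re-express the rays in world coordinates for ray-voxel intersection tests.
    rays_world = transform_rays(rays_cam, T_world_cam)
    return rays_world, valid
```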
1.1.2 Point Cloud Processing (efm3d.utils.pointcloud)
Purpose: Handle point cloud operations for 3D reconstruction and occupancy mapping.
`get_points_world(batch: dict, batch_idx: int | None = None, ...) -> tuple[torch.Tensor, torch.Tensor]`
- Description: Extracts world-coordinate point clouds from ASE dataset batches
- Usage: Converting depth images and semi-dense SLAM points to 3D coordinates
- Theory: Foundation for creating the current reconstruction P_t
`collapse_pointcloud_time(pc_w: torch.Tensor) -> torch.Tensor`
- Description: Merges point clouds across time, removing duplicates and NaN values
- Usage: Combining temporal observations into a single reconstruction
- Theory: Temporal fusion for more complete scene representations
`pointcloud_to_voxel_ids(pc_v: torch.Tensor, vW: int, vH: int, vD: int, voxel_extent: torch.Tensor) -> tuple[...]`
- Description: Maps 3D points to voxel grid indices with validity checking
- Usage: Converting continuous point clouds to discrete voxel representations
- Theory: Spatial discretization for efficient 3D processing
`pointcloud_occupancy_samples(p3s_w: torch.Tensor, Ts_wc: torch.Tensor, ...) -> tuple[...]`
- Description: Samples occupied, surface, and free space points from point clouds
- Usage: Creating training data for occupancy field learning
- Theory: Generates diverse 3D samples for learning scene geometry
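Taken together, a minimal sketch of building the current reconstruction P_t from an ASE batch, assuming the signatures above (the second return value of `get_points_world` is assumed to be auxiliary and is ignored here):

```python
from efm3d.utils.pointcloud import get_points_world, collapse_pointcloud_time

def build_current_reconstruction(batch: dict):
    # Per-frame world-coordinate points from depth + semi-dense SLAM points.
    pc_w, _ = get_points_world(batch)
    # Temporal fusion: merge observations across time, dropping duplicates and NaNs.
    return collapse_pointcloud_time(pc_w)
```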
1.1.3 Voxel Operations (efm3d.utils.voxel)
Purpose: Handle 3D voxel grid operations and coordinate transformations.
`tensor_wrap_voxel_extent(voxel_extent, B=None, device="cpu") -> torch.Tensor`
- Description: Normalizes voxel extent representations across batches
- Usage: Ensuring consistent voxel coordinate systems
- Theory: Standardizes 3D bounding box representations
`create_voxel_grid(vW: int, vH: int, vD: int, voxel_extent, device="cpu") -> torch.Tensor`
- Description: Creates 3D coordinate grids for voxel centers
- Usage: Generating sampling coordinates for 3D reconstruction
- Theory: Establishes regular 3D sampling patterns
1.1.4 Voxel Sampling (efm3d.utils.voxel_sampling)
Purpose: Sample from 3D voxel grids with interpolation and coordinate conversion.
`pc_to_vox(pc_v: torch.Tensor, vW: int, vH: int, vD: int, voxel_extent) -> tuple[...]`
- Description: Converts point cloud coordinates to voxel grid coordinates
- Usage: Mapping between continuous and discrete 3D representations
- Theory: Essential for sampling from learned 3D representations
`sample_voxels(feat3d: torch.Tensor, pts_v: torch.Tensor, differentiable=False) -> tuple[...]`
- Description: Samples features from 3D voxel grids using trilinear interpolation
- Usage: Querying learned 3D features at arbitrary points
- Theory: Enables continuous sampling from discrete 3D representations
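A sketch of querying a learned feature volume at continuous points, assuming the signature above; the tensor shapes and the voxel-coordinate convention for `pts_v` are illustrative guesses, not the library's contract:

```python
import torch
from efm3d.utils.voxel_sampling import sample_voxels

feat3d = torch.randn(1, 16, 32, 32, 32)      # hypothetical B x C x D x H x W feature grid
pts_v = torch.rand(1, 1000, 3) * 32          # query points, assumed already in voxel coordinates
feats, valid = sample_voxels(feat3d, pts_v)  # trilinear interpolation; `valid` assumed to flag in-grid points
```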
1.1.5 Depth Processing (efm3d.utils.depth)
Purpose: Convert between depth representations and 3D point clouds.
`dist_im_to_point_cloud_im(dist_m: torch.Tensor, cams: CameraTW) -> tuple[...]`
- Description: Converts distance images to 3D point clouds using camera calibration
- Usage: Processing ASE dataset depth maps into 3D points
- Theory: Fundamental unprojection operation for 3D reconstruction
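Usage sketch, assuming the signature above; `dist_m` and `cams` come straight from the dataset batch, and the returned frame convention (assumed camera-local) should be verified in the source:

```python
from efm3d.utils.depth import dist_im_to_point_cloud_im

def unproject_distance_images(dist_m, cams):
    # dist_m: distance images from the ASE batch; cams: matching CameraTW calibrations.
    pc_im, valid = dist_im_to_point_cloud_im(dist_m, cams)  # return order assumed: point image, validity mask
    return pc_im, valid
```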
1.1.6 Reconstruction Utilities (efm3d.utils.reconstruction)
Purpose: Core functions for 3D scene reconstruction and occupancy field learning.
`build_gt_occupancy(occ, visible, p3s_w, Ts_wc, cams, T_wv, voxel_extent) -> tuple[...]`
- Description: Creates ground truth occupancy grids from point clouds
- Usage: Generating supervision signals for 3D reconstruction
- Theory: Converts sparse points to dense occupancy representations
`compute_occupancy_loss_subvoxel(occ, visible, p3s_w_all, ...) -> torch.Tensor`
- Description: Computes reconstruction loss using subvoxel sampling
- Usage: Training occupancy networks with point cloud supervision
- Theory: Implements differentiable 3D reconstruction loss
1.1.7 Mesh Processing (efm3d.utils.mesh_utils)
Purpose: Evaluate mesh reconstruction quality and compute geometric distances.
`eval_mesh_to_mesh(pred: str | trimesh.Trimesh, gt: str | trimesh.Trimesh, ...) -> tuple[...]`
- Description: Computes accuracy/completeness metrics between predicted and ground truth meshes
- Usage: CRITICAL for RRI computation - this is the Chamfer Distance calculation we need
- Theory: Implements bidirectional distance evaluation for mesh quality assessment
`compute_pts_to_mesh_dist(pts: torch.Tensor, faces: torch.Tensor, verts: torch.Tensor, step: int) -> np.ndarray`
- Description: Computes distances from point cloud to mesh surface
- Usage: Core component of Chamfer Distance computation
- Theory: Point-to-surface distance for reconstruction evaluation
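A hedged usage sketch of the mesh-to-mesh evaluation (the file names are hypothetical, and the metric return structure is assumed from the description above, not verified):

```python
import trimesh
from efm3d.utils.mesh_utils import eval_mesh_to_mesh

pred = trimesh.load("pred_scene.ply")  # hypothetical predicted reconstruction
gt = trimesh.load("gt_scene.ply")      # hypothetical ground truth mesh
metrics = eval_mesh_to_mesh(pred, gt)  # bidirectional accuracy/completeness distances (assumed)
```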
1.2 Camera and Pose Systems (efm3d.aria)
1.2.1 Camera Calibration (efm3d.aria.camera)
Purpose: Handle camera calibration, projection, and coordinate transformations.
`CameraTW` class: Wrapper for camera calibrations with projection/unprojection methods
- `project(self, p3d: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]`: 3D to 2D projection
- `unproject(self, p2d: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]`: 2D to 3D ray generation
- Usage: Converting between 2D image coordinates and 3D world coordinates
- Theory: Essential for multi-view geometry and novel view synthesis
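A minimal round-trip sketch using the method signatures above (the second return value of each method is assumed to be a validity mask, and the camera-frame convention for `p3d` is also an assumption):

```python
import torch
from efm3d.aria import CameraTW

def project_unproject_roundtrip(cam: CameraTW, p3d: torch.Tensor):
    # Project 3D points to pixels, then lift the pixels back to ray directions.
    p2d, valid = cam.project(p3d)      # second return value assumed to be a validity mask
    rays, _ = cam.unproject(p2d)
    return p2d, rays, valid
```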
1.2.2 Pose Transformations (efm3d.aria.pose)
Purpose: Handle SE(3) pose transformations and coordinate system conversions.
`PoseTW` class: SE(3) pose transformations with composition and inversion
- `transform(self, p3d: torch.Tensor) -> torch.Tensor`: Apply pose transformation to 3D points
- `compose(self, other) -> PoseTW`: Chain pose transformations
- `inverse(self) -> PoseTW`: Invert pose transformation
- Usage: Managing coordinate transformations between camera poses and world coordinates
- Theory: Foundation for multi-view geometry and coordinate system alignment
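A small sketch of the three operations under a world-from-camera naming convention; the composition order of `compose` is an assumption that should be verified against `efm3d/aria/pose.py`:

```python
import torch
from efm3d.aria import PoseTW

def candidate_to_world(T_wc: PoseTW, T_cq: PoseTW, p3d_q: torch.Tensor) -> torch.Tensor:
    # T_wc: world-from-camera, T_cq: camera-from-candidate (naming conventions assumed).
    T_wq = T_wc.compose(T_cq)      # composition order is an assumption to verify
    # T_wq.inverse() would give the candidate-from-world direction.
    return T_wq.transform(p3d_q)   # candidate-frame points -> world frame
```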
1.3 ATEK Evaluation Framework
1.3.1 Surface Reconstruction Metrics
Purpose: Standardized evaluation of 3D reconstruction quality.
`evaluate_single_mesh_pair(pred_mesh_filename, gt_mesh_filename, ...) -> tuple[...]`
- Description: CORE FUNCTION for RRI - computes Chamfer Distance and reconstruction metrics
- Usage: This is exactly what we need for computing RRI = Chamfer(P_{t∪q}, GT) - Chamfer(P_t, GT)
- Theory: Implements the standard surface reconstruction evaluation protocol
`evaluate_mesh_over_a_dataset(input_folder, pred_mesh_filename, gt_mesh_filename, ...) -> dict`
- Description: Batch evaluation across multiple scenes
- Usage: Evaluating NBV performance across the ASE validation set
- Theory: Dataset-level performance assessment
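In RRI terms, the oracle reduces to two calls of the same evaluator. A hedged sketch, where `chamfer_to_gt` is a hypothetical wrapper around `evaluate_single_mesh_pair` (which evaluates mesh files, so the wrapper would need to persist the point clouds in a comparable form):

```python
import torch

def rri(chamfer_to_gt, pc_current: torch.Tensor, pc_candidate: torch.Tensor) -> float:
    # RRI(q) = Chamfer(P_{t∪q}, GT) - Chamfer(P_t, GT); negative means q improves the reconstruction.
    pc_union = torch.cat([pc_current, pc_candidate], dim=0)  # P_{t∪q}
    return chamfer_to_gt(pc_union) - chamfer_to_gt(pc_current)
```

For example, if Chamfer(P_t, GT) = 5.0 cm and Chamfer(P_{t∪q}, GT) = 4.2 cm, then RRI(q) = -0.8 cm and q is a helpful view.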
1.4 Implementation Plan
1.4.1 Core Classes to Implement
1.4.1.1 1. OracleRRI Class
```python
import torch


class OracleRRI:
    def __init__(self, gt_mesh_path: str, voxel_extent: torch.Tensor, device: str):
        """Initialize with the ground truth mesh for oracle computation."""

    def compute_rri(self, current_pointcloud: torch.Tensor, candidate_pointcloud: torch.Tensor) -> float:
        """
        Compute RRI = Chamfer(P_{t∪q}, GT) - Chamfer(P_t, GT).
        Uses ATEK's evaluate_single_mesh_pair internally.
        """

    def batch_compute_rri(self, current_pc: torch.Tensor, candidate_pcs: list[torch.Tensor]) -> torch.Tensor:
        """Efficiently compute RRI for multiple candidate views."""
```

1.4.1.2 2. CandidateViewGenerator Class
```python
from efm3d.aria import CameraTW, PoseTW  # import path assumed from the module overview above


class CandidateViewGenerator:
    def __init__(self, camera_calibration: CameraTW, sampling_strategy: str):
        """Generate candidate camera poses around the current position."""

    def generate_spherical_candidates(self, center_pose: PoseTW, radius: float, n_samples: int) -> list[PoseTW]:
        """Sample poses on a sphere around the current position."""

    def generate_hemisphere_candidates(self, center_pose: PoseTW, radius: float, n_samples: int) -> list[PoseTW]:
        """Sample poses on a hemisphere (avoiding ground/ceiling)."""
```

1.4.1.3 3. CandidateViewRenderer Class
```python
import torch

from efm3d.aria import CameraTW, PoseTW  # import path assumed from the module overview above


class CandidateViewRenderer:
    def __init__(self, voxel_extent: torch.Tensor, resolution: tuple[int, int, int]):
        """Render synthetic observations from candidate poses."""

    def render_depth_from_pose(self, pose: PoseTW, camera: CameraTW, current_reconstruction: torch.Tensor) -> torch.Tensor:
        """
        Generate a synthetic depth image from a candidate viewpoint.
        Uses EFM3D ray casting and voxel sampling.
        """

    def depth_to_pointcloud(self, depth: torch.Tensor, pose: PoseTW, camera: CameraTW) -> torch.Tensor:
        """Convert rendered depth to a world-coordinate point cloud."""
```

1.4.2 Integration with EFM3D Pipeline
1.4.2.1 Memory-Efficient Implementation
- Use `sample_voxels` for efficient point cloud querying
- Batch candidate view processing to avoid memory crashes (see the chunking sketch after this list)
- Implement progressive sampling for large scenes
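For the batching point above, a minimal chunking sketch; the chunk size is a tunable assumption and `score_fn` stands in for any per-candidate scorer:

```python
import torch

def score_in_chunks(candidate_pcs, score_fn, chunk_size: int = 8):
    # Process candidates a few at a time so peak GPU memory stays bounded.
    scores = []
    for i in range(0, len(candidate_pcs), chunk_size):
        scores.extend(score_fn(pc) for pc in candidate_pcs[i : i + chunk_size])
        torch.cuda.empty_cache()  # release cached blocks between chunks (optional)
    return scores
```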
1.4.2.2 Data Flow
- Input: ASE dataset batch with GT depth and camera poses
- Current Reconstruction: Use `get_points_world` + `collapse_pointcloud_time`
- Candidate Generation: `CandidateViewGenerator` around the latest pose
- Synthetic Rendering: Use EFM3D ray casting to simulate candidate observations
- Point Cloud Fusion: Merge current + candidate point clouds
- RRI Computation: ATEK `evaluate_single_mesh_pair` for Chamfer Distance
- Best View Selection: argmin over RRI scores (most negative = largest improvement) for the next-best-view (see the end-to-end sketch below)
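Putting the flow together as a hedged end-to-end sketch using the planned classes above; `build_current_reconstruction` and `latest_pose` are hypothetical helpers, and all constructor arguments are placeholders rather than a fixed API:

```python
def next_best_view(batch, gt_mesh_path, camera, voxel_extent, device="cuda"):
    # Current Reconstruction: fuse the batch into P_t.
    pc_t = build_current_reconstruction(batch)
    # Candidate Generation: sample poses around the latest device pose.
    gen = CandidateViewGenerator(camera, sampling_strategy="hemisphere")
    poses = gen.generate_hemisphere_candidates(latest_pose(batch), radius=1.0, n_samples=32)
    renderer = CandidateViewRenderer(voxel_extent, resolution=(128, 128, 128))
    oracle = OracleRRI(gt_mesh_path, voxel_extent, device)
    scores = []
    for pose in poses:
        # Synthetic Rendering + Point Cloud Fusion + RRI Computation per candidate.
        depth = renderer.render_depth_from_pose(pose, camera, pc_t)
        pc_q = renderer.depth_to_pointcloud(depth, pose, camera)
        scores.append(oracle.compute_rri(pc_t, pc_q))
    # Best View Selection: the most negative RRI is the largest improvement.
    best = min(range(len(scores)), key=scores.__getitem__)
    return poses[best]
```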
1.4.3 Theoretical Foundation
The core insight is that RRI measures reconstruction improvement:
- P_t: Current point cloud reconstruction from first t views
- P_{t∪q}: Enhanced reconstruction after adding candidate view q
- RRI(q) = Chamfer(P_{t∪q}, GT) - Chamfer(P_t, GT)
- Negative RRI = improvement (lower Chamfer distance is better)
The challenge is ensuring consistent point cloud sampling between P_t and P_{t∪q} for valid Chamfer Distance comparison. This requires:
- Unified voxel discretization using EFM3D utilities
- Consistent density sampling to avoid bias (see the downsampling sketch after this list)
- Memory-efficient batching for large scenes
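For the density-consistency requirement, one simple enforcement is uniform voxel downsampling applied identically to P_t and P_{t∪q} before evaluation. A self-contained sketch (the voxel size is a tunable assumption; `scatter_reduce` requires PyTorch >= 1.12):

```python
import torch

def voxel_downsample(pc_w: torch.Tensor, voxel_size: float) -> torch.Tensor:
    """Keep one point per voxel so both point clouds are compared at the same density."""
    ids = torch.floor(pc_w / voxel_size).long()                 # (N, 3) integer voxel coordinates
    _, inverse = torch.unique(ids, dim=0, return_inverse=True)  # voxel index per point
    n = pc_w.shape[0]
    # Record the first-seen point index for each voxel via an amin scatter.
    first = torch.full((int(inverse.max()) + 1,), n, dtype=torch.long)
    first = first.scatter_reduce(0, inverse, torch.arange(n), reduce="amin")
    return pc_w[first]
```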
1.4.4 Key Technical Challenges
- Point Cloud Density Consistency: Ensure P_t and P_{t∪q} have comparable point densities
- Memory Management: Avoid GPU crashes during large-scale ray casting
- Coordinate System Alignment: Proper use of PoseTW transformations
- Synthetic View Realism: Generate plausible depth observations from candidate poses
1.4.5 Success Metrics
- Correctness: RRI correlates with actual reconstruction improvement
- Efficiency: Can process 100+ candidate views per scene within memory limits
- Integration: Works end-to-end with ASE dataset and EFM3D inference pipeline
- Validation: Produces sensible next-best-view selections on validation scenes
This implementation will leverage the mature EFM3D and ATEK libraries while focusing on the specific challenge of consistent point cloud sampling for valid RRI computation.