EFM3D Symbol Index

1 Purpose

EFM3D is the foundation for our ASE-based NBV research: it standardises Aria sensor snippets, lifts DINOv2 features into volumetric grids, fuses predictions through time, and evaluates reconstruction quality. This page catalogues every symbol from the vendored external/efm3d tree that we need, grounding each entry in the relevant theory (see also ../theory/rri_theory.qmd, ../impl/oracle_rri_impl.qmd, and atek_implementation.qmd). We emphasise tensor shapes and coordinate conventions, and provide quick usage examples for the key types.

2 Core Design Ideas

  • Structured tensors – all modalities (RGB, SLAM, depth, poses, OBBs) are stored as TensorWrapper subclasses and accessed via the ARIA_* string constants. Respecting this schema keeps our oracle loaders interoperable with EVL (see the sketch after this list).
  • SE(3) geometry – camera and pose utilities rely on Lie-group operations (PoseTW, CameraTW) to interpolate and project accurately. Candidate view generation must use the same mathematics to stay consistent with GT meshes.
  • Volumetric reasoning – EVL constructs cubic voxel volumes populated with feature channels, occupancy masks, and free-space masks. These volumes are the state representation that our RRI computations tap into (cf. Chamfer distance in rri_theory.qmd).
  • Semantic heads – oriented bounding boxes (OBBs) provide categorical priors. NBV policies can leverage these for task-weighted objectives.
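
As a minimal illustration of the structured-tensor schema, the sketch below assembles a dummy batch dict keyed by ARIA_* constants and reads it back; real batches come from the datasets in Section 5:

import torch

from efm3d.aria.aria_constants import (
    ARIA_SNIPPET_ORIGIN_RATIO,
    ARIA_SNIPPET_TIME_NS,
)

# Dummy batch (B = 1): a plain dict keyed by the ARIA_* string constants.
batch = {
    ARIA_SNIPPET_TIME_NS: torch.zeros(1, dtype=torch.long),
    ARIA_SNIPPET_ORIGIN_RATIO: torch.full((1,), 0.5),
}
t0_ns = batch[ARIA_SNIPPET_TIME_NS]  # torch.long [B], cf. the tables below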

3 Exhaustive ARIA Constants Reference (aria/aria_constants.py)

The tables below list every constant, its structure, dtype/shape, and what it represents. Shapes are given for batched snippets: B (batch size), T (frames per snippet), N (points per frame), K (OBB slots, default 128), V = D×H×W (voxel count).

3.1 Sequence and snippet metadata

Constant Type / Shape Description
ARIA_SEQ_ID str Unique identifier of the full sensor sequence.
ARIA_SEQ_TIME_NS torch.long scalar Sequence start time in Aria clock nanoseconds.
ARIA_SNIPPET_ID torch.long [B] Index of the snippet within the parent sequence.
ARIA_SNIPPET_LENGTH_S torch.float32 [B] Duration of each snippet in seconds.
ARIA_SNIPPET_TIME_NS torch.long [B] Snippet start timestamp (ns, Aria clock).
ARIA_SNIPPET_T_WORLD_SNIPPET PoseTW [B, 12] Transform from snippet frame to world frame.
ARIA_SNIPPET_ORIGIN_RATIO torch.float32 [B] Fraction of snippet length that defines the origin (default 0.5).

3.2 Player (streamer) timing

Constant Type / Shape Description
ARIA_PLAY_TIME_NS torch.long [B, T] Playback timestamps (ns).
ARIA_PLAY_SEQUENCE_TIME_S torch.float32 [B, T] Sequence-relative time in seconds.
ARIA_PLAY_SNIPPET_TIME_S torch.float32 [B, T] Snippet-relative time in seconds.
ARIA_PLAY_FREQUENCY_HZ torch.float32 [B] Playback frequency.

3.3 RGB / SLAM image streams

Constant Type / Shape Description
ARIA_FRAME_ID list[str] (rgb/slaml/slamr) Per-stream frame IDs in sequence order.
ARIA_IMG_SNIPPET_TIME_S list[torch.float32] [B, T] each Snippet-relative timestamps per stream.
ARIA_IMG_TIME_NS list[torch.long] [B, T] each Sequence timestamps per stream.
ARIA_IMG_T_SNIPPET_RIG list[PoseTW] [B, T, 12] each Pose of rig at capture time for each frame.
ARIA_IMG list[torch.float32] [B, T, C, H, W] each Image tensors (RGB: 3×1408×1408, SLAM: 1×640×480).
ARIA_IMG_FREQUENCY_HZ list[torch.float32] [B] each Frame rate per stream.

3.4 Calibration streams

Constant Type / Shape Description
ARIA_CALIB list[CameraTW] [B, T, 26|34] each Camera intrinsics/extrinsics tensors.
ARIA_CALIB_SNIPPET_TIME_S list[torch.float32] [B, T] each Snippet-relative calibration timestamps.
ARIA_CALIB_TIME_NS list[torch.long] [B, T] each Sequence timestamps (ns) per calibration stream.

3.5 Rig poses

Constant Type / Shape Description
ARIA_POSE_SNIPPET_TIME_S torch.float32 [B, T] Snippet-relative timestamps for rig poses.
ARIA_POSE_TIME_NS torch.long [B, T] Sequence timestamps for rig poses.
ARIA_POSE_T_SNIPPET_RIG PoseTW [B, T, 12] Transform rig→snippet.
ARIA_POSE_T_WORLD_RIG PoseTW [B, T, 12] Transform rig→world.
ARIA_POSE_FREQUENCY_HZ torch.float32 [B] Pose sampling frequency.

3.6 Points & depth

Constant Type / Shape Description
ARIA_POINTS_WORLD torch.float32 [B, T, N, 3] Semi-dense SLAM point cloud (world frame).
ARIA_POINTS_TIME_NS torch.long [B, T, N] Sequence timestamps per point sample.
ARIA_POINTS_SNIPPET_TIME_S torch.float32 [B, T, N] Snippet-relative point timestamps.
ARIA_POINTS_FREQUENCY_HZ torch.float32 [B] Point stream frequency.
ARIA_POINTS_INV_DIST_STD torch.float32 [B, T, N] Inverse-distance standard deviation (\(σ_ρ\)).
ARIA_POINTS_DIST_STD torch.float32 [B, T, N] Distance standard deviation (\(σ_d\)).
ARIA_DEPTH_M list[str] Keys for z-depth maps (rgb/depth_m, …); tensors [B, T, 1, H, W].
ARIA_DISTANCE_M list[str] Keys for ray-distance maps (rgb/distance_m, …); tensors [B, T, 1, H, W].
ARIA_DEPTH_M_PRED, ARIA_DISTANCE_M_PRED list[str] Predicted depth/distance keys.
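
The two uncertainty columns are related by first-order error propagation: for \(ρ = 1/d\), \(σ_d ≈ σ_ρ d^2\). A quick sketch of the conversion (our own derivation, not an efm3d helper):

import torch

# First-order error propagation for rho = 1/d:
# |dd/drho| = 1/rho^2 = d^2, hence sigma_d ~= sigma_rho * d^2.
inv_dist = torch.tensor([0.5, 1.0, 2.0])  # rho, in 1/m
sigma_rho = torch.full((3,), 0.01)        # cf. ARIA_POINTS_INV_DIST_STD
dist = 1.0 / inv_dist                     # d, in m
sigma_d = sigma_rho * dist.pow(2)         # cf. ARIA_POINTS_DIST_STD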

3.7 IMU & audio

Constant Type / Shape Description
ARIA_IMU list[str] (imur, imul) IMU stream roots.
ARIA_IMU_CHANNELS nested list[str] Channel names (lin_acc_ms2, rot_vel_rads).
ARIA_IMU_SNIPPET_TIME_S, ARIA_IMU_TIME_NS list[torch.float32/long] [B, T] IMU timestamps.
ARIA_IMU_FACTORY_CALIB list[torch.float32] Factory calibration matrices.
ARIA_IMU_FREQUENCY_HZ list[torch.float32] Sampling frequency per IMU.
ARIA_AUDIO str (audio) Root key for audio samples.
ARIA_AUDIO_SNIPPET_TIME_S torch.float32 [B, T_audio] Snippet-relative timestamps.
ARIA_AUDIO_TIME_NS torch.long [B, T_audio] Sequence timestamps.
ARIA_AUDIO_FREQUENCY_HZ torch.float32 [B] Audio sampling frequency.

3.8 OBB annotations & predictions

Constant Type / Shape Description
ARIA_OBB_PADDED ObbTW [B, T, K, 34] GT OBB tensor (snippet frame).
ARIA_OBB_SEM_ID_TO_NAME dict[int,str] Semantic ID → label.
ARIA_OBB_SNIPPET_TIME_S, ARIA_OBB_TIME_NS torch.float32/long [B, T, K] OBB timestamps.
ARIA_OBB_FREQUENCY_HZ torch.float32 [B] OBB stream rate.
ARIA_OBB_PRED, ARIA_OBB_PRED_VIZ ObbTW Predicted OBBs (raw & filtered).
ARIA_OBB_PRED_SEM_ID_TO_NAME dict[int,str] Predicted semantic mapping.
ARIA_OBB_PRED_PROBS_FULL, ARIA_OBB_PRED_PROBS_FULL_VIZ torch.float32 [B, T, K, C] Per-class probabilities (full & viz).
ARIA_OBB_TRACKED, ARIA_OBB_TRACKED_PROBS_FULL ObbTW / probs Tracked OBBs after association.
ARIA_OBB_UNINST ObbTW Uninstantiated (filtered) OBBs.
ARIA_OBB_BB2 list[str] Keys for 2D BBs per stream.
ARIA_OBB_BB3 str Key for 3D BB tensor.

3.9 SDF, meshes, volumes

Constant Type / Shape Description
ARIA_SDF torch.float32 [B, V] Snippet signed-distance field values.
ARIA_SDF_EXT torch.float32 [B, 6] Spatial extent of SDF grids.
ARIA_SDF_COSY_TIME_NS torch.long [B] Timestamp linking SDF to snippet frame.
ARIA_SDF_MASK torch.bool [B, V] Valid voxel mask.
ARIA_SDF_T_WORLD_VOXEL PoseTW [B, 12] Transform from voxel to world frame.
ARIA_MESH_VERTS_W, ARIA_MESH_FACES, ARIA_MESH_VERT_NORMS_W lists of tensors Snippet mesh vertices [B, Nv, 3], faces [B, Nf, 3], normals.
ARIA_SCENE_MESH_VERTS_W, ARIA_SCENE_MESH_FACES, ARIA_SCENE_MESH_VERT_NORMS_W lists Scene-level meshes.
ARIA_MESH_VOL_MIN, ARIA_MESH_VOL_MAX, ARIA_POINTS_VOL_MIN, ARIA_POINTS_VOL_MAX torch.float32 [B, 3] Axis-aligned bounds for meshes/points.
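
Since ARIA_SDF is flattened over V = D×H×W, recovering the grid is a single reshape; a sketch with an assumed 64³ grid (the actual D, H, W come from the model config):

import torch

D = H = W = 64                          # illustrative grid size
sdf_flat = torch.randn(1, D * H * W)    # stand-in for batch[ARIA_SDF], [B, V]
sdf_mask = torch.ones(1, D * H * W, dtype=torch.bool)  # cf. ARIA_SDF_MASK
sdf_grid = sdf_flat.view(1, D, H, W)    # [B, D, H, W]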

3.10 Image resolution helpers & camera metadata

Constant Type / Shape Description
RESOLUTION_MAP dict[int, tuple] Resolution ID → (RGB_hw, SLAM_w, SLAM_h).
WH_MULTIPLE_OF_MAP dict[int, int] Width/height multiples.
RGB_RADIUS_FACTOR, SLAM_RADIUS_FACTOR float Valid fisheye radius fractions.
ARIA_RGB_WIDTH_TO_RADIUS, ARIA_SLAM_WIDTH_TO_RADIUS dict[int, float] Valid radius per width.
ARIA_RGB_SCALE_TO_WH, ARIA_SLAM_SCALE_TO_WH dict[int, list[int]] Width/height pairs per scale.
ARIA_IMG_MIN_LUX, ARIA_IMG_MAX_LUX, ARIA_IMG_MAX_PERC_OVEREXPOSED, ARIA_IMG_MAX_PERC_UNDEREXPOSED float Quality thresholds.
ARIA_EFM_OUTPUT str Key for EVL inference outputs.
ARIA_CAM_INFO nested dict Camera names, stream IDs, VRS IDs, display names, spatial order.

4 Key Tensor Wrappers & Geometry Types

4.1 TensorWrapper (aria/tensor_wrapper.py)

  • Role: lightweight wrapper around tensors that preserves shape metadata and supports device-aware batching. All higher-level wrappers inherit from it.
  • Shape: arbitrary; data stored in _data attribute.
  • Usage:
import torch

from efm3d.aria.tensor_wrapper import TensorWrapper, smart_stack

w1 = TensorWrapper(torch.zeros(12))
w2 = TensorWrapper(torch.ones(12))
stacked = smart_stack([w1, w2])  # TensorWrapper with data shape [2, 12]

4.2 PoseTW (aria/pose.py)

  • Role: SE(3) pose wrapper storing rotation and translation flattened into 12 numbers.
  • Shape: [B, T, 12] or [T, 12]; dtype torch.float32.
  • Theory: interpolation leverages the Lie algebra of SO(3). For poses \((R_i, t_i)\) and \((R_j, t_j)\) at times \(t_i, t_j\), EVL computes the twist \(\omega = \log(R_i^\top R_j)\), scales it by \(\alpha = (t - t_i)/(t_j - t_i)\), composes \(R(t) = R_i \exp(\alpha\,\omega)\), and blends the translations linearly with the same weight \(\alpha\) (a standalone sketch of this follows the usage example).
  • Usage:
import torch

from efm3d.aria.pose import PoseTW

times = torch.tensor([0.0, 1.0])
# Identity poses: a 4x4 identity cropped to its top 3x4 [R|t] block and
# flattened to 12 numbers per pose (layout assumed row-major).
poses = PoseTW(torch.eye(4)[:3].reshape(1, 12).repeat(2, 1))
interp, mask = poses.interpolate(times, torch.tensor([0.5]))
T_world_cam = interp.to_matrix().view(4, 4)
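
For reference, the interpolation recipe above can be written out directly. This is a self-contained sketch of the mathematics, not PoseTW's internal code:

import torch

def so3_log(R):
    # Rotation matrix -> axis-angle twist (Rodrigues log map);
    # ignores the theta ~= pi edge case for brevity.
    cos = ((R.diagonal(dim1=-2, dim2=-1).sum(-1) - 1.0) / 2.0).clamp(-1.0, 1.0)
    theta = torch.arccos(cos)
    w = torch.stack(
        [R[..., 2, 1] - R[..., 1, 2],
         R[..., 0, 2] - R[..., 2, 0],
         R[..., 1, 0] - R[..., 0, 1]], dim=-1)
    return w * (theta / (2.0 * torch.sin(theta)).clamp(min=1e-8)).unsqueeze(-1)

def so3_exp(w):
    # Axis-angle twist -> rotation matrix (Rodrigues formula).
    theta = w.norm(dim=-1, keepdim=True).clamp(min=1e-8)
    k = w / theta
    K = torch.zeros(*w.shape[:-1], 3, 3)
    K[..., 0, 1], K[..., 1, 0] = -k[..., 2], k[..., 2]
    K[..., 0, 2], K[..., 2, 0] = k[..., 1], -k[..., 1]
    K[..., 1, 2], K[..., 2, 1] = -k[..., 0], k[..., 0]
    s, c = torch.sin(theta)[..., None], torch.cos(theta)[..., None]
    return torch.eye(3) + s * K + (1.0 - c) * (K @ K)

def interpolate_pose(R_i, t_i, R_j, t_j, alpha):
    # Geodesic blend on SO(3); linear blend on translation.
    R_t = R_i @ so3_exp(alpha * so3_log(R_i.transpose(-1, -2) @ R_j))
    t_t = (1.0 - alpha) * t_i + alpha * t_j
    return R_t, t_t

R_j = so3_exp(torch.tensor([0.0, 0.0, 1.5708]))  # 90 deg about z
R_half, t_half = interpolate_pose(torch.eye(3), torch.zeros(3),
                                  R_j, torch.ones(3), 0.5)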

4.3 CameraTW (aria/camera.py)

  • Role: camera intrinsics/extrinsics wrapper with distortion parameters and valid radii.
  • Shape: [B, T, 34] for RGB, [B, T, 26] for SLAM streams.
  • Theory: projections follow fisheye or pinhole models; .project maps camera-frame points to pixels, .unproject recovers rays (a simplified pinhole sketch follows the usage example).
  • Usage:
import torch

from efm3d.aria.camera import get_aria_camera

cam = get_aria_camera()  # RGB camera at 1408×1408
points_cam = torch.tensor([[0.0, 0.0, 1.0]])
pixels, depths = cam.project(points_cam)
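
To make the projection contract concrete, here is a minimal pinhole version of what .project computes; the real CameraTW additionally applies the fisheye distortion model and valid-radius masking, and the focal/principal-point values below are made up:

import torch

def pinhole_project(p_cam, f, c):
    # Perspective divide then scale/offset by intrinsics; returns (pixels, z).
    z = p_cam[..., 2:3].clamp(min=1e-6)
    return f * p_cam[..., :2] / z + c, p_cam[..., 2]

pixels, depths = pinhole_project(
    torch.tensor([[0.0, 0.0, 1.0]]),
    f=torch.tensor([700.0, 700.0]),   # hypothetical focal lengths (px)
    c=torch.tensor([703.5, 703.5]),   # hypothetical principal point (px)
)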

4.4 ObbTW (aria/obb.py)

  • Role: oriented bounding box tensor (center, extents, quaternion, scores) with utilities for projection and IoU.
  • Shape: [B, T, K, 34].
  • Usage:
import torch

from efm3d.aria.obb import ObbTW, transform_obbs
from efm3d.aria.pose import PoseTW

obbs = ObbTW(torch.zeros(1, 1, 128, 34))
# Identity snippet->world pose as a flattened 3x4 [R|t] matrix.
T_world_snippet = PoseTW(torch.eye(4)[:3].reshape(1, 12))
obbs_world = transform_obbs(obbs, T_world_snippet)

5 Dataset & Adaptor Modules

5.1 Streamed ATEK → EFM Datasets

5.1.1 WdsStreamDataset (dataset/wds_dataset.py)

  • Converts raw ARIA multimodal WDS snippets to EVL-ready tensors; iterates 2 s shards with configurable stride/snippet length (a consumption sketch follows this list).
  • Keys converted via convert_to_aria_multimodal_dataset():
    • Images: rgb/img, slaml/img, slamr/img as torch.float32 [T,C,H,W], SLAM frames forced to 1 channel.
    • Poses/calib: pose/t_world_rig, pose/t_snippet_rig, rgb/t_snippet_rig, … stored as PoseTW; calibration as CameraTW.
    • Optional GT: obb/padded, points/world, points/vol_min/max.
  • Snippet slicing: crops a rolling window (length snippet_length_s, stride stride_length_s) out of 2 s WDS chunks; world/snippet transforms and volume bounds are NOT cropped.
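
A typical consumption loop, sketched under the assumption that the constructor takes the shard URLs plus the two windowing parameters named above (keyword names may differ in the vendored tree):

from efm3d.aria.aria_constants import ARIA_IMG
from efm3d.dataset.wds_dataset import WdsStreamDataset

# Hypothetical shard path and kwargs; see the bullets above for semantics.
ds = WdsStreamDataset(
    "/data/ase/train-{00000..00009}.tar",
    snippet_length_s=1.0,
    stride_length_s=0.5,
)
for sample in ds:
    rgb = sample[ARIA_IMG[0]]  # rgb/img, [T, 3, H, W] per the image table
    break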

5.1.2 AtekWdsStreamDataset (dataset/atek_wds_dataset.py)

  • Wraps ATEK WDS shards and runs load_atek_wds_dataset_as_efm() to remap keys and adapt schema before slicing windows.
  • Uses FPS, snippet length, and stride identical to WdsStreamDataset, but upstream samples already include EFM-compliant keys produced by the adaptor (see below).

5.1.3 EfmModelAdaptor (dataset/efm_model_adaptor.py)

Bridges ATEK WebDataset samples to the EVL schema and enforces fixed shapes for batching.

  • Key remapping: get_dict_key_mapping_all() maps ATEK flattened keys to EFM names (mfcd#camera-rgb+images → rgb/img, mtd#ts_world_device → pose/t_world_rig, msdpd#points_world → points/p3s_world, etc.).
  • Padding and typing:
    • fixed_num_frames = snippet_length_s * freq (default 2 s @ 10 Hz → 20 frames).
    • Semidense lists padded to [T, N_max, 3|1] (semidense_points_pad_to_num=50k default); a padding sketch follows the entry-point example below.
    • Cameras converted to CameraTW with per-frame gains/exposures, shared intrinsics/extrinsics; duplicated calibrations get /calib/time_ns & /calib/snippet_time_s.
    • All images promoted to float32 and scaled to [0,1] (RGB stays RGB order).
  • Pose realignment:
    • Optional gravity fix: if ATEK world gravity is [0,-9.81,0], rotate to EFM’s [0,0,-9.81].
    • Split world pose into snippet/t_world_snippet (first frame) and pose/t_snippet_rig; duplicate to each camera */t_snippet_rig.
    • Timestamp split: snippet/time_ns = rgb/img/time_ns[0]; per-stream /snippet_time_s computed relative to it.
    • run_local_cosy() recentres the snippet coordinate frame at the origin_ratio timestamp (default 0.5) to stabilise interpolation, and transforms OBBs into the new snippet frame.
  • GT handling: OBB GT converted to ObbTW, padded to 128 slots, optional taxonomy remap (CSV) applied; obbs/time_ns stored alongside obbs/sem_id_to_name.
  • Entry points:
from efm3d.dataset.efm_model_adaptor import (
    load_atek_wds_dataset_as_efm,
    load_atek_wds_dataset_as_efm_train,
)

dataset = load_atek_wds_dataset_as_efm(
    urls="/data/ase_eval/train-{00000..00099}.tar",
    freq=10,
    snippet_length_s=2.0,
    atek_to_efm_taxonomy_mapping_file="atek_to_efm.csv",
    batch_size=1,
)
sample = next(iter(dataset))
assert sample["rgb/img"].shape == (20, 3, 1408, 1408)  # fixed frame count
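
The fixed-shape padding described above amounts to the following (a simplified stand-in; pad_points is our hypothetical helper, not the adaptor's actual code):

import torch

def pad_points(points_t, n_max=50_000):
    # Pad a per-frame [N, 3] point tensor to [n_max, 3] plus a validity mask,
    # mirroring semidense_points_pad_to_num described above (simplified).
    m = min(points_t.shape[0], n_max)
    padded = points_t.new_zeros(n_max, 3)
    padded[:m] = points_t[:m]
    mask = torch.zeros(n_max, dtype=torch.bool)
    mask[:m] = True
    return padded, mask

freq, snippet_length_s = 10, 2.0
fixed_num_frames = int(snippet_length_s * freq)  # 2 s @ 10 Hz -> 20 frames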

6 Model Components

6.1 VideoBackboneDinov2 (model/video_backbone.py)

  • Role: DINOv2 encoder returning frame-wise feature maps rgb/feat.
  • Usage:
import torch

from efm3d.model.video_backbone import VideoBackboneDinov2

backbone = VideoBackboneDinov2(model_name="dinov2_vitg14", img_size=1408)
features = backbone({"rgb/img": torch.randn(1, 10, 3, 1408, 1408)})

6.2 Lifter (model/lifter.py)

  • Role: lifts 2D features into a 3D voxel grid, producing voxel/feat, point masks, and free-space masks (a conceptual sketch of the sampling step follows the usage example).
  • Outputs: voxel/feat [B, C_out, D, H, W], voxel/pts_world [B, D·H·W, 3], voxel/T_world_voxel [B, 12].
  • Usage (conceptual—requires full adaptor batch):
from efm3d.model.lifter import Lifter

lifter = Lifter(
    in_dim=1024,
    out_dim=128,
    patch_size=16,
    voxel_size=[64, 64, 64],
    voxel_extent=[-4, 4, -4, 4, -1, 7],
)
outputs = lifter(batch)  # batch from EfmModelAdaptor
vol = outputs["voxel/feat"]
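
Conceptually, the lifting step projects voxel centres into each frame and samples the 2D feature map at those pixels; the sketch below shows that core idea for a single pinhole view (the real Lifter handles fisheye calibration, multiple frames, and validity masking; intrinsics here are made up):

import torch
import torch.nn.functional as F

def lift_features(feat2d, pts_cam, f, c):
    # Project camera-frame points to pixels, then bilinearly sample features.
    _, _, H, W = feat2d.shape
    uv = f * pts_cam[:, :2] / pts_cam[:, 2:3].clamp(min=1e-6) + c
    grid = torch.stack([uv[:, 0] / (W - 1), uv[:, 1] / (H - 1)], -1) * 2 - 1
    return F.grid_sample(feat2d, grid.view(1, 1, -1, 2), align_corners=True)

feat2d = torch.randn(1, 128, 88, 88)    # e.g. patch features for one frame
pts_cam = torch.rand(16 ** 3, 3) + 0.5  # voxel centres in the camera frame
vox_feat = lift_features(feat2d, pts_cam,
                         f=torch.tensor([44.0, 44.0]),
                         c=torch.tensor([44.0, 44.0]))  # [1, 128, 1, 4096]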

6.3 EVL & EfmInference

  • Role: EVL combines the lifter with occupancy and OBB heads; EfmInference wraps configuration and checkpoint loading for inference.
  • Usage:
from efm3d.inference.model import EfmInference

model = EfmInference(
    cfg_path="external/efm3d/efm3d/config/evl_inf.yaml",
    ckpt_path="./ckpt/model_release.pth",
)
outputs = model.forward(batch)
occ_logits = outputs["occ/logits"]  # [B, 1, D, H, W]

7 Fusion & Evaluation Utilities

7.1 VolumeFusion (inference/fuse.py)

  • Role: accumulates per-snippet occupancy logits into a global volume, weighting observations and masking uncertain boundary voxels (a simplified accumulation sketch follows the usage example).
  • Usage:
import torch

from efm3d.inference.fuse import VolumeFusion
from efm3d.aria.pose import PoseTW

fusion = VolumeFusion(voxel_size=[64, 64, 64], voxel_extent=[-4, 4, -4, 4, -1, 7])
local_logits = torch.rand(64, 64, 64)
# Identity local->world pose as a flattened 3x4 [R|t] matrix.
T_l_w = PoseTW(torch.eye(4)[:3].reshape(1, 12))
fusion.fuse(local_logits, local_extent=[-4, 4, -4, 4, -1, 7], T_l_w=T_l_w)
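
The accumulation idea can be pictured as a running weighted sum over a persistent grid; a simplified stand-in for what fuse does internally (the real implementation also resamples into the global frame and masks boundary voxels):

import torch

# Persistent global volume plus a per-voxel weight counter (simplified:
# assumes local and global grids are already aligned).
global_logits = torch.zeros(64, 64, 64)
global_weight = torch.zeros(64, 64, 64)

def fuse_step(local_logits, weight=1.0):
    # Accumulate one snippet's logits; the fused value is the weighted mean.
    global_logits.add_(weight * local_logits)
    global_weight.add_(weight)

fuse_step(torch.rand(64, 64, 64))
fused = global_logits / global_weight.clamp(min=1e-6)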

7.2 run_one (inference/pipeline.py)

  • Role: orchestrates dataset streaming, EVL inference, fusion, and metrics aggregation.
  • Usage:
from efm3d.inference.pipeline import run_one

run_one(
    input_path="datasets/ase_eval/81022",
    model_ckpt="./ckpt/model_release.pth",
    model_cfg="external/efm3d/efm3d/config/evl_inf.yaml",
    max_snip=16,
    snip_stride=0.1,
    voxel_res=0.04,
    output_dir="./output",
)

7.3 obb_eval_dataset (inference/eval.py)

  • Role: loads per-sequence detections and computes joint 3D detection mAP.
  • Usage:
from efm3d.inference.eval import obb_eval_dataset

joint_metrics = obb_eval_dataset("./output/model_release")

8 Geometry & Metric Utilities (efm3d.utils)

8.1 Point clouds & rays

  • get_points_world – convert depth or semi-dense points into world coordinates.
from efm3d.utils.pointcloud import get_points_world

points_world, sigma_d = get_points_world(batch)  # batch from the EFM adaptor

  • get_freespace_world – sample free-space points along camera rays.
import torch

from efm3d.utils.pointcloud import get_freespace_world
from efm3d.aria.pose import PoseTW

free_pts = get_freespace_world(
    batch,
    batch_idx=0,
    T_wv=PoseTW(torch.eye(4)[:3].reshape(1, 12)),  # identity [R|t], flattened
    vW=64,
    vH=64,
    vD=64,
    voxel_extent=torch.tensor([-4, 4, -4, 4, -1, 7], dtype=torch.float32),
)

  • pointcloud_to_occupancy_snippet – rasterise points into voxel occupancy masks.
import torch

from efm3d.utils.pointcloud import pointcloud_to_occupancy_snippet

occupancy = pointcloud_to_occupancy_snippet(
    points_world,  # from get_points_world above
    vW=64,
    vH=64,
    vD=64,
    voxel_extent=torch.tensor([-4, 4, -4, 4, -1, 7], dtype=torch.float32),
)

  • ray_grid – compute voxel traversal for rays (useful for custom visibility checks).
import torch

from efm3d.utils.ray import ray_grid

rays = torch.tensor([[0., 0., 0., 0., 0., 1.]])  # origin + unit direction
steps = ray_grid(
    rays,
    voxel_extent=torch.tensor([-4, 4, -4, 4, -1, 7], dtype=torch.float32),
    vW=64,
    vH=64,
    vD=64,
)

8.2 Mesh metrics

  • compute_pts_to_mesh_dist and eval_mesh_to_mesh – bidirectional point-to-mesh distances (Chamfer components).
from efm3d.utils.mesh_utils import compute_pts_to_mesh_dist, eval_mesh_to_mesh

dists = compute_pts_to_mesh_dist(points_world[0], faces, verts)  # faces/verts from the GT mesh
metrics, acc, comp = eval_mesh_to_mesh("pred.ply", "gt.ply")

8.3 Losses

  • compute_occ_losses – occupancy BCE + TV (a minimal sketch follows the example below).
  • compute_obb_losses – classification + IoU regression for OBB heads.
from efm3d.utils.evl_loss import compute_occ_losses

losses = compute_occ_losses(pred_logits, gt_labels, valid_mask)  # logits, GT labels, valid-voxel mask
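
For intuition, a minimal version of that objective: masked BCE plus a total-variation smoothness term over the occupancy probabilities (the TV weight is illustrative, not the trainer's value):

import torch
import torch.nn.functional as F

def occ_loss_sketch(logits, labels, mask, tv_weight=0.01):
    # Masked binary cross-entropy on valid voxels.
    bce = F.binary_cross_entropy_with_logits(logits[mask], labels[mask])
    # Total variation: mean absolute difference between neighbouring voxels.
    p = torch.sigmoid(logits)
    tv = ((p[1:] - p[:-1]).abs().mean()
          + (p[:, 1:] - p[:, :-1]).abs().mean()
          + (p[:, :, 1:] - p[:, :, :-1]).abs().mean())
    return bce + tv_weight * tv

loss = occ_loss_sketch(torch.randn(64, 64, 64),
                       torch.randint(0, 2, (64, 64, 64)).float(),
                       torch.ones(64, 64, 64, dtype=torch.bool))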

9 How These Symbols Support NBV & RRI

  • Oracle RRIs: use get_points_world for current reconstructions, fuse new predictions with VolumeFusion, and evaluate against GT meshes via compute_pts_to_mesh_dist to obtain the directed Chamfer terms used in rri_theory.qmd (see the sketch after this list).
  • Visibility & novelty: get_freespace_world, pointcloud_to_occupancy_snippet, and ray_grid (noted in efm3d/utils/ray.py) deliver accurate coverage estimates akin to GenNBV, but grounded in ASE geometry (ase_dataset.qmd).
  • Semantic weighting: ObbTW tensors and obb_eval_dataset expose per-class coverage and mAP scores that we can fold into task-specific NBV rewards.
  • Coordinate consistency: run_local_cosy, PoseTW.interpolate, and CameraTW.project keep candidate poses, fused reconstructions, and GT meshes in the same world frame—necessary when computing oracle metrics (oracle_rri_impl.qmd).
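
For orientation, each directed Chamfer term reduces to a nearest-neighbour mean. A brute-force point-to-point sketch (the oracle itself scores points against GT meshes via compute_pts_to_mesh_dist):

import torch

def directed_chamfer(src, dst):
    # Mean nearest-neighbour distance src -> dst: one directed Chamfer term.
    d = torch.cdist(src, dst)  # [Ns, Nd] pairwise distances
    return d.min(dim=1).values.mean()

pred = torch.rand(1000, 3)
gt = torch.rand(2000, 3)
# Symmetric Chamfer = accuracy term + completeness term (cf. rri_theory.qmd).
chamfer = directed_chamfer(pred, gt) + directed_chamfer(gt, pred)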

10 Quick Reference Table

Symbol Location Concept Why it matters
ARIA_* aria/aria_constants.py Dataset key schema Ensure loaders/oracles read/write tensors correctly.
PoseTW aria/pose.py SE(3) interpolation Align candidate viewpoints with snippet/world frames.
CameraTW aria/camera.py Fisheye projection Generate rays for coverage & rendering.
ObbTW aria/obb.py Oriented boxes Semantic coverage metrics & detection losses.
EfmModelAdaptor dataset/efm_model_adaptor.py Snippet normalisation Keeps EVL and oracle batches aligned.
Lifter model/lifter.py 2D→3D feature lifting Supplies volumetric priors for RRI.
VolumeFusion inference/fuse.py Occupancy fusion Accumulates evidence across viewpoints.
get_freespace_world utils/pointcloud.py Free-space sampling Coverage/information gain cues.
compute_pts_to_mesh_dist utils/mesh_utils.py Chamfer distance term Core of oracle RRI metric.
obb_eval_dataset inference/eval.py Dataset-level mAP Benchmarks semantic reconstruction quality.

Use this catalogue as a map when wiring EVL outputs into our oracle metrics or when you need the exact tensor layout for NBV experiments.