Glossary

metrics Relative Reconstruction Improvement (RRI) Theory

Chamfer Distance

CD core metrics.reconstruction_quality

Aliases Chamfer metric

Symbols

\(D\) (rri.cd_value), \(\mathcal{P}\) (oracle.points), \(\mathcal{M}^{\mathrm{GT}}\) (ase.mesh)

Equations

\(D(\mathcal{P},\mathcal{M}^{\mathrm{GT}})=D_{P\to M}(\mathcal{P},\mathcal{M}^{\mathrm{GT}})+D_{M\to P}(\mathcal{P},\mathcal{M}^{\mathrm{GT}})\) (rri.cd)

Historical bidirectional distance family used to compare reconstructed points against reference geometry.

Thesis-facing ARIA-NBV notation uses the point-mesh error \(D\) with directional components \(D_{P\to M}\) and \(D_{M\to P}\); older seminar material may still call this CD.
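
The two directional terms can be illustrated with a short NumPy sketch. It assumes the ground-truth mesh is approximated by points sampled on its surface (`mesh_samples`, a hypothetical input) and that each directional term is a mean squared nearest-neighbor distance. The source fixes neither choice, and the function names are illustrative only.

```python
import numpy as np

def directional_distance(src: np.ndarray, dst: np.ndarray) -> float:
    """Mean squared distance from each point in src to its nearest point in dst."""
    # Brute-force pairwise squared distances, shape (len(src), len(dst)).
    diff = src[:, None, :] - dst[None, :, :]
    sq = np.einsum("ijk,ijk->ij", diff, diff)
    return float(sq.min(axis=1).mean())

# Toy data: three surface samples on a line, two reconstructed points.
mesh_samples = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])

d_p_to_m = directional_distance(points, mesh_samples)  # accuracy-like term
d_m_to_p = directional_distance(mesh_samples, points)  # completeness-like term
total = d_p_to_m + d_m_to_p
```

In this toy case every reconstructed point lies on a surface sample, so the point-to-mesh term is zero and the whole error comes from the uncovered sample at x = 2.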

Links and references

RRI PC GT

Docs

References

Master Thesis Research Questions Relative Reconstruction Improvement (RRI) Theory

Target-Specific RRI

target RRI core metrics.reconstruction_quality

Aliases entity RRI object RRI target RRI

Symbols

\(\mathrm{RRI}_e\) (entity.rri_e), \(\mathcal{M}_e^{\mathrm{GT}}\) (ase.mesh_target)

Equations

\(\mathrm{RRI}_e(q)=\frac{D(\mathcal{P}_t^e,\mathcal{M}_e^{\mathrm{GT}})-D(\mathcal{P}_t^e\cup\mathcal{P}_q^e,\mathcal{M}_e^{\mathrm{GT}})}{D(\mathcal{P}_t^e,\mathcal{M}_e^{\mathrm{GT}})+\varepsilon}\) (rri.target_rri)

RRI computed only on the ground-truth and reconstructed geometry associated with a selected target of interest.

Target-specific RRI lets the thesis compare views by how much they improve a selected object or region, even when scene-level RRI would prefer large background surfaces.

\[ \mathrm{RRI}_e(q)=\frac{D(\mathcal{P}_t^e,\mathcal{M}_e^{\mathrm{GT}})-D(\mathcal{P}_t^e\cup\mathcal{P}_q^e,\mathcal{M}_e^{\mathrm{GT}})}{D(\mathcal{P}_t^e,\mathcal{M}_e^{\mathrm{GT}})+\varepsilon} \]
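
A small numeric sketch shows how scene-level and target-specific RRI can rank the same two views differently. All distances below are made-up illustrative values, and `rri` is a hypothetical helper that applies the formula to precomputed distances.

```python
def rri(d_before: float, d_after: float, eps: float = 1e-8) -> float:
    """Relative improvement of a distance after fusing a candidate view."""
    return (d_before - d_after) / (d_before + eps)

# Hypothetical distances (arbitrary units) before and after fusing each view.
scene_before, target_before = 1.00, 0.50
scene_after = {"view_a": 0.60, "view_b": 0.90}   # view_a covers a large wall
target_after = {"view_a": 0.48, "view_b": 0.25}  # view_b covers the target

scene_rri = {v: rri(scene_before, d) for v, d in scene_after.items()}
target_rri = {v: rri(target_before, d) for v, d in target_after.items()}

best_for_scene = max(scene_rri, key=scene_rri.get)
best_for_target = max(target_rri, key=target_rri.get)
print(best_for_scene, best_for_target)  # prints: view_a view_b
```

Only the points and mesh region used to compute the distances differ between the two scores; the formula itself is unchanged.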

Links and references

RRI target cost

Docs

References

target RRI target-conditioned scorer OBB

Target of Interest

target core entity.targeting

Aliases target entity object of interest inspection target task target

Symbols

\(e_t\) (rl.target), \(\mathcal{M}_e^{\mathrm{GT}}\) (ase.mesh_target)

Selected entity, object crop, point, region, or surface-deficit hypothesis whose reconstruction quality should be improved.

Target-conditioned ARIA-NBV variants use this target as an explicit input to candidate scoring and planning instead of optimizing only scene-level RRI.

Links and references

Docs

Master Thesis Research Questions types

Point Cloud

PC core reconstruction.representation

Aliases 3D points semi-dense point cloud

Symbols

\(\mathcal{P}\) (oracle.points), \(\mathcal{P}_q\) (oracle.points_q)

Set of 3D points representing observed scene geometry.

ARIA-NBV compares current and candidate-fused point clouds against ground-truth meshes to compute oracle RRI and surface-distance diagnostics.

Links and references

RRI oracle RRI SLAM

Docs

Semi-Dense Point Clouds Surface Reconstruction Metrics Relative Reconstruction Improvement (RRI) Theory

Candidate View

candidate core planning.action

Aliases candidate pose candidate viewpoint next-view candidate

Symbols

\(\mathcal{Q}_t\) (oracle.candidates_t), \(q_{t,i}\) (oracle.candidate_qti)

Proposed camera pose whose expected reconstruction utility is evaluated before selecting the next observation.

ARIA-NBV samples candidate views around a reference pose or target, renders candidate depths from the mesh for oracle labels, and scores candidates with RRI or learned VIN predictions.

Links and references

NBV oracle RRI target-conditioned scorer

Docs

Master Thesis Research Questions Relative Reconstruction Improvement (RRI) Theory CandidateViewGenerator

References

Relative Reconstruction Improvement (RRI) Theory Aria Synthetic Environments (ASE) Dataset Master Thesis Roadmap

Oracle RRI

oracle RRI core supervision.oracle

Aliases RRI oracle oracle label mesh-supervised RRI

Symbols

\(\mathrm{RRI}\) (oracle.rri), \(\mathcal{M}^{\mathrm{GT}}\) (ase.mesh)

Equations

\(\mathrm{RRI}(q)=\frac{D(\mathcal{P}_t,\mathcal{M}^{\mathrm{GT}})-D(\mathcal{P}_t\cup\mathcal{P}_q,\mathcal{M}^{\mathrm{GT}})}{D(\mathcal{P}_t,\mathcal{M}^{\mathrm{GT}})+\varepsilon}\) (rri.rri)

RRI label computed with privileged ground-truth geometry, used for supervised training and evaluation.

The current oracle renders candidate depth maps from ASE ground-truth meshes, backprojects candidate point clouds, fuses them with the current semi-dense reconstruction, and scores the resulting surface-distance improvement.
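
The last two steps of that pipeline, fusion and scoring, can be sketched as follows. The sketch assumes the candidate depth map has already been backprojected into `p_q`, approximates the mesh by surface samples, and uses a brute-force symmetric mean squared nearest-neighbor distance as a stand-in for \(D\). None of these names come from the ARIA-NBV code base.

```python
import numpy as np

def surface_distance(points: np.ndarray, mesh_samples: np.ndarray) -> float:
    """Symmetric mean squared nearest-neighbor distance (stand-in for D)."""
    sq = ((points[:, None, :] - mesh_samples[None, :, :]) ** 2).sum(axis=-1)
    # axis=1: nearest mesh sample per point; axis=0: nearest point per mesh sample.
    return float(sq.min(axis=1).mean() + sq.min(axis=0).mean())

def oracle_rri(p_t, p_q, mesh_samples, eps: float = 1e-8) -> float:
    """RRI of candidate points p_q given the current reconstruction p_t."""
    d_before = surface_distance(p_t, mesh_samples)
    fused = np.concatenate([p_t, p_q], axis=0)  # P_t united with P_q
    d_after = surface_distance(fused, mesh_samples)
    return (d_before - d_after) / (d_before + eps)

mesh_samples = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
p_t = np.array([[0.0, 0.0, 0.0]])  # current reconstruction
p_q = np.array([[2.0, 0.0, 0.0]])  # points contributed by the candidate view
score = oracle_rri(p_t, p_q, mesh_samples)  # about 0.8 for this toy scene
```

The small `eps` in the denominator only guards against a zero distance before fusion; it has no visible effect on the toy score.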

Links and references

RRI candidate ASE

Docs

References

@VIN-NBV-frahm2025 @EFM3D-straub2024

Target-Conditioned Scorer

target-conditioned scorer core model.scoring

Aliases target-conditioned VIN target-aware scorer

Symbols

\(\hat{r}\) (vin.rri_hat)

Equations

\(\rho=\operatorname{corr}(\operatorname{rank}(\hat{r}_i),\operatorname{rank}(r_i))\) (metrics.spearman)

\(\mathrm{TopKAcc}(k)=\frac{1}{N}\sum_i\mathbb{1}[y_i\in\mathrm{TopK}(\boldsymbol{\pi}_i,k)]\) (metrics.topk_acc)

VIN-style candidate scorer that receives scene state, a candidate view, and an encoding of the target of interest.

The scorer predicts target-specific utility so view ranking can prioritize a selected entity or region instead of only optimizing scene-level RRI.
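
The two ranking metrics listed for this entry can be sketched in NumPy. The sketch assumes ordinal ranks without tie handling for the Spearman correlation, and a `scores` array with one row per sample for top-k accuracy; the function names are illustrative, not project API.

```python
import numpy as np

def ranks(x: np.ndarray) -> np.ndarray:
    """Ordinal ranks 0..n-1 (assumes no ties)."""
    return np.argsort(np.argsort(x))

def spearman(pred: np.ndarray, target: np.ndarray) -> float:
    """Pearson correlation of the rank vectors."""
    return float(np.corrcoef(ranks(pred), ranks(target))[0, 1])

def topk_accuracy(scores: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of rows whose label index is among the k highest scores."""
    topk = np.argsort(-scores, axis=1)[:, :k]
    return float((topk == labels[:, None]).any(axis=1).mean())

pred = np.array([0.1, 0.4, 0.3, 0.9])    # predicted RRI per candidate
target = np.array([0.0, 0.5, 0.2, 0.8])  # oracle RRI per candidate
rho = spearman(pred, target)             # same ordering, so rho is 1

scores = np.array([[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]])
labels = np.array([1, 1])
acc_at_1 = topk_accuracy(scores, labels, k=1)
acc_at_2 = topk_accuracy(scores, labels, k=2)
```

For real evaluations with tied predictions, a tie-aware rank function would be needed in place of the double `argsort`.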

Links and references

target target RRI VIN Q_H

Docs

Master Thesis Research Questions model_v3 VinModelV3

References

Master Thesis Research Questions

Observed Target Selection

OBS-SEL core protocol.targeting

Aliases observed-only target selection actor-visible target selection

Main thesis protocol component requiring target selection to use only actor-visible observed or predicted target evidence.

Observed Target Selection uses predicted or tracked OBBs, class probabilities, confidence, projected area, and semi-dense or EVL support. Ground-truth target annotations are not visible to the selector in the main thesis protocol.

Links and references

target PRED-Q GT-EVAL OBB

Docs

References

@EFM3D-straub2024 @ProjectAria-ASE-2025

Predicted-Target Q

PRED-Q core protocol.learning

Aliases predicted-target scorer predicted-target Q_H actor-visible Q

Symbols

\(Q_H\) (rl.qh)

Equations

\(Q_H(s_t^{\mathrm{cf0}},a_t)=\mathbb{E}\left[G_t^{(H)}\mid s_t=s_t^{\mathrm{cf0}},a_t\right]\) (rl.q_h)

Main thesis protocol component requiring scorer or Q_H inputs to use predicted or observed target descriptors.

Predicted-Target Q covers target-conditioned one-step scoring and finite-candidate Q_H selection whose target inputs are actor-visible predicted or observed descriptors, not ground-truth target annotations.

Links and references

OBS-SEL GT-EVAL target-conditioned scorer target RRI Q_H

Docs

Master Thesis Research Questions

References

@VIN-NBV-frahm2025 @DoubleDQN-vanHasselt2015

Ground-Truth Target Evaluation

GT-EVAL core protocol.evaluation

Aliases GT target evaluation GT target crop evaluation oracle target evaluation

Symbols

\(\mathcal{M}_e^{\mathrm{GT}}\) (ase.mesh_target)

Main thesis protocol component using ground-truth OBBs and target mesh crops only for labels and evaluation.

Ground-Truth Target Evaluation uses GT OBBs and target mesh crops for oracle target-RRI labels, matching checks, and evaluation while keeping those annotations hidden from the actor-visible selector, scorer, and Q_H model in the main result.

Links and references

GT target RRI OBS-SEL PRED-Q

Docs

Master Thesis Research Questions Relative Reconstruction Improvement (RRI) Theory

References

Master Thesis Research Questions Master Thesis Roadmap

Acquisition Cost

cost core planning.objective

Aliases capture budget view budget motion cost

Symbols

\(C(\tau)\) (rl.acquisition_cost)

Budget consumed to acquire observations, measured by view count, path length, elapsed time, invalid-action rate, or a weighted combination.

The thesis should first report cumulative root-normalized target gain, diagnostic target RRI, and acquisition cost separately, then use scalarized objectives only when the tradeoff is explicit.

Links and references

target RRI candidate NBV

Target-Conditioned NBV MDP

NBV MDP core planning.mdp

Aliases ARIA-NBV MDP rollout MDP finite-candidate NBV MDP

Symbols

\(\mathcal{M}_{\mathrm{NBV}}\) (rl.mdp_nbv), \(s\) (rl.s), \(\mathcal{A}(s_t)\) (rl.action_set), \(T\) (rl.transition), \(r_t^e\) (rl.reward_target), \(\gamma\) (rl.gamma), \(H\) (rl.H)

Equations

\(\mathcal{M}_{\mathrm{NBV}}=(\mathcal{S},\mathcal{A},T,r_e,\gamma,H)\) (rl.nbv_mdp)

Finite-horizon MDP contract for target-conditioned ARIA-NBV rollouts and fitted Q_H training.

The ARIA-NBV MDP keeps actions restricted to sampled finite candidate views and keeps GT meshes or GT target crops outside the actor-visible state. It is the contract that connects target-conditioned rollout generation, reward computation, validity masks, and fitted finite-horizon Q learning.

ARIA-NBV MDP contract

\[ \mathcal{M}_{\mathrm{NBV}}=(\mathcal{S},\mathcal{A},T,r_e,\gamma,H) \]

Links and references

state action set transition reward return Q_H mask

Docs

Finite-Candidate Rollout And Q_H Contract Master Thesis Research Questions Master Thesis Roadmap

References

@VIN-NBV-frahm2025 @DoubleDQN-vanHasselt2015

Rollout State

state core planning.mdp

Aliases MDP state actor-visible state rollout observation

Symbols

\(s\) (rl.s), \(s_t^{\mathrm{hist}}\) (rl.s_hist), \(s_t^{\mathrm{off}}\) (rl.s_off), \(s_t^{\mathrm{cf0}}\) (rl.s_cf0), \(s_t^{\mathrm{cf+}}\) (rl.s_cf_geom), \(s_t^{\mathrm{oracle}}\) (rl.s_oracle), \(\mathcal{P}\) (oracle.points), \(\mathcal{Q}_t\) (oracle.candidates_t), \(m_{t,i}\) (rl.validity_mask), \(\rho_{t,i}\) (rl.invalid_reason), \(e_t\) (rl.target), \(b_t\) (rl.budget)

Equations

\(s_t^{\mathrm{hist}}=(I_{1:t},T_{1:t},P_{1:t}^{\mathrm{semi}},V^{\mathrm{root}},e_t,b_t)\) (rl.s_hist)

\(s_t^{\mathrm{off}}=(\mathrm{VinSnippetView},\mathcal{Q}_t,N_t,m_{t,i},\ell_{t,i})\) (rl.s_off)

\(s_t^{\mathrm{cf0}}=(V^{\mathrm{root}},\mathcal{P}_t,\mathcal{Q}_t,m_{t,i},\rho_{t,i},e_t,b_t)\) (rl.s_cf0)

\(s_t^{\mathrm{cf+}}=(s_t^{\mathrm{cf0}},D_{1:t}^{\mathrm{sel}},P_{1:t}^{\mathrm{sel}},N_{1:t}^{\mathrm{sel}})\) (rl.s_cf_geom)

\(s_t^{\mathrm{oracle}}=(s_t^{\mathrm{cf+}},\mathcal{M}^{\mathrm{GT}},\mathcal{M}_e^{\mathrm{GT}},\{D_{t,i}^{\mathrm{GT}},\mathcal{P}_{t,i}^{\mathrm{GT}},\mathrm{RRI}_{t,i}\}_{i=1}^{N_t})\) (rl.s_oracle)

Rollout state family separating actor-visible state from oracle-only supervision.

ARIA-NBV distinguishes raw historic snippet state, persisted VIN offline sample state, minimal counterfactual actor state, geometry-rich counterfactual ablation state, and privileged oracle rollout state. The main Q_H actor input starts from the minimal counterfactual state; all-candidate GT renders, GT mesh crops, and oracle RRI labels remain outside the actor-visible state.

Links and references

historic state offline state CF0 state CF+ state oracle state action set mask OBS-SEL PRED-Q

Docs

Finite-Candidate Rollout And Q_H Contract Master Thesis Research Questions

Historic Snippet State

historic state core planning.state

Aliases raw historic state logged snippet state historic observed state

Symbols

\(s_t^{\mathrm{hist}}\) (rl.s_hist)

Equations

\(s_t^{\mathrm{hist}}=(I_{1:t},T_{1:t},P_{1:t}^{\mathrm{semi}},V^{\mathrm{root}},e_t,b_t)\) (rl.s_hist)

Raw actor-visible state from the logged ASE/Project Aria snippet trajectory.

The historic snippet state is the richest non-privileged state because it comes from the original logged trajectory. It may contain calibrated camera streams, timestamps, trajectory and gravity estimates, semi-dense points with support fields, frozen EVL/EFM evidence, and observed or predicted OBBs. It must not contain the GT mesh or GT OBB crops as actor inputs.

Links and references

offline state EVL OBB

Docs

Finite-Candidate Rollout And Q_H Contract Project Aria EFM3D and EVL

References

@projectaria-engel2023 @EFM3D-straub2024

Persisted Offline Sample State

offline state core planning.state

Aliases offline sample state VIN offline state persisted VIN state

Symbols

\(s_t^{\mathrm{off}}\) (rl.s_off)

Equations

\(s_t^{\mathrm{off}}=(\mathrm{VinSnippetView},\mathcal{Q}_t,N_t,m_{t,i},\ell_{t,i})\) (rl.s_off)

Compact persisted state used by VIN training and offline diagnostics.

The persisted offline sample state is not the full raw snippet. It is the compact immutable training and diagnostic payload: VinSnippetView, candidate poses/cameras/counts, labels and oracle metrics, optional candidate depths, compact OBB fields, trajectory metadata, and selected EVL numeric tensors needed to reproduce scoring diagnostics.

Links and references

historic state vin-nbv oracle RRI

Docs

Finite-Candidate Rollout And Q_H Contract VinModelV3

Minimal Counterfactual Actor State

CF0 state core planning.state

Aliases minimal counterfactual state counterfactual actor state CF0 rollout state

Symbols

\(s_t^{\mathrm{cf0}}\) (rl.s_cf0)

Equations

\(s_t^{\mathrm{cf0}}=(V^{\mathrm{root}},\mathcal{P}_t,\mathcal{Q}_t,m_{t,i},\rho_{t,i},e_t,b_t)\) (rl.s_cf0)

Main Q_H actor state for mesh-supervised counterfactual rollouts.

The minimal counterfactual actor state is the default input to target-conditioned Q_H. It contains the accumulated counterfactual point proxy as broad scene state, optional lifted image-foundation point features, local root EVL evidence for target support and local reads, selected-action history, observed or predicted target descriptor, budget state, finite candidate table, validity masks, reason codes, and current-state candidate-query features. Synthetic observations update the state only after their candidate is selected.

Links and references

CF+ state action set transition Q_H

Docs

Finite-Candidate Rollout And Q_H Contract EFM3D Scene Embeddings Master Thesis Research Questions

Geometry-Rich Counterfactual State

CF+ state core planning.state

Aliases geometry-rich counterfactual state counterfactual geometry state CF+ rollout state

Symbols

\(s_t^{\mathrm{cf+}}\) (rl.s_cf_geom)

Equations

\(s_t^{\mathrm{cf+}}=(s_t^{\mathrm{cf0}},D_{1:t}^{\mathrm{sel}},P_{1:t}^{\mathrm{sel}},N_{1:t}^{\mathrm{sel}})\) (rl.s_cf_geom)

Counterfactual ablation state with selected synthetic geometry observations.

The geometry-rich counterfactual state adds only selected prior synthetic observations to the minimal state. It may include rendered depth, depth-valid masks, backprojected points, derived normals, and local support summaries for views that have already been selected. It does not include oracle renders for unselected candidates.

Links and references

CF0 state transition oracle state

Docs

Finite-Candidate Rollout And Q_H Contract Master Thesis Research Questions

Oracle Rollout State

oracle state core planning.state

Aliases oracle state privileged rollout state GT rollout state

Symbols

\(s_t^{\mathrm{oracle}}\) (rl.s_oracle)

Equations

\(s_t^{\mathrm{oracle}}=(s_t^{\mathrm{cf+}},\mathcal{M}^{\mathrm{GT}},\mathcal{M}_e^{\mathrm{GT}},\{D_{t,i}^{\mathrm{GT}},\mathcal{P}_{t,i}^{\mathrm{GT}},\mathrm{RRI}_{t,i}\}_{i=1}^{N_t})\) (rl.s_oracle)

Privileged rollout state for labels, upper bounds, and evaluation.

The oracle rollout state may contain GT mesh geometry, GT target crops, GT OBBs, all-candidate synthetic depth and point clouds, derived normals, mesh-face visibility, Chamfer/RRI terms, and oracle scores. These fields support label generation and diagnostics but are not actor-visible inputs for the main scorer or Q_H model.

Links and references

GT-EVAL oracle RRI target RRI CF0 state

Docs

Finite-Candidate Rollout And Q_H Contract Relative Reconstruction Improvement (RRI) Theory

References

Master Thesis Research Questions Finite-Candidate Rollout And Q_H Contract CandidateViewGenerator

Finite Candidate Action Set

action set core planning.mdp

Aliases candidate action set masked candidate set finite NBV action space

Symbols

\(\mathcal{A}(s_t)\) (rl.action_set), \(q_{t,i}\) (oracle.candidate_qti), \(\mathcal{Q}_t\) (oracle.candidates_t), \(m_{t,i}\) (rl.validity_mask)

Equations

\(\mathcal{Q}_t=\{q_{t,i}\}_{i=1}^{N_t},\quad \mathcal{A}(s_t)=\{i\in\{1,\ldots,N_t\}:m_{t,i}=1\},\quad q_t=q_{t,a_t}\) (rl.finite_action_set)

Masked finite action-index set over sampled candidate views.

At each rollout step, ARIA-NBV samples a finite candidate table \(\mathcal{Q}_t=\{q_{t,i}\}\). The admissible action set contains the indices \(i\) whose validity mask \(m_{t,i}\) is true, and selecting \(a_t\) chooses the pose \(q_t=q_{t,a_t}\). This keeps planning bounded and preserves invalidity as a feasibility constraint rather than a low-quality RRI label.

Masked finite action-index set

\[ \mathcal{Q}_t=\{q_{t,i}\}_{i=1}^{N_t},\quad \mathcal{A}(s_t)=\{i\in\{1,\ldots,N_t\}:m_{t,i}=1\},\quad q_t=q_{t,a_t} \]
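
The masked action-index set can be sketched with NumPy. Indices are zero-based here, unlike the one-based notation in the equation, and the candidate labels, scores, and function names are hypothetical.

```python
import numpy as np

def admissible_actions(valid_mask: np.ndarray) -> np.ndarray:
    """Indices i with m_{t,i} = 1."""
    return np.flatnonzero(valid_mask)

def select_action(scores: np.ndarray, valid_mask: np.ndarray) -> int:
    """Highest-scoring admissible index; invalid candidates can never win."""
    if not valid_mask.any():
        raise ValueError("no admissible candidate at this step")
    masked = np.where(valid_mask, scores, -np.inf)
    return int(np.argmax(masked))

candidates = ["pose_0", "pose_1", "pose_2"]  # stand-ins for q_{t,i}
valid_mask = np.array([False, True, True])   # m_{t,i}
scores = np.array([0.9, 0.2, 0.5])           # any per-candidate utility

actions = admissible_actions(valid_mask)  # indices 1 and 2
a_t = select_action(scores, valid_mask)   # 2, although index 0 scores highest
q_t = candidates[a_t]
```

Masking with negative infinity, instead of assigning a low score, keeps infeasible candidates out of the argmax without turning them into low-utility training examples.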

Links and references

candidate mask state

Counterfactual Transition

transition core planning.mdp

Aliases rollout transition candidate fusion update counterfactual update

Symbols

\(T\) (rl.transition), \(\mathcal{P}\) (oracle.points), \(\mathcal{P}_q\) (oracle.points_q)

Equations

\(\mathcal{P}_{t+1}=\mathcal{P}_t\cup\mathcal{P}_{q_t}\) (rl.counterfactual_transition)

Replayable state update after selecting a candidate index.

For the thesis-core ASE mesh/oracle loop, the transition uses the selected candidate index to render or retrieve that candidate’s depth, backproject points, and update the counterfactual point state and selected-view history. All-candidate GT renders and scores remain oracle-only before selection. The update must be deterministic under the stored seed and lineage.

Point-state transition

\[ \mathcal{P}_{t+1}=\mathcal{P}_t\cup\mathcal{P}_{q_t} \]
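
A minimal sketch of the point-state update, assuming the union is implemented as array concatenation without deduplication and that per-candidate points are looked up only for the selected index. `PointState` and `transition` are hypothetical names, not project classes.

```python
from dataclasses import dataclass

import numpy as np

@dataclass(frozen=True)
class PointState:
    points: np.ndarray    # (N, 3) accumulated counterfactual points
    selected: tuple = ()  # history of selected candidate indices

def transition(state: PointState, candidate_points: dict, action: int) -> PointState:
    """Fuse only the selected candidate's points; the input state is not modified."""
    fused = np.concatenate([state.points, candidate_points[action]], axis=0)
    return PointState(points=fused, selected=state.selected + (action,))

s0 = PointState(points=np.array([[0.0, 0.0, 0.0]]))
candidate_points = {
    0: np.array([[1.0, 0.0, 0.0]]),
    1: np.array([[2.0, 0.0, 0.0], [3.0, 0.0, 0.0]]),
}
s1 = transition(s0, candidate_points, action=1)
```

Because the function has no hidden randomness and returns a new state, replaying the same action sequence reproduces the same point state.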

Links and references

state PC oracle RRI

Docs

Finite-Candidate Rollout And Q_H Contract Master Thesis Research Questions CounterfactualPoseGenerator

Target-RRI Reward

reward core metrics.reconstruction_quality

Aliases target reward quality reward target-specific RRI reward

Symbols

\(r_t^e\) (rl.reward_target), \(\mathrm{RRI}_e\) (entity.rri_e), \(\mathcal{P}\) (oracle.points), \(\mathcal{M}_e^{\mathrm{GT}}\) (ase.mesh_target)

Equations

\(r_t^e=\mathrm{RRI}_e(q_t\mid \mathcal{P}_t,\mathcal{M}_e^{\mathrm{GT}})\) (rl.target_rri_reward)

Quality-only immediate reward equal to root-normalized target gain for the selected candidate.

The main thesis reward is cumulative root-normalized target gain under equal acquisition budget. State-relative target RRI remains a one-step diagnostic and VIN-compatible label; log-improvement variants remain visible follow-up reward ablations, not the default target for the first Q_H result.

Target-RRI reward

\[ r_t^e=\mathrm{RRI}_e(q_t\mid \mathcal{P}_t,\mathcal{M}_e^{\mathrm{GT}}) \]

Log-improvement follow-up

\[ r_t^{\log,e}=\log(D(\mathcal{P}_t^e,\mathcal{M}_e^{\mathrm{GT}})+\varepsilon)-\log(D(\mathcal{P}_{t+1}^e,\mathcal{M}_e^{\mathrm{GT}})+\varepsilon) \]
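
One property of the log-improvement variant follows directly from its definition: undiscounted rewards telescope, so their sum depends only on the first and last distance. The sketch below checks this on made-up target distances and shows the state-relative RRI rewards alongside; helper names are illustrative.

```python
import math

EPS = 1e-8

def rri_reward(d_t: float, d_next: float, eps: float = EPS) -> float:
    """State-relative target RRI for one step."""
    return (d_t - d_next) / (d_t + eps)

def log_reward(d_t: float, d_next: float, eps: float = EPS) -> float:
    """Log-improvement reward for one step."""
    return math.log(d_t + eps) - math.log(d_next + eps)

# Hypothetical target distances D(P_t^e, M_e^GT) along a two-step rollout.
distances = [1.0, 0.5, 0.25]
steps = list(zip(distances[:-1], distances[1:]))

rri_rewards = [rri_reward(a, b) for a, b in steps]  # about 0.5 at each step
log_rewards = [log_reward(a, b) for a, b in steps]
log_total = sum(log_rewards)

# Telescoping: the sum equals log(D_0 + eps) - log(D_H + eps).
endpoint_only = math.log(distances[0] + EPS) - math.log(distances[-1] + EPS)
```

Here each step halves the distance, so both reward types are constant per step, but only the log rewards add up to a quantity determined by the endpoints alone.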

Links and references

target RRI return cost

Docs

Master Thesis Research Questions Finite-Candidate Rollout And Q_H Contract Master Thesis Roadmap

References

Finite-Candidate Rollout And Q_H Contract Master Thesis Research Questions

Finite-Horizon Return

return core planning.objective

Aliases H-step return bounded return cumulative target root gain

Symbols

\(G_t^{(H)}\) (rl.return_h), \(r_t^e\) (rl.reward_target), \(\gamma\) (rl.gamma), \(H\) (rl.H)

Equations

\(G_t^{(H)}=\sum_{k=0}^{H-1}\gamma^k r_{t+k}^e\) (rl.finite_horizon_return)

H-step discounted return over root-normalized target-gain rewards.

The return definition keeps gamma symbolic so discounted ablations remain possible. The first thesis result should report cumulative root-normalized target gain under an equal acquisition budget and treat log-improvement or scalarized rewards as follow-up analysis.

Finite-horizon return

\[ G_t^{(H)}=\sum_{k=0}^{H-1}\gamma^k r_{t+k}^e \]
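
The return is a short truncated sum; a sketch with made-up rewards is below. The function name and the reward values are illustrative assumptions.

```python
def finite_horizon_return(rewards, gamma: float, horizon: int) -> float:
    """Sum of gamma**k * r_{t+k} for k = 0 .. horizon-1."""
    return sum(gamma ** k * r for k, r in enumerate(rewards[:horizon]))

rewards = [0.5, 0.2, 0.1, 0.4]  # hypothetical per-step target rewards from step t

g_discounted = finite_horizon_return(rewards, gamma=0.9, horizon=3)    # 0.5 + 0.18 + 0.081
g_undiscounted = finite_horizon_return(rewards, gamma=1.0, horizon=3)  # 0.5 + 0.2 + 0.1
```

With `gamma=1.0` the return reduces to the plain cumulative gain over the first `horizon` steps; the fourth reward lies outside the horizon and is ignored in both calls.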

Links and references

reward Q_H

Finite-Horizon Q Function

Q_H core model.value

Aliases Q_H candidate-query Q_H bounded Q function finite-candidate Q fitted Double-Q head

Symbols

\(Q_H\) (rl.qh), \(G_t^{(H)}\) (rl.return_h), \(s_t^{\mathrm{cf0}}\) (rl.s_cf0), \(a\) (rl.a)

Equations

\(Q_H(s_t^{\mathrm{cf0}},a_t)=\mathbb{E}\left[G_t^{(H)}\mid s_t=s_t^{\mathrm{cf0}},a_t\right]\) (rl.q_h)

\(y_t^Q=r_t+\gamma V(s_{t+1})\) (rl.q_backup)

Finite-horizon candidate-value function for target-conditioned ARIA-NBV.

The mandatory M5 learned policy-like result is Q_H over finite candidate sets. The first-path architecture uses candidate-to-state query attention: encode \(s_t^{\mathrm{cf0}}\), the actor-visible target descriptor \(z_e\), selected-view history, budget state, scene-memory summaries, and candidate tokens, then emit one continuous return value per candidate. DQN contributes replayed transition learning and Bellman-style finite-action value targets; Double DQN contributes the masked online-selector/target-evaluator backup to reduce max-over-candidate overestimation; IQL contributes the offline support rule that value learning must not query invalid, ungenerated, or unavailable actions. Q_H must respect validity masks and beat one-step greedy or model scoring on cumulative root-normalized target gain under equal acquisition budget, with bounded oracle lookahead as an upper bound.

Finite-horizon candidate value

\[ Q_H(s_t^{\mathrm{cf0}},a_t,z_e)=\mathbb{E}\left[G_t^{(H)}\mid s_t=s_t^{\mathrm{cf0}},a_t,z_e\right] \]

Masked Double-DQN selector

\[ j^*=\arg\max_{j:m_{t+1,j}=1}Q_\theta(s_{t+1}^{\mathrm{cf0}},a_{t+1,j},z_e) \]

Masked Double-DQN target

\[ y_t=r_t^e+\gamma(1-d_t)Q_{\bar\theta}(s_{t+1}^{\mathrm{cf0}},a_{t+1,j^*},z_e) \]
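
The masked selector and target can be sketched for a single transition. The Q-values below are made up, the function name is hypothetical, and treating a step with no admissible next action like a terminal step is an assumption of this sketch, not something the source specifies.

```python
import numpy as np

def masked_double_dqn_target(reward, done, q_online_next, q_target_next,
                             valid_next, gamma):
    """Online network selects among valid actions; target network evaluates."""
    if done or not np.any(valid_next):
        # Terminal step, or (assumption) no admissible next action: no bootstrap.
        return float(reward)
    masked = np.where(valid_next, q_online_next, -np.inf)
    j_star = int(np.argmax(masked))  # selector uses online Q, valid indices only
    return float(reward + gamma * q_target_next[j_star])

q_online_next = np.array([2.0, 1.0, 0.5])  # Q_theta at s_{t+1}
q_target_next = np.array([0.3, 0.7, 0.9])  # Q_theta_bar at s_{t+1}
valid_next = np.array([False, True, True]) # m_{t+1,j}

y = masked_double_dqn_target(0.1, False, q_online_next, q_target_next,
                             valid_next, gamma=0.9)
y_terminal = masked_double_dqn_target(0.1, True, q_online_next, q_target_next,
                                      valid_next, gamma=0.9)
```

Index 0 has the largest online value but is masked out, so the selector picks index 1 and the target evaluates that index with the target network, giving 0.1 + 0.9 * 0.7.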

Links and references

return CF0 state PRED-Q mask

Docs

Master Thesis Research Questions Master Thesis Roadmap Finite-Candidate Rollout And Q_H Contract RL Sources For Rollout And Q_H

References

@DBLP:journals/corr/MnihKSGAWR13 @DoubleDQN-vanHasselt2015 @IQL-kostrikov2021

Validity Mask

mask core planning.constraints

Aliases candidate validity mask action mask invalid action mask

Symbols

\(m_{t,i}\) (rl.validity_mask), \(\rho_{t,i}\) (rl.invalid_reason), \(m\) (vin.cand_valid)

Equations

\(m_i=\mathbb{1}[\mathrm{finite}]\mathbb{1}[v_i>0]\mathbb{1}[v_i^{\mathrm{sem}}>0]\) (metrics.candidate_validity)

\(\mathcal{Q}_t=\{q_{t,i}\}_{i=1}^{N_t},\quad \mathcal{A}(s_t)=\{i\in\{1,\ldots,N_t\}:m_{t,i}=1\},\quad q_t=q_{t,a_t}\) (rl.finite_action_set)

Hard mask that separates feasible candidate actions from invalid candidates.

The mask \(m_{t,i}\) gates candidate actions, while invalid reason codes \(\rho_{t,i}\) preserve why a candidate was rejected. Collision, outside-bounds poses, no target visibility, bad frusta, no depth hits, and outside-EVL-extent cases are constraints rather than low target-RRI examples.
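
A sketch of how a mask and reason codes might be derived together is below. It reads \(v_i\) as a count of valid depth hits and \(v_i^{\mathrm{sem}}\) as a count of supporting semi-dense points, which is an interpretation and not stated in the source; the `Reason` codes and their priority order are likewise hypothetical.

```python
from enum import IntEnum

import numpy as np

class Reason(IntEnum):  # hypothetical reason codes
    VALID = 0
    NON_FINITE = 1
    NO_DEPTH_HITS = 2
    NO_SEMIDENSE_SUPPORT = 3

def validity(label: np.ndarray, depth_hits: np.ndarray, semidense_hits: np.ndarray):
    """Return (mask, reason) per candidate; the mask is the product of three checks."""
    finite = np.isfinite(label)
    has_depth = depth_hits > 0
    has_sem = semidense_hits > 0
    mask = finite & has_depth & has_sem
    reason = np.full(mask.shape, int(Reason.VALID), dtype=np.int64)
    # Assign in reverse priority so the highest-priority reason is written last.
    reason[~has_sem] = int(Reason.NO_SEMIDENSE_SUPPORT)
    reason[~has_depth] = int(Reason.NO_DEPTH_HITS)
    reason[~finite] = int(Reason.NON_FINITE)
    return mask, reason

label = np.array([0.2, np.nan, 0.1, 0.3])
depth_hits = np.array([10, 5, 0, 7])
semidense_hits = np.array([3, 2, 4, 0])
mask, reason = validity(label, depth_hits, semidense_hits)
```

Keeping the reason array next to the mask lets rejected candidates be analyzed later without ever entering the training set as low-RRI examples.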

Links and references

action set Q_H reward

Docs

Master Thesis Research Questions CandidateSamplingResult

Project Aria

core dataset.project_aria

Aliases Aria Project Aria ecosystem

Egocentric research-device and tooling ecosystem for calibrated, time-aligned multimodal sensing.

ARIA-NBV treats Project Aria and its MPS-style products as the actor-visible sensing contract, while ASE meshes remain offline supervision and evaluation assets.

Links and references

MPS ASE VIO

Docs

Project Aria Aria Synthetic Environments (ASE) Dataset

References

@projectaria-engel2023

Dataset

Aria Digital Twin

ADT background dataset.aria

Aliases ADT

Project Aria dataset with real-world captures and digital-twin scene annotations.

ADT is adjacent to ARIA-NBV’s sim-to-real context; the current experiments focus on ASE and EFM3D/ATEK exports, while ADT remains a relevant transfer surface.

Links and references

ASE AEO

Docs

Aria Synthetic Environments (ASE) Dataset

References

Aria Synthetic Environments (ASE) Dataset

Aria Everyday Objects

AEO background dataset.aria

Aliases AEO

Small-scale real-world Project Aria object dataset used by EFM3D for egocentric 3D perception evaluation.

AEO is relevant as a possible sim-to-real check for ARIA-NBV ideas that are first developed on ASE mesh-supervised snippets.

Links and references

ASE ADT

Docs

References

Setup Instructions Aria Synthetic Environments (ASE) Dataset

Virtual Reality Standard

VRS background dataset.file_format

Aliases VRS file Project Aria VRS

File format used by Project Aria tooling to store multi-modal sensor recordings efficiently.

VRS is part of the upstream Project Aria ecosystem, while ARIA-NBV primarily consumes ASE/ATEK-derived tensorized snippets for current experiments.

Links and references

snippet MFCD MTD

Docs

References

Aria Synthetic Environments (ASE) Dataset

Central Pupil Frame

CPF background dataset.frames

Aliases central pupil coordinate frame

Coordinate frame placed at the midpoint between the left and right eye boxes of Project Aria glasses.

CPF is used by Project Aria for gaze-related quantities and should remain distinct from rig, camera, world, and PyTorch3D frames in ARIA-NBV docs.

Links and references

LUF VIO

Docs

References

Aria Synthetic Environments (ASE) Dataset Semi-Dense Point Clouds

Machine Perception Services

MPS background dataset.project_aria

Aliases Project Aria MPS

Project Aria processing services that derive pose, mapping, gaze, hand, and related perception outputs from sensor recordings.

ARIA-NBV mainly depends on MPS-style pose and semi-dense mapping products as the current reconstruction state for RRI computation.

Links and references

SLAM PC VIO

Docs

References

Aria Synthetic Environments (ASE) Dataset efm_dataset VinModelV3

Snippet

snippet support dataset.sample

Aliases temporal window ASE snippet

Short synchronized temporal window of Aria sensor data used as one EVL/VIN input sample.

A snippet typically contains RGB or grayscale streams, poses, calibration, semi-dense points, and scene metadata that EVL lifts into a voxel grid.

Links and references

ASE EVL VIN

Docs

References

SceneScript Aria Synthetic Environments (ASE) Dataset

SceneScript Language

SSL background dataset.scene_representation

Aliases Structured Scene Language SceneScript structured language

Structured language representation for indoor scene layout using primitives such as walls, doors, windows, and objects.

SceneScript is relevant to ARIA-NBV as a possible semantic/global planning layer and as one source of ASE scene-structure context.

Links and references

ASE target NBV

Docs

References

@SceneScript-avetisyan2024

Motion Trajectory Data

MTD background dataset.stream

Aliases trajectory data pose stream

Device poses over time, usually represented as a sequence of 6-DoF transformations.

MTD supplies the logged egocentric trajectory used to define snippet state, current reconstruction context, and candidate-view reference poses.

Links and references

snippet VIO candidate

Docs

Aria Synthetic Environments (ASE) Dataset efm_views

References

Aria Synthetic Environments (ASE) Dataset EFM3D and EVL

Multi-Frame Camera Data

MFCD background dataset.stream

Aliases multi-camera frame data synchronized camera data

Synchronized camera streams from multiple Project Aria cameras over a temporal window.

MFCD provides the egocentric image evidence consumed by EVL and aligned with pose, calibration, and semi-dense point streams.

Links and references

snippet EVL VIO

Docs

References

Semi-Dense Point Clouds Relative Reconstruction Improvement (RRI) Theory

Multi-Semi-Dense Point Data

MSDPD background dataset.stream

Aliases multi-frame semi-dense points semi-dense point stream

Semi-dense 3D point observations generated by SLAM-style processing across a snippet or trajectory window.

MSDPD is the sparse observed geometry that ARIA-NBV fuses with candidate point clouds and compares against ground-truth meshes for RRI labels.

Links and references

PC SLAM RRI

Docs

References

Aria Synthetic Environments (ASE) Dataset Setup Instructions

Aria Synthetic Environments

ASE support dataset.synthetic

Aliases Aria Synthetic Environment ASE dataset

Large-scale synthetic indoor dataset with simulated Project Aria sensor characteristics, egocentric trajectories, and scene annotations.

ARIA-NBV uses ASE snippets and the public mesh-supervised subset to generate oracle RRI labels from ground-truth meshes and semi-dense point clouds.

Links and references

snippet EFM3D oracle RRI

Docs

References

Aria Synthetic Environments (ASE) Dataset Master Thesis Research Questions types

Geometry

Oriented Bounding Box

OBB support geometry.annotation

Aliases oriented box object box

3D bounding box with arbitrary orientation, used to represent object extent more tightly than an axis-aligned box.

OBBs are a natural target encoding for entity-aware ARIA-NBV because they provide center, extent, orientation, semantic class, and confidence signals.

Links and references

target target-conditioned scorer EVL

Docs

References

Relative Reconstruction Improvement (RRI) Theory VinModelV3

Frustum

frustum support geometry.camera

Aliases viewing frustum camera frustum

Truncated pyramidal camera-visible volume bounded by near and far clipping planes plus lateral field-of-view planes.

Candidate-view frusta define which scene surfaces can project into a camera image and are therefore central to visibility, rendering, and RRI diagnostics.

Links and references

candidate EVL PC

Docs

References

@Frustum-Wikipedia-2025

Left-Up-Forward

LUF support geometry.frames

Aliases LUF frame left up forward

Camera coordinate convention whose x axis points left, y axis points up, and z axis points forward.

LUF is one of the coordinate-frame conventions that must stay explicit when moving between Project Aria cameras, PyTorch3D cameras, and ARIA-NBV candidate poses.
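
As an illustration of keeping conventions explicit, the sketch below converts camera-frame points from LUF to a right-down-forward (RDF) convention. RDF is chosen only as an example target; which convention a given library or dataset uses should be checked against its own documentation.

```python
import numpy as np

# LUF: x left, y up, z forward. RDF: x right, y down, z forward.
# Flipping x and y is a 180 degree rotation about the forward (z) axis.
LUF_TO_RDF = np.diag([-1.0, -1.0, 1.0])

def luf_to_rdf(points_luf: np.ndarray) -> np.ndarray:
    """Re-express (N, 3) camera-frame points from LUF axes in RDF axes."""
    return points_luf @ LUF_TO_RDF.T

p_luf = np.array([[1.0, 2.0, 3.0]])  # 1 left, 2 up, 3 forward
p_rdf = luf_to_rdf(p_luf)            # -1 right, -2 down, 3 forward
```

The matrix has determinant +1, so the conversion is a proper rotation and does not change handedness; applying it twice returns the original coordinates.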

Links and references

candidate frustum

Docs

05 Coordinate Conventions 12F Appendix Pose Frames

Degrees of Freedom

DoF background geometry.pose

Aliases DoF

Number of independent pose parameters available to a camera, object, or action representation.

ARIA-NBV distinguishes full 6-DoF pose estimation from reduced 5-DoF candidate or action spaces used for practical view planning.

Links and references

candidate

Docs

05 Coordinate Conventions 12F Appendix Pose Frames

Six Degrees of Freedom

6DoF background geometry.pose

Aliases 6-DoF 6 degrees of freedom

Pose parameterization with three translational and three rotational degrees of freedom.

Project Aria poses, candidate cameras, and world-to-device transforms are usually represented as 6-DoF rigid-body transforms.

Links and references

candidate DoF

Docs

05 Coordinate Conventions 12F Appendix Pose Frames

Metrics

Coverage Ratio

CR support metrics.coverage

Aliases coverage ratio

Fraction of a target surface, scene, or region treated as observed under a chosen visibility or distance threshold.

Coverage ratio is a useful diagnostic baseline for NBV, but ARIA-NBV treats reconstruction-quality improvement through RRI as the preferred optimization target.
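
One possible distance-threshold variant of coverage can be sketched as follows, assuming the target surface is represented by sampled points and a sample counts as observed when some reconstructed point lies within `threshold` of it. The names and the threshold value are illustrative.

```python
import numpy as np

def coverage_ratio(surface_samples: np.ndarray, points: np.ndarray,
                   threshold: float) -> float:
    """Fraction of surface samples with a reconstructed point within threshold."""
    diff = surface_samples[:, None, :] - points[None, :, :]
    nearest = np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)
    return float((nearest <= threshold).mean())

surface_samples = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                            [2.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
points = np.array([[0.0, 0.0, 0.05], [1.02, 0.0, 0.0]])
cr = coverage_ratio(surface_samples, points, threshold=0.1)  # 2 of 4 samples
```

Unlike RRI, this score does not change when already-covered samples are reconstructed more accurately, which is one reason it serves as a diagnostic baseline and not as the optimization target.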

Links and references

RRI candidate

Docs

NBV Background Relative Reconstruction Improvement (RRI) Theory

References

Aria Synthetic Environments (ASE) Dataset Relative Reconstruction Improvement (RRI) Theory

Area Under Curve

AUC support metrics.evaluation

Aliases AUC

Aggregate score computed by integrating a metric curve over an acquisition, threshold, or ranking axis.

AUC can summarize how quickly reconstruction quality, coverage, or ranking quality improves as views are acquired or candidate thresholds change.
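
A trapezoidal-rule sketch over an acquisition axis is below. The quality values are made up, and dividing by the axis range to obtain a mean-height score is an optional convention assumed here.

```python
def trapezoid_auc(x, y) -> float:
    """Area under the piecewise-linear curve through (x[i], y[i])."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i])
               for i in range(len(x) - 1))

views = [0, 1, 2, 3]             # number of acquired views
quality = [0.0, 0.5, 0.7, 0.8]   # hypothetical cumulative gain after each view

auc = trapezoid_auc(views, quality)             # 0.25 + 0.6 + 0.75
auc_normalized = auc / (views[-1] - views[0])   # mean curve height over the axis
```

Two policies that end at the same final quality can still differ in AUC: the one that improves earlier accumulates more area.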

Links and references

cost RRI

Docs

NBV Background

Ground Truth

GT support metrics.reference

Aliases reference annotation oracle reference

Reference data or annotations treated as the trusted target for training, validation, or evaluation.

In ARIA-NBV, ground truth usually refers to ASE meshes, object annotations, poses, or labels used to compute oracle supervision and diagnostic metrics.

Links and references

oracle RRI RRI OBB

Model

Egocentric Foundation Model 3D

EFM3D support model.backbone

Aliases EFM egocentric foundation model

Egocentric 3D foundation-model stack used as the frozen spatial backbone for ARIA-NBV candidate scoring.

The current project uses EFM3D and its EVL architecture to expose voxel occupancy, centerness, semantic, and OBB evidence for VIN-style RRI prediction.

Links and references

EVL VIN ASE

Docs

EFM3D and EVL VinModelV3 EvlBackbone

References

EFM3D and EVL EFM3D Scene Embeddings EvlBackbone VinModelV3

Egocentric Voxel Lifting

EVL support model.backbone

Aliases voxel lifting EVL backbone

EFM3D architecture that lifts synchronized egocentric observations into a gravity-aligned 3D voxel feature volume.

ARIA-NBV uses EVL head outputs, pre-head features, and OBB predictions as local actor-visible evidence and target support for VIN-style RRI prediction. Broader NBV scene embeddings may combine semidense or fused point state with lifted image-foundation features.

Links and references

EFM3D VIN OBB

Docs

References

@EFM3D-straub2024 @EVL-Doc-2025

View Introspection Network

VIN support model.scoring

Aliases VIN scorer VIN-NBV model Aria-VIN-NBV

Learned candidate-view scorer that predicts RRI or an ordinal RRI-derived utility without capturing the candidate view.

ARIA-NBV adapts VIN-NBV by placing a lightweight RRI prediction head on top of frozen EVL features and candidate-pose evidence from ASE snippets.

Links and references

RRI target-conditioned scorer EVL

Docs

VIN-NBV model_v3 VinModelV3

References

Master Thesis Roadmap 10A Extensions

Planning

Five Degrees of Freedom

5DoF background planning.action_space

Aliases 5-DoF 5 degrees of freedom

Reduced camera-action parameterization with three translational and two rotational degrees of freedom, commonly used when roll is fixed or otherwise constrained.

ARIA-NBV uses 5-DoF language for candidate and planning abstractions where position and viewing direction matter while roll is not an independently optimized control dimension.
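One way to see why roll drops out is to build the camera axes from a viewing direction alone. The sketch below assumes a z-up world frame, yaw measured about z from the +x axis, and pitch as elevation above the horizontal plane; these conventions and the helper names are illustrative assumptions and may differ from the project's coordinate conventions.

```python
import math

def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def normalize(v):
    norm = math.sqrt(sum(c * c for c in v))
    if norm < 1e-9:
        raise ValueError("degenerate direction")
    return tuple(c / norm for c in v)

def camera_axes_from_5dof(yaw, pitch):
    """Forward, right, and up axes for a roll-free camera (z-up world assumed)."""
    forward = (
        math.cos(pitch) * math.cos(yaw),
        math.cos(pitch) * math.sin(yaw),
        math.sin(pitch),
    )
    world_up = (0.0, 0.0, 1.0)
    # Roll is fixed by keeping the right axis horizontal.
    # This fails when looking straight up or down, where roll is ambiguous.
    right = normalize(cross(forward, world_up))
    up = cross(right, forward)
    return forward, right, up

# A 5-DoF candidate is then a position plus (yaw, pitch).
position = (0.0, 0.0, 1.5)
forward, right, up = camera_axes_from_5dof(yaw=0.0, pitch=0.0)
```

The position together with these axes determines a full 6-DoF pose, so the reduction removes a control dimension without losing the ability to express the resulting camera as a rigid-body transform.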

Links and references

candidate NBV DoF

Docs

References

@GenNBV-chen2024

Reconstruction

Track

track background reconstruction.features

Aliases feature track 2D track SLAM track

Temporal sequence of corresponding image-feature detections across frames, usually carrying per-frame image coordinates, timestamps, and camera IDs.

Tracks can be triangulated or optimized into 3D points, making them a bridge between image evidence and the semi-dense point clouds used by ARIA-NBV.

\[ \mathcal{T}=\{(u_k,v_k,\mathrm{cam}_k,t_k)\}_{k=0}^{N} \]
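The set notation above maps naturally onto a small record type. The following sketch mirrors the tuple fields (u, v, cam, t) one to one; the class names, the string camera IDs, and the example values are hypothetical and do not reflect a specific Project Aria data schema.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class TrackObservation:
    u: float    # image x-coordinate in pixels
    v: float    # image y-coordinate in pixels
    cam: str    # camera ID (hypothetical string label)
    t: float    # timestamp in seconds

@dataclass
class Track:
    observations: List[TrackObservation]

    def sorted_by_time(self) -> List[TrackObservation]:
        return sorted(self.observations, key=lambda o: o.t)

    def duration(self) -> float:
        """Time span between the first and last observation."""
        times = [o.t for o in self.observations]
        return max(times) - min(times) if times else 0.0

    def cameras(self) -> set:
        return {o.cam for o in self.observations}

# Assumed toy track: one feature seen three times by two cameras.
track = Track([
    TrackObservation(u=120.5, v=80.0, cam="cam0", t=0.00),
    TrackObservation(u=123.0, v=81.5, cam="cam0", t=0.10),
    TrackObservation(u=310.2, v=79.0, cam="cam1", t=0.05),
])
```

With the index running from k = 0 to N, a track holds N + 1 observations; triangulation needs at least two of them taken from different viewpoints.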

Links and references

PC SLAM VIO

Docs

Semi-Dense Point Clouds

Multi-view Stereo

MVS background reconstruction.geometry

Aliases multi-view reconstruction

Reconstruction family that estimates dense or semi-dense scene geometry from multiple calibrated views.

MVS is part of the broader reconstruction context for NBV, although ARIA-NBV’s current oracle labels are built from ASE meshes and Aria-style semi-dense point observations.

Links and references

PC NBV

Docs

NBV Background

Visual-Inertial Odometry

VIO background reconstruction.pose

Aliases visual inertial odometry inertial visual odometry

Pose-estimation method that combines visual measurements from cameras with inertial measurements from IMUs.

Project Aria pose and mapping products build on visual-inertial estimation, and ARIA-NBV depends on those pose streams for snippet state and candidate frame contracts.

Links and references

SLAM MTD

Docs

Aria Synthetic Environments (ASE) Dataset Semi-Dense Point Clouds

References

Semi-Dense Point Clouds Aria Synthetic Environments (ASE) Dataset

Simultaneous Localization and Mapping

SLAM support reconstruction.state

Aliases visual SLAM mapping and localization

Method for estimating sensor motion while building a map of the surrounding scene.

In ARIA-NBV, logged SLAM poses and semi-dense points form the current reconstruction state against which candidate-view RRI is evaluated.

Links and references

PC snippet RRI

Docs

References

Active 3DGS and Targeted NBV

Representation

3D Gaussian Splatting

3DGS background representation.radiance_field

Aliases 3D Gaussian Splatting 3DGS Gaussian splatting

Explicit radiance-field representation using optimized 3D Gaussian primitives for real-time novel-view synthesis.

ARIA-NBV treats active 3DGS work as related literature on view proposal, uncertainty estimation, and simulator bridging, not as a replacement for ASE mesh-supervised RRI.

Links and references

NBV RRI

Docs

References

@GaussianSplatting-kerbl2023

Occupancy Grid

support representation.spatial

Aliases occupancy volume

Spatial grid whose cells encode whether space is occupied, free, or unknown, either as discrete states or as an occupancy probability.

Occupancy-style voxel evidence appears in EVL outputs and VIN feature construction, where it helps summarize local 3D scene state for candidate scoring.
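A minimal sparse occupancy grid can be sketched by binning points into voxel indices. This only illustrates the data structure: it assumes an axis-aligned grid with a chosen origin and voxel size, marks a voxel occupied when any point falls inside it, and leaves free space to a separate ray-casting step that is not shown. It is not how EVL produces its occupancy outputs, and all names are hypothetical.

```python
import math

def voxel_index(point, origin, voxel_size):
    """Integer (i, j, k) index of the voxel containing a 3D point."""
    return tuple(
        math.floor((p - o) / voxel_size) for p, o in zip(point, origin)
    )

def occupied_voxels(points, origin, voxel_size):
    """Set of voxel indices that contain at least one point."""
    return {voxel_index(p, origin, voxel_size) for p in points}

def voxel_state(index, occupied, free):
    """Three-state lookup: occupied, free, or unknown."""
    if index in occupied:
        return "occupied"
    if index in free:
        return "free"
    return "unknown"

# Assumed toy data: four points binned into 0.5 m voxels.
points = [(0.1, 0.1, 0.1), (0.15, 0.12, 0.1), (0.9, 0.1, 0.1), (-0.1, 0.0, 0.0)]
occupied = occupied_voxels(points, origin=(0.0, 0.0, 0.0), voxel_size=0.5)
free = {(0, 1, 0)}  # would normally come from ray casting along sensor rays
```

A probabilistic variant would store a per-voxel occupancy probability or log-odds value instead of a set membership flag.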

Links and references

EVL VIN

Docs

VinModelV3 06 Architecture

References