ARIA-NBV

Glossary

56 terms across 10 categories. The canonical source is docs/typst/shared/glossary.typ; generated Quarto, Typst, YAML, KG, and shortcode artifacts share the same definitions.

Tier counts: 27 core, 13 support, and 16 background terms. Core terms make up the thesis math lookup; background terms remain linkable but are visually demoted.

Use {{< gls term-id >}} for the linked short label and {{< glsfull term-id >}} for the linked full label in QMD pages.

Core (math lookup): 27; Dataset: 11; Geometry: 5; Metrics: 3; Model: 3; Planning: 1; Reconstruction: 4; Representation: 2

Core Math Lookup

Concept Meaning Symbols Equations
NBV - Next-Best-View Problem of selecting the next sensor viewpoint to improve an active reconstruction or inspection objective under a limited acquisition budget.
\(\mathcal{M}_{\mathrm{NBV}}\)rl.mdp_nbv
\(\mathcal{M}_{\mathrm{NBV}}=(\mathcal{S},\mathcal{A},T,r_e,\gamma,H)\)rl.nbv_mdp
RRI - Relative Reconstruction Improvement Metric quantifying the relative reconstruction-quality improvement obtained by adding a candidate observation to the current reconstruction.
\(\mathrm{RRI}\)oracle.rri
\(\mathrm{RRI}(q)=\frac{D(\mathcal{P}_t,\mathcal{M}^{\mathrm{GT}})-D(\mathcal{P}_t\cup\mathcal{P}_q,\mathcal{M}^{\mathrm{GT}})}{D(\mathcal{P}_t,\mathcal{M}^{\mathrm{GT}})+\varepsilon}\)rri.rri
CD - Chamfer Distance Historical bidirectional distance family used to compare reconstructed points against reference geometry.
\(D\)rri.cd_value \(\mathcal{P}\)oracle.points \(\mathcal{M}^{\mathrm{GT}}\)ase.mesh
\(D(\mathcal{P},\mathcal{M}^{\mathrm{GT}})=D_{P\to M}(\mathcal{P},\mathcal{M}^{\mathrm{GT}})+D_{M\to P}(\mathcal{P},\mathcal{M}^{\mathrm{GT}})\)rri.cd
target RRI - Target-Specific RRI RRI computed only on the ground-truth and reconstructed geometry associated with a selected target of interest.
\(\mathrm{RRI}_e\)entity.rri_e \(\mathcal{M}_e^{\mathrm{GT}}\)ase.mesh_target
\(\mathrm{RRI}_e(q)=\frac{D(\mathcal{P}_t^e,\mathcal{M}_e^{\mathrm{GT}})-D(\mathcal{P}_t^e\cup\mathcal{P}_q^e,\mathcal{M}_e^{\mathrm{GT}})}{D(\mathcal{P}_t^e,\mathcal{M}_e^{\mathrm{GT}})+\varepsilon}\)rri.target_rri
target - Target of Interest Selected entity, object crop, point, region, or surface-deficit hypothesis whose reconstruction quality should be improved.
\(e_t\)rl.target \(\mathcal{M}_e^{\mathrm{GT}}\)ase.mesh_target
-
PC - Point Cloud Set of 3D points representing observed scene geometry.
\(\mathcal{P}\)oracle.points \(\mathcal{P}_q\)oracle.points_q
-
candidate - Candidate View Proposed camera pose whose expected reconstruction utility is evaluated before selecting the next observation.
\(\mathcal{Q}_t\)oracle.candidates_t \(q_{t,i}\)oracle.candidate_qti
-
oracle RRI - Oracle RRI RRI label computed with privileged ground-truth geometry, used for supervised training and evaluation.
\(\mathrm{RRI}\)oracle.rri \(\mathcal{M}^{\mathrm{GT}}\)ase.mesh
\(\mathrm{RRI}(q)=\frac{D(\mathcal{P}_t,\mathcal{M}^{\mathrm{GT}})-D(\mathcal{P}_t\cup\mathcal{P}_q,\mathcal{M}^{\mathrm{GT}})}{D(\mathcal{P}_t,\mathcal{M}^{\mathrm{GT}})+\varepsilon}\)rri.rri
target-conditioned scorer - Target-Conditioned Scorer VIN-style candidate scorer that receives scene state, a candidate view, and an encoding of the target of interest.
\(\hat{r}\)vin.rri_hat
\(\rho=\operatorname{corr}(\operatorname{rank}(\hat{r}_i),\operatorname{rank}(r_i))\)metrics.spearman \(\mathrm{TopKAcc}(k)=\frac{1}{N}\sum_i\mathbb{1}[y_i\in\mathrm{TopK}(\boldsymbol{\pi}_i,k)]\)metrics.topk_acc
OBS-SEL - Observed Target Selection Main thesis protocol component requiring target selection to use only actor-visible observed or predicted target evidence. - -
PRED-Q - Predicted-Target Q Main thesis protocol component requiring scorer or Q_H inputs to use predicted or observed target descriptors.
\(Q_H\)rl.qh
\(Q_H(s_t^{\mathrm{cf0}},a_t)=\mathbb{E}\left[G_t^{(H)}\mid s_t=s_t^{\mathrm{cf0}},a_t\right]\)rl.q_h
GT-EVAL - Ground-Truth Target Evaluation Main thesis protocol component using ground-truth OBBs and target mesh crops only for labels and evaluation.
\(\mathcal{M}_e^{\mathrm{GT}}\)ase.mesh_target
-
cost - Acquisition Cost Budget consumed to acquire observations, measured by view count, path length, elapsed time, invalid-action rate, or a weighted combination.
\(C(\tau)\)rl.acquisition_cost
-
NBV MDP - Target-Conditioned NBV MDP Finite-horizon MDP contract for target-conditioned ARIA-NBV rollouts and fitted Q_H training.
\(\mathcal{M}_{\mathrm{NBV}}\)rl.mdp_nbv \(s\)rl.s \(\mathcal{A}(s_t)\)rl.action_set \(T\)rl.transition \(r_t^e\)rl.reward_target \(\gamma\)rl.gamma \(H\)rl.H
\(\mathcal{M}_{\mathrm{NBV}}=(\mathcal{S},\mathcal{A},T,r_e,\gamma,H)\)rl.nbv_mdp
state - Rollout State Rollout state family separating actor-visible state from oracle-only supervision.
\(s\)rl.s \(s_t^{\mathrm{hist}}\)rl.s_hist \(s_t^{\mathrm{off}}\)rl.s_off \(s_t^{\mathrm{cf0}}\)rl.s_cf0 \(s_t^{\mathrm{cf+}}\)rl.s_cf_geom \(s_t^{\mathrm{oracle}}\)rl.s_oracle \(\mathcal{P}\)oracle.points \(\mathcal{Q}_t\)oracle.candidates_t \(m_{t,i}\)rl.validity_mask \(\rho_{t,i}\)rl.invalid_reason \(e_t\)rl.target \(b_t\)rl.budget
\(s_t^{\mathrm{hist}}=(I_{1:t},T_{1:t},P_{1:t}^{\mathrm{semi}},V^{\mathrm{root}},e_t,b_t)\)rl.s_hist \(s_t^{\mathrm{off}}=(\mathrm{VinSnippetView},\mathcal{Q}_t,N_t,m_{t,i},\ell_{t,i})\)rl.s_off \(s_t^{\mathrm{cf0}}=(V^{\mathrm{root}},\mathcal{P}_t,\mathcal{Q}_t,m_{t,i},\rho_{t,i},e_t,b_t)\)rl.s_cf0 \(s_t^{\mathrm{cf+}}=(s_t^{\mathrm{cf0}},D_{1:t}^{\mathrm{sel}},P_{1:t}^{\mathrm{sel}},N_{1:t}^{\mathrm{sel}})\)rl.s_cf_geom \(s_t^{\mathrm{oracle}}=(s_t^{\mathrm{cf+}},\mathcal{M}^{\mathrm{GT}},\mathcal{M}_e^{\mathrm{GT}},\{D_{t,i}^{\mathrm{GT}},\mathcal{P}_{t,i}^{\mathrm{GT}},\mathrm{RRI}_{t,i}\}_{i=1}^{N_t})\)rl.s_oracle
historic state - Historic Snippet State Raw actor-visible state from the logged ASE/Project Aria snippet trajectory.
\(s_t^{\mathrm{hist}}\)rl.s_hist
\(s_t^{\mathrm{hist}}=(I_{1:t},T_{1:t},P_{1:t}^{\mathrm{semi}},V^{\mathrm{root}},e_t,b_t)\)rl.s_hist
offline state - Persisted Offline Sample State Compact persisted state used by VIN training and offline diagnostics.
\(s_t^{\mathrm{off}}\)rl.s_off
\(s_t^{\mathrm{off}}=(\mathrm{VinSnippetView},\mathcal{Q}_t,N_t,m_{t,i},\ell_{t,i})\)rl.s_off
CF0 state - Minimal Counterfactual Actor State Main Q_H actor state for mesh-supervised counterfactual rollouts.
\(s_t^{\mathrm{cf0}}\)rl.s_cf0
\(s_t^{\mathrm{cf0}}=(V^{\mathrm{root}},\mathcal{P}_t,\mathcal{Q}_t,m_{t,i},\rho_{t,i},e_t,b_t)\)rl.s_cf0
CF+ state - Geometry-Rich Counterfactual State Counterfactual ablation state with selected synthetic geometry observations.
\(s_t^{\mathrm{cf+}}\)rl.s_cf_geom
\(s_t^{\mathrm{cf+}}=(s_t^{\mathrm{cf0}},D_{1:t}^{\mathrm{sel}},P_{1:t}^{\mathrm{sel}},N_{1:t}^{\mathrm{sel}})\)rl.s_cf_geom
oracle state - Oracle Rollout State Privileged rollout state for labels, upper bounds, and evaluation.
\(s_t^{\mathrm{oracle}}\)rl.s_oracle
\(s_t^{\mathrm{oracle}}=(s_t^{\mathrm{cf+}},\mathcal{M}^{\mathrm{GT}},\mathcal{M}_e^{\mathrm{GT}},\{D_{t,i}^{\mathrm{GT}},\mathcal{P}_{t,i}^{\mathrm{GT}},\mathrm{RRI}_{t,i}\}_{i=1}^{N_t})\)rl.s_oracle
action set - Finite Candidate Action Set Masked finite action-index set over sampled candidate views.
\(\mathcal{A}(s_t)\)rl.action_set \(q_{t,i}\)oracle.candidate_qti \(\mathcal{Q}_t\)oracle.candidates_t \(m_{t,i}\)rl.validity_mask
\(\mathcal{Q}_t=\{q_{t,i}\}_{i=1}^{N_t},\quad \mathcal{A}(s_t)=\{i\in\{1,\ldots,N_t\}:m_{t,i}=1\},\quad q_t=q_{t,a_t}\)rl.finite_action_set
transition - Counterfactual Transition Replayable state update after selecting a candidate index.
\(T\)rl.transition \(\mathcal{P}\)oracle.points \(\mathcal{P}_q\)oracle.points_q
\(\mathcal{P}_{t+1}=\mathcal{P}_t\cup\mathcal{P}_{q_t}\)rl.counterfactual_transition
reward - Target-RRI Reward Quality-only immediate reward equal to root-normalized target gain for the selected candidate.
\(r_t^e\)rl.reward_target \(\mathrm{RRI}_e\)entity.rri_e \(\mathcal{P}\)oracle.points \(\mathcal{M}_e^{\mathrm{GT}}\)ase.mesh_target
\(r_t^e=\mathrm{RRI}_e(q_t\mid \mathcal{P}_t,\mathcal{M}_e^{\mathrm{GT}})\)rl.target_rri_reward
return - Finite-Horizon Return H-step discounted return over root-normalized target-gain rewards.
\(G_t^{(H)}\)rl.return_h \(r_t^e\)rl.reward_target \(\gamma\)rl.gamma \(H\)rl.H
\(G_t^{(H)}=\sum_{k=0}^{H-1}\gamma^k r_{t+k}^e\)rl.finite_horizon_return
Q_H - Finite-Horizon Q Function Finite-horizon candidate-value function for target-conditioned ARIA-NBV.
\(Q_H\)rl.qh \(G_t^{(H)}\)rl.return_h \(s_t^{\mathrm{cf0}}\)rl.s_cf0 \(a\)rl.a
\(Q_H(s_t^{\mathrm{cf0}},a_t)=\mathbb{E}\left[G_t^{(H)}\mid s_t=s_t^{\mathrm{cf0}},a_t\right]\)rl.q_h \(y_t^Q=r_t+\gamma V(s_{t+1})\)rl.q_backup
mask - Validity Mask Hard mask that separates feasible candidate actions from invalid candidates.
\(m_{t,i}\)rl.validity_mask \(\rho_{t,i}\)rl.invalid_reason \(m\)vin.cand_valid
\(m_i=\mathbb{1}[\mathrm{finite}]\mathbb{1}[v_i>0]\mathbb{1}[v_i^{\mathrm{sem}}>0]\)metrics.candidate_validity \(\mathcal{Q}_t=\{q_{t,i}\}_{i=1}^{N_t},\quad \mathcal{A}(s_t)=\{i\in\{1,\ldots,N_t\}:m_{t,i}=1\},\quad q_t=q_{t,a_t}\)rl.finite_action_set
Project Aria Egocentric research-device and tooling ecosystem for calibrated, time-aligned multimodal sensing. - -

Core Concepts

Next-Best-View

NBV core planning.view_selection

Aliases next best view viewpoint selection view planning

Symbols
\(\mathcal{M}_{\mathrm{NBV}}\)rl.mdp_nbv
Equations
\(\mathcal{M}_{\mathrm{NBV}}=(\mathcal{S},\mathcal{A},T,r_e,\gamma,H)\)rl.nbv_mdp

Problem of selecting the next sensor viewpoint to improve an active reconstruction or inspection objective under a limited acquisition budget.

In ARIA-NBV, the preferred objective is reconstruction quality improvement rather than coverage alone, with candidate views ranked by oracle or learned RRI scores.

Links and references
Related
candidate RRI VIN
Docs
NBV Background Master Thesis Roadmap Master Thesis Research Questions
References
@VIN-NBV-frahm2025 @GenNBV-chen2024
Relative Reconstruction Improvement

RRI core metrics.reconstruction_quality

Aliases reconstruction-quality gain relative quality improvement

Symbols
\(\mathrm{RRI}\)oracle.rri
Equations
\(\mathrm{RRI}(q)=\frac{D(\mathcal{P}_t,\mathcal{M}^{\mathrm{GT}})-D(\mathcal{P}_t\cup\mathcal{P}_q,\mathcal{M}^{\mathrm{GT}})}{D(\mathcal{P}_t,\mathcal{M}^{\mathrm{GT}})+\varepsilon}\)rri.rri

Metric quantifying the relative reconstruction-quality improvement obtained by adding a candidate observation to the current reconstruction.

ARIA-NBV computes RRI by comparing reconstruction error before and after fusing candidate-view geometry, usually against an ASE ground-truth mesh for oracle supervision and evaluation.

\[ \mathrm{RRI}(q)=\frac{D(\mathcal{P}_t,\mathcal{M}^{\mathrm{GT}})-D(\mathcal{P}_t\cup\mathcal{P}_q,\mathcal{M}^{\mathrm{GT}})}{D(\mathcal{P}_t,\mathcal{M}^{\mathrm{GT}})+\varepsilon} \]
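
As a minimal sketch of the formula above, assuming the two error values \(D(\mathcal{P}_t,\mathcal{M}^{\mathrm{GT}})\) and \(D(\mathcal{P}_t\cup\mathcal{P}_q,\mathcal{M}^{\mathrm{GT}})\) have already been computed elsewhere. The function name, the default `eps`, and the example numbers are illustrative assumptions, not the project's API or results.

```python
def relative_reconstruction_improvement(
    d_before: float, d_after: float, eps: float = 1e-8
) -> float:
    """RRI(q) = (D_before - D_after) / (D_before + eps)."""
    if d_before < 0 or d_after < 0:
        raise ValueError("distances must be non-negative")
    return (d_before - d_after) / (d_before + eps)


# Hypothetical numbers: the error drops from 0.20 to 0.15 after fusing the candidate.
print(round(relative_reconstruction_improvement(0.20, 0.15), 4))  # 0.25
```

A candidate that increases the error yields a negative value, and the `eps` term keeps the ratio finite when the current error is already zero.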

Links and references
Related
oracle RRI target RRI candidate
Docs
Relative Reconstruction Improvement (RRI) Theory oracle_rri VIN-NBV
References
@VIN-NBV-frahm2025
Chamfer Distance

CD core metrics.reconstruction_quality

Aliases Chamfer metric

Symbols
\(D\)rri.cd_value \(\mathcal{P}\)oracle.points \(\mathcal{M}^{\mathrm{GT}}\)ase.mesh
Equations
\(D(\mathcal{P},\mathcal{M}^{\mathrm{GT}})=D_{P\to M}(\mathcal{P},\mathcal{M}^{\mathrm{GT}})+D_{M\to P}(\mathcal{P},\mathcal{M}^{\mathrm{GT}})\)rri.cd

Historical bidirectional distance family used to compare reconstructed points against reference geometry.

Thesis-facing ARIA-NBV notation uses point-mesh error \(D\) with directional components \(D_{P\to M}\) and \(D_{M\to P}\); older seminar material may still call this CD.
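
The bidirectional structure can be illustrated with a small brute-force sketch. It assumes the reference mesh is approximated by a finite list of surface sample points and that each directional term is a mean nearest-neighbour Euclidean distance; the actual ARIA-NBV point-mesh error may use true point-to-triangle distances or a different reduction. All names here are hypothetical.

```python
import math

Point = tuple[float, float, float]


def directed_distance(src: list[Point], dst: list[Point]) -> float:
    """Mean distance from each point in src to its nearest neighbour in dst."""
    if not src or not dst:
        raise ValueError("both point sets must be non-empty")
    return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)


def bidirectional_distance(points: list[Point], mesh_samples: list[Point]) -> float:
    """D = D_{P->M} + D_{M->P}, with the mesh replaced by surface samples (assumption)."""
    return directed_distance(points, mesh_samples) + directed_distance(mesh_samples, points)


recon = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
# P->M is 0 (every reconstructed point lies on the reference);
# M->P is 1/3 (one of three reference samples is 1 unit from the reconstruction).
print(bidirectional_distance(recon, reference))
```

In this toy case the \(D_{P\to M}\) term is zero because the reconstruction is accurate where it exists, while the \(D_{M\to P}\) term is positive because part of the reference is not covered.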

Links and references
Related
RRI PC GT
Docs
metrics Relative Reconstruction Improvement (RRI) Theory
References
@VIN-NBV-frahm2025
Target-Specific RRI

target RRI core metrics.reconstruction_quality

Aliases entity RRI object RRI target RRI

Symbols
\(\mathrm{RRI}_e\)entity.rri_e \(\mathcal{M}_e^{\mathrm{GT}}\)ase.mesh_target
Equations
\(\mathrm{RRI}_e(q)=\frac{D(\mathcal{P}_t^e,\mathcal{M}_e^{\mathrm{GT}})-D(\mathcal{P}_t^e\cup\mathcal{P}_q^e,\mathcal{M}_e^{\mathrm{GT}})}{D(\mathcal{P}_t^e,\mathcal{M}_e^{\mathrm{GT}})+\varepsilon}\)rri.target_rri

RRI computed only on the ground-truth and reconstructed geometry associated with a selected target of interest.

Target-specific RRI lets the thesis compare views by how much they improve a selected object or region, even when scene-level RRI would prefer large background surfaces.

\[ \mathrm{RRI}_e(q)=\frac{D(\mathcal{P}_t^e,\mathcal{M}_e^{\mathrm{GT}})-D(\mathcal{P}_t^e\cup\mathcal{P}_q^e,\mathcal{M}_e^{\mathrm{GT}})}{D(\mathcal{P}_t^e,\mathcal{M}_e^{\mathrm{GT}})+\varepsilon} \]

Links and references
Related
RRI target cost
Docs
Master Thesis Research Questions Relative Reconstruction Improvement (RRI) Theory
References
@VIN-NBV-frahm2025
Target of Interest

target core entity.targeting

Aliases target entity object of interest inspection target task target

Symbols
\(e_t\)rl.target \(\mathcal{M}_e^{\mathrm{GT}}\)ase.mesh_target

Selected entity, object crop, point, region, or surface-deficit hypothesis whose reconstruction quality should be improved.

Target-conditioned ARIA-NBV variants use this target as an explicit input to candidate scoring and planning instead of optimizing only scene-level RRI.

Links and references
Related
target RRI target-conditioned scorer OBB
Docs
Master Thesis Research Questions types
Point Cloud

PC core reconstruction.representation

Aliases 3D points semi-dense point cloud

Symbols
\(\mathcal{P}\)oracle.points \(\mathcal{P}_q\)oracle.points_q

Set of 3D points representing observed scene geometry.

ARIA-NBV compares current and candidate-fused point clouds against ground-truth meshes to compute oracle RRI and surface-distance diagnostics.

Links and references
Related
RRI oracle RRI SLAM
Docs
Semi-Dense Point Clouds Surface Reconstruction Metrics Relative Reconstruction Improvement (RRI) Theory
Candidate View

candidate core planning.action

Aliases candidate pose candidate viewpoint next-view candidate

Symbols
\(\mathcal{Q}_t\)oracle.candidates_t \(q_{t,i}\)oracle.candidate_qti

Proposed camera pose whose expected reconstruction utility is evaluated before selecting the next observation.

ARIA-NBV samples candidate views around a reference pose or target, renders candidate depths from the mesh for oracle labels, and scores candidates with RRI or learned VIN predictions.

Links and references
Related
NBV oracle RRI target-conditioned scorer
Docs
Master Thesis Research Questions Relative Reconstruction Improvement (RRI) Theory CandidateViewGenerator
References
@VIN-NBV-frahm2025
Oracle RRI

oracle RRI core supervision.oracle

Aliases RRI oracle oracle label mesh-supervised RRI

Symbols
\(\mathrm{RRI}\)oracle.rri \(\mathcal{M}^{\mathrm{GT}}\)ase.mesh
Equations
\(\mathrm{RRI}(q)=\frac{D(\mathcal{P}_t,\mathcal{M}^{\mathrm{GT}})-D(\mathcal{P}_t\cup\mathcal{P}_q,\mathcal{M}^{\mathrm{GT}})}{D(\mathcal{P}_t,\mathcal{M}^{\mathrm{GT}})+\varepsilon}\)rri.rri

RRI label computed with privileged ground-truth geometry, used for supervised training and evaluation.

The current oracle renders candidate depth maps from ASE ground-truth meshes, backprojects candidate point clouds, fuses them with the current semi-dense reconstruction, and scores the resulting surface-distance improvement.

Links and references
Related
RRI candidate ASE
Docs
Relative Reconstruction Improvement (RRI) Theory Aria Synthetic Environments (ASE) Dataset Master Thesis Roadmap
References
@VIN-NBV-frahm2025 @EFM3D-straub2024
Target-Conditioned Scorer

target-conditioned scorer core model.scoring

Aliases target-conditioned VIN target-aware scorer

Symbols
\(\hat{r}\)vin.rri_hat
Equations
\(\rho=\operatorname{corr}(\operatorname{rank}(\hat{r}_i),\operatorname{rank}(r_i))\)metrics.spearman \(\mathrm{TopKAcc}(k)=\frac{1}{N}\sum_i\mathbb{1}[y_i\in\mathrm{TopK}(\boldsymbol{\pi}_i,k)]\)metrics.topk_acc

VIN-style candidate scorer that receives scene state, a candidate view, and an encoding of the target of interest.

The scorer predicts target-specific utility so view ranking can prioritize a selected entity or region instead of only optimizing scene-level RRI.
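
The two evaluation equations listed for this scorer (Spearman rank correlation and top-k accuracy) can be sketched in plain Python. This is an illustrative sketch, not the project's metric code: it assumes tied values receive average ranks, and that each \(\boldsymbol{\pi}_i\) is a score vector whose true entry has index \(y_i\). The function names are hypothetical.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(pred: list[float], true: list[float]) -> float:
    """Pearson correlation of the rank vectors of pred and true."""
    rx, ry = average_ranks(pred), average_ranks(true)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined for constant inputs")
    return cov / math.sqrt(var_x * var_y)


def top_k_accuracy(scores: list[list[float]], labels: list[int], k: int) -> float:
    """Fraction of samples whose label index is among the k highest-scored entries."""
    hits = 0
    for row, y in zip(scores, labels):
        top = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        hits += int(y in top)
    return hits / len(labels)


# Hypothetical predictions that order four candidates exactly like the oracle.
print(spearman([0.1, 0.4, 0.3, 0.9], [0.0, 0.5, 0.2, 0.8]))  # 1.0
print(top_k_accuracy([[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]], [1, 2], k=1))  # 0.5
```

Spearman correlation depends only on the ordering of predicted and oracle scores, which matches the view-ranking use described above.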

Links and references
Related
target target RRI VIN Q_H
Docs
Master Thesis Research Questions model_v3 VinModelV3
References
@VIN-NBV-frahm2025
Observed Target Selection

OBS-SEL core protocol.targeting

Aliases observed-only target selection actor-visible target selection

Main thesis protocol component requiring target selection to use only actor-visible observed or predicted target evidence.

Observed Target Selection uses predicted or tracked OBBs, class probabilities, confidence, projected area, and semi-dense or EVL support. Ground-truth target annotations are not visible to the selector in the main thesis protocol.

Links and references
Related
target PRED-Q GT-EVAL OBB
Docs
Master Thesis Research Questions
References
@EFM3D-straub2024 @ProjectAria-ASE-2025
Predicted-Target Q

PRED-Q core protocol.learning

Aliases predicted-target scorer predicted-target Q_H actor-visible Q

Symbols
\(Q_H\)rl.qh
Equations
\(Q_H(s_t^{\mathrm{cf0}},a_t)=\mathbb{E}\left[G_t^{(H)}\mid s_t=s_t^{\mathrm{cf0}},a_t\right]\)rl.q_h

Main thesis protocol component requiring scorer or Q_H inputs to use predicted or observed target descriptors.

Predicted-Target Q covers target-conditioned one-step scoring and finite-candidate Q_H selection whose target inputs are actor-visible predicted or observed descriptors, not ground-truth target annotations.

Links and references
Related
OBS-SEL GT-EVAL target-conditioned scorer target RRI Q_H
Docs
Master Thesis Research Questions
References
@VIN-NBV-frahm2025 @DoubleDQN-vanHasselt2015
Ground-Truth Target Evaluation

GT-EVAL core protocol.evaluation

Aliases GT target evaluation GT target crop evaluation oracle target evaluation

Symbols
\(\mathcal{M}_e^{\mathrm{GT}}\)ase.mesh_target

Main thesis protocol component using ground-truth OBBs and target mesh crops only for labels and evaluation.

Ground-Truth Target Evaluation uses GT OBBs and target mesh crops for oracle target-RRI labels, matching checks, and evaluation while keeping those annotations hidden from the actor-visible selector, scorer, and Q_H model in the main result.

Links and references
Related
GT target RRI OBS-SEL PRED-Q
Docs
Master Thesis Research Questions Relative Reconstruction Improvement (RRI) Theory
References
@ProjectAria-ASE-2025
Acquisition Cost

cost core planning.objective

Aliases capture budget view budget motion cost

Symbols
\(C(\tau)\)rl.acquisition_cost

Budget consumed to acquire observations, measured by view count, path length, elapsed time, invalid-action rate, or a weighted combination.

The thesis should first report cumulative root-normalized target gain, diagnostic target RRI, and acquisition cost separately, then use scalarized objectives only when the tradeoff is explicit.

Links and references
Related
target RRI candidate NBV
Docs
Master Thesis Research Questions Master Thesis Roadmap
Target-Conditioned NBV MDP

NBV MDP core planning.mdp

Aliases ARIA-NBV MDP rollout MDP finite-candidate NBV MDP

Symbols
\(\mathcal{M}_{\mathrm{NBV}}\)rl.mdp_nbv \(s\)rl.s \(\mathcal{A}(s_t)\)rl.action_set \(T\)rl.transition \(r_t^e\)rl.reward_target \(\gamma\)rl.gamma \(H\)rl.H
Equations
\(\mathcal{M}_{\mathrm{NBV}}=(\mathcal{S},\mathcal{A},T,r_e,\gamma,H)\)rl.nbv_mdp

Finite-horizon MDP contract for target-conditioned ARIA-NBV rollouts and fitted Q_H training.

The ARIA-NBV MDP keeps actions restricted to sampled finite candidate views and keeps GT meshes or GT target crops outside the actor-visible state. It is the contract that connects target-conditioned rollout generation, reward computation, validity masks, and fitted finite-horizon Q learning.

ARIA-NBV MDP contract

\[ \mathcal{M}_{\mathrm{NBV}}=(\mathcal{S},\mathcal{A},T,r_e,\gamma,H) \]

Links and references
Related
state action set transition reward return Q_H mask
Docs
Finite-Candidate Rollout And Q_H Contract Master Thesis Research Questions Master Thesis Roadmap
References
@VIN-NBV-frahm2025 @DoubleDQN-vanHasselt2015
Rollout State

state core planning.mdp

Aliases MDP state actor-visible state rollout observation

Symbols
\(s\)rl.s \(s_t^{\mathrm{hist}}\)rl.s_hist \(s_t^{\mathrm{off}}\)rl.s_off \(s_t^{\mathrm{cf0}}\)rl.s_cf0 \(s_t^{\mathrm{cf+}}\)rl.s_cf_geom \(s_t^{\mathrm{oracle}}\)rl.s_oracle \(\mathcal{P}\)oracle.points \(\mathcal{Q}_t\)oracle.candidates_t \(m_{t,i}\)rl.validity_mask \(\rho_{t,i}\)rl.invalid_reason \(e_t\)rl.target \(b_t\)rl.budget
Equations
\(s_t^{\mathrm{hist}}=(I_{1:t},T_{1:t},P_{1:t}^{\mathrm{semi}},V^{\mathrm{root}},e_t,b_t)\)rl.s_hist \(s_t^{\mathrm{off}}=(\mathrm{VinSnippetView},\mathcal{Q}_t,N_t,m_{t,i},\ell_{t,i})\)rl.s_off \(s_t^{\mathrm{cf0}}=(V^{\mathrm{root}},\mathcal{P}_t,\mathcal{Q}_t,m_{t,i},\rho_{t,i},e_t,b_t)\)rl.s_cf0 \(s_t^{\mathrm{cf+}}=(s_t^{\mathrm{cf0}},D_{1:t}^{\mathrm{sel}},P_{1:t}^{\mathrm{sel}},N_{1:t}^{\mathrm{sel}})\)rl.s_cf_geom \(s_t^{\mathrm{oracle}}=(s_t^{\mathrm{cf+}},\mathcal{M}^{\mathrm{GT}},\mathcal{M}_e^{\mathrm{GT}},\{D_{t,i}^{\mathrm{GT}},\mathcal{P}_{t,i}^{\mathrm{GT}},\mathrm{RRI}_{t,i}\}_{i=1}^{N_t})\)rl.s_oracle

Rollout state family separating actor-visible state from oracle-only supervision.

ARIA-NBV distinguishes raw historic snippet state, persisted VIN offline sample state, minimal counterfactual actor state, geometry-rich counterfactual ablation state, and privileged oracle rollout state. The main Q_H actor input starts from the minimal counterfactual state; all-candidate GT renders, GT mesh crops, and oracle RRI labels remain outside the actor-visible state.

Links and references
Related
historic state offline state CF0 state CF+ state oracle state action set mask OBS-SEL PRED-Q
Docs
Finite-Candidate Rollout And Q_H Contract Master Thesis Research Questions
Historic Snippet State

historic state core planning.state

Aliases raw historic state logged snippet state historic observed state

Symbols
\(s_t^{\mathrm{hist}}\)rl.s_hist
Equations
\(s_t^{\mathrm{hist}}=(I_{1:t},T_{1:t},P_{1:t}^{\mathrm{semi}},V^{\mathrm{root}},e_t,b_t)\)rl.s_hist

Raw actor-visible state from the logged ASE/Project Aria snippet trajectory.

The historic snippet state is the richest non-privileged state because it comes from the original logged trajectory. It may contain calibrated camera streams, timestamps, trajectory and gravity estimates, semi-dense points with support fields, frozen EVL/EFM evidence, and observed or predicted OBBs. It must not contain the GT mesh or GT OBB crops as actor inputs.

Links and references
Related
offline state EVL OBB
Docs
Finite-Candidate Rollout And Q_H Contract Project Aria EFM3D and EVL
References
@projectaria-engel2023 @EFM3D-straub2024
Persisted Offline Sample State

offline state core planning.state

Aliases offline sample state VIN offline state persisted VIN state

Symbols
\(s_t^{\mathrm{off}}\)rl.s_off
Equations
\(s_t^{\mathrm{off}}=(\mathrm{VinSnippetView},\mathcal{Q}_t,N_t,m_{t,i},\ell_{t,i})\)rl.s_off

Compact persisted state used by VIN training and offline diagnostics.

The persisted offline sample state is not the full raw snippet. It is the compact immutable training and diagnostic payload: VinSnippetView, candidate poses/cameras/counts, labels and oracle metrics, optional candidate depths, compact OBB fields, trajectory metadata, and selected EVL numeric tensors needed to reproduce scoring diagnostics.

Links and references
Related
historic state vin-nbv oracle RRI
Docs
Finite-Candidate Rollout And Q_H Contract VinModelV3
Minimal Counterfactual Actor State

CF0 state core planning.state

Aliases minimal counterfactual state counterfactual actor state CF0 rollout state

Symbols
\(s_t^{\mathrm{cf0}}\)rl.s_cf0
Equations
\(s_t^{\mathrm{cf0}}=(V^{\mathrm{root}},\mathcal{P}_t,\mathcal{Q}_t,m_{t,i},\rho_{t,i},e_t,b_t)\)rl.s_cf0

Main Q_H actor state for mesh-supervised counterfactual rollouts.

The minimal counterfactual actor state is the default input to target-conditioned Q_H. It contains the accumulated counterfactual point proxy as broad scene state, optional lifted image-foundation point features, local root EVL evidence for target support and local reads, selected-action history, observed or predicted target descriptor, budget state, finite candidate table, validity masks, reason codes, and current-state candidate-query features. Synthetic observations update the state only after their candidate is selected.

Links and references
Related
CF+ state action set transition Q_H
Docs
Finite-Candidate Rollout And Q_H Contract EFM3D Scene Embeddings Master Thesis Research Questions
Geometry-Rich Counterfactual State

CF+ state core planning.state

Aliases geometry-rich counterfactual state counterfactual geometry state CF+ rollout state

Symbols
\(s_t^{\mathrm{cf+}}\)rl.s_cf_geom
Equations
\(s_t^{\mathrm{cf+}}=(s_t^{\mathrm{cf0}},D_{1:t}^{\mathrm{sel}},P_{1:t}^{\mathrm{sel}},N_{1:t}^{\mathrm{sel}})\)rl.s_cf_geom

Counterfactual ablation state with selected synthetic geometry observations.

The geometry-rich counterfactual state adds only selected prior synthetic observations to the minimal state. It may include rendered depth, depth-valid masks, backprojected points, derived normals, and local support summaries for views that have already been selected. It does not include oracle renders for unselected candidates.

Links and references
Related
CF0 state transition oracle state
Docs
Finite-Candidate Rollout And Q_H Contract Master Thesis Research Questions
Oracle Rollout State

oracle state core planning.state

Aliases oracle state privileged rollout state GT rollout state

Symbols
\(s_t^{\mathrm{oracle}}\)rl.s_oracle
Equations
\(s_t^{\mathrm{oracle}}=(s_t^{\mathrm{cf+}},\mathcal{M}^{\mathrm{GT}},\mathcal{M}_e^{\mathrm{GT}},\{D_{t,i}^{\mathrm{GT}},\mathcal{P}_{t,i}^{\mathrm{GT}},\mathrm{RRI}_{t,i}\}_{i=1}^{N_t})\)rl.s_oracle

Privileged rollout state for labels, upper bounds, and evaluation.

The oracle rollout state may contain GT mesh geometry, GT target crops, GT OBBs, all-candidate synthetic depth and point clouds, derived normals, mesh-face visibility, Chamfer/RRI terms, and oracle scores. These fields support label generation and diagnostics but are not actor-visible inputs for the main scorer or Q_H model.

Links and references
Related
GT-EVAL oracle RRI target RRI CF0 state
Docs
Finite-Candidate Rollout And Q_H Contract Relative Reconstruction Improvement (RRI) Theory
References
@ProjectAria-ASE-2025
Finite Candidate Action Set

action set core planning.mdp

Aliases candidate action set masked candidate set finite NBV action space

Symbols
\(\mathcal{A}(s_t)\)rl.action_set \(q_{t,i}\)oracle.candidate_qti \(\mathcal{Q}_t\)oracle.candidates_t \(m_{t,i}\)rl.validity_mask
Equations
\(\mathcal{Q}_t=\{q_{t,i}\}_{i=1}^{N_t},\quad \mathcal{A}(s_t)=\{i\in\{1,\ldots,N_t\}:m_{t,i}=1\},\quad q_t=q_{t,a_t}\)rl.finite_action_set

Masked finite action-index set over sampled candidate views.

At each rollout step, ARIA-NBV samples a finite candidate table \(\mathcal{Q}_t=\{q_{t,i}\}\). The admissible action set contains the indices \(i\) whose validity mask \(m_{t,i}\) is true, and selecting \(a_t\) chooses pose \(q_t=q_{t,a_t}\). This keeps planning bounded and treats invalidity as a feasibility constraint rather than as a low-quality RRI label.

Masked finite action-index set

\[ \mathcal{Q}_t=\{q_{t,i}\}_{i=1}^{N_t},\quad \mathcal{A}(s_t)=\{i\in\{1,\ldots,N_t\}:m_{t,i}=1\},\quad q_t=q_{t,a_t} \]
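
A minimal sketch of the masked action set, assuming the mask is a plain list of booleans and candidate scores are floats. Indices are 0-based here, whereas the equation above is 1-based. The helper names are hypothetical.

```python
def admissible_actions(mask: list[bool]) -> list[int]:
    """Indices i with m_{t,i} = 1 (0-based here, 1-based in the equation)."""
    return [i for i, valid in enumerate(mask) if valid]


def masked_argmax(scores: list[float], mask: list[bool]) -> int:
    """Best-scoring admissible index; invalid candidates are never considered."""
    valid = admissible_actions(mask)
    if not valid:
        raise ValueError("no admissible candidate at this step")
    return max(valid, key=lambda i: scores[i])


# Hypothetical step: candidate 0 has the highest score but is invalid.
scores = [0.9, 0.2, 0.5]
mask = [False, True, True]
print(admissible_actions(mask))   # [1, 2]
print(masked_argmax(scores, mask))  # 2
```

Filtering by the mask before the argmax, instead of assigning invalid candidates a low score, keeps feasibility separate from the quality label.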

Links and references
Related
candidate mask state
Docs
Master Thesis Research Questions Finite-Candidate Rollout And Q_H Contract CandidateViewGenerator
Counterfactual Transition

transition core planning.mdp

Aliases rollout transition candidate fusion update counterfactual update

Symbols
\(T\)rl.transition \(\mathcal{P}\)oracle.points \(\mathcal{P}_q\)oracle.points_q
Equations
\(\mathcal{P}_{t+1}=\mathcal{P}_t\cup\mathcal{P}_{q_t}\)rl.counterfactual_transition

Replayable state update after selecting a candidate index.

For the thesis-core ASE mesh/oracle loop, the transition uses the selected candidate index to render or retrieve that candidate’s depth, backproject points, and update the counterfactual point state and selected-view history. All-candidate GT renders and scores remain oracle-only before selection. The update must be deterministic under the stored seed and lineage.

Point-state transition

\[ \mathcal{P}_{t+1}=\mathcal{P}_t\cup\mathcal{P}_{q_t} \]
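
The point-state update can be sketched as a pure function. This sketch assumes points are exact coordinate tuples stored in frozen sets, so the union removes exact duplicates; a real pipeline would more likely concatenate arrays and render or retrieve the selected candidate's depth on demand. The names and the `candidate_points` lookup table are hypothetical.

```python
Point = tuple[float, float, float]
PointSet = frozenset[Point]


def counterfactual_transition(
    points: PointSet,
    candidate_points: dict[int, PointSet],
    history: tuple[int, ...],
    action: int,
) -> tuple[PointSet, tuple[int, ...]]:
    """P_{t+1} = P_t U P_{q_t}; also append the index to the selected-view history."""
    if action not in candidate_points:
        raise KeyError(f"unknown candidate index {action}")
    # Only the selected candidate's points enter the state.
    return points | candidate_points[action], history + (action,)


p0 = frozenset({(0.0, 0.0, 0.0)})
cands = {
    0: frozenset({(1.0, 0.0, 0.0)}),
    1: frozenset({(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)}),
}
p1, h1 = counterfactual_transition(p0, cands, (), action=1)
print(sorted(p1), h1)
```

Because the function has no hidden state or randomness, replaying the same inputs reproduces the same next state, which is the determinism property the transition requires.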

Links and references
Related
state PC oracle RRI
Docs
Finite-Candidate Rollout And Q_H Contract Master Thesis Research Questions CounterfactualPoseGenerator
Target-RRI Reward

reward core metrics.reconstruction_quality

Aliases target reward quality reward target-specific RRI reward

Symbols
\(r_t^e\)rl.reward_target \(\mathrm{RRI}_e\)entity.rri_e \(\mathcal{P}\)oracle.points \(\mathcal{M}_e^{\mathrm{GT}}\)ase.mesh_target
Equations
\(r_t^e=\mathrm{RRI}_e(q_t\mid \mathcal{P}_t,\mathcal{M}_e^{\mathrm{GT}})\)rl.target_rri_reward

Quality-only immediate reward equal to root-normalized target gain for the selected candidate.

The main thesis reward is cumulative root-normalized target gain under equal acquisition budget. State-relative target RRI remains a one-step diagnostic and VIN-compatible label; log-improvement variants remain visible follow-up reward ablations, not the default target for the first Q_H result.

Target-RRI reward

\[ r_t^e=\mathrm{RRI}_e(q_t\mid \mathcal{P}_t,\mathcal{M}_e^{\mathrm{GT}}) \]

Log-improvement follow-up

\[ r_t^{\log,e}=\log(D(\mathcal{P}_t^e,\mathcal{M}_e^{\mathrm{GT}})+\varepsilon)-\log(D(\mathcal{P}_{t+1}^e,\mathcal{M}_e^{\mathrm{GT}})+\varepsilon) \]
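
The two reward forms above are related in closed form. As a short worked derivation, assuming the point-state update \(\mathcal{P}_{t+1}^e=\mathcal{P}_t^e\cup\mathcal{P}_{q_t}^e\), the state-relative definition of \(r_t^e\) given above, and the same \(\varepsilon\) in both definitions:

\[ 1-r_t^e=\frac{D(\mathcal{P}_{t+1}^e,\mathcal{M}_e^{\mathrm{GT}})+\varepsilon}{D(\mathcal{P}_t^e,\mathcal{M}_e^{\mathrm{GT}})+\varepsilon}\quad\Longrightarrow\quad r_t^{\log,e}=-\log\left(1-r_t^e\right) \]

Under the same assumptions, and with \(\gamma=1\), the log-improvement rewards telescope over a rollout:

\[ \sum_{k=0}^{H-1}r_{t+k}^{\log,e}=\log(D(\mathcal{P}_t^e,\mathcal{M}_e^{\mathrm{GT}})+\varepsilon)-\log(D(\mathcal{P}_{t+H}^e,\mathcal{M}_e^{\mathrm{GT}})+\varepsilon) \]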

Links and references
Related
target RRI return cost
Docs
Master Thesis Research Questions Finite-Candidate Rollout And Q_H Contract Master Thesis Roadmap
References
@VIN-NBV-frahm2025
Finite-Horizon Return

return core planning.objective

Aliases H-step return bounded return cumulative target root gain

Symbols
\(G_t^{(H)}\)rl.return_h \(r_t^e\)rl.reward_target \(\gamma\)rl.gamma \(H\)rl.H
Equations
\(G_t^{(H)}=\sum_{k=0}^{H-1}\gamma^k r_{t+k}^e\)rl.finite_horizon_return

H-step discounted return over root-normalized target-gain rewards.

The return definition keeps gamma symbolic so discounted ablations remain possible. The first thesis result should report cumulative root-normalized target gain under an equal acquisition budget and treat log-improvement or scalarized rewards as follow-up analysis.

Finite-horizon return

\[ G_t^{(H)}=\sum_{k=0}^{H-1}\gamma^k r_{t+k}^e \]
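The sum above can be sketched directly in Python. The function name and the truncation behaviour for rollouts shorter than H are assumptions made for illustration.

```python
from typing import Sequence


def finite_horizon_return(rewards: Sequence[float], gamma: float, horizon: int) -> float:
    """Compute sum_{k=0}^{H-1} gamma^k * r_{t+k}, where rewards[0] is r_t.

    Assumption: if fewer than `horizon` rewards are available (for example
    because the acquisition budget ran out), the sum is simply truncated.
    """
    if horizon < 0:
        raise ValueError("horizon must be non-negative")
    total = 0.0
    for k, reward in enumerate(rewards[:horizon]):
        total += (gamma ** k) * reward
    return total


# Hypothetical per-step target-gain rewards.
rewards = [1.0, 2.0, 3.0]
undiscounted = finite_horizon_return(rewards, gamma=1.0, horizon=3)
discounted = finite_horizon_return(rewards, gamma=0.5, horizon=3)
```

With gamma set to 1 the return reduces to the plain cumulative gain over the horizon, which matches the cumulative metric named for the first thesis result.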

Links and references
Related
reward Q_H
Docs
Finite-Candidate Rollout And Q_H Contract Master Thesis Research Questions
Finite-Horizon Q Function

Q_H core model.value

Aliases Q_H candidate-query Q_H bounded Q function finite-candidate Q fitted Double-Q head

Symbols
\(Q_H\)rl.qh \(G_t^{(H)}\)rl.return_h \(s_t^{\mathrm{cf0}}\)rl.s_cf0 \(a\)rl.a
Equations
\(Q_H(s_t^{\mathrm{cf0}},a_t)=\mathbb{E}\left[G_t^{(H)}\mid s_t=s_t^{\mathrm{cf0}},a_t\right]\)rl.q_h \(y_t^Q=r_t+\gamma V(s_{t+1})\)rl.q_backup

Finite-horizon candidate-value function for target-conditioned ARIA-NBV.

The mandatory M5 learned policy-like result is Q_H over finite candidate sets. The first-path architecture uses candidate-to-state query attention: it encodes s_t^{cf0}, the actor-visible target descriptor z_e, selected-view history, budget state, scene-memory summaries, and candidate tokens, then emits one continuous return value per candidate. Three RL sources shape the training contract. DQN contributes replayed transition learning and Bellman-style finite-action value targets. Double DQN contributes the masked online-selector / target-evaluator backup, which reduces max-over-candidate overestimation. IQL contributes the offline support rule that value learning must not query invalid, ungenerated, or unavailable actions. Q_H must respect validity masks and beat one-step greedy or model scoring on cumulative root-normalized target gain under an equal acquisition budget, with bounded oracle lookahead serving as an upper bound.

Finite-horizon candidate value

\[ Q_H(s_t^{\mathrm{cf0}},a_t,z_e)=\mathbb{E}\left[G_t^{(H)}\mid s_t=s_t^{\mathrm{cf0}},a_t,z_e\right] \]

Masked Double-DQN selector

\[ j^*=\arg\max_{j:m_{t+1,j}=1}Q_\theta(s_{t+1}^{\mathrm{cf0}},a_{t+1,j},z_e) \]

Masked Double-DQN target

\[ y_t=r_t^e+\gamma(1-d_t)Q_{\bar\theta}(s_{t+1}^{\mathrm{cf0}},a_{t+1,j^*},z_e) \]
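The masked selector and target above can be sketched for a single transition in plain Python. The function name, the list-based inputs, and the choice to treat a next state with no valid candidates as terminal are assumptions for illustration only.

```python
from typing import Sequence


def masked_double_dqn_target(
    reward: float,
    done: bool,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    mask_next: Sequence[bool],
    gamma: float = 1.0,
) -> float:
    """y_t = r_t + gamma * (1 - d_t) * Q_target(s_{t+1}, a_{j*}).

    j* is the argmax of the online network restricted to valid candidates,
    so invalid candidates are never selected or evaluated.
    """
    valid = [j for j, is_valid in enumerate(mask_next) if is_valid]
    # Assumption: an empty valid set is handled like a terminal transition.
    if done or not valid:
        return reward
    j_star = max(valid, key=lambda j: q_online_next[j])
    return reward + gamma * q_target_next[j_star]


# Hypothetical values: candidate 0 has the highest online score but is masked out.
y = masked_double_dqn_target(
    reward=0.1,
    done=False,
    q_online_next=[0.9, 0.5, 0.7],
    q_target_next=[0.2, 0.4, 0.3],
    mask_next=[False, True, True],
    gamma=0.5,
)
```

The online values choose candidate 2 among the valid ones, and the target network supplies the value that is bootstrapped, which is the decoupling that Double DQN relies on.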

Links and references
Related
return CF0 state PRED-Q mask
Docs
Master Thesis Research Questions Master Thesis Roadmap Finite-Candidate Rollout And Q_H Contract RL Sources For Rollout And Q_H
References
@DBLP:journals/corr/MnihKSGAWR13 @DoubleDQN-vanHasselt2015 @IQL-kostrikov2021
Validity Mask

mask core planning.constraints

Aliases candidate validity mask action mask invalid action mask

Symbols
\(m_{t,i}\)rl.validity_mask \(\rho_{t,i}\)rl.invalid_reason \(m\)vin.cand_valid
Equations
\(m_i=\mathbb{1}[\mathrm{finite}]\mathbb{1}[v_i>0]\mathbb{1}[v_i^{\mathrm{sem}}>0]\)metrics.candidate_validity \(\mathcal{Q}_t=\{q_{t,i}\}_{i=1}^{N_t},\quad \mathcal{A}(s_t)=\{i\in\{1,\ldots,N_t\}:m_{t,i}=1\},\quad q_t=q_{t,a_t}\)rl.finite_action_set

Hard mask that separates feasible candidate actions from invalid candidates.

The mask m_{t,i} gates candidate actions, while the invalid-reason codes rho_{t,i} record why a candidate was rejected. Collisions, out-of-bounds poses, missing target visibility, bad frusta, missing depth hits, and poses outside the EVL extent are treated as constraints, not as low target-RRI examples.
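A minimal sketch of the mask, reason codes, and feasible action set, assuming v_i and v_i^sem are per-candidate counts of visible points and visible target points. That interpretation, the reason strings, and the function names are hypothetical and chosen only to mirror the indicator product above.

```python
import math
from typing import List, Sequence, Tuple


def candidate_validity(
    values: Sequence[float],
    visible: Sequence[int],
    visible_target: Sequence[int],
) -> Tuple[List[bool], List[str]]:
    """Return (mask, reasons); mask[i] is True only if every check passes."""
    mask: List[bool] = []
    reasons: List[str] = []
    for value, v, v_sem in zip(values, visible, visible_target):
        if not math.isfinite(value):
            reason = "non_finite"
        elif v <= 0:
            reason = "no_visibility"
        elif v_sem <= 0:
            reason = "no_target_visibility"
        else:
            reason = "ok"
        mask.append(reason == "ok")
        reasons.append(reason)
    return mask, reasons


def action_set(mask: Sequence[bool]) -> List[int]:
    """Indices of valid candidates (0-based here, 1-based in the formula)."""
    return [i for i, is_valid in enumerate(mask) if is_valid]


# Hypothetical batch of four candidates.
mask, reasons = candidate_validity(
    values=[0.3, float("nan"), 0.1, 0.2],
    visible=[120, 80, 0, 40],
    visible_target=[15, 10, 0, 0],
)
feasible = action_set(mask)
```

Keeping the reason alongside the boolean lets rejected candidates be audited later without ever entering the reward or value targets.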

Links and references
Related
action set Q_H reward
Docs
Master Thesis Research Questions CandidateSamplingResult
Project Aria

core dataset.project_aria

Aliases Aria Project Aria ecosystem

Egocentric research-device and tooling ecosystem for calibrated, time-aligned multimodal sensing.

ARIA-NBV treats Project Aria and its MPS-style products as the actor-visible sensing contract, while ASE meshes remain offline supervision and evaluation assets.

Links and references
Related
MPS ASE VIO
Docs
Project Aria Aria Synthetic Environments (ASE) Dataset
References
@projectaria-engel2023

Dataset

Aria Digital Twin

ADT background dataset.aria

Aliases ADT

Project Aria dataset with real-world captures and digital-twin scene annotations.

ADT is adjacent to ARIA-NBV’s sim-to-real context; the current experiments focus on ASE and EFM3D/ATEK exports, while ADT remains a relevant transfer surface.

Links and references
Related
ASE AEO
Docs
Aria Synthetic Environments (ASE) Dataset
References
@EFM3D-straub2024
Aria Everyday Objects

AEO background dataset.aria

Aliases AEO

Small-scale real-world Project Aria object dataset used by EFM3D for egocentric 3D perception evaluation.

AEO is relevant as a possible sim-to-real check for ARIA-NBV ideas that are first developed on ASE mesh-supervised snippets.

Links and references
Related
ASE ADT
Docs
Aria Synthetic Environments (ASE) Dataset
References
@EFM3D-straub2024
Virtual Reality Standard

VRS background dataset.file_format

Aliases VRS file Project Aria VRS

File format used by Project Aria tooling to store multi-modal sensor recordings efficiently.

VRS is part of the upstream Project Aria ecosystem, while ARIA-NBV primarily consumes ASE/ATEK-derived tensorized snippets for current experiments.

Links and references
Related
snippet MFCD MTD
Docs
Setup Instructions Aria Synthetic Environments (ASE) Dataset
References
@ProjectAria-ASE-2025
Central Pupil Frame

CPF background dataset.frames

Aliases central pupil coordinate frame

Coordinate frame placed at the midpoint between the left and right eye boxes of Project Aria glasses.

CPF is used by Project Aria for gaze-related quantities and should remain distinct from rig, camera, world, and PyTorch3D frames in ARIA-NBV docs.

Links and references
Related
LUF VIO
Docs
Aria Synthetic Environments (ASE) Dataset
References
@ProjectAria-ASE-2025
Machine Perception Services

MPS background dataset.project_aria

Aliases Project Aria MPS

Project Aria processing services that derive pose, mapping, gaze, hand, and related perception outputs from sensor recordings.

ARIA-NBV mainly depends on MPS-style pose and semi-dense mapping products as the current reconstruction state for RRI computation.

Links and references
Related
SLAM PC VIO
Docs
Aria Synthetic Environments (ASE) Dataset Semi-Dense Point Clouds
References
@ProjectAria-ASE-2025
Snippet

snippet support dataset.sample

Aliases temporal window ASE snippet

Short synchronized temporal window of Aria sensor data used as one EVL/VIN input sample.

A snippet typically contains RGB or grayscale streams, poses, calibration, semi-dense points, and scene metadata that EVL lifts into a voxel grid.

Links and references
Related
ASE EVL VIN
Docs
Aria Synthetic Environments (ASE) Dataset efm_dataset VinModelV3
References
@ProjectAria-ASE-2025 @EFM3D-straub2024
SceneScript Language

SSL background dataset.scene_representation

Aliases Structure Scene Language SceneScript structured language

Structured language representation for indoor scene layout using primitives such as walls, doors, windows, and objects.

SceneScript is relevant to ARIA-NBV as a possible semantic/global planning layer and as one source of ASE scene-structure context.

Links and references
Related
ASE target NBV
Docs
SceneScript Aria Synthetic Environments (ASE) Dataset
References
@SceneScript-avetisyan2024
Motion Trajectory Data

MTD background dataset.stream

Aliases trajectory data pose stream

Device poses over time, usually represented as a sequence of 6-DoF transformations.

MTD supplies the logged egocentric trajectory used to define snippet state, current reconstruction context, and candidate-view reference poses.

Links and references
Related
snippet VIO candidate
Docs
Aria Synthetic Environments (ASE) Dataset efm_views
References
@ProjectAria-ASE-2025
Multi-Frame Camera Data

MFCD background dataset.stream

Aliases multi-camera frame data synchronized camera data

Synchronized camera streams from multiple Project Aria cameras over a temporal window.

MFCD provides the egocentric image evidence consumed by EVL and aligned with pose, calibration, and semi-dense point streams.

Links and references
Related
snippet EVL VIO
Docs
Aria Synthetic Environments (ASE) Dataset EFM3D and EVL
References
@ProjectAria-ASE-2025 @EFM3D-straub2024
Multi-Semi-Dense Point Data

MSDPD background dataset.stream

Aliases multi-frame semi-dense points semi-dense point stream

Semi-dense 3D point observations generated by SLAM-style processing across a snippet or trajectory window.

MSDPD is the sparse observed geometry that ARIA-NBV fuses with candidate point clouds and compares against ground-truth meshes for RRI labels.

Links and references
Related
PC SLAM RRI
Docs
Semi-Dense Point Clouds Relative Reconstruction Improvement (RRI) Theory
References
@ProjectAria-ASE-2025
Aria Synthetic Environments

ASE support dataset.synthetic

Aliases Aria Synthetic Environment ASE dataset

Large-scale synthetic indoor dataset with simulated Project Aria sensor characteristics, egocentric trajectories, and scene annotations.

ARIA-NBV uses ASE snippets and the public mesh-supervised subset to generate oracle RRI labels from ground-truth meshes and semi-dense point clouds.

Links and references
Related
snippet EFM3D oracle RRI
Docs
Aria Synthetic Environments (ASE) Dataset Setup Instructions
References
@ProjectAria-ASE-2025 @EFM3D-straub2024

Geometry

Oriented Bounding Box

OBB support geometry.annotation

Aliases oriented box object box

3D bounding box with arbitrary orientation, used to represent object extent more tightly than an axis-aligned box.

OBBs are a natural target encoding for entity-aware ARIA-NBV because they provide center, extent, orientation, semantic class, and confidence signals.
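As a hedged illustration of how center, extent, and orientation combine, the following sketch tests whether a world-space point lies inside an OBB. The argument layout (a 3x3 rotation whose columns are the box axes in world coordinates, plus half-extents) is an assumption and not the project's OBB type.

```python
from typing import Sequence


def point_in_obb(
    point: Sequence[float],
    center: Sequence[float],
    rotation: Sequence[Sequence[float]],
    half_extents: Sequence[float],
) -> bool:
    """True if `point` lies inside the oriented box.

    `rotation` maps box coordinates to world coordinates, so the point is
    moved into the box frame with the transpose: local = R^T (p - c).
    """
    d = [p - c for p, c in zip(point, center)]
    local = [sum(rotation[k][i] * d[k] for k in range(3)) for i in range(3)]
    return all(abs(coord) <= h for coord, h in zip(local, half_extents))


# Hypothetical box: long axis (half-length 2) rotated 90 degrees about z,
# so it points along world +y.
R_z90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
inside = point_in_obb((0.0, 1.5, 0.0), (0.0, 0.0, 0.0), R_z90, (2.0, 0.5, 0.5))
outside = point_in_obb((1.5, 0.0, 0.0), (0.0, 0.0, 0.0), R_z90, (2.0, 0.5, 0.5))
```

An axis-aligned box with the same extents would accept the second point and reject the first, which is the tightness difference the definition refers to.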

Links and references
Related
target target-conditioned scorer EVL
Docs
Aria Synthetic Environments (ASE) Dataset Master Thesis Research Questions types
References
@EFM3D-straub2024
Frustum

frustum support geometry.camera

Aliases viewing frustum camera frustum

Truncated pyramidal camera-visible volume bounded by near and far clipping planes plus lateral field-of-view planes.

Candidate-view frusta define which scene surfaces can project into a camera image and are therefore central to visibility, rendering, and RRI diagnostics.
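A minimal containment test for an idealized frustum, assuming a symmetric pinhole camera whose z axis points forward. Real Aria cameras use fisheye models, so this is only a geometric sketch; the function name and parameters are hypothetical.

```python
import math
from typing import Sequence


def in_frustum(
    point_cam: Sequence[float],
    near: float,
    far: float,
    hfov_deg: float,
    vfov_deg: float,
) -> bool:
    """True if a camera-frame point lies between the clipping planes and
    inside the horizontal and vertical field-of-view planes."""
    x, y, z = point_cam
    if not (near <= z <= far):
        return False
    tan_h = math.tan(math.radians(hfov_deg) / 2.0)
    tan_v = math.tan(math.radians(vfov_deg) / 2.0)
    return abs(x) <= z * tan_h and abs(y) <= z * tan_v


# Hypothetical camera: 0.1 m to 5 m depth range, 90 degree field of view.
centered = in_frustum((0.0, 0.0, 1.0), 0.1, 5.0, 90.0, 90.0)
too_far_sideways = in_frustum((2.0, 0.0, 1.0), 0.1, 5.0, 90.0, 90.0)
beyond_far_plane = in_frustum((0.0, 0.0, 10.0), 0.1, 5.0, 90.0, 90.0)
```

Because the test uses absolute values of x and y, it gives the same answer whether x points left or right, so it does not depend on the sign convention of the lateral axes.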

Links and references
Related
candidate EVL PC
Docs
Relative Reconstruction Improvement (RRI) Theory VinModelV3
References
@Frustum-Wikipedia-2025
Left-Up-Forward

LUF support geometry.frames

Aliases LUF frame left up forward

Camera coordinate convention whose x axis points left, y axis points up, and z axis points forward.

LUF is one of the coordinate-frame conventions that must stay explicit when moving between Project Aria cameras, PyTorch3D cameras, and ARIA-NBV candidate poses.
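To make the axis convention concrete, the sketch below converts a camera-frame point between LUF and a right-down-forward (RDF, OpenCV-style) convention. Which convention a particular library or dataset uses must be checked against its own documentation; the function name is hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float, float]


def luf_to_rdf(point: Point) -> Point:
    """Flip x (left -> right) and y (up -> down); z (forward) is unchanged.

    The change of basis is diag(-1, -1, 1), a 180 degree rotation about z,
    so the same function also converts RDF back to LUF.
    """
    x, y, z = point
    return (-x, -y, z)


# A point 1 m to the left, 2 m up, and 3 m in front of the camera.
p_luf = (1.0, 2.0, 3.0)
p_rdf = luf_to_rdf(p_luf)
round_trip = luf_to_rdf(p_rdf)
```

Since the matrix has determinant +1, handedness is preserved and no mirrored geometry is introduced by the conversion.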

Links and references
Related
candidate frustum
Docs
05 Coordinate Conventions 12F Appendix Pose Frames
Degrees of Freedom

DoF background geometry.pose

Aliases DoF

Number of independent pose parameters available to a camera, object, or action representation.

ARIA-NBV distinguishes full 6-DoF pose estimation from reduced 5-DoF candidate or action spaces used for practical view planning.

Links and references
Related
candidate
Docs
05 Coordinate Conventions 12F Appendix Pose Frames
Six Degrees of Freedom

6DoF background geometry.pose

Aliases 6-DoF 6 degrees of freedom

Pose parameterization with three translational and three rotational degrees of freedom.

Project Aria poses, candidate cameras, and world-to-device transforms are usually represented as 6-DoF rigid-body transforms.

Links and references
Related
candidate DoF
Docs
05 Coordinate Conventions 12F Appendix Pose Frames

Metrics

Coverage Ratio

CR support metrics.coverage

Aliases coverage ratio

Fraction of a target surface, scene, or region treated as observed under a chosen visibility or distance threshold.

Coverage ratio is a useful diagnostic baseline for NBV, but ARIA-NBV treats reconstruction-quality improvement through RRI as the preferred optimization target.
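One common way to instantiate the definition is a distance-threshold test over sampled target points. The brute-force sketch below assumes both inputs are small lists of 3D points and that "observed" means "within `threshold` of some reconstructed point"; names and values are illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def coverage_ratio(
    target_points: Sequence[Point],
    observed_points: Sequence[Point],
    threshold: float,
) -> float:
    """Fraction of target points with an observed point within `threshold`."""
    if not target_points:
        raise ValueError("target_points must not be empty")
    covered = sum(
        1
        for t in target_points
        if any(math.dist(t, o) <= threshold for o in observed_points)
    )
    return covered / len(target_points)


# Hypothetical samples along a target surface and two reconstructed points.
target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
observed = [(0.0, 0.0, 0.05), (2.02, 0.0, 0.0)]
ratio = coverage_ratio(target, observed, threshold=0.1)
```

The ratio only records whether each target sample is reached, not how accurately it is reconstructed, which is one reason a distance-based quantity such as RRI carries more information.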

Links and references
Related
RRI candidate
Docs
NBV Background Relative Reconstruction Improvement (RRI) Theory
References
@VIN-NBV-frahm2025
Area Under Curve

AUC support metrics.evaluation

Aliases AUC

Aggregate score computed by integrating a metric curve over an acquisition, threshold, or ranking axis.

AUC can summarize how quickly reconstruction quality, coverage, or ranking quality improves as views are acquired or candidate thresholds change.
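A minimal sketch of the integration step using the trapezoidal rule, assuming the curve is given as matching lists of x values (for example number of acquired views) and metric values. The function name and the sample curve are hypothetical.

```python
from typing import Sequence


def trapezoid_auc(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Area under a piecewise-linear curve through (xs[i], ys[i])."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two or more matching samples")
    area = 0.0
    for i in range(1, len(xs)):
        dx = xs[i] - xs[i - 1]
        if dx < 0:
            raise ValueError("xs must be non-decreasing")
        area += 0.5 * dx * (ys[i] + ys[i - 1])
    return area


# Hypothetical quality curve over acquired views 0..3.
views = [0, 1, 2, 3]
quality = [0.0, 0.5, 0.75, 0.8]
auc = trapezoid_auc(views, quality)
```

Two policies that reach the same final quality can still differ in AUC, since the one that improves earlier accumulates more area.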

Links and references
Related
cost RRI
Docs
NBV Background
Ground Truth

GT support metrics.reference

Aliases reference annotation oracle reference

Reference data or annotations treated as the trusted target for training, validation, or evaluation.

In ARIA-NBV, ground truth usually refers to ASE meshes, object annotations, poses, or labels used to compute oracle supervision and diagnostic metrics.

Links and references
Related
oracle RRI RRI OBB
Docs
Aria Synthetic Environments (ASE) Dataset Relative Reconstruction Improvement (RRI) Theory
References
@ProjectAria-ASE-2025 @EFM3D-straub2024

Model

Egocentric Foundation Model 3D

EFM3D support model.backbone

Aliases EFM egocentric foundation model

Egocentric 3D foundation-model stack used as the frozen spatial backbone for ARIA-NBV candidate scoring.

The current project uses EFM3D and its EVL architecture to expose voxel occupancy, centerness, semantic, and OBB evidence for VIN-style RRI prediction.

Links and references
Related
EVL VIN ASE
Docs
EFM3D and EVL VinModelV3 EvlBackbone
References
@EFM3D-straub2024
Egocentric Voxel Lifting

EVL support model.backbone

Aliases voxel lifting EVL backbone

EFM3D architecture that lifts synchronized egocentric observations into a gravity-aligned 3D voxel feature volume.

ARIA-NBV uses EVL head outputs, pre-head features, and OBB predictions as local actor-visible evidence and target support for VIN-style RRI prediction. Broader NBV scene embeddings may combine semi-dense or fused point state with lifted image-foundation features.

Links and references
Related
EFM3D VIN OBB
Docs
EFM3D and EVL EFM3D Scene Embeddings EvlBackbone VinModelV3
References
@EFM3D-straub2024 @EVL-Doc-2025
View Introspection Network

VIN support model.scoring

Aliases VIN scorer VIN-NBV model Aria-VIN-NBV

Learned candidate-view scorer that predicts RRI or an ordinal RRI-derived utility without capturing the candidate view.

ARIA-NBV adapts VIN-NBV by placing a lightweight RRI prediction head on top of frozen EVL features and candidate-pose evidence from ASE snippets.

Links and references
Related
RRI target-conditioned scorer EVL
Docs
VIN-NBV model_v3 VinModelV3
References
@VIN-NBV-frahm2025

Planning

Five Degrees of Freedom

5DoF background planning.action_space

Aliases 5-DoF 5 degrees of freedom

Reduced camera-action parameterization commonly used when roll is fixed or otherwise constrained.

ARIA-NBV uses 5-DoF language for candidate and planning abstractions where position and viewing direction matter while roll is not an independently optimized control dimension.
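As an illustration of a roll-free parameterization, the sketch below stores a candidate as position plus yaw and pitch and derives the unit viewing direction. The world convention (z up, yaw measured about z from +x, pitch as elevation) and all names are assumptions made for this example, not the project's candidate type.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Candidate5DoF:
    """Three position parameters plus two viewing angles; roll is not stored."""
    x: float
    y: float
    z: float
    yaw: float    # radians, rotation about the world z axis
    pitch: float  # radians, elevation above the horizontal plane

    def view_direction(self) -> Tuple[float, float, float]:
        cos_pitch = math.cos(self.pitch)
        return (
            cos_pitch * math.cos(self.yaw),
            cos_pitch * math.sin(self.yaw),
            math.sin(self.pitch),
        )


# Hypothetical candidate looking along +y and tilted 30 degrees upward.
candidate = Candidate5DoF(0.0, 0.0, 1.5, yaw=math.pi / 2, pitch=math.radians(30))
direction = candidate.view_direction()
norm = math.sqrt(sum(c * c for c in direction))
```

A full 6-DoF camera pose would additionally need a rule for roll, for example keeping the camera's up axis aligned with gravity.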

Links and references
Related
candidate NBV DoF
Docs
Master Thesis Roadmap 10A Extensions
References
@GenNBV-chen2024

Reconstruction

Track

track background reconstruction.features

Aliases feature track 2D track SLAM track

Temporal sequence of corresponding image-feature detections across frames, usually carrying per-frame image coordinates, timestamps, and camera IDs.

Tracks can be triangulated or optimized into 3D points, making them a bridge between image evidence and the semi-dense point clouds used by ARIA-NBV.

\[ \mathcal{T}=\{(u_k,v_k,\mathrm{cam}_k,t_k)\}_{k=0}^{N} \]
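The tuple structure above maps naturally onto a small record type. This is a sketch with hypothetical field and camera names, not the format used by any Project Aria tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TrackObservation:
    """One detection (u_k, v_k, cam_k, t_k) of a tracked feature."""
    u: float    # image x coordinate in pixels
    v: float    # image y coordinate in pixels
    cam: str    # camera identifier
    t: float    # timestamp in seconds


def cameras_seen(track: List[TrackObservation]) -> List[str]:
    """Sorted list of distinct cameras that observed the feature."""
    return sorted({obs.cam for obs in track})


def track_duration(track: List[TrackObservation]) -> float:
    """Time between the first and last observation of the feature."""
    times = [obs.t for obs in track]
    return max(times) - min(times)


# Hypothetical track observed by two cameras over half a second.
track = [
    TrackObservation(120.0, 80.0, "cam-left", 0.0),
    TrackObservation(123.5, 81.0, "cam-left", 0.25),
    TrackObservation(310.0, 79.0, "cam-right", 0.5),
]
cams = cameras_seen(track)
duration = track_duration(track)
```

Triangulating such a track additionally needs each camera's calibration and pose at time t_k, which is why tracks are tied to the pose stream.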

Links and references
Related
PC SLAM VIO
Docs
Semi-Dense Point Clouds
Multi-view Stereo

MVS background reconstruction.geometry

Aliases multi-view reconstruction

Reconstruction family that estimates dense or semi-dense scene geometry from multiple calibrated views.

MVS is part of the broader reconstruction context for NBV, although ARIA-NBV’s current oracle labels are built from ASE meshes and Aria-style semi-dense point observations.

Links and references
Related
PC NBV
Docs
NBV Background
Visual-Inertial Odometry

VIO background reconstruction.pose

Aliases visual inertial odometry inertial visual odometry

Pose-estimation method that combines visual measurements from cameras with inertial measurements from IMUs.

Project Aria pose and mapping products build on visual-inertial estimation, and ARIA-NBV depends on those pose streams for snippet state and candidate frame contracts.

Links and references
Related
SLAM MTD
Docs
Aria Synthetic Environments (ASE) Dataset Semi-Dense Point Clouds
References
@ProjectAria-ASE-2025
Simultaneous Localization and Mapping

SLAM support reconstruction.state

Aliases visual SLAM mapping and localization

Method for estimating sensor motion while building a map of the surrounding scene.

In ARIA-NBV, logged SLAM poses and semi-dense points form the current reconstruction state against which candidate-view RRI is evaluated.

Links and references
Related
PC snippet RRI
Docs
Semi-Dense Point Clouds Aria Synthetic Environments (ASE) Dataset
References
@ProjectAria-ASE-2025

Representation

3D Gaussian Splatting

3DGS background representation.radiance_field

Aliases 3D Gaussian Splatting 3DGS Gaussian splatting

Explicit radiance-field representation using optimized 3D Gaussian primitives for real-time novel-view synthesis.

ARIA-NBV treats active 3DGS work as proposal, uncertainty, and simulator-bridge literature, not as a replacement for ASE mesh-supervised RRI.

Links and references
Related
NBV RRI
Docs
Active 3DGS and Targeted NBV
References
@GaussianSplatting-kerbl2023
Occupancy Grid

support representation.spatial

Aliases occupancy volume

Spatial grid whose cells encode whether space is occupied, free, unknown, or represented by a related occupancy probability.

Occupancy-style voxel evidence appears in EVL outputs and VIN feature construction, where it helps summarize local 3D scene state for candidate scoring.
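A minimal sketch of binary occupancy from a point set, assuming an axis-aligned grid defined by an origin, a voxel size, and a shape. EVL's actual grid layout and probabilistic outputs are not reproduced here; every name below is hypothetical.

```python
import math
from typing import Iterable, Optional, Sequence, Set, Tuple

Index = Tuple[int, int, int]


def world_to_voxel(
    point: Sequence[float],
    origin: Sequence[float],
    voxel_size: float,
    grid_shape: Sequence[int],
) -> Optional[Index]:
    """Map a point to its voxel index, or None if it falls outside the grid."""
    i, j, k = (math.floor((p - o) / voxel_size) for p, o in zip(point, origin))
    if all(0 <= a < n for a, n in zip((i, j, k), grid_shape)):
        return (i, j, k)
    return None


def occupied_voxels(
    points: Iterable[Sequence[float]],
    origin: Sequence[float],
    voxel_size: float,
    grid_shape: Sequence[int],
) -> Set[Index]:
    """Set of voxel indices that contain one or more points."""
    occupied: Set[Index] = set()
    for point in points:
        index = world_to_voxel(point, origin, voxel_size, grid_shape)
        if index is not None:
            occupied.add(index)
    return occupied


# Hypothetical 4 x 4 x 4 grid of 0.5 m voxels anchored at the world origin.
points = [(0.6, 0.1, 1.9), (0.7, 0.2, 1.8), (2.0, 0.0, 0.0)]
cells = occupied_voxels(points, (0.0, 0.0, 0.0), 0.5, (4, 4, 4))
```

Distinguishing free from unknown space needs ray information from the sensor to each point, which this point-only sketch deliberately omits.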

Links and references
Related
EVL VIN
Docs
VinModelV3 06 Architecture
References
@EFM3D-straub2024


References

[1]
V. Mnih et al., “Playing Atari with deep reinforcement learning,” CoRR, vol. abs/1312.5602, 2013. Available: http://arxiv.org/abs/1312.5602
[2]
H. van Hasselt, A. Guez, and D. Silver, “Deep reinforcement learning with double Q-learning.” 2015. Available: https://arxiv.org/abs/1509.06461
[3]
J. Straub, D. DeTone, T. Shen, N. Yang, C. Sweeney, and R. Newcombe, “EFM3D: A benchmark for measuring progress towards 3D egocentric foundation models.” 2024. Available: https://arxiv.org/abs/2406.10224
[4]
Meta Platforms Inc., “Egocentric voxel lifting (EVL) documentation.” [Online]. Available: https://facebookresearch.github.io/projectaria_tools/docs/open_models/evl
[5]
Wikipedia contributors, “Frustum.” [Online]. Available: https://en.wikipedia.org/wiki/Frustum
[6]
B. Kerbl, G. Kopanas, T. Leimkuehler, and G. Drettakis, “3D Gaussian splatting for real-time radiance field rendering,” ACM Transactions on Graphics, vol. 42, no. 4, 2023, doi: 10.1145/3592433.
[7]
X. Chen, Q. Li, T. Wang, T. Xue, and J. Pang, “GenNBV: Generalizable next-best-view policy for active 3D reconstruction,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024, pp. 16436–16445. Available: https://openaccess.thecvf.com/content/CVPR2024/html/Chen_GenNBV_Generalizable_Next-Best-View_Policy_for_Active_3D_Reconstruction_CVPR_2024_paper.html
[8]
I. Kostrikov, A. Nair, and S. Levine, “Offline reinforcement learning with implicit Q-learning.” 2021. Available: https://arxiv.org/abs/2110.06169
[9]
Meta Platforms Inc., “Aria Synthetic Environments dataset.” [Online]. Available: https://facebookresearch.github.io/projectaria_tools/docs/open_datasets/aria_synthetic_environments_dataset
[10]
A. Avetisyan et al., “SceneScript: Reconstructing scenes with an autoregressive structured language model.” 2024. Available: https://arxiv.org/abs/2403.13064
[11]
N. Frahm et al., “VIN-NBV: A view introspection network for next-best-view selection.” 2025. Available: https://arxiv.org/abs/2505.06219
[12]
J. Engel et al., “Project Aria: A new tool for egocentric multi-modal AI research.” 2023. Available: https://arxiv.org/abs/2308.13561
 
 
