1 Literature Review

This section is a thesis-oriented distillation, not a general survey. Each page extracts what ARIA-NBV can adopt from the local paper corpus: NBV objectives, egocentric observation contracts, candidate proposal mechanisms, rollout/value-learning gates, and failure modes that affect target-conditioned reconstruction.

The current project direction is deliberately narrow: keep RRI and target RRI as the authoritative utility signals; use Project Aria, ASE, EFM3D, and EVL as the actor-visible state substrate; require target-conditioned fitted Double-Q / Q_H over trusted finite candidate sets; and keep IQL, continuous actor-critic, simulator-backed RL, and 3DGS control behind evidence gates.

1.1 Adoption-State Overview

| Domain | Papers | Adoption state | Core signal | Do not adopt |
|---|---|---|---|---|
| NBV objective and candidate planning | VIN-NBV, PB-NBV, GenNBV, Hestia, receding-horizon and shadowcasting NBV | thesis-core method for RRI ranking; proposal/diagnostic for projection and frontier heuristics; stretch/bridge for continuous policies | Mesh-supervised oracle RRI, finite candidate sets, efficient candidate shortlists, validity-aware motion constraints | Replacing RRI with coverage, projected frontier area, or policy reward before calibration |
| ARIA ecosystem and actor-visible state | Project Aria, EFM3D/EVL, EFM3D scene embeddings, ASE, MPS | core substrate | Calibrated egocentric streams, trajectories, online calibration, semi-dense points, DINO/EVL local evidence, predicted OBB support, semidense+DINO representation ablations | Leaking GT meshes or GT boxes into actor-visible selection/scoring |
| Rollout, value learning, and RL | RL sources for rollout and Q_H, Trajectory Transformer, Gumbel-Top-k, Double DQN, IQL, soft Q-learning, PPO/SAC | thesis-core method for fitted Double-Q / Q_H; gated follow-up for IQL and actor-critic | Deterministic rollout traces first, stochastic rollout data second, masked target-conditioned fitted Double-Q over finite candidates third | Starting with continuous online RL before a trusted reward loop and support-aware offline store exist |
| Coverage/information utility channels | SCONE and FisherRF, SCONE, FisherRF | proposal/diagnostic | Target-local support, visibility, directional novelty, Fisher-style diminishing returns | Replacing target RRI with coverage or uncertainty reduction |
| 3DGS / radiance-field active reconstruction | Active 3DGS and targeted NBV, ActiveNeRF, FisherRF, Next Best Sense, dynamic/object-centric 3DGS, FOV-HPE | proposal/diagnostic and stretch/bridge | Uncertainty, Fisher information, target/object weighting, downstream-task view selection | Treating 3DGS uncertainty or human-pose reward as a substitute for ASE mesh-supervised target RRI |
| Semantic scene representations | SceneScript | stretch/bridge | Structured scene language, editable entities, global layout priors, ASE-scale scene representation | Making SceneScript a thesis-core dependency before observed target contracts and target RRI are trusted |

Adoption-state labels used in the pages:

  • core substrate: required observation, dataset, or representation contract.
  • thesis-core method: needed for the current thesis claim.
  • proposal/diagnostic: useful for candidate proposals, reports, or sanity checks.
  • gated follow-up: useful only after prerequisite evidence exists.
  • stretch/bridge: future direction beyond the required thesis result.
  • background: context only.

1.2 Domain Hierarchy

1.2.1 NBV Objective and Candidate Planning

  • VIN-NBV: source-backed RRI objective, oracle RRI labels, CORAL ordinal training, and greedy candidate ranking [1].
  • PB-NBV: projection/ellipsoid candidate shortlisting and frontier/occupied evidence separation [2].
  • GenNBV: continuous 5DoF PPO baseline with coverage-gain rewards, useful as a simulator-gated contrast [3].
  • Hestia: hierarchical look-at-then-fly control and directional voxel-face visibility, useful for continuous-policy bridge design [4].
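To make the VIN-NBV-style signals concrete, the sketch below computes an oracle RRI as the relative drop in Chamfer distance to ground-truth surface points when a candidate view's points are fused, ranks a finite candidate set greedily, and encodes RRI as CORAL-style ordinal targets. It is a minimal NumPy illustration under stated assumptions, not VIN-NBV's implementation: the exact Chamfer definition, the bin edges, and all function names are hypothetical, and a real pipeline would sample the ground-truth points from the ASE mesh and use a spatial index instead of a dense distance matrix.

```python
import numpy as np


def chamfer_distance(pred: np.ndarray, gt: np.ndarray) -> float:
    """Symmetric Chamfer distance (sum of both mean nearest-neighbour distances)."""
    # Dense (len(pred), len(gt)) distance matrix: fine for toy clouds, too slow for real scans.
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())


def oracle_rri(current: np.ndarray, candidate: np.ndarray, gt: np.ndarray) -> float:
    """Assumed RRI: relative drop in Chamfer distance after fusing a candidate view's points."""
    before = chamfer_distance(current, gt)
    after = chamfer_distance(np.concatenate([current, candidate]), gt)
    return (before - after) / before


def greedy_pick(scores: np.ndarray, valid: np.ndarray) -> int:
    """One-step greedy selection over a finite candidate set, ignoring invalid views."""
    return int(np.argmax(np.where(valid, scores, -np.inf)))


def coral_targets(rri: np.ndarray, edges: np.ndarray) -> np.ndarray:
    """CORAL-style ordinal targets: K-1 binary labels, label k is 1 iff class > k."""
    classes = np.digitize(rri, edges)  # ordinal class in {0, ..., len(edges)}
    return (classes[:, None] > np.arange(len(edges))[None, :]).astype(np.float32)


def coral_decode(probs: np.ndarray) -> np.ndarray:
    """Predicted ordinal class = number of binary tasks with P(class > k) > 0.5."""
    return (probs > 0.5).sum(axis=-1)
```

The sign convention matters: a candidate that only adds off-surface points receives a negative RRI, so the ordinal bins (here, the lowest one) must be able to represent non-improving views.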

1.2.2 ARIA Ecosystem and Actor-Visible State

  • Project Aria: calibrated egocentric device, VRS/tooling path, MPS trajectories, online calibration, and semi-dense maps [5].
  • EFM3D/EVL: local actor-visible DINO/voxel evidence, occupancy/head outputs, and OBB support, with broader scene memory delegated to semidense/fused point evidence [6].
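The "do not adopt" rule for this domain is an observation contract: oracle geometry may produce labels but must never reach the actor. One hedged way to enforce it is to keep actor-visible and oracle-only fields in separate types and reject any scorer input that names an oracle field. The field names and shapes below are hypothetical and do not mirror the Project Aria, MPS, or EFM3D APIs.

```python
from dataclasses import dataclass, fields

import numpy as np


@dataclass(frozen=True)
class ActorState:
    """Hypothetical actor-visible state: only what the device and EVL stack provide online."""
    poses: np.ndarray             # (T, 4, 4) device poses, e.g. from an MPS trajectory
    semidense_points: np.ndarray  # (N, 3) semi-dense map points
    evl_features: np.ndarray      # local EVL voxel features (layout is an assumption)
    predicted_obbs: np.ndarray    # predicted, not ground-truth, oriented boxes


@dataclass(frozen=True)
class OracleOnly:
    """Hypothetical oracle fields: used to compute (target) RRI labels, never fed to the actor."""
    gt_mesh_points: np.ndarray
    gt_obbs: np.ndarray


ORACLE_FIELDS = frozenset(f.name for f in fields(OracleOnly))


def check_actor_inputs(inputs: dict) -> dict:
    """Raise if any oracle-only field name appears in a selector/scorer input dict."""
    leaked = ORACLE_FIELDS.intersection(inputs)
    if leaked:
        raise ValueError(f"oracle-only fields leaked into actor inputs: {sorted(leaked)}")
    return inputs
```

Keeping the two types disjoint makes the leak check a name-level test; it does not catch GT-derived values smuggled in under actor-visible names, which still needs review of how each field is produced.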

1.2.3 Rollout, Value Learning, and RL

  • RL sources for rollout and Q_H: planning as sequence decoding (Trajectory Transformer), stochastic beams (Gumbel-Top-k), overestimation control (Double DQN), offline-RL support constraints (IQL), and the mandatory target-conditioned fitted Double-Q / Q_H gate.
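Two mechanics from this bullet are easy to get wrong over finite candidate sets: sampling distinct candidates for stochastic rollouts, and computing Double-Q targets when some candidates are invalid. The sketch below uses the Gumbel-Top-k trick (add i.i.d. Gumbel noise to the logits and keep the top k, which samples k distinct items without replacement) and a masked Double DQN target in which the online network selects the next candidate and the target network evaluates it. The arrays are assumed to already hold target-conditioned Q-values for a fixed candidate ordering, and the per-step reward is assumed to be a target-RRI gain; function names and shapes are illustrative.

```python
import numpy as np


def gumbel_top_k(logits: np.ndarray, k: int, valid: np.ndarray,
                 rng: np.random.Generator) -> np.ndarray:
    """Sample k distinct valid candidate indices without replacement (Gumbel-Top-k trick)."""
    if k > int(valid.sum()):
        raise ValueError("k exceeds the number of valid candidates")
    perturbed = np.where(valid, logits + rng.gumbel(size=logits.shape), -np.inf)
    # Indices of the k largest perturbed logits, in descending order; invalid ones sort last.
    return np.argsort(-perturbed)[:k]


def masked_double_q_targets(rewards: np.ndarray, dones: np.ndarray,
                            q_online_next: np.ndarray, q_target_next: np.ndarray,
                            valid_next: np.ndarray, gamma: float = 0.99) -> np.ndarray:
    """Double-Q regression targets over finite candidate sets.

    rewards, dones: (B,); q_online_next, q_target_next, valid_next: (B, K).
    """
    # Online network picks the best *valid* next candidate.
    a_star = np.where(valid_next, q_online_next, -np.inf).argmax(axis=1)
    # Target network evaluates that choice, decoupling selection from evaluation.
    bootstrap = q_target_next[np.arange(len(a_star)), a_star]
    # No bootstrap for terminal steps or when no valid next candidate exists.
    keep = (dones == 0) & valid_next.any(axis=1)
    return rewards + gamma * np.where(keep, bootstrap, 0.0)
```

The mask matters twice: an unmasked argmax would let the online network select an invalid candidate, and bootstrapping from an empty candidate set would inject an arbitrary value into the regression target.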

1.2.4 3DGS / Radiance-Field Active Reconstruction

  • SCONE and FisherRF: coverage-as-support and information-as-diminishing-returns channels for candidate tokens and diagnostics [7], [8].
  • Active 3DGS and targeted NBV: uncertainty, Fisher information, object-centric utility, dynamic scenes, and task-specific view selection as proposal/diagnostic signals.
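The "information as diminishing returns" channel can be illustrated with a diagonal approximation in the spirit of FisherRF: a candidate's gain is the trace of its information matrix times the inverse of the already accumulated information plus a damping term, which for diagonal matrices is a sum of ratios. Views that re-observe well-constrained parameters therefore score less. This is an illustrative sketch, not FisherRF's code; `h_cands` stands for hypothetical per-parameter (or per-voxel, target-local) diagonal Fisher entries, and the score is meant as a proposal/diagnostic feature, not a replacement for target RRI.

```python
import numpy as np


def fisher_gain(h_cand: np.ndarray, h_acc: np.ndarray, lam: float) -> float:
    """Diagonal Fisher-style gain: trace of H_cand (H_acc + lam*I)^-1 for diagonal H."""
    return float(np.sum(h_cand / (h_acc + lam)))


def greedy_fisher_selection(h_cands: np.ndarray, h_init: np.ndarray,
                            budget: int, lam: float = 1e-2):
    """Greedily add views; each pick increases H_acc and shrinks later gains."""
    h_acc = h_init.astype(float).copy()
    remaining = list(range(len(h_cands)))
    chosen, gains = [], []
    for _ in range(min(budget, len(remaining))):
        scores = [fisher_gain(h_cands[j], h_acc, lam) for j in remaining]
        i = int(np.argmax(scores))
        chosen.append(remaining[i])
        gains.append(scores[i])
        h_acc += h_cands[remaining[i]]  # accumulated information after acquiring the view
        remaining.pop(i)
    return chosen, gains


# Candidate 1 duplicates candidate 0; candidate 2 observes a different parameter.
h_cands = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 0.6]])
chosen, gains = greedy_fisher_selection(h_cands, np.zeros(2), budget=2, lam=1.0)
```

Once candidate 0 is acquired, the duplicate view's gain halves (1.0 to 0.5), so the complementary candidate 2 (gain 0.6) is selected second.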

1.2.5 Semantic Scene Representations

  • SceneScript: structured scene language and editable entity-level representation as stretch-only semantic/global planning context [9].

1.3 Current Synthesis

```mermaid
flowchart LR
  A["Project Aria / ASE observed state"] --> B["EVL and semi-dense reconstruction proxy"]
  B --> C["Scene + target oracle RRI labels"]
  C --> D["One-step candidate scorer"]
  D --> E["Trusted finite-candidate rollouts"]
  E --> F["Target-conditioned fitted Double-Q Q_H"]
  F -. "after evidence" .-> G["IQL / actor-critic / simulator bridge"]
```

The thesis should first prove that deterministic bounded oracle lookahead improves cumulative target RRI over one-step greedy selection under an equal acquisition budget. It should then train a target-conditioned fitted Double-Q / Q_H model over finite candidate sets and require it to beat one-step greedy/model scoring on cumulative target RRI. IQL, actor-critic bridges, SB3/PPO/SAC, Habitat/Isaac, and 3DGS control remain gated follow-up or stretch work.
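The first gate compares receding-horizon oracle lookahead with one-step greedy under the same number of acquisitions. The sketch below shows that comparison on a deliberately tiny, hypothetical problem: each candidate view reveals a set of target-surface cells, and a toy additive gain (fraction of newly revealed cells) stands in for the deterministic oracle so the example is self-contained. In the thesis the oracle is target RRI, not this coverage proxy. A horizon of 1 reduces to greedy, and exhaustive lookahead is exponential in the horizon, so it is only feasible for small finite candidate sets and short horizons.

```python
def toy_gain(observed: frozenset, view: frozenset, total: int) -> float:
    """Hypothetical stand-in for a deterministic oracle gain: fraction of new cells."""
    return len(view - observed) / total


def lookahead_value(observed: frozenset, views: list, depth: int, total: int) -> float:
    """Best cumulative toy gain achievable in `depth` further steps (exhaustive search)."""
    if depth == 0:
        return 0.0
    return max(toy_gain(observed, v, total)
               + lookahead_value(observed | v, views, depth - 1, total)
               for v in views)


def rollout(views: list, budget: int, horizon: int, total: int) -> float:
    """Receding-horizon selection: plan `horizon` steps, execute the first, replan."""
    observed, cumulative = frozenset(), 0.0
    for step in range(budget):
        h = min(horizon, budget - step)  # never plan past the remaining budget
        scores = [toy_gain(observed, v, total)
                  + lookahead_value(observed | v, views, h - 1, total)
                  for v in views]
        best = views[scores.index(max(scores))]
        cumulative += toy_gain(observed, best, total)
        observed |= best
    return cumulative


views = [frozenset({1, 2, 3, 4}), frozenset({0, 1, 2}), frozenset({3, 4, 5})]
greedy = rollout(views, budget=2, horizon=1, total=6)
lookahead = rollout(views, budget=2, horizon=2, total=6)
print(f"greedy={greedy:.3f} lookahead={lookahead:.3f}")  # prints: greedy=0.833 lookahead=1.000
```

Greedy takes the large overlapping view first and reaches 5 of 6 cells in two steps, while two-step lookahead picks the two complementary views and reaches all of them with the same budget.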

1.4 Local Corpus

The local source mirrors and paper manifest are tracked under docs/literature/.

Key local mirrors include VIN-NBV, GenNBV, Hestia, PB-NBV, EFM3D, Project Aria, SceneScript, Trajectory Transformer, Double DQN, IQL, Gumbel-Top-k, Deep Energy-Based Policies, SCONE, FisherRF, Dynamic 3DGS, Next Best Sense, and Instance/Object-centric NBV. FOV-HPE is tracked as DOI/PDF evidence in the local corpus, not as a local TeX mirror.

References

[1] N. Frahm et al., “VIN-NBV: A view introspection network for next-best-view selection.” 2025. Available: https://arxiv.org/abs/2505.06219
[2] Z. Jia, Y. Li, Q. Hao, and S. Zhang, “PB-NBV: Efficient projection-based next-best-view planning framework for reconstruction of unknown objects,” IEEE Robotics and Automation Letters, vol. 10, no. 7, pp. 7444–7451, 2025, doi: 10.1109/LRA.2025.3573631.
[3] X. Chen, Q. Li, T. Wang, T. Xue, and J. Pang, “GenNBV: Generalizable next-best-view policy for active 3D reconstruction,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024, pp. 16436–16445. Available: https://openaccess.thecvf.com/content/CVPR2024/html/Chen_GenNBV_Generalizable_Next-Best-View_Policy_for_Active_3D_Reconstruction_CVPR_2024_paper.html
[4] C.-Y. Lu et al., “Hestia: Voxel-face-aware hierarchical next-best-view acquisition for efficient 3D reconstruction,” in Proceedings of the IEEE/CVF winter conference on applications of computer vision, 2026. Available: https://openaccess.thecvf.com/content/WACV2026/papers/Lu_Hestia_Voxel-Face-Aware_Hierarchical_Next-Best-View_Acquisition_for_Efficient_3D_Reconstruction_WACV_2026_paper.pdf
[5] J. Engel et al., “Project Aria: A new tool for egocentric multi-modal AI research.” 2023. Available: https://arxiv.org/abs/2308.13561
[6] J. Straub, D. DeTone, T. Shen, N. Yang, C. Sweeney, and R. Newcombe, “EFM3D: A benchmark for measuring progress towards 3D egocentric foundation models.” 2024. Available: https://arxiv.org/abs/2406.10224
[7] A. Guédon, P. Monasse, and V. Lepetit, “SCONE: Surface coverage optimization in unknown environments by volumetric integration,” in Advances in neural information processing systems, 2022. Available: https://arxiv.org/abs/2208.10449
[8] W. Jiang, B. Lei, and K. Daniilidis, “FisherRF: Active view selection and uncertainty quantification for radiance fields using Fisher information.” 2024. Available: https://arxiv.org/abs/2311.17874
[9] A. Avetisyan et al., “SceneScript: Reconstructing scenes with an autoregressive structured language model.” 2024. Available: https://arxiv.org/abs/2403.13064