Literature Review

This section is a thesis-oriented distillation, not a general survey. Each page extracts what ARIA-NBV can adopt from the local paper corpus: next-best-view (NBV) objectives, egocentric observation contracts, candidate proposal mechanisms, rollout/value-learning gates, and failure modes that affect target-conditioned reconstruction.

The current project direction is deliberately narrow. Keep relative reconstruction improvement (RRI) and target-specific RRI as the authoritative utility signals. Use Project Aria, Aria Synthetic Environments (ASE), EFM3D, and Egocentric Voxel Lifting (EVL) as the actor-visible state substrate. Require target-conditioned fitted Double-Q learning of the finite-horizon Q-function Q_H over trusted finite candidate sets. Keep implicit Q-learning (IQL), continuous actor-critic, simulator-backed RL, and 3D Gaussian Splatting (3DGS) control behind evidence gates.

1.1 Adoption-State Overview

| Domain | Papers | Adoption state | Core signal | Do not adopt |
|---|---|---|---|---|
| NBV objective and candidate planning | VIN-NBV, PB-NBV, GenNBV, Hestia, receding-horizon and shadowcasting NBV | thesis-core method for RRI ranking; proposal/diagnostic for projection and frontier heuristics; stretch/bridge for continuous policies | Mesh-supervised oracle RRI, finite candidate sets, efficient candidate shortlists, validity-aware motion constraints | Replacing RRI with coverage, projected frontier area, or policy reward before calibration |
| ARIA ecosystem and actor-visible state | Project Aria, EFM3D/EVL, EFM3D scene embeddings, ASE, MPS | core substrate | Calibrated egocentric streams, trajectories, online calibration, semi-dense points, DINO/EVL local evidence, predicted OBB support, semi-dense+DINO representation ablations | Leaking GT meshes or GT boxes into actor-visible selection/scoring |
| Rollout, value learning, and RL | RL sources for rollout and Q_H, Trajectory Transformer, Gumbel-Top-k, Double DQN, IQL, soft Q-learning, PPO/SAC | thesis-core method for fitted Double-Q / Q_H; gated follow-up for IQL and actor-critic | Deterministic rollout traces first, stochastic rollout data second, masked target-conditioned fitted Double-Q over finite candidates third | Starting with continuous online RL before a trusted reward loop and support-aware offline store exist |
| Coverage/information utility channels | SCONE, FisherRF | proposal/diagnostic | Target-local support, visibility, directional novelty, Fisher-style diminishing returns | Replacing target RRI with coverage or uncertainty reduction |
| 3DGS / radiance-field active reconstruction | Active 3DGS and targeted NBV, ActiveNeRF, FisherRF, Next Best Sense, dynamic/object-centric 3DGS, FOV-HPE | proposal/diagnostic and stretch/bridge | Uncertainty, Fisher information, target/object weighting, downstream-task view selection | Treating 3DGS uncertainty or human-pose reward as a substitute for ASE mesh-supervised target RRI |
| Semantic scene representations | SceneScript | stretch/bridge | Structured scene language, editable entities, global layout priors, ASE-scale scene representation | Making SceneScript a thesis-core dependency before observed target contracts and target RRI are trusted |

Adoption-state labels used in the pages:

core substrate: required observation, dataset, or representation contract.
thesis-core method: needed for the current thesis claim.
proposal/diagnostic: useful for candidate proposals, reports, or sanity checks.
gated follow-up: useful only after prerequisite evidence exists.
stretch/bridge: future direction beyond the required thesis result.
background: context only.

1.2 Domain Hierarchy

1.2.1 NBV Objective and Candidate Planning

VIN-NBV: source-backed RRI objective, oracle RRI labels, CORAL ordinal training, and greedy candidate ranking [1].
PB-NBV: projection/ellipsoid candidate shortlisting and frontier/occupied evidence separation [2].
GenNBV: continuous 5DoF PPO baseline with coverage-gain rewards, useful as a simulator-gated contrast [3].
Hestia: hierarchical look-at-then-fly control and directional voxel-face visibility, useful for continuous-policy bridge design [4].
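VIN-NBV's objective and greedy ranking can be illustrated with a minimal sketch. It assumes RRI is the relative drop in a reconstruction error when a candidate view's points are added, and it uses a symmetric Chamfer distance between point sets as that error; the exact error definition should be checked against [1]. The function names and toy point sets are hypothetical, and the ground-truth points stand in for mesh samples that feed oracle labels only.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def chamfer(a: List[Point], b: List[Point]) -> float:
    """Symmetric Chamfer distance: mean nearest-neighbour distance, both directions."""
    def one_way(src: List[Point], dst: List[Point]) -> float:
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


def rri(current: List[Point], added: List[Point], gt: List[Point]) -> float:
    """Assumed RRI form: fractional reduction of the reconstruction error.

    `gt` holds points sampled from the GT mesh (or, for target RRI, only the
    target's surface). It feeds oracle labels only, never actor input.
    """
    before = chamfer(current, gt)
    after = chamfer(current + added, gt)
    return (before - after) / before


def greedy_pick(
    current: List[Point], candidates: Dict[str, List[Point]], gt: List[Point]
) -> Tuple[str, float]:
    """One-step greedy: score every candidate in the finite set, keep the best."""
    scores = {name: rri(current, pts, gt) for name, pts in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy example (hypothetical data): the "xy" view recovers the whole ground truth.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
current = [(0.0, 0.0, 0.0)]
candidates = {
    "noop": [(0.0, 0.0, 0.0)],
    "x": [(1.0, 0.0, 0.0)],
    "xy": [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
}
print(greedy_pick(current, candidates, gt))  # ('xy', 1.0)
```

Under the same assumptions, restricting gt to samples from the target's surface gives a target-RRI variant. The brute-force nearest-neighbour search is only for readability.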

1.2.2 ARIA Ecosystem and Actor-Visible State

Project Aria: calibrated egocentric device, VRS/tooling path, MPS trajectories, online calibration, and semi-dense maps [5].
EFM3D/EVL: local actor-visible DINO/voxel evidence, occupancy/head outputs, and OBB support, with broader scene memory delegated to semidense/fused point evidence [6].
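The leakage rule from the overview table (no GT meshes or GT boxes in actor-visible selection or scoring) can be enforced where samples enter the pipeline. The following is a minimal sketch that assumes samples arrive as flat key/value mappings. All key names, ActorState, and split_sample are hypothetical and are not taken from the ARIA-NBV or EFM3D schema.

```python
from dataclasses import dataclass
from typing import Any, Dict, Mapping, Tuple

# Hypothetical key names; the real ARIA-NBV sample schema may differ.
ACTOR_VISIBLE_KEYS = frozenset({"rig_pose", "semidense_points", "evl_features", "pred_obbs"})
ORACLE_ONLY_KEYS = frozenset({"gt_mesh", "gt_obbs"})


@dataclass(frozen=True)
class ActorState:
    """Everything a candidate scorer may read at decision time."""
    fields: Dict[str, Any]


def split_sample(sample: Mapping[str, Any]) -> Tuple[ActorState, Dict[str, Any]]:
    """Split a raw sample into actor-visible state and oracle-only label inputs."""
    unknown = set(sample) - ACTOR_VISIBLE_KEYS - ORACLE_ONLY_KEYS
    if unknown:
        # Fail closed: an unclassified key might be ground truth in disguise.
        raise KeyError(f"unclassified sample keys: {sorted(unknown)}")
    actor = ActorState({k: v for k, v in sample.items() if k in ACTOR_VISIBLE_KEYS})
    oracle = {k: v for k, v in sample.items() if k in ORACLE_ONLY_KEYS}
    return actor, oracle


actor, oracle = split_sample({"rig_pose": "<pose>", "gt_mesh": "<mesh>"})
print(sorted(actor.fields), sorted(oracle))  # ['rig_pose'] ['gt_mesh']
```

Failing closed on unclassified keys means a newly added field must be explicitly assigned to one side before it can reach the scorer.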

1.2.3 Rollout, Value Learning, and RL

RL sources for rollout and Q_H: planning-as-sequence-decoding, stochastic beams, overestimation control, offline-RL support constraints, and the mandatory target-conditioned fitted Double-Q / Q_H gate.
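Overestimation control and offline-RL support constraints meet in the regression target of a fitted Double-Q backup. The following is a minimal sketch for a finite-horizon Q_H. It assumes K candidates per state with a validity mask, target RRI as the per-step reward, the remaining horizon h as part of the state, and separate online and lagged target estimates of Q_{h-1}. The target it computes is y = r + γ · Q_target_{h-1}(s', a*), where a* = argmax over valid a' of Q_online_{h-1}(s', a'), with y = r on the last step. The function name, array layout, and γ = 1 default are illustrative assumptions.

```python
import numpy as np


def double_q_targets(
    rewards: np.ndarray,        # (B,) target-RRI gain of the executed candidate
    next_q_online: np.ndarray,  # (B, K) Q_{h-1}(s', .) from the online estimate
    next_q_target: np.ndarray,  # (B, K) Q_{h-1}(s', .) from the lagged target estimate
    next_valid: np.ndarray,     # (B, K) bool mask of trusted candidates at s'
    steps_left: np.ndarray,     # (B,) remaining horizon h before this step
    gamma: float = 1.0,
) -> np.ndarray:
    """Regression targets for Q_h(s, a) with Double-Q action decoupling."""
    # Masked candidates can never win the argmax.
    masked_online = np.where(next_valid, next_q_online, -np.inf)
    a_star = masked_online.argmax(axis=1)  # select with the online estimate
    rows = np.arange(a_star.shape[0])
    bootstrap = next_q_target[rows, a_star]  # evaluate with the target estimate
    # No bootstrap on the last step (h == 1) or when s' has no valid candidate.
    has_next = (steps_left > 1) & next_valid.any(axis=1)
    return rewards + gamma * np.where(has_next, bootstrap, 0.0)


y = double_q_targets(
    rewards=np.array([0.1, 0.2]),
    next_q_online=np.array([[0.5, 0.9, 0.3], [0.4, 0.1, 0.2]]),
    next_q_target=np.array([[0.6, 0.7, 0.8], [0.3, 0.5, 0.9]]),
    next_valid=np.array([[True, False, True], [True, True, True]]),
    steps_left=np.array([3, 1]),
)
print(y)  # [0.7 0.2]
```

In the first row the online estimate prefers candidate 1, but that candidate is masked, so a* = 0. The target estimate's value for it (0.6) is used, rather than the target estimate's own maximum of 0.8. The second row is the last step, so it does not bootstrap.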

1.2.4 3DGS / Radiance-Field Active Reconstruction

SCONE and FisherRF: coverage-as-support and information-as-diminishing-returns channels for candidate tokens and diagnostics [7], [8].
Active 3DGS and targeted NBV: uncertainty, Fisher information, object-centric utility, dynamic scenes, and task-specific view selection as proposal/diagnostic signals.
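The information-as-diminishing-returns channel can be illustrated with a diagonal Fisher score in the spirit of FisherRF [8]. A view's value is tr(H_cand (H_acc + λI)^{-1}), so parameters that earlier views already constrain contribute less. This is a simplified sketch, not FisherRF's implementation. It assumes per-view diagonal Fisher vectors are already available, and the names, the λ = 1 setting, and the toy vectors are hypothetical. As the overview table notes, such a score is a proposal/diagnostic signal, not a replacement for target RRI.

```python
from typing import Dict, List, Tuple

import numpy as np


def fisher_gain(h_cand: np.ndarray, h_acc: np.ndarray, lam: float) -> float:
    """Diagonal form of tr(H_cand (H_acc + lam I)^-1)."""
    return float(np.sum(h_cand / (h_acc + lam)))


def greedy_fisher(
    cands: Dict[str, np.ndarray], h_acc: np.ndarray, k: int, lam: float = 1.0
) -> Tuple[List[str], List[float]]:
    """Pick k views greedily, folding each pick into the accumulated information."""
    h_acc = np.array(h_acc, dtype=float)  # private copy of the accumulator
    remaining = dict(cands)
    chosen: List[str] = []
    gains: List[float] = []
    for _ in range(min(k, len(remaining))):
        name = max(remaining, key=lambda n: fisher_gain(remaining[n], h_acc, lam))
        gains.append(fisher_gain(remaining[name], h_acc, lam))
        h_acc += remaining.pop(name)  # observed parameters become less informative
        chosen.append(name)
    return chosen, gains


views = {
    "a": np.array([1.0, 0.0, 0.0]),
    "a_again": np.array([1.0, 0.0, 0.0]),  # constrains the same parameter as "a"
    "b": np.array([0.0, 0.6, 0.0]),
}
print(greedy_fisher(views, np.zeros(3), k=2))  # (['a', 'b'], [1.0, 0.6])
```

The duplicate view "a_again" would score 1.0 in isolation but only 0.5 after "a" is taken, so the greedy loop switches to "b".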

1.2.5 Semantic Scene Representations

SceneScript: structured scene language and editable entity-level representation as stretch-only semantic/global planning context [9].

1.3 Current Synthesis

```mermaid
flowchart LR
  A["Project Aria / ASE observed state"] --> B["EVL and semi-dense reconstruction proxy"]
  B --> C["Scene + target oracle RRI labels"]
  C --> D["One-step candidate scorer"]
  D --> E["Trusted finite-candidate rollouts"]
  E --> F["Target-conditioned fitted Double-Q Q_H"]
  F -. "after evidence" .-> G["IQL / actor-critic / simulator bridge"]
```

The thesis should first demonstrate that deterministic, bounded oracle lookahead improves cumulative target RRI over one-step greedy selection under an equal acquisition budget. It should then train a target-conditioned fitted Double-Q / Q_H model over finite candidate sets and require it to beat one-step greedy/model scoring on cumulative target RRI. IQL, actor-critic bridges, SB3/PPO/SAC, Habitat/Isaac, and 3DGS control remain gated follow-up or stretch work.
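The first gate compares bounded oracle lookahead with one-step greedy at the same acquisition budget. The following is a minimal sketch of that comparison. It assumes a deterministic oracle that returns the step reward and next state for any (state, candidate) pair drawn from a finite candidate set. The function names are hypothetical. The toy oracle only counts newly seen target patches so that the example runs; in the thesis setting the reward would be mesh-supervised target RRI, not coverage.

```python
from typing import Callable, Dict, FrozenSet, Sequence, Tuple

State = FrozenSet[int]
Oracle = Callable[[State, str], Tuple[float, State]]  # -> (step reward, next state)


def plan(state: State, actions: Sequence[str], oracle: Oracle, depth: int) -> Tuple[float, str]:
    """Exhaustive depth-limited search over a finite candidate set."""
    if depth == 0:
        return 0.0, ""
    best_value, best_action = float("-inf"), ""
    for a in actions:
        reward, nxt = oracle(state, a)
        future, _ = plan(nxt, actions, oracle, depth - 1)
        if reward + future > best_value:
            best_value, best_action = reward + future, a
    return best_value, best_action


def run_episode(state: State, actions: Sequence[str], oracle: Oracle, budget: int, depth: int) -> float:
    """Receding horizon: replan each step, execute only the first action; depth=1 is greedy."""
    total = 0.0
    for step in range(budget):
        _, a = plan(state, actions, oracle, min(depth, budget - step))
        reward, state = oracle(state, a)
        total += reward
    return total


# Hypothetical toy oracle: reward = fraction of 6 target patches newly seen.
VIEWS: Dict[str, State] = {
    "A": frozenset({1, 2, 3, 4}),
    "B": frozenset({1, 2, 5}),
    "C": frozenset({3, 4, 6}),
}


def toy_oracle(seen: State, view: str) -> Tuple[float, State]:
    nxt = seen | VIEWS[view]
    return len(nxt - seen) / 6, nxt


greedy = run_episode(frozenset(), list(VIEWS), toy_oracle, budget=2, depth=1)
lookahead = run_episode(frozenset(), list(VIEWS), toy_oracle, budget=2, depth=2)
print(round(greedy, 3), round(lookahead, 3))  # 0.833 1.0
```

Setting depth=1 recovers one-step greedy, so both arms share code and budget. Greedy takes the largest first view "A" and then gains little, while lookahead picks the complementary pair. Exhaustive search costs on the order of K^d oracle calls per decision for K candidates and depth d, which is one reason the lookahead stays bounded and runs over trusted finite candidate sets.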

1.4 Local Corpus

The local source mirrors and paper manifest are tracked under docs/literature/.

sources.jsonl: canonical paper manifest.
tex-src/: local LaTeX mirrors when available.

Key local mirrors include VIN-NBV, GenNBV, Hestia, PB-NBV, EFM3D, Project Aria, SceneScript, Trajectory Transformer, Double DQN, IQL, Gumbel-Top-k, Deep Energy-Based Policies, SCONE, FisherRF, Dynamic 3DGS, Next Best Sense, and Instance/Object-centric NBV. FOV-HPE is tracked as DOI/PDF evidence in the local corpus, not as a local TeX mirror.

References

[1]

N. Frahm et al., “VIN-NBV: A view introspection network for next-best-view selection.” 2025. Available: https://arxiv.org/abs/2505.06219

[2]

Z. Jia, Y. Li, Q. Hao, and S. Zhang, “PB-NBV: Efficient projection-based next-best-view planning framework for reconstruction of unknown objects,” IEEE Robotics and Automation Letters, vol. 10, no. 7, pp. 7444–7451, 2025, doi: 10.1109/LRA.2025.3573631.

[3]

X. Chen, Q. Li, T. Wang, T. Xue, and J. Pang, “GenNBV: Generalizable next-best-view policy for active 3D reconstruction,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024, pp. 16436–16445. Available: https://openaccess.thecvf.com/content/CVPR2024/html/Chen_GenNBV_Generalizable_Next-Best-View_Policy_for_Active_3D_Reconstruction_CVPR_2024_paper.html

[4]

C.-Y. Lu et al., “Hestia: Voxel-face-aware hierarchical next-best-view acquisition for efficient 3D reconstruction,” in Proceedings of the IEEE/CVF winter conference on applications of computer vision, 2026. Available: https://openaccess.thecvf.com/content/WACV2026/papers/Lu_Hestia_Voxel-Face-Aware_Hierarchical_Next-Best-View_Acquisition_for_Efficient_3D_Reconstruction_WACV_2026_paper.pdf

[5]

J. Engel et al., “Project Aria: A new tool for egocentric multi-modal AI research.” 2023. Available: https://arxiv.org/abs/2308.13561

[6]

J. Straub, D. DeTone, T. Shen, N. Yang, C. Sweeney, and R. Newcombe, “EFM3D: A benchmark for measuring progress towards 3D egocentric foundation models.” 2024. Available: https://arxiv.org/abs/2406.10224

[7]

A. Guédon, P. Monasse, and V. Lepetit, “SCONE: Surface coverage optimization in unknown environments by volumetric integration,” in Advances in neural information processing systems, 2022. Available: https://arxiv.org/abs/2208.10449

[8]

W. Jiang, B. Lei, and K. Daniilidis, “FisherRF: Active view selection and uncertainty quantification for radiance fields using Fisher information.” 2024. Available: https://arxiv.org/abs/2311.17874

[9]

A. Avetisyan et al., “SceneScript: Reconstructing scenes with an autoregressive structured language model.” 2024. Available: https://arxiv.org/abs/2403.13064
