1 Master Thesis Roadmap

This roadmap covers full-time thesis work from 2026-04-29 through 2026-09-30. The planned capacity is approximately 40 hours per week. The system name is ARIA-NBV, and the final claim is centered on target-conditioned, quality-driven Next-Best-View (NBV) planning:

After first measuring oracle-lookahead headroom over one-step selection under a fixed acquisition budget, can ARIA-NBV perform target-conditioned, RRI-based multi-step NBV by training a finite-candidate value model \(Q_{H,\theta}\) that predicts bounded cumulative target-specific RRI and improves endpoint reconstruction quality for the target of interest?

The technical spine follows the current ARIA-NBV system: ASE snippets and ground-truth meshes [1], frozen EVL/EFM3D features [2], oracle RRI supervision inspired by VIN-NBV [3], an ASE mesh/oracle counterfactual rollout loop, and a target-conditioned finite-candidate \(Q_H\) value model, first implemented as a candidate-to-state query Transformer over finite candidate actions and fixed actor-visible state tokens. External mesh/oracle-compatible substrates are RQ4 support and scale evidence. Online discrete \(Q_H\) is the RQ5 bridge after offline finite-candidate evidence, and continuous target-then-pose actor-critic policies are RQ6 headroom tests after finite-candidate and online-discrete evidence, not substitutes for the M5 \(Q_H\) result. See the research questions, the RRI theory page, the finite-candidate rollout and Q_H contract, the candidate sampling and target-selection theory page, the VIN model API, the generated implementation contracts, and the M1 contract report for the main connected nodes. The advisor-facing proposal is maintained in docs/typst/thesis/main.typ; archived proposal/advisor Typst handouts under .agents/archive/docs/typst/thesis/ are provenance only. The seminar paper is historical implemented evidence, not the current thesis contract.

1.1 Thesis Outcome

The September deliverable is a reproducible target-aware NBV stack:

a trusted oracle for scene-level and target RRI;
a V0/V1 target contract where GT OBBs define target-RRI crops and observed/predicted OBBs are the main actor-visible input;
an observed-only automatic target selector and mixed observed candidate set;
a one-step VIN-style scorer conditioned on the target encoding;
stable random-valid, oracle-greedy/lookahead, and oracle-scored temperature-softmax rollout data, with Gumbel-Top-k as preferred later diversity evidence;
a mandatory target-conditioned finite-candidate value model \(Q_H\) trained from ASE oracle rollout traces, first implemented as a candidate-to-state query Transformer;
final figures, ablations, and failure cases that report cumulative root-normalized target gain, diagnostic target RRI, and cost;
final experiments scaled toward the full 100 GT-mesh ASE scenes and 4,608 snippet windows after small-subset correctness passes, or a scene-level held-out subset with explicit coverage reporting if full coverage is blocked;
RQ4 support and scale analysis that preserves mesh/oracle target-RRI supervision across ASE-wide and external-compatible settings;
an RQ5 online-discrete bridge before considering time-permitting RQ6 continuous policies.

M5 first measures non-myopic headroom: bounded oracle lookahead selects actions by the same root-normalized target-gain return used for \(Q_H\) training and must improve endpoint target-quality gain over one-step oracle greedy before learned \(Q_H\) gains are interpreted as planning gains. If that headroom is positive, learned \(Q_H\) is evaluated by oracle re-evaluation of its selected actions, by endpoint gain, cumulative target root gain, diagnostic target RRI under matched budgets, and by recovered headroom over the learned one-step target-conditioned scorer. If headroom is near zero, the thesis reports no measurable non-myopic headroom for the evaluated split, target set, horizon, branch factor, and candidate distribution. Scaling then tests whether more coverage, broader mesh/oracle-compatible substrates, or RQ5 online-discrete interaction changes that conclusion before RQ6 continuous action spaces are attempted. Full continuous control, VLM/global planning, SceneScript-style memory, and real-device deployment remain lower-priority escalations or future work unless the finite-candidate and scaling evidence justifies experiment-grade escalation. Hestia-style hierarchy [4], GenNBV-style continuous control [5], and SceneScript-style semantic memory [6] are treated as design references, not thesis-core claims.

1.2 Scientific Claim and Literature Grounding

Primary claim to test. Given a logged egocentric snippet, an actor-visible target record, a finite feasible candidate table, and a fixed acquisition budget, bounded oracle lookahead first exposes whether target-specific RRI has non-myopic headroom under the evaluated split, horizon, branch factor, and candidate distribution. When it does, a learned finite-horizon value model, first implemented as candidate-to-state cross-attention over actor-visible target, map, history, and candidate tokens, should recover measurable headroom over myopic learned one-step selection after oracle re-evaluation.

Explicit non-claims. The thesis does not claim real-device deployment, continuous actor-critic performance, Habitat/Isaac simulator performance, 3DGS-backed control, VLM planning, or full semantic world modeling unless these are produced as lower-priority escalation experiments with mesh/oracle-compatible target-RRI supervision. Invalid candidates are not the lowest RRI class; invalidity is a hard mask and reason-code contract. Hard invalidity covers true infeasibility or missing evaluation samples; low immediate target support is reported as diagnostic evidence unless it prevents a meaningful oracle/evaluation sample.

Source family	Role in ARIA-NBV	Adopt	Do not adopt as thesis core
Project Aria / ASE / ATEK [1], [7], [8]	Actor-visible egocentric sensing and mesh-supervised substrate.	Poses, calibration, camera streams, semi-dense points, snippets, GT meshes for oracle labels.	Dense GT geometry, GT OBBs, or semantic labels as V1 actor input.
EFM3D / EVL [2], [9]	Local egocentric evidence and predicted object support.	EVL local evidence, pre-head features, OBB predictions, actor-visible target descriptors, support masks, and visibility-gated logged DINO descriptors as ablations.	Treating predicted OBBs as GT, ignoring local extent limits, treating EVL heads as the complete scene memory, or using projection-valid DINO samples as if they were visible observations.
VIN-NBV [3]	Closest quality-driven one-step candidate-ranking precedent.	Oracle RRI labels, ordinal ranking, learned myopic scorer baseline.	One-step greedy as sufficient for multi-step thesis claims.
GenNBV / Hestia [4], [5]	Continuous-control and hierarchy references.	5-DoF notation, target-then-pose factorization, directional observability ideas.	Coverage reward or continuous actor-critic as the first thesis-core result.
PB-NBV / active 3DGS / downstream NBV [10], [11], [12], [13], [14], [15], [16]	Proposal, uncertainty, semantic, object-focus, and simulator-bridge signals.	Utility-channel separation, candidate shortlist diagnostics, target-focus arguments.	Replacing mesh-supervised target RRI with projection, uncertainty, or downstream-task proxy labels.
Offline value learning and sequence planning [17], [18], [19], [20], [21], [22], [23], [24], [25]	Finite-action value learning, replay, support, and stochastic rollout guardrails.	Masked Double-Q first, replay rows, support-aware ablations, stochastic diversity after deterministic lookahead.	Optimizing unsupported or invalid actions, or replacing typed \(Q_H\) before rollout data is trusted.
SceneScript and HITL structured scenes [6], [26]	Future semantic/global memory and human correction.	Grounded entity, region, and portal tokens for later planning narratives.	GT semantic commands as actor-visible target inputs.

1.3 Mathematical Model

The thesis-core decision process is a finite-horizon constrained NBV surrogate

\[ \mathcal{M}_{\mathrm{NBV}} = (\mathcal{S}, \mathcal{A}, T, r_e, \gamma, H). \]

At step \(t\), the actor-visible logged state is

\[ s_t^{\mathrm{obs}} = \left(P_t^{\mathrm{semi/fused}}, M_t^{\mathrm{ray}}, F_0^{\mathrm{EVL}}, F_t^{\mathrm{DINO@pt}}, O_t^{\mathrm{pred}}, h_t, b_t\right), \]

while the privileged oracle state augments it with ASE GT assets:

\[ s_t^{\mathrm{oracle}} = \left(s_t^{\mathrm{obs}}, M_{\mathrm{GT}}, \{M_e^{\mathrm{GT}}\}_{e\in\mathcal{E}}, \{P_{q_{t,i}}\}_{i=1}^{N_t}\right). \]

Only \(s_t^{\mathrm{obs}}\) and counterfactual geometry derived from selected history are actor-visible. GT meshes, GT OBBs, GT crops, and all-candidate GT renders are label/evaluation assets. The planner state is

\[ s_t^{\mathrm{cf0}} = \left(F_0^{\mathrm{EVL}}, P_t^{\mathrm{semi/fused}}, M_t^{\mathrm{ray}}, F_t^{\mathrm{DINO@pt}}, z_e, h_t, b_t, Q_t, m_t, \rho_t\right), \]

where \(P_t^{\mathrm{semi/fused}}\) is the broad actor-visible point state, \(M_t^{\mathrm{ray}}\) is a sparse occupied/free/unknown evidence map derived from logged observations and selected successor geometry, \(F_0^{\mathrm{EVL}}\) is the root local EVL evidence field, \(F_t^{\mathrm{DINO@pt}}\) is an optional visibility-gated logged-frame descriptor bank attached to semidense/fused points, and \(O_t^{\mathrm{pred}}\) stores observed or predicted target hypotheses. EVL is fixed across counterfactual rollout unless a named ablation recomputes it; candidate successors may extend \(P_t^{\mathrm{semi/fused}}\) and \(M_t^{\mathrm{ray}}\) with selected geometry, free-space, support history, and directional memory, but not fresh RGB/DINO/EVL features from unvisited poses. Cube R-CNN-style outputs can enter \(O_t^{\mathrm{pred}}\) as auxiliary target proposals or ROI descriptors after adaptation evidence; they are not the default broad scene memory. \(Q_t=\{q_{t,i}\}_{i=1}^{N_t}\) is the finite candidate table, \(m_{t,i}\in\{0,1\}\) is the hard validity mask, and \(\rho_{t,i}\) is the invalid-reason code. The admissible actions are candidate indices:

\[ \mathcal{A}(s_t) = \{i\in\{1,\ldots,N_t\}:m_{t,i}=1\}. \]
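The mask and reason-code contract above can be sketched in a few lines. This is a minimal illustration, not the project's schema: `InvalidReason`, `CandidateRow`, and the specific reason names are hypothetical, and indices are 0-based here while the equation uses 1-based indices.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class InvalidReason(Enum):
    """Hypothetical reason codes; the real contract may use other names."""
    VALID = 0
    COLLISION = 1
    OUT_OF_BOUNDS = 2
    NO_DEPTH = 3


@dataclass(frozen=True)
class CandidateRow:
    index: int             # position i in the candidate table Q_t (0-based here)
    reason: InvalidReason  # rho_{t,i}

    @property
    def mask(self) -> int:
        # m_{t,i} is 1 only when no invalid reason is recorded.
        return 1 if self.reason is InvalidReason.VALID else 0


def admissible_actions(table: List[CandidateRow]) -> List[int]:
    """Return A(s_t): the indices whose hard validity mask is 1."""
    return [row.index for row in table if row.mask == 1]


table = [
    CandidateRow(0, InvalidReason.VALID),
    CandidateRow(1, InvalidReason.COLLISION),
    CandidateRow(2, InvalidReason.VALID),
    CandidateRow(3, InvalidReason.NO_DEPTH),
]
print(admissible_actions(table))  # [0, 2]
```

Deriving the mask from the reason code keeps the two fields from disagreeing, and it keeps invalid rows out of the label space instead of encoding them as a lowest RRI class.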

Rollout/data generation uses an oracle target-task protocol first: GT OBBs define identity-valid target tasks, ambiguity checks, target crops, and labels. Actor-visible target selection remains a separate V1 protocol for later deployable claims:

\[ z_e^{\mathrm{oracle}} = \psi_{\mathrm{task}}\!\left(O_{\mathrm{GT}}, P_t^{\mathrm{semi/fused}}, F_0^{\mathrm{EVL}}, I_{1:t}\right), \qquad M_e = \operatorname{crop}\!\left(M_{\mathrm{GT}}, e\right). \]

\(z_e^{\mathrm{oracle}}\) may contain OBB center, orientation, extents, class, confidence, projected area, semi-dense support, EVL support, relative pose, and identity-gate diagnostics. \(M_e\) is used only for labels and endpoint metrics. The V1 OBS-SEL / PRED-Q / GT-EVAL path later replaces GT target-task input with observed or predicted target descriptors and keeps GT crops hidden from the actor.

Candidate generation is modeled as a logged finite mixture:

\[ q_{t,i}\sim \sum_{k=1}^{K_{\mathrm{fam}}} \pi_k(s_t,z_e)\, \mathcal{G}_k(q\mid s_t,z_e;\eta_k), \qquad \sum_k\pi_k=1. \]

The first sampler should mix target-centric TARGET_POINT candidates with RADIAL_AWAY, RADIAL_TOWARDS, and FORWARD_RIG exploration/continuity families. Candidate provenance, pose jitter, target support, validity, and invalid reason are stored per row.
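A minimal sketch of the family-level draw in the logged mixture, assuming fixed mixture weights. The weights, the function name `sample_candidate_families`, and the row layout are hypothetical. The per-family pose generators \(\mathcal{G}_k\) are not modeled here; only the provenance label that would be stored per row is drawn.

```python
import random
from typing import Dict, List, Tuple

FAMILIES = ["TARGET_POINT", "RADIAL_AWAY", "RADIAL_TOWARDS", "FORWARD_RIG"]

# Hypothetical mixture weights pi_k; they must sum to 1.
WEIGHTS: Dict[str, float] = {
    "TARGET_POINT": 0.4,
    "RADIAL_AWAY": 0.2,
    "RADIAL_TOWARDS": 0.2,
    "FORWARD_RIG": 0.2,
}


def sample_candidate_families(
    weights: Dict[str, float], n: int, seed: int
) -> List[Tuple[int, str]]:
    """Draw the generating family for each of n candidate rows.

    Returns (row_index, family) pairs so provenance can be logged per row.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    rng = random.Random(seed)  # seeded so the logged table is reproducible
    names = list(weights)
    drawn = rng.choices(names, weights=[weights[k] for k in names], k=n)
    return list(enumerate(drawn))


rows = sample_candidate_families(WEIGHTS, n=8, seed=0)
for index, family in rows:
    print(index, family)
```

Seeding the draw per snippet is one way to keep candidate tables replayable. State-dependent weights \(\pi_k(s_t, z_e)\) would replace the constant dictionary.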

Let \(C_e(P)\) denote the oracle-only target crop applied to accumulated counterfactual points for target-RRI labels and evaluation. Let \(\Delta_t^e\) be the target-cropped point-mesh oracle error at step \(t\):

\[ \Delta_t^e = d(C_e(P_t),M_e) = D_{P\to M,t}^e + D_{M\to P,t}^e, \]

where \(D_{P\to M,t}^e\) is point-to-mesh accuracy and \(D_{M\to P,t}^e\) is mesh-to-point completeness for the matched target crop. For a selected valid candidate,

\[ P_{t+1}=P_t\cup P_{q_{t,a_t}}, \qquad r_{t,\mathrm{root}}^e = \frac{\Delta_t^e-\Delta_{t+1}^e}{\Delta_0^e+\varepsilon}. \]

The state-relative diagnostic \((\Delta_t^e-\Delta_{t+1}^e)/(\Delta_t^e+\varepsilon)\) remains a VIN-compatible label and one-step analysis signal, not the default rollout or \(Q_H\) training reward.
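The difference between the two normalizations is easiest to see on a short error sequence. The sketch below assumes a precomputed list of target-cropped errors \(\Delta_0^e,\ldots,\Delta_H^e\). The function names and the value of `EPS` are illustrative assumptions.

```python
from typing import List

EPS = 1e-8  # assumed stabilizer; the project value may differ


def root_normalized_rewards(errors: List[float], eps: float = EPS) -> List[float]:
    """r_{t,root} = (Delta_t - Delta_{t+1}) / (Delta_0 + eps)."""
    d0 = errors[0]
    return [(errors[t] - errors[t + 1]) / (d0 + eps) for t in range(len(errors) - 1)]


def state_relative_rri(errors: List[float], eps: float = EPS) -> List[float]:
    """Diagnostic (Delta_t - Delta_{t+1}) / (Delta_t + eps)."""
    return [
        (errors[t] - errors[t + 1]) / (errors[t] + eps)
        for t in range(len(errors) - 1)
    ]


# Toy error sequence that halves at every step.
errors = [1.0, 0.5, 0.25]
print(root_normalized_rewards(errors))  # approximately [0.5, 0.25]
print(state_relative_rri(errors))       # approximately [0.5, 0.5]
```

In this toy case the state-relative diagnostic reports the same value at both steps even though the second step removes half as much absolute error. The root-normalized reward is additive across steps, while the state-relative signal is not.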

The additive value-learning return and endpoint reporting metric are kept separate:

\[ G_t^{(H)}=\sum_{k=0}^{H-1}\gamma^k r_{t+k,\mathrm{root}}^e, \qquad J_{e,\Delta}^{(H)} = \frac{\Delta_0^e-\Delta_H^e}{\Delta_0^e+\varepsilon}. \]
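As a consistency check that follows directly from the two definitions, the undiscounted return from the root telescopes to the endpoint metric:

\[ G_0^{(H)}\Big|_{\gamma=1} = \sum_{k=0}^{H-1}\frac{\Delta_k^e-\Delta_{k+1}^e}{\Delta_0^e+\varepsilon} = \frac{\Delta_0^e-\Delta_H^e}{\Delta_0^e+\varepsilon} = J_{e,\Delta}^{(H)}. \]

For \(\gamma<1\) later improvements are down-weighted, so \(G_0^{(H)}\) and \(J_{e,\Delta}^{(H)}\) can rank trajectories differently. This is consistent with reporting them as separate quantities.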

Non-myopic planning is interpreted through oracle-lookahead headroom. The lookahead policy selects the first action of a bounded search maximizing root-normalized target-gain return; endpoint gain evaluates the resulting trajectory:

\[ \Delta_{\mathrm{look}} = J_{e,\Delta}^{(H)}(\pi_{\mathrm{oracle\text{-}look}}) - J_{e,\Delta}^{(H)}(\pi_{\mathrm{oracle\text{-}1}}). \]
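A toy example shows how positive headroom can arise. It is a set-cover stand-in, not the ARIA-NBV oracle: each hypothetical candidate reveals a fixed set of target surface patches, the error is the number of unrevealed patches, \(\varepsilon\) is dropped, and the lookahead searches full plans exhaustively rather than re-planning after each first action. With \(\gamma=1\) the cumulative root-normalized return equals the endpoint gain because the per-step terms telescope, so the search maximizes endpoint gain directly.

```python
from itertools import permutations
from typing import List, Sequence, Set

# Hypothetical candidates and the target patches each one would reveal.
CANDIDATES = {"A": {1, 2, 4, 5}, "B": {1, 2, 3}, "C": {4, 5, 6}}
UNIVERSE = {1, 2, 3, 4, 5, 6}


def error(covered: Set[int]) -> int:
    """Toy stand-in for Delta: number of target patches not yet revealed."""
    return len(UNIVERSE - covered)


def endpoint_gain(seq: Sequence[str]) -> float:
    """(Delta_0 - Delta_H) / Delta_0 for a sequence of selected candidates."""
    covered: Set[int] = set()
    for name in seq:
        covered |= CANDIDATES[name]
    return (error(set()) - error(covered)) / error(set())


def greedy(horizon: int) -> List[str]:
    """One-step oracle greedy: take the largest immediate error reduction."""
    covered: Set[int] = set()
    seq: List[str] = []
    for _ in range(horizon):
        remaining = sorted(n for n in CANDIDATES if n not in seq)
        best = max(
            remaining,
            key=lambda n: error(covered) - error(covered | CANDIDATES[n]),
        )
        seq.append(best)
        covered |= CANDIDATES[best]
    return seq


def lookahead(horizon: int) -> List[str]:
    """Exhaustive bounded search over ordered plans of length `horizon`."""
    return list(max(permutations(sorted(CANDIDATES), horizon), key=endpoint_gain))


H = 2
headroom = endpoint_gain(lookahead(H)) - endpoint_gain(greedy(H))
print(greedy(H), lookahead(H), headroom)  # ['A', 'B'] ['B', 'C'], headroom about 1/6
```

Greedy takes the candidate with the largest immediate gain and then cannot reach full coverage within the budget, while the two-step search finds the complementary pair. Whether ASE targets show this structure is the empirical question.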

When \(\Delta_{\mathrm{look}}>0\), learned \(Q_H\) is reported by recovered headroom over the learned one-step target scorer:

\[ \eta_Q = \frac{ J_{e,\Delta}^{(H)}(\pi_Q) - J_{e,\Delta}^{(H)}(\pi_{\mathrm{learned\text{-}1}}) }{ J_{e,\Delta}^{(H)}(\pi_{\mathrm{oracle\text{-}look}}) - J_{e,\Delta}^{(H)}(\pi_{\mathrm{learned\text{-}1}}) +\varepsilon }. \]
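The two reporting quantities reduce to simple arithmetic once the endpoint gains are available. The sketch below uses made-up endpoint gains purely to illustrate the formulas. The function names and `EPS` are assumptions, and the numbers are not results.

```python
EPS = 1e-8  # assumed stabilizer


def lookahead_headroom(j_oracle_look: float, j_oracle_one: float) -> float:
    """Delta_look: endpoint gain of oracle lookahead minus one-step oracle greedy."""
    return j_oracle_look - j_oracle_one


def recovered_headroom(
    j_q: float, j_learned_one: float, j_oracle_look: float, eps: float = EPS
) -> float:
    """eta_Q: share of the gap between the learned one-step scorer and
    oracle lookahead that the learned Q_H policy closes."""
    return (j_q - j_learned_one) / (j_oracle_look - j_learned_one + eps)


# Illustrative endpoint gains only (not measured values).
j_oracle_look, j_oracle_one = 0.60, 0.50
j_learned_one, j_q = 0.40, 0.50

print(lookahead_headroom(j_oracle_look, j_oracle_one))       # about 0.10
print(recovered_headroom(j_q, j_learned_one, j_oracle_look))  # about 0.5
```

Note that the two differences use different baselines: \(\Delta_{\mathrm{look}}\) compares against the one-step oracle, while \(\eta_Q\) compares against the learned one-step scorer. A value of \(\eta_Q\) near 1 means the learned policy approaches oracle lookahead, and a value near 0 means it does not improve on myopic learned selection.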

The one-step target scorer is the myopic control:

\[ \hat r_{t,i}^e = f_\theta^{\mathrm{VIN}}(s_t^{\mathrm{cf0}},z_e,q_{t,i}). \]

The mandatory planner is a masked finite-candidate value model, first implemented as a candidate-to-state query Transformer:

\[ Q_{H,\theta}(s_t^{\mathrm{cf0}},z_e,q_{t,i}) \approx \mathbb{E}\!\left[G_t^{(H)}\mid s_t^{\mathrm{cf0}},z_e,a_t=i\right], \]

decoded from candidate tokens and selected only over valid indices. The first backup is masked Double-Q:

\[ j^\star = \arg\max_{j:m_{t+1,j}=1} Q_\theta(s_{t+1}^{\mathrm{cf0}},z_e,q_{t+1,j}), \]

\[ y_t = r_{t,\mathrm{root}}^e+\gamma(1-d_t) Q_{\bar\theta}(s_{t+1}^{\mathrm{cf0}},z_e,q_{t+1,j^\star}). \]
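The masked backup can be sketched without any deep-learning framework. The sketch assumes the per-candidate values of the online and target networks at the successor state are already available as plain lists, and it reads \(d_t\) as a terminal flag. All function and argument names are hypothetical.

```python
from typing import Sequence


def masked_argmax(values: Sequence[float], mask: Sequence[int]) -> int:
    """Index of the largest value among entries whose mask is 1."""
    valid = [j for j, m in enumerate(mask) if m == 1]
    if not valid:
        raise ValueError("no valid successor candidate")
    return max(valid, key=lambda j: values[j])


def double_q_target(
    reward: float,
    gamma: float,
    done: bool,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    mask_next: Sequence[int],
) -> float:
    """y_t = r + gamma * (1 - d) * Q_target(s', j*), with j* chosen by the
    online network over valid successor candidates only."""
    if done:
        return reward  # terminal step: no bootstrap, mask is not consulted
    j_star = masked_argmax(q_online_next, mask_next)
    return reward + gamma * q_target_next[j_star]


y = double_q_target(
    reward=0.2,
    gamma=0.9,
    done=False,
    q_online_next=[0.9, 0.3, 0.5],  # index 0 has the top value but is masked
    q_target_next=[0.1, 0.2, 0.4],
    mask_next=[0, 1, 1],
)
print(y)  # about 0.56: j* = 2, so 0.2 + 0.9 * 0.4
```

Applying the mask inside the argmax, rather than penalizing invalid values afterwards, prevents an invalid candidate's value from leaking into the bootstrap target.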

All selected actions, including learned actions, are re-evaluated by the oracle before they become thesis evidence.

1.4 Evidence Flow Diagrams

flowchart LR
  ASE["ASE / Project Aria snippet<br/>RGB, poses, calibration, semidense points"] --> EVL["Frozen EFM3D / EVL<br/>local voxel evidence + OBB predictions"]
  ASE --> OracleMesh["GT mesh + GT OBBs<br/>oracle and evaluation only"]
  OracleMesh --> TargetSel["Oracle target-task sampler<br/>identity-valid GT target tasks"]
  EVL -. "audit descriptors" .-> TargetSel
  TargetSel --> CandidateGen["Mixed finite candidates<br/>target-centric + exploration"]
  CandidateGen --> Validity["Validity masks + reasons<br/>collision, bounds, no-depth, support"]
  OracleMesh --> RRIOracle["Oracle scene + target RRI<br/>render, fuse, crop, score"]
  Validity --> RRIOracle
  EVL --> OneStep["Target-conditioned VIN scorer<br/>ordinal RRI ranking"]
  RRIOracle --> OneStep
  OneStep --> Rollouts["Replayable rollout traces<br/>random-valid, greedy, lookahead, temp-softmax"]
  RRIOracle --> Rollouts
  Rollouts --> Headroom["Oracle headroom<br/>lookahead vs one-step greedy"]
  Headroom --> QH["Finite-candidate Q_H<br/>candidate-to-state query first"]
  QH --> Eval["Oracle re-evaluation<br/>endpoint target gain, recovered headroom, cost"]
  Eval --> Thesis["Thesis figures, ablations, failure cases"]

gantt
    title ARIA-NBV thesis roadmap: contracts, target RRI, headroom, Q_H
    dateFormat  YYYY-MM-DD
    axisFormat  %d %b
    excludes    weekends
    section Scope and contracts
    M0 proposal contract                :m0, 2026-04-29, 2026-05-10
    M1 oracle and geometry trust        :m1, 2026-05-11, 2026-05-31
    M2 one-step scorer and scale check  :m2, 2026-06-01, 2026-06-21
    section Target RRI
    M3 V1 target oracle and selector    :m3, 2026-06-22, 2026-07-12
    M4 target-conditioned one-step      :m4, 2026-07-13, 2026-08-09
    section Planning evidence
    M5 lookahead headroom and Q_H       :crit, m5, 2026-08-10, 2026-08-30
    section Scaling and escalation
    M6 online / continuous escalation   :m6, 2026-08-31, 2026-09-13
    M7 final experiments and writing    :crit, m7, 2026-09-14, 2026-09-27
    M8 release freeze                   :crit, m8, 2026-09-28, 2026-09-30

1.5 Milestone Timeline

Dates	Milestone	Exit criteria
2026-04-29 to 2026-05-10	M0 - Scope, repo, docs foundation	Dirty worktree classified; available partial VIN offline store smoke path understood; Rerun inspector smoke command documented; compact proposal freeze and bibliography/source-policy audit tracked as advisor deliverables; finite-candidate \(Q_H\) thesis boundary adopted.
2026-05-11 to 2026-05-31	M1 - Data, cache, oracle correctness	Offline store and RRI contracts stable; public M1 contract report records store/split/frame/CW90/candidate-alignment/depth-backprojection/Rerun normal-boundary-failure evidence; oracle throughput measured.
2026-06-01 to 2026-06-21	M2 - VIN baseline and scale check	One-step VIN baseline reproducible; calibration and ranking plots available; ablation matrix fixed; LRZ sharding/storage plan and Zarr rollout/Q schema are ready enough for M3 scale work.
2026-06-22 to 2026-07-12	M3 - Entity/target-aware RRI and generation readiness	V0 GT-OBB target RRI trusted on a small subset; V1 observed-target / GT-label contract and observed-only target selector defined; deterministic sharding, Slurm/DSS staging, and Zarr store checks pass before full-scale generation.
2026-07-13 to 2026-08-09	M4 - Target-conditioned VIN	Model accepts observed/predicted target encoding and predicts target-specific RRI with matched GT labels; scene-level and target-level scorers compared.
2026-08-10 to 2026-08-30	M5 - Multi-step target-aware rollouts and Q_H	Random-valid, oracle-greedy/lookahead, and oracle-scored temperature-softmax rollout data are stable; oracle lookahead headroom is measured with root-normalized target gain; if headroom is positive, finite-candidate \(Q_H\) is trained and oracle-evaluated by endpoint gain, cumulative target root gain, diagnostic target RRI, and recovered headroom over learned one-step scoring.
2026-08-31 to 2026-09-13	M6 - Scaling and online/continuous escalation	Evaluate whether offline scale, external mesh/oracle-compatible substrates, or online discrete \(Q_H\) are justified after M5; continuous target-then-pose actor-critic remains time-permitting and requires online training evidence.
2026-09-14 to 2026-09-27	M7 - Full-scale experiments and writing	Full 100 GT-mesh ASE scenes / 4,608 snippet windows are generated, or a scene-level held-out subset reports scenes, snippets, targets, trajectories, rollout seeds, transitions, and exact coverage gaps; final tables, figures, failure cases, and thesis narrative frozen.
2026-09-28 to 2026-09-30	M8 - Release freeze	Reproducible configs, docs, demo path, and final smoke checks complete.

1.6 Milestone Details

1.6.1 M0 - Scope, Repo, Docs Foundation

Primary question: Which work is thesis-critical, and which work is support infrastructure?

Implementation surfaces: root README, docs/contents/thesis/roadmap.qmd, docs/contents/thesis/questions.qmd, the Streamlit VIN diagnostics page, the Rerun offline inspector, and the available VIN offline store.

Exit checks:

classify existing worktree changes into data/cache cleanup, RL/rollout scaffold, docs cleanup, and unrelated operator work;
freeze the compact advisor proposal before M1 scale-up: ARIA-NBV name and title capitalization, examiner/supervisor placeholders or explicit blockers, dataset limits, invalidity semantics, V0/V1 target contract, mandatory finite-candidate \(Q_H\) value model, a compact timeline, and the continuous/simulator/3DGS stretch boundary;
audit docs/references.bib for proposal-critical citation hygiene: no generated contentReference comments, no Wikipedia support for proposal-critical claims, primary metadata for cited papers/datasets, and any legacy duplicate aliases explicitly quarantined;
render the active thesis seed with make thesis-pdf and keep archived proposal/advisor handouts as provenance rather than active source owners;
align proposal, roadmap, and questions on the six-RQ thesis boundary: objective/metric split, target and matching protocol, candidate and rollout support, headroom-gated \(Q_H\), scaling, and online/continuous escalation;
validate that offline_only.toml can load the available VIN offline store into the VIN diagnostics/API surface once corrected command behavior and validation land;
save one Rerun recording from a validation sample once the inspector CLI and .configs/rerun_offline.toml are present;
record KG-friendly authoring rules in this roadmap and use them in new public docstrings;
keep thesis scope aligned with RQ1 through RQ6 and the shared evidence protocol.

1.6.2 M1 - Data, Cache, Oracle Correctness

Primary question: Is the supervision substrate trustworthy enough for model and planning claims?

Implementation surfaces: the data-handling API, the offline store contracts, the RRI metric API, and the M1 contract map.

Exit checks:

one canonical offline store path, manifest, sample index, and split contract;
thesis-facing M1 contract report covering offline store version/sample count, split source, sample-index semantics, pose frames, CW90/display-only convention, candidate shell/valid/RRI alignment, depth render/backprojection, known limitations, commands used, and pass/block status;
explicit frame semantics for candidate poses, rig poses, and display-only CW90 corrections;
saved or referenced Rerun smoke recording paths for normal, boundary, and failure samples when available, without committing generated .rrd artifacts;
oracle throughput report on representative snippets;
no aggressive mesh or point-cloud downsampling in fine-detail oracle runs;
no target, rollout, stochastic, or \(Q_H\) scale-up until the report is passable or the blockers are explicitly recorded.

1.6.3 M2 - VIN Baseline and Scale Gates

Primary question: What is the reproducible one-step baseline before adding target conditioning?

Implementation surfaces: the VIN model module, the VinModelV3 API, the Lightning training surface, W&B/Optuna reports, and Streamlit diagnostics.

Exit checks:

fixed train/val split, seed set, checkpoint naming, and offline_only.toml baseline run;
calibration, ordinal-bin, rank-correlation, and top-k selection diagnostics;
controlled ablation matrix for surface reconstruction inputs, CORAL variants, auxiliary regression, and candidate features;
Zarr-first rollout/Q storage schema drafted before large sequence data is written;
LRZ deterministic sharding, Slurm/DSS staging, and full-scale storage budget planned before M3/M5 generation.
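One common way to make sharding deterministic is to hash a stable key instead of relying on file order or Python's salted `hash()`. The sketch below is only an illustration of that idea: the key format, the identifiers, and the shard count are hypothetical and do not describe the planned LRZ layout.

```python
import hashlib


def shard_for(scene_id: str, snippet_id: str, num_shards: int) -> int:
    """Map a (scene, snippet) pair to a shard index that is stable across
    machines, processes, and Python versions."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    key = f"{scene_id}/{snippet_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo shards.
    return int.from_bytes(digest[:8], "big") % num_shards


# Hypothetical identifiers, for illustration only.
for scene, snippet in [("scene_000", "0"), ("scene_000", "1"), ("scene_001", "0")]:
    print(scene, snippet, shard_for(scene, snippet, num_shards=16))
```

If splits must be scene-level, hashing the scene identifier alone would keep all snippets of a scene in one shard. Which key is appropriate depends on the split contract.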

1.6.4 M3 - Entity/Target-Aware RRI

Primary question: Can the oracle measure improvement for a selected target, not only the whole scene?

Implementation surfaces: GT OBBs from ASE/EFM views, oracle target-task sampling, cropped point/mesh RRI, and diagnostics linking target-level and scene-level metrics. The V1 actor-visible selector is a separate deployable input protocol after the oracle label path is trusted.

Exit checks:

oracle target-task contract: identity-valid GT OBB tasks, GT crop/evaluation, explicit ambiguity diagnostics, and target-RRI labels for rollout/data generation;
V1 deployable-input contract: observed/predicted OBB descriptors matched to GT OBB target-RRI labels under OBS-SEL / PRED-Q / GT-EVAL before actor-visible main-result claims;
observed-target eligibility policy based on predicted OBBs/classes, confidence, projected area, and semidense/EVL point support for the V1 path;
GT-OBB cropped target RRI on a small trusted subset, then full-scale label generation only after the LRZ/Zarr gate passes;
diagnostics showing current points, candidate points, cropped mesh, target OBB, target RRI, and scene RRI side by side;
clear handling of invalid or unsupported targets.

1.6.5 M4 - Target-Conditioned VIN

Primary question: Can a VIN-style scorer rank candidates by target-specific RRI when conditioned on the first actor-visible target encoding?

Implementation surfaces: VIN target encoder, target-aware batch fields, offline-store payload extensions, and model diagnostics.

Exit checks:

target encoding selected for the main controlled run: observed/predicted OBB geometry plus class, confidence, projected area, semidense support, EVL support, and relative pose fields;
compact actor-visible crop descriptor prepared as the first target-input ablation once the OBB-level contract is stable;
target-specific RRI labels loaded through the data-handling surface;
baseline comparison: scene-level scorer, target-conditioned scorer, and oracle target RRI;
one-step scorer evidence gate: held-out ranking, oracle-evaluated model-selected rollouts, calibration and stage-shift diagnostics, and Rerun visualizations of representative successes and failures;
failure cases grouped by occlusion, small target, invalid candidates, and poor target encoding.
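The held-out ranking checks can be illustrated with two standard diagnostics, Spearman rank correlation and top-k hit, on a single candidate table. This is a dependency-free sketch with made-up scores. The function names are hypothetical, and the project's own diagnostics may define ties or hits differently.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one side is constant
    return cov / (vx * vy) ** 0.5


def top_k_hit(pred: Sequence[float], oracle: Sequence[float], k: int) -> bool:
    """True if the oracle-best candidate is among the k highest predictions."""
    best = max(range(len(oracle)), key=lambda i: oracle[i])
    top = sorted(range(len(pred)), key=lambda i: pred[i], reverse=True)[:k]
    return best in top


oracle_rri = [0.1, 0.4, 0.2, 0.3]  # illustrative oracle target RRI per candidate
pred_rri = [0.2, 0.3, 0.1, 0.5]    # illustrative scorer outputs
print(spearman(pred_rri, oracle_rri))      # approximately 0.6
print(top_k_hit(pred_rri, oracle_rri, 1))  # False
print(top_k_hit(pred_rri, oracle_rri, 2))  # True
```

In practice invalid candidates would be dropped before ranking, so that masked rows do not inflate or deflate the correlation.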

1.6.6 M5 - Multi-Step Target-Aware Rollouts and Q_H

Primary question: Does bounded oracle lookahead expose target endpoint-gain headroom over one-step selection, and can learned \(Q_H\) recover that headroom?

Implementation surfaces: the counterfactual rollout API, candidate generation, target-aware scorer backends, rollout diagnostics, and the finite-candidate rollout and Q_H contract.

Exit checks:

compare random-valid, deterministic one-step oracle greedy, the learned one-step target-conditioned scorer, deterministic bounded oracle lookahead, oracle-scored temperature-softmax, and finite-candidate \(Q_H\) under equal budget; add Gumbel-Top-k as preferred later evidence when schedule permits;
make bounded oracle-RRI lookahead versus one-step greedy under equal budget the trusted headroom estimate before interpreting \(Q_H\); the lookahead policy selects by cumulative root-normalized target gain, while endpoint gain evaluates the trajectory;
report \(\Delta_{\mathrm{look}}\) and, when it is positive, the recovered headroom \(\eta_Q\) of \(Q_H\) over the learned one-step target scorer;
require \(Q_H\) to beat the learned one-step scorer and one-step model/greedy selection on endpoint target gain and report cumulative target-root gain under matched acquisition and candidate budgets;
report cumulative target-root gain, diagnostic target RRI, endpoint target gain, scene RRI, number of views, path length, invalid action rate, and runtime;
use cumulative root-normalized target gain as the main multi-step return while preserving current one-step RRI labels; keep log-improvement and episode-normalized rewards as follow-up ablations;
keep rollout complexity bounded by explicit horizon, branch factor, and beam width;
train \(Q_H\) as a finite-candidate value model, first implemented as a candidate-to-state query Transformer over target, ray-aware map, local EVL, history, budget, and candidate tokens to predict one masked bounded-horizon Q value per candidate; candidate-candidate self-attention, scalar motion, and rule penalties are extensions;
require hard validity masks and explicit invalid reason codes in \(Q_H\) data;
require learned \(Q_H\) selected actions to be oracle-evaluated under equal acquisition budget; if \(\Delta_{\mathrm{look}}\approx 0\), report no measurable non-myopic headroom for the evaluated split, target set, horizon, branch factor, and candidate distribution instead of overstating a learned planning failure;
visualize representative successes and failures.
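The two stochastic rollout policies named in the checks above can be sketched over a masked score vector. Gumbel-Top-k here means perturbing temperature-scaled scores with independent Gumbel noise and keeping the k largest, which samples k distinct candidates without replacement. The scores, temperature, and function names are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence


def masked_softmax(
    scores: Sequence[float], mask: Sequence[int], temperature: float
) -> List[float]:
    """Temperature softmax over valid candidates; invalid ones get probability 0."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    valid = [i for i, m in enumerate(mask) if m == 1]
    if not valid:
        raise ValueError("no valid candidate")
    top = max(scores[i] for i in valid)  # subtract the max for numerical stability
    weights = {i: math.exp((scores[i] - top) / temperature) for i in valid}
    total = sum(weights.values())
    return [weights[i] / total if i in weights else 0.0 for i in range(len(scores))]


def gumbel_top_k(
    scores: Sequence[float],
    mask: Sequence[int],
    k: int,
    temperature: float,
    rng: random.Random,
) -> List[int]:
    """Sample up to k distinct valid indices via Gumbel-perturbed scores."""
    valid = [i for i, m in enumerate(mask) if m == 1]
    keys = {}
    for i in valid:
        u = max(rng.random(), 1e-12)  # guard against log(0)
        gumbel = -math.log(-math.log(u))
        keys[i] = scores[i] / temperature + gumbel
    return sorted(valid, key=keys.get, reverse=True)[:k]


oracle_scores = [0.30, 0.90, 0.10, 0.25]  # illustrative oracle scores
mask = [1, 0, 1, 1]                       # candidate 1 is invalid
probs = masked_softmax(oracle_scores, mask, temperature=0.1)
picks = gumbel_top_k(oracle_scores, mask, k=2, temperature=0.1, rng=random.Random(0))
print(probs)
print(picks)
```

Both functions mask before sampling, so an invalid candidate can never be drawn regardless of its score.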

1.6.7 M6 - RQ5 Online and RQ6 Continuous Escalation

Primary question: Which online-discrete or continuous escalation is justified after the mandatory offline finite-candidate \(Q_H\) result and RQ4 support evidence?

Implementation surfaces: the counterfactual RL API, transition datasets, online discrete \(Q_H\) in the ASE mesh/oracle loop, and actor-critic or continuous-control design notes. External mesh/oracle-compatible substrate notes remain RQ4 support evidence unless the task explicitly targets online interaction.

Exit checks:

require M5 \(Q_H\) evidence before spending time on quantitative RQ5 online or RQ6 continuous-control work;
decide the exact RQ5 scope for online discrete \(Q_H\) in the current ASE mesh/oracle loop: bridge design only, smoke experiment, or quantitative comparator;
preserve target-specific point-mesh supervision for thesis-grade scaling evidence; proxy coverage, uncertainty, or semantic rewards remain contrast signals;
document how an RQ6 actor would propose continuous target-then-pose actions and handle feasibility after online finite-candidate evidence exists, but do not require a quantitative continuous baseline;
defer imitation-learning variants beyond the planned RRI + \(Q_H\) approach;
defer SB3/DQN/PPO/SAC until an online Gymnasium-style simulator with mesh/oracle target-RRI evaluation exists;
do not claim full continuous RL unless online interaction, reward speed, and evaluation are all thesis-grade.

1.6.8 M7 - Thesis Experiments and Writing

Primary question: What evidence supports the final thesis claim?

Implementation surfaces: final configs, W&B/Optuna export, Typst paper, Quarto docs, and defense figures.

Exit checks:

final experiment table for one-step vs multi-step, scene-level vs target-aware scoring, oracle-lookahead headroom, and \(Q_H\) recovered headroom;
final ablations for target encoding, candidate generation, invalid handling, and supervision scale;
final scale report over the full 100 GT-mesh ASE scenes / 4,608 snippets, or an explicit pass/block coverage report that separates scenes, snippets, targets, trajectories, rollout seeds, transitions, and missing gaps if any subset is unavailable;
final failure-case catalog;
paper, docs, and slides use the same terminology and claims.

1.6.9 M8 - Release Freeze

Primary question: Can the thesis result be reproduced and demonstrated?

Implementation surfaces: tagged configs, smoke tests, final docs, and demo script.

Exit checks:

final smoke matrix passes on the intended machine;
all final figures trace back to configs and run IDs;
public docs link to the final thesis narrative and not to stale scratchpad claims;
no release-critical placeholder, stale path, or wrong repo link remains.

1.7 Required Ablations

Ablations should isolate scientific questions rather than accumulate architecture toggles.

Axis	Levels	Primary evidence
Target input	V0 GT OBB sanity; V1 observed/predicted OBB; V1 plus crop descriptor; optional entity token	target-RRI rank correlation, target top-k hit, endpoint target gain
Candidate mixture	generic shell; target-point only; mixed target plus exploration; PB-NBV/frontier shortlist	valid fraction, target visibility, RRI distribution, selected-view diversity
Objective	scene RRI; target root gain; target RRI diagnostic; log-gain ablation	endpoint target gain, cumulative target-root gain, quality-cost curves
Planner	random-valid; learned one-step scorer; one-step oracle greedy; bounded oracle lookahead; \(Q_H\)	oracle-lookahead headroom, recovered headroom, and oracle-evaluated endpoint gain under equal budget
Invalidity	hard mask; hard mask plus validity head; scalar penalty only after masks work	invalid-action rate, value leakage, invalid-reason distribution
Ordinal/scalar loss	one-step CORAL; balanced CORAL; focal threshold; finite-horizon Huber or quantile return	calibration, Spearman correlation, top-k hit, confusion structure, return-unit calibration
State richness	CF0 geometry-only; ray-aware occupied/free/unknown memory; selected synthetic observations; visibility-gated DINO; optional directional observability	planning improvement, support gap, overfitting gap
Scale and online bridge	small trusted subset; scene-level held-out subset; full 100 GT-mesh scenes; external mesh/oracle-compatible substrate; RQ5 online discrete \(Q_H\)	confidence intervals, coverage report, oracle throughput, recovered headroom under matched budgets
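The hard-mask level of the Invalidity axis can be illustrated with a minimal sketch. The function name `select_with_hard_mask` and the plain-list inputs are assumptions made for illustration, not the project's API. The point is only that invalid candidates are excluded before the argmax, so their predicted values cannot leak into the selection.

```python
import math
from typing import Optional, Sequence


def select_with_hard_mask(
    q_values: Sequence[float], valid: Sequence[bool]
) -> Optional[int]:
    """Return the index of the best valid candidate, or None if none is valid.

    Hypothetical helper: the mask is applied before the argmax, so the value
    predicted for an invalid candidate never influences the choice.
    """
    if len(q_values) != len(valid):
        raise ValueError("q_values and valid must have the same length")
    best_index: Optional[int] = None
    best_value = -math.inf
    for index, (value, is_valid) in enumerate(zip(q_values, valid)):
        if not is_valid:
            continue  # hard mask: skip infeasible candidates entirely
        if best_index is None or value > best_value:
            best_index, best_value = index, value
    return best_index
```

Returning `None` for an all-invalid candidate set keeps that case visible to the caller, which is one way to feed the invalid-action rate instead of silently picking an infeasible view.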

1.8 Evidence Reporting Contract

Every thesis-grade table must report the amount of evidence behind the claim:

\[ \left( N_{\mathrm{scene}}, N_{\mathrm{snippet}}, N_{\mathrm{target}}, N_{\mathrm{candidate}}, N_{\mathrm{transition}}, N_{\mathrm{seed}}, \mathrm{split}, \mathrm{invalid\_rate}, \mathrm{coverage\_gap} \right). \]
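One way to keep this tuple attached to every table is a small record type. The sketch below is an assumption for illustration: the class name `EvidenceReport`, the field names, and the choice to treat `invalid_rate` and `coverage_gap` as fractions in [0, 1] are not taken from the codebase.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class EvidenceReport:
    """Evidence behind one thesis-grade table (hypothetical container)."""

    n_scene: int
    n_snippet: int
    n_target: int
    n_candidate: int
    n_transition: int
    n_seed: int
    split: str
    invalid_rate: float  # assumed to be a fraction in [0, 1]
    coverage_gap: float  # assumed to be a fraction in [0, 1]

    def __post_init__(self) -> None:
        counts = (
            self.n_scene, self.n_snippet, self.n_target,
            self.n_candidate, self.n_transition, self.n_seed,
        )
        if any(count < 0 for count in counts):
            raise ValueError("counts must be non-negative")
        for name in ("invalid_rate", "coverage_gap"):
            if not 0.0 <= getattr(self, name) <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1]")

    def as_row(self) -> Dict[str, Any]:
        """Return the fields in declaration order, ready for a table footer."""
        return asdict(self)
```

Making the record frozen and validated at construction means a table cannot be exported with a missing or out-of-range evidence field.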

Final figures should include a candidate-table diagnostic, an oracle-label diagnostic, a one-step ranking/calibration figure, a multi-step trajectory comparison for greedy/lookahead/\(Q_H\), and a failure-case catalog. Paired method comparisons should use identical roots and candidate budgets:

\[ \Delta_e^{(H)}(A,B) = J_{e,\Delta}^{(H)}(\tau_A)-J_{e,\Delta}^{(H)}(\tau_B), \]

with bootstrap confidence intervals over scene-level or target-level units. Raw endpoint gains are not interpretable unless horizon, target count, candidate distribution, invalidity, and coverage gaps are reported.
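A minimal sketch of the paired comparison, assuming per-unit endpoint gains are already available as dictionaries keyed by scene or target ID. The function name `paired_bootstrap_ci`, the percentile interval, and the default resample count are illustrative choices, not the thesis protocol.

```python
import random
from statistics import mean
from typing import Dict, Tuple


def paired_bootstrap_ci(
    gains_a: Dict[str, float],
    gains_b: Dict[str, float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Mean paired difference A - B with a percentile bootstrap interval.

    Keys are the resampling units (scene IDs or target IDs); values are the
    endpoint gains of each method on that unit under identical roots and
    candidate budgets.
    """
    if gains_a.keys() != gains_b.keys():
        raise ValueError("both methods must be evaluated on identical units")
    if not gains_a or n_resamples < 1:
        raise ValueError("need at least one unit and one resample")
    deltas = [gains_a[unit] - gains_b[unit] for unit in sorted(gains_a)]
    rng = random.Random(seed)
    # Resample whole units with replacement, keeping each A/B pair together.
    resampled = sorted(
        mean(rng.choices(deltas, k=len(deltas))) for _ in range(n_resamples)
    )
    lower = resampled[int((alpha / 2) * (n_resamples - 1))]
    upper = resampled[int((1 - alpha / 2) * (n_resamples - 1))]
    return mean(deltas), lower, upper
```

Resampling differences rather than the two methods independently is what makes the comparison paired: scene difficulty cancels within each unit before the interval is formed.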

Unless stated otherwise, equal budget means equal selected-view horizon \(H\), equal candidate count \(N_q\) per decision step, equal candidate-generation distribution, and matched validity constraints. Path length, runtime, and oracle evaluation count are reported separately; path/time-constrained variants are explicit ablations.
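The equal-budget definition can be checked mechanically before two runs are compared. The sketch below assumes that the candidate-generation distribution and the validity constraints can each be summarized by a string identifier; the names `AcquisitionBudget` and `budget_mismatches` are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class AcquisitionBudget:
    """The four quantities that define an equal budget (hypothetical)."""

    horizon: int                 # selected-view horizon H
    candidates_per_step: int     # candidate count N_q per decision step
    candidate_distribution: str  # identifier of the generation distribution
    validity_constraints: str    # identifier of the validity-constraint set


def budget_mismatches(a: AcquisitionBudget, b: AcquisitionBudget) -> List[str]:
    """List the budget fields on which two runs differ (empty means equal)."""
    return [
        field.name
        for field in fields(AcquisitionBudget)
        if getattr(a, field.name) != getattr(b, field.name)
    ]
```

Path length, runtime, and oracle evaluation count are deliberately absent from the record because they are reported separately rather than matched.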

Scale axes must be reported separately: scenes, snippets, anchor poses per trajectory, candidate sets, candidate-distribution variants, targets per snippet, rollout seeds, transitions, and stage/calibration bins. Architecture, rollout, scorer, or \(Q_H\) conclusions must not be compared across runs that silently change scene, snippet, target, or candidate coverage. Final train/validation/test boundaries are scene-level; sample-level leakage across snippets from the same scene is not acceptable for final claims.
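A scene-level boundary can be enforced by deriving the split from the scene ID alone, so every snippet inherits the split of its scene. The sketch below is one possible scheme under stated assumptions: the hash-based assignment, the split fractions, and the `salt` string are illustrative and not the project's actual split definition.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def assign_split(
    scene_id: str,
    val_fraction: float = 0.1,
    test_fraction: float = 0.1,
    salt: str = "split-v0",
) -> str:
    """Map a scene ID to 'train', 'val', or 'test' deterministically."""
    if val_fraction < 0 or test_fraction < 0 or val_fraction + test_fraction > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    digest = hashlib.sha256(f"{salt}:{scene_id}".encode("utf-8")).digest()
    # Use the top 53 bits so the bucket is an exact float in [0, 1).
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    if bucket < test_fraction:
        return "test"
    if bucket < test_fraction + val_fraction:
        return "val"
    return "train"


def group_snippets_by_split(
    snippets: Iterable[Tuple[str, str]],
) -> Dict[str, List[Tuple[str, str]]]:
    """Group (scene_id, snippet_id) pairs; the split depends on the scene only."""
    grouped: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for scene_id, snippet_id in snippets:
        grouped[assign_split(scene_id)].append((scene_id, snippet_id))
    return dict(grouped)
```

With only about 100 scenes the realized split sizes are approximate, so the actual per-split scene counts still belong in the coverage report.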

The Zarr-first rollout/Q store should avoid duplicating raw ASE/ATEK assets: full meshes remain external path/hash/version references, high-detail target crops are stored once per target with crop metadata, and rollout rows reference those assets. On LRZ, deterministic sharding, Slurm/DSS staging, resume-safe writes, and storage-budget reporting are hard gates before full-scale generation.
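The path/hash/version reference idea can be sketched without any storage backend. The record name `ExternalAssetRef` and both helpers are hypothetical, and SHA-256 is an assumed hash choice; the store's real schema may differ.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ExternalAssetRef:
    """Pointer to an asset that stays outside the rollout store (hypothetical)."""

    path: str
    sha256: str
    version: str


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large meshes are never loaded whole."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def make_asset_ref(path: Path, version: str) -> ExternalAssetRef:
    """Record where an asset lives and what its content hash is."""
    return ExternalAssetRef(path=str(path), sha256=file_sha256(path), version=version)


def verify_asset_ref(ref: ExternalAssetRef) -> bool:
    """Check that the referenced file still exists and is unchanged."""
    target = Path(ref.path)
    return target.is_file() and file_sha256(target) == ref.sha256
```

Rollout rows would then carry only such a reference, and a verification pass before full-scale generation can detect moved or modified meshes.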

1.9 Risk Register

Risk	Scientific consequence	Mitigation / fallback
Frame, CW90, or projection mismatch	Labels and visualizations can look plausible while measuring the wrong geometry.	M1 frame report, Rerun normal/boundary/failure recordings, pose/camera consistency assertions.
Target labels are sparse or ambiguous	Target RRI becomes unstable or actor leakage creeps in through GT matching.	Eligibility thresholds, V0/V1 separation, explicit unmatched-target counts, matched-GT diagnostics.
Candidate valid fraction is low	\(Q_H\) learns feasibility artifacts instead of utility.	Mixed sampler tuning, reason codes, minimum valid-count gates, candidate provenance diagnostics, and separation of true infeasibility from low immediate target support.
Offline rollouts are narrow	Value learning overfits behavior support and overestimates unsupported actions.	Random-valid and oracle-scored temperature-softmax behavior policies, later Gumbel-Top-k sampling, scene-level splits, and support-aware ablations.
Oracle throughput blocks scale	Final evidence may be too small for broad claims.	Zarr-first storage, LRZ deterministic sharding, subset confidence intervals, exact coverage-gap reporting.
Oracle lookahead has little headroom	The evaluated split, target set, horizon, branch factor, and candidate distribution expose little non-myopic headroom.	Report \(\Delta_{\mathrm{look}}\) honestly, catalog cases where setup actions matter, and do not overclaim \(Q_H\) planning gains.
\(Q_H\) fails despite positive headroom	The learned value model or offline support is insufficient.	Report the failing gate and preserve a defensible target-aware oracle, one-step scorer, and rollout-data study; do not replace it with unvalidated continuous RL.
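The temperature-softmax and Gumbel-Top-k mitigations for narrow offline rollouts share one mechanism, sketched below. The function name `gumbel_top_k` and its signature are assumptions; the scores stand in for oracle scores of the valid candidates. Adding independent Gumbel noise to the temperature-scaled scores and keeping the k largest yields k distinct candidates sampled without replacement, and with k = 1 this reduces to ordinary temperature-softmax sampling.

```python
import math
import random
from typing import List, Optional, Sequence


def gumbel_top_k(
    scores: Sequence[float],
    k: int,
    temperature: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Sample k distinct candidate indices without replacement.

    Candidates are drawn as if picked one at a time from
    softmax(scores / temperature) over the remaining candidates.
    """
    if not 0 < k <= len(scores):
        raise ValueError("k must satisfy 0 < k <= len(scores)")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()
    perturbed = []
    for index, score in enumerate(scores):
        # random() lies in [0, 1); clamp away from 0 to avoid log(0).
        uniform = max(rng.random(), 1e-300)
        gumbel = -math.log(-math.log(uniform))
        perturbed.append((score / temperature + gumbel, index))
    perturbed.sort(reverse=True)
    return [index for _, index in perturbed[:k]]
```

A higher temperature flattens the distribution and widens behavior support; a lower temperature concentrates rollouts on high-scoring candidates.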

1.10 KG-Friendly Authoring Rules

These rules apply to new roadmap, research-question, docstring, and thesis writing so the repository can support later knowledge-graph construction.

Give every major concept a stable anchor, a one-sentence definition, known aliases, and links to related internal nodes.
Prefer internal links to Quarto pages, Typst sections, API reference pages, configs, and canonical memory over repeating the same explanation.
Prefer BibTeX citations from docs/references.bib for papers and datasets; use raw external URLs only for tools or libraries without bibliography keys.
Public Python docstrings should state tensor shapes, coordinate frames, units, related config classes, and related data containers when those are part of the contract.
Each milestone should point to its research questions, implementation surfaces, expected tests, and thesis figures once those artifacts exist.
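As an illustration of the docstring rule, the sketch below documents shapes, frames, and units for a deliberately simple function. The function name, the frame names, and the choice of metres are assumptions, and plain Python lists stand in for tensors so the example stays self-contained.

```python
from typing import List, Sequence


def transform_points_world_to_camera(
    points_world: Sequence[Sequence[float]],
    rotation_cw: Sequence[Sequence[float]],
    translation_cw: Sequence[float],
) -> List[List[float]]:
    """Map 3D points from the world frame into a camera frame.

    Args:
        points_world: Points with shape (N, 3), world frame, metres.
        rotation_cw: Rotation with shape (3, 3), row-major, mapping world
            coordinates to camera coordinates (unitless).
        translation_cw: Translation with shape (3,), camera frame, metres.

    Returns:
        Points with shape (N, 3), camera frame, metres, computed as
        rotation_cw @ p + translation_cw for each point p.

    Related:
        Config classes and data containers would be linked here; this
        standalone sketch has none.
    """
    return [
        [
            sum(rotation_cw[row][col] * point[col] for col in range(3))
            + translation_cw[row]
            for row in range(3)
        ]
        for point in points_world
    ]
```

Stating the direction of the transform in both the name and the docstring is what lets a later knowledge-graph pass link the function to the correct frame nodes.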

1.11 Standing Verification

After a docs change, render the touched pages and confirm that the context index is preserved:

cd docs
quarto render contents/thesis/roadmap.qmd
quarto render contents/thesis/questions.qmd
cd ..
scripts/nbv_qmd_outline.sh --compact

For Mermaid edits, validate each diagram source with Mermaid CLI before committing.

For package work, use the verification row in the nearest AGENTS.md; for roadmap or canonical-memory changes, run the matching memory or docs checks.

References

[1]

Meta Platforms Inc., “Aria synthetic environments dataset.” [Online]. Available: https://facebookresearch.github.io/projectaria_tools/docs/open_datasets/aria_synthetic_environments_dataset

[2]

J. Straub, D. DeTone, T. Shen, N. Yang, C. Sweeney, and R. Newcombe, “EFM3D: A benchmark for measuring progress towards 3D egocentric foundation models.” 2024. Available: https://arxiv.org/abs/2406.10224

[3]

N. Frahm et al., “VIN-NBV: A view introspection network for next-best-view selection.” 2025. Available: https://arxiv.org/abs/2505.06219

[4]

C.-Y. Lu et al., “Hestia: Voxel-face-aware hierarchical next-best-view acquisition for efficient 3D reconstruction,” in Proceedings of the IEEE/CVF winter conference on applications of computer vision, 2026. Available: https://openaccess.thecvf.com/content/WACV2026/papers/Lu_Hestia_Voxel-Face-Aware_Hierarchical_Next-Best-View_Acquisition_for_Efficient_3D_Reconstruction_WACV_2026_paper.pdf

[5]

X. Chen, Q. Li, T. Wang, T. Xue, and J. Pang, “GenNBV: Generalizable next-best-view policy for active 3D reconstruction,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024, pp. 16436–16445. Available: https://openaccess.thecvf.com/content/CVPR2024/html/Chen_GenNBV_Generalizable_Next-Best-View_Policy_for_Active_3D_Reconstruction_CVPR_2024_paper.html

[6]

A. Avetisyan et al., “SceneScript: Reconstructing scenes with an autoregressive structured language model.” 2024. Available: https://arxiv.org/abs/2403.13064

[7]

J. Engel et al., “Project aria: A new tool for egocentric multi-modal AI research.” 2023. Available: https://arxiv.org/abs/2308.13561

[8]

Meta Platforms Inc., “ATEK data store documentation.” [Online]. Available: https://github.com/facebookresearch/ATEK/blob/main/docs/ATEK_Data_Store.md

[9]

Meta Platforms Inc., “Egocentric voxel lifting (EVL) documentation.” [Online]. Available: https://facebookresearch.github.io/projectaria_tools/docs/open_models/evl

[10]

Z. Jia, Y. Li, Q. Hao, and S. Zhang, “PB-NBV: Efficient projection-based next-best-view planning framework for reconstruction of unknown objects,” IEEE Robotics and Automation Letters, vol. 10, no. 7, pp. 7444–7451, 2025, doi: 10.1109/LRA.2025.3573631.

[11]

X. Pan, Z. Lai, S. Song, and G. Huang, “ActiveNeRF: Learning where to see with uncertainty estimation.” 2022. Available: https://arxiv.org/abs/2209.08546

[12]

W. Jiang, B. Lei, and K. Daniilidis, “FisherRF: Active view selection and uncertainty quantification for radiance fields using fisher information.” 2024. Available: https://arxiv.org/abs/2311.17874

[13]

M. Strong, B. Lei, A. Swann, W. Jiang, K. Daniilidis, and M. Kennedy III, “Next best sense: Guiding vision and touch with FisherRF for 3D gaussian splatting.” 2024. Available: https://arxiv.org/abs/2410.04680

[14]

Y. Li, W. Jiang, and K. Daniilidis, “Next best view selections for semantic and dynamic 3D gaussian splatting.” 2025. Available: https://arxiv.org/abs/2512.22771

[15]

S. Jeong, E. Lee, J. Kim, and A. Kim, “Informative object-centric next best view for object-aware 3D gaussian splatting in cluttered scenes.” 2026. Available: https://arxiv.org/abs/2602.08266

[16]

W. G. Bae, S. Lee, and J. T. Lee, “Finding optimal viewpoints for monocular 3D human pose estimation in dynamic 3D gaussian splatting space,” in Proceedings of the IEEE international conference on advanced video and signal-based surveillance, 2025. doi: 10.1109/AVSS65446.2025.11149906.

[17]

V. Mnih et al., “Playing atari with deep reinforcement learning,” CoRR, vol. abs/1312.5602, 2013, Available: http://arxiv.org/abs/1312.5602

[18]

H. van Hasselt, A. Guez, and D. Silver, “Deep reinforcement learning with double q-learning.” 2015. Available: https://arxiv.org/abs/1509.06461

[19]

I. Kostrikov, A. Nair, and S. Levine, “Offline reinforcement learning with implicit q-learning.” 2021. Available: https://arxiv.org/abs/2110.06169

[20]

A. Kumar, A. Zhou, G. Tucker, and S. Levine, “Conservative q-learning for offline reinforcement learning,” in Advances in neural information processing systems, 2020, pp. 1179–1191. Available: https://papers.nips.cc/paper/2020/hash/0d2b2061826a5df3221116a5085a6052-Abstract.html

[21]

S. Fujimoto, D. Meger, and D. Precup, “Off-policy deep reinforcement learning without exploration,” in Proceedings of the 36th international conference on machine learning, in Proceedings of machine learning research, vol. 97. PMLR, 2019, pp. 2052–2062. Available: https://proceedings.mlr.press/v97/fujimoto19a.html

[22]

L. Chen et al., “Decision transformer: Reinforcement learning via sequence modeling,” in Advances in neural information processing systems, 2021, pp. 15084–15097. Available: https://papers.nips.cc/paper_files/paper/2021/hash/7f489f642a0ddb10272b5c31057f0663-Abstract.html

[23]

M. Janner, Q. Li, and S. Levine, “Offline reinforcement learning as one big sequence modeling problem.” 2021. Available: https://arxiv.org/abs/2106.02039

[24]

T. Haarnoja, H. Tang, P. Abbeel, and S. Levine, “Reinforcement learning with deep energy-based policies,” in Proceedings of the 34th international conference on machine learning, 2017. Available: https://arxiv.org/abs/1702.08165

[25]

W. Kool, H. van Hoof, and M. Welling, “Stochastic beams and where to find them: The gumbel-top-k trick for sampling sequences without replacement,” in Proceedings of the 36th international conference on machine learning, 2019. Available: https://arxiv.org/abs/1903.06059

[26]

C. Xie et al., “Human-in-the-loop local corrections of 3D scene layouts via infilling.” 2025. Available: https://arxiv.org/abs/2503.11806