---
title: "Master Thesis Roadmap"
phase: thesis
audience: advisor
status: current
owner: jan
format: html
---
# Master Thesis Roadmap {#roadmap}
This roadmap covers full-time thesis work from **2026-04-29** through
**2026-09-30**. The planned capacity is approximately 40 hours per week. The
system name is **ARIA-NBV**, and the final claim is centered on
target-conditioned, quality-driven
{{< glsfull next-best-view >}} planning:
> Can ARIA-NBV perform target-conditioned, RRI-based multi-step NBV by training
> a finite-candidate value model $Q_{H,\theta}$ that predicts bounded
> cumulative target-specific RRI for a target of interest and improves endpoint
> target reconstruction quality, after first measuring oracle-lookahead
> headroom over one-step selection under a fixed acquisition budget?
The technical spine follows the current ARIA-NBV system: ASE snippets and
ground-truth meshes [@ProjectAria-ASE-2025], frozen EVL/EFM3D features
[@EFM3D-straub2024], {{< gls oracle-rri >}} supervision inspired by VIN-NBV
[@VIN-NBV-frahm2025], an ASE mesh/oracle counterfactual rollout loop, and a
target-conditioned finite-candidate $Q_H$ value model, first implemented as a
candidate-to-state query Transformer over finite candidate actions and fixed
actor-visible state tokens.
External mesh/oracle-compatible substrates provide RQ4 support and scale
evidence. Online discrete $Q_H$ is the RQ5 bridge that follows offline
finite-candidate evidence. Continuous target-then-pose actor-critic policies are
RQ6 headroom tests that follow finite-candidate and online-discrete evidence;
they are not substitutes for the M5 $Q_H$ result.
See the
[research questions](questions.qmd), the
[RRI theory page](../theory/rri_theory.qmd), the
[finite-candidate rollout and Q_H contract](../theory/rl_planning.qmd), the
[candidate sampling and target-selection theory page](../theory/candidate_sampling_target_selection.qmd), the
[VIN model API](../../reference/vin.model_v3.qmd), the
[generated implementation contracts](../../reference/index.qmd), and the
[M1 contract report](m1_contract_report.qmd) for the main connected nodes. The
advisor-facing proposal is maintained in
`docs/typst/thesis/main.typ`; archived proposal/advisor Typst handouts under
`.agents/archive/docs/typst/thesis/` are provenance only. The seminar paper is
historical implemented evidence, not the current thesis contract.
## Thesis Outcome {#roadmap-thesis-outcome}
The September deliverable is a reproducible target-aware
{{< gls next-best-view >}} stack:
- a trusted oracle for scene-level and {{< gls target-specific-rri >}};
- a V0/V1 target contract where GT OBBs define target-RRI crops and
observed/predicted OBBs are the main actor-visible input;
- an observed-only automatic target selector and mixed observed candidate set;
- a one-step VIN-style scorer conditioned on the target encoding;
- stable random-valid, oracle-greedy/lookahead, and oracle-scored
temperature-softmax rollout data, with Gumbel-Top-k as preferred later
diversity evidence;
- a mandatory target-conditioned finite-candidate value model $Q_H$ trained from
ASE oracle rollout traces, first implemented as a candidate-to-state query
Transformer;
- final figures, ablations, and failure cases that report cumulative
root-normalized target gain, diagnostic target RRI, and
{{< gls acquisition-cost >}};
- final experiments scaled toward the full 100 GT-mesh ASE scenes and 4,608
snippet windows after small-subset correctness passes, or a scene-level
held-out subset with explicit coverage reporting if full coverage is blocked;
- RQ4 support and scale analysis that preserves mesh/oracle target-RRI
supervision across ASE-wide and external-compatible settings;
- an RQ5 online-discrete bridge before considering time-permitting RQ6
continuous policies.
M5 first measures non-myopic headroom:
bounded oracle lookahead selects actions by the same root-normalized target-gain
return used for $Q_H$ training and must improve endpoint target-quality gain
over one-step oracle greedy before learned $Q_H$ gains are interpreted as
planning gains. If that headroom is positive, learned $Q_H$ is evaluated by
oracle re-evaluation of its selected actions under matched budgets, reporting
endpoint gain, cumulative target root gain, diagnostic target RRI, and recovered
headroom over the learned one-step target-conditioned scorer. If headroom is
near zero, the thesis reports
no measurable non-myopic headroom for the evaluated split, target set, horizon,
branch factor, and candidate distribution. Scaling then tests whether more
coverage, broader mesh/oracle-compatible substrates, or RQ5 online-discrete
interaction changes that conclusion before RQ6 continuous action spaces are
attempted. Full continuous control, VLM/global planning,
SceneScript-style memory, and real-device deployment remain lower-priority
escalations or future work unless the finite-candidate and scaling evidence
justifies experiment-grade escalation.
Hestia-style hierarchy [@Hestia-lu2026], GenNBV-style continuous control
[@GenNBV-chen2024], and SceneScript-style semantic memory
[@SceneScript-avetisyan2024] are treated as design references, not thesis-core
claims.
## Scientific Claim and Literature Grounding {#roadmap-literature-grounding}
**Primary claim to test.** Given a logged egocentric snippet, an actor-visible
target record, a finite feasible candidate table, and a fixed acquisition
budget, bounded oracle lookahead first exposes whether target-specific RRI has
non-myopic headroom under the evaluated split, horizon, branch factor, and
candidate distribution. When it does, a learned finite-horizon value model,
first implemented as candidate-to-state cross-attention over actor-visible
target, map, history, and candidate tokens, should recover measurable headroom
over myopic learned one-step selection after oracle re-evaluation.
**Explicit non-claims.** The thesis does not claim real-device deployment,
continuous actor-critic performance, Habitat/Isaac simulator performance,
3DGS-backed control, VLM planning, or full semantic world modeling unless these
are produced as lower-priority escalation experiments with mesh/oracle-compatible
target-RRI supervision. Invalid candidates are not treated as the lowest RRI
class; invalidity is enforced through a hard mask and a reason-code contract.
Hard invalidity covers true infeasibility or missing evaluation samples; low
immediate target support is reported as diagnostic evidence unless it prevents a
meaningful oracle/evaluation sample.
| Source family | Role in ARIA-NBV | Adopt | Do not adopt as thesis core |
|---|---|---|---|
| Project Aria / ASE / ATEK [@projectaria-engel2023; @ProjectAria-ASE-2025; @ATEK-DataStore-2025] | Actor-visible egocentric sensing and mesh-supervised substrate. | Poses, calibration, camera streams, semi-dense points, snippets, GT meshes for oracle labels. | Dense GT geometry, GT OBBs, or semantic labels as V1 actor input. |
| EFM3D / EVL [@EFM3D-straub2024; @EVL-Doc-2025] | Local egocentric evidence and predicted object support. | EVL local evidence, pre-head features, OBB predictions, actor-visible target descriptors, support masks, and visibility-gated logged DINO descriptors as ablations. | Treating predicted OBBs as GT, ignoring local extent limits, treating EVL heads as the complete scene memory, or using projection-valid DINO samples as if they were visible observations. |
| VIN-NBV [@VIN-NBV-frahm2025] | Closest quality-driven one-step candidate-ranking precedent. | Oracle RRI labels, ordinal ranking, learned myopic scorer baseline. | One-step greedy as sufficient for multi-step thesis claims. |
| GenNBV / Hestia [@GenNBV-chen2024; @Hestia-lu2026] | Continuous-control and hierarchy references. | 5-DoF notation, target-then-pose factorization, directional observability ideas. | Coverage reward or continuous actor-critic as the first thesis-core result. |
| PB-NBV / active 3DGS / downstream NBV [@PB-NBV-jia2025; @ActiveNeRF-pan2022; @FisherRF-jiang2024; @NextBestSense-strong2024; @li2025bestviewselectionssemantic; @ObjectCentricNBV-jeong2026; @FOVHPE-bae2025] | Proposal, uncertainty, semantic, object-focus, and simulator-bridge signals. | Utility-channel separation, candidate shortlist diagnostics, target-focus arguments. | Replacing mesh-supervised target RRI with projection, uncertainty, or downstream-task proxy labels. |
| Offline value learning and sequence planning [@DBLP:journals/corr/MnihKSGAWR13; @DoubleDQN-vanHasselt2015; @IQL-kostrikov2021; @CQL-kumar2020; @BCQ-fujimoto2019; @DecisionTransformer-chen2021; @TrajectoryTransformer-janner2021; @DeepEnergyPolicies-haarnoja2017; @GumbelTopK-kool2019] | Finite-action value learning, replay, support, and stochastic rollout guardrails. | Masked Double-Q first, replay rows, support-aware ablations, stochastic diversity after deterministic lookahead. | Optimizing unsupported or invalid actions, or replacing typed $Q_H$ before rollout data is trusted. |
| SceneScript and HITL structured scenes [@SceneScript-avetisyan2024; @HITL-SceneScript-xie2025] | Future semantic/global memory and human correction. | Grounded entity, region, and portal tokens for later planning narratives. | GT semantic commands as actor-visible target inputs. |
## Mathematical Model {#roadmap-mathematical-model}
The thesis-core decision process is a finite-horizon constrained NBV surrogate
$$
\mathcal{M}_{\mathrm{NBV}}
=
(\mathcal{S}, \mathcal{A}, T, r_e, \gamma, H).
$$
At step $t$, the actor-visible logged state is
$$
s_t^{\mathrm{obs}}
=
\left(P_t^{\mathrm{semi/fused}}, M_t^{\mathrm{ray}}, F_0^{\mathrm{EVL}}, F_t^{\mathrm{DINO@pt}}, O_t^{\mathrm{pred}}, h_t, b_t\right),
$$
while the privileged oracle state augments it with ASE GT assets:
$$
s_t^{\mathrm{oracle}}
=
\left(s_t^{\mathrm{obs}}, M_{\mathrm{GT}}, \{M_e^{\mathrm{GT}}\}_{e\in\mathcal{E}}, \{P_{q_{t,i}}\}_{i=1}^{N_t}\right).
$$
Only $s_t^{\mathrm{obs}}$ and counterfactual geometry derived from selected history are
actor-visible. GT meshes, GT OBBs, GT crops, and all-candidate GT renders are
label/evaluation assets. The planner state is
$$
s_t^{\mathrm{cf0}}
=
\left(F_0^{\mathrm{EVL}}, P_t^{\mathrm{semi/fused}}, M_t^{\mathrm{ray}}, F_t^{\mathrm{DINO@pt}}, z_e, h_t, b_t, Q_t, m_t, \rho_t\right),
$$
where $P_t^{\mathrm{semi/fused}}$ is the broad actor-visible point state,
$M_t^{\mathrm{ray}}$ is a sparse occupied/free/unknown evidence map derived from
logged observations and selected successor geometry, $F_0^{\mathrm{EVL}}$ is the
root local EVL evidence field, $F_t^{\mathrm{DINO@pt}}$ is an optional
visibility-gated logged-frame descriptor bank attached to semidense/fused
points, and $O_t^{\mathrm{pred}}$ stores observed or predicted target
hypotheses. EVL is fixed across counterfactual rollout unless a named ablation
recomputes it; candidate successors may extend $P_t^{\mathrm{semi/fused}}$ and
$M_t^{\mathrm{ray}}$ with selected geometry, free-space, support history, and
directional memory, but not fresh RGB/DINO/EVL features from unvisited poses.
Cube R-CNN-style outputs can enter $O_t^{\mathrm{pred}}$ as auxiliary target
proposals or ROI descriptors after adaptation evidence; they are not the default
broad scene memory. $Q_t=\{q_{t,i}\}_{i=1}^{N_t}$ is the finite candidate table,
$m_{t,i}\in\{0,1\}$ is the hard validity mask, and
$\rho_{t,i}$ is the invalid-reason code. The admissible actions are candidate
indices:
$$
\mathcal{A}(s_t)
=
\{i\in\{1,\ldots,N_t\}:m_{t,i}=1\}.
$$
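
The masked action set can be sketched in a few lines. This is a minimal illustration, not the ARIA-NBV implementation: the `CandidateRow` record and the `InvalidReason` codes are hypothetical names chosen for this example, and the real reason-code contract may differ.

```python
from dataclasses import dataclass
from enum import IntEnum


class InvalidReason(IntEnum):
    """Hypothetical reason codes; the real contract may use other names."""

    VALID = 0
    COLLISION = 1
    OUT_OF_BOUNDS = 2
    NO_DEPTH = 3


@dataclass(frozen=True)
class CandidateRow:
    index: int             # candidate index i in the finite table Q_t
    valid: bool            # hard validity mask m_{t,i}
    reason: InvalidReason  # invalid-reason code rho_{t,i}


def admissible_actions(table):
    """Return A(s_t) = {i : m_{t,i} = 1} as a list of candidate indices."""
    return [row.index for row in table if row.valid]


table = [
    CandidateRow(1, True, InvalidReason.VALID),
    CandidateRow(2, False, InvalidReason.COLLISION),
    CandidateRow(3, True, InvalidReason.VALID),
    CandidateRow(4, False, InvalidReason.NO_DEPTH),
]
actions = admissible_actions(table)
print(actions)
```

Invalid rows stay in the table with their reason code, so invalidity can be audited without ever entering the action set.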
Rollout/data generation uses an oracle target-task protocol first: GT OBBs
define identity-valid target tasks, ambiguity checks, target crops, and labels.
Actor-visible target selection remains a separate V1 protocol for later
deployable claims:
$$
z_e^{\mathrm{oracle}}
=
\psi_{\mathrm{task}}\!\left(O_{\mathrm{GT}}, P_t^{\mathrm{semi/fused}}, F_0^{\mathrm{EVL}}, I_{1:t}\right),
\qquad
M_e =
\operatorname{crop}\!\left(M_{\mathrm{GT}}, e\right).
$$
$z_e^{\mathrm{oracle}}$ may contain OBB center, orientation, extents, class,
confidence, projected area, semi-dense support, EVL support, relative pose, and
identity-gate diagnostics. $M_e$ is used only for labels and endpoint metrics.
The V1 OBS-SEL / PRED-Q / GT-EVAL path later replaces GT target-task input with
observed or predicted target descriptors and keeps GT crops hidden from the
actor.
Candidate generation is modeled as a logged finite mixture:
$$
q_{t,i}\sim
\sum_{k=1}^{K_{\mathrm{fam}}}
\pi_k(s_t,z_e)\,
\mathcal{G}_k(q\mid s_t,z_e;\eta_k),
\qquad
\sum_k\pi_k=1.
$$
The first sampler should mix target-centric `TARGET_POINT` candidates with
`RADIAL_AWAY`, `RADIAL_TOWARDS`, and `FORWARD_RIG` exploration/continuity
families. Candidate provenance, pose jitter, target support, validity, and
invalid reason are stored per row.
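
The family-selection half of the mixture can be sketched as follows. The mixture weights are placeholder values, the row layout is hypothetical, and the per-family pose generators $\mathcal{G}_k$ are deliberately left out; only the four family names come from the text above.

```python
import random

FAMILIES = ["TARGET_POINT", "RADIAL_AWAY", "RADIAL_TOWARDS", "FORWARD_RIG"]


def sample_families(weights, n, seed):
    """Draw n family labels from the mixture weights pi_k and log provenance."""
    if len(weights) != len(FAMILIES):
        raise ValueError("one weight per family is required")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    rng = random.Random(seed)  # seeded so the candidate table is replayable
    picks = rng.choices(FAMILIES, weights=weights, k=n)
    # Each row records which family produced the candidate (provenance).
    return [{"row": i, "family": family} for i, family in enumerate(picks)]


# Placeholder weights: 40% target-centric, 60% exploration/continuity.
rows = sample_families([0.4, 0.2, 0.2, 0.2], n=8, seed=0)
for row in rows:
    print(row)
```

Seeding the generator and storing the family label per row keeps the logged mixture reproducible, which matters once rollout data is replayed.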
Let $C_e(P)$ denote the oracle-only target crop applied to accumulated
counterfactual points for target-RRI labels and evaluation. Let $\Delta_t^e$ be
the target-cropped point-mesh oracle error at step $t$:
$$
\Delta_t^e =
d(C_e(P_t),M_e)
=
D_{P\to M,t}^e + D_{M\to P,t}^e,
$$
where $D_{P\to M,t}^e$ is point-to-mesh accuracy and $D_{M\to P,t}^e$ is
mesh-to-point completeness for the matched target crop. For a selected valid
candidate,
$$
P_{t+1}=P_t\cup P_{q_{t,a_t}},
\qquad
r_{t,\mathrm{root}}^e =
\frac{\Delta_t^e-\Delta_{t+1}^e}{\Delta_0^e+\varepsilon}.
$$
The state-relative diagnostic
$(\Delta_t^e-\Delta_{t+1}^e)/(\Delta_t^e+\varepsilon)$ remains a
VIN-compatible label and one-step analysis signal, not the default rollout or
$Q_H$ training reward.
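
A small numeric sketch makes the difference between the two normalizations concrete. The error values are made up for illustration and the function names are hypothetical.

```python
def root_normalized_rewards(errors, eps=1e-8):
    """r_{t,root} = (Delta_t - Delta_{t+1}) / (Delta_0 + eps)."""
    delta_0 = errors[0]
    return [
        (errors[t] - errors[t + 1]) / (delta_0 + eps)
        for t in range(len(errors) - 1)
    ]


def state_relative_gains(errors, eps=1e-8):
    """Diagnostic gain (Delta_t - Delta_{t+1}) / (Delta_t + eps)."""
    return [
        (errors[t] - errors[t + 1]) / (errors[t] + eps)
        for t in range(len(errors) - 1)
    ]


# Made-up target-cropped errors Delta_0, Delta_1, Delta_2.
errors = [1.0, 0.5, 0.25]
root = root_normalized_rewards(errors)
relative = state_relative_gains(errors)
print(root)      # roughly [0.5, 0.25]
print(relative)  # roughly [0.5, 0.5]
```

Both steps halve the remaining error, so the state-relative gain is the same at each step, while the root-normalized reward shrinks with the absolute improvement. That shared denominator is what makes the root-normalized rewards additive across steps.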
The additive value-learning return and endpoint reporting metric are kept
separate:
$$
G_t^{(H)}=\sum_{k=0}^{H-1}\gamma^k r_{t+k,\mathrm{root}}^e,
\qquad
J_{e,\Delta}^{(H)}
=
\frac{\Delta_0^e-\Delta_H^e}{\Delta_0^e+\varepsilon}.
$$
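
A short worked step, added as a reading aid under the assumption $\gamma=1$ and $t=0$, shows how the two quantities relate. The per-step rewards share the denominator $\Delta_0^e+\varepsilon$, so the sum telescopes:

$$
G_0^{(H)}\Big|_{\gamma=1}
=
\sum_{k=0}^{H-1}\frac{\Delta_k^e-\Delta_{k+1}^e}{\Delta_0^e+\varepsilon}
=
\frac{\Delta_0^e-\Delta_H^e}{\Delta_0^e+\varepsilon}
=
J_{e,\Delta}^{(H)}.
$$

For $\gamma<1$ the later rewards are down-weighted, so $G_0^{(H)}$ no longer equals the endpoint gain $J_{e,\Delta}^{(H)}$ in general.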
Non-myopic planning is interpreted through oracle-lookahead headroom. The
lookahead policy selects the first action of a bounded search maximizing
root-normalized target-gain return; endpoint gain evaluates the resulting trajectory:
$$
\Delta_{\mathrm{look}}
=
J_{e,\Delta}^{(H)}(\pi_{\mathrm{oracle\text{-}look}})
-
J_{e,\Delta}^{(H)}(\pi_{\mathrm{oracle\text{-}1}}).
$$
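
The headroom definition can be illustrated with a toy example in which one-step greedy is provably suboptimal. Everything here is an assumption made for illustration: the three candidates, the patch sets, and the toy error (a count of unobserved patches) are stand-ins for $\Delta_t^e$, not the mesh-based thesis oracle. The search is exhaustive over action sequences with $\gamma=1$ and executes the whole best sequence rather than re-planning after each step.

```python
from itertools import permutations

# Toy world: each candidate view observes a set of target patches.
CANDIDATES = {"A": {0, 1, 2, 3}, "B": {0, 1, 4}, "C": {2, 3, 5}}
N_PATCHES = 6


def toy_error(seen):
    """Stand-in for Delta_t^e: number of patches not yet observed."""
    return float(N_PATCHES - len(seen))


def endpoint_gain(plan, eps=1e-8):
    """J = (Delta_0 - Delta_H) / (Delta_0 + eps) for a sequence of views."""
    seen = set()
    for name in plan:
        seen |= CANDIDATES[name]
    delta_0 = toy_error(set())
    return (delta_0 - toy_error(seen)) / (delta_0 + eps)


def greedy_plan(horizon):
    """One-step oracle greedy: maximize the immediate error reduction."""
    seen, plan = set(), []
    for _ in range(horizon):
        remaining = sorted(n for n in CANDIDATES if n not in plan)
        best = max(remaining, key=lambda n: len(CANDIDATES[n] - seen))
        plan.append(best)
        seen |= CANDIDATES[best]
    return plan


def lookahead_plan(horizon):
    """Bounded lookahead: exhaustive search over length-H sequences."""
    return list(max(permutations(sorted(CANDIDATES), horizon), key=endpoint_gain))


greedy = greedy_plan(2)
look = lookahead_plan(2)
headroom = endpoint_gain(look) - endpoint_gain(greedy)
print(greedy, look, round(headroom, 4))
```

Greedy takes the largest view first and then cannot cover both leftover patches, while the lookahead plan pairs two smaller views that together cover the target. The positive difference is the kind of gap that $\Delta_{\mathrm{look}}$ measures.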
When $\Delta_{\mathrm{look}}>0$, learned $Q_H$ is reported by recovered
headroom over the learned one-step target scorer:
$$
\eta_Q
=
\frac{
J_{e,\Delta}^{(H)}(\pi_Q)
-
J_{e,\Delta}^{(H)}(\pi_{\mathrm{learned\text{-}1}})
}{
J_{e,\Delta}^{(H)}(\pi_{\mathrm{oracle\text{-}look}})
-
J_{e,\Delta}^{(H)}(\pi_{\mathrm{learned\text{-}1}})
+\varepsilon
}.
$$
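
The recovered-headroom ratio is simple to compute once the three endpoint gains are available. The sketch below uses a hypothetical function name and made-up gain values purely to show the arithmetic.

```python
def recovered_headroom(j_q, j_learned_1, j_oracle_look, eps=1e-8):
    """eta_Q = (J(pi_Q) - J(pi_learned-1)) / (J(pi_oracle-look) - J(pi_learned-1) + eps)."""
    return (j_q - j_learned_1) / (j_oracle_look - j_learned_1 + eps)


# Made-up endpoint gains, for illustration only.
eta = recovered_headroom(j_q=0.70, j_learned_1=0.60, j_oracle_look=0.80)
print(round(eta, 3))
```

With these values the learned $Q_H$ closes about half of the gap between the learned one-step scorer and oracle lookahead. The ratio is negative when $Q_H$ underperforms the learned one-step scorer, and it becomes numerically unstable when the denominator is close to zero.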
The one-step target scorer is the myopic control:
$$
\hat r_{t,i}^e
=
f_\theta^{\mathrm{VIN}}(s_t^{\mathrm{cf0}},z_e,q_{t,i}).
$$
The mandatory planner is a masked finite-candidate value model, first
implemented as a candidate-to-state query Transformer:
$$
Q_{H,\theta}(s_t^{\mathrm{cf0}},z_e,q_{t,i})
\approx
\mathbb{E}\!\left[G_t^{(H)}\mid s_t^{\mathrm{cf0}},z_e,a_t=i\right],
$$
decoded from candidate tokens and selected only over valid indices. The first
backup is masked Double-Q:
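$$
j^\star
=
\arg\max_{j:m_{t+1,j}=1}
Q_{H,\theta}(s_{t+1}^{\mathrm{cf0}},z_e,q_{t+1,j}),
$$
$$
y_t =
r_{t,\mathrm{root}}^e+\gamma(1-d_t)
Q_{H,\bar\theta}(s_{t+1}^{\mathrm{cf0}},z_e,q_{t+1,j^\star}),
$$
where $r_{t,\mathrm{root}}^e$ is the root-normalized target reward defined above,
$\bar\theta$ denotes the target-network parameters, and $d_t\in\{0,1\}$ marks a
terminal transition.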
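
A minimal sketch of the masked Double-Q target for a single transition is given below, using plain Python lists in place of network outputs. The function name and the example values are hypothetical, and treating a successor state with no valid candidate as terminal is an assumption of this sketch, not a stated contract.

```python
def masked_double_q_target(reward, gamma, done, q_online_next, q_target_next, mask_next):
    """Return y_t = r + gamma * (1 - d) * Q_target(s', j*), with j* chosen online.

    j* is the argmax of the online network restricted to valid successors,
    and the bootstrap value is read from the target network at j*.
    """
    valid = [j for j, m in enumerate(mask_next) if m]
    # Assumption of this sketch: no valid successor is treated like a terminal step.
    if done or not valid:
        return reward
    j_star = max(valid, key=lambda j: q_online_next[j])
    return reward + gamma * q_target_next[j_star]


y = masked_double_q_target(
    reward=0.1,
    gamma=0.9,
    done=False,
    q_online_next=[0.9, 0.5, 0.7],  # index 0 scores highest but is invalid
    q_target_next=[0.2, 0.4, 0.6],
    mask_next=[0, 1, 1],
)
print(round(y, 4))
```

The online network prefers index 0, but the mask removes it before the argmax, so the backup bootstraps from index 2 of the target network. This is the value-leakage guard that the hard mask is meant to provide.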
All selected actions, including learned actions, are re-evaluated by the oracle
before they become thesis evidence.
## Evidence Flow Diagrams {#roadmap-evidence-flow}
```{mermaid}
flowchart LR
ASE["ASE / Project Aria snippet<br/>RGB, poses, calibration, semidense points"] --> EVL["Frozen EFM3D / EVL<br/>local voxel evidence + OBB predictions"]
ASE --> OracleMesh["GT mesh + GT OBBs<br/>oracle and evaluation only"]
OracleMesh --> TargetSel["Oracle target-task sampler<br/>identity-valid GT target tasks"]
EVL -. "audit descriptors" .-> TargetSel
TargetSel --> CandidateGen["Mixed finite candidates<br/>target-centric + exploration"]
CandidateGen --> Validity["Validity masks + reasons<br/>collision, bounds, no-depth, support"]
OracleMesh --> RRIOracle["Oracle scene + target RRI<br/>render, fuse, crop, score"]
Validity --> RRIOracle
EVL --> OneStep["Target-conditioned VIN scorer<br/>ordinal RRI ranking"]
RRIOracle --> OneStep
OneStep --> Rollouts["Replayable rollout traces<br/>random-valid, greedy, lookahead, temp-softmax"]
RRIOracle --> Rollouts
Rollouts --> Headroom["Oracle headroom<br/>lookahead vs one-step greedy"]
Headroom --> QH["Finite-candidate Q_H<br/>candidate-to-state query first"]
QH --> Eval["Oracle re-evaluation<br/>endpoint target gain, recovered headroom, cost"]
Eval --> Thesis["Thesis figures, ablations, failure cases"]
```
```{mermaid}
gantt
title ARIA-NBV thesis roadmap: contracts, target RRI, headroom, Q_H
dateFormat YYYY-MM-DD
axisFormat %d %b
excludes weekends
section Scope and contracts
M0 proposal contract :m0, 2026-04-29, 2026-05-10
M1 oracle and geometry trust :m1, 2026-05-11, 2026-05-31
M2 one-step scorer and scale check :m2, 2026-06-01, 2026-06-21
section Target RRI
M3 V1 target oracle and selector :m3, 2026-06-22, 2026-07-12
M4 target-conditioned one-step :m4, 2026-07-13, 2026-08-09
section Planning evidence
M5 lookahead headroom and Q_H :crit, m5, 2026-08-10, 2026-08-30
section Scaling and escalation
M6 online / continuous escalation :m6, 2026-08-31, 2026-09-13
M7 final experiments and writing :crit, m7, 2026-09-14, 2026-09-27
M8 release freeze :crit, m8, 2026-09-28, 2026-09-30
```
## Milestone Timeline {#roadmap-milestones}
| Dates | Milestone | Exit criteria |
|---|---|---|
| 2026-04-29 to 2026-05-10 | **M0 - Scope, repo, docs foundation** | Dirty worktree classified; available partial VIN offline store smoke path understood; Rerun inspector smoke command documented; compact proposal freeze and bibliography/source-policy audit tracked as advisor deliverables; finite-candidate $Q_H$ thesis boundary adopted. |
| 2026-05-11 to 2026-05-31 | **M1 - Data, cache, oracle correctness** | Offline store and RRI contracts stable; public M1 contract report records store/split/frame/CW90/candidate-alignment/depth-backprojection/Rerun normal-boundary-failure evidence; oracle throughput measured. |
| 2026-06-01 to 2026-06-21 | **M2 - VIN baseline and scale check** | One-step VIN baseline reproducible; calibration and ranking plots available; ablation matrix fixed; LRZ sharding/storage plan and Zarr rollout/Q schema are ready enough for M3 scale work. |
| 2026-06-22 to 2026-07-12 | **M3 - Entity/target-aware RRI and generation readiness** | V0 GT-OBB target RRI trusted on a small subset; V1 observed-target / GT-label contract and observed-only target selector defined; deterministic sharding, Slurm/DSS staging, and Zarr store checks pass before full-scale generation. |
| 2026-07-13 to 2026-08-09 | **M4 - Target-conditioned VIN** | Model accepts observed/predicted target encoding and predicts target-specific RRI with matched GT labels; scene-level and target-level scorers compared. |
| 2026-08-10 to 2026-08-30 | **M5 - Multi-step target-aware rollouts and Q_H** | Random-valid, oracle-greedy/lookahead, and oracle-scored temperature-softmax rollout data are stable; oracle lookahead headroom is measured with root-normalized target gain; if headroom is positive, finite-candidate $Q_H$ is trained and oracle-evaluated by endpoint gain, cumulative target root gain, diagnostic target RRI, and recovered headroom over learned one-step scoring. |
| 2026-08-31 to 2026-09-13 | **M6 - Scaling and online/continuous escalation** | Evaluate whether offline scale, external mesh/oracle-compatible substrates, or online discrete $Q_H$ are justified after M5; continuous target-then-pose actor-critic remains time-permitting and requires online training evidence. |
| 2026-09-14 to 2026-09-27 | **M7 - Full-scale experiments and writing** | Full 100 GT-mesh ASE scenes / 4,608 snippet windows are generated, or a scene-level held-out subset reports scenes, snippets, targets, trajectories, rollout seeds, transitions, and exact coverage gaps; final tables, figures, failure cases, and thesis narrative frozen. |
| 2026-09-28 to 2026-09-30 | **M8 - Release freeze** | Reproducible configs, docs, demo path, and final smoke checks complete. |
## Milestone Details {#roadmap-details}
### M0 - Scope, Repo, Docs Foundation {#roadmap-m0}
**Primary question:** Which work is thesis-critical, and which work is support
infrastructure?
**Implementation surfaces:** root README, `docs/contents/thesis/roadmap.qmd`,
`docs/contents/thesis/questions.qmd`, the Streamlit VIN diagnostics page, the
Rerun offline inspector, and the available VIN offline store.
**Exit checks:**
- classify existing worktree changes into data/cache cleanup, RL/rollout
scaffold, docs cleanup, and unrelated operator work;
- freeze the compact advisor proposal before M1 scale-up: ARIA-NBV name and
title capitalization, examiner/supervisor placeholders or explicit blockers,
dataset limits, invalidity semantics, V0/V1 target contract, mandatory
finite-candidate $Q_H$ value model, a compact timeline, and
the continuous/simulator/3DGS stretch boundary;
- audit `docs/references.bib` for proposal-critical citation hygiene: no
generated `contentReference` comments, no Wikipedia support for
proposal-critical claims, primary metadata for cited papers/datasets, and
any legacy duplicate aliases explicitly quarantined;
- render the active thesis seed with `make thesis-pdf` and keep archived
proposal/advisor handouts as provenance rather than active source owners;
- align proposal, roadmap, and questions on the six-RQ thesis boundary:
objective/metric split, target and matching protocol, candidate and rollout
support, headroom-gated $Q_H$, scaling, and online/continuous escalation;
- validate that `offline_only.toml` can load the available VIN offline store
into the [VIN diagnostics/API surface](../../reference/lightning.qmd)
once corrected command behavior and validation land;
- save one Rerun recording from a validation sample once the inspector CLI and
`.configs/rerun_offline.toml` are present;
- record KG-friendly authoring rules in this roadmap and use them in new public
docstrings;
- keep thesis scope aligned with [RQ1](questions.qmd#rq1-objective) through
[RQ6](questions.qmd#rq6-continuous) and the
[shared evidence protocol](questions.qmd#rq-evidence-protocol).
### M1 - Data, Cache, Oracle Correctness {#roadmap-m1}
**Primary question:** Is the supervision substrate trustworthy enough for model
and planning claims?
**Implementation surfaces:** the
[data-handling API](../../reference/data_handling.qmd), the
[offline store contracts](../../reference/aria_nbv.data_handling.VinSnippetView.qmd),
the [RRI metric API](../../reference/rri_metrics.qmd), and the
[M1 contract map](m1_contract_report.qmd#m1-contract-map).
**Exit checks:**
- one canonical offline store path, manifest, sample index, and split contract;
- thesis-facing M1 contract report covering offline store version/sample count,
split source, sample-index semantics, pose frames, CW90/display-only
convention, candidate shell/valid/RRI alignment, depth render/backprojection,
known limitations, commands used, and pass/block status;
- explicit frame semantics for candidate poses, rig poses, and display-only
CW90 corrections;
- saved or referenced Rerun smoke recording paths for normal, boundary, and
failure samples when available, without committing generated `.rrd`
artifacts;
- oracle throughput report on representative snippets;
- no aggressive mesh or point-cloud downsampling in fine-detail oracle runs;
- no target, rollout, stochastic, or $Q_H$ scale-up until the report is passable
or the blockers are explicitly recorded.
### M2 - VIN Baseline and Scale Gates {#roadmap-m2}
**Primary question:** What is the reproducible one-step baseline before adding
target conditioning?
**Implementation surfaces:** the [VIN model module](../../reference/vin.model_v3.qmd),
the [VinModelV3 API](../../reference/aria_nbv.vin.model_v3.VinModelV3.qmd), the
Lightning training surface, W&B/Optuna reports, and Streamlit diagnostics.
**Exit checks:**
- fixed train/val split, seed set, checkpoint naming, and `offline_only.toml`
baseline run;
- calibration, ordinal-bin, rank-correlation, and top-k selection diagnostics;
- controlled ablation matrix for surface reconstruction inputs, CORAL variants,
auxiliary regression, and candidate features;
- [Zarr-first rollout/Q storage schema](../theory/rl_planning.qmd)
drafted before large sequence data is written;
- LRZ deterministic sharding, Slurm/DSS staging, and full-scale storage budget
planned before M3/M5 generation.
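
Two of the ranking diagnostics named in the exit checks can be sketched without external dependencies. The helper names and the scores are hypothetical, and the Spearman formula below assumes no tied values; a production diagnostic would need tie handling.

```python
def ranks(values):
    """Rank positions (0 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def spearman_no_ties(pred, oracle):
    """Spearman rank correlation via 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(pred)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(pred), ranks(oracle)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def top_k_hit(pred, oracle, k):
    """True if the oracle-best candidate is among the k highest predictions."""
    best = max(range(len(oracle)), key=lambda i: oracle[i])
    top = sorted(range(len(pred)), key=lambda i: pred[i], reverse=True)[:k]
    return best in top


# Made-up predicted and oracle RRI values for four candidates.
pred = [0.2, 0.9, 0.4, 0.1]
oracle = [0.3, 0.5, 0.8, 0.0]
rho = spearman_no_ties(pred, oracle)
print(round(rho, 3), top_k_hit(pred, oracle, 1), top_k_hit(pred, oracle, 2))
```

In this example the ranking correlates well overall, yet the top-1 selection misses the oracle-best candidate, which is why rank correlation and top-k selection are reported as separate diagnostics.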
### M3 - Entity/Target-Aware RRI {#roadmap-m3}
**Primary question:** Can the oracle measure improvement for a selected target,
not only the whole scene?
**Implementation surfaces:** GT OBBs from ASE/EFM views, oracle target-task
sampling, cropped point/mesh RRI, and diagnostics linking target-level and
scene-level metrics. The V1 actor-visible selector is a separate deployable
input protocol after the oracle label path is trusted.
**Exit checks:**
- oracle target-task contract: identity-valid GT OBB tasks, GT crop/evaluation,
explicit ambiguity diagnostics, and target-RRI labels for rollout/data
generation;
- V1 deployable-input contract: observed/predicted OBB descriptors matched to
GT OBB target-RRI labels under OBS-SEL / PRED-Q / GT-EVAL before
actor-visible main-result claims;
- observed-target eligibility policy based on predicted OBBs/classes,
confidence, projected area, and semidense/EVL point support for the V1 path;
- GT-OBB cropped target RRI on a small trusted subset, then full-scale label
generation only after the LRZ/Zarr gate passes;
- diagnostics showing current points, candidate points, cropped mesh, target
OBB, target RRI, and scene RRI side by side;
- clear handling of invalid or unsupported targets.
### M4 - Target-Conditioned VIN {#roadmap-m4}
**Primary question:** Can a VIN-style scorer rank candidates by target-specific
RRI when conditioned on the first actor-visible target encoding?
**Implementation surfaces:** VIN target encoder, target-aware batch fields,
offline-store payload extensions, and model diagnostics.
**Exit checks:**
- target encoding selected for the main controlled run: observed/predicted OBB
geometry plus class, confidence, projected area, semidense support, EVL
support, and relative pose fields;
- compact actor-visible crop descriptor prepared as the first target-input
ablation once the OBB-level contract is stable;
- target-specific RRI labels loaded through the data-handling surface;
- baseline comparison: scene-level scorer, target-conditioned scorer, and oracle
target RRI;
- one-step scorer evidence gate: held-out ranking, oracle-evaluated
model-selected rollouts, calibration and stage-shift diagnostics, and Rerun
visualizations of representative successes and failures;
- failure cases grouped by occlusion, small target, invalid candidates, and
poor target encoding.
### M5 - Multi-Step Target-Aware Rollouts and Q_H {#roadmap-m5}
**Primary question:** Does bounded oracle lookahead expose target endpoint-gain
headroom over one-step selection, and can learned $Q_H$ recover that headroom?
**Implementation surfaces:** the
[counterfactual rollout API](../../reference/aria_nbv.pose_generation.CounterfactualPoseGenerator.qmd),
candidate generation, target-aware scorer backends, rollout diagnostics, and
the [finite-candidate rollout and Q_H contract](../theory/rl_planning.qmd).
**Exit checks:**
- compare random-valid, deterministic one-step oracle greedy, the learned
one-step target-conditioned scorer, deterministic bounded oracle lookahead,
oracle-scored temperature-softmax, and finite-candidate $Q_H$
under equal budget; add Gumbel-Top-k as preferred later evidence when
schedule permits;
- treat the gap between bounded oracle-RRI lookahead and one-step oracle greedy
under equal budget as the trusted headroom estimate before interpreting
$Q_H$; the lookahead policy selects by cumulative root-normalized target
gain, while endpoint gain evaluates the trajectory;
- report $\Delta_{\mathrm{look}}$ and, when it is positive, the recovered
headroom $\eta_Q$ of $Q_H$ over the learned one-step target scorer;
- require $Q_H$ to beat both the learned one-step scorer and one-step
model/greedy selection on endpoint target gain, and report cumulative
target-root gain, under matched acquisition and candidate budgets;
- report cumulative target-root gain, diagnostic target RRI, endpoint target
gain, scene RRI, number of views, path length, invalid action rate, and runtime;
- use cumulative root-normalized target gain as the main multi-step return while preserving
current one-step RRI labels; keep log-improvement and episode-normalized
rewards as follow-up ablations;
- keep rollout complexity bounded by explicit horizon, branch factor, and beam
width;
- train $Q_H$ as a finite-candidate value model, first implemented as a
candidate-to-state query Transformer over target, ray-aware map, local EVL,
history, budget, and candidate tokens to predict one masked bounded-horizon Q
value per candidate; candidate-candidate self-attention, scalar motion, and
rule penalties are extensions;
- require hard validity masks and explicit invalid reason codes in $Q_H$ data;
- require learned $Q_H$ selected actions to be oracle-evaluated under equal
acquisition budget; if $\Delta_{\mathrm{look}}\approx 0$, report no measurable
non-myopic headroom for the evaluated split, target set, horizon, branch
factor, and candidate distribution instead of overstating a learned planning
failure;
- visualize representative successes and failures.
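
The two stochastic rollout policies named in the exit checks can be sketched over a masked candidate table. This is an illustrative sketch with hypothetical function names and made-up scores: the softmax is restricted to valid candidates, and Gumbel-Top-k perturbs the scaled scores with Gumbel noise and keeps the k largest, which samples k distinct candidates without replacement.

```python
import math
import random


def masked_softmax(scores, mask, temperature):
    """Temperature softmax over valid candidates; invalid ones get probability 0."""
    valid = [i for i, m in enumerate(mask) if m]
    if not valid:
        raise ValueError("no valid candidate")
    top = max(scores[i] for i in valid)  # subtract the max for numerical stability
    weights = {i: math.exp((scores[i] - top) / temperature) for i in valid}
    total = sum(weights.values())
    return [weights.get(i, 0.0) / total for i in range(len(scores))]


def gumbel_top_k(scores, mask, temperature, k, rng):
    """Sample up to k distinct valid indices via Gumbel-perturbed scores."""
    keys = []
    for i, m in enumerate(mask):
        if not m:
            continue
        u = max(rng.random(), 1e-12)  # guard against log(0)
        gumbel = -math.log(-math.log(u))
        keys.append((scores[i] / temperature + gumbel, i))
    keys.sort(reverse=True)
    return [i for _, i in keys[:k]]


# Made-up oracle scores; candidate 1 scores highest but is masked out.
scores = [1.0, 5.0, 2.0, 0.5]
mask = [1, 0, 1, 1]
probs = masked_softmax(scores, mask, temperature=0.5)
picks = gumbel_top_k(scores, mask, temperature=0.5, k=2, rng=random.Random(0))
print([round(p, 3) for p in probs], picks)
```

Applying the mask before normalization, rather than assigning invalid candidates a low score, keeps invalid actions out of the rollout data entirely.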
### M6 - RQ5 Online and RQ6 Continuous Escalation {#roadmap-m6}
**Primary question:** Which online-discrete or continuous escalation is
justified after the mandatory offline finite-candidate $Q_H$ result and RQ4
support evidence?
**Implementation surfaces:** the
[counterfactual RL API](../../reference/aria_nbv.pose_generation.CounterfactualRolloutResult.qmd),
transition datasets, online discrete $Q_H$ in the ASE mesh/oracle loop, and
actor-critic or continuous-control design notes. External mesh/oracle-compatible
substrate notes remain RQ4 support evidence unless the task explicitly targets
online interaction.
**Exit checks:**
- require M5 $Q_H$ evidence before spending time on quantitative RQ5 online or
RQ6 continuous-control work;
- decide the exact RQ5 scope for online discrete $Q_H$ in the current ASE
mesh/oracle loop: bridge design only, smoke experiment, or quantitative
comparator;
- preserve target-specific point-mesh supervision for thesis-grade scaling
evidence; proxy coverage, uncertainty, or semantic rewards remain contrast
signals;
- document how an RQ6 actor would propose continuous target-then-pose actions
and handle feasibility after online finite-candidate evidence exists, but do
not require a quantitative continuous baseline;
- defer imitation-learning variants beyond the planned RRI + $Q_H$ approach;
- defer SB3/DQN/PPO/SAC until an online Gymnasium-style simulator with
mesh/oracle target-RRI evaluation exists;
- do not claim full continuous RL unless online interaction, reward speed, and
evaluation are all thesis-grade.
### M7 - Thesis Experiments and Writing {#roadmap-m7}
**Primary question:** What evidence supports the final thesis claim?
**Implementation surfaces:** final configs, W&B/Optuna export, Typst paper,
Quarto docs, and defense figures.
**Exit checks:**
- final experiment table for one-step vs multi-step, scene-level vs
target-aware scoring, oracle-lookahead headroom, and $Q_H$ recovered
headroom;
- final ablations for target encoding, candidate generation, invalid handling,
and supervision scale;
- final scale report over the full 100 GT-mesh ASE scenes / 4,608 snippets, or
an explicit pass/block coverage report that separates scenes, snippets,
targets, trajectories, rollout seeds, transitions, and missing gaps if any
subset is unavailable;
- final failure-case catalog;
- paper, docs, and slides use the same terminology and claims.
### M8 - Release Freeze {#roadmap-m8}
**Primary question:** Can the thesis result be reproduced and demonstrated?
**Implementation surfaces:** tagged configs, smoke tests, final docs, and demo
script.
**Exit checks:**
- final smoke matrix passes on the intended machine;
- all final figures trace back to configs and run IDs;
- public docs link to the final thesis narrative and not to stale scratchpad
claims;
- no release-critical placeholder, stale path, or wrong repo link remains.
## Required Ablations {#roadmap-ablations}
Ablations should isolate scientific questions rather than accumulate
architecture toggles.
| Axis | Levels | Primary evidence |
|---|---|---|
| Target input | V0 GT OBB sanity; V1 observed/predicted OBB; V1 plus crop descriptor; optional entity token | target-RRI rank correlation, target top-k hit, endpoint target gain |
| Candidate mixture | generic shell; target-point only; mixed target plus exploration; PB-NBV/frontier shortlist | valid fraction, target visibility, RRI distribution, selected-view diversity |
| Objective | scene RRI; target root gain; target RRI diagnostic; log-gain ablation | endpoint target gain, cumulative target-root gain, quality-cost curves |
| Planner | random-valid; learned one-step scorer; one-step oracle greedy; bounded oracle lookahead; $Q_H$ | oracle-lookahead headroom, recovered headroom, and oracle-evaluated endpoint gain under equal budget |
| Invalidity | hard mask; hard mask plus validity head; scalar penalty only after masks work | invalid-action rate, value leakage, invalid-reason distribution |
| Ordinal/scalar loss | one-step CORAL; balanced CORAL; focal threshold; finite-horizon Huber or quantile return | calibration, Spearman correlation, top-k hit, confusion structure, return-unit calibration |
| State richness | CF0 geometry-only; ray-aware occupied/free/unknown memory; selected synthetic observations; visibility-gated DINO; optional directional observability | planning improvement, support gap, overfitting gap |
| Scale and online bridge | small trusted subset; scene-level held-out subset; full 100 GT-mesh scenes; external mesh/oracle-compatible substrate; RQ5 online discrete $Q_H$ | confidence intervals, coverage report, oracle throughput, recovered headroom under matched budgets |
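The hard-mask level of the invalidity axis can be illustrated with a minimal sketch. It assumes each decision step provides one predicted value and one validity flag per candidate. The function name `select_candidate` and the list-based inputs are hypothetical and are not the project's API.

```python
from typing import Optional, Sequence


def select_candidate(
    q_values: Sequence[float], valid: Sequence[bool]
) -> Optional[int]:
    """Return the index of the highest-valued valid candidate.

    Hard mask: invalid candidates are removed before the argmax, so their
    predicted values can never influence the selection. Returns None when
    no candidate is valid.
    """
    if len(q_values) != len(valid):
        raise ValueError("q_values and valid must have the same length")
    valid_indices = [i for i, ok in enumerate(valid) if ok]
    if not valid_indices:
        return None
    return max(valid_indices, key=lambda i: q_values[i])
```

Because masked candidates never enter the comparison, a high predicted value on an invalid action cannot leak into the selected trajectory. A scalar penalty, by contrast, only lowers that value and would need separate calibration.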
## Evidence Reporting Contract {#roadmap-evidence-contract}
Every thesis-grade table must report the amount of evidence behind the claim:
$$
\left(
N_{\mathrm{scene}},
N_{\mathrm{snippet}},
N_{\mathrm{target}},
N_{\mathrm{candidate}},
N_{\mathrm{transition}},
N_{\mathrm{seed}},
\mathrm{split},
\mathrm{invalid\_rate},
\mathrm{coverage\_gap}
\right).
$$
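One way to keep this tuple attached to every table is to carry it as a typed record next to each result. The sketch below is illustrative only: the class name `EvidenceReport`, the field names, and the reading of `invalid_rate` and `coverage_gap` as fractions in $[0, 1]$ are assumptions, not the project's schema.

```python
from dataclasses import dataclass

_COUNT_FIELDS = (
    "n_scene", "n_snippet", "n_target", "n_candidate", "n_transition", "n_seed",
)


@dataclass(frozen=True)
class EvidenceReport:
    """Evidence tuple that accompanies one thesis-grade table (hypothetical)."""

    n_scene: int
    n_snippet: int
    n_target: int
    n_candidate: int
    n_transition: int
    n_seed: int
    split: str            # e.g. "train", "val", or "test"
    invalid_rate: float   # assumed: fraction of invalid candidates in [0, 1]
    coverage_gap: float   # assumed: fraction of planned units missing in [0, 1]

    def __post_init__(self) -> None:
        # Reject negative counts so an incomplete report fails loudly.
        for name in _COUNT_FIELDS:
            if getattr(self, name) < 0:
                raise ValueError(f"{name} must be non-negative")
        # Both rates are treated as fractions under the stated assumption.
        for name in ("invalid_rate", "coverage_gap"):
            if not 0.0 <= getattr(self, name) <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1]")
```

A frozen dataclass keeps the record immutable once a table has been generated, so the reported evidence cannot drift from the run that produced it.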
Final figures should include a candidate-table diagnostic, an oracle-label
diagnostic, a one-step ranking/calibration figure, a multi-step trajectory
comparison for greedy/lookahead/$Q_H$, and a failure-case catalog. Paired
method comparisons should use identical roots and candidate budgets:
$$
\Delta_e^{(H)}(A,B)
=
J_{e,\Delta}^{(H)}(\tau_A)-J_{e,\Delta}^{(H)}(\tau_B),
$$
with bootstrap confidence intervals over scene-level or target-level units.
Raw endpoint gains are not interpretable unless horizon, target count,
candidate distribution, invalidity, and coverage gaps are reported.
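The paired comparison can be sketched as a percentile bootstrap over per-unit differences. The sketch assumes the endpoint gains of methods A and B have already been aggregated to one value per resampling unit (scene or target) on identical roots. The function name, the default of 2,000 resamples, and the percentile method are illustrative choices, not the project's evaluation code.

```python
import random
from statistics import mean
from typing import Sequence, Tuple


def paired_bootstrap_ci(
    gains_a: Sequence[float],
    gains_b: Sequence[float],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (mean paired difference, CI lower bound, CI upper bound).

    gains_a[i] and gains_b[i] must come from the same unit and the same
    root, so that the difference is a paired quantity.
    """
    if len(gains_a) != len(gains_b) or len(gains_a) == 0:
        raise ValueError("need two non-empty, equally long gain sequences")
    deltas = [a - b for a, b in zip(gains_a, gains_b)]
    rng = random.Random(seed)  # fixed seed keeps the interval reproducible
    n = len(deltas)
    # Resample units with replacement and record the mean difference each time.
    boot_means = sorted(mean(rng.choices(deltas, k=n)) for _ in range(n_boot))
    lower = boot_means[int((alpha / 2) * n_boot)]
    upper = boot_means[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return mean(deltas), lower, upper
```

Resampling whole units rather than individual transitions keeps correlated samples from the same scene or target together, which matches the scene-level or target-level units named above.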
Unless stated otherwise, equal budget means equal selected-view horizon $H$,
equal candidate count $N_q$ per decision step, equal candidate-generation
distribution, and matched validity constraints. Path length, runtime, and oracle
evaluation count are reported separately; path/time-constrained variants are
explicit ablations.
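The equal-budget definition above can be checked mechanically before two runs are compared. The following sketch assumes each run records the four budget components explicitly. The `Budget` class, its field names, and the use of string identifiers for the candidate distribution and validity constraints are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class Budget:
    """Acquisition budget of one run (hypothetical record)."""

    horizon: int                 # selected-view horizon H
    candidates_per_step: int     # candidate count N_q per decision step
    candidate_distribution: str  # assumed identifier of the candidate generator
    validity_constraints: str    # assumed identifier of the validity rule set


def budget_mismatches(a: Budget, b: Budget) -> List[str]:
    """Return the names of budget components that differ between two runs."""
    return [
        f.name for f in fields(Budget) if getattr(a, f.name) != getattr(b, f.name)
    ]
```

An empty list means the two runs satisfy the equal-budget definition. Path length, runtime, and oracle evaluation count are deliberately left out of the record because they are reported separately.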
Scale axes must be reported separately: scenes, snippets, anchor poses per
trajectory, candidate sets, candidate-distribution variants, targets per
snippet, rollout seeds, transitions, and stage/calibration bins. Architecture,
rollout, scorer, or $Q_H$ conclusions must not be compared across runs that
silently change scene, snippet, target, or candidate coverage. Final
train/validation/test boundaries are scene-level; sample-level leakage across
snippets from the same scene is not acceptable for final claims.
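The scene-level boundary can be enforced with a small leakage check over the sample index. The sketch assumes every sample row exposes a scene identifier and a split label. The function name and the `(scene_id, split)` row layout are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def assert_scene_level_split(rows: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each scene to its single split.

    Raises ValueError as soon as one scene appears in two different splits,
    which is the sample-level leakage that final claims must exclude.
    """
    scene_to_split: Dict[str, str] = {}
    for scene_id, split in rows:
        previous = scene_to_split.setdefault(scene_id, split)
        if previous != split:
            raise ValueError(
                f"scene {scene_id!r} appears in both {previous!r} and {split!r}"
            )
    return scene_to_split
```

Running the check on the full sample index, not only on the scene list, catches the case where snippets of one scene were assigned to different splits after the scene-level split was drawn.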
The Zarr-first rollout/Q store should avoid duplicating raw ASE/ATEK assets:
full meshes remain external path/hash/version references, high-detail target
crops are stored once per target with crop metadata, and rollout rows reference
those assets. LRZ deterministic sharding, Slurm/DSS staging, resume-safe writes,
and storage-budget reporting are hard gates before full-scale generation.
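Deterministic sharding can be sketched as a stable hash of a work-unit key. The sketch assumes shards are assigned per string key, for example a snippet identifier. The function names and the choice of SHA-256 are illustrative and do not describe the LRZ job layout.

```python
import hashlib
from typing import Iterable, List


def shard_index(key: str, num_shards: int) -> int:
    """Assign a key to a shard identically on every machine and run.

    Python's built-in hash() is salted per process for strings, so a
    cryptographic digest is used to obtain a stable value instead.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def keys_for_shard(keys: Iterable[str], shard: int, num_shards: int) -> List[str]:
    """Return the keys owned by one shard, in input order."""
    return [k for k in keys if shard_index(k, num_shards) == shard]
```

Because the assignment depends only on the key and the shard count, a resumed job can recompute which units it owns without a shared index, as long as the shard count stays fixed.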
## Risk Register {#roadmap-risks}
| Risk | Scientific consequence | Mitigation / fallback |
|---|---|---|
| Frame, CW90, or projection mismatch | Labels and visualizations can look plausible while measuring the wrong geometry. | M1 frame report, Rerun normal/boundary/failure recordings, pose/camera consistency assertions. |
| Target labels are sparse or ambiguous | Target RRI becomes unstable or actor leakage creeps in through GT matching. | Eligibility thresholds, V0/V1 separation, explicit unmatched-target counts, matched-GT diagnostics. |
| Candidate valid fraction is low | $Q_H$ learns feasibility artifacts instead of utility. | Mixed sampler tuning, reason codes, minimum valid-count gates, candidate provenance diagnostics, and separation of true infeasibility from low immediate target support. |
| Offline rollouts are narrow | Value learning overfits behavior support and overestimates unsupported actions. | Mix behavior policies (random-valid, oracle-scored temperature-softmax, later Gumbel-Top-k), use scene-level splits, and run support-aware ablations. |
| Oracle throughput blocks scale | Final evidence may be too small for broad claims. | Zarr-first storage, LRZ deterministic sharding, subset confidence intervals, exact coverage-gap reporting. |
| Oracle lookahead has little headroom | On the evaluated split, target set, horizon, branch factor, and candidate distribution, multi-step planning shows little measurable advantage over one-step selection. | Report $\Delta_{\mathrm{look}}$ honestly, catalog cases where setup actions matter, and do not overclaim $Q_H$ planning gains. |
| $Q_H$ fails despite positive headroom | The learned value model or offline support is insufficient. | Report the failing gate and preserve a defensible target-aware oracle, one-step scorer, and rollout-data study; do not replace it with unvalidated continuous RL. |
## KG-Friendly Authoring Rules {#kg-authoring-rules}
These rules apply to new roadmap, research-question, docstring, and thesis
writing so the repository can support later knowledge-graph construction.
- Give every major concept a stable anchor, a one-sentence definition, known
aliases, and links to related internal nodes.
- Prefer internal links to Quarto pages, Typst sections, API reference pages,
configs, and canonical memory over repeating the same explanation.
- Prefer BibTeX citations from `docs/references.bib` for papers and datasets;
use raw external URLs only for tools or libraries without bibliography keys.
- Public Python docstrings should state tensor shapes, coordinate frames, units,
related config classes, and related data containers when those are part of the
contract.
- Each milestone should point to its research questions, implementation
surfaces, expected tests, and thesis figures once those artifacts exist.
## Standing Verification {#roadmap-verification}
Docs changes should render the touched pages and preserve the context index:
```sh
cd docs
quarto render contents/thesis/roadmap.qmd
quarto render contents/thesis/questions.qmd
cd ..
scripts/nbv_qmd_outline.sh --compact
```
For Mermaid edits, validate each diagram source with Mermaid CLI before
committing.
For package work, use the verification row in the nearest `AGENTS.md`; for
roadmap or canonical-memory changes, run the matching memory or docs checks.