M1 Contract Report

1 M1 Contract Report

This is the working M1 contract report for the data/cache/oracle correctness gate. Status is current as of 2026-05-06; the roadmap M1 window is 2026-05-11 to 2026-05-31. The report separates three kinds of evidence: code-level contract checks that now pass on synthetic fixtures; local-store evidence that is useful for diagnostics but still blocks M1 exit; and Rerun recordings and GPU-backed throughput measurements that must be collected before scale-up.

1.1 Status Summary

Area	Status	Evidence
Offline store layout	Pass for source contract	Immutable format is owned by `aria_nbv/aria_nbv/data_handling/_offline_store.py`, `_offline_format.py`, `_offline_writer.py`, and `_offline_dataset.py`.
Split contract	Pass for source contract	`splits/*.npy` contain global sample indices; `_assign_splits` uses stable sample-key hashing and preserves sample-index order inside each split.
Candidate order	Pass for synthetic guard	`CandidateSamplingResult.candidate_shell_indices()` maps compact valid views to full-shell indices and rejects ambiguous layouts.
Candidate-label alignment	Pass for synthetic guard	`prepare_vin_offline_sample()` validates candidate poses, depth rows, camera rows, RRI vectors, optional point clouds, and candidate shell indices before writing rows.
Rollout replay alignment	Pass for synthetic guard	The historical trace DTO was superseded by `RolloutZarrRecord` plus factual `steps/` and `candidates/` tables; invalid rows remain masked with false action/train masks and `NaN` labels.
Frame and CW90 semantics	Pass for documented source contract	Candidate generation applies `rotate_yaw_cw90` once to the reference pose; plotting/Rerun display corrections remain display-only. `VinModelV3` requires a `cw90_corrected` camera tag if model-side CW90 undo is enabled.
Current local store evidence	Blocked for M1 exit	`.configs/offline_only.toml` resolves to `.data/offline_cache/vin_offline`; the evidence snapshot below records manifest v6, 48 sample-index rows, and a partial one-scene diagnostic store.
Scene-level split evidence	Blocked	The local store has one scene (`81286`) and a 38/10 train/val sample split, so it cannot prove the final scene-level split contract.
Rerun smoke recordings	Blocked	Normal, boundary, and failure `.rrd` recordings were not collected in this pass.
Oracle throughput	Blocked	No representative GPU/dataset timing was run in this pass.
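The Split contract row can be illustrated with a minimal Python sketch. This is not the code in `_assign_splits`. The bucket rule is an assumption: the first six bytes of the SHA-1 digest of the sample key are scaled to [0, 1) and compared with a hypothetical `val_fraction`. The sketch only shows the two stated properties. Assignment depends on the sample key alone, and each split keeps global sample-index order.

```python
import hashlib
from collections.abc import Sequence


def key_fraction(key: str) -> float:
    """Map a key to a stable value in [0, 1) from the first 6 bytes of its SHA-1 digest."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    # 48-bit numerator is exactly representable as a float, so the result stays < 1.0.
    return int.from_bytes(digest[:6], "big") / 2**48


def assign_splits_by_sample_key(
    sample_keys: Sequence[str], val_fraction: float = 0.2
) -> dict[str, list[int]]:
    """Hypothetical sample-key split returning global sample indices per split.

    sample_keys[i] is the key of global sample index i. Iterating in
    sample-index order keeps every split list sorted by sample index.
    """
    splits: dict[str, list[int]] = {"all": [], "train": [], "val": []}
    for sample_index, key in enumerate(sample_keys):
        splits["all"].append(sample_index)
        name = "val" if key_fraction(key) < val_fraction else "train"
        splits[name].append(sample_index)
    return splits
```

Every snippet in scene `81286` has its own sample key. A one-scene store can therefore still place samples in both train and val. That is why such a split says nothing about scene-level isolation.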

1.2 Evidence Snapshot

The current configured store is a diagnostic smoke asset, not final M1 training evidence.

Evidence item	Status	Value
Config source	Pass for local smoke	`.configs/offline_only.toml` uses `datamodule_config.source.kind = "offline"` and `store_dir = "vin_offline"`.
Resolved store root	Pass for local smoke	`.data/offline_cache/vin_offline`; the manifest records the same repo-local store as an operator-local absolute path.
Manifest version	Pass for reader compatibility	`version = 6`, matching the current immutable reader contract.
Manifest hash	Recorded	`sha256:9b72f7203942db28f613f4c921ee26eba7ba753c91c5967aae404a933bb3ff00` for `manifest.json`.
Manifest created time	Recorded	`2026-04-30T11:37:11Z`.
Store status	Blocked for M1 exit	`stats.interrupted = true`; manifest metadata also says the version-6 migration rewrote only `manifest.json`, not shard arrays, payloads, index rows, or split files.
Sample-index rows	Recorded	`48` JSONL rows, with global `sample_index` values `0..47`.
Scene/snippet coverage	Blocked for scale	One scene (`81286`) and 48 snippet ids; this is not the full target of 100 GT-mesh ASE scenes and 4,608 snippets.
Split counts	Blocked for scene split	`splits/all.npy = 48`, `splits/train.npy = 38`, `splits/val.npy = 10`; the split policy is `sha1(sample_key)`, and the current one-scene store cannot prove scene-level isolation.
Shards	Recorded	4 shard directories with backbone, depth, candidate, and candidate point-cloud blocks.
Materialized blocks	Partial	`backbone`, `depths`, and `candidate_pcs` are present; counterfactuals, detected OBBs, GT OBBs, and trajectory blocks are absent.
Rerun normal recording	Blocked	Expected artifact path: `.artifacts/rerun/m1-normal.rrd`; not collected in this pass.
Rerun boundary recording	Blocked	Expected artifact path: `.artifacts/rerun/m1-boundary.rrd`; not collected in this pass.
Rerun failure recording	Blocked	Expected artifact path: `.artifacts/rerun/m1-failure.rrd`; not collected in this pass.
Oracle throughput	Blocked	No representative GPU timing has been recorded for mesh loading, candidate rendering, backprojection, and RRI scoring.
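The throughput blocker needs per-stage wall-clock numbers. A minimal timing harness is sketched below. The stage names and callables are placeholders for mesh loading, candidate rendering, backprojection, and RRI scoring; they are not functions from `aria_nbv`.

If a stage launches asynchronous CUDA work, it should synchronize before returning. In PyTorch that means calling `torch.cuda.synchronize()`. Without it, the recorded times cover only kernel launch.

```python
import time
from collections.abc import Callable, Iterable, Mapping


def time_oracle_stages(
    snippets: Iterable[object],
    stages: Mapping[str, Callable[[object], object]],
) -> dict[str, float]:
    """Run the stages in order on every snippet and accumulate wall-clock seconds.

    The first stage receives the snippet; each later stage receives the previous
    stage's output. Returns per-stage totals plus 'total_s' and 'snippets_per_s'.
    """
    totals = {name: 0.0 for name in stages}
    count = 0
    start_all = time.perf_counter()
    for snippet in snippets:
        value = snippet
        for name, stage in stages.items():
            t0 = time.perf_counter()
            value = stage(value)
            totals[name] += time.perf_counter() - t0
        count += 1
    total = time.perf_counter() - start_all
    totals["total_s"] = total
    totals["snippets_per_s"] = count / total if total > 0 else float("nan")
    return totals
```

Per-stage totals are kept separate from the end-to-end rate. This shows which stage dominates on representative snippets.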

1.3 Contract Map

Contract	Current path	Guard or invariant
Store root	`VinOfflineStoreConfig.store_dir`	Store contains `manifest.json`, `sample_index.jsonl`, `splits/{all,train,val}.npy`, and immutable `shards/`.
Manifest version	`OFFLINE_DATASET_VERSION` in `_offline_store.py`	`VinOfflineStoreReader` rejects unsupported versions and asks for a rebuild.
Sample index	`VinOfflineIndexRecord` in `_offline_format.py`	Rows carry global `sample_index`, `sample_key`, `scene_id`, `snippet_id`, split, shard id, and shard-local row.
Splits	`_assign_splits()` and `VinOfflineStoreReader.get_split_records()`	Split arrays reference global sample indices; selected records are returned in split-array order.
Candidate shell	`CandidateSamplingResult` in `pose_generation/types.py`	`views` is normally the compact valid-candidate table; `mask_valid` and `shell_poses` describe the full sampled shell.
Rendered candidates	`CandidateDepths` in `rendering/candidate_depth_renderer.py`	`candidate_indices` maps rendered rows back to full-shell indices.
Oracle labels	`RriResult` in `rri_metrics/types.py`	RRI and point-to-mesh component vectors are candidate-major and must match rendered candidate count.
Offline row prep	`prepare_vin_offline_sample()`	Fixed numeric blocks pad candidate-major arrays to `max_candidates`; invalid padded tails use `NaN`, `False`, `-1`, or zero depending on field type.
VIN batch masking	`VinOracleBatch.candidate_valid_mask()`	Model/training paths use `candidate_count` to mask padded tails instead of treating finite tail values as labels.
Rollout replay	`RolloutZarrRecord` and `rollouts.zarr` factual tables in `aria_nbv.rollouts`	Actor-visible full-shell pose/validity fields stay separate from oracle-only score and metric vectors.
CW90	`utils.frames.rotate_yaw_cw90`, `CandidateViewGenerator`, `VinModelV3`	Physical/store/model inputs must not receive display-only rotations.
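The padding and masking rows of the contract map can be sketched with NumPy. The helper names `pad_candidate_major`, `candidate_valid_mask`, and `masked_mean` are hypothetical. They do not mirror the signatures of `prepare_vin_offline_sample()` or `VinOracleBatch.candidate_valid_mask()`. The sketch only shows the invariant: padded tails are excluded by a mask derived from the candidate count, not by inspecting tail values.

```python
from __future__ import annotations

import numpy as np


def pad_candidate_major(
    values: np.ndarray, max_candidates: int, fill: float | int | bool
) -> np.ndarray:
    """Pad a candidate-major array (axis 0 = candidate) to max_candidates rows.

    The fill must suit the dtype, for example NaN for float labels,
    False for bool masks, or -1 for integer indices.
    """
    count = values.shape[0]
    if count > max_candidates:
        raise ValueError(f"{count} candidates exceed max_candidates={max_candidates}")
    padded = np.full((max_candidates, *values.shape[1:]), fill, dtype=values.dtype)
    padded[:count] = values
    return padded


def candidate_valid_mask(candidate_count: np.ndarray, max_candidates: int) -> np.ndarray:
    """(batch, max_candidates) mask that is True only for the first candidate_count rows."""
    slots = np.arange(max_candidates)[None, :]
    return slots < np.asarray(candidate_count)[:, None]


def masked_mean(labels: np.ndarray, mask: np.ndarray) -> float:
    """Mean over valid candidates only; padded tails never enter the reduction."""
    return float(labels[mask].mean())
```

Deriving the mask from `candidate_count` keeps reductions correct even when a padded tail holds finite values, such as zero-filled rows.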

1.4 Passing Checks

The lightweight checks use CPU-only synthetic fixtures and do not require ASE data, meshes, or a GPU.

```sh
cd aria_nbv
uv run pytest tests/data_handling/test_vin_offline_store.py -k 'candidate_label_order or candidate_index_drift or label_length_drift'
uv run pytest tests/pose_generation/test_counterfactuals.py -k 'candidate_depth_renderer or rollout_trace_maps_scores'
```

On synthetic fixtures, these checks confirm that:

- rendered candidate poses, candidate shell indices, RRI labels, and optional diagnostic payloads stay in the same order during offline row preparation;
- the writer rejects candidate-index drift before materializing a shard;
- the writer rejects oracle label vectors whose length no longer matches the rendered candidate table;
- the renderer rejects ambiguous `views`/`mask_valid` layouts instead of silently falling back to sequential candidate ids;
- rollout trace serialization preserves full-shell validity, `NaN` labels on invalid rows, the selected shell index, scores, and RRI metric vectors.
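The index-drift and label-length guards can be sketched as two small checks. Both function names are hypothetical.

- The first check accepts only the compact layout, with one view row per valid shell slot. For any other row count it raises instead of assuming sequential ids. The real `candidate_shell_indices()` may accept further layouts.
- The second check rejects candidate-major fields whose length differs from the rendered table. It also rejects duplicate or out-of-range shell indices, which is an assumption about what counts as drift.

```python
from collections.abc import Mapping, Sequence, Sized


def compact_shell_indices(num_views: int, mask_valid: Sequence[bool]) -> list[int]:
    """Hypothetical helper: map compact valid-view rows to full-shell indices.

    Only the compact layout (exactly one view row per valid shell slot) is
    accepted; any other row count raises instead of falling back to 0..N-1.
    """
    shell_indices = [slot for slot, valid in enumerate(mask_valid) if valid]
    if num_views != len(shell_indices):
        raise ValueError(
            f"ambiguous layout: {num_views} view rows, {len(shell_indices)} valid "
            f"slots, {len(mask_valid)} shell slots"
        )
    return shell_indices


def check_candidate_alignment(
    shell_indices: Sequence[int],
    shell_size: int,
    per_candidate: Mapping[str, Sized],
) -> None:
    """Raise ValueError if any candidate-major field drifts from the rendered table."""
    count = len(shell_indices)
    if len(set(shell_indices)) != count:
        raise ValueError("duplicate shell indices")
    if any(not 0 <= index < shell_size for index in shell_indices):
        raise ValueError(f"shell index outside the {shell_size}-slot shell")
    for name, values in per_candidate.items():
        if len(values) != count:
            raise ValueError(f"{name}: {len(values)} rows for {count} rendered candidates")
```

Running both checks before a shard is written makes drift fail loudly instead of producing misaligned labels.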

1.5 Blockers Before M1 Exit

These are not solved by the synthetic guard pass:

- collect fresh diagnostics for the configured immutable store: manifest version, sample count, scene count, block inventory, split counts, candidate count distribution, and RRI summaries;
- replace the current one-scene sample split evidence with a scene-level train/val split or mark the exact scene-level split blocker in the agents DB;
- produce normal, boundary, and failure Rerun recordings through `nbv-rerun-inspect` without committing generated `.rrd` artifacts;
- measure oracle throughput on representative snippets with the intended mesh, depth-render, backprojection, and RRI settings;
- confirm no fine-detail oracle run uses aggressive mesh or point-cloud downsampling;
- record any real-store blocker in the active agents DB rather than presenting it as a model or planning result.
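One way to resolve the scene-level split blocker is to hash scene ids instead of sample keys. Every snippet of a scene then lands in the same split. The sketch below is a hypothetical policy, not the current `_assign_splits`. It orders the distinct scenes by SHA-1 digest and takes a fixed number of them as validation scenes. With fewer than two scenes it raises, which turns the one-scene case into an explicit blocker instead of a degenerate split.

```python
import hashlib
from collections.abc import Sequence


def scene_digest(scene_id: str) -> str:
    """Stable hex digest used only to order scenes reproducibly."""
    return hashlib.sha1(scene_id.encode("utf-8")).hexdigest()


def assign_scene_splits(
    scene_ids: Sequence[str], num_val_scenes: int
) -> dict[str, list[int]]:
    """Hypothetical scene-level split over global sample indices.

    scene_ids[i] is the scene of global sample index i. All samples of a
    scene go to the same split, and each split stays in sample-index order.
    """
    scenes = sorted(set(scene_ids), key=scene_digest)
    if not 0 < num_val_scenes < len(scenes):
        raise ValueError(
            f"scene-level split needs 0 < num_val_scenes < {len(scenes)} distinct scenes"
        )
    val_scenes = set(scenes[:num_val_scenes])
    splits: dict[str, list[int]] = {"all": [], "train": [], "val": []}
    for sample_index, scene in enumerate(scene_ids):
        splits["all"].append(sample_index)
        splits["val" if scene in val_scenes else "train"].append(sample_index)
    return splits


def shared_scenes(scene_ids: Sequence[str], splits: dict[str, list[int]]) -> set[str]:
    """Scenes present in both train and val; an empty set means isolation holds."""
    train = {scene_ids[i] for i in splits["train"]}
    val = {scene_ids[i] for i in splits["val"]}
    return train & val
```

Taking the first `num_val_scenes` scenes by hash has a trade-off. Adding a scene with a smaller digest moves an existing validation scene into train. A per-scene hash threshold keeps assignments stable, but it does not guarantee a validation scene count.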

1.6 Command Ledger

Use these commands to refresh the report evidence as M1 progresses; each block is run from the repo root:

```sh
# Evidence snapshot collected from the repo root on 2026-05-06.
sha256sum .data/offline_cache/vin_offline/manifest.json
wc -l .data/offline_cache/vin_offline/sample_index.jsonl
aria_nbv/.venv/bin/python - <<'PY'
from __future__ import annotations
import json
from pathlib import Path
import numpy as np

root = Path(".data/offline_cache/vin_offline")
manifest = json.loads((root / "manifest.json").read_text())
records = [
    json.loads(line)
    for line in (root / "sample_index.jsonl").read_text().splitlines()
    if line
]
print("version", manifest["version"])
print("stats", manifest["stats"])
print("scenes", sorted({record["scene_id"] for record in records}))
print("snippet_count", len({record["snippet_id"] for record in records}))
for split in ("all", "train", "val"):
    print(split, int(np.load(root / "splits" / f"{split}.npy").shape[0]))
PY
```
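The snapshot script above only prints values. A stricter sketch can turn the recorded expectations into findings. The function name is hypothetical. Beyond what the report states, it assumes two things:

- `sample_index.jsonl` rows are written in sample-index order;
- `all.npy` is exactly the union of `train.npy` and `val.npy`, which holds for the 48 = 38 + 10 snapshot.

```python
import json
from pathlib import Path

import numpy as np


def check_store_evidence(root: Path, expected_version: int) -> list[str]:
    """Return contract findings for an offline store directory (empty list = none found)."""
    findings: list[str] = []
    manifest = json.loads((root / "manifest.json").read_text())
    version = manifest.get("version")
    if version != expected_version:
        findings.append(f"manifest version {version} != expected {expected_version}")
    if manifest.get("stats", {}).get("interrupted"):
        findings.append("stats.interrupted is true")

    records = [
        json.loads(line)
        for line in (root / "sample_index.jsonl").read_text().splitlines()
        if line.strip()
    ]
    # Assumption: rows are written in global sample-index order.
    if [record["sample_index"] for record in records] != list(range(len(records))):
        findings.append("sample_index values are not 0..N-1 in file order")

    splits = {
        name: np.load(root / "splits" / f"{name}.npy").tolist()
        for name in ("all", "train", "val")
    }
    for name, indices in splits.items():
        if indices != sorted(indices):
            findings.append(f"splits/{name}.npy is not in sample-index order")
        if any(not 0 <= index < len(records) for index in indices):
            findings.append(f"splits/{name}.npy references unknown sample indices")
    if set(splits["train"]) & set(splits["val"]):
        findings.append("train and val share sample indices")
    # Assumption: all.npy is exactly the union of train.npy and val.npy.
    if sorted(splits["train"] + splits["val"]) != splits["all"]:
        findings.append("train + val does not equal all")

    scenes = {record["scene_id"] for record in records}
    if len(scenes) < 2:
        findings.append(f"{len(scenes)} scene(s): scene-level isolation cannot be shown")
    return findings
```

Run it from the repo root as `check_store_evidence(Path(".data/offline_cache/vin_offline"), expected_version=6)`. Given the snapshot values recorded above (`stats.interrupted = true`, one scene), it should report at least those two findings.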

```sh
cd aria_nbv
uv run pytest tests/data_handling/test_vin_offline_store.py
uv run pytest tests/pose_generation/test_counterfactuals.py
uv run pytest tests/lightning/test_vin_batch_collate.py
uv run pytest tests/vin/test_vin_model_v3_methods.py
```

```sh
cd aria_nbv
uv run nbv-rerun-inspect --config-path ../.configs/rerun_offline.toml --split val --index 0 --save ../.artifacts/rerun/m1-val-000.rrd
```

```sh
cd docs
quarto render contents/thesis/m1_contract_report.qmd
```
