1 M1 Contract Report
This is the working M1 contract report for the data/cache/oracle correctness gate. Status is current as of 2026-05-06; the roadmap M1 window is 2026-05-11 to 2026-05-31. The report separates three kinds of evidence: code-level contract checks that now pass on synthetic fixtures; local-store evidence that is useful for diagnostics but still blocks M1 exit; and Rerun and GPU-backed throughput evidence that must be collected before scale-up.
1.1 Status Summary
| Area | Status | Evidence |
|---|---|---|
| Offline store layout | Pass for source contract | Immutable format is owned by aria_nbv/aria_nbv/data_handling/_offline_store.py, _offline_format.py, _offline_writer.py, and _offline_dataset.py. |
| Split contract | Pass for source contract | splits/*.npy contain global sample indices; _assign_splits uses stable sample-key hashing and preserves sample-index order inside each split. |
| Candidate order | Pass for synthetic guard | CandidateSamplingResult.candidate_shell_indices() maps compact valid views to full-shell indices and rejects ambiguous layouts. |
| Candidate-label alignment | Pass for synthetic guard | prepare_vin_offline_sample() validates candidate poses, depth rows, camera rows, RRI vectors, optional point clouds, and candidate shell indices before writing rows. |
| Rollout replay alignment | Pass for synthetic guard | The historical trace DTO was superseded by RolloutZarrRecord plus factual steps/ and candidates/ tables; invalid rows remain masked with false action/train masks and NaN labels. |
| Frame and CW90 semantics | Pass for documented source contract | Candidate generation applies rotate_yaw_cw90 once to the reference pose; plotting/Rerun display corrections remain display-only. VinModelV3 requires a cw90_corrected camera tag if model-side CW90 undo is enabled. |
| Current local store evidence | Blocked for M1 exit | .configs/offline_only.toml resolves to .data/offline_cache/vin_offline; the evidence snapshot below records manifest version 6, 48 sample-index rows, and a partial one-scene diagnostic store. |
| Scene-level split evidence | Blocked | The local store has one scene (81286) and a 38/10 train/val sample split, so it cannot prove the final scene-level split contract. |
| Rerun smoke recordings | Blocked | Normal, boundary, and failure .rrd recordings are not collected in this pass. |
| Oracle throughput | Blocked | No representative GPU/dataset timing was run in this pass. |
1.2 Evidence Snapshot
The current configured store is a diagnostic smoke asset, not final M1 training evidence.
| Evidence item | Status | Value |
|---|---|---|
| Config source | Pass for local smoke | .configs/offline_only.toml uses datamodule_config.source.kind = "offline" and store_dir = "vin_offline". |
| Resolved store root | Pass for local smoke | .data/offline_cache/vin_offline; the manifest records the same repo-local store as an operator-local absolute path. |
| Manifest version | Pass for reader compatibility | version = 6, matching the current immutable reader contract. |
| Manifest hash | Recorded | sha256:9b72f7203942db28f613f4c921ee26eba7ba753c91c5967aae404a933bb3ff00 for manifest.json. |
| Manifest created time | Recorded | 2026-04-30T11:37:11Z. |
| Store status | Blocked for M1 exit | stats.interrupted = true; manifest metadata also says the version-6 migration rewrote only manifest.json, not shard arrays, payloads, index rows, or split files. |
| Sample-index rows | Recorded | 48 JSONL rows, with global sample_index values 0..47. |
| Scene/snippet coverage | Blocked for scale | One scene (81286) and 48 snippet ids; this is not the full 100 GT-mesh ASE / 4,608 snippet target. |
| Split counts | Blocked for scene split | splits/all.npy = 48, splits/train.npy = 38, splits/val.npy = 10; the split policy is sha1(sample_key), and the current one-scene store cannot prove scene-level isolation. |
| Shards | Recorded | 4 shard directories with backbone, depth, candidate, and candidate point-cloud blocks. |
| Materialized blocks | Partial | backbone, depths, and candidate_pcs are present; counterfactuals, detected OBBs, GT OBBs, and trajectory blocks are absent. |
| Rerun normal recording | Blocked | Expected artifact path: .artifacts/rerun/m1-normal.rrd; not collected in this pass. |
| Rerun boundary recording | Blocked | Expected artifact path: .artifacts/rerun/m1-boundary.rrd; not collected in this pass. |
| Rerun failure recording | Blocked | Expected artifact path: .artifacts/rerun/m1-failure.rrd; not collected in this pass. |
| Oracle throughput | Blocked | No representative GPU timing has been recorded for mesh loading, candidate rendering, backprojection, and RRI scoring. |
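The snapshot records the split policy as sha1(sample_key). The following is a minimal sketch of how a stable hash split of this kind could behave; it is not the actual `_assign_splits` implementation. The function names, the `val_fraction` parameter, and the bucket rule are assumptions made for illustration.

```python
from __future__ import annotations

import hashlib

# Hypothetical resolution for the hash bucket; not taken from the repository.
BUCKETS = 10_000


def hash_bucket(sample_key: str) -> int:
    """Map a sample key to a stable integer bucket in [0, BUCKETS)."""
    digest = hashlib.sha1(sample_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def assign_splits(sample_keys: list[str], val_fraction: float = 0.2) -> dict[str, list[int]]:
    """Assign global sample indices to train/val, keeping sample-index order."""
    threshold = int(val_fraction * BUCKETS)
    splits: dict[str, list[int]] = {"all": [], "train": [], "val": []}
    for sample_index, key in enumerate(sample_keys):
        splits["all"].append(sample_index)
        name = "val" if hash_bucket(key) < threshold else "train"
        splits[name].append(sample_index)
    return splits
```

Because the assignment depends only on the sample key, rerunning it yields the same split, and indices inside each split stay in ascending order. The same property explains the blocker: a per-sample hash places samples from one scene in both train and val, so a one-scene store cannot show scene-level isolation.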
1.3 Contract Map
| Contract | Current path | Guard or invariant |
|---|---|---|
| Store root | VinOfflineStoreConfig.store_dir | Store contains manifest.json, sample_index.jsonl, splits/{all,train,val}.npy, and immutable shards/. |
| Manifest version | OFFLINE_DATASET_VERSION in _offline_store.py | VinOfflineStoreReader rejects unsupported versions and asks for a rebuild. |
| Sample index | VinOfflineIndexRecord in _offline_format.py | Rows carry global sample_index, sample_key, scene_id, snippet_id, split, shard id, and shard-local row. |
| Splits | _assign_splits() and VinOfflineStoreReader.get_split_records() | Split arrays reference global sample indices; selected records are returned in split-array order. |
| Candidate shell | CandidateSamplingResult in pose_generation/types.py | views is normally the compact valid-candidate table; mask_valid and shell_poses describe the full sampled shell. |
| Rendered candidates | CandidateDepths in rendering/candidate_depth_renderer.py | candidate_indices maps rendered rows back to full-shell indices. |
| Oracle labels | RriResult in rri_metrics/types.py | RRI and point-to-mesh component vectors are candidate-major and must match rendered candidate count. |
| Offline row prep | prepare_vin_offline_sample() | Fixed numeric blocks pad candidate-major arrays to max_candidates; invalid padded tails use NaN, False, -1, or zero depending on field type. |
| VIN batch masking | VinOracleBatch.candidate_valid_mask() | Model/training paths use candidate_count to mask padded tails instead of treating finite tail values as labels. |
| Rollout replay | RolloutZarrRecord and rollouts.zarr factual tables in aria_nbv.rollouts | Actor-visible full-shell pose/validity fields stay separate from oracle-only score and metric vectors. |
| CW90 | utils.frames.rotate_yaw_cw90, CandidateViewGenerator, VinModelV3 | Physical/store/model inputs must not receive display-only rotations. |
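The offline row prep and VIN batch masking rows describe padding candidate-major arrays to max_candidates and masking by candidate_count. The sketch below illustrates that idea with NumPy. The helper names and signatures are hypothetical and are not the signatures of `prepare_vin_offline_sample()` or `VinOracleBatch.candidate_valid_mask()`.

```python
from __future__ import annotations

import numpy as np


def pad_candidate_major(values: np.ndarray, max_candidates: int, fill: float | int | bool) -> np.ndarray:
    """Pad the leading (candidate) axis to max_candidates with a fill value."""
    count = values.shape[0]
    if count > max_candidates:
        raise ValueError(f"{count} candidates exceed max_candidates={max_candidates}")
    padded = np.full((max_candidates, *values.shape[1:]), fill, dtype=values.dtype)
    padded[:count] = values
    return padded


def candidate_valid_mask(candidate_count: int, max_candidates: int) -> np.ndarray:
    """True for real candidate rows, False for the padded tail."""
    return np.arange(max_candidates) < candidate_count


# Illustrative values only; they are not taken from the store.
rri = np.array([0.12, 0.40, 0.05], dtype=np.float32)
shell_indices = np.array([0, 2, 5], dtype=np.int64)

rri_padded = pad_candidate_major(rri, max_candidates=6, fill=np.nan)
idx_padded = pad_candidate_major(shell_indices, max_candidates=6, fill=-1)
mask = candidate_valid_mask(candidate_count=3, max_candidates=6)

# Masking by count, not by finiteness, keeps padded tails out of the labels.
labels = rri_padded[mask]
```

Masking by count matters for fields whose padding value is finite (zero or -1): a finiteness test would wrongly accept those tail rows as labels.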
1.4 Passing Checks
The lightweight checks use CPU-only synthetic fixtures and do not require ASE data, meshes, or a GPU.
cd aria_nbv
uv run pytest tests/data_handling/test_vin_offline_store.py -k 'candidate_label_order or candidate_index_drift or label_length_drift'
uv run pytest tests/pose_generation/test_counterfactuals.py -k 'candidate_depth_renderer or rollout_trace_maps_scores'

These checks prove:
- rendered candidate poses, candidate shell indices, RRI labels, and optional diagnostic payloads stay in the same order during offline row preparation;
- the writer rejects candidate-index drift before materializing a shard;
- the writer rejects oracle label vectors whose length no longer matches the rendered candidate table;
- the renderer rejects ambiguous views/mask_valid layouts instead of silently falling back to sequential candidate ids;
- rollout trace serialization preserves full-shell validity, NaN invalid rows, selected shell index, scores, and RRI metric vectors.
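The ambiguity guard in the list above can be illustrated with a small sketch. It assumes the compact layout, where the number of view rows equals the number of True entries in the full-shell mask. The function below is a hypothetical standalone version, not the actual `CandidateSamplingResult.candidate_shell_indices()` method.

```python
from __future__ import annotations

import numpy as np


def candidate_shell_indices(num_views: int, mask_valid: np.ndarray) -> np.ndarray:
    """Return the full-shell index of each compact view row."""
    mask_valid = np.asarray(mask_valid, dtype=bool)
    if mask_valid.ndim != 1:
        raise ValueError("mask_valid must be a 1-D full-shell mask")
    shell_indices = np.flatnonzero(mask_valid)
    if shell_indices.shape[0] != num_views:
        # Ambiguous layout: refuse to guess instead of returning arange(num_views).
        raise ValueError(
            f"{num_views} views do not match {shell_indices.shape[0]} valid shell entries"
        )
    return shell_indices


mask = np.array([True, False, True, True, False])
indices = candidate_shell_indices(num_views=3, mask_valid=mask)
```

Here `indices` holds the shell positions 0, 2, and 3. Sequential ids 0, 1, 2 would silently attach labels to the wrong shell poses.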
1.5 Blockers Before M1 Exit
These are not solved by the synthetic guard pass:
- collect fresh diagnostics for the configured immutable store: manifest version, sample count, scene count, block inventory, split counts, candidate count distribution, and RRI summaries;
- replace the current one-scene sample split evidence with a scene-level train/val split or mark the exact scene-level split blocker in the agents DB;
- produce normal, boundary, and failure Rerun recordings through nbv-rerun-inspect without committing generated .rrd artifacts;
- measure oracle throughput on representative snippets with the intended mesh, depth-render, backprojection, and RRI settings;
- confirm no fine-detail oracle run uses aggressive mesh or point-cloud downsampling;
- record any real-store blocker in the active agents DB rather than presenting it as a model or planning result.
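One way to make the scene-level split blocker concrete is to check whether any scene_id appears in more than one split. The sketch below assumes each sample-index row is a dict with scene_id and split keys, as listed in the contract map. The function name is hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def scene_split_overlap(records: list[dict]) -> dict[str, set[str]]:
    """Return scenes that appear in more than one split, mapped to those splits."""
    splits_by_scene: dict[str, set[str]] = defaultdict(set)
    for record in records:
        splits_by_scene[str(record["scene_id"])].add(record["split"])
    return {scene: splits for scene, splits in splits_by_scene.items() if len(splits) > 1}
```

An empty result is necessary for scene-level isolation but not sufficient. The store also needs enough distinct scenes for both train and val to be non-empty.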
1.6 Command Ledger
Use these commands to update the report evidence as M1 progresses:
# Evidence snapshot collected from the repo root on 2026-05-06.
sha256sum .data/offline_cache/vin_offline/manifest.json
wc -l .data/offline_cache/vin_offline/sample_index.jsonl
aria_nbv/.venv/bin/python - <<'PY'
from __future__ import annotations
import json
from pathlib import Path
import numpy as np
root = Path(".data/offline_cache/vin_offline")
manifest = json.loads((root / "manifest.json").read_text())
records = [
    json.loads(line)
    for line in (root / "sample_index.jsonl").read_text().splitlines()
    if line
]
print("version", manifest["version"])
print("stats", manifest["stats"])
print("scenes", sorted({record["scene_id"] for record in records}))
print("snippet_count", len({record["snippet_id"] for record in records}))
for split in ("all", "train", "val"):
    print(split, int(np.load(root / "splits" / f"{split}.npy").shape[0]))
PY

cd aria_nbv
uv run pytest tests/data_handling/test_vin_offline_store.py
uv run pytest tests/pose_generation/test_counterfactuals.py
uv run pytest tests/lightning/test_vin_batch_collate.py
uv run pytest tests/vin/test_vin_model_v3_methods.py

cd aria_nbv
uv run nbv-rerun-inspect --config-path ../.configs/rerun_offline.toml --split val --index 0 --save ../.artifacts/rerun/m1-val-000.rrd

cd docs
quarto render contents/thesis/m1_contract_report.qmd