1 M1 Contract Report

This is the working M1 contract report for the data/cache/oracle correctness gate. Status is current as of 2026-05-06; the roadmap M1 window is 2026-05-11 to 2026-05-31. The report separates three kinds of evidence. The first is code-level contract checks that now pass on synthetic fixtures. The second is local-store evidence that is useful for diagnostics but still blocks M1 exit. The third is Rerun recordings and GPU-backed throughput measurements, both of which still have to be collected before scale-up.

1.1 Status Summary

| Area | Status | Evidence |
|---|---|---|
| Offline store layout | Pass for source contract | Immutable format is owned by aria_nbv/aria_nbv/data_handling/_offline_store.py, _offline_format.py, _offline_writer.py, and _offline_dataset.py. |
| Split contract | Pass for source contract | splits/*.npy contain global sample indices; _assign_splits uses stable sample-key hashing and preserves sample-index order inside each split. |
| Candidate order | Pass for synthetic guard | CandidateSamplingResult.candidate_shell_indices() maps compact valid views to full-shell indices and rejects ambiguous layouts. |
| Candidate-label alignment | Pass for synthetic guard | prepare_vin_offline_sample() validates candidate poses, depth rows, camera rows, RRI vectors, optional point clouds, and candidate shell indices before writing rows. |
| Rollout replay alignment | Pass for synthetic guard | The historical trace DTO was superseded by RolloutZarrRecord plus factual steps/ and candidates/ tables; invalid rows remain masked with false action/train masks and NaN labels. |
| Frame and CW90 semantics | Pass for documented source contract | Candidate generation applies rotate_yaw_cw90 once to the reference pose; plotting/Rerun display corrections remain display-only. VinModelV3 requires a cw90_corrected camera tag if model-side CW90 undo is enabled. |
| Current local store evidence | Blocked for M1 exit | .configs/offline_only.toml resolves to .data/offline_cache/vin_offline; the evidence snapshot below records manifest v6, 48 sample-index rows, and a partial one-scene diagnostic store. |
| Scene-level split evidence | Blocked | The local store has one scene (81286) and a 38/10 train/val sample split, so it cannot prove the final scene-level split contract. |
| Rerun smoke recordings | Blocked | Normal, boundary, and failure .rrd recordings have not been collected in this pass. |
| Oracle throughput | Blocked | No representative GPU/dataset timing was run in this pass. |
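The candidate-order guard can be illustrated with a small sketch. It assumes that mask_valid is a boolean vector over the full sampled shell and that the compact views table stores valid candidates in shell order. compact_to_shell_indices is a hypothetical helper, not the body of CandidateSamplingResult.candidate_shell_indices().

```python
import numpy as np


def compact_to_shell_indices(mask_valid: np.ndarray, num_views: int) -> np.ndarray:
    """Map rows of a compact valid-candidate table back to full-shell indices.

    Hypothetical helper: assumes mask_valid is a 1-D boolean vector over the full
    sampled shell and that compact rows appear in the same order as the shell.
    """
    mask_valid = np.asarray(mask_valid)
    if mask_valid.ndim != 1 or mask_valid.dtype != np.bool_:
        raise ValueError("mask_valid must be a 1-D boolean vector over the full shell")
    num_valid = int(mask_valid.sum())
    if num_views != num_valid:
        # Refuse to guess: a count mismatch makes the layout ambiguous, so do not
        # fall back to sequential candidate ids.
        raise ValueError(
            f"views has {num_views} rows but mask_valid marks {num_valid} valid candidates"
        )
    # Row i of the compact table is the i-th True entry of mask_valid.
    return np.flatnonzero(mask_valid)


shell_indices = compact_to_shell_indices(np.array([True, False, True, True]), 3)
print(shell_indices.tolist())
```

Raising on a count mismatch, rather than returning np.arange(num_views), keeps a layout error from silently relabeling candidates.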

1.2 Evidence Snapshot

The current configured store is a diagnostic smoke asset, not final M1 training evidence.

| Evidence item | Status | Value |
|---|---|---|
| Config source | Pass for local smoke | .configs/offline_only.toml uses datamodule_config.source.kind = "offline" and store_dir = "vin_offline". |
| Resolved store root | Pass for local smoke | .data/offline_cache/vin_offline; the manifest records the same repo-local store as an operator-local absolute path. |
| Manifest version | Pass for reader compatibility | version = 6, matching the current immutable reader contract. |
| Manifest hash | Recorded | sha256:9b72f7203942db28f613f4c921ee26eba7ba753c91c5967aae404a933bb3ff00 for manifest.json. |
| Manifest created time | Recorded | 2026-04-30T11:37:11Z. |
| Store status | Blocked for M1 exit | stats.interrupted = true; manifest metadata also says the version-6 migration rewrote only manifest.json, not shard arrays, payloads, index rows, or split files. |
| Sample-index rows | Recorded | 48 JSONL rows, with global sample_index values 0..47. |
| Scene/snippet coverage | Blocked for scale | One scene (81286) and 48 snippet ids; this is not the full 100 GT-mesh ASE / 4,608 snippet target. |
| Split counts | Blocked for scene split | splits/all.npy = 48, splits/train.npy = 38, splits/val.npy = 10; the split policy is sha1(sample_key), and the current one-scene store cannot prove scene-level isolation. |
| Shards | Recorded | 4 shard directories with backbone, depth, candidate, and candidate point-cloud blocks. |
| Materialized blocks | Partial | backbone, depths, and candidate_pcs are present; counterfactuals, detected OBBs, GT OBBs, and trajectory blocks are absent. |
| Rerun normal recording | Blocked | Expected artifact path: .artifacts/rerun/m1-normal.rrd; not collected in this pass. |
| Rerun boundary recording | Blocked | Expected artifact path: .artifacts/rerun/m1-boundary.rrd; not collected in this pass. |
| Rerun failure recording | Blocked | Expected artifact path: .artifacts/rerun/m1-failure.rrd; not collected in this pass. |
| Oracle throughput | Blocked | No representative GPU timing has been recorded for mesh loading, candidate rendering, backprojection, and RRI scoring. |
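The sketch below shows why a sha1(sample_key) split cannot demonstrate scene-level isolation on a one-scene store. The same hash applied to scene_id sends every sample of a scene to one side. The bucketing rule (first 8 digest bytes read as a fraction in [0, 1)), the 0.2 validation fraction, and the sample_key format are illustrative assumptions, not the _assign_splits implementation. hashed_split and split_indices are hypothetical helpers.

```python
import hashlib


def hashed_split(key: str, val_fraction: float = 0.2) -> str:
    """Deterministically assign a key to 'train' or 'val' from its SHA-1 digest."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform-ish value in [0, 1)
    return "val" if bucket < val_fraction else "train"


def split_indices(records: list[dict], key_field: str, val_fraction: float = 0.2) -> dict[str, list[int]]:
    """Group global sample indices by split, preserving sample-index order."""
    splits: dict[str, list[int]] = {"all": [], "train": [], "val": []}
    for record in sorted(records, key=lambda r: r["sample_index"]):
        index = record["sample_index"]
        splits["all"].append(index)
        splits[hashed_split(str(record[key_field]), val_fraction)].append(index)
    return splits


# One scene with 48 samples, shaped like the snapshot above (sample keys are made up).
records = [
    {"sample_index": i, "sample_key": f"81286:{i:04d}", "scene_id": "81286"}
    for i in range(48)
]
by_sample = split_indices(records, "sample_key")  # mixes the scene across train/val
by_scene = split_indices(records, "scene_id")     # whole scene lands on one side
print({name: len(ids) for name, ids in by_sample.items()})
print({name: len(ids) for name, ids in by_scene.items()})
```

Keying the hash on scene_id gives isolation by construction, but it only produces a usable validation split once the store holds several scenes.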

1.3 Contract Map

| Contract | Current path | Guard or invariant |
|---|---|---|
| Store root | VinOfflineStoreConfig.store_dir | Store contains manifest.json, sample_index.jsonl, splits/{all,train,val}.npy, and immutable shards/. |
| Manifest version | OFFLINE_DATASET_VERSION in _offline_store.py | VinOfflineStoreReader rejects unsupported versions and asks for a rebuild. |
| Sample index | VinOfflineIndexRecord in _offline_format.py | Rows carry global sample_index, sample_key, scene_id, snippet_id, split, shard id, and shard-local row. |
| Splits | _assign_splits() and VinOfflineStoreReader.get_split_records() | Split arrays reference global sample indices; selected records are returned in split-array order. |
| Candidate shell | CandidateSamplingResult in pose_generation/types.py | views is normally the compact valid-candidate table; mask_valid and shell_poses describe the full sampled shell. |
| Rendered candidates | CandidateDepths in rendering/candidate_depth_renderer.py | candidate_indices maps rendered rows back to full-shell indices. |
| Oracle labels | RriResult in rri_metrics/types.py | RRI and point-to-mesh component vectors are candidate-major and must match rendered candidate count. |
| Offline row prep | prepare_vin_offline_sample() | Fixed numeric blocks pad candidate-major arrays to max_candidates; invalid padded tails use NaN, False, -1, or zero depending on field type. |
| VIN batch masking | VinOracleBatch.candidate_valid_mask() | Model/training paths use candidate_count to mask padded tails instead of treating finite tail values as labels. |
| Rollout replay | RolloutZarrRecord and rollouts.zarr factual tables in aria_nbv.rollouts | Actor-visible full-shell pose/validity fields stay separate from oracle-only score and metric vectors. |
| CW90 | utils.frames.rotate_yaw_cw90, CandidateViewGenerator, VinModelV3 | Physical/store/model inputs must not receive display-only rotations. |
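The padding and masking rows work as a pair. The writer pads candidate-major blocks to a fixed width with sentinel fills, and the training side rebuilds a validity mask from candidate_count rather than trusting tail values. A minimal sketch follows. pad_candidate_major, valid_mask_from_counts, and MAX_CANDIDATES = 6 are hypothetical and are not the prepare_vin_offline_sample() or VinOracleBatch.candidate_valid_mask() code. The caller must choose a fill value that is representable in the array's dtype (NaN for floats, -1 for integer indices, False for booleans).

```python
import numpy as np

MAX_CANDIDATES = 6  # hypothetical; the store uses its own max_candidates setting


def pad_candidate_major(values: np.ndarray, max_candidates: int, fill) -> np.ndarray:
    """Pad a candidate-major array along axis 0 to exactly max_candidates rows."""
    values = np.asarray(values)
    count = values.shape[0]
    if count > max_candidates:
        raise ValueError(f"{count} candidates exceed max_candidates={max_candidates}")
    out = np.full((max_candidates, *values.shape[1:]), fill, dtype=values.dtype)
    out[:count] = values  # real candidates first, sentinel tail after
    return out


def valid_mask_from_counts(candidate_count: np.ndarray, max_candidates: int) -> np.ndarray:
    """Return a (batch, max_candidates) mask that is True only for real candidates."""
    slots = np.arange(max_candidates)
    return slots[None, :] < np.asarray(candidate_count)[:, None]


rri = pad_candidate_major(np.array([0.3, 0.1, 0.5]), MAX_CANDIDATES, np.nan)
shell_idx = pad_candidate_major(np.array([0, 2, 3]), MAX_CANDIDATES, -1)
mask = valid_mask_from_counts(np.array([3]), MAX_CANDIDATES)
mean_rri = rri[mask[0]].mean()  # averages only the three real candidates
```

Deriving the mask from candidate_count keeps the contract independent of the sentinel chosen for each field, so a zero-filled tail can never be mistaken for a label.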

1.4 Passing Checks

The lightweight checks use CPU-only synthetic fixtures and do not require ASE data, meshes, or a GPU.

cd aria_nbv
uv run pytest tests/data_handling/test_vin_offline_store.py -k 'candidate_label_order or candidate_index_drift or label_length_drift'
uv run pytest tests/pose_generation/test_counterfactuals.py -k 'candidate_depth_renderer or rollout_trace_maps_scores'

On synthetic fixtures, these checks confirm that:

  • rendered candidate poses, candidate shell indices, RRI labels, and optional diagnostic payloads stay in the same order during offline row preparation;
  • the writer rejects candidate-index drift before materializing a shard;
  • the writer rejects oracle label vectors whose length no longer matches the rendered candidate table;
  • the renderer rejects ambiguous views/mask_valid layouts instead of silently falling back to sequential candidate ids;
  • rollout trace serialization preserves full-shell validity, NaN invalid rows, selected shell index, scores, and RRI metric vectors.
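The two writer guards (candidate-index drift and label-length drift) reduce to a pre-write consistency check over candidate-major blocks. The sketch below assumes a simplified container. CandidateRows and check_candidate_alignment are hypothetical names, and the real writer validates more fields (camera rows, optional point clouds) than shown here.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class CandidateRows:
    """Hypothetical stand-in for one sample's candidate-major blocks."""

    shell_indices: np.ndarray  # (N,) full-shell index of each rendered row
    poses: np.ndarray  # (N, ...) rendered candidate poses
    depths: np.ndarray  # (N, H, W) rendered depth rows
    rri: np.ndarray  # (N,) oracle RRI labels


def check_candidate_alignment(rows: CandidateRows, expected_shell_indices: np.ndarray) -> None:
    """Raise before any shard is written if a candidate-major block has drifted."""
    count = rows.shell_indices.shape[0]
    # Label-length drift: every block must have one row per rendered candidate.
    for name in ("poses", "depths", "rri"):
        length = getattr(rows, name).shape[0]
        if length != count:
            raise ValueError(f"{name} has {length} rows, expected {count} rendered candidates")
    # Candidate-index drift: rendered rows must map to the sampled shell indices.
    if not np.array_equal(rows.shell_indices, expected_shell_indices):
        raise ValueError("candidate shell indices drifted from the sampling result")


ok = CandidateRows(
    shell_indices=np.array([0, 2, 3]),
    poses=np.zeros((3, 4, 4)),
    depths=np.zeros((3, 8, 8)),
    rri=np.array([0.1, 0.2, 0.3]),
)
check_candidate_alignment(ok, np.array([0, 2, 3]))  # aligned rows pass silently
```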

1.5 Blockers Before M1 Exit

The synthetic guard pass does not resolve the following blockers:

  • collect fresh diagnostics for the configured immutable store: manifest version, sample count, scene count, block inventory, split counts, candidate count distribution, and RRI summaries;
  • replace the current one-scene sample split evidence with a scene-level train/val split or mark the exact scene-level split blocker in the agents DB;
  • produce normal, boundary, and failure Rerun recordings through nbv-rerun-inspect without committing generated .rrd artifacts;
  • measure oracle throughput on representative snippets with the intended mesh, depth-render, backprojection, and RRI settings;
  • confirm no fine-detail oracle run uses aggressive mesh or point-cloud downsampling;
  • record any real-store blocker in the active agents DB rather than presenting it as a model or planning result.
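Per-stage oracle timing could start from a harness like the one below. The stage names follow the throughput blocker, the lambdas are placeholders, and time_stages is a hypothetical helper. On a GPU, each real stage callable must synchronize before returning (for example with torch.cuda.synchronize()). Otherwise the wall-clock numbers only capture asynchronous kernel launches.

```python
import time
from collections.abc import Callable
from statistics import median


def time_stages(
    stages: dict[str, Callable[[], object]], repeats: int = 5, warmup: int = 1
) -> dict[str, float]:
    """Return median wall-clock seconds per stage, discarding warmup iterations."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings: dict[str, list[float]] = {name: [] for name in stages}
    for iteration in range(warmup + repeats):
        for name, run in stages.items():
            start = time.perf_counter()
            run()
            elapsed = time.perf_counter() - start
            if iteration >= warmup:  # skip one-off costs such as lazy initialization
                timings[name].append(elapsed)
    return {name: median(samples) for name, samples in timings.items()}


# Placeholder stages; swap in real mesh-load, render, backprojection, and RRI calls.
stages = {
    "mesh_load": lambda: sum(range(50_000)),
    "render": lambda: sorted(range(50_000), reverse=True),
    "backproject": lambda: [i * 0.5 for i in range(50_000)],
    "rri": lambda: max(range(50_000)),
}
result = time_stages(stages, repeats=3)
for name, seconds in result.items():
    print(f"{name}: {seconds * 1e3:.2f} ms")
```

Reporting the median per stage, rather than the mean, keeps occasional I/O stalls from dominating the throughput estimate.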

1.6 Command Ledger

Use these commands to update the report evidence as M1 progresses:

# Evidence snapshot collected from the repo root on 2026-05-06.
sha256sum .data/offline_cache/vin_offline/manifest.json
wc -l .data/offline_cache/vin_offline/sample_index.jsonl
aria_nbv/.venv/bin/python - <<'PY'
from __future__ import annotations
import json
from pathlib import Path
import numpy as np

root = Path(".data/offline_cache/vin_offline")
manifest = json.loads((root / "manifest.json").read_text())
records = [
    json.loads(line)
    for line in (root / "sample_index.jsonl").read_text().splitlines()
    if line
]
print("version", manifest["version"])
print("stats", manifest["stats"])
print("scenes", sorted({record["scene_id"] for record in records}))
print("snippet_count", len({record["snippet_id"] for record in records}))
for split in ("all", "train", "val"):
    print(split, int(np.load(root / "splits" / f"{split}.npy").shape[0]))
PY
# Contract regression tests, run from the repo root.
cd aria_nbv
uv run pytest tests/data_handling/test_vin_offline_store.py
uv run pytest tests/pose_generation/test_counterfactuals.py
uv run pytest tests/lightning/test_vin_batch_collate.py
uv run pytest tests/vin/test_vin_model_v3_methods.py
cd ..
# Rerun smoke recording, run from the repo root; do not commit the generated .rrd.
cd aria_nbv
uv run nbv-rerun-inspect --config-path ../.configs/rerun_offline.toml --split val --index 0 --save ../.artifacts/rerun/m1-val-000.rrd
cd ..
# Re-render this report, run from the repo root.
cd docs
quarto render contents/thesis/m1_contract_report.qmd
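A small gate check could turn the snapshot criteria into an explicit blocker list. It operates on the manifest dict and records list that the heredoc in the command ledger already loads. Several parts are assumptions. SUPPORTED_VERSION = 6 mirrors the manifest version recorded in the evidence snapshot. min_scenes = 2 is only the minimum needed for any scene-level train/val split. Requiring contiguous sample_index values in file order is inferred from the 0..47 snapshot. m1_store_blockers is a hypothetical helper.

```python
SUPPORTED_VERSION = 6  # assumption: matches the manifest version recorded above


def m1_store_blockers(manifest: dict, records: list[dict], min_scenes: int = 2) -> list[str]:
    """List the reasons a store snapshot cannot yet serve as M1 exit evidence."""
    blockers: list[str] = []
    version = manifest.get("version")
    if version != SUPPORTED_VERSION:
        blockers.append(f"manifest version {version} != {SUPPORTED_VERSION}")
    if manifest.get("stats", {}).get("interrupted"):
        blockers.append("stats.interrupted is true")
    scenes = {record["scene_id"] for record in records}
    if len(scenes) < min_scenes:
        blockers.append(f"only {len(scenes)} scene(s); a scene-level split cannot be shown")
    # Assumed invariant: global sample_index runs 0..N-1 in sample_index.jsonl order.
    indices = [record["sample_index"] for record in records]
    if indices != list(range(len(records))):
        blockers.append("sample_index values are not contiguous 0..N-1 in file order")
    return blockers


# Shaped like the 2026-05-06 snapshot: version 6, interrupted, one scene, 48 rows.
snapshot_manifest = {"version": 6, "stats": {"interrupted": True}}
snapshot_records = [{"sample_index": i, "scene_id": "81286"} for i in range(48)]
for blocker in m1_store_blockers(snapshot_manifest, snapshot_records):
    print("BLOCKED:", blocker)
```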