Candidate Sampling And Target Selection

1 Candidate Sampling And Target Selection

This page owns the reader-facing theory contract for target-conditioned sampling in ARIA-NBV. It explains how actor-visible targets are selected, how a finite candidate table is sampled, how mixed candidate families are combined, and how rollout branch selection connects those candidates to target-RRI and \(Q_H\).

The finite-horizon value-learning contract is maintained in Finite-Candidate Rollout And Q_H Contract. The metric contract is maintained in RRI Theory. Generated API details come from the pose_generation and data_handling._target_selection docstrings.

This page distinguishes the current target-first thesis contract from scale-up policy. The formulas below define operational utilities and provenance fields; they are not claims that the default thresholds, mixture weights, or target-ranking products are statistically optimal. Before thesis-scale generation, the selector and sampler must report separate eligibility, interest, matching, validity, and provenance diagnostics.

1.1 Actor-Visible Target Selection

The V1 target source order is actor-visible by construction. The selector first uses detected OBB records when present, then falls back to EVL/backbone predicted OBB records. Ground-truth OBBs are forbidden as V1 actor input. They may appear only in explicit V0 sanity mode or after V1 selection as oracle-only GT match audit fields.

Target selection is a three-stage contract. A hard eligibility mask decides whether the actor-visible row is labelable; a target interest score orders or samples eligible rows; and the post-selection GT match audit decides whether the selected row can receive target-RRI supervision. These stages are kept separate so a weakly observed target fails as a labelability case rather than being misreported as a geometric GT-match failure.


---
config:
  htmlLabels: true
  flowchart:
    htmlLabels: true
    nodeSpacing: 16
    rankSpacing: 26
  layout: elk
  themeVariables:
    fontSize: "18px"
  themeCSS: |
    .nodeLabel { font-size: 18px; }
    .node span.nodeLabel { color: #1F2937 !important; fill: #1F2937 !important; stroke: none !important; }
    .edgeLabel { font-size: 16px; }
    .cluster-label { font-size: 19px; font-weight: 700; }
---
flowchart TB
  subgraph Actor["Actor-visible selection"]
    direction TB
    Obs["detected OBBs<br/>else predicted OBBs"]:::input
    Rows["target rows<br/>confidence + support + projection"]:::compute
    Mask["hard mask m_e<br/>finite, visible, supported"]:::compute
    Score["score S_e<br/>p_e s_vis s_sup s_def"]:::compute
    Policy["greedy top-K<br/>or target softmax"]:::compute
    Selected["selected target rows z_e"]:::output
  end

  subgraph Oracle["Oracle audit after selection"]
    direction TB
    Gt["GT OBB table<br/>evaluation only"]:::data
    Match["semantic IoU match<br/>G(e,g) threshold"]:::compute
    Audit["gt_label_valid<br/>match status + reason bits"]:::data
  end

  Obs --> Rows --> Mask --> Score --> Policy --> Selected
  Selected --> Match
  Gt --> Match --> Audit

  classDef input fill:#D5E8D4,stroke:#82B366,stroke-width:1.5px,rx:0,ry:0;
  classDef output fill:#F8CECC,stroke:#B85450,stroke-width:1.5px,rx:0,ry:0;
  classDef compute fill:#E1D5E7,stroke:#9673A6,stroke-width:1.5px,rx:8,ry:8;
  classDef data fill:#F5F5F5,stroke:#9E9E9E,stroke-width:1.2px,rx:0,ry:0;

  style Actor fill:#f1fff6,stroke:#97ddb5,stroke-width:2px,rx:12,ry:12
  style Oracle fill:#fff8e8,stroke:#e0b66a,stroke-width:2px,rx:12,ry:12

For every actor-visible entity candidate \(e\), the selector combines semi-dense support and EVL support into an effective support count:

\[ n_e^{\mathrm{eff}} = n_e^{\mathrm{semi}} + w_{\mathrm{EVL}} n_e^{\mathrm{EVL}}. \]

The coefficient \(w_{\mathrm{EVL}}\) is the configured evl_support_weight, with current default 1.0. It treats each positive EVL support location as one weighted support vote. This is an operational ablation/config knob, not a learned value or a claim of optimal calibration.

The support score verifies that the target has enough actor-visible evidence to be labelable. The threshold \(n_{\min}\) is the configured min_support_points, with current default 3, and is also the hard minimum effective support required for target eligibility.

\[ s_e^{\mathrm{sup}} = \operatorname{clip} \left( \frac{n_e^{\mathrm{eff}}}{n_{\min}}, 0,1 \right). \]

The deficit score deliberately favors targets that are supported but not already saturated. The threshold \(n_{\mathrm{sat}}\) is the configured support_saturation_points, with current default 128. It is the soft support saturation point at which this deficit term reaches zero.

\[ s_e^{\mathrm{def}} = 1- \operatorname{clip} \left( \frac{n_e^{\mathrm{eff}}}{n_{\mathrm{sat}}}, 0,1 \right). \]
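The three support quantities above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the function name `support_scores` and its argument names are hypothetical, and only the default values (1.0, 3, 128) come from the text.

```python
from typing import Tuple


def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def support_scores(
    n_semi: float,
    n_evl: float,
    evl_support_weight: float = 1.0,           # w_EVL
    min_support_points: float = 3.0,           # n_min
    support_saturation_points: float = 128.0,  # n_sat
) -> Tuple[float, float, float]:
    """Return (n_eff, s_sup, s_def) for one actor-visible target row.

    Hypothetical helper: the names are illustrative, the formulas follow the page.
    """
    n_eff = n_semi + evl_support_weight * n_evl
    s_sup = clip(n_eff / min_support_points, 0.0, 1.0)
    s_def = 1.0 - clip(n_eff / support_saturation_points, 0.0, 1.0)
    return n_eff, s_sup, s_def
```

A row with exactly `min_support_points` effective points already has full support score but an almost full deficit score, while a row at or beyond `support_saturation_points` has zero deficit and therefore zero product score.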

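For projected visibility, let \(A_e\) be the maximum, over actor-visible RGB/SLAM camera boxes, of the clipped visible 2D OBB overlap. For a projected box with corners \((x_1,y_1)\) and \((x_2,y_2)\) in a \(W\times H\) image, and with \([\cdot]_+=\max(\cdot,0)\), the clipped overlap is: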

\[ A_e = [\min(x_2,W)-\max(x_1,0)]_+ [\min(y_2,H)-\max(y_1,0)]_+. \]

This single persisted projected-area field is used for target feasibility and scoring. Raw projected extents outside the image are not part of the main target contract because they can inflate visibility for boxes that are mostly or entirely outside the image. Let \(A_{\mathrm{img}}\) be the configured projected_area_normalizer_pixels, with current default 240*240, and define \(a_e=A_e/A_{\mathrm{img}}\). The full-score fraction \(a_{\max}\) is the configured projected_area_full_score_fraction, with current default 0.05.

\[ x_e=\operatorname{clip}(a_e/a_{\max},0,1), \qquad s_e^{\mathrm{vis}}=x_e^2(3-2x_e). \]

The visibility curve is the standard cubic smoothstep. It serves as a smooth operational weighting because it maps [0,1] to [0,1] with zero slope at both endpoints. If no projected box exists, the row is hard-invalid when require_projected_visibility=True; otherwise it receives the configured fallback score missing_projection_visibility_score, with current default 0.35.
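The clipped-area and smoothstep steps can be sketched as follows. The function names and the `Box` alias are hypothetical; the numeric defaults are the ones quoted above, except that the default of `require_projected_visibility` is not stated on this page and is an assumption here. Returning `None` is this sketch's way of signalling the hard-invalid case.

```python
from typing import Iterable, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def clipped_area(box: Box, width: float, height: float) -> float:
    """Area of the box after clipping it to a width x height image."""
    x1, y1, x2, y2 = box
    w = max(0.0, min(x2, width) - max(x1, 0.0))
    h = max(0.0, min(y2, height) - max(y1, 0.0))
    return w * h


def smoothstep(x: float) -> float:
    """Cubic smoothstep on [0, 1], clamping inputs outside that range."""
    x = max(0.0, min(1.0, x))
    return x * x * (3.0 - 2.0 * x)


def visibility_score(
    boxes: Iterable[Box],
    width: float,
    height: float,
    normalizer_pixels: float = 240.0 * 240.0,   # A_img
    full_score_fraction: float = 0.05,          # a_max
    require_projected_visibility: bool = False,  # default is an assumption
    missing_projection_visibility_score: float = 0.35,
) -> Optional[float]:
    """Return s_vis, the fallback score, or None for a hard-invalid row."""
    areas = [clipped_area(b, width, height) for b in boxes]
    if not areas:
        if require_projected_visibility:
            return None
        return missing_projection_visibility_score
    a_e = max(areas) / normalizer_pixels
    return smoothstep(a_e / full_score_fraction)
```

With the quoted defaults, a clipped overlap of 2880 pixels (5% of 240*240) already earns the full visibility score, and a box lying entirely outside the image contributes zero area.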

Hard eligibility is:

\[ m_e=1 \iff p_e\ge\tau_p \land n_e^{\mathrm{eff}}\ge n_{\min} \land \mathrm{finite}(\hat B_e) \land \mathrm{visibility\ policy\ passes}. \]

The baseline actor-visible target interest score is:

\[ S_e = p_e s_e^{\mathrm{vis}} s_e^{\mathrm{sup}} s_e^{\mathrm{def}}, \]

where \(p_e\) is the OBB confidence as predicted by the EFM.

This product score is an operational baseline interest utility. For scale-up audits, keep its factors separable: confidence, projected visibility, support sufficiency, and support deficit should be reported independently so failed target rows can be attributed to a concrete gate rather than to a low product.

Greedy top-K selection is deterministic:

\[ E_K=\operatorname{TopK}_{e:m_e=1} S_e. \]

The subscript \(e:m_e=1\) means that top-K is taken only over eligible target rows.
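A minimal sketch of the mask, product score, and greedy top-K steps, assuming the per-row factors have already been computed. `TargetRow` and the function names are hypothetical. The confidence threshold \(\tau_p\) has no default stated on this page, so the caller must pass it.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class TargetRow:
    """Hypothetical container for one actor-visible target row."""
    name: str
    confidence: float          # p_e
    n_eff: float               # effective support count
    s_vis: float
    s_sup: float
    s_def: float
    box_is_finite: bool = True
    visibility_ok: bool = True  # result of the configured visibility policy


def is_eligible(row: TargetRow, tau_p: float, n_min: float = 3.0) -> bool:
    """Hard mask m_e: every gate must pass."""
    return (
        row.confidence >= tau_p
        and row.n_eff >= n_min
        and row.box_is_finite
        and row.visibility_ok
    )


def interest_score(row: TargetRow) -> float:
    """Baseline product score S_e."""
    return row.confidence * row.s_vis * row.s_sup * row.s_def


def greedy_top_k(rows: Sequence[TargetRow], k: int, tau_p: float,
                 n_min: float = 3.0) -> List[TargetRow]:
    """Deterministic top-K over eligible rows only."""
    pool = [r for r in rows if is_eligible(r, tau_p, n_min)]
    return sorted(pool, key=interest_score, reverse=True)[:k]
```

Keeping `is_eligible` separate from `interest_score` mirrors the contract above: an ineligible row is never ranked, so it cannot be mistaken for a low-scoring eligible one.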

The stochastic target-selection policy samples top-K without replacement:

\[ P(e\mid R) = \frac{\exp(S_e/\tau)} {\sum_{j\in R}\exp(S_j/\tau)}, \]

where \(R\) is the remaining eligible, not-yet-selected target set at the current sampling step and \(\tau\) is the target temperature. The symbol \(\tau\) is local to this page and is not a global glossary term. This stochastic policy operates over target rows, not candidate actions.

Because \(S_e\) is a bounded product score, raw-score softmax can become nearly uniform for many small scores or brittle for very small temperatures. The default thesis-scale target sampling policy remains deferred until coverage audits are available. The leading candidate is stratified target sampling over support, projected visibility, distance, and class bins; robust target logits and greedy top-K remain comparison policies.

Rollout branch softmax already uses robust score normalization and should not be conflated with this target-row policy.
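Sequential softmax sampling without replacement over eligible target rows can be sketched as below. The function name and the dictionary input are hypothetical; the sketch uses the raw-score softmax from the formula above, with the usual max-subtraction for numerical stability, and is not the deferred thesis-scale policy.

```python
import math
import random
from typing import Dict, List


def sample_targets_without_replacement(
    scores: Dict[str, float],   # eligible rows only: name -> S_e
    k: int,
    temperature: float,
    rng: random.Random,
) -> List[str]:
    """Draw up to k distinct targets, renormalizing over the remaining set R."""
    remaining = dict(scores)
    chosen: List[str] = []
    while remaining and len(chosen) < k:
        names = list(remaining)
        top = max(remaining.values())
        # Subtracting the max leaves the softmax unchanged but avoids overflow.
        weights = [math.exp((remaining[n] - top) / temperature) for n in names]
        pick = rng.choices(names, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]
    return chosen
```

For example, `sample_targets_without_replacement({"a": 0.9, "b": 0.5, "c": 0.1}, 2, 0.5, random.Random(0))` returns two distinct names. As the temperature approaches zero the draw degenerates to the greedy ordering, which is the brittleness noted above.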

After selection, the GT label audit matches the selected actor-visible row to a GT OBB:

\[ G(e,g) = \mathbb{1}[\kappa(\hat y_e,y_g)=1]\, \operatorname{IoU}_{3D}(\hat B_e,B_g^{GT}). \]

The match is accepted only if semantic class, IoU, match score, and ambiguity thresholds pass. The selected target row therefore contains actor-visible fields for the model plus oracle-only GT match fields for label validity and debugging. Support and visibility remain eligibility and audit fields; they do not rank competing GT objects.
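The audit logic can be illustrated with precomputed 3D IoU values. Everything specific in this sketch is an assumption: exact class equality stands in for \(\kappa\), the ambiguity test is modelled as a margin over the runner-up, the threshold values are placeholders, and the reason strings are hypothetical.

```python
from typing import List, Optional, Tuple


def match_score(pred_class: str, gt_class: str, iou_3d: float) -> float:
    """G(e, g): IoU gated by class agreement (exact equality stands in for kappa)."""
    return iou_3d if pred_class == gt_class else 0.0


def audit_match(
    pred_class: str,
    gt_rows: List[Tuple[str, float]],   # (GT class, precomputed 3D IoU with the target)
    min_score: float = 0.25,            # placeholder threshold, not a documented default
    ambiguity_margin: float = 0.10,     # placeholder margin, not a documented default
) -> Tuple[Optional[int], str]:
    """Return (matched GT index or None, reason string)."""
    scores = [match_score(pred_class, c, iou) for c, iou in gt_rows]
    if not scores or max(scores) <= 0.0:
        return None, "no_candidate"
    order = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    best = order[0]
    if scores[best] < min_score:
        return None, "below_threshold"
    if len(order) > 1 and scores[best] - scores[order[1]] < ambiguity_margin:
        return None, "ambiguous"
    return best, "matched"
```

Note that support and visibility never enter this function, in line with the statement that they do not rank competing GT objects.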

1.2 Candidate Center Sampling

Candidate generation produces a full sampled shell and a compact valid table. The full shell retains invalid candidates, masks, reason codes, and provenance. The compact table exposes only valid candidates to rendering, rollout policies, and \(Q_H\) candidate tokens.


---
config:
  htmlLabels: true
  flowchart:
    htmlLabels: true
    nodeSpacing: 16
    rankSpacing: 26
  layout: elk
  themeVariables:
    fontSize: "18px"
  themeCSS: |
    .nodeLabel { font-size: 18px; }
    .node span.nodeLabel { color: #1F2937 !important; fill: #1F2937 !important; stroke: none !important; }
    .edgeLabel { font-size: 16px; }
    .cluster-label { font-size: 19px; font-weight: 700; }
---
flowchart LR
  Ref["reference pose<br/>world <- rig"]:::input
  Grav["gravity-aligned<br/>sampling frame"]:::compute
  Dir["direction draw<br/>uniform or PowerSpherical"]:::compute
  Caps["azimuth/elevation caps<br/>radius sample"]:::compute
  Pos["position family<br/>forward, target, bypass"]:::compute
  Orient["orientation builder<br/>rig, radial, target look-at"]:::compute
  Prune["pruning rules<br/>bounds, mesh, path, motion"]:::compute
  Shell["full shell<br/>masks + provenance"]:::data
  Valid["compact valid table<br/>finite actions"]:::output

  Ref --> Grav --> Dir --> Caps --> Pos --> Orient --> Prune
  Prune --> Shell
  Shell --> Valid

  classDef input fill:#D5E8D4,stroke:#82B366,stroke-width:1.5px,rx:0,ry:0;
  classDef output fill:#F8CECC,stroke:#B85450,stroke-width:1.5px,rx:0,ry:0;
  classDef compute fill:#E1D5E7,stroke:#9673A6,stroke-width:1.5px,rx:8,ry:8;
  classDef data fill:#F5F5F5,stroke:#9E9E9E,stroke-width:1.2px,rx:0,ry:0;

Candidate centers are sampled in a gravity-aligned reference frame, then transformed to the world frame. A raw direction can be uniform on the sphere:

\[ u\sim \mathrm{Unif}(\mathbb{S}^2), \]

or forward-biased:

\[ u\sim \mathrm{PowerSpherical}(\mu=e_z,\kappa), \qquad p(u)\propto (1+\mu^\top u)^\kappa. \]

Angular caps map the raw sampled direction into the configured yaw and elevation limits without rejection:

\[ \psi=\operatorname{atan2}(u_x,u_z), \qquad \psi'=\psi\frac{\Delta\psi}{2\pi}, \]

\[ y'=\sin\theta_{\min} + \frac{u_y+1}{2} \left(\sin\theta_{\max}-\sin\theta_{\min}\right). \]

The candidate radius is then drawn uniformly:

\[ r\sim \mathcal{U}(r_{\min},r_{\max}), \qquad o_r=r\,d_r, \qquad c_w=T^w_r o_r. \]
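The uniform draw, the rejection-free caps, and the radius sample can be sketched as follows. The sketch assumes a sampling frame with \(y\) up and \(z\) forward (consistent with \(\mu=e_z\) and the use of \(u_y\) for elevation) and assumes the capped direction is rebuilt from the capped azimuth and capped height as shown; the function names are hypothetical. PowerSpherical sampling and the final transform \(T^w_r\) are left out.

```python
import math
import random
from typing import Tuple

Vec3 = Tuple[float, float, float]


def sample_uniform_direction(rng: random.Random) -> Vec3:
    """Uniform direction on the unit sphere via a normalized Gaussian vector."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        n = math.sqrt(sum(c * c for c in v))
        if n > 1e-12:
            return (v[0] / n, v[1] / n, v[2] / n)


def apply_angular_caps(u: Vec3, delta_azimuth: float,
                       elev_min: float, elev_max: float) -> Vec3:
    """Map a raw direction into the azimuth/elevation caps without rejection."""
    ux, uy, uz = u
    psi = math.atan2(ux, uz)                          # raw azimuth in (-pi, pi]
    psi_c = psi * delta_azimuth / (2.0 * math.pi)     # compressed azimuth
    y_c = math.sin(elev_min) + 0.5 * (uy + 1.0) * (
        math.sin(elev_max) - math.sin(elev_min))      # capped height
    horiz = math.sqrt(max(0.0, 1.0 - y_c * y_c))
    # Rebuilding a unit vector this way is an assumption of the sketch.
    return (horiz * math.sin(psi_c), y_c, horiz * math.cos(psi_c))


def sample_offset(rng: random.Random, r_min: float, r_max: float,
                  delta_azimuth: float, elev_min: float,
                  elev_max: float) -> Tuple[Vec3, Vec3]:
    """Return (d_r, o_r): capped unit direction and rig-frame offset r * d_r."""
    d = apply_angular_caps(sample_uniform_direction(rng),
                           delta_azimuth, elev_min, elev_max)
    r = rng.uniform(r_min, r_max)
    return d, (r * d[0], r * d[1], r * d[2])
```

Because the caps are a mapping rather than a rejection test, every raw draw yields a capped direction, so the full-shell row count does not depend on the cap widths.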

The position family transforms the capped direction \(d_r\) into a concrete spatial prior:

Position family	Role
`upper_bound_free_shell`	Ablation shell using the capped sampled direction directly.
`forward_local`	Local continuity around the device forward axis with spread `0.45`.
`local_refinement`	Ablation-only tighter local continuity around forward with spread `0.25`.
`revisit_backtrack`	Ablation-only backtracking family around the local backward direction with spread `0.35`.
`target_bearing_local`	Actor-visible target-centric family around the selected target bearing with spread `0.4`.
`lateral_target_bypass`	Side-step family that combines target bearing, signed lateral bypass, and bounded vertical offset.

For lateral target bypass, the direction is:

\[ d' = \operatorname{norm} \left( 0.55\,b_e + 0.85\,\operatorname{sign}(u_x)\ell_e + \operatorname{clip}(u_y,-0.35,0.35)u_{\mathrm{up}} \right), \]

where \(b_e\) is the actor-visible target bearing, \(\ell_e\) is the horizontal lateral direction around that bearing, and \(u_{\mathrm{up}}\) is world up in the sampling frame.
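The bypass direction can be computed directly from the formula. In this sketch the function name is hypothetical, and the caller is assumed to supply unit vectors for the bearing, lateral, and up directions.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _sign(x: float) -> int:
    """Mathematical sign: -1, 0, or +1."""
    return (x > 0) - (x < 0)


def lateral_bypass_direction(bearing: Vec3, lateral: Vec3, up: Vec3,
                             u_x: float, u_y: float) -> Vec3:
    """Blend target bearing, signed side-step, and bounded vertical offset."""
    side = _sign(u_x)                      # which side to pass on
    lift = max(-0.35, min(0.35, u_y))      # clip(u_y, -0.35, 0.35)
    raw = tuple(0.55 * b + 0.85 * side * l + lift * u
                for b, l, u in zip(bearing, lateral, up))
    n = math.sqrt(sum(c * c for c in raw))
    if n == 0.0:
        raise ValueError("degenerate bypass direction")
    return (raw[0] / n, raw[1] / n, raw[2] / n)
```

The lateral weight (0.85) exceeds the bearing weight (0.55), so the resulting step is dominated by the side-step rather than by the approach toward the target.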

The caps above constrain the raw draw, not necessarily the final movement direction after a target-aware position family has blended or rebuilt the direction. Final egocentric realism is therefore owned by post-mode diagnostics and pruning: report final azimuth/elevation, step length, height delta, backward displacement, and target-bearing statistics by component. A stricter profile may horizontalize target-bearing displacement so target height affects orientation while center motion stays in the local walking plane.

1.3 Candidate Orientation And Pruning

Candidate orientation is built after center sampling. The orientation mode controls the base camera frame:

Orientation mode	Meaning
`forward_rig`	Reuse the reference rig rotation at each candidate center.
`radial_away`	Look along the reference-to-candidate ray.
`radial_towards`	Look back along the candidate-to-reference ray.
`target_point`	Look at the selected actor-visible target center.

For a target look-at frame at candidate center \(c_w\) and target center \(p_e\):

\[ z_w=\operatorname{norm}(p_e-c_w), \]

\[ y_w=\operatorname{norm}(u_{\mathrm{up}}-(u_{\mathrm{up}}^\top z_w)z_w), \qquad x_w=y_w\times z_w. \]

Optional view jitter samples yaw and pitch deltas in camera coordinates:

\[ \delta\psi\sim \mathcal{U}(-\psi_{\max},\psi_{\max}), \qquad \delta\theta\sim \mathcal{U}(-\theta_{\max},\theta_{\max}), \]

\[ z_c' = \begin{bmatrix} \cos\delta\theta\sin\delta\psi\\ \sin\delta\theta\\ \cos\delta\theta\cos\delta\psi \end{bmatrix}. \]

Pruning keeps invalidity as an explicit mask contract. A candidate is valid only if its center lies in the occupancy support,

\[ c_i\in B_{\mathrm{occ}}, \]

it satisfies GT-mesh clearance,

\[ d_i=\min_{x\in\mathcal{M}_{GT}}\lVert c_i-x\rVert_2 > d_{\min}, \]

and it respects egocentric motion bounds,

\[ \lVert o_i\rVert_2\le d_{\max}, \quad |\Delta h_i|\le h_{\max}, \quad \max(0,-o_{i,z})\le b_{\max}. \]

Additional guards may reject path collisions, excessive yaw changes, or missing candidate depths. These invalid candidates remain in the full shell with reason codes and NaN labels; they are not low-quality valid actions.
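The mask contract can be sketched with reason bits, so that an invalid row records every rule it failed instead of being dropped. The flag names and the function are hypothetical, the mesh distance and occupancy test are assumed to be computed elsewhere, and the clearance \(d_{\min}\) has no default on this page, so it is a required argument. The motion defaults are the realistic-profile limits quoted on this page.

```python
import math
from enum import IntFlag
from typing import Tuple

Vec3 = Tuple[float, float, float]


class InvalidReason(IntFlag):
    """Hypothetical reason bits; the real codes may differ."""
    NONE = 0
    OUTSIDE_OCCUPANCY = 1
    MESH_CLEARANCE = 2
    STEP_TOO_LONG = 4
    HEIGHT_DELTA = 8
    BACKWARD_STEP = 16


def prune_reasons(
    offset: Vec3,               # o_i in the reference frame, z forward
    height_delta: float,        # delta h_i
    in_occupancy_bounds: bool,  # c_i in B_occ, computed elsewhere
    mesh_distance: float,       # d_i, computed elsewhere
    min_clearance: float,       # d_min
    max_step: float = 1.0,
    max_height_delta: float = 0.25,
    max_backward: float = 0.25,
) -> InvalidReason:
    """Return the union of failed rules; NONE means the candidate is valid."""
    reasons = InvalidReason.NONE
    if not in_occupancy_bounds:
        reasons |= InvalidReason.OUTSIDE_OCCUPANCY
    if not mesh_distance > min_clearance:
        reasons |= InvalidReason.MESH_CLEARANCE
    if math.sqrt(sum(c * c for c in offset)) > max_step:
        reasons |= InvalidReason.STEP_TOO_LONG
    if abs(height_delta) > max_height_delta:
        reasons |= InvalidReason.HEIGHT_DELTA
    if max(0.0, -offset[2]) > max_backward:
        reasons |= InvalidReason.BACKWARD_STEP
    return reasons
```

Accumulating bits rather than returning on the first failure keeps per-component invalid-reason histograms honest when a candidate violates several rules at once.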

Mixture component counts are full-shell counts, not guaranteed valid-action counts. A scale-ready sampler must therefore report valid candidates per step and per component, invalid reasons per component, and selected/gain histograms before interpreting low headroom as a planning result.

1.4 Mixture And Rollout Branch Selection

The current default target-conditioned sampler is a 60-row mixed candidate table:


---
config:
  htmlLabels: true
  flowchart:
    htmlLabels: true
    nodeSpacing: 16
    rankSpacing: 24
  layout: elk
  themeVariables:
    fontSize: "18px"
  themeCSS: |
    .nodeLabel { font-size: 18px; }
    .node span.nodeLabel { color: #1F2937 !important; fill: #1F2937 !important; stroke: none !important; }
    .edgeLabel { font-size: 16px; }
    .cluster-label { font-size: 19px; font-weight: 700; }
---
flowchart TB
  Target["selected actor-visible target"]:::input
  Ref["current reference pose"]:::input

  subgraph Mix["60-row default candidate mixture"]
    direction LR
    Targ["target_bearing_local<br/>18 rows"]:::compute
    Fwd["forward_local<br/>18 rows"]:::compute
    Lat["lateral_target_bypass<br/>12 rows"]:::compute
    Refine["local_refinement<br/>6 rows"]:::compute
    Back["revisit_backtrack<br/>6 rows"]:::compute
  end

  Prov["stable provenance<br/>position_id + strategy_id + mixture_id"]:::data
  Table["full shell<br/>sampler_probability = 1/N"]:::output

  Target --> Targ
  Target --> Lat
  Ref --> Targ
  Ref --> Fwd
  Ref --> Lat
  Ref --> Refine
  Ref --> Back
  Targ --> Prov
  Fwd --> Prov
  Lat --> Prov
  Refine --> Prov
  Back --> Prov
  Prov --> Table

  classDef input fill:#D5E8D4,stroke:#82B366,stroke-width:1.5px,rx:0,ry:0;
  classDef output fill:#F8CECC,stroke:#B85450,stroke-width:1.5px,rx:0,ry:0;
  classDef compute fill:#E1D5E7,stroke:#9673A6,stroke-width:1.5px,rx:8,ry:8;
  classDef data fill:#F5F5F5,stroke:#9E9E9E,stroke-width:1.2px,rx:0,ry:0;

  style Mix fill:#f0fbff,stroke:#8fd0ff,stroke-width:2px,rx:12,ry:12

Component	Count	Position family	Orientation family
`forward_local`	24	`forward_local`	`forward_rig`
`target_bearing_local`	24	`target_bearing_local`	`target_point`
`lateral_target_bypass`	12	`lateral_target_bypass`	`target_point`

Every full-shell row stores stable position_id, strategy_id, mixture_id, component name, and sampler_probability = 1/N. The explicit upper-bound ablation remains upper_bound_free_shell.
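Building the full shell from the component table can be sketched as below. The `ShellRow` fields and the integer encodings of `position_id` and `strategy_id` are hypothetical; only the component names, counts, families, and the uniform `sampler_probability = 1/N` follow the table and text above.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# (component, count, position family, orientation family), from the table above.
DEFAULT_MIXTURE: Tuple[Tuple[str, int, str, str], ...] = (
    ("forward_local", 24, "forward_local", "forward_rig"),
    ("target_bearing_local", 24, "target_bearing_local", "target_point"),
    ("lateral_target_bypass", 12, "lateral_target_bypass", "target_point"),
)


@dataclass(frozen=True)
class ShellRow:
    """Hypothetical provenance record for one full-shell row."""
    row_index: int
    component: str
    position_family: str
    orientation_family: str
    position_id: int      # illustrative encoding: index of the component
    strategy_id: int      # illustrative encoding: index within the component
    mixture_id: int
    sampler_probability: float


def build_full_shell(
    mixture: Sequence[Tuple[str, int, str, str]] = DEFAULT_MIXTURE,
    mixture_id: int = 0,
) -> List[ShellRow]:
    """Expand component counts into provenance rows with probability 1/N."""
    total = sum(count for _, count, _, _ in mixture)
    rows: List[ShellRow] = []
    for pos_id, (name, count, position, orientation) in enumerate(mixture):
        for k in range(count):
            rows.append(ShellRow(len(rows), name, position, orientation,
                                 pos_id, k, mixture_id, 1.0 / total))
    return rows
```

The probability is assigned before pruning, so it describes the full shell; the compact valid table inherits the provenance but not a renormalized probability in this sketch.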

For the first thesis production profile, the simpler comparison is more important than exposing every hand-coded family at once. The locked profile matrix is:

Profile	Purpose
`v1_realistic_3family`	Main local egocentric distribution with `forward_local`, `target_bearing_local`, and `lateral_target_bypass`.
`v1_relaxed_3family`	Same three families with looser local-motion bounds to measure sensitivity.
`v1_rich_5family`	Ablation profile retaining `local_refinement` and `revisit_backtrack` for comparison.
`upper_bound_free_shell`	Explicit broad-shell ablation, reported separately from V1 realism.

local_refinement and revisit_backtrack are useful diagnostics only if component-level validity and target-gain reports show that they contribute valid, selected, target-improving actions. In persisted rollout stores, position_id is a hot field in both the candidate view and the \(Q_H\) training view, and aligned candidate diagnostics are retained for inspection.

The realistic profile uses stricter local motion limits than the earlier seminar-style shell:

Limit	Default
max step distance	`1.0m`
max height delta	`0.25m`
max backward step	`0.25m`
max yaw delta	`70deg`

Full-shell count remains fixed, but thesis-scale generation must gate roots with too few valid actions and report blocked roots, valid counts by position_id, invalid reasons by position_id, and target-gain distributions by position_id.

Rollout branch selection acts on valid candidate rows, not on target rows:


---
config:
  htmlLabels: true
  flowchart:
    htmlLabels: true
    nodeSpacing: 16
    rankSpacing: 26
  layout: elk
  themeVariables:
    fontSize: "18px"
  themeCSS: |
    .nodeLabel { font-size: 18px; }
    .node span.nodeLabel { color: #1F2937 !important; fill: #1F2937 !important; stroke: none !important; }
    .edgeLabel { font-size: 16px; }
    .cluster-label { font-size: 19px; font-weight: 700; }
---
flowchart LR
  Cand["candidate table<br/>valid mask m_i"]:::input
  Scores["policy scores<br/>heuristic or oracle evaluator"]:::compute
  Policy["selection policy<br/>greedy, random, softmax"]:::compute
  Diversity["diversity guards<br/>distance, yaw, target bearing, strategy"]:::compute
  Branches["selected sibling branches"]:::output
  Store["rollout / Q_H store<br/>action lineage + rewards"]:::data
  Next["next counterfactual state<br/>selected history updated"]:::output

  Cand --> Scores --> Policy --> Diversity --> Branches
  Branches --> Store
  Branches --> Next

  classDef input fill:#D5E8D4,stroke:#82B366,stroke-width:1.5px,rx:0,ry:0;
  classDef output fill:#F8CECC,stroke:#B85450,stroke-width:1.5px,rx:0,ry:0;
  classDef compute fill:#E1D5E7,stroke:#9673A6,stroke-width:1.5px,rx:8,ry:8;
  classDef data fill:#F5F5F5,stroke:#9E9E9E,stroke-width:1.2px,rx:0,ry:0;

Available rollout policies include:

Policy	Score or sampling rule
`farthest_from_history`	Distance to the nearest previously selected pose.
`farthest_from_reference`	Distance from the current reference pose.
`random` / `random_valid`	Uniform sampling over eligible valid rows.
`oracle_greedy`	Greedy argmax over oracle/evaluator score.
`temperature_softmax`	Stochastic branch sampling over finite evaluator scores.

Temperature-softmax candidate selection uses robust logits:

\[ \ell_i = \frac{s_i-\operatorname{median}(s)} {\operatorname{IQR}(s)\tau}, \]

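where \(s_i\) is the evaluator score of candidate \(i\) and \(\tau\) is the branch temperature, with a standard-deviation fallback for tiny candidate sets. The selected candidate distribution, conditioned on the valid mask \(M\), is: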

\[ P(i\mid M)= \frac{\exp(\ell_i)} {\sum_{j:m_j=1}\exp(\ell_j)}. \]
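The robust-logit softmax can be sketched with the standard library. Several details are assumptions of this sketch rather than documented behavior: the median and IQR are taken over valid rows only, the IQR is used when at least four valid scores exist, the population standard deviation is the fallback, and a unit scale is used when all valid scores are identical.

```python
import math
import statistics
from typing import List, Sequence


def robust_softmax(
    scores: Sequence[float],
    valid: Sequence[bool],
    temperature: float,
    min_iqr_count: int = 4,   # assumed cutoff for a "tiny" candidate set
    eps: float = 1e-12,
) -> List[float]:
    """Branch probabilities over valid rows; invalid rows get probability 0."""
    vals = [s for s, m in zip(scores, valid) if m]
    if not vals:
        raise ValueError("no valid candidates")
    med = statistics.median(vals)
    scale = 0.0
    if len(vals) >= min_iqr_count:
        q1, _, q3 = statistics.quantiles(vals, n=4)
        scale = q3 - q1
    if scale <= eps:
        scale = statistics.pstdev(vals)   # standard-deviation fallback
    if scale <= eps:
        scale = 1.0                       # all valid scores identical
    logits = [(s - med) / (scale * temperature) if m else -math.inf
              for s, m in zip(scores, valid)]
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]   # exp(-inf) == 0.0
    total = sum(weights)
    return [w / total for w in weights]
```

Because the logits are centered by the median and scaled by the IQR, rescaling or shifting all evaluator scores leaves the branch distribution unchanged, which is the property the raw-score target softmax lacks.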

Diversity guards can require minimum sibling distance, yaw separation, target-bearing separation, and strategy diversity. These guards operate after the score policy and before branch expansion so rollout data does not collapse to near-duplicate sibling branches. They are data-generation guardrails, not a primary thesis contribution. If a configured guard rejects every remaining candidate and the implementation falls back to unconstrained remaining rows, reports should expose guard activation and fallback counts alongside entropy, selected component histograms, and target-root-gain distributions.
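A minimal sketch of one guard, minimum sibling distance, with the fallback accounting described above. It assumes the score policy has already produced an ordering of valid rows; the function name, the input layout, and the counter names are hypothetical, and the other guards (yaw, target bearing, strategy) are omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def select_diverse_branches(
    ranked: Sequence[Tuple[int, Vec3]],   # (row id, center), in policy order
    k: int,
    min_sibling_distance: float,
) -> Tuple[List[int], Dict[str, int]]:
    """Pick up to k siblings, preferring rows far enough from earlier picks."""
    chosen: List[Tuple[int, Vec3]] = []
    remaining = list(ranked)
    stats = {"guard_activations": 0, "fallbacks": 0}
    while remaining and len(chosen) < k:
        passing = [
            c for c in remaining
            if all(math.dist(c[1], s[1]) >= min_sibling_distance for s in chosen)
        ]
        if len(passing) < len(remaining):
            stats["guard_activations"] += 1   # the guard rejected at least one row
        if passing:
            pick = passing[0]
        else:
            pick = remaining[0]               # unconstrained fallback
            stats["fallbacks"] += 1
        chosen.append(pick)
        remaining.remove(pick)
    return [row_id for row_id, _ in chosen], stats
```

Returning the counters with the selection makes guard activation and fallback counts available to the same report that carries entropy and component histograms.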

1.5 Connection To Target-RRI And Q_H

Target selection and candidate sampling are actor-visible. Oracle target-RRI is computed only after an actor-visible target row has been selected and matched to GT for evaluation. The selected target conditions the finite candidate set, and the scorer evaluates valid candidate rows with target-specific point-mesh improvement. The default rollout and \(Q_H\) reward is root-normalized target gain from the finite-horizon contract, while state-relative target RRI remains a diagnostic label.

This separation protects the thesis result: actor policies see observed and predicted target evidence, valid candidate tokens, history, and budget state; labelers see GT mesh/OBB assets only to create trusted supervision and endpoint metrics.