This page owns the reader-facing theory contract for target-conditioned sampling in ARIA-NBV. It explains how actor-visible targets are selected, how a finite candidate table is sampled, how mixed candidate families are combined, and how rollout branch selection connects those candidates to target-RRI and \(Q_H\).
The finite-horizon value-learning contract is maintained in Finite-Candidate Rollout And Q_H Contract. The metric contract is maintained in RRI Theory. Generated API details come from the pose_generation and data_handling._target_selection docstrings.
This page distinguishes the current target-first thesis contract from scale-up policy. The formulas below define operational utilities and provenance fields; they are not claims that the default thresholds, mixture weights, or target-ranking products are statistically optimal. Before thesis-scale generation, the selector and sampler must report separate eligibility, interest, matching, validity, and provenance diagnostics.
1.1 Actor-Visible Target Selection
The V1 target source order is actor-visible by construction. The selector first uses detected OBB records when present, then falls back to EVL/backbone predicted OBB records. Ground-truth OBBs are forbidden as V1 actor input. They may appear only in explicit V0 sanity mode or after V1 selection as oracle-only GT match audit fields.
Target selection is a three-stage contract. A hard eligibility mask decides whether the actor-visible row is labelable; a target interest score orders or samples eligible rows; and the post-selection GT match audit decides whether the selected row can receive target-RRI supervision. These stages are kept separate so a weakly observed target fails as a labelability case rather than being misreported as a geometric GT-match failure.
For every actor-visible entity candidate \(e\), the selector combines semi-dense support and EVL support into an effective support count. The coefficient \(w_{\mathrm{EVL}}\) is the configured evl_support_weight, with current default 1.0. It treats each positive EVL support location as one weighted support vote. This is an operational ablation/config knob, not a learned value or a claim of optimal calibration. \[
n_e^{\mathrm{eff}}
=
n_e^{\mathrm{semi}}
+
w_{\mathrm{EVL}} n_e^{\mathrm{EVL}}.
\]
The support score verifies that the target has enough actor-visible evidence to be labelable. The threshold \(n_{\min}\) is the configured min_support_points, with current default 3, and is also the hard minimum effective support required for target eligibility. \[
s_e^{\mathrm{sup}}
=
\operatorname{clip}
\left(
\frac{n_e^{\mathrm{eff}}}{n_{\min}},
0,1
\right).
\]
The deficit score deliberately favors targets that are supported but not already saturated. The threshold \(n_{\mathrm{sat}}\) is the configured support_saturation_points, with current default 128. It is the soft support saturation point at which this deficit term reaches zero.
\[
s_e^{\mathrm{def}}
=
1-\operatorname{clip}
\left(
\frac{n_e^{\mathrm{eff}}}{n_{\mathrm{sat}}},
0,1
\right).
\]
For projected visibility, let \(A_e\) be the maximum clipped visible 2D OBB overlap over actor-visible RGB/SLAM camera boxes:
\[
A_e
=
[\min(x_2,W)-\max(x_1,0)]_+\,
[\min(y_2,H)-\max(y_1,0)]_+.
\]
This single persisted projected-area field is used for target feasibility and scoring. Raw projected extents outside the image are not part of the main target contract because they can inflate visibility for boxes that are mostly or entirely outside the image. Let \(A_{\mathrm{img}}\) be the configured projected_area_normalizer_pixels, with current default 240*240, and define \(a_e=A_e/A_{\mathrm{img}}\). The full-score fraction \(a_{\max}\) is the configured projected_area_full_score_fraction, with current default 0.05.
\[
x_e=\operatorname{clip}(a_e/a_{\max},0,1),
\qquad
s_e^{\mathrm{vis}}=x_e^2(3-2x_e).
\]
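The visibility curve is the standard cubic smoothstep. It is used as a smooth operational weighting because it maps [0,1] to [0,1] with zero slope at both endpoints. If no projected box exists, visibility is either a hard invalidity when require_projected_visibility=True, or the configured fallback score missing_projection_visibility_score, with current default 0.35.
Hard eligibility is:
\[
m_e=1
\iff
p_e\ge\tau_p
\land n_e^{\mathrm{eff}}\ge n_{\min}
\land \mathrm{finite}(\hat B_e)
\land \mathrm{visibility\ policy\ passes}.
\]
The baseline actor-visible target interest score is:
\[
S_e
=
p_e\,s_e^{\mathrm{vis}}\,s_e^{\mathrm{sup}}\,s_e^{\mathrm{def}},
\]
where \(p_e\) is the OBB confidence as predicted by the EFM.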
This product score is an operational baseline interest utility. For scale-up audits, keep its factors separable: confidence, projected visibility, support sufficiency, and support deficit should be reported independently so failed target rows can be attributed to a concrete gate rather than to a low product.
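The factor-by-factor reporting described above can be illustrated with a minimal Python sketch. It is not the `data_handling._target_selection` implementation: the `TargetScoreConfig` class, the `score_target` function, the `min_confidence` value standing in for \(\tau_p\), and the choice `require_projected_visibility=False` are assumptions made for illustration, and the sketch assumes the visibility policy passes whenever a projected box exists.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TargetScoreConfig:
    # Defaults quoted from this page.
    evl_support_weight: float = 1.0
    min_support_points: float = 3.0
    support_saturation_points: float = 128.0
    projected_area_normalizer_pixels: float = 240.0 * 240.0
    projected_area_full_score_fraction: float = 0.05
    missing_projection_visibility_score: float = 0.35
    # Assumptions for illustration only (not documented defaults).
    require_projected_visibility: bool = False
    min_confidence: float = 0.5  # stands in for tau_p


def clip01(x: float) -> float:
    return min(1.0, max(0.0, x))


def score_target(confidence: float, n_semi: float, n_evl: float,
                 clipped_area_px: Optional[float], cfg: TargetScoreConfig,
                 box_is_finite: bool = True) -> dict:
    """Return the eligibility flag and every score factor separately."""
    n_eff = n_semi + cfg.evl_support_weight * n_evl
    s_sup = clip01(n_eff / cfg.min_support_points)
    s_def = 1.0 - clip01(n_eff / cfg.support_saturation_points)

    if clipped_area_px is None:
        # No projected box: hard invalidity or the configured fallback score.
        vis_ok = not cfg.require_projected_visibility
        s_vis = cfg.missing_projection_visibility_score if vis_ok else 0.0
    else:
        a = clipped_area_px / cfg.projected_area_normalizer_pixels
        x = clip01(a / cfg.projected_area_full_score_fraction)
        s_vis = x * x * (3.0 - 2.0 * x)  # cubic smoothstep
        vis_ok = True  # assumption: the policy passes when a box exists

    eligible = (confidence >= cfg.min_confidence
                and n_eff >= cfg.min_support_points
                and box_is_finite
                and vis_ok)
    return {
        "eligible": eligible,
        "n_eff": n_eff,
        "confidence": confidence,
        "s_vis": s_vis,
        "s_sup": s_sup,
        "s_def": s_def,
        "score": confidence * s_vis * s_sup * s_def,
    }
```

For example, a row with confidence 0.8, 16 effective support points, and 1440 clipped pixels yields \(0.8 \times 0.5 \times 1 \times 0.875 = 0.35\). Under these formulas every eligible row has \(s_e^{\mathrm{sup}}=1\), because eligibility already requires \(n_e^{\mathrm{eff}}\ge n_{\min}\); the support factor therefore only differentiates ineligible rows in diagnostics.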
Greedy top-K selection is deterministic:
\[
E_K=\operatorname{TopK}_{e:m_e=1} S_e.
\]
The subscript \(e:m_e=1\) means that top-K is taken only over eligible target rows.
The stochastic target-selection policy samples top-K without replacement:
\[
P(e\mid R)
=
\frac{\exp(S_e/\tau)}
{\sum_{j\in R}\exp(S_j/\tau)},
\]
where \(R\) is the remaining eligible, not-yet-selected target set at the current sampling step and \(\tau\) is the target temperature. This local symbol is not a global glossary term. This stochastic policy operates over target rows, not candidate actions. Because \(S_e\) is a bounded product score, raw-score softmax can become nearly uniform for many small scores or brittle for very small temperatures. The default thesis-scale target sampling policy remains deferred until coverage audits are available. The leading candidate is stratified target sampling over support, projected visibility, distance, and class bins; robust target logits and greedy top-K remain comparison policies. Rollout branch softmax already uses robust score normalization and should not be conflated with this target-row policy.
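The two target-row policies can be compared side by side in a short sketch. This is an illustration under stated assumptions rather than the project's selector: `greedy_top_k` and `sample_top_k` are hypothetical names, scores and eligibility flags are plain Python lists, and ties in the greedy policy are broken by row order.

```python
import math
import random


def greedy_top_k(scores, eligible, k):
    """Deterministic top-K over eligible rows (m_e = 1)."""
    rows = [i for i, ok in enumerate(eligible) if ok]
    rows.sort(key=lambda i: scores[i], reverse=True)  # stable: ties keep row order
    return rows[:k]


def sample_top_k(scores, eligible, k, temperature, rng):
    """Sequential raw-score softmax sampling without replacement.

    At each step R is the set of eligible rows that are not yet selected.
    """
    remaining = [i for i, ok in enumerate(eligible) if ok]
    chosen = []
    while remaining and len(chosen) < k:
        top = max(scores[i] for i in remaining)
        # Subtracting the maximum leaves the softmax unchanged and avoids overflow.
        weights = [math.exp((scores[i] - top) / temperature) for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen


scores = [0.35, 0.05, 0.90, 0.20]
eligible = [True, True, False, True]
top_two = greedy_top_k(scores, eligible, 2)
sampled = sample_top_k(scores, eligible, 3, temperature=0.1, rng=random.Random(0))
```

Row 2 has the highest score but is ineligible, so neither policy can return it. Because the scores lie in [0,1], a temperature near 1 gives almost uniform sampling, which is the near-uniformity concern raised above.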
After selection, the GT label audit matches the selected actor-visible row to a GT OBB:
\[
G(e,g)
=
\mathbb{1}[\kappa(\hat y_e,y_g)=1]\,
\operatorname{IoU}_{3D}(\hat B_e,B_g^{GT}).
\]
The match is accepted only if semantic class, IoU, match score, and ambiguity thresholds pass. The selected target row therefore contains actor-visible fields for the model plus oracle-only GT match fields for label validity and debugging. Support and visibility remain eligibility and audit fields; they do not rank competing GT objects.
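A minimal sketch of the class-gated match score and an acceptance check follows. Several parts are assumptions: axis-aligned boxes stand in for oriented boxes to keep the IoU short, class compatibility is plain label equality, and the `min_iou` and `min_margin` thresholds (with the margin between the best and second-best match used as the ambiguity test) are hypothetical choices, not the configured audit rules.

```python
def aabb_iou_3d(a, b):
    """IoU of two axis-aligned boxes given as (xmin, ymin, zmin, xmax, ymax, zmax).

    Simplification: the real audit compares oriented boxes.
    """
    inter = 1.0
    for axis in range(3):
        lo = max(a[axis], b[axis])
        hi = min(a[axis + 3], b[axis + 3])
        inter *= max(0.0, hi - lo)
    vol_a = (a[3] - a[0]) * (a[4] - a[1]) * (a[5] - a[2])
    vol_b = (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])
    union = vol_a + vol_b - inter
    return inter / union if union > 0.0 else 0.0


def audit_gt_match(pred_class, pred_box, gt_objects, min_iou=0.25, min_margin=0.1):
    """Score every GT object with G(e, g) and accept only an unambiguous best match."""
    scores = [
        aabb_iou_3d(pred_box, gt_box) if gt_class == pred_class else 0.0
        for gt_class, gt_box in gt_objects
    ]
    if not scores:
        return {"gt_index": None, "match_score": 0.0, "accepted": False}
    best = max(range(len(scores)), key=lambda g: scores[g])
    runner_up = max((s for g, s in enumerate(scores) if g != best), default=0.0)
    accepted = scores[best] >= min_iou and scores[best] - runner_up >= min_margin
    return {"gt_index": best, "match_score": scores[best], "accepted": accepted}


gt = [
    ("chair", (0, 0, 0, 2, 2, 2)),
    ("table", (0, 0, 0, 2, 2, 2)),
    ("chair", (1, 1, 1, 3, 3, 3)),
]
audit = audit_gt_match("chair", (0, 0, 0, 2, 2, 2), gt)
```

The returned fields are oracle-only: they belong with the label-validity columns of the target row and are never fed back to the actor.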
1.2 Candidate Center Sampling
Candidate generation produces a full sampled shell and a compact valid table. The full shell retains invalid candidates, masks, reason codes, and provenance. The compact table exposes only valid candidates to rendering, rollout policies, and \(Q_H\) candidate tokens.
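Candidate centers are sampled in a gravity-aligned reference frame, then transformed to the world frame. A raw direction can be uniform on the sphere:
\[
u\sim \mathrm{Unif}(\mathbb{S}^2),
\]
or forward-biased:
\[
u\sim \mathrm{PowerSpherical}(\mu=e_z,\kappa),
\qquad
p(u)\propto (1+\mu^\top u)^\kappa.
\]
Angular caps map the raw sampled direction into the configured yaw and elevation limits without rejection:
\[
\psi=\operatorname{atan2}(u_x,u_z),
\qquad
\psi'=\psi\frac{\Delta\psi}{2\pi},
\]
\[
y'=\sin\theta_{\min}
+\frac{u_y+1}{2}
\left(\sin\theta_{\max}-\sin\theta_{\min}\right).
\]
The candidate radius is then drawn uniformly:
\[
r\sim \mathcal{U}(r_{\min},r_{\max}),
\qquad
o_r=r\,d_r,
\qquad
c_w=T^w_r o_r.
\]
The position family transforms the capped direction \(d_r\) into a concrete spatial prior:
| Position family | Role |
|---|---|
| `upper_bound_free_shell` | Ablation shell using the capped sampled direction directly. |
| `forward_local` | Local continuity around the device forward axis with spread `0.45`. |
| `local_refinement` | Ablation-only tighter local continuity around forward with spread `0.25`. |
| `revisit_backtrack` | Ablation-only backtracking family around the local backward direction with spread `0.35`. |
| `target_bearing_local` | Actor-visible target-centric family around the selected target bearing with spread `0.4`. |
| `lateral_target_bypass` | Side-step family that combines target bearing, signed lateral bypass, and bounded vertical offset. |
For lateral target bypass, the direction is:
\[
d'=\operatorname{norm}
\left(
0.55\,b_e
+0.85\,\operatorname{sign}(u_x)\,\ell_e
+\operatorname{clip}(u_y,-0.35,0.35)\,u_{\mathrm{up}}
\right),
\]
where \(b_e\) is the actor-visible target bearing, \(\ell_e\) is the horizontal lateral direction around that bearing, and \(u_{\mathrm{up}}\) is world up in the sampling frame.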
The caps above constrain the raw draw, not necessarily the final movement direction after a target-aware position family has blended or rebuilt the direction. Final egocentric realism is therefore owned by post-mode diagnostics and pruning: report final azimuth/elevation, step length, height delta, backward displacement, and target-bearing statistics by component. A stricter profile may horizontalize target-bearing displacement so target height affects orientation while center motion stays in the local walking plane.
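The rejection-free angular caps, the uniform radius draw, and the lateral bypass blend can be sketched with the standard library alone. The function names and the parameters `yaw_span`, `elev_min`, and `elev_max` are hypothetical. The sketch also assumes that y is the vertical axis and z the forward axis of the sampling frame, and that the capped direction is rebuilt from the capped yaw and the capped vertical component; the page does not state that reconstruction explicitly. The world transform \(c_w=T^w_r o_r\) is omitted.

```python
import math
import random


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)


def sample_unit_sphere(rng):
    """Uniform direction on the sphere via a normalized Gaussian draw."""
    while True:
        v = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        if sum(x * x for x in v) > 1e-12:
            return normalize(v)


def cap_direction(u, yaw_span, elev_min, elev_max):
    """Squeeze a raw direction into the yaw/elevation caps without rejection."""
    psi = math.atan2(u[0], u[2])               # raw yaw in [-pi, pi]
    psi_c = psi * yaw_span / (2.0 * math.pi)   # capped yaw in [-span/2, span/2]
    y_c = math.sin(elev_min) + 0.5 * (u[1] + 1.0) * (
        math.sin(elev_max) - math.sin(elev_min))
    h = math.sqrt(max(0.0, 1.0 - y_c * y_c))   # horizontal length (assumed rebuild)
    return (h * math.sin(psi_c), y_c, h * math.cos(psi_c))


def sample_offset(rng, r_min, r_max, yaw_span, elev_min, elev_max):
    """Draw o_r = r * d_r in the reference frame."""
    d = cap_direction(sample_unit_sphere(rng), yaw_span, elev_min, elev_max)
    r = rng.uniform(r_min, r_max)
    return tuple(r * x for x in d)


def lateral_bypass_direction(u, bearing, lateral, up):
    """Blend target bearing, signed side step, and bounded vertical offset."""
    side = 1.0 if u[0] >= 0.0 else -1.0        # sign(u_x); zero treated as +1 here
    lift = min(0.35, max(-0.35, u[1]))
    return normalize(tuple(
        0.55 * b + 0.85 * side * l + lift * w
        for b, l, w in zip(bearing, lateral, up)))
```

The bypass direction is rebuilt from the target bearing, so the yaw and elevation caps applied to the raw draw no longer bound it. That is why the final direction needs its own post-mode diagnostics.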
1.3 Layer 3: Candidate Orientation And Pruning
Candidate orientation is built after center sampling. The orientation mode controls the base camera frame:
| Orientation mode | Meaning |
|---|---|
| `forward_rig` | Reuse the reference rig rotation at each candidate center. |
| `radial_away` | Look along the reference-to-candidate ray. |
| `radial_towards` | Look back along the candidate-to-reference ray. |
| `target_point` | Look at the selected actor-visible target center. |
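For a target look-at frame at candidate center \(c_w\) and target center \(p_e\) (a position here, distinct from the confidence \(p_e\) used in target selection):
\[
z_w=\operatorname{norm}(p_e-c_w),
\]
\[
y_w=\operatorname{norm}(u_{\mathrm{up}}-(u_{\mathrm{up}}^\top z_w)z_w),
\qquad
x_w=y_w\times z_w.
\]
Optional view jitter samples yaw and pitch deltas in camera coordinates:
\[
\delta\psi\sim \mathcal{U}(-\psi_{\max},\psi_{\max}),
\qquad
\delta\theta\sim \mathcal{U}(-\theta_{\max},\theta_{\max}),
\]
\[
z_c'=
\begin{bmatrix}
\cos\delta\theta\sin\delta\psi\\
\sin\delta\theta\\
\cos\delta\theta\cos\delta\psi
\end{bmatrix}.
\]
Pruning keeps invalidity as an explicit mask contract. A candidate is valid only if its center lies in the occupancy support,
\[
c_i\in B_{\mathrm{occ}},
\]
it satisfies GT-mesh clearance,
\[
d_i=\min_{x\in\mathcal{M}_{GT}}\lVert c_i-x\rVert_2>d_{\min},
\]
and it respects egocentric motion bounds,
\[
\lVert o_i\rVert_2\le d_{\max},
\quad
|\Delta h_i|\le h_{\max},
\quad
\max(0,-o_{i,z})\le b_{\max}.
\]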
Additional guards may reject path collisions, excessive yaw changes, or missing candidate depths. These invalid candidates remain in the full shell with reason codes and NaN labels; they are not low-quality valid actions.
Mixture component counts are full-shell counts, not guaranteed valid-action counts. A scale-ready sampler must therefore report valid candidates per step and per component, invalid reasons per component, and selected/gain histograms before interpreting low headroom as a planning result.
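The mask contract and the per-component reporting can be sketched as follows. The dictionary keys, the reason-code strings, and the `min_clearance` value are hypothetical; the three motion limits reuse the realistic-profile defaults listed on this page. Occupancy membership and mesh clearance are assumed to be precomputed per candidate, and the rig-frame offset is assumed to have z as the forward axis.

```python
import math
from collections import Counter

LIMITS = {
    "max_step": 1.0,          # d_max in metres (realistic profile)
    "max_height_delta": 0.25, # h_max in metres (realistic profile)
    "max_backward": 0.25,     # b_max in metres (realistic profile)
    "min_clearance": 0.2,     # d_min: hypothetical value for illustration
}


def invalid_reasons(cand, limits):
    """List every failed check so invalid rows keep explicit reason codes."""
    reasons = []
    if not cand["in_occupancy"]:
        reasons.append("outside_occupancy")
    if not cand["mesh_clearance"] > limits["min_clearance"]:
        reasons.append("mesh_clearance")
    ox, oy, oz = cand["offset"]
    if math.sqrt(ox * ox + oy * oy + oz * oz) > limits["max_step"]:
        reasons.append("step_too_long")
    if abs(cand["height_delta"]) > limits["max_height_delta"]:
        reasons.append("height_delta")
    if max(0.0, -oz) > limits["max_backward"]:
        reasons.append("backward_step")
    return reasons


def prune(full_shell, limits):
    """Return the valid mask, reason codes, compact table, and a per-component report."""
    reason_codes = [invalid_reasons(c, limits) for c in full_shell]
    valid_mask = [not r for r in reason_codes]
    compact = [c for c, ok in zip(full_shell, valid_mask) if ok]
    report = Counter()
    for cand, reasons in zip(full_shell, reason_codes):
        for reason in reasons or ["valid"]:
            report[(cand["component"], reason)] += 1
    return valid_mask, reason_codes, compact, report
```

The full shell is never shortened: only the compact table drops rows, so component counts and invalid reasons stay auditable after pruning.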
1.4 Layer 4: Mixture And Rollout Branch Selection
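The current default target-conditioned sampler is a 60-row mixed candidate table:
| Component | Count | Position family | Orientation family |
|---|---:|---|---|
| `forward_local` | 24 | `forward_local` | `forward_rig` |
| `target_bearing_local` | 24 | `target_bearing_local` | `target_point` |
| `lateral_target_bypass` | 12 | `lateral_target_bypass` | `target_point` |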
Every full-shell row stores stable position_id, strategy_id, mixture_id, component name, and sampler_probability = 1/N. The explicit upper-bound ablation remains upper_bound_free_shell.
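The provenance fields can be illustrated by building the 60-row shell from its three components (24 `forward_local`, 24 `target_bearing_local`, 12 `lateral_target_bypass`). The ID semantics below are assumptions: `position_id` is taken to index the position family, `strategy_id` the mixture component, and `mixture_id` is an arbitrary integer. The function name `build_full_shell` is hypothetical.

```python
from collections import Counter

# (component, count, position family, orientation family) for the default mixture.
DEFAULT_MIXTURE = [
    ("forward_local", 24, "forward_local", "forward_rig"),
    ("target_bearing_local", 24, "target_bearing_local", "target_point"),
    ("lateral_target_bypass", 12, "lateral_target_bypass", "target_point"),
]


def build_full_shell(mixture, mixture_id=0):
    """Expand component counts into provenance rows with sampler_probability = 1/N."""
    n_rows = sum(count for _, count, _, _ in mixture)
    # Assumed semantics: one stable integer per distinct position family.
    position_ids = {}
    for _, _, family, _ in mixture:
        position_ids.setdefault(family, len(position_ids))
    rows = []
    for strategy_id, (component, count, family, orientation) in enumerate(mixture):
        for _ in range(count):
            rows.append({
                "position_id": position_ids[family],
                "strategy_id": strategy_id,
                "mixture_id": mixture_id,
                "component": component,
                "orientation_family": orientation,
                "sampler_probability": 1.0 / n_rows,
            })
    return rows


shell_rows = build_full_shell(DEFAULT_MIXTURE)
component_counts = Counter(row["component"] for row in shell_rows)
```

These counts describe the full shell before pruning; the number of valid rows per component is only known after the validity mask has been applied.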
For the first thesis production profile, the simpler comparison is more important than exposing every hand-coded family at once. The locked profile matrix is:
| Profile | Purpose |
|---|---|
| `v1_realistic_3family` | Main local egocentric distribution with `forward_local`, `target_bearing_local`, and `lateral_target_bypass`. |
| `v1_relaxed_3family` | Same three families with looser local-motion bounds to measure sensitivity. |
| `v1_rich_5family` | Ablation profile retaining `local_refinement` and `revisit_backtrack` for comparison. |
| `upper_bound_free_shell` | Explicit broad-shell ablation, reported separately from V1 realism. |
local_refinement and revisit_backtrack are useful diagnostics only if component-level validity and target-gain reports show that they contribute valid, selected, target-improving actions. In persisted rollout stores, position_id is a hot candidate and \(Q_H\) training-view field, with aligned candidate diagnostics retained for inspection.
The realistic profile uses stricter local motion limits than the earlier seminar-style shell:
| Limit | Default |
|---|---:|
| max step distance | `1.0m` |
| max height delta | `0.25m` |
| max backward step | `0.25m` |
| max yaw delta | `70deg` |
Full-shell count remains fixed, but thesis-scale generation must gate roots with too few valid actions and report blocked roots, valid counts by position_id, invalid reasons by position_id, and target-gain distributions by position_id.
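A root gate with the accompanying counts might look like the sketch below. The row keys (`valid`, `reason`, `position_id`), the function name `root_report`, and the `min_valid_actions` threshold are hypothetical; the page does not state the gate value.

```python
from collections import Counter


def root_report(full_shell, min_valid_actions):
    """Summarise one root: block it when too few valid actions survive pruning."""
    valid_by_position = Counter()
    invalid_by_position = Counter()
    for row in full_shell:
        if row["valid"]:
            valid_by_position[row["position_id"]] += 1
        else:
            invalid_by_position[(row["position_id"], row["reason"])] += 1
    n_valid = sum(valid_by_position.values())
    return {
        "n_rows": len(full_shell),           # full-shell count stays fixed
        "n_valid": n_valid,
        "blocked": n_valid < min_valid_actions,
        "valid_by_position_id": dict(valid_by_position),
        "invalid_by_position_id": dict(invalid_by_position),
    }


example_shell = (
    [{"position_id": 0, "valid": True, "reason": None}] * 3
    + [{"position_id": 1, "valid": False, "reason": "step_too_long"}] * 4
    + [{"position_id": 2, "valid": True, "reason": None}]
)
report = root_report(example_shell, min_valid_actions=6)
```

Here four of eight rows are valid, so the root is blocked at a threshold of six, and the report shows that every `position_id` 1 row failed on step length.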
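Rollout branch selection acts on valid candidate rows, not on target rows. Available rollout policies include:
| Policy | Score or sampling rule |
|---|---|
| `farthest_from_history` | Distance to the nearest previously selected pose. |
| `farthest_from_reference` | Distance from the current reference pose. |
| `random` / `random_valid` | Uniform sampling over eligible valid rows. |
| `oracle_greedy` | Greedy argmax over oracle/evaluator score. |
| `temperature_softmax` | Stochastic branch sampling over finite evaluator scores. |
Temperature-softmax candidate selection uses robust logits:
\[
\ell_i
=
\frac{s_i-\operatorname{median}(s)}
{\operatorname{IQR}(s)\,\tau},
\]
with a standard-deviation fallback for tiny candidate sets. The selected candidate distribution is:
\[
P(i\mid M)
=
\frac{\exp(\ell_i)}
{\sum_{j:m_j=1}\exp(\ell_j)}.
\]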
Diversity guards can require minimum sibling distance, yaw separation, target-bearing separation, and strategy diversity. These guards operate after the score policy and before branch expansion so rollout data does not collapse to near-duplicate sibling branches. They are data-generation guardrails, not a primary thesis contribution. If a configured guard rejects every remaining candidate and the implementation falls back to unconstrained remaining rows, reports should expose guard activation and fallback counts alongside entropy, selected component histograms, and target-root-gain distributions.
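A sketch of temperature-softmax branch selection with one diversity guard is given below. It assumes logits of the form (score minus median) divided by (IQR times temperature), with a standard-deviation fallback. The cutoff of four rows for the fallback, the single minimum-sibling-distance guard, the row keys, and the function names are all assumptions for illustration. The guard is applied to an ordering drawn by the score policy, and rows the guard skipped are used as the unconstrained fallback.

```python
import math
import random
import statistics


def robust_logits(scores, temperature, eps=1e-8):
    """Median/IQR-normalised logits with a standard-deviation fallback."""
    med = statistics.median(scores)
    scale = 0.0
    if len(scores) >= 4:  # assumed cutoff for a "tiny" candidate set
        q1, _, q3 = statistics.quantiles(scores, n=4)
        scale = q3 - q1
    if scale <= eps:
        scale = statistics.pstdev(scores)
    if scale <= eps:
        return [0.0] * len(scores)  # all scores equal: uniform policy
    return [(s - med) / (scale * temperature) for s in scores]


def softmax(logits):
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_order(rows, temperature, rng):
    """Score policy: sequential softmax sampling without replacement."""
    if not rows:
        return []
    logits = robust_logits([r["score"] for r in rows], temperature)
    remaining = list(range(len(rows)))
    order = []
    while remaining:
        weights = softmax([logits[i] for i in remaining])
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        order.append(rows[pick])
        remaining.remove(pick)
    return order


def select_branches(rows, k, temperature, min_sibling_distance, rng):
    """Apply the guard after the score policy; count unconstrained fallbacks."""
    order = sample_order([r for r in rows if r["valid"]], temperature, rng)
    chosen, skipped = [], []
    for row in order:
        if len(chosen) == k:
            break
        far_enough = all(
            math.dist(row["center"], c["center"]) >= min_sibling_distance
            for c in chosen)
        (chosen if far_enough else skipped).append(row)
    fallbacks = 0
    while len(chosen) < k and skipped:
        chosen.append(skipped.pop(0))  # guard rejected everything that remained
        fallbacks += 1
    return chosen, fallbacks
```

Returning the fallback count alongside the selection keeps guard activation visible in rollout reports, so a collapsed sibling set can be distinguished from a genuinely diverse one.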
1.5 Connection To Target-RRI And Q_H
Target selection and candidate sampling are actor-visible. Oracle target-RRI is computed only after an actor-visible target row has been selected and matched to GT for evaluation. The selected target conditions the finite candidate set, and the scorer evaluates valid candidate rows with target-specific point-mesh improvement. The default rollout and \(Q_H\) reward is root-normalized target gain from the finite-horizon contract, while state-relative target RRI remains a diagnostic label.
This separation protects the thesis result: actor policies see observed and predicted target evidence, valid candidate tokens, history, and budget state; labelers see GT mesh/OBB assets only to create trusted supervision and endpoint metrics.