1 pose_generation.counterfactuals
Bounded counterfactual pose rollout utilities.
Rollouts regenerate finite candidate tables at each step from the updated pose, history, and remaining budget. The candidate generator may be single-family or mixed, but the selected action must satisfy the actor-valid mask. Oracle scores are supervision/evaluation fields; actor-visible replay rows retain poses, masks, candidate provenance, and selected-action lineage.
The first thesis-core use cases are deterministic oracle lookahead and replay-data generation for finite-candidate value learning. Online simulator training and continuous action control are outside this module’s current contract.
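The per-step regeneration described above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the module's implementation: `Step`, `generate_candidates`, and `rollout` are hypothetical names, poses are reduced to 1-D floats, and the validity rule (no revisiting a past pose) is a toy stand-in for the actor-valid mask.

```python
import random
from dataclasses import dataclass


@dataclass
class Step:
    """One selected rollout step (hypothetical record, not the module's type)."""
    pose: float
    candidates: list[float]
    valid_mask: list[bool]
    selected_index: int


def generate_candidates(pose, history, remaining_budget):
    """Toy 1-D generator: the candidate offsets scale with the remaining budget."""
    span = float(remaining_budget)
    candidates = [pose - span, pose - span / 2, pose + span / 2, pose + span]
    # Assumed validity rule: a candidate is valid only if it was not visited before.
    valid_mask = [c not in history for c in candidates]
    return candidates, valid_mask


def rollout(initial_pose, budget, rng):
    """Regenerate the finite candidate table at every step, then pick a valid row."""
    history = [initial_pose]
    pose = initial_pose
    steps = []
    for remaining in range(budget, 0, -1):
        # The table depends on the updated pose, the history, and the budget left.
        candidates, valid_mask = generate_candidates(pose, history, remaining)
        eligible = [i for i, ok in enumerate(valid_mask) if ok]
        if not eligible:
            break  # no valid action left, so the trajectory ends early
        index = rng.choice(eligible)  # uniform over eligible valid rows
        steps.append(Step(pose, candidates, valid_mask, index))
        pose = candidates[index]
        history.append(pose)
    return steps


trajectory = rollout(0.0, budget=3, rng=random.Random(0))
```

Each `Step` keeps the full candidate table, the mask, and the selected index, which mirrors the kind of lineage a replay row would need; oracle scores are deliberately left out of this toy record.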
1.1 Theory
Rollout policies operate over valid finite candidate rows. Heuristic policies score distance from history or the current reference; random policies sample uniformly over eligible valid rows; oracle policies consume evaluator scores such as target root gain. Temperature-softmax selection uses robust logits
\[ \ell_i = \frac{s_i-\operatorname{median}(s)} {\operatorname{IQR}(s)\tau}, \]
where \(s_i\) is the score of valid candidate \(i\) and \(\tau\) is the temperature. A standard-deviation fallback handles tiny candidate sets, and a masked softmax over valid candidates converts the logits into selection probabilities. Diversity guards may require sibling distance, yaw separation, target-bearing separation, and strategy diversity before branches are admitted into the rollout tree. Branch schedules control how many sibling transitions are retained per rollout depth.
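The robust-logit selection above can be sketched with the standard library alone. The function name `selection_probabilities`, the `min_iqr_count` threshold that defines a "tiny" candidate set, the use of the population standard deviation as the fallback scale, and the final fallback to a unit scale are all assumptions made for illustration; the module's actual thresholds and fallbacks may differ.

```python
import math
import statistics


def selection_probabilities(scores, valid_mask, temperature,
                            min_iqr_count=4, eps=1e-8):
    """Masked softmax over median/IQR-normalized scores (illustrative sketch)."""
    valid_scores = [s for s, ok in zip(scores, valid_mask) if ok]
    if not valid_scores:
        raise ValueError("at least one valid candidate is required")

    median = statistics.median(valid_scores)

    # Robust scale: interquartile range when enough valid candidates exist.
    scale = 0.0
    if len(valid_scores) >= min_iqr_count:
        q1, _, q3 = statistics.quantiles(valid_scores, n=4)
        scale = q3 - q1
    # Assumed fallback: population standard deviation for tiny or degenerate sets.
    if scale <= eps and len(valid_scores) > 1:
        scale = statistics.pstdev(valid_scores)
    # Last resort (assumption): unit scale when every valid score is identical.
    if scale <= eps:
        scale = 1.0

    logits = [(s - median) / (scale * temperature) for s in scores]

    # Masked softmax: invalid rows get zero weight; subtract the peak for stability.
    peak = max(l for l, ok in zip(logits, valid_mask) if ok)
    weights = [math.exp(l - peak) if ok else 0.0
               for l, ok in zip(logits, valid_mask)]
    total = sum(weights)
    return [w / total for w in weights]


probs = selection_probabilities(
    scores=[1.0, 2.0, 3.0, 4.0, 100.0],
    valid_mask=[True, True, True, True, False],
    temperature=1.0,
)
```

In this sketch the statistics are computed over valid rows only, so the masked outlier score of 100.0 influences neither the median nor the scale. Lowering `temperature` concentrates probability on the highest-scoring valid candidate.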
1.2 Classes
| Name | Description |
|---|---|
| CounterfactualSelectionPolicy | Built-in policies used to rank valid candidates during rollout expansion. |
| CounterfactualSelectionRecord | Selected valid-candidate index plus the distribution used to draw it. |
| CounterfactualMetricBundle | Typed per-valid-candidate metrics emitted by rollout evaluators. |
| CounterfactualCandidateEvaluation | Structured per-valid-candidate rollout scores and optional diagnostics. |
| CounterfactualStepResult | One selected rollout step. |
| CounterfactualTrajectory | One rollout trajectory rooted at one initial pose. |
| CounterfactualRolloutResult | All trajectories produced by one rollout call. |
| CounterfactualPoseGeneratorConfig | Configuration for multi-step finite-candidate rollout generation. |
| CounterfactualOracleRriScorerConfig | Config-as-factory wrapper for oracle-RRI rollout scoring. |
| CounterfactualOracleRriScorer | Evaluate valid candidates with oracle RRI relative to the current trajectory. |
| CounterfactualPoseGenerator | Expand a multi-step counterfactual pose tree from the current generator. |