1 pose_generation.counterfactuals

Bounded counterfactual pose rollout utilities.

Rollouts regenerate finite candidate tables at each step from the updated pose, history, and remaining budget. The candidate generator may be single-family or mixed, but the selected action must satisfy the actor-valid mask. Oracle scores are supervision/evaluation fields; actor-visible replay rows retain poses, masks, candidate provenance, and selected-action lineage.
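The per-step regeneration described above can be sketched as a short loop. This is only an illustration under stated assumptions, not the module's implementation: `CandidateTable`, `ReplayRow`, `rollout`, `toy_generate`, and `first_valid` are hypothetical names, poses are simplified to `(x, y, yaw)` tuples, and the budget is treated as a plain step count.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Pose = Tuple[float, float, float]  # hypothetical (x, y, yaw) pose


@dataclass
class CandidateTable:
    """Hypothetical finite candidate table for one rollout step."""
    poses: List[Pose]
    valid: List[bool]   # actor-valid mask, one flag per row
    family: List[str]   # provenance: which generator family produced the row


@dataclass
class ReplayRow:
    """Actor-visible record of one step; oracle scores are deliberately absent."""
    pose: Pose
    table: CandidateTable
    selected: int


def rollout(
    start: Pose,
    budget: int,
    generate: Callable[[Pose, List[Pose], int], CandidateTable],
    select: Callable[[CandidateTable, List[int]], int],
) -> List[ReplayRow]:
    pose, history, rows = start, [], []
    while budget > 0:
        # Regenerate the table from the updated pose, history, and budget.
        table = generate(pose, history, budget)
        valid_rows = [i for i, ok in enumerate(table.valid) if ok]
        if not valid_rows:
            break  # nothing admissible at this step
        choice = select(table, valid_rows)
        if not table.valid[choice]:
            raise ValueError("selected action violates the actor-valid mask")
        rows.append(ReplayRow(pose=pose, table=table, selected=choice))
        history.append(pose)
        pose = table.poses[choice]
        budget -= 1  # assumption: each step costs one unit of budget
    return rows


def toy_generate(pose: Pose, history: List[Pose], budget: int) -> CandidateTable:
    # Mixed-family toy generator; the middle row is always masked out.
    x, y, yaw = pose
    poses = [(x + 1.0, y, yaw), (x, y + 1.0, yaw), (x - 1.0, y, yaw)]
    return CandidateTable(poses=poses, valid=[True, False, True],
                          family=["forward", "lateral", "forward"])


def first_valid(table: CandidateTable, valid_rows: List[int]) -> int:
    return valid_rows[0]


rows = rollout((0.0, 0.0, 0.0), budget=3, generate=toy_generate, select=first_valid)
```

Checking the mask after selection, rather than trusting the policy, keeps the validity constraint enforced in one place regardless of which candidate family or policy produced the choice.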

The first thesis-core use is deterministic oracle lookahead and replay data for finite-candidate value learning. Online simulator training and continuous action control are outside this module’s current contract.

1.1 Theory

Rollout policies operate over valid finite candidate rows. Heuristic policies score distance from history or the current reference; random policies sample uniformly over eligible valid rows; oracle policies consume evaluator scores such as target root gain. Temperature-softmax selection uses robust logits

\[ \ell_i = \frac{s_i-\operatorname{median}(s)} {\operatorname{IQR}(s)\tau}, \]

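where \(s_i\) is the score of valid candidate \(i\), \(s\) collects the scores of all valid candidates, and \(\tau\) is the softmax temperature. The scale uses a standard-deviation fallback for tiny candidate sets, and the logits are then passed through a masked softmax over valid candidates. Diversity guards may require sibling distance, yaw separation, target-bearing separation, and strategy diversity before branches are admitted into the rollout tree. Branch schedules control how many sibling transitions are retained per rollout depth.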
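A minimal sketch of the robust-logit softmax, using only the Python standard library. The function names, the `min_iqr_count` threshold of four candidates, the inclusive quartile method, and the unit-scale guard for constant scores are all assumptions made for illustration; the module may define "tiny" sets and the IQR differently.

```python
import math
import random
import statistics
from typing import List, Sequence


def robust_logits(scores: Sequence[float], tau: float = 1.0,
                  min_iqr_count: int = 4, eps: float = 1e-8) -> List[float]:
    """Median-centred, IQR-scaled logits with a standard-deviation fallback."""
    med = statistics.median(scores)
    scale = 0.0
    if len(scores) >= min_iqr_count:
        q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
        scale = q3 - q1
    if scale <= eps:
        # Tiny set or degenerate IQR: fall back to the standard deviation.
        scale = statistics.pstdev(scores)
    if scale <= eps:
        scale = 1.0  # assumption: constant scores yield uniform logits
    return [(s - med) / (scale * tau) for s in scores]


def masked_softmax(scores: Sequence[float], valid: Sequence[bool],
                   tau: float = 1.0) -> List[float]:
    """Selection distribution over all rows; invalid rows get probability 0."""
    idx = [i for i, ok in enumerate(valid) if ok]
    if not idx:
        raise ValueError("no valid candidates to select from")
    # Statistics are computed over valid rows only.
    logits = robust_logits([scores[i] for i in idx], tau=tau)
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    probs = [0.0] * len(scores)
    for i, w in zip(idx, weights):
        probs[i] = w / total
    return probs


scores = [0.1, 0.9, 0.4, 0.2, 0.7]
valid = [True, True, False, True, True]
probs = masked_softmax(scores, valid, tau=1.0)
sharper = masked_softmax(scores, valid, tau=0.5)

# Draw one valid row index from the distribution.
rng = random.Random(0)
drawn = rng.choices(range(len(scores)), weights=probs, k=1)[0]
```

Centring on the median and scaling by the IQR keeps a single outlier score from dominating the logits, and lowering `tau` concentrates probability on the highest-scoring valid row.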

1.2 Classes

| Name | Description |
| --- | --- |
| CounterfactualSelectionPolicy | Built-in policies used to rank valid candidates during rollout expansion. |
| CounterfactualSelectionRecord | Selected valid-candidate index plus the distribution used to draw it. |
| CounterfactualMetricBundle | Typed per-valid-candidate metrics emitted by rollout evaluators. |
| CounterfactualCandidateEvaluation | Structured per-valid-candidate rollout scores and optional diagnostics. |
| CounterfactualStepResult | One selected rollout step. |
| CounterfactualTrajectory | One rollout trajectory rooted at one initial pose. |
| CounterfactualRolloutResult | All trajectories produced by one rollout call. |
| CounterfactualPoseGeneratorConfig | Configuration for multi-step finite-candidate rollout generation. |
| CounterfactualOracleRriScorerConfig | Config-as-factory wrapper for oracle-RRI rollout scoring. |
| CounterfactualOracleRriScorer | Evaluate valid candidates with oracle RRI relative to the current trajectory. |
| CounterfactualPoseGenerator | Expand a multi-step counterfactual pose tree from the current generator. |