GenNBV
1 GenNBV: Continuous 5-DoF Coverage-Reward RL
Primary source. GenNBV: Generalizable Next-Best-View Policy for Active 3D Reconstruction [1].
Local source. main.tex and Supp/supp.tex.
External links. Project page and GitHub.
Related ARIA-NBV pages. VIN-NBV, Hestia, and RL planning.
1.1 Core contribution
GenNBV is the main continuous-control contrast for ARIA-NBV. It trains a generalizable NBV policy with PPO over a continuous 5-DoF action: 3D position plus yaw and pitch, with roll restricted [1]. Its reward is coverage gain, not reconstruction-quality gain.
The policy observes a multi-source state embedding from 3D occupancy, semantic/appearance cues, and action history, then samples a camera pose from a learned Gaussian policy:
\[ a_t = (x_t,y_t,z_t,\psi_t,\theta_t), \qquad a_t \sim \pi_\theta(a \mid s_t). \]
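The sampling step can be sketched as follows. This is a minimal illustration assuming a diagonal Gaussian whose mean and log standard deviation come from some policy network; the function name `sample_action`, the yaw wrapping, and the pitch clipping range are assumptions for illustration, not details taken from the GenNBV implementation.

```python
import numpy as np

def sample_action(mean, log_std, rng, pitch_limit=np.pi / 2):
    """Sample a 5-DoF pose (x, y, z, yaw, pitch) from a diagonal Gaussian.

    mean, log_std: length-5 arrays (hypothetical policy-network outputs).
    pitch_limit:   assumed symmetric pitch bound in radians.
    """
    mean = np.asarray(mean, dtype=float)
    std = np.exp(np.asarray(log_std, dtype=float))
    raw = mean + std * rng.standard_normal(5)  # a ~ N(mean, diag(std^2))
    x, y, z, yaw, pitch = raw
    yaw = (yaw + np.pi) % (2 * np.pi) - np.pi          # wrap yaw to [-pi, pi)
    pitch = np.clip(pitch, -pitch_limit, pitch_limit)  # keep pitch in range
    return np.array([x, y, z, yaw, pitch])

rng = np.random.default_rng(0)
action = sample_action(np.zeros(5), np.full(5, -1.0), rng)
```

Roll does not appear in the action vector, which mirrors the 5-DoF restriction described above.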
1.2 Verified paper signals
| signal | source-backed detail | ARIA-NBV relevance |
|---|---|---|
| Action space | The learned policy acts in free 5-DoF pose space: 3D position, yaw, and pitch. | Useful contrast to ARIA-NBV’s finite candidate set; not the thesis’s initial action space. |
| State representation | GenNBV fuses a 3D occupancy grid, image/semantic cues, and action history. | Supports keeping rollout history and geometric state in ARIA-NBV’s MDP contract. |
| Reward | Coverage ratio gain is the primary reward, with collision/path penalties. | Baseline objective to compare against, but weaker than RRI for quality-driven reconstruction. |
| Training regime | The paper uses PPO with many parallel simulation environments and reports large-scale simulator training. | Confirms that continuous online RL depends on an interactive simulator/reward loop. |
| Dataset/evaluation | Training uses Houses3K-style scenes; the paper tests generalization to object/scene categories and reports coverage/AUC metrics. | Dataset and metric mismatch prevent direct transfer of reported numbers to ASE target-RRI experiments. |
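The coverage reward can be summarized as the per-step gain in coverage ratio:
\[ r^{\mathrm{CR}}_{t+1} = \mathrm{CR}_{t+1} - \mathrm{CR}_{t}, \qquad \mathrm{CR}_{t} = \frac{\tilde{N}_{t}}{N^\star}. \]
Here \(\mathrm{CR}_t\) is the coverage ratio after step \(t\), \(\tilde{N}_t\) is the number of surface points covered so far, and \(N^\star\) is the total number of ground-truth surface points.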
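A small sketch of this reward, assuming per-view visibility is available as a boolean mask over a fixed set of ground-truth surface points. The function name `coverage_rewards` and the convention that coverage is zero before the first view are assumptions; the collision and path penalties mentioned in the table are omitted.

```python
import numpy as np

def coverage_rewards(visible_per_step):
    """Return (CR_t, reward_t) for each step.

    visible_per_step: (T, N) boolean array; row t marks which of the N
    ground-truth points are seen by the view taken at step t.
    """
    visible = np.asarray(visible_per_step, dtype=bool)
    # A point stays covered once any earlier view has seen it.
    cumulative = np.logical_or.accumulate(visible, axis=0)
    cr = cumulative.mean(axis=1)        # covered points / total points
    rewards = np.diff(cr, prepend=0.0)  # CR_t - CR_{t-1}, with CR = 0 before any view
    return cr, rewards

views = [[1, 1, 0, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
cr, rewards = coverage_rewards(views)
```

Because the rewards telescope, their sum equals the final coverage ratio, and a view that sees nothing new earns zero reward.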
1.3 ARIA-NBV adoption
- Background / stretch bridge: use GenNBV to argue that continuous control is practical only when a simulator-grade reward loop exists.
- MDP contract signal: keep state, history, action, reward, and termination explicit.
- Evaluation contrast: compare coverage-gain reasoning against scene/target RRI reasoning, not against uncalibrated qualitative impressions.
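One way to keep the MDP contract from the list above explicit is a single immutable record per step. This is a sketch only: the `Transition` class, its field types, and `rollout_return` are hypothetical names, not part of GenNBV or any existing ARIA-NBV code.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

@dataclass(frozen=True)
class Transition:
    state: Tuple[float, ...]   # hypothetical flattened state features
    history: Tuple[int, ...]   # indices of previously chosen views
    action: int                # index of the view chosen at this step
    reward: float
    terminated: bool

def rollout_return(transitions: Sequence[Transition], gamma: float = 1.0) -> float:
    """Discounted return of a rollout, stopping at the first terminal step."""
    total, discount = 0.0, 1.0
    for tr in transitions:
        total += discount * tr.reward
        discount *= gamma
        if tr.terminated:
            break
    return total

rollout = [
    Transition((0.0,), (), 1, 0.5, False),
    Transition((0.1,), (1,), 2, 0.25, True),
]
total = rollout_return(rollout)
```

Making termination a field rather than an implicit end-of-list means any steps logged after a terminal flag cannot leak into the return.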
1.4 Do not adopt
- Do not make coverage ratio the thesis objective.
- Do not start ARIA-NBV with PPO over continuous actions while oracle RRI is expensive and counterfactual observations are incomplete.
- Do not transfer GenNBV’s reported coverage/AUC numbers as evidence for ARIA-NBV’s ASE target-RRI setting.
- Do not ignore finite-candidate validity masks; GenNBV’s free-space action handling does not solve ARIA-NBV candidate-order and oracle-label contracts.
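The validity-mask point can be illustrated with a minimal selection routine over a finite candidate set. The function name `select_candidate` and the choice to return `None` when no candidate is valid are assumptions made for this sketch; it says nothing about how GenNBV itself handles free-space actions.

```python
import numpy as np

def select_candidate(scores, valid_mask):
    """Return the index of the best-scoring valid candidate, or None.

    scores and valid_mask share the candidate order, so the returned index
    refers to the original candidate list rather than a filtered copy.
    """
    scores = np.asarray(scores, dtype=float)
    valid = np.asarray(valid_mask, dtype=bool)
    if scores.shape != valid.shape:
        raise ValueError("scores and valid_mask must have the same shape")
    if not valid.any():
        return None
    masked = np.where(valid, scores, -np.inf)  # invalid candidates can never win
    return int(np.argmax(masked))

best = select_candidate([0.2, 0.9, 0.5], [True, False, True])
```

Masking in place, instead of filtering the list, preserves candidate order so that indices stay aligned with any per-candidate labels.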
1.5 Open risks / caveats
- Coverage can improve while Chamfer-style reconstruction quality or target-specific quality does not.
- Continuous-control benchmarks depend heavily on simulator fidelity, collision handling, and training distribution.
- For ARIA-NBV, GenNBV becomes meaningful as a later bridge only after deterministic oracle rollouts and target-conditioned Q_H learning are stable.
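The first caveat above can be made concrete with a toy 2D example. The sketch assumes coverage is the fraction of ground-truth points within a distance threshold of any reconstructed point, and uses one common symmetric Chamfer form (sum of the two mean nearest-neighbour distances); both definitions and all point values are illustrative choices, not the metrics as implemented in GenNBV.

```python
import numpy as np

def pairwise_dist(a, b):
    """Euclidean distances between every point in a and every point in b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

def coverage(gt, recon, tau):
    """Fraction of ground-truth points within tau of some reconstructed point."""
    return float((pairwise_dist(gt, recon).min(axis=1) <= tau).mean())

def chamfer(gt, recon):
    """Symmetric Chamfer distance: completeness term plus accuracy term."""
    d = pairwise_dist(gt, recon)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())

gt = [[0, 0], [1, 0], [2, 0], [3, 0]]
recon_a = [[0, 0], [1, 0]]                                 # accurate but incomplete
recon_b = recon_a + [[2, 0.4], [3, 0.4], [0, 6], [3, 6]]   # more complete, with outliers

cov_a, cov_b = coverage(gt, recon_a, 0.5), coverage(gt, recon_b, 0.5)
cd_a, cd_b = chamfer(gt, recon_a), chamfer(gt, recon_b)
```

In this toy case coverage rises from 0.5 to 1.0 while the Chamfer distance grows from 0.75 to roughly 2.33, because the added outliers dominate the accuracy term.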