GenNBV

1 GenNBV: Continuous 5-DoF Coverage-Reward RL

Primary source. GenNBV: Generalizable Next-Best-View Policy for Active 3D Reconstruction [1].

Local source. main.tex and Supp/supp.tex.

External links. Project page and GitHub.

Related ARIA-NBV pages. VIN-NBV, Hestia, and RL planning.

1.1 Core contribution

GenNBV is the main continuous-control contrast for ARIA-NBV. It trains a generalizable NBV policy with PPO over a continuous 5-DoF action: 3D position plus yaw and pitch, with roll restricted [1]. Its reward is coverage gain, not reconstruction-quality gain.

The policy observes a multi-source state embedding from 3D occupancy, semantic/appearance cues, and action history, then samples a camera pose from a learned Gaussian policy:

\[ a_t = (x_t,y_t,z_t,\psi_t,\theta_t), \qquad a_t \sim \pi_\theta(a \mid s_t). \]
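
As a rough illustration of sampling from such a policy, the sketch below draws one 5-DoF action from a diagonal Gaussian and evaluates its log-probability, the quantity PPO needs for its probability ratio. The names `Pose5DoF`, `sample_raw_action`, `gaussian_log_prob`, and `to_pose` are hypothetical. The diagonal covariance, the yaw wrapping, and the pitch clamping are assumptions made for illustration, not details confirmed from the paper.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Pose5DoF:
    """Camera pose without roll: position plus yaw (psi) and pitch (theta)."""
    x: float
    y: float
    z: float
    yaw: float    # radians
    pitch: float  # radians


def sample_raw_action(mean: Sequence[float], log_std: Sequence[float],
                      rng: random.Random) -> List[float]:
    """Draw an unconstrained 5-vector from a diagonal Gaussian (assumed form)."""
    if len(mean) != 5 or len(log_std) != 5:
        raise ValueError("expected 5 means and 5 log-stds")
    return [rng.gauss(m, math.exp(s)) for m, s in zip(mean, log_std)]


def gaussian_log_prob(raw: Sequence[float], mean: Sequence[float],
                      log_std: Sequence[float]) -> float:
    """Log-density of the raw sample under the same diagonal Gaussian."""
    total = 0.0
    for a, m, s in zip(raw, mean, log_std):
        z = (a - m) / math.exp(s)
        total += -0.5 * z * z - s - 0.5 * math.log(2.0 * math.pi)
    return total


def to_pose(raw: Sequence[float]) -> Pose5DoF:
    """Map a raw sample to a pose: wrap yaw around zero, clamp pitch (assumed)."""
    x, y, z, yaw, pitch = raw
    yaw = (yaw + math.pi) % (2.0 * math.pi) - math.pi
    pitch = max(-math.pi / 2.0, min(math.pi / 2.0, pitch))
    return Pose5DoF(x, y, z, yaw, pitch)
```

The log-probability is computed on the raw sample rather than on the wrapped pose, because wrapping and clamping change the density. A real implementation would have to account for that choice explicitly.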

1.2 Verified paper signals

| Signal | Source-backed detail | ARIA-NBV relevance |
|---|---|---|
| Action space | The learned policy acts in free 5-DoF pose space: 3D position, yaw, and pitch. | Useful contrast to ARIA-NBV’s finite candidate set; not the first thesis action space. |
| State representation | GenNBV fuses a 3D occupancy grid, image/semantic cues, and action history. | Supports keeping rollout history and geometric state in ARIA-NBV’s MDP contract. |
| Reward | Coverage ratio gain is the primary reward, with collision/path penalties. | Baseline objective to compare against, but weaker than RRI for quality-driven reconstruction. |
| Training regime | The paper uses PPO with many parallel simulation environments and reports large-scale simulator training. | Confirms that continuous online RL depends on an interactive simulator/reward loop. |
| Dataset/evaluation | Training uses Houses3K-style scenes; the paper tests generalization to object/scene categories and reports coverage/AUC metrics. | Dataset and metric mismatch prevent direct transfer of reported numbers to ASE target-RRI experiments. |

The coverage reward can be summarized as:

\[ r^{\mathrm{CR}}_{t+1} = \mathrm{CR}_{t+1} - \mathrm{CR}_{t}, \qquad \mathrm{CR}_{t} = \frac{\tilde{N}_{t}}{N^\star}. \]
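
A minimal sketch of this reward follows. It assumes that \(\tilde{N}_{t}\) counts the reference surface points covered by any view up to step \(t\) and that \(N^\star\) is the total number of reference points. The function name `coverage_rewards` and the representation of visibility as sets of point ids are hypothetical, and the collision/path penalties are omitted.

```python
from typing import Iterable, List, Set


def coverage_rewards(visible_per_step: Iterable[Iterable[int]],
                     total_points: int) -> List[float]:
    """Return r_{t+1} = CR_{t+1} - CR_t for each view, starting from CR_0 = 0."""
    if total_points <= 0:
        raise ValueError("total_points must be positive")
    covered: Set[int] = set()   # ids of reference points seen so far
    prev_cr = 0.0
    rewards: List[float] = []
    for visible in visible_per_step:
        covered |= set(visible)             # coverage only ever grows
        cr = len(covered) / total_points    # CR_t = N~_t / N*
        rewards.append(cr - prev_cr)
        prev_cr = cr
    return rewards
```

Because the rewards telescope, their sum over an episode equals the final coverage ratio minus the initial one. A view that only revisits already covered points earns zero reward, which is one reason a pure coverage objective says nothing about how well those points are reconstructed.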

1.3 ARIA-NBV adoption

  • Background / stretch bridge: use GenNBV to show that continuous control is practical only when a simulator-grade reward loop exists.
  • MDP contract signal: keep state, history, action, reward, and termination explicit.
  • Evaluation contrast: compare coverage-gain reasoning against scene/target RRI reasoning, not against uncalibrated qualitative impressions.
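
One way to keep the MDP contract explicit is to record every field per step. The sketch below is illustrative only: `Transition`, `rollout`, and the `policy(state, history)` and `step(state, action)` signatures are hypothetical and are not taken from GenNBV or from any ARIA-NBV code.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass(frozen=True)
class Transition:
    """One step of the MDP with every contract field stored explicitly."""
    state: Any
    history: Tuple[Any, ...]  # actions taken before this step
    action: Any
    reward: float
    terminated: bool


def rollout(initial_state: Any,
            policy: Callable[[Any, Tuple[Any, ...]], Any],
            step: Callable[[Any, Any], Tuple[Any, float, bool]],
            max_steps: int) -> List[Transition]:
    """Run one episode and return its transitions in order."""
    state = initial_state
    history: Tuple[Any, ...] = ()
    transitions: List[Transition] = []
    for _ in range(max_steps):
        action = policy(state, history)
        next_state, reward, terminated = step(state, action)
        transitions.append(Transition(state, history, action, reward, terminated))
        if terminated:
            break
        state = next_state
        history = history + (action,)
    return transitions
```

Storing the history inside each transition makes it possible to replay or audit a rollout without reconstructing hidden state.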

1.4 Do not adopt

  • Do not make coverage ratio the thesis objective.
  • Do not start ARIA-NBV with PPO over continuous actions while oracle RRI is expensive and counterfactual observations are incomplete.
  • Do not transfer GenNBV’s reported coverage/AUC numbers as evidence for ARIA-NBV’s ASE target-RRI setting.
  • Do not ignore finite-candidate validity masks; GenNBV’s free-space action handling does not solve ARIA-NBV candidate-order and oracle-label contracts.
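
To illustrate what a finite-candidate validity mask does, the sketch below selects the best-scoring candidate among those marked valid while preserving candidate order. The function name `select_candidate` and the list-based inputs are hypothetical. This is a generic masked argmax, not ARIA-NBV's actual selection code.

```python
from typing import Optional, Sequence


def select_candidate(scores: Sequence[float],
                     valid: Sequence[bool]) -> Optional[int]:
    """Return the index of the highest-scoring valid candidate, or None.

    Ties resolve to the earliest index, so the result is stable with
    respect to the candidate order.
    """
    if len(scores) != len(valid):
        raise ValueError("scores and valid must have the same length")
    best: Optional[int] = None
    for i, (score, ok) in enumerate(zip(scores, valid)):
        if ok and (best is None or score > scores[best]):
            best = i
    return best
```

For example, with scores `[0.2, 0.9, 0.5]` and mask `[True, False, True]`, the function returns index 2, because the top-scoring candidate is masked out. A continuous policy has no equivalent of this index, which is why oracle labels tied to candidate order do not carry over.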

1.5 Open risks / caveats

  • Coverage can improve while Chamfer-style reconstruction quality or target-specific quality does not.
  • Continuous-control benchmarks depend heavily on simulator fidelity, collision handling, and training distribution.
  • For ARIA-NBV, GenNBV becomes meaningful as a later bridge only after deterministic oracle rollouts and target-conditioned Q_H learning are stable.

References

[1] X. Chen, Q. Li, T. Wang, T. Xue, and J. Pang, “GenNBV: Generalizable next-best-view policy for active 3D reconstruction,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 16436–16445. Available: https://openaccess.thecvf.com/content/CVPR2024/html/Chen_GenNBV_Generalizable_Next-Best-View_Policy_for_Active_3D_Reconstruction_CVPR_2024_paper.html