GenNBV

1 GenNBV: Continuous 5-DoF Coverage-Reward RL

Primary source. GenNBV: Generalizable Next-Best-View Policy for Active 3D Reconstruction [1].

Local source. main.tex and Supp/supp.tex.

External links. Project page (https://gennbv.tech/) and GitHub (https://github.com/zjwzcx/GenNBV).

Related ARIA-NBV pages. VIN-NBV, Hestia, and RL planning.

1.1 Core contribution

GenNBV is the main continuous-control contrast for ARIA-NBV. It trains a generalizable NBV policy with PPO over a continuous 5-DoF action: 3D position plus yaw and pitch, with roll restricted [1]. Its reward is coverage gain, not reconstruction-quality gain.

The policy observes a multi-source state embedding from 3D occupancy, semantic/appearance cues, and action history, then samples a camera pose from a learned Gaussian policy:

\[ a_t = (x_t,y_t,z_t,\psi_t,\theta_t), \qquad a_t \sim \pi_\theta(a \mid s_t). \]
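The sampling step above can be sketched in a few lines. This is a minimal illustration, not GenNBV's implementation: it assumes a diagonal Gaussian (one independent normal per action dimension), and the names `Pose5DoF`, `sample_action`, and `log_prob` are hypothetical. In a real policy the mean and log-std would come from the network head conditioned on s_t; here they are plain lists. Action bounds and collision checks are not modeled.

```python
import math
import random
from typing import NamedTuple, Sequence


class Pose5DoF(NamedTuple):
    """Hypothetical container for a_t = (x, y, z, psi, theta)."""
    x: float
    y: float
    z: float
    yaw: float    # psi, in radians
    pitch: float  # theta, in radians


def sample_action(mean: Sequence[float], log_std: Sequence[float],
                  rng: random.Random) -> Pose5DoF:
    """Draw one action from an assumed diagonal Gaussian policy."""
    if len(mean) != 5 or len(log_std) != 5:
        raise ValueError("expected 5 means and 5 log-stds")
    # One independent normal draw per dimension, with std = exp(log_std).
    return Pose5DoF(*(rng.gauss(m, math.exp(s)) for m, s in zip(mean, log_std)))


def log_prob(action: Sequence[float], mean: Sequence[float],
             log_std: Sequence[float]) -> float:
    """Log-density of the diagonal Gaussian at `action` (used in PPO ratios)."""
    total = 0.0
    for a, m, s in zip(action, mean, log_std):
        variance = math.exp(2.0 * s)
        total += -0.5 * (a - m) ** 2 / variance - s - 0.5 * math.log(2.0 * math.pi)
    return total
```

Calling `sample_action(mean, log_std, random.Random(0))` returns one pose; `log_prob` of that pose under the old and new policy parameters gives the likelihood ratio that PPO clips.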

1.2 Verified paper signals

Signal	Source-backed detail	ARIA-NBV relevance
Action space	The learned policy acts in free 5-DoF pose space: 3D position, yaw, and pitch.	Useful contrast to ARIA-NBV’s finite candidate set; not the first thesis action space.
State representation	GenNBV fuses a 3D occupancy grid, image/semantic cues, and action history.	Supports keeping rollout history and geometric state in ARIA-NBV’s MDP contract.
Reward	Coverage ratio gain is the primary reward, with collision/path penalties.	Baseline objective to compare against, but weaker than RRI for quality-driven reconstruction.
Training regime	The paper uses PPO with many parallel simulation environments and reports large-scale simulator training.	Confirms that continuous online RL depends on an interactive simulator/reward loop.
Dataset/evaluation	Training uses Houses3K-style scenes; the paper tests generalization to object/scene categories and reports coverage/AUC metrics.	Dataset and metric mismatch prevent direct transfer of reported numbers to ASE target-RRI experiments.

The coverage reward can be summarized as the per-step gain in coverage ratio:

\[ r^{\mathrm{CR}}_{t+1} = \mathrm{CR}_{t+1} - \mathrm{CR}_{t}, \qquad \mathrm{CR}_{t} = \frac{\tilde{N}_{t}}{N^\star}. \]
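As a worked illustration of this formula, the sketch below computes per-step rewards from sets of covered point indices. It rests on stated assumptions: \tilde{N}_t is read as the number of ground-truth surface points covered so far, N^\star as the total number of such points, and CR_0 is taken to be 0. The function names are hypothetical, and the collision/path penalties mentioned in the table are left out.

```python
from typing import Iterable, List, Set


def coverage_ratio(covered: Set[int], total_points: int) -> float:
    """CR_t = N~_t / N*, with N~_t assumed to be the count of covered points."""
    if total_points <= 0:
        raise ValueError("total_points must be positive")
    return len(covered) / total_points


def coverage_rewards(visible_per_step: Iterable[Set[int]],
                     total_points: int) -> List[float]:
    """Return r_{t+1} = CR_{t+1} - CR_t for each step, assuming CR_0 = 0."""
    covered: Set[int] = set()
    previous = 0.0
    rewards: List[float] = []
    for visible in visible_per_step:
        covered |= visible  # coverage accumulates across views
        current = coverage_ratio(covered, total_points)
        rewards.append(current - previous)
        previous = current
    return rewards
```

Because the rewards telescope, their sum over an episode equals the final coverage ratio. A view that only re-observes already covered points therefore earns zero reward under this term.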

1.3 ARIA-NBV adoption

Background / stretch bridge: use GenNBV to argue that continuous control is feasible only when a simulator-grade reward loop exists.
MDP contract signal: keep state, history, action, reward, and termination explicit.
Evaluation contrast: compare coverage-gain reasoning against scene/target RRI reasoning, not against uncalibrated qualitative impressions.
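One way to read the MDP contract item above is as a record type plus a consistency check. The sketch below is purely illustrative: `Transition` and `check_rollout` are hypothetical names, actions are assumed to be indices into a finite candidate set, and the state is reduced to a tuple of floats.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Transition:
    """Hypothetical record that keeps every MDP field explicit."""
    state: Tuple[float, ...]   # placeholder for the geometric state
    history: Tuple[int, ...]   # actions taken before this step
    action: int                # index into the candidate set
    reward: float
    terminated: bool


def check_rollout(transitions: Sequence[Transition]) -> None:
    """Raise ValueError if history or termination is inconsistent."""
    expected_history: Tuple[int, ...] = ()
    last = len(transitions) - 1
    for i, step in enumerate(transitions):
        if step.history != expected_history:
            raise ValueError(f"step {i}: history does not match earlier actions")
        if step.terminated and i != last:
            raise ValueError(f"step {i}: transition recorded after termination")
        expected_history += (step.action,)
```

Running `check_rollout` on stored rollouts before training makes violations of the contract fail loudly instead of silently corrupting labels.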

1.4 Do not adopt

Do not make coverage ratio the thesis objective.
Do not start ARIA-NBV with PPO over continuous actions while oracle RRI is expensive and counterfactual observations are incomplete.
Do not transfer GenNBV’s reported coverage/AUC numbers as evidence for ARIA-NBV’s ASE target-RRI setting.
Do not ignore finite-candidate validity masks; GenNBV’s free-space action handling does not solve ARIA-NBV candidate-order and oracle-label contracts.
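The validity-mask point can be made concrete with a masked argmax over a finite candidate list. This is a minimal sketch under assumptions: `select_candidate` is a hypothetical helper, `scores` stands in for whatever per-candidate value ARIA-NBV predicts, and `valid` is the mask. The returned index refers to the original candidate order, so oracle labels stay aligned with candidates.

```python
from typing import Optional, Sequence


def select_candidate(scores: Sequence[float],
                     valid: Sequence[bool]) -> Optional[int]:
    """Return the index of the best valid candidate, or None if none is valid."""
    if len(scores) != len(valid):
        raise ValueError("scores and valid must have the same length")
    best_index: Optional[int] = None
    for i, (score, is_valid) in enumerate(zip(scores, valid)):
        if not is_valid:
            continue  # invalid candidates are never selected, whatever their score
        if best_index is None or score > scores[best_index]:
            best_index = i
    return best_index
```

For example, `select_candidate([0.9, 0.4, 0.7], [False, True, True])` skips the highest-scoring but invalid candidate 0 and returns index 2. Ties resolve to the earliest valid index.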

1.5 Open risks / caveats

Coverage can improve while Chamfer-style reconstruction quality or target-specific quality does not.
Continuous-control benchmarks depend heavily on simulator fidelity, collision handling, and training distribution.
For ARIA-NBV, GenNBV becomes meaningful as a later bridge only after deterministic oracle rollouts and target-conditioned Q_H learning are stable.

References

[1]

X. Chen, Q. Li, T. Wang, T. Xue, and J. Pang, “GenNBV: Generalizable next-best-view policy for active 3D reconstruction,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 16436–16445. Available: https://openaccess.thecvf.com/content/CVPR2024/html/Chen_GenNBV_Generalizable_Next-Best-View_Policy_for_Active_3D_Reconstruction_CVPR_2024_paper.html
