Relative Reconstruction Improvement (RRI) Theory
The RRI metric, introduced by VIN-NBV, quantifies the expected reconstruction quality gain from capturing a new viewpoint:
\[\text{RRI}(\mathbf{q}) = \frac{\text{d}(\mathcal{P}_t, \mathcal{M}_{\text{GT}}) - \text{d}(\mathcal{P}_{t \cup \mathbf{q}}, \mathcal{M}_{\text{GT}})}{\text{d}(\mathcal{P}_t, \mathcal{M}_{\text{GT}})}\]
Where:
- \(\mathbf{q} \in SE(3)\): Candidate viewpoint (effectively a 5-DoF pose, since roll is typically fixed to 0°). Each candidate defines a view frustum and therefore a set of visible surfaces.
- \(\mathcal{P}_t\): Current reconstruction point cloud from the first \(t\) views.
- \(\mathcal{P}_{t \cup \mathbf{q}}\): Reconstruction after fusing the depth from view \(\mathbf{q}\) into \(\mathcal{P}_t\).
- \(\mathcal{M}_{\text{GT}}\): Ground truth mesh.
- \(\text{d}(\cdot, \cdot)\): Bidirectional surface distance (lower is better), typically the Chamfer distance (CD) for comparing one point cloud (PC) with another, defined in Surface Reconstruction Metrics.
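The definition above can be sketched in a few lines of NumPy. This is a minimal illustration, not the VIN-NBV implementation. It assumes that the ground truth is available as points sampled from \(\mathcal{M}_{\text{GT}}\), that fusion is plain concatenation, and that the Chamfer distance is the sum of the two mean nearest-neighbour distances (other variants, e.g. squared distances, are common). The function names `chamfer_distance` and `rri` are hypothetical.

```python
import numpy as np


def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Bidirectional Chamfer distance between point sets a (N, 3) and b (M, 3)."""
    # Pairwise Euclidean distances, shape (N, M). Brute force: small clouds only.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Mean nearest-neighbour distance in each direction, summed (assumed variant).
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())


def rri(p_t: np.ndarray, p_q: np.ndarray, gt_points: np.ndarray) -> float:
    """Relative improvement from fusing the new-view points p_q into p_t."""
    err_before = chamfer_distance(p_t, gt_points)
    if err_before == 0.0:
        return 0.0  # nothing left to improve; also avoids division by zero
    fused = np.vstack([p_t, p_q])  # naive fusion by concatenation (assumption)
    err_after = chamfer_distance(fused, gt_points)
    return (err_before - err_after) / err_before


# Toy scene: a planar 10 x 10 grid; the current cloud covers only the left half.
xs = np.linspace(0.0, 1.0, 10)
gt = np.array([[x, y, 0.0] for x in xs for y in xs])
left, right = gt[gt[:, 0] < 0.5], gt[gt[:, 0] >= 0.5]

print(rri(left, right, gt))  # view that completes the surface
print(rri(left, left, gt))   # redundant view that adds nothing new
```

In this toy setup the completing view removes all remaining error, while the redundant view leaves the error unchanged, so the two calls mark the upper and lower ends of the nominal range.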
Properties:
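- Range: [0, 1] as long as adding the view does not increase the error; higher values indicate better viewpoints (0 means no improvement, 1 means the remaining error is eliminated)
- Normalized by the current error \(\rightarrow\) partially independent of point cloud density
- Directly reflects the improvement in reconstruction quality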
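As a worked illustration with hypothetical numbers (not taken from any experiment), suppose the current error is \(\text{d}(\mathcal{P}_t, \mathcal{M}_{\text{GT}}) = 0.020\) and fusing a candidate view reduces it to \(\text{d}(\mathcal{P}_{t \cup \mathbf{q}}, \mathcal{M}_{\text{GT}}) = 0.015\):
\[\text{RRI}(\mathbf{q}) = \frac{0.020 - 0.015}{0.020} = 0.25\]
That is, the view removes a quarter of the remaining error. Because of the normalization, the same score results if both errors are scaled by a common factor, for example \(0.040 \rightarrow 0.030\).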
Issues:
- A valid Chamfer distance (CD) comparison requires both point clouds to follow the same sampling distribution. This is problematic: \(\mathcal{P}_t\) is given as a semi-dense point cloud, so extending it with a new view \(\mathbf{q}\) requires sampling the points of the GT mesh visible from \(\mathbf{q}\) with the same distribution as \(\mathcal{P}_t\).
- Sampling \(\mathcal{P}_q\) as a dense point cloud from the GT mesh visible from \(\mathbf{q}\) leads to a distribution mismatch with the semi-dense \(\mathcal{P}_t\).
- However, the quantity of interest is not \(\text{d}(\mathcal{P}_t, \mathcal{P}_{t \cup \mathbf{q}})\) but the improvement w.r.t. the GT mesh \(\mathcal{M}_{\text{GT}}\). Moreover, RRI mainly serves to compare the fitness of different candidate views \(\mathbf{q}\). This comparison should not be strongly affected by the exact sampling distribution of \(\mathcal{P}_t\), since a dense sampling from \(\mathcal{M}_{\text{GT}}\) should lead to similar relative improvements across candidate views.
For more information on surface reconstruction metrics, see Surface Reconstruction Metrics.
1 NBV Task Formulation using RRI
Input:
- Current partial reconstruction \(\mathcal{P}_t\) from \(t\) captured views
- Set of candidate viewpoints at time \(t\): \(\mathcal{Q}_t = \{\mathbf{q}_1, \mathbf{q}_2, ..., \mathbf{q}_n\}\)
- Current camera trajectory \(\mathcal{T}_t = \{\mathbf{c}_1, \mathbf{c}_2, ..., \mathbf{c}_t\}\)
Objective: Find the next-best viewpoint that maximizes reconstruction improvement:
\[\mathbf{q}^* = \underset{\mathbf{q} \in \mathcal{Q}_t}{\text{argmax}} \; \text{RRI}(\mathbf{q} \mid \mathcal{P}_t, \mathcal{T}_t)\]
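The argmax above amounts to scoring every candidate and keeping the best one. A minimal sketch follows, assuming a caller-supplied `score` function that returns the RRI of a candidate given the current reconstruction and trajectory (either an oracle computed against ground truth or a learned predictor). The names `select_next_best_view` and `score` are hypothetical.

```python
from typing import Callable, Sequence, Tuple, TypeVar

Pose = TypeVar("Pose")  # any pose representation, e.g. a 4x4 matrix or a label


def select_next_best_view(
    candidates: Sequence[Pose],
    score: Callable[[Pose], float],
) -> Tuple[Pose, float]:
    """Return the candidate with the highest score and that score."""
    if not candidates:
        raise ValueError("candidate set Q_t is empty")
    # Score each candidate exactly once; scoring may be expensive.
    scores = [score(q) for q in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]


# Toy usage with made-up RRI values per candidate label.
toy_rri = {"q1": 0.10, "q2": 0.40, "q3": 0.25}
best_q, best_rri = select_next_best_view(list(toy_rri), toy_rri.__getitem__)
print(best_q, best_rri)
```

Ties are resolved in favour of the earliest candidate, which is a property of Python's `max` rather than a design decision of the formulation.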
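Sequential Planning: When planning a sequence of \(k\) views, each view \(\mathbf{q}_i\) is scored against the reconstruction \(\mathcal{P}_{t+i-1}\) and trajectory \(\boldsymbol{\tau}_{t+i-1}\) that result from the preceding \(i-1\) planned views. The optimal camera trajectory \(\boldsymbol{\tau}^*_{t+k} = \{\mathbf{q}_1, \mathbf{q}_2, ..., \mathbf{q}_k\}\) is: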
\[\boldsymbol{\tau}^*_{t+k} = \underset{\boldsymbol{\tau}_{t+k}}{\text{argmax}} \sum_{i=1}^{k} \text{RRI}(\mathbf{q}_i \mid \mathcal{P}_{t+i-1}, \boldsymbol{\tau}_{t+i-1})\]
Subject to:
- Collision constraints: \(\mathbf{q} \notin \mathcal{O}_{\text{occupied}}\) (candidate poses must be in free space)
- Kinematic constraints: \(\|\text{pos}(\mathbf{q}_i) - \text{pos}(\mathbf{q}_{i-1})\| \leq d_{\max}\) (reachability limit between consecutive views, with \(\mathbf{q}_0\) taken as the current camera pose \(\mathbf{c}_t\))
- Field-of-view constraints: \(\mathcal{V}(\mathbf{q}) \cap \mathcal{S} \neq \emptyset\) (at least part of the scene entities \(\mathcal{S}\) must lie inside the frustum \(\mathcal{V}(\mathbf{q})\))
- Resource constraints: Total trajectory length, time, or number of views within budget
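One simple way to approximate the constrained sequential objective is a greedy rollout: at each step, discard infeasible candidates and take the one with the highest gain. The sketch below is an illustration under several assumptions and is not guaranteed to find the optimal trajectory. Poses are reduced to 3D positions, the field-of-view check is omitted, the view budget `k` is the only resource constraint, and `gain`, `is_occupied`, and `greedy_plan` are hypothetical names.

```python
from typing import Callable, List, Sequence

import numpy as np


def feasible(
    position: np.ndarray,
    current: np.ndarray,
    is_occupied: Callable[[np.ndarray], bool],
    d_max: float,
) -> bool:
    """Collision and reachability checks for one candidate position."""
    if is_occupied(position):
        return False  # collision constraint: pose must be in free space
    # Kinematic constraint: step length bounded by d_max.
    return bool(np.linalg.norm(position - current) <= d_max)


def greedy_plan(
    start: Sequence[float],
    candidates: Sequence[Sequence[float]],
    gain: Callable[[np.ndarray, List[np.ndarray]], float],
    is_occupied: Callable[[np.ndarray], bool],
    d_max: float,
    k: int,
) -> List[np.ndarray]:
    """Greedily pick up to k views; gain(q, path) plays the role of RRI."""
    current = np.asarray(start, dtype=float)
    remaining = [np.asarray(c, dtype=float) for c in candidates]
    path: List[np.ndarray] = []
    for _ in range(k):  # resource constraint: at most k views
        options = [
            i for i, c in enumerate(remaining)
            if feasible(c, current, is_occupied, d_max)
        ]
        if not options:
            break  # no reachable, collision-free candidate left
        best = max(options, key=lambda i: gain(remaining[i], path))
        current = remaining.pop(best)
        path.append(current)
    return path


# Toy usage: gain grows with x; the position at x = 1.5 is blocked.
plan = greedy_plan(
    start=[0, 0, 0],
    candidates=[[1, 0, 0], [1.5, 0, 0], [2, 0, 0], [5, 0, 0]],
    gain=lambda q, path: float(q[0]),
    is_occupied=lambda p: bool(p[0] == 1.5),
    d_max=1.2,
    k=3,
)
print([p.tolist() for p in plan])
```

Here the rollout stops after two views because the remaining free candidate is farther than `d_max` from the last selected position. Passing the already planned path to `gain` is what lets each step condition on the preceding views, as in the sum above.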