Surface Reconstruction Metrics

This document describes the fundamental metrics used to evaluate 3D reconstruction quality by comparing predicted reconstructions against ground truth meshes.

Evaluating how much a new view improves a reconstruction requires robust metrics that compare a point cloud or mesh to a ground truth mesh. These metrics quantify both accuracy (how close the prediction is to the surface) and completeness (how much of the surface is covered). We summarise commonly used metrics and highlight how they can be used in Relative Reconstruction Improvement (RRI) computation.

1 Key Metrics

1.1 Accuracy (Prediction \(\rightarrow\) Ground Truth)

Measures how close predicted points are to the ground truth surface.

\[ \text{Accuracy} = \frac{1}{|\mathcal{P}|} \sum_{\mathbf{p} \in \mathcal{P}} \min_{\mathbf{q} \in \mathcal{P}_{\text{GT}}} \|\mathbf{p} - \mathbf{q}\|_2 \]

Components:

  • \(\mathcal{P}\): Sampled points from predicted mesh/reconstruction
  • \(\mathcal{P}_{\text{GT}}\): Sampled points from ground truth mesh
  • Lower values indicate better reconstruction accuracy

Interpretation:

  • Measures how close predicted surface is to ground truth
  • High accuracy (low error) = predicted points lie close to GT surface
  • Detects over-reconstruction (extra geometry/noise in prediction)
  • Doesn’t penalize missing geometry
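As a reference, here is a minimal PyTorch sketch of this one-sided nearest-neighbour average (the function name one_sided_nn_distance and the chunk size are illustrative choices, not taken from any particular library):

    import torch

    def one_sided_nn_distance(src: torch.Tensor, tgt: torch.Tensor,
                              chunk: int = 4096) -> torch.Tensor:
        """Mean distance from each point in src (N, 3) to its nearest
        neighbour in tgt (M, 3), chunked so the (N, M) pairwise distance
        matrix does not exhaust GPU memory."""
        mins = []
        for start in range(0, src.shape[0], chunk):
            d = torch.cdist(src[start:start + chunk], tgt)  # (chunk, M) pairwise L2 distances
            mins.append(d.min(dim=1).values)                # nearest-neighbour distance per source point
        return torch.cat(mins).mean()

    # Accuracy: predicted points -> ground-truth points
    # accuracy = one_sided_nn_distance(pred_pts, gt_pts)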

1.2 Completeness (Ground Truth \(\rightarrow\) Prediction)

Measures how well the prediction covers the ground truth surface.

\[ \text{Completeness} = \frac{1}{|\mathcal{P}_{\text{GT}}|} \sum_{\mathbf{q} \in \mathcal{P}_{\text{GT}}} \min_{\mathbf{p} \in \mathcal{P}} \|\mathbf{p} - \mathbf{q}\|_2 \]

Interpretation:

  • Measures how well prediction covers ground truth surface
  • High completeness (low error) = GT points have nearby predicted points
  • Detects under-reconstruction (missing geometry/holes in prediction)
  • Doesn’t penalize extra geometry

Key Insight: Accuracy and Completeness apply the same nearest-neighbour distance computation with the source and target point sets swapped.
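Using the hypothetical one_sided_nn_distance helper sketched above, the swap is literally just the argument order:

    # Accuracy:     prediction -> ground truth
    accuracy = one_sided_nn_distance(pred_pts, gt_pts)
    # Completeness: ground truth -> prediction (same routine, arguments swapped)
    completeness = one_sided_nn_distance(gt_pts, pred_pts)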

1.3 Chamfer Distance (Bidirectional)

Combines both accuracy and completeness into a single symmetric metric:

\[ \text{CD}(\mathcal{P}, \mathcal{P}_{\text{GT}}) = \text{Accuracy} + \text{Completeness} \]

Why bidirectional matters:

  • Accuracy alone doesn’t penalize missing geometry
  • Completeness alone doesn’t penalize extra/noisy geometry
  • Together they provide a complete picture of reconstruction quality

This symmetric metric captures both over-reconstruction (accuracy) and under-reconstruction (completeness).

Note: Chamfer distance assumes uniform sampling of both point sets. If the prediction is semi‑dense and the ground truth is uniformly sampled, Chamfer may be biased. In practice we compute point‑to‑mesh distances (see below) and sample enough points to ensure stability.
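A sketch of this sampling-based evaluation against a GT mesh, assuming a trimesh mesh, the one_sided_nn_distance helper from above, and an illustrative sample count of 100k:

    import numpy as np
    import torch
    import trimesh

    def chamfer_vs_mesh(pred_pts: torch.Tensor, gt_mesh: trimesh.Trimesh,
                        n_samples: int = 100_000) -> torch.Tensor:
        # Uniformly sample the GT surface so both directions see comparable densities
        gt_samples, _ = trimesh.sample.sample_surface(gt_mesh, n_samples)
        gt_pts = torch.as_tensor(np.asarray(gt_samples), dtype=pred_pts.dtype,
                                 device=pred_pts.device)
        acc = one_sided_nn_distance(pred_pts, gt_pts)   # prediction -> GT
        comp = one_sided_nn_distance(gt_pts, pred_pts)  # GT -> prediction
        return acc + comp                               # Chamfer distance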

1.4 Precision, Recall & F-score

Binary classification metrics at threshold \(\tau\) (typically 5cm):

Precision - Fraction of predicted points that are “correct”: \[ \text{Precision}_\tau = \frac{|\{\mathbf{p} \in \mathcal{P} : \min_{\mathbf{q} \in \mathcal{P}_{\text{GT}}} \|\mathbf{p} - \mathbf{q}\|_2 < \tau\}|}{|\mathcal{P}|} \]

Recall - Fraction of GT points that are “recovered”: \[ \text{Recall}_\tau = \frac{|\{\mathbf{q} \in \mathcal{P}_{\text{GT}}: \min_{\mathbf{p} \in \mathcal{P}} \|\mathbf{p} - \mathbf{q}\|_2 < \tau\}|}{|\mathcal{P}_{\text{GT}}|} \]

F-score - Harmonic mean balancing precision and recall: \[ \text{F-score}_\tau = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \]

  • Captures both accuracy and completeness in a single value
  • Works when the prediction is a point cloud and the GT is a mesh
  • Can be used to evaluate a candidate view’s point cloud against a GT mesh, relative to the point cloud accumulated from all previous views; this expresses the value of a view as how much it improves the overall reconstruction compared with the previous state
  • Must be implementable with PyTorch or other GPU-accelerated libraries for efficiency (a sketch follows this list)
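A minimal PyTorch sketch of these thresholded metrics (the function name and the epsilon guard are our own choices; for large point sets the chunked helper from Section 1.1 should replace the plain cdist calls):

    import torch

    def f_score_at_tau(pred_pts: torch.Tensor, gt_pts: torch.Tensor,
                       tau: float = 0.05):
        """Precision, recall and F-score at distance threshold tau (in scene units)."""
        d_pred_to_gt = torch.cdist(pred_pts, gt_pts).min(dim=1).values  # per predicted point
        d_gt_to_pred = torch.cdist(gt_pts, pred_pts).min(dim=1).values  # per GT point
        precision = (d_pred_to_gt < tau).float().mean()
        recall = (d_gt_to_pred < tau).float().mean()
        f_score = 2 * precision * recall / (precision + recall + 1e-8)
        return precision, recall, f_score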

2 Point-to-mesh distances

Concept: for each source point, find its closest point on the target surface (the witness) and measure the Euclidean distance. Using both directions yields a symmetric score.

  • Prediction \(\rightarrow\) GT (Accuracy): for each predicted point, find its closest point on the GT mesh (witness on GT) and average the distances.
  • GT \(\rightarrow\) Prediction (Completeness): for samples on the GT surface, find their closest point on the prediction (witness on prediction) and average the distances.

Let \(d_{p \rightarrow \mathcal{M}}\) be the average distance from predicted points to the GT mesh, and \(d_{\mathcal{M} \rightarrow p}\) be the average distance from GT samples to the prediction. The symmetric point‑to‑mesh distance is \(d_{p \leftrightarrow \mathcal{M}} = d_{p \rightarrow \mathcal{M}} + d_{\mathcal{M} \rightarrow p}\) and approaches the Chamfer distance as the mesh is sampled more densely.
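A CPU sketch of this symmetric distance, assuming a trimesh GT mesh and a NumPy predicted point cloud (the function name and sample count are illustrative):

    import numpy as np
    import trimesh
    from scipy.spatial import cKDTree

    def symmetric_point_to_mesh(pred_pts: np.ndarray, gt_mesh: trimesh.Trimesh,
                                n_gt_samples: int = 100_000) -> float:
        # Prediction -> GT: exact closest point on the GT mesh (witness lies on GT)
        _, d_pred_to_mesh, _ = trimesh.proximity.closest_point(gt_mesh, pred_pts)

        # GT -> prediction: sample the GT surface, then nearest predicted point
        gt_samples, _ = trimesh.sample.sample_surface(gt_mesh, n_gt_samples)
        d_mesh_to_pred, _ = cKDTree(pred_pts).query(gt_samples)

        return float(d_pred_to_mesh.mean() + d_mesh_to_pred.mean())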

2.1 Witness Points (Closest-Point Correspondences)

In point‑to‑mesh evaluation we often need the actual closest point on the surface — the witness point — not just the distance. Given a point \(\mathbf{p}\) and a mesh \(\mathcal{M}\), the witness \(\mathbf{w}(\mathbf{p}) \in \mathcal{M}\) satisfies

\[\|\mathbf{p} - \mathbf{w}(\mathbf{p})\|_2 = \min_{\mathbf{x} \in \mathcal{M}} \|\mathbf{p} - \mathbf{x}\|_2.\]

Witness points enable:

  • Visualizing error correspondences and outliers
  • Aggregating per‑region RRI (assign improvement to surface patches)
  • Derivatives through distances in learning setups

Note: Only Accuracy yields witnesses on the GT mesh. Completeness witnesses lie on the predicted reconstruction (point cloud or mesh).

2.1.1 Computing witnesses for accuracy and completeness

Given a predicted point cloud \(\mathcal{P}\) and GT mesh \(\mathcal{M}_{GT}\):

  • Accuracy witnesses: \(\mathcal{W}_{\mathcal{P}\rightarrow \mathcal{M}_{GT}} = \{\mathbf{w}(\mathbf{p}) : \mathbf{p}\in\mathcal{P}\}\) on the GT mesh.
  • Completeness witnesses: sample points \(\mathcal{Q}\) on \(\mathcal{M}_{GT}\) and compute \(\mathcal{W}_{\mathcal{M}_{GT}\rightarrow \mathcal{P}} = \{\mathbf{w}(\mathbf{q}) : \mathbf{q}\in\mathcal{Q}\}\) on the prediction.
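Continuing the trimesh/KD-tree sketch from Section 2 (gt_mesh and pred_pts are the same hypothetical inputs), the same calls already expose both witness sets:

    import trimesh
    from scipy.spatial import cKDTree

    # Accuracy witnesses: exact closest point on the GT mesh for each predicted point
    acc_witnesses, _, acc_faces = trimesh.proximity.closest_point(gt_mesh, pred_pts)

    # Completeness witnesses: nearest predicted point for each GT surface sample
    gt_samples, _ = trimesh.sample.sample_surface(gt_mesh, 100_000)
    _, nn_idx = cKDTree(pred_pts).query(gt_samples)
    comp_witnesses = pred_pts[nn_idx]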

2.1.2 Triangle‑wise closest point (as in the ATEK implementation)

  • Project the query point orthogonally onto the triangle’s plane and compute its barycentric coordinates.
  • If the projection lies inside the triangle (all barycentric in [0,1]), that projection is the witness; otherwise, clamp to the closest edge or vertex (shortest segment/point distance).
  • Handle degenerate triangles (near‑zero area) by falling back to edge/vertex cases.

ATEK implements the interior projection and a vertex fallback. Extending with edge handling (or a standard “barycentric region” test per Ericson/Eberly) gives correct witnesses on boundaries.
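For reference, a NumPy sketch of the full barycentric-region test (following Ericson’s Real-Time Collision Detection; this is a generic reference implementation, not the ATEK code, and it does not specially guard degenerate triangles):

    import numpy as np

    def closest_point_on_triangle(p: np.ndarray, a: np.ndarray,
                                  b: np.ndarray, c: np.ndarray) -> np.ndarray:
        """Witness (closest point) to query p on triangle (a, b, c)."""
        ab, ac, ap = b - a, c - a, p - a
        d1, d2 = ab @ ap, ac @ ap
        if d1 <= 0 and d2 <= 0:
            return a                                    # vertex region A
        bp = p - b
        d3, d4 = ab @ bp, ac @ bp
        if d3 >= 0 and d4 <= d3:
            return b                                    # vertex region B
        vc = d1 * d4 - d3 * d2
        if vc <= 0 and d1 >= 0 and d3 <= 0:
            return a + (d1 / (d1 - d3)) * ab            # edge region AB
        cp = p - c
        d5, d6 = ab @ cp, ac @ cp
        if d6 >= 0 and d5 <= d6:
            return c                                    # vertex region C
        vb = d5 * d2 - d1 * d6
        if vb <= 0 and d2 >= 0 and d6 <= 0:
            return a + (d2 / (d2 - d6)) * ac            # edge region AC
        va = d3 * d6 - d5 * d4
        if va <= 0 and (d4 - d3) >= 0 and (d5 - d6) >= 0:
            w = (d4 - d3) / ((d4 - d3) + (d5 - d6))
            return b + w * (c - b)                      # edge region BC
        denom = 1.0 / (va + vb + vc)                    # interior: barycentric projection
        v, w = vb * denom, vc * denom
        return a + v * ab + w * ac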

2.1.3 Practical computation (CPU/GPU)

  • trimesh (CPU, simple): trimesh.proximity.closest_point(mesh, points)
  • PyTorch3D: pytorch3d.loss.point_mesh_face_distance returns squared distances and computes closest faces internally; combine with a custom triangle closest‑point routine to reconstruct witnesses per chosen face. Useful for millions of queries with batching and autograd support.
  • ATEK: Adapt compute_pts_to_mesh_dist()

2.2 TSDF error and coverage

When using signed distance fields (TSDFs) for mapping, one can compute per-voxel errors between the predicted TSDF and the ground truth distance field. Two useful measures are:

  • Surface error: average absolute difference between the predicted and GT zero‑crossings (surface locations). A lower value indicates more accurate surfaces.
  • Coverage gain: number of voxels whose status changed from unknown to free or occupied after fusing a new view. This counts how much of the scene’s volume has been discovered and directly feeds into RRI in volumetric space.

The TSDF surface error is differentiable and well suited for learning‑based RRI prediction; together with coverage gain, these volumetric measures complement geometric metrics by capturing free‑space information.
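A toy sketch of the coverage-gain and surface-error computations, assuming voxel state grids and TSDF grids stored as plain tensors (the state codes, band width, and function names are illustrative, not tied to a specific mapping backend):

    import torch

    UNKNOWN, FREE, OCCUPIED = 0, 1, 2  # illustrative voxel state codes

    def coverage_gain(state_before: torch.Tensor, state_after: torch.Tensor) -> int:
        """Count voxels that changed from unknown to free or occupied after fusing a view."""
        discovered = (state_before == UNKNOWN) & (state_after != UNKNOWN)
        return int(discovered.sum())

    def surface_error(tsdf_pred: torch.Tensor, tsdf_gt: torch.Tensor,
                      band: float = 0.5) -> torch.Tensor:
        """Mean absolute TSDF difference in a narrow band around the GT zero-crossing,
        a simple proxy for the displacement between predicted and GT surfaces."""
        near_surface = tsdf_gt.abs() < band
        return (tsdf_pred[near_surface] - tsdf_gt[near_surface]).abs().mean()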

3 Example Scenarios

Understanding how metrics behave in different reconstruction scenarios:

Scenario                    Accuracy   Completeness   Interpretation
Perfect reconstruction      Low ✅      Low ✅          Ideal case
Missing geometry (holes)    Low ✅      High ❌         Under-reconstruction
Extra geometry (noise)      High ❌     Low ✅          Over-reconstruction
Misaligned reconstruction   High ❌     High ❌         Wrong pose/scale

4 Computation Methods

Both ATEK and EFM3D provide implementations:

See ATEK Implementation for a detailed breakdown.