This practical work is on 3D shape modeling using binary silhouette images. Given such 2D silhouettes, we can estimate the visual hull of the corresponding 3D shape. The visual hull is, by definition, the maximal volume compatible with a given set of silhouettes, and it corresponds to the intersection of the 3D visual cones defined by the silhouette regions.
In the first part, the visual hull will be estimated by an approach called voxel carving. The idea is to consider a grid of elementary cells in 3D and to carve away the cells that project outside the silhouettes in the images. In the second part, a multi-layer perceptron (MLP) will be trained to learn the 3D shape occupancy, as defined by the silhouettes, in the form of an implicit function.
Figure: the 3D shape of Al used in the practical (file al.off) and its 12 image projections.
In the TP folder you have the file al.off of the 3D model Al, which contains a mesh description of the geometry. You also have the script show_mesh.py that visualizes the geometry in a 3D viewer.
Before you can run any Python code, you need to activate a Python virtual environment by typing in the terminal
source /opt/python-ensimag/bin/activate
Then you can run
python show_mesh.py al.off
You should be able to see the 3D geometry.
In the images folder, you have the projections of the geometry in each of the 12 views (image1.pgm, etc.).
In this part, the objective is to build the 3D voxel representation of the visual hull of Al as defined by the 12 silhouettes. At the end of this part, you should get a 3D representation similar to the following image.
Figure: the visual hull with a grid of size 300 x 300 x 150.
Open the file voxcarv3D.py which contains the program to be completed.
At the beginning of the file, the calibration matrices for the 12 views are loaded. You can run the program with
python voxcarv3D.py
Once the algorithm has run, the program uses the marching cubes algorithm to transform the occupancy grid into a 3D mesh that can be exported in a standard format. The resulting mesh alvoxels.off will be saved in the output folder. The numeric occupancy grid will also be exported as a numpy array in occupancy.npy. The current program invites you to visualize the obtained results by running
python show_mesh.py output/alvoxels.off
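For reference, this grid-to-mesh conversion can be reproduced with scikit-image's marching cubes. The provided program already performs this step and may differ in its details; the file paths below are illustrative:

import numpy as np
from skimage import measure

# Load the exported occupancy grid (the path may differ in your setup)
occupancy = np.load("occupancy.npy")

# Extract an isosurface at 0.5 from the binary grid
verts, faces, normals, values = measure.marching_cubes(occupancy.astype(float), level=0.5)

# Write the mesh in OFF format
with open("output/alvoxels_demo.off", "w") as f:
    f.write("OFF\n%d %d 0\n" % (len(verts), len(faces)))
    for v in verts:
        f.write("%f %f %f\n" % tuple(v))
    for face in faces:
        f.write("3 %d %d %d\n" % tuple(face))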
- Complete the program so that the voxels that project within silhouette 1 (image1.pgm) are preserved. These voxels define the visual cone associated with image1.
- Complete the program to account for the 12 images and to keep only the voxels that belong to the visual hull. Note that projections can be performed efficiently using numpy array operations; see the sketch after this list.
- Open both models (al.off and alvoxels.off) side by side and discuss their differences.
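A hedged sketch of how such a vectorized projection test could look, assuming a 3x4 projection matrix P taken from the provided calibration and a binary silhouette array sil; all names are illustrative, not those of voxcarv3D.py:

import numpy as np

def carve(occupancy, centers, P, sil):
    """Keep only voxels whose projection falls inside the silhouette.

    occupancy: (N,) boolean array, current voxel occupancy
    centers:   (N, 3) array of voxel center coordinates
    P:         (3, 4) camera projection matrix
    sil:       (H, W) binary silhouette image
    """
    # Homogeneous coordinates: (N, 4), projected all at once
    homog = np.hstack([centers, np.ones((centers.shape[0], 1))])
    proj = homog @ P.T                       # (N, 3)
    u = proj[:, 0] / proj[:, 2]              # pixel column
    v = proj[:, 1] / proj[:, 2]              # pixel row
    ui, vi = np.round(u).astype(int), np.round(v).astype(int)
    h, w = sil.shape
    inside = (ui >= 0) & (ui < w) & (vi >= 0) & (vi < h)
    # Voxels projecting outside the image bounds are carved away
    keep = np.zeros_like(occupancy)
    keep[inside] = sil[vi[inside], ui[inside]] > 0
    return occupancy & keep

# Visual hull: intersect the 12 visual cones
# for P, sil in zip(projection_matrices, silhouettes):
#     occupancy = carve(occupancy, centers, P, sil)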
Now we will explore how an MLP can be trained to learn the 3D occupancy defined by the silhouettes.
Figure: the neural implicit function trained with the 300 x 300 x 150 regular grid points.
We will start with the stored occupancy from the previous part.
Before running any code, take a look at the program and answer the following questions:
- Draw the architecture of the MLP (input, output, layers, activation function).
- How are the training data (X, Y, Z, occupancy) formatted for training?
- In the training function nif_train, what is the loss function used?
- Explain the normalization used to weight the losses associated with inside and outside points in the training loss.
- During training, how is the data organized into batches?
- What does the function binary_acc evaluate? Is it used for training?
- How is the MLP used to generate a result to be visualized?
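As a point of comparison while answering, here is an illustrative PyTorch sketch of a generic occupancy MLP; the layer sizes, activations, and loss are plausible choices, not necessarily those of the provided program:

import torch
import torch.nn as nn

class OccupancyMLP(nn.Module):
    # Illustrative network: 3D point in, occupancy logit out
    def __init__(self, hidden=64, layers=4):
        super().__init__()
        dims = [3] + [hidden] * layers           # input: (x, y, z)
        blocks = []
        for i in range(layers):
            blocks += [nn.Linear(dims[i], dims[i + 1]), nn.ReLU()]
        blocks += [nn.Linear(hidden, 1)]         # output: occupancy logit
        self.net = nn.Sequential(*blocks)

    def forward(self, xyz):                      # xyz: (B, 3)
        return self.net(xyz)                     # (B, 1) logits

# A typical training loss for binary occupancy, with a class weight to
# compensate for the imbalance between inside and outside points:
# criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([n_out / n_in]))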
To run and edit the program, we need a GPU machine. The practical takes place on the Ensimag / Grenoble INP educational GPU cluster; its user manual can be found at this link. In a nutshell, first log in to the SLURM server:
ssh -YK nash.ensimag.fr
Then you can run a script using
srun --gres=shard:1 --cpus-per-task=8 --mem=12GB --x11=all --unbuffered python <script>.py
- If the --x11 flag does not work, you can remove it.
Run the program MPLImplicit3D.py once. Inspect the computed 3D model output/alimplicit.off and compare it to the original and voxel-carved models.
Then:
- Add a line that uses torch.save to save the trained model and run the code again.
- What is the memory size of the MLP? How does it compare with:
(i) a voxel occupancy grid;
(ii) the original image set plus the calibration?
A sketch of both steps is given after this list.
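A hedged sketch of these two steps, assuming model is the trained MLP from MPLImplicit3D.py and an illustrative output path:

import torch

# Save the trained weights to disk
torch.save(model.state_dict(), "output/nif_model.pth")

# Estimate the memory footprint of the network parameters
n_params = sum(p.numel() for p in model.parameters())
n_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
print("parameters: %d, size: %.1f KB" % (n_params, n_bytes / 1024))

# For comparison, a 300 x 300 x 150 occupancy grid stores
# 300 * 300 * 150 = 13,500,000 values (13.5 MB as uint8, ~1.7 MB as a bitmap).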
Then play with the program:
- Instead of using a regular grid of points for the training, modify your program to generate random points $(X_{\text{rand}}, Y_{\text{rand}}, Z_{\text{rand}})$ in 3D. Note that the MLP can still be evaluated on the regular grid points $X, Y, Z$ as before for comparison purposes.
- The difference between the numbers of outside and inside points is compensated with a weighting scheme during the training. A more efficient strategy is to reduce the set of outside points before the training. Propose and implement such a strategy. A sketch of both modifications is given after this list.
- Modify the MLP architecture to see the impact of increasing or reducing the number
of parameters through:
(i) the number of layers and
(ii) the layer dimension.
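A hedged sketch of the two sampling modifications above, assuming the occupancy grid from Part 1 has been loaded as a numpy array occupancy; the variable names and the 2:1 outside-to-inside ratio are illustrative choices:

import numpy as np

nx, ny, nz = occupancy.shape
n_samples = 100_000

# 1) Random 3D training points instead of the regular grid
Xr = np.random.uniform(0, nx, n_samples)
Yr = np.random.uniform(0, ny, n_samples)
Zr = np.random.uniform(0, nz, n_samples)
# Labels looked up at the nearest grid cell
labels = occupancy[Xr.astype(int).clip(0, nx - 1),
                   Yr.astype(int).clip(0, ny - 1),
                   Zr.astype(int).clip(0, nz - 1)]

# 2) Reduce the outside set: keep all inside points, subsample outside
inside = np.flatnonzero(labels > 0)
outside = np.flatnonzero(labels == 0)
outside = np.random.choice(outside, size=min(len(outside), 2 * len(inside)),
                           replace=False)
keep = np.concatenate([inside, outside])
X_train = np.stack([Xr[keep], Yr[keep], Zr[keep]], axis=1)   # (M, 3)
y_train = labels[keep].astype(np.float32)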
Going further - Improving the results:
- Implement the positional encoding defined in NeRF (Eq. 4). Adapt the network's architecture (input layer) to use it; see the sketch after this list.
- To quantify the quality of the implicit reconstruction results, implement a function that measures the point-to-mesh distance between the original shape al.off and the shape reconstructed by a method. You can use the point2mesh function.
- For visualization, you can map the distance at each al.off mesh vertex to a color (heatmap) and display it.
- High values of this error distance indicate locations where the implicit function does not accurately represent the shape. Try to sample more points around these areas and check whether the results improve.
- Implement an iterative automatic method that adds more sampling points using the measured error.
- A fundamental question arises: do the errors come from the sampling or from the architecture? Think of ways in which you could try to answer this question.
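A sketch of the NeRF positional encoding (Eq. 4 of the NeRF paper), applied independently to each coordinate; the number of frequencies L is a hyper-parameter, and the function name is illustrative:

import math
import torch

def positional_encoding(xyz, L=10):
    """xyz: (B, 3) points, ideally normalized to [-1, 1].
    Returns (B, 3 * 2 * L) encoded features."""
    freqs = (2.0 ** torch.arange(L)) * math.pi       # 2^0 pi, ..., 2^(L-1) pi
    angles = xyz.unsqueeze(-1) * freqs               # (B, 3, L)
    enc = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)
    return enc.reshape(xyz.shape[0], -1)             # (B, 6L)

# The MLP input layer must then accept 6 * L features instead of 3
# (or 3 + 6 * L if the raw coordinates are concatenated as well).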