Turn Yourself into a 3D Gaussian Splat

A Hands-on Guide for Practitioners

Last summer a non-deep-learning method for novel view synthesis entered the game: 3D Gaussian splatting. It is a technique to represent a scene in 3D and to render images in real time from any viewing direction. Some even say it is replacing NeRFs, the predominant method for novel view synthesis and implicit scene representation at the moment. I think that's debatable, since NeRFs are much more than image renderers. But that's nothing we care about today… Today we only care about crisp-looking 3D models, and that's where 3D Gaussian splatting shines 🎉

In this post we will very briefly look into Gaussian splatting and then switch gears: I'll show you how you can turn yourself into a 3D model.

Bonus: At the end I'll show you how you can embed your model in an interactive viewer on any website.

So, let’s go!

3D Gaussian Splatting model of Sascha Kirch
Image by Sascha Kirch.
  1. What are Gaussian Splats?
  2. Let's Turn Ourselves into a 3D Gaussian Splatting
  3. Conclusion and Further Resources

3D Gaussian splatting is a technique to represent a scene in 3D. It is actually one of many ways to do so. For example, you could also represent a scene as a set of points, a mesh, voxels, or with an implicit representation like Neural Radiance Fields (aka NeRFs).

The foundation of 3D Gaussian splatting has been around for quite a while, dating back to 2001 and a classical approach from computer graphics called surface splatting.

But how does 3D Gaussian Splatting actually represent a scene?

3D Representation

In 3D Gaussian splatting a scene is represented by a set of points. Each point has certain attributes associated with it that parameterize an anisotropic 3D Gaussian. When an image is rendered, these Gaussians overlap to form the image. The actual parameterization happens during the optimization phase, which fits these parameters such that the rendered images are as close as possible to the original input images.

A 3D Gaussian is parameterized with

  • its mean µ, which is the x,y,z coordinate in 3D space.
  • its covariance matrix Σ, which can be interpreted as the spread of the Gaussian in any 3D direction. Since the Gaussian is anisotropic, it can be stretched in any direction (see the note on its parameterization right after this list).
  • a color, usually represented as spherical harmonics. Spherical harmonics allow the Gaussian splats to have different colors when viewed from different viewpoints, which drastically improves the quality of renders. It allows rendering non-Lambertian effects like specularities on metallic objects.
  • an opacity 𝛼 that determines how transparent the Gaussian is.
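
A short note on the covariance: in practice Σ is not optimized directly. The paper factorizes it into a rotation R (stored as a quaternion) and a diagonal scaling matrix S, which guarantees that Σ always remains a valid covariance matrix during optimization:

$$ \Sigma = R\,S\,S^\top R^\top $$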

The image below shows the influence of a 3D Gaussian splat with respect to a point p. Spoiler: that point p will be the relevant one when we render the image.

Fig.1: Influence of a 3D Gaussian i on a point p in 3D space. Image by Kate Yurkova.
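
Written out, the influence of Gaussian i at a point p is simply its opacity weighted by its Gaussian density at p (up to the exact notation used in the paper):

$$ f_i(p) = \alpha_i \exp\!\Big( -\tfrac{1}{2}\,(p-\mu_i)^\top \Sigma_i^{-1} (p-\mu_i) \Big) $$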

How do you get an image out of this representation?

Image Rendering

Like NeRFs, 3D Gaussian splatting uses 𝛼-blending along a ray that is cast from the camera through the image plane and through the scene. This basically means that, through integration along the ray, all intersecting Gaussians contribute to the final pixel's color.
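
Concretely, for a single pixel the colors c_i of the N Gaussians intersecting its ray are blended front to back, each weighted by its effective opacity α_i (the learned opacity multiplied by the value of the projected Gaussian at that pixel) and by the transmittance of everything in front of it:

$$ C = \sum_{i \in N} c_i\,\alpha_i \prod_{j=1}^{i-1} (1-\alpha_j) $$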

The image below shows the conceptual difference between the most basic NeRF (for simplicity) and Gaussian splatting.

Fig.2: Conceptual difference between NeRFs and 3D Gaussian Splatting. Image by Kate Yurkova

While conceptually similar, there is a significant difference in the implementation though. In Gaussian splatting we don't have a deep learning model like the multi-layer perceptron (MLP) in NeRFs. Hence, we don't need to evaluate the implicit function approximated by the MLP for every point (which is relatively time-consuming), but instead overlap various partially transparent Gaussians of different size and color. We still need to cast at least one ray per pixel to render the final image.

So basically, through the blending of all those Gaussians, the illusion of a perfect image emerges. If you remove the transparency from the splats, you can actually see the individual Gaussians of different size and orientation.

Fig.3: Visualizing the 3D Gaussians of an object. Image by Sascha Kirch.

And how is it optimized?

Optimization

The optimization is theoretically straightforward and easy to understand. But of course, as always, the success lies in the details.

To optimize the Gaussian splats, we need an initial set of points and images of the scene. The authors of the paper suggest using the structure from motion (SfM) algorithm to obtain the initial point cloud. During training, the scene is rendered with the estimated camera poses and camera intrinsics obtained from SfM. The rendered image and the original image are compared, a loss is calculated, and the parameters of each Gaussian are optimized with stochastic gradient descent (SGD).
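
The loss used in the paper is a weighted combination of an L1 term and a D-SSIM term between the rendered image and the ground-truth image, with λ = 0.2 in the reference implementation:

$$ \mathcal{L} = (1-\lambda)\,\mathcal{L}_1 + \lambda\,\mathcal{L}_{\text{D-SSIM}} $$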

One of the important details worth mentioning is the adaptive densification scheme. SGD is only able to adjust the parameters of existing Gaussians, but it cannot spawn new ones or destroy existing ones. This can lead to holes in the scene or to a loss of fine-grained detail if there are too few points, and to unnecessarily large point clouds if there are too many points. To overcome this, the adaptive densification method splits points with large gradients and removes points that have converged to low opacity values.

Fig.4: Adaptive Gaussian densification scheme. Image by B. Kerbl et al.
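
As a rough sketch of the criteria (the exact thresholds are hyperparameters of the reference implementation): a Gaussian is densified when the magnitude of its view-space positional gradient exceeds a threshold τ_pos (small Gaussians are cloned, large ones are split into two smaller ones), and it is pruned when its opacity drops below a threshold ε_α:

$$ \lVert \nabla_{\mu_i}\mathcal{L} \rVert > \tau_{\text{pos}} \;\Rightarrow\; \text{clone or split}, \qquad \alpha_i < \epsilon_{\alpha} \;\Rightarrow\; \text{prune} $$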

Having talked about some theoretical basics, let's now switch gears and jump into the practical part of this post, where I show you how you can create a 3D Gaussian splat of yourself.

Note: The authors suggest using a GPU with at least 24GB of memory, but you can still create your 3D Gaussian splats using some tricks I'll mention where they need to be applied. I have an RTX 2060 mobile with 6GB.

These are the steps we will cover:

  1. Installation
  2. Capture a Video
  3. Obtain point cloud and camera poses
  4. Run the Gaussian Splatting Algo
  5. Post processing
  6. (Bonus) Embed your model on a website in an interactive viewer

Installation

For the installation you can either hop over to the official 3D Gaussian Splatting repository and follow their instructions, or head over to The NeRF Guru on YouTube, who does an excellent job of showing how to install everything you need. I recommend the latter.
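
In short, the installation boils down to cloning the repo with its submodules and creating the conda environment. The exact and up-to-date steps are in the repo's README, but they look roughly like this:

git clone https://github.com/graphdeco-inria/gaussian-splatting --recursive
cd gaussian-splatting
conda env create --file environment.yml
conda activate gaussian_splatting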

I personally chose to install colmap on Windows because I was not able to build colmap from source with GPU support in my WSL environment, and for Windows there is a pre-built installer. The optimization for the 3D Gaussian splatting has been done on Linux. But it does not really matter, and the commands I show are the same on either Windows or Linux.

Capture a Video

Ask someone to capture a video of you. You must stand as still as possible while the other person walks around you, trying to capture you from every angle.

Some Hints:

  1. Choose a pose that is easy for you to hold without moving. E.g. holding your hands up for 1 minute without moving is not that easy 😅
  2. Choose a high framerate for capturing the video to reduce motion blur, e.g. 60fps.
  3. If you have a small GPU, don't film in 4K, otherwise the optimizer is likely to crash with an out-of-memory exception.
  4. Make sure there is sufficient light, so your recording is crisp and clear.
  5. If you have a small GPU, prefer indoor scenes over outdoor scenes. Outdoor scenes have a lot of "high frequency" content, i.e. small things close to one another like grass and leaves, which leads to many Gaussians being spawned during the adaptive densification.

Once you have recorded your video, move it to your computer and extract single frames using ffmpeg.

ffmpeg -i <path to video> -qscale:v 1 -qmin 1 -vf fps=<frames per second> <path to output folder>/%04d.jpg

This command takes the video and converts it into high-quality jpg images with low compression (only jpg works). I usually use between 4–10 frames per second. The output files will be named with an incrementing four-digit number.
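
For example, with a hypothetical recording IMG_001.mp4 and an input folder at ./data/me/input (which must already exist, since ffmpeg does not create output folders), the call could look like this:

ffmpeg -i IMG_001.mp4 -qscale:v 1 -qmin 1 -vf fps=5 ./data/me/input/%04d.jpg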

You should then end up with a folder full of single-frame images like so:

Fig.5: Single frame input images. Image by Sascha Kirch.

Some hints for higher quality:

  1. Remove blurry images, otherwise they lead to a haze around you and spawn "floaters".
  2. Remove images where your eyes are closed, otherwise they lead to blurry eyes in the final model.
Fig.6: Good vs. bad image. Image by Sascha Kirch.

Obtain Point Cloud and Camera Poses

As mentioned earlier, the Gaussian splatting algorithm must be initialized. One way is to initialize each Gaussian's mean with the location of a point in 3D space. We can use the tool colmap, which implements structure from motion (SfM), to obtain a sparse point cloud from images alone. Luckily, the authors of the 3D Gaussian splatting paper provided us with code to simplify the process.

So head over to the Gaussian splatting repo you cloned, activate your environment and call the convert.py script.

python convert.py -s <root path to your data> --resize

The root path to your data is the directory that contains the "input" folder with all the input images. In my case I created a subfolder inside the repo: ./gaussian-splatting/data/. The argument --resize will output additional images with downsampling factors of 2, 4, and 8. This is important in case you run out of memory with high-resolution images, so you can simply switch to a lower resolution.

Note: I had to set the environment variable CUDA_VISIBLE_DEVICES=0 for the GPU to be used by colmap.
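
Putting it together, on Linux that could look like this, with ./data/me being the hypothetical data root that contains the "input" folder:

CUDA_VISIBLE_DEVICES=0 python convert.py -s ./data/me --resize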

Depending on the number of images you have, this process might take a while, so either grab a cup of coffee or stare at the progress like I sometimes do, wasting a lot of time 😂

Once colmap is done, you can type colmap gui into your command line and inspect the sparse point cloud.

To open the point cloud click "File > Import model", navigate to the sparse/0 folder inside your data root and open that folder.

Fig.7: Sparse point cloud output and camera poses from colmap. Image by Sascha Kirch.

The red objects are the cameras the SfM algorithm estimated from the input frames. They represent the position and pose of the camera where a frame was captured. SfM further provides the intrinsic camera calibration, which is important for the 3D Gaussian splatting algorithm so the Gaussians can be rendered into a 2D image during optimization.

Run the Gaussian Splatting Optimizer

Everything up until now has been preparation for the actual 3D Gaussian splatting algorithm.

The script to train the 3D Gaussian splat is train.py. I usually like to wrap these Python scripts in a shell script so I can add comments and easily change the parameters of a run.
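
A minimal sketch of such a wrapper, assuming the data was prepared under the hypothetical ./data/me folder from above, could look like this:

#!/bin/bash
# Wrapper around the 3D Gaussian Splatting training script (a sketch; adjust the path to your setup).
# -s                : data root containing the "input" images and the colmap output
# --data_device cpu : keep the training images in CPU memory to save GPU memory
python train.py -s ./data/me --data_device cpu

# If you run out of GPU memory, lower sh_degree and/or the resolution, e.g.:
# python train.py -s ./data/me --data_device cpu --sh_degree 1 -r 2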

Apart from data_device=cpu, all arguments are set to their defaults. If you run into memory issues, you can try tweaking the following arguments:

resolution: this is the downsampling factor of the image resolution. 1 means full resolution and 2 means half resolution. Since we used --resize with convert.py during the sparse point cloud generation, you can test with 1, 2, 4 and 8. Before lowering the resolution, I recommend trying to lower sh_degree first.

sh_degree: Sets the maximum degree of the spherical harmonics, with 3 being the maximum. Lowering this value has a significant impact on the memory footprint. Keep in mind that the spherical harmonics control the view-dependent color rendering. In practice, sh_degree=1 usually still looks good from my experience.
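
The memory impact is easy to see from the number of coefficients: a maximum degree ℓ needs (ℓ+1)² spherical-harmonic coefficients per color channel, so for every single Gaussian:

$$ \ell = 3:\;(3+1)^2 \cdot 3 = 48 \text{ color values}, \qquad \ell = 1:\;(1+1)^2 \cdot 3 = 12 \text{ color values} $$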

densify_*_iter: Controls the span of iterations in which adaptive densification is performed. Tweaking this argument might result in fewer points being spawned and hence a lower memory footprint. Note that this can have a big impact on quality.

If everything goes well, you hopefully end up with a scene as shown below. In the next section we jump into the visualization and post-processing.

Fig.8: Optimized scene represented in 3D gaussian splattings. Image by Sascha Kirch.

You can actually see the Gaussian shape of individual splats quite nicely in low-density regions.

Post Processing

Though the Gaussian splatting repo comes with its own visualizer, I prefer to use Super Splat since it is much more intuitive and you can directly edit your scene.

So to start, head over to the Super Splat editor and open your ply file, located under ./output/.

I usually start by removing most of the background points using a sphere, as indicated below.
