Introduction to 3D Gaussian Splatting

By Dylan Ebert

3D Gaussian Splatting is a rasterization technique described in 3D Gaussian Splatting for Real-Time Radiance Field Rendering that enables real-time rendering of photorealistic scenes learned from small samples of images. This article will break down how it works and what it means for the future of graphics.



What’s 3D Gaussian Splatting?

3D Gaussian Splatting is, at its core, a rasterization technique. Meaning:

  1. Have data describing the scene.
  2. Draw the data on the screen.

This is analogous to triangle rasterization in computer graphics, which is used to draw many triangles on the screen.

However, instead of triangles, it’s gaussians. Here’s a single rasterized gaussian, with a border drawn for clarity.

It’s described by the following parameters (a minimal sketch of this data layout follows the list):

  • Position: where it’s located (XYZ)
  • Covariance: how it’s stretched/scaled (3×3 matrix)
  • Color: what color it’s (RGB)
  • Alpha: how transparent it’s (α)
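
To make that concrete, here’s a minimal Python sketch of the data a single gaussian carries. The class and field names are illustrative only, not the paper’s code (the reference implementation actually stores the covariance indirectly, as a scale and a rotation).

```python
# Minimal, illustrative description of a single 3D gaussian (not the paper's code).
from dataclasses import dataclass
import numpy as np

@dataclass
class Gaussian3D:
    position: np.ndarray    # (3,)   XYZ center
    covariance: np.ndarray  # (3, 3) symmetric matrix controlling stretch/scale
    color: np.ndarray       # (3,)   RGB
    alpha: float            # opacity in [0, 1]

# Example: a gaussian stretched along X, half-transparent, red
g = Gaussian3D(
    position=np.array([0.0, 0.0, 0.0]),
    covariance=np.diag([0.5, 0.1, 0.1]),
    color=np.array([1.0, 0.0, 0.0]),
    alpha=0.5,
)
```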

In practice, multiple gaussians are drawn at once.

That’s three gaussians. Now what about 7 million gaussians?

Here’s what it looks like with each gaussian rasterized fully opaque:

That’s a very brief overview of what 3D Gaussian Splatting is. Next, let’s walk through the full procedure described in the paper.



How it works



1. Structure from Motion

The first step is to use Structure from Motion (SfM) to estimate a 3D point cloud from a set of 2D images. This can be done with the COLMAP library.
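
As a rough illustration of this step (and assuming the colmap command-line tool is installed; all paths below are placeholders), a basic sparse reconstruction looks something like this when driven from Python:

```python
# Rough sketch of a COLMAP sparse reconstruction, invoked from Python.
# Assumes the `colmap` CLI is installed; paths are placeholders.
import subprocess

image_path = "data/images"        # folder of input photos
database_path = "data/colmap.db"  # COLMAP feature/match database
output_path = "data/sparse"       # where camera poses + the point cloud go

# 1. Detect keypoints in every image
subprocess.run(["colmap", "feature_extractor",
                "--database_path", database_path,
                "--image_path", image_path], check=True)

# 2. Match keypoints between image pairs
subprocess.run(["colmap", "exhaustive_matcher",
                "--database_path", database_path], check=True)

# 3. Estimate camera poses and triangulate a sparse 3D point cloud
subprocess.run(["colmap", "mapper",
                "--database_path", database_path,
                "--image_path", image_path,
                "--output_path", output_path], check=True)
```

The output is a sparse point cloud plus a camera pose for each input image, which is the starting data the next steps need.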



2. Convert to Gaussians

Next, each point is converted to a gaussian. This is already sufficient for rasterization. However, only position and color can be inferred from the SfM data. To learn a representation that yields high-quality results, we need to train it.
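
Here’s an illustrative sketch of that initialization, not the reference code: position and color come straight from the SfM points, while covariance and alpha start from simple defaults and are left to training (the reference implementation derives initial scales from nearest-neighbor distances rather than a constant).

```python
# Illustrative initialization: one gaussian per SfM point (not the reference code).
import numpy as np

def init_gaussians(points, colors, init_scale=0.01, init_alpha=0.1):
    """points: (N, 3) XYZ from SfM; colors: (N, 3) RGB in [0, 1]."""
    n = len(points)
    return {
        "position": np.asarray(points, dtype=np.float32),          # from SfM
        "color": np.asarray(colors, dtype=np.float32),             # from SfM
        "covariance": np.tile(np.eye(3) * init_scale, (n, 1, 1)),  # small isotropic blobs
        "alpha": np.full(n, init_alpha, dtype=np.float32),         # start nearly transparent
    }
```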



3. Training

The training procedure uses Stochastic Gradient Descent, just like a neural network, but without the layers. The training steps are:

  1. Rasterize the gaussians to an image using differentiable gaussian rasterization (more on that later)
  2. Calculate the loss based on the difference between the rasterized image and ground truth image
  3. Adjust the gaussian parameters based on the loss
  4. Apply automated densification and pruning

Steps 1-3 are conceptually pretty straightforward. Step 4 involves the following:

  • If the gradient is large for a given gaussian (i.e. it’s too wrong), split/clone it
    • If the gaussian is small, clone it
    • If the gaussian is large, split it
  • If the alpha of a gaussian gets too low, remove it

This procedure helps the gaussians better fit fine-grained details, while pruning unnecessary gaussians.
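
Here’s a heavily simplified, hypothetical sketch of the whole loop in PyTorch-style Python. `rasterize`, `views`, and `densify_and_prune` are stand-ins for the real components, the loss omits the paper’s D-SSIM term, and the thresholds are illustrative rather than the paper’s values.

```python
# Hypothetical, heavily simplified training loop for 3D Gaussian Splatting.
# `params` is a dict of per-gaussian tensors with requires_grad=True,
# `rasterize` is a differentiable renderer, `views` yields (camera, image) pairs,
# and `densify_and_prune` splits/clones/removes gaussians given the two masks.
import torch

def train(params, rasterize, views, densify_and_prune, num_steps=30_000,
          densify_interval=100, grad_threshold=2e-4, alpha_threshold=0.005):
    optimizer = torch.optim.Adam(list(params.values()), lr=1e-3)

    for step in range(num_steps):
        camera, ground_truth = next(views)              # one known photo + its pose

        rendered = rasterize(params, camera)            # step 1: rasterize to an image
        loss = torch.nn.functional.l1_loss(rendered, ground_truth)  # step 2: loss

        optimizer.zero_grad()
        loss.backward()                                 # step 3: gradients ...
        optimizer.step()                                #         ... and parameter update

        if (step + 1) % densify_interval == 0:          # step 4: densify / prune
            with torch.no_grad():
                grad_norm = params["position"].grad.norm(dim=-1)
                too_wrong = grad_norm > grad_threshold         # large gradient: split/clone
                too_faint = params["alpha"] < alpha_threshold  # low alpha: remove
                params = densify_and_prune(params, too_wrong, too_faint)
                optimizer = torch.optim.Adam(list(params.values()), lr=1e-3)
    return params
```

Because everything from the loss back to the gaussian parameters is differentiable, the same Adam-style update used for neural networks works here, even though there are no layers.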



4. Differentiable Gaussian Rasterization

As mentioned earlier, 3D Gaussian Splatting is a rasterization approach, which draws the data to the screen. However, it’s also important that it’s:

  1. Fast
  2. Differentiable

The original implementation of the rasterizer can be found here. The rasterization involves:

  1. Project each gaussian into 2D from the camera perspective.
  2. Sort the gaussians by depth.
  3. For each pixel, iterate over each gaussian front-to-back, blending them together (a simplified sketch of this step follows below).

Additional optimizations are described in the paper.
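
To make the per-pixel blending in step 3 concrete, here’s a rough single-pixel sketch. It assumes the gaussians have already been projected to 2D and sorted nearest-first; the field names are illustrative, and the real implementation works per tile on the GPU rather than per pixel in Python.

```python
# Illustrative front-to-back alpha blending for a single pixel.
# `sorted_gaussians` are dicts of 2D-projected gaussians, nearest first.
import numpy as np

def gaussian_weight(pixel_xy, mean2d, cov2d):
    """Unnormalized 2D gaussian falloff evaluated at a pixel."""
    d = pixel_xy - mean2d
    return float(np.exp(-0.5 * d @ np.linalg.inv(cov2d) @ d))

def shade_pixel(pixel_xy, sorted_gaussians):
    color = np.zeros(3)   # accumulated RGB
    transmittance = 1.0   # how much light still reaches the current gaussian

    for g in sorted_gaussians:
        weight = g["alpha"] * gaussian_weight(pixel_xy, g["mean2d"], g["cov2d"])
        color += transmittance * weight * g["color"]
        transmittance *= 1.0 - weight
        if transmittance < 1e-4:  # early exit once the pixel is effectively opaque
            break
    return color
```

The transmittance term is what makes front-to-back order matter: once the gaussians closest to the camera have made a pixel nearly opaque, later ones contribute almost nothing, which also enables the early exit.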

It’s also important that the rasterizer is differentiable, so that it can be trained with stochastic gradient descent. However, this is only relevant for training – the trained gaussians can also be rendered with a non-differentiable approach.



Who cares?

Why has there been so much attention on 3D Gaussian Splatting? The obvious answer is that the results speak for themselves – high-quality scenes rendered in real time. However, there may be more to the story.

There are many unknowns as to what else can be done with Gaussian Splatting. Can they be animated? The upcoming paper Dynamic 3D Gaussians: Tracking by Persistent Dynamic View Synthesis suggests that they can. There are many other unknowns as well. Can they do reflections? Can they be modeled without training on reference images?

Finally, there’s growing research interest in Embodied AI. This is an area of AI research where state-of-the-art performance is still orders of magnitude below human performance, with much of the challenge being in representing 3D space. Given that 3D Gaussian Splatting yields a very dense representation of 3D space, what might the implications be for Embodied AI research?

These questions call attention to the method. It remains to be seen what the actual impact will be.



The future of graphics

So what does this mean for the future of graphics? Well, let’s break it up into pros/cons:

Pros

  1. High-quality, photorealistic scenes
  2. Fast, real-time rasterization
  3. Relatively fast to train

Cons

  1. High VRAM usage (4GB to view, 12GB to train)
  2. Large disk size (1GB+ for a scene)
  3. Incompatible with existing rendering pipelines
  4. Static (for now)

So far, the original CUDA implementation has not been adapted to production rendering pipelines, like Vulkan, DirectX, WebGPU, etc., so it’s yet to be seen what the impact will be.

There have already been the following adaptations:

  1. Remote viewer
  2. WebGPU viewer
  3. WebGL viewer
  4. Unity viewer
  5. Optimized WebGL viewer

These rely either on remote streaming (1) or a traditional quad-based rasterization approach (2-5). While a quad-based approach is compatible with decades of graphics technologies, it may result in lower quality/performance. However, viewer #5 demonstrates that optimization tricks can result in high quality/performance, despite a quad-based approach.

So will we see 3D Gaussian Splatting fully reimplemented in a production environment? The answer is probably yes. The primary bottleneck is sorting millions of gaussians, which is done efficiently in the original implementation using CUB device radix sort, a highly optimized sort only available in CUDA. However, with enough effort, it’s certainly possible to achieve this level of performance in other rendering pipelines.

If you have any questions or would like to get involved, join the Hugging Face Discord!


