Unraveling the Design Pattern of Physics-Informed Neural Networks: Series 01

2.1 Problem

Physics-Informed Neural Networks (PINNs) offer a distinct advantage over conventional neural networks by explicitly integrating the known governing ordinary or partial differential equations (ODEs/PDEs) of physical processes. The enforcement of those governing equations in PINNs relies on a set of points often called residual points. These points are strategically chosen throughout the simulation domain, and the corresponding network outputs are substituted into the governing equations to evaluate the residuals. The residuals indicate the extent to which the network outputs align with the underlying physical processes, thereby serving as an important physical loss term that guides the neural network training process.

It is clear that the distribution of those residual points plays a pivotal role in the accuracy and efficiency of PINN training. Nevertheless, the prevailing approach often involves simple uniform sampling, which leaves ample room for improvement.

Workflow of physics-informed neural network
Illustration of a PINN. The part encircled by the dashed line, i.e., the distribution of residual points, is the main problem tackled by the paper. (Image by this blog author)

Consequently, a pressing question arises: How can we optimize the distribution of residual points to boost the accuracy and training efficiency of PINNs?

2.2 Solution

Promising ways of distributing the residual points are to adopt an adaptive strategy and a refinement strategy:

  1. The adaptive strategy means that after every certain number of training iterations, a new batch of residual points is generated to replace the previous residual points;
  2. The refinement strategy means that extra residual points are added to the existing ones, thus “refining” the residual points.
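To make the contrast between the two strategies concrete, here is a minimal runnable sketch. It is not the paper's code: `sample_points` and `train_steps` are hypothetical stand-ins for the actual sampler and the PINN optimizer loop, which the paper does not prescribe at this level.

```python
import random

def sample_points(n=16):
    """Stand-in sampler: n uniform points on [0, 1] (hypothetical helper)."""
    return [random.random() for _ in range(n)]

def train_steps(points, n_steps):
    """Placeholder for n_steps of gradient updates on the physics loss."""
    pass

def train_adaptive(n_rounds, resample_every=2000):
    """Adaptive strategy: periodically replace ALL residual points."""
    points = sample_points()
    for _ in range(n_rounds):
        train_steps(points, resample_every)
        points = sample_points()          # discard old points, draw new batch
    return points

def train_refinement(n_rounds, resample_every=2000):
    """Refinement strategy: keep old residual points, append new ones."""
    points = sample_points()
    for _ in range(n_rounds):
        train_steps(points, resample_every)
        points += sample_points()         # old points survive, set grows
    return points
```

The only structural difference is the last line of each loop: replacement versus accumulation, which is exactly the distinction the two strategies above draw.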

Based on those two foundational strategies, the paper proposed two novel sampling methods: Residual-based Adaptive Distribution (RAD) and Residual-based Adaptive Refinement with Distribution (RAR-D):

1. RAD: Residual-based Adaptive Distribution

The key idea is to draw new residual samples from a custom probability density function over the spatial domain x. The probability density function P(x) is designed to be proportional to the PDE residual ε(x) at x:

Custom probability density function for generating residual points. (Adapted from the original paper)

Here, k and c are two hyperparameters, and the expectation term in the denominator can be approximated by, e.g., Monte Carlo integration.

In total, there are three hyperparameters for the RAD approach: k, c, and the resampling period N. Although the optimal hyperparameter values are problem-dependent, the suggested defaults are k = 1, c = 1, and N = 2000.
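One RAD resampling step can be sketched as follows. This is a minimal sketch, not the paper's code: it assumes a 1D domain [0, 1], evaluates the residual on a dense uniform candidate set, and approximates the expectation in the denominator by the Monte Carlo average over those candidates; `residual_fn` is a hypothetical callable returning |ε(x)|.

```python
import numpy as np

def rad_sample(residual_fn, n_points, k=1.0, c=1.0,
               n_candidates=10_000, seed=0):
    """Draw residual points with probability proportional to
    eps(x)^k / E[eps(x)^k] + c (the RAD density)."""
    rng = np.random.default_rng(seed)
    # Dense uniform candidates over the (assumed 1D, [0, 1]) domain.
    candidates = rng.uniform(0.0, 1.0, size=n_candidates)
    eps_k = np.abs(residual_fn(candidates)) ** k
    density = eps_k / eps_k.mean() + c    # mean() = Monte Carlo expectation
    prob = density / density.sum()        # normalize over the candidate set
    idx = rng.choice(n_candidates, size=n_points, replace=False, p=prob)
    return candidates[idx]
```

With the defaults k = 1 and c = 1, the density is a mixture of residual-proportional and uniform sampling, so even zero-residual regions retain some probability of receiving points.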

2. RAR-D: Residual-based Adaptive Refinement with Distribution

Essentially, RAR-D adds an element of refinement on top of the proposed RAD approach: after a certain number of training iterations, instead of entirely replacing the old residual points with new ones, RAR-D keeps the old residual points and draws new residual points according to the custom probability density function displayed above.

For RAR-D, the suggested default values for k and c are 2 and 0, respectively.
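A single RAR-D refinement step can be sketched in the same spirit. Again a hedged, self-contained sketch rather than the paper's code: a 1D domain [0, 1] is assumed, the expectation is approximated by a Monte Carlo average over a dense uniform candidate set, and `residual_fn` is a hypothetical callable returning |ε(x)|.

```python
import numpy as np

def rar_d_refine(old_points, residual_fn, n_new, k=2.0, c=0.0,
                 n_candidates=10_000, seed=0):
    """Refinement step: keep the existing residual points and append
    n_new points drawn with probability ~ eps(x)^k / E[eps(x)^k] + c.
    Defaults k=2, c=0 follow the suggested RAR-D values."""
    rng = np.random.default_rng(seed)
    candidates = rng.uniform(0.0, 1.0, size=n_candidates)  # assumed 1D domain
    eps_k = np.abs(residual_fn(candidates)) ** k
    density = eps_k / eps_k.mean() + c
    prob = density / density.sum()
    idx = rng.choice(n_candidates, size=n_new, replace=False, p=prob)
    return np.concatenate([old_points, candidates[idx]])
```

Note that with c = 0 the new points are drawn purely in proportion to ε²(x), so refinement concentrates entirely on high-residual regions while the old points preserve coverage elsewhere.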

2.3 Why the solution might work

The key lies in the designed sampling probability density function: it tends to place more points in regions where the PDE residuals are large and fewer points where the residuals are small. This strategic distribution of points enables a more detailed evaluation of the PDE in regions where the residuals are higher, potentially leading to enhanced accuracy of PINN predictions. Moreover, the optimized distribution allows for more efficient use of computational resources, reducing the total number of points required to accurately resolve the governing PDE.

2.4 Benchmark

The paper benchmarked the performance of the two proposed approaches along with 8 other sampling strategies on both forward and inverse problems. The considered physical equations include:

  • 1D diffusion equation
  • Burgers’ equation
  • Allen-Cahn equation
  • Wave equation
  • Diffusion-reaction equation (inverse problem, calibrating reaction rate k(x))
  • Korteweg-de Vries equation (inverse problem, calibrating λ₁ and λ₂)

The comparison studies showed that:

  1. RAD consistently performed the best, making it a good default strategy;
  2. If computational cost is a concern, RAR-D can be a strong alternative, as it tends to deliver adequate accuracy and is less expensive than RAD;
  3. RAD & RAR-D are especially effective for complicated PDEs;
  4. The advantage of RAD & RAR-D shrinks when the simulated PDEs have smooth solutions.

2.5 Strength and Weakness

👍Strength

  • dynamically improves the distribution of residual points based on the PDE residuals during training;
  • leads to an increase in PINN accuracy;
  • achieves comparable accuracy to existing methods with fewer residual points.

👎Weakness

  • can be more computationally expensive than non-adaptive uniform sampling methods. However, this is the price to pay for higher accuracy;
  • for PDEs with smooth solutions, e.g., the diffusion equation or the diffusion-reaction equation, some simple uniform sampling methods may already produce sufficiently low errors, making the proposed solution less suitable in those cases;
  • introduces two new hyperparameters k and c that need to be tuned, as their optimal values are problem-dependent.

2.6 Alternatives

Other approaches have been proposed prior to the present paper:

Categorization of various approaches for sampling residual points
A total of 10 sampling approaches were investigated in the paper. The two newly proposed approaches are highlighted in red. (Image by this blog author)

Among those methods, two heavily influenced the approaches proposed in the current paper:

  1. Residual-based adaptive refinement (Lu et al.), which is a special case of the proposed RAR-D with a large value of k;
  2. Importance sampling (Nabian et al.), which is a special case of RAD by setting k=1 and c=0.
