Unraveling the Design Pattern of Physics-Informed Neural Networks: Part 05

2.1 Problem 🎯

When applying Physics-Informed Neural Networks (PINNs), it comes as no surprise that the neural network hyperparameters, such as the network depth, width, and choice of activation function, all have a significant impact on the PINNs' efficiency and accuracy.

Naturally, people would resort to AutoML (more specifically, neural architecture search) to automatically discover the optimal network hyperparameters. But before we can do that, there are two questions that need to be addressed:

  1. How to effectively navigate the vast search space?
  2. How to define a proper search objective?

The latter point arises from the fact that PINN is often seen as an "unsupervised" problem: no labeled data is required, because the training is guided by minimizing the ODE/PDE residuals.
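
For reference, a standard PINN training loss takes the following form (this is the generic formulation; the exact loss weighting used in the paper may differ):

\[
\mathcal{L}_{\text{train}}(\theta) \;=\; \frac{1}{N_r}\sum_{i=1}^{N_r}\big\| \mathcal{N}[u_\theta](x_r^i) \big\|^2 \;+\; \frac{1}{N_b}\sum_{j=1}^{N_b}\big\| \mathcal{B}[u_\theta](x_b^j) \big\|^2,
\]

where \(u_\theta\) is the network, \(\mathcal{N}\) is the PDE residual operator, \(\mathcal{B}\) encodes the boundary/initial conditions, and \(x_r^i\), \(x_b^j\) are collocation and boundary points. No labeled solution data appears in this loss.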

Figure: PINN workflow. PINNs' performance is highly sensitive to the network structure; one promising strategy to address this issue is to leverage AutoML for automatic hyperparameter tuning. (Image by this blog author)

To better understand these two issues, the authors conducted extensive experiments to investigate the sensitivity of PINN performance with respect to the network structure. Let's now take a look at what they found.

2.2 Solution 💡

The first idea proposed in the paper is that the training loss can be used as a surrogate for the search objective, since it correlates strongly with the final prediction accuracy of the PINN. This addresses the issue of defining a proper optimization target for hyperparameter search.

The second idea is that there is no need to optimize all network hyperparameters simultaneously. Instead, we can adopt a step-by-step decoupling strategy: for instance, first search for the optimal activation function, then fix that choice and find the optimal network width, then fix the previous decisions and optimize the network depth, and so on. In their experiments, the authors demonstrated that this strategy is very effective.

With these two ideas in mind, let's see how the search is executed in detail.

First of all, which network hyperparameters are considered? In the paper, the recommended search space is as follows (a code sketch of this search space appears after the list):

  • Width: the number of neurons in each hidden layer. The considered range is [8, 512], with a step of 4 or 8.
  • Depth: the number of hidden layers. The considered range is [3, 10], with a step of 1.
  • Activation function: Tanh, Sigmoid, ReLU, and Swish.
  • Changing point: the fraction of the total training epochs that use Adam. The considered values are [0.1, 0.2, 0.3, 0.4, 0.5]. In PINN training, it is common practice to first train with Adam for a certain number of epochs and then switch to L-BFGS for the remaining epochs. This changing-point hyperparameter determines the timing of the switch.
  • Learning rate: a fixed value of 1e-5, since it has only a small effect on the final architecture search results.
  • Training epochs: a fixed value of 10,000, since it has only a small effect on the final architecture search results.
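
As a minimal sketch, this search space could be declared with Ray Tune (the library the authors used for tuning); the values below mirror the list above, but the config keys and the use of `tune.choice` are my own illustrative assumptions, not the paper's code:

```python
from ray import tune

# Illustrative Auto-PINN search space mirroring the ranges listed above.
# The key names are hypothetical; adapt them to your own trainable.
search_space = {
    "width": tune.choice(list(range(8, 513, 8))),        # neurons per hidden layer
    "depth": tune.choice(list(range(3, 11))),             # number of hidden layers
    "activation": tune.choice(["tanh", "sigmoid", "relu", "swish"]),
    "changing_point": tune.choice([0.1, 0.2, 0.3, 0.4, 0.5]),  # Adam -> L-BFGS switch
    "learning_rate": 1e-5,   # fixed
    "epochs": 10000,         # fixed
}
```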

Secondly, let's examine the proposed procedure in detail (a code sketch of the full staged procedure follows the list):

  • The first search target is the activation function. To achieve that, we sample the width and depth parameter space and calculate the losses for all width-depth samples under different activation functions. These results indicate which activation function is the dominant one. Once decided, we fix the activation function for the subsequent steps.
Figure: The first step is to identify the dominant activation function. (Image by this blog author)
  • The second search target is the width. More specifically, we are looking for a few width intervals where the PINN performs well.
Figure: The second step is to identify the promising intervals for the network width. (Image by this blog author)
  • The third search target is the depth. Here, we only consider widths varying within the best-performing intervals determined in the last step, and we would like to find the top-K width-depth combinations where the PINN performs well.
Figure: The third step is to identify the top-K best-performing width-depth combinations. (Image by this blog author)
  • The final search target is the changing point. We simply search for the best changing point for each of the top-K configurations identified in the last step.
Figure: The final step is to identify the best changing point. (Image by this blog author)
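
Putting the four stages together, the decoupled procedure might look like the following sketch. This is illustrative pseudocode-style Python under my own assumptions: `run_search(configs)` is a hypothetical helper that trains one PINN per config (e.g., via Ray Tune) and returns a list of `(config, final_training_loss)` pairs, and the interval/top-K selection is simplified compared to the paper.

```python
def auto_pinn_search(run_search, k=5):
    """Sketch of the step-by-step decoupled Auto-PINN search."""
    activations = ["tanh", "sigmoid", "relu", "swish"]
    widths = list(range(8, 513, 8))
    depths = list(range(3, 11))
    changing_points = [0.1, 0.2, 0.3, 0.4, 0.5]

    # Step 1: identify the dominant activation on a coarse width-depth sample.
    coarse = [{"width": w, "depth": d, "activation": a}
              for a in activations for w in widths[::8] for d in depths[::3]]
    results = run_search(coarse)
    best_act = min(activations,
                   key=lambda a: min(loss for cfg, loss in results if cfg["activation"] == a))

    # Step 2: with the activation fixed, look for promising widths.
    # (Depth is held fixed here purely for illustration; the paper inspects
    # loss-vs-width behavior to pick whole intervals rather than single widths.)
    results = run_search([{"width": w, "depth": depths[0], "activation": best_act}
                          for w in widths])
    good_widths = [cfg["width"] for cfg, loss in sorted(results, key=lambda r: r[1])[:20]]

    # Step 3: search depth only within the promising widths; keep the top-K combinations.
    results = run_search([{"width": w, "depth": d, "activation": best_act}
                          for w in good_widths for d in depths])
    top_k = [cfg for cfg, loss in sorted(results, key=lambda r: r[1])[:k]]

    # Step 4: tune the Adam -> L-BFGS changing point for each top-K combination.
    results = run_search([dict(cfg, changing_point=cp)
                          for cfg in top_k for cp in changing_points])
    return sorted(results, key=lambda r: r[1])[:k]
```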

The outcome of this search procedure is K different PINN structures. We can either select the best-performing one out of these K candidates or simply use all of them to form a K-ensemble PINN model.
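
If all K candidates are kept, a simple way to combine them is to average their predictions. A minimal sketch, assuming the candidates are trained PyTorch models stored in a list (averaging is one common ensembling choice, not necessarily the one used in the paper):

```python
import torch

def ensemble_predict(models, x):
    """Average the predictions of K trained PINN candidates at the query points x."""
    with torch.no_grad():
        preds = torch.stack([model(x) for model in models], dim=0)  # shape (K, N, ...)
    return preds.mean(dim=0)
```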

Notice that several tuning parameters need to be specified in the above procedure (e.g., the number of width intervals, the value of K, etc.), which may depend on the available tuning budget.

As for the specific optimization algorithms used in the individual steps, off-the-shelf AutoML libraries can be employed to complete the task. For instance, the authors of the paper used the Tune package to execute the hyperparameter tuning.
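
Below is a rough sketch of how a single search stage could be wired up with Ray Tune's classic function-based API, reusing the `search_space` dict sketched earlier. The helpers `build_pinn` (constructs the MLP from width/depth/activation) and `pinn_loss` (evaluates the composite residual loss) are hypothetical placeholders you would supply; the two-phase Adam/L-BFGS loop follows the common PINN practice described above, not necessarily the paper's exact implementation:

```python
import torch
from ray import tune

def trainable(config):
    # `build_pinn` and `pinn_loss` are hypothetical, user-supplied helpers.
    model = build_pinn(config["width"], config["depth"], config["activation"])
    n_adam = int(config["changing_point"] * config["epochs"])

    # Phase 1: Adam for the first `changing_point` fraction of the epochs.
    adam = torch.optim.Adam(model.parameters(), lr=config["learning_rate"])
    for _ in range(n_adam):
        adam.zero_grad()
        loss = pinn_loss(model)
        loss.backward()
        adam.step()

    # Phase 2: L-BFGS for the remaining epochs.
    lbfgs = torch.optim.LBFGS(model.parameters(), max_iter=config["epochs"] - n_adam)

    def closure():
        lbfgs.zero_grad()
        loss = pinn_loss(model)
        loss.backward()
        return loss

    lbfgs.step(closure)

    # Report the final training loss: the surrogate search objective.
    tune.report(loss=pinn_loss(model).item())

analysis = tune.run(trainable, config=search_space, num_samples=50)
best_config = analysis.get_best_config(metric="loss", mode="min")
```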

2.3 Why the solution might work 🛠️

By decoupling the search over different hyperparameters, the size of the search space can be greatly reduced. This not only substantially decreases the search complexity but also significantly increases the chance of locating a (near-)optimal network architecture for the physical problems under investigation.
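
For a rough sense of scale, using the step-8 width grid from the search space above: a joint grid search would have to cover 64 widths × 8 depths × 4 activations × 5 changing points = 10,240 configurations, whereas the decoupled procedure replaces this product with a sum of much smaller stage-wise searches, each over only one or two hyperparameters at a time.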

Also, using the training loss as the search objective is both easy to implement and desirable. Since the training loss (mainly constituted by the PDE residual loss) correlates strongly with the PINN accuracy at inference time (according to the experiments conducted in the paper), identifying an architecture that delivers the minimum training loss will also likely yield a model with high prediction accuracy.
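
If you want to sanity-check this correlation on your own problem, one quick way is to compute a rank correlation between the final training losses and the relative L2 errors collected across search trials. A minimal sketch (the two input arrays are assumed to come from your own experiments; this check is my addition, not part of the paper's procedure):

```python
import numpy as np

def loss_accuracy_correlation(final_losses, l2_errors):
    """Spearman-style rank correlation between final training losses and
    relative L2 errors (both 1-D arrays, one entry per trained configuration)."""
    loss_ranks = np.argsort(np.argsort(final_losses)).astype(float)
    err_ranks = np.argsort(np.argsort(l2_errors)).astype(float)
    return float(np.corrcoef(loss_ranks, err_ranks)[0, 1])
```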

2.4 Benchmark ⏱️

The paper considered a total of seven different benchmark problems. All problems are forward problems where the PINN is used to solve the PDEs (the standard forms of these equations are sketched after the list).

  • Heat equation with Dirichlet boundary conditions. This type of equation describes the heat or temperature distribution in a given domain over time.
  • Heat equation with Neumann boundary conditions.
  • Wave equation, which describes the propagation of oscillations in a medium, such as mechanical and electromagnetic waves. Both Dirichlet and Neumann conditions are considered here.
  • Burgers equation, which has been leveraged to model shock flows, wave propagation in combustion chambers, vehicular traffic movement, and more.
  • Advection equation, which describes the motion of a scalar field as it is advected by a known velocity vector field.
  • Advection equation, with different boundary conditions.
  • Reaction equation, which describes chemical reactions.
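
For reference, the standard textbook forms of these equations are shown below (the exact coefficients, domains, and boundary/initial conditions used in the paper's benchmarks may differ):

\[
\begin{aligned}
\text{Heat:} \quad & u_t = \alpha\, u_{xx} \\
\text{Wave:} \quad & u_{tt} = c^2\, u_{xx} \\
\text{Burgers:} \quad & u_t + u\, u_x = \nu\, u_{xx} \\
\text{Advection:} \quad & u_t + c\, u_x = 0 \\
\text{Reaction:} \quad & u_t = \rho\, u(1-u)
\end{aligned}
\]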

The benchmark studies showed that:

  • The proposed Auto-PINN shows stable performance for various PDEs.
  • In most cases, Auto-PINN is able to identify the neural network architecture with the smallest error values.
  • Fewer search trials are needed with the Auto-PINN approach.

2.5 Strengths and Weaknesses ⚡

Strengths 💪

  • Significantly reduced computational cost for performing neural architecture search for PINN applications.
  • Improved likelihood of identifying a (near) optimal neural network architecture for various PDE problems.

Weaknesses 📉

  • The effectiveness of using the training loss value as the search objective might depend on the particular characteristics of the PDE problem at hand, since the benchmarks were performed only for a specific set of PDEs.
  • The data sampling strategy influences Auto-PINN performance. While the paper discusses the impact of different data sampling strategies, it does not provide a clear guideline on how to select the best strategy for a given PDE problem. This could potentially add another layer of complexity to using Auto-PINN.

2.6 Alternatives 🔀

Conventional off-the-shelf AutoML algorithms can also be employed to tackle the problem of hyperparameter optimization in Physics-Informed Neural Networks (PINNs). These algorithms include random search, genetic algorithms, Bayesian optimization, etc.

Compared to those alternative algorithms, the newly proposed Auto-PINN is specifically designed for PINNs. This makes it a unique and effective solution for optimizing PINN hyperparameters.

There are several possibilities to further improve the proposed strategy:

  • Incorporating more sophisticated data sampling strategies, such as adaptive and residual-based sampling methods, to improve the search accuracy and the model performance.

To learn more about how to optimize the distribution of residual points, check out this blog in the PINN design pattern series.

  • More benchmarking of the search objective, to assess whether the training loss value is indeed a good surrogate for different types of PDEs.
  • Incorporating other types of neural networks. The current version of Auto-PINN is designed for multilayer perceptron (MLP) architectures only. Future work could explore convolutional neural networks (CNNs) or recurrent neural networks (RNNs), which could potentially enhance the capability of PINNs in solving more complex PDE problems.
  • Transfer learning in Auto-PINN. For instance, architectures that perform well on certain types of PDE problems could be used as starting points for the search process on similar types of PDE problems. This could potentially speed up the search process and improve the model's performance.
