Working with ODEs
Physical systems can typically be modeled with differential equations, or equations including derivatives. Forces, and hence Newton's Laws, can be expressed as derivatives, as can Maxwell's Equations, so differential equations can describe most physics problems. A differential equation describes how a system changes based on the system's current state, in effect defining state transitions. Systems of differential equations can be written in matrix/vector form:

dx/dt = Ax
where x is the state vector, A is the state transition matrix determined from the physical dynamics, and x dot (or dx/dt) is the change in the state with a change in time. Essentially, matrix A acts on state x to advance it a small step in time. This formulation is usually used for linear equations (where the elements of A do not contain any state variables), but it can also be used for nonlinear equations, where the elements of A may themselves contain state variables, which can result in the complex behavior described above. This equation describes how an environment or system develops in time, starting from a specific initial condition. In mathematics, these are known as initial value problems, since evaluating how the system will develop requires specification of a starting state.
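As a concrete sketch (the matrix, time step, and initial condition below are illustrative, not from the article), a damped harmonic oscillator can be written in this dx/dt = Ax form and advanced through time with repeated small steps:

```python
import numpy as np

# Damped harmonic oscillator in state-space form dx/dt = Ax,
# with state x = [position, velocity].
A = np.array([[0.0, 1.0],
              [-1.0, -0.1]])

x = np.array([1.0, 0.0])  # initial condition: unit displacement, at rest
dt = 0.01                 # small time step

# A acting on the current state gives the instantaneous rate of change;
# a forward-Euler step then advances the state a small increment in time.
for _ in range(1000):
    x = x + dt * (A @ x)

print(x)  # state after 10 seconds of simulated time
```

The damping term in A slowly bleeds energy out of the oscillation, so the state decays toward the origin as the loop runs.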
The expression above describes a particular class of differential equations, ordinary differential equations (ODEs), where the derivatives are all with respect to a single variable, usually time but occasionally space. The dot denotes dx/dt, the change in state with an incremental change in time. ODEs are well studied, and linear systems of ODEs have a wide selection of analytic solution approaches available. Analytic solutions allow solutions to be expressed in terms of variables, making them more flexible for exploring the system's overall behavior. Nonlinear systems have fewer solution approaches, but certain classes of systems do have analytic solutions available. For the most part though, nonlinear (and some linear) ODEs are best solved through simulation, where the solution is determined as numeric values at each time step.
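For instance, a simple linear ODE can be solved analytically with a computer algebra system, giving a solution in terms of variables rather than numbers (a sketch using SymPy; the equation is illustrative):

```python
import sympy as sp

t = sp.symbols('t')
x = sp.Function('x')

# linear ODE dx/dt = -2x, which has the closed-form solution x(t) = C1*exp(-2*t)
ode = sp.Eq(x(t).diff(t), -2 * x(t))
solution = sp.dsolve(ode, x(t))
print(solution)
```

The constant C1 stays symbolic until an initial condition is supplied, which is what makes the analytic form useful for exploring behavior across all starting states.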
Simulation is built around finding an approximation to the differential equation, often through transformation to an algebraic equation, that is accurate to a known degree over a small change in time. Computers can then step through many small changes in time to show how the system develops. There are many algorithms available to perform this calculation, such as Matlab's ODE45 or Python SciPy's solve_ivp functions. These algorithms take an ODE and a starting point/initial condition, automatically determine an optimal step size, and advance through the system to the desired ending time.
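As a minimal example of that workflow (the ODE here is illustrative), solve_ivp needs only the derivative function, a time span, and an initial condition, and handles step-size selection itself:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Exponential decay dx/dt = -x, whose exact solution is x(t) = x0 * exp(-t)
def decay(t, x):
    return -x

# solve_ivp automatically chooses step sizes (RK45 by default) and
# marches from the initial condition to the final time.
sol = solve_ivp(decay, (0.0, 5.0), [1.0])

print(sol.t[-1], sol.y[0, -1])  # final time and state, close to exp(-5)
```

Comparing `sol.y[0, -1]` against the exact solution exp(-5) is a quick way to check the "accurate to a known degree" claim: the default tolerances keep the numeric answer close to the analytic one.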
If we can apply the right control inputs to an ODE system, we can often drive it to a desired state. As discussed last time, RL provides an approach to determine the right inputs for nonlinear systems. To develop RL agents, we will again use the gymnasium environment, but this time we will create a custom gymnasium environment based on our own ODE. Following the Gymnasium documentation, we create an observation space that covers our state space, and an action space for the control space. We initialize/reset the gymnasium to an arbitrary point within the state space (though here we should be cautious: not all desired end states are always reachable from any initial state for some systems). In the gymnasium's step function, we take a step over a short time horizon in our ODE, applying the algorithm's estimated control input using Python SciPy's solve_ivp function. Solve_ivp calls a function which holds the particular ODE we are working with. Code is available on Github. The init and reset functions are straightforward; init creates an observation space for each state in the system, and reset sets a random starting point for each of those variables within the domain at a minimum distance from the origin. In the step function, note the solve_ivp line that calls the actual dynamics, solving the dynamics ODE over a short time step and passing the applied control K.
#taken from https://www.gymlibrary.dev/content/environment_creation/
#create gym for Moore-Greitzer Model
#action space: continuous +/- 10.0 float, perhaps make scale to mu
#observation space: -30,30 x2 float for x,y
#reward: -1*(x^2+y^2)^1/2 (attempt to drive to 0)
#Moore-Greitzer model:
from os import path
from typing import Optional
import numpy as np
import math
import scipy
from scipy.integrate import solve_ivp
import gymnasium as gym
from gymnasium import spaces
from gymnasium.envs.classic_control import utils
from gymnasium.error import DependencyNotInstalled
import dynamics #local library containing formulas for solve_ivp
from dynamics import MGM
class MGMEnv(gym.Env):
    #no render modes
    def __init__(self, render_mode=None, size=30):
        self.observation_space = spaces.Box(low=-size+1, high=size-1, shape=(2,), dtype=float)
        self.action_space = spaces.Box(-10, 10, shape=(1,), dtype=float)
        #to do: update action space to a normal distribution

    def _get_obs(self):
        return self.state

    def reset(self, seed: Optional[int] = None, options=None):
        #need below to seed self.np_random
        super().reset(seed=seed)
        #start at random x1, x2 away from the origin
        np.random.seed(seed)
        x = np.random.uniform(-8., 8.)
        while -2.5 < x < 2.5:
            np.random.seed()
            x = np.random.uniform(-8., 8.)
        np.random.seed(seed)
        y = np.random.uniform(-8., 8.)
        while -2.5 < y < 2.5:
            np.random.seed()
            y = np.random.uniform(-8., 8.)
        self.state = np.array([x, y])
        observation = self._get_obs()
        return observation, {}

    def step(self, action):
        u = action.item()
        #solve the dynamics ODE over a short time step, passing the applied control
        result = solve_ivp(MGM, (0, 0.05), self.state, args=[u])
        x1 = result.y[0, -1]
        x2 = result.y[1, -1]
        self.state = np.array([x1.item(), x2.item()])
        done = False
        observation = self._get_obs()
        reward = -math.sqrt(x1.item()**2)  #+x2.item()**2)
        truncated = False  #placeholder for future expansion/limits if solution diverges
        return observation, reward, done, truncated, {}
Below are the dynamics of the Moore-Greitzer Model (MGM) function. This implementation is based on the solve_ivp documentation. Limits are placed on the states to avoid solution divergence; if the system hits those limits, the reward will be low, causing the algorithm to revise its control approach. Creating ODE gymnasiums based on the template discussed here should be straightforward: change the observation space size to match the dimensions of the ODE system and update the dynamics equation as needed.
def MGM(t, A, K):
    #nonlinear approximation of surge/stall dynamics of a gas turbine engine per the Moore-Greitzer model from
    #"Output-Feedback Control of Nonlinear Systems using Control Contraction Metrics and Convex Optimization"
    #by Manchester and Slotine
    #2D system, x1 is mass flow, x2 is pressure increase
    x1, x2 = A
    #limit states to avoid solution divergence
    if x1 > 20: x1 = 20.
    elif x1 < -20: x1 = -20.
    if x2 > 20: x2 = 20.
    elif x2 < -20: x2 = -20.
    dx1 = -x2 - 1.5*x1**2 - 0.5*x1**3
    dx2 = x1 + K
    return np.array([dx1, dx2])
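As a quick standalone check of these dynamics (the initial condition and time span are illustrative; the MGM function is repeated so the snippet runs on its own), the uncontrolled system can be integrated directly by holding the control at K=0:

```python
import numpy as np
from scipy.integrate import solve_ivp

def MGM(t, A, K):
    #Moore-Greitzer dynamics, repeated from above so this snippet is standalone
    x1, x2 = A
    #limit states to avoid solution divergence
    if x1 > 20: x1 = 20.
    elif x1 < -20: x1 = -20.
    if x2 > 20: x2 = 20.
    elif x2 < -20: x2 = -20.
    dx1 = -x2 - 1.5*x1**2 - 0.5*x1**3
    dx2 = x1 + K
    return np.array([dx1, dx2])

#integrate with no control applied, from a sample starting point in the reset domain
sol = solve_ivp(MGM, (0.0, 20.0), [4.0, 4.0], args=[0.0], max_step=0.05)
print(sol.y[:, -1])  #final state [mass flow, pressure rise]
```

Plotting `sol.t` against the two rows of `sol.y` shows the open-loop behavior that the controller is later asked to dampen.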
For this example, we are using an ODE based on the Moore-Greitzer Model (MGM) describing gas turbine engine surge-stall dynamics¹. This equation describes coupled damped oscillations between engine mass flow and pressure. The goal of the controller is to quickly dampen the oscillations to 0 by controlling pressure on the engine. MGM has “motivated substantial development of nonlinear control design”, making it an interesting test case for the SAC and GP approaches. Code describing the equation can be found on Github. Also listed there are three other nonlinear ODEs. The Van der Pol oscillator is a classic nonlinear oscillating system based on the dynamics of electronic systems. The Lorenz Attractor is a seemingly simple system of ODEs that can produce chaotic behavior, or results so highly sensitive to initial conditions that any infinitesimally small difference in starting point will, in an uncontrolled system, soon lead to widely divergent states. The third is a mean-field ODE system provided by Duriez/Brunton/Noack that describes the development of complex interactions of stable and unstable waves as an approximation of turbulent fluid flow.
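To illustrate one of those alternatives, here is a minimal sketch of the Van der Pol oscillator in the same solve_ivp form (the damping parameter, initial condition, and time span are illustrative choices, not from the article's repository):

```python
import numpy as np
from scipy.integrate import solve_ivp

#Van der Pol oscillator; mu sets the strength of the nonlinear damping
def van_der_pol(t, state, mu=1.0):
    x, v = state
    return [v, mu * (1 - x**2) * v - x]

#start near the origin; the trajectory grows onto a stable limit cycle
sol = solve_ivp(van_der_pol, (0.0, 20.0), [0.1, 0.0], max_step=0.05)
print(sol.y[0, -1], sol.y[1, -1])
```

Unlike a linear oscillator, small oscillations grow and large ones shrink, so every nonzero start converges to the same limit cycle (amplitude near 2 for mu=1), which is the classic nonlinear behavior that makes it a useful test system.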
To avoid repeating the analysis of the last article, we simply present results here, noting that again the GP approach produced a better controller in less computational time than the SAC/neural network approach. The figures below show the oscillations of an uncontrolled system, under the GP controller, and under the SAC controller.
Both algorithms improve on the uncontrolled dynamics. We see that while the SAC controller acts more quickly (at about 20 time steps), its accuracy is lower. The GP controller takes a bit longer to act, but provides smooth behavior for both states. Also, as before, GP converged in fewer iterations than SAC.
We have now seen that gymnasiums can be easily adapted to allow training RL algorithms on ODE systems, briefly discussed how powerful ODEs can be for describing, and so exploring RL control of, physical dynamics, and seen again the GP producing a better final result. However, we have not yet tried to optimize either algorithm, instead just setting up with, essentially, a guess at basic algorithm parameters. We will address that shortcoming now by expanding the MGM study.