A Gentle Introduction to Nonlinear Constrained Optimization with Piecewise Linear Approximations


In constrained optimization, the goal is to find the best (maximum or minimum) value of an objective function by choosing real variables that satisfy a set of equality and inequality constraints.

A general constrained optimization problem is to pick \(n\) real decision variables \(x_0,x_1,\dots,x_{n-1}\) from a given feasible region so as to optimize (minimize or maximize) a given objective function

\[f(x_0,x_1,\dots,x_{n-1}).\]

We often let \(x\) denote the vector of \(n\) real decision variables, that is, \(x=(x_0,x_1,\dots,x_{n-1})\), and write the overall nonlinear program as:

\[\begin{aligned}\text{Maximize }&f(x)\\\text{Subject to }&g_i(x)\leq b_i&&\qquad(i=0,1,\dots,m-1)\\&x\in\mathbb{R}^n\end{aligned}\]

where each of the constraint functions \(g_0\) through \(g_{m-1}\) is given, and each \(b_i\) is a constant (Bradley et al., 1977).

This is just one possible way to write the problem. Minimizing \(f(x)\) is equivalent to maximizing \(-f(x)\). Likewise, an equality constraint \(h(x)=b\) may be expressed as the pair of inequalities \(h(x)\leq b\) and \(-h(x)\leq-b\). Furthermore, by adding a slack variable, any inequality can be converted into an equality constraint (Bradley et al., 1977).

These kinds of problems appear across many application areas, for instance, in business settings where a company aims to maximize profit or minimize costs while operating under resource or funding constraints (Cherry, 2016).

If \(f(x)\) is a linear function and the constraints are linear, then the problem is called a linear program (LP) (Cherry, 2016).

“The problem is called a nonlinear program (NLP) if the objective function is nonlinear and/or the feasible region is determined by nonlinear constraints.” (Bradley et al., 1977, p. 410)

The assumptions and approximations used in linear programming can sometimes produce an acceptable model over the decision-variable ranges of interest. In other cases, however, nonlinear behavior in the objective function and/or the constraints must be captured to formulate the application as a mathematical program accurately (Bradley et al., 1977).

Nonlinear programming refers to methods for solving an NLP. Although many mature solvers exist for LPs, NLPs, especially those involving higher-order nonlinearities, are typically much harder to solve (Cherry, 2016).

Difficult nonlinear programming problems arise in areas such as electronic circuit design, financial portfolio optimization, gas network optimization, and chemical process design (Cherry, 2016).

One approach to tackling nonlinear programming problems is to linearize them and use an LP solver to obtain a good approximation. There are several reasons to do this. For instance, linear models are often faster to solve and can be more numerically stable.

To move from theory to a working Python model, we will walk through the following sections:

This article assumes some familiarity with mathematical programming. After defining Separable Functions, we present a separable nonlinear maximization Example and outline an approach based on piecewise linear (PWL) approximations of the nonlinear objective function.

Next, we define Special Ordered Sets of Type 2 and explain how they support the numerical formulation.

We then introduce the theoretical background in On Convex and Concave Functions, providing tools used throughout the rest of this work on nonlinear programming (NLP).

Finally, we present a Python Implementation that uses Gurobipy and PWL approximations of the nonlinear objective function, enabling LP/MIP solvers to obtain useful approximations for fairly large NLPs.


Separable Functions

This article's solution procedure targets separable programs: optimization problems of the form:

\[\begin{aligned}\text{Maximize }&\sum\limits_{j=0}^{n-1}f_j(x_j)\\\text{Subject to }&\sum\limits_{j=0}^{n-1} g_{ij}(x_j)\leq 0\qquad(i=0,1,\dots,m-1)\\&x\in\mathbb{R}^n\end{aligned}\]

where each of the functions \(f_j\) and \(g_{ij}\) is known. These problems are called separable because the decision variables appear separately: one in each constraint term \(g_{ij}\) and one in each objective term \(f_j\) (Bradley et al., 1977).


Example

Consider the following problem from Bradley et al. (1977), which arises in portfolio selection:

\[\begin{aligned}\text{Maximize }&f(x)=20x_0+16x_1-2x_0^2-x_1^2-(x_0+x_1)^2\\\text{Subject to }&x_0+x_1\leq5\\&x_0\geq0,\;x_1\geq0.\end{aligned}\]

Objective function (Animation by the author).
Objective function over the feasible region (Animation by the author).

As stated, the problem is not separable because of the term \((x_0+x_1)^2\) in the objective function. However, nonseparable problems can frequently be reduced to a separable form using various formulation tricks. Specifically, by letting \(x_2=x_0+x_1\), we can re-express the problem above in separable form as:

\[\begin{aligned}\text{Maximize }&f(x)=20x_0+16x_1-2x_0^2-x_1^2-x_2^2\\\text{Subject to }&x_0+x_1\leq5\\&x_0+x_1-x_2=0\\&x_0\geq0,\;x_1\geq0,\;x_2\geq 0.\end{aligned}\]

Clearly, the linear constraints are separable, and the objective function can now be written as \(f(x)=f_0(x_0)+f_1(x_1)+f_2(x_2)\), where

\[\begin{aligned}f_0(x_0)&=20x_0-2x_0^2\\f_1(x_1)&=16x_1-x_1^2\end{aligned}\]

and

\[f_2(x_2)=-x_2^2.\]

Nonlinear functions (Image by the author).

To form the approximation problem, we approximate each nonlinear term \(f_j(x_j)\) by a piecewise linear function \(\hat{f_j}(x_j)\) using a specified number of breakpoints.

Each PWL function \(\hat{f_j}(x_j)\) is defined over an interval \([l,u]\) and consists of line segments whose endpoints lie on the nonlinear function \(f_j(x_j)\), starting at \((l,f_j(l))\) and ending at \((u,f_j(u))\). We represent the PWL functions using the following parametrization. One can write any \(l\leq x_j\leq u\) as:

\[x_j=\sum\limits_{i=0}^{r-1}\theta_j^i a_i,\quad \hat f_j(x_j)=\sum\limits_{i=0}^{r-1}\theta_j^if_j(a_i),\quad\sum\limits_{i=0}^{r-1}\theta_j^i=1,\quad\theta_j=(\theta_j^0,\theta_j^1,\dots,\theta_j^{r-1})\in\mathbb{R}_{\geq0}^r\]

for each \(j=0,1,2\), where the PWL functions are specified by the points \(\{(a_i,f_j(a_i))\}\) for \(i=0,1,\dots,r-1\) (Cherry, 2016). Here, we use 4 uniformly distributed breakpoints to approximate each \(f_j(x_j)\). Also, since \(x_0+x_1\leq5\), \(x_0+x_1=x_2\), and \(x_0,x_1,x_2\geq0\), we have that

\[0\leq x_0\leq5,\qquad0\leq x_1\leq 5,\quad\text{ and }\quad0\leq x_2\leq5.\]

And so, our approximations need not extend beyond these variable bounds. Therefore, the PWL function \(\hat{f_j}(x_j)\) will be defined by the points \((0,f_j(0))\), \((5/3,f_j(5/3))\), \((10/3,f_j(10/3))\), and \((5,f_j(5))\) for \(j=0,1,2\).

Any \(x_0\in[0,5]\) can be written as

\[\begin{aligned}x_0&=\theta_0^0\cdot0+\theta_0^1\cdot\frac{5}{3}+\theta_0^2\cdot\frac{10}{3}+\theta_0^3\cdot5\\&=\theta_0^1\cdot\frac{5}{3}+\theta_0^2\cdot\frac{10}{3}+\theta_0^3\cdot 5\quad\text{ s.t. }\sum\limits_{i=0}^3\theta_0^i=1\end{aligned}\]

and \(f_0(x_0)\) can be approximated as

\[\begin{aligned}\hat f_0(x_0)&=\theta_0^0f_0(0)+\theta_0^1f_0\left(\frac{5}{3}\right)+\theta_0^2f_0\left(\frac{10}{3}\right)+\theta_0^3f_0(5)\\&=\theta_0^1\cdot\frac{250}{9}+\theta_0^2\cdot\frac{400}{9}+\theta_0^3\cdot50.\end{aligned}\]

The same reasoning applies to \(x_1\) and \(x_2\). For instance, evaluating the approximation at \(x_0=1.5\) gives:

\[\begin{aligned}\hat f_0(1.5)&=\frac{1}{10}f_0(0)+\frac{9}{10}f_0\left(\frac{5}{3}\right)\\&=\frac{1}{10}\cdot 0+\frac{9}{10}\cdot\frac{250}{9}\\&=25\end{aligned}\]

since

\[1.5=\frac{1}{10}\cdot0+\frac{9}{10}\cdot\frac{5}{3}.\]

PWL approximation curves (Image by the author).
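The weight computation above can be spot-checked numerically. The sketch below (the helper name is mine, not from the article's code) recovers the two adjacent weights for a given \(x\) and reproduces \(\hat f_0(1.5)=25\):

```python
import numpy as np

f0 = lambda x: 20.0*x - 2.0*x**2        # nonlinear term from the Example
a = np.linspace(0.0, 5.0, 4)            # breakpoints 0, 5/3, 10/3, 5

def pwl_weights(x, a):
    """Weights of x as a convex combination of two adjacent breakpoints."""
    k = min(np.searchsorted(a, x, side='right') - 1, len(a) - 2)
    theta = np.zeros(len(a))
    theta[k] = (a[k+1] - x) / (a[k+1] - a[k])
    theta[k+1] = (x - a[k]) / (a[k+1] - a[k])
    return theta

theta = pwl_weights(1.5, a)   # [0.1, 0.9, 0, 0]
x_rep = theta @ a             # recovers 1.5
f0_hat = theta @ f0(a)        # 25.0, matching the worked example
```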

Such weighted sums are called convex combinations, i.e., linear combinations of points with non-negative coefficients that sum to 1. Now, since \(f_0(x_0)\), \(f_1(x_1)\), and \(f_2(x_2)\) are concave functions, one may ignore additional restrictions, and the problem can be solved as an LP (Bradley et al., 1977). In general, however, such a formulation does not suffice. Specifically, we still must enforce the adjacency condition: at most two \(\theta_j^i\) weights are positive, and if two weights are positive, then they are adjacent, i.e., of the form \(\theta_j^i\) and \(\theta_j^{i+1}\). An analogous restriction applies to each approximation. For instance, if the weights are \(\theta_0^0=2/5\) and \(\theta_0^2=3/5\), then the approximation gives

\[\begin{aligned}\hat f_0(x_0)&=\frac{2}{5}\cdot 0+\frac{3}{5}\cdot\frac{400}{9}=\frac{80}{3}\\x_0&=\frac{2}{5}\cdot0+\frac{3}{5}\cdot\frac{10}{3}=2.\end{aligned}\]

In contrast, at \(x_0=2\), the approximation curve gives

\[\begin{aligned}\hat f_0(x_0)&=\frac{4}{5}\cdot\frac{250}{9}+\frac{1}{5}\cdot\frac{400}{9}=\frac{280}{9}\\x_0&=\frac{4}{5}\cdot\frac{5}{3}+\frac{1}{5}\cdot\frac{10}{3}=2.\end{aligned}\]

Need for the adjacency condition (Image by the author).
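The two weight choices above are easy to reproduce numerically. The following sketch (variable names are mine) shows that both represent \(x_0=2\), while the non-adjacent choice undervalues the PWL curve:

```python
import numpy as np

f0 = lambda x: 20.0*x - 2.0*x**2
a = np.linspace(0.0, 5.0, 4)
fa = f0(a)                                  # [0, 250/9, 400/9, 50]

theta_bad = np.array([2/5, 0.0, 3/5, 0.0])  # non-adjacent weights
theta_adj = np.array([0.0, 4/5, 1/5, 0.0])  # adjacent weights

x_bad, f_bad = theta_bad @ a, theta_bad @ fa  # 2.0, 80/3
x_adj, f_adj = theta_adj @ a, theta_adj @ fa  # 2.0, 280/9
```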

One can enforce the adjacency condition by introducing additional binary variables into the model, as formulated in Cherry (2016). However, we instead leverage Gurobipy's SOS (Special Ordered Set) constraint object.

The complete program is as follows:

\[\begin{alignat}{2}\text{Maximize }&\hat f_0(x_0)+\hat f_1(x_1)+\hat f_2(x_2)\\\text{Subject to }&x_0+x_1\leq5\\&x_0+x_1-x_2=0\\&x_0=\sum\limits_{i=0}^{r-1}\theta_0^ia_i\\&x_1=\sum\limits_{i=0}^{r-1}\theta_1^ia_i\\&x_2=\sum\limits_{i=0}^{r-1}\theta_2^ia_i\\&\hat f_0=\sum\limits_{i=0}^{r-1}\theta_0^if_0(a_i)\\&\hat f_1=\sum\limits_{i=0}^{r-1}\theta_1^if_1(a_i)\\&\hat f_2=\sum\limits_{i=0}^{r-1}\theta_2^if_2(a_i)\\&\sum\limits_{i=0}^{r-1}\theta_j^i=1&&\qquad j=0,1,2\\&\{\theta_j^0,\theta_j^1,\dots,\theta_j^{r-1}\}\text{ SOS type 2 }&&\qquad j=0,1,2\\&x_0,x_1,x_2\in[0,5]\cap\mathbb{R}\\&\hat f_0\in[0,50]\cap\mathbb{R}\\&\hat f_1\in[0,55]\cap\mathbb{R}\\&\hat f_2\in [-25,0]\cap\mathbb{R}\\&\theta_j^i\in[0,1]\cap\mathbb{R}&&\qquad j=0,1,2,\;i=0,1,\dots,r-1.\end{alignat}\]


Special Ordered Sets of Type 2

“A set of variables \(\{x_0,x_1,\dots,x_{n-1}\}\) is a special ordered set of type 2 (SOS2) if \(x_ix_j=0\) whenever \(|i-j|\geq2\), i.e., at most two variables in the set can be nonzero, and if two variables are nonzero they must be adjacent in the set.” (de Farias et al., 2000, p. 1)

In our formulation, the adjacency condition on the \(\theta_j^i\) variables matches a SOS type 2 condition exactly: at most two \(\theta_j^i\) can be positive, and they must correspond to adjacent breakpoints. Using SOS type 2 constraints lets us enforce adjacency directly in Gurobi, which applies specialized branching strategies that automatically exploit the SOS structure, instead of introducing extra binary variables by hand.


On Convex and Concave Functions

For the following discussion, we take a general separable constrained maximization problem, as previously defined above:

\[\begin{aligned}\text{Maximize }&\sum\limits_{j=0}^{n-1}f_j(x_j)\\\text{Subject to }&\sum\limits_{j=0}^{n-1} g_{ij}(x_j)\leq 0\qquad(i=0,1,\dots,m-1)\\&x\in\mathbb{R}^n.\end{aligned}\]

where each of the functions \(f_j\) and \(g_{ij}\) is known.

Throughout, the term PWL approximation refers to the secant interpolant obtained by connecting ordered breakpoint pairs \((a_k,f(a_k))\) with line segments.

Convex and concave functions are among the most essential functional forms in mathematical optimization. Formally, a function \(f(x)\) is called convex if, for every \(y\) and \(z\) and every \(0\leq\lambda\leq1\),

\[f[\lambda y+(1-\lambda)z]\leq\lambda f(y)+(1-\lambda)f(z).\]

This definition implies that the sum of convex functions is convex, and nonnegative multiples of convex functions are also convex. Concave functions are simply the negatives of convex functions, and they satisfy the definition above with the inequality reversed. Of course, some functions are neither convex nor concave.

Convex and concave functions (Eric W. Weisstein on MathWorld)

It is straightforward to see that linear functions are both convex and concave. This property is essential because it allows us to optimize (maximize or minimize) linear objective functions using computationally efficient techniques, such as the simplex method in linear programming (Bradley et al., 1977).

Moreover, a single-variable differentiable function \(f(x)\) is convex on an interval exactly when its derivative \(f^\prime(x)\) is nondecreasing, meaning the slope never goes down. Likewise, a differentiable \(f(x)\) is concave on an interval exactly when \(f^\prime(x)\) is nonincreasing, meaning the slope never goes up.

From the definition above, one can conclude that PWL approximations overestimate convex functions, since a line segment joining two points on the graph of a convex function never lies below the graph. Similarly, PWL approximations underestimate concave functions.
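These over- and underestimation properties are easy to check numerically. A minimal sketch using `np.interp` as the secant interpolant (the function choices are illustrative; \(f_0\) is taken from the Example):

```python
import numpy as np

g = lambda x: x**2                   # convex
f = lambda x: 20.0*x - 2.0*x**2      # concave (f_0 from the Example)

a = np.linspace(0.0, 5.0, 4)         # breakpoints
xs = np.linspace(0.0, 5.0, 101)      # dense evaluation grid

g_pwl = np.interp(xs, a, g(a))       # secant interpolant of g
f_pwl = np.interp(xs, a, f(a))       # secant interpolant of f

overestimates = bool(np.all(g_pwl >= g(xs) - 1e-9))   # chords lie above convex g
underestimates = bool(np.all(f_pwl <= f(xs) + 1e-9))  # chords lie below concave f
```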

Nonlinear Objective Function

This notion helps explain why we can omit the SOS type 2 constraints from a problem such as the one in the Example section. By concavity, the PWL approximation curve lies above the segment joining any two non-adjacent points. Therefore, the maximization will land on the approximation curve, with only adjacent weights positive. A similar argument applies if three or more weights are positive (Bradley et al., 1977). Formally, suppose that every \(f_j(x_j)\) is a concave function and every known \(g_{ij}\) is a linear function in \(x\). Then we can apply the same procedure as in the Example section: we let the breakpoints satisfy

\[l= a_0\lt a_1\lt \dots\lt a_{r-1}= u,\qquad r\geq 2\]

for real numbers \(l\leq x_j\leq u\). Also, let \(\theta_j\in\mathbb{R}_{\geq0}^r\) be such that

\[\sum\limits_{i=0}^{r-1}\theta_j^i=1\]

and assign

\[x_j=\sum\limits_{i=0}^{r-1}\theta_j^i\cdot a_i,\qquad\hat f_j=\sum\limits_{i=0}^{r-1}\theta_j^if_j(a_i)\]

for each \(j=0,1,\dots,n-1\). And so, the transformed LP is as follows:

\[\begin{alignat}{2}\text{Maximize }&\sum\limits_{j=0}^{n-1}\hat f_j\\\text{Subject to }&\sum\limits_{j=0}^{n-1} g_{ij}(x_j)\leq 0&&\qquad(i=0,1,\dots,m-1)\\&x_j=\sum\limits_{i=0}^{r-1}\theta_j^i\cdot a_i&&\qquad(j=0,1,\dots,n-1)\\&\hat f_j=\sum\limits_{i=0}^{r-1}\theta_j^if_j(a_i)&&\qquad(j=0,1,\dots,n-1)\\&\sum\limits_{i=0}^{r-1}\theta_j^i=1&&\qquad(j=0,1,\dots,n-1)\\&\theta_j\in\mathbb{R}_{\geq 0}^r&&\qquad(j=0,1,\dots,n-1)\\&x\in\mathbb{R}^n.\end{alignat}\]

Now, let \(p_j(x_j)\) be the PWL curve that passes through the points \((a_i,f_j(a_i))\), i.e., for every \(x_j\in[a_k,a_{k+1}]\) and \(j=0,1,\dots,n-1\):

\[p_j(x_j)=\frac{a_{k+1}-x_j}{a_{k+1}-a_k}f_j(a_k)+\frac{x_j-a_k}{a_{k+1}-a_k}f_j(a_{k+1}).\]

Concavity of \(f_j(x_j)\) implies that its secant slopes are non-increasing, and since \(p_j(x_j)\) is constructed by joining ordered breakpoints on the graph of \(f_j(x_j)\), we have that \(p_j(x_j)\) is also concave. And so, by Jensen's inequality,

\[p_j(x_j)=p_j\left(\sum\limits_{i=0}^{r-1}\theta_j^i\cdot a_i\right)\geq\sum\limits_{i=0}^{r-1}\theta_j^i\cdot p_j(a_i)=\sum\limits_{i=0}^{r-1}\theta_j^if_j(a_i)=\hat f_j.\]

Furthermore, for every \(x_j\) in any feasible solution \(((x_0,\hat{f_0},\theta_0),(x_1,\hat{f_1},\theta_1),\dots,(x_{n-1},\hat{f}_{n-1},\theta_{n-1}))\), one can choose \(k\) such that \(x_j\in[a_k,a_{k+1}]\) and define the vector \(\hat{\theta_j}\) by

\[\hat\theta_j^k=\frac{a_{k+1}-x_j}{a_{k+1}-a_k},\qquad \hat\theta_j^{k+1}=\frac{x_j-a_k}{a_{k+1}-a_k},\qquad\hat\theta_j^i=0\;\;(i\notin \{k,k+1\})\]

with \(\hat\theta_j^k,\hat\theta_j^{k+1}\geq 0\) and \(\hat\theta_j^k+\hat\theta_j^{k+1}=1\). Then \(\hat\theta_j^k\cdot a_k+\hat\theta_j^{k+1}\cdot a_{k+1}=x_j\), and this gives

\[\hat f_j^\text{new}=\hat\theta_j^k f_j(a_k)+\hat\theta_j^{k+1}f_j(a_{k+1})=p_j(x_j)\geq\hat f_j.\]

Since the constraints depend on \(\theta_j\) only through the induced value \(x_j\), replacing each \(\theta_j\) by \(\hat\theta_j\) preserves feasibility (it leaves each \(x_j\) unchanged) and weakly improves the objective (it increases each \(\hat f_j\) to \(p_j(x_j)\)). Therefore, the SOS type 2 constraints that enforce the adjacency condition do not change the optimal value in separable maximization problems with concave objective functions. A similar argument shows that the SOS type 2 constraints do not change the optimal value in separable minimization problems with convex objective functions.
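The replacement argument can be simulated directly: start from an arbitrary (possibly non-adjacent) weight vector and rebuild the adjacent one for the same \(x\). A sketch under the Example's \(f_0\) (the helper logic is mine):

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: 20.0*x - 2.0*x**2      # concave f_0 from the Example
a = np.linspace(0.0, 5.0, 4)

# Arbitrary feasible weights: nonnegative, summing to 1, not necessarily adjacent.
theta = rng.dirichlet(np.ones(len(a)))
x = theta @ a
f_hat = theta @ f(a)

# Adjacent (SOS2-feasible) replacement for the same x:
k = min(np.searchsorted(a, x, side='right') - 1, len(a) - 2)
theta_hat = np.zeros(len(a))
theta_hat[k] = (a[k+1] - x) / (a[k+1] - a[k])
theta_hat[k+1] = (x - a[k]) / (a[k+1] - a[k])
f_new = theta_hat @ f(a)             # equals p(x); never below f_hat
```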

Note, however, that this only shows that there exists an optimal solution in which only weights corresponding to adjacent breakpoints are positive. It does not guarantee that all optimal solutions of the LP satisfy adjacency. Therefore, one may still include SOS type 2 constraints to enforce a consistent representation.

In practice, solving models with concave objective functions via PWL approximations can yield an objective value that underestimates the true optimal value. Conversely, solving models with convex objective functions via PWL approximations can yield an objective value that overestimates the true optimal value.

However, although these approximations can distort the objective value, in models where the original linear constraints remain unchanged, feasibility is not affected by the PWL approximations. Any variable assignment found by solving the model through PWL approximations satisfies the same linear constraints and therefore lies in the original feasible region. Consequently, one can evaluate the original nonlinear objective \(f(x)\) at the returned solution to obtain its objective value, although this value need not equal the true nonlinear optimum.
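For the Example, evaluating the original objective at a candidate point is a one-liner (the point used below is the one the 4-breakpoint model reports in the output further down; here the two values happen to coincide because the point sits on breakpoints):

```python
# Original (nonseparable) objective from the Example
f = lambda x0, x1: 20*x0 + 16*x1 - 2*x0**2 - x1**2 - (x0 + x1)**2

x0, x1 = 5/3, 10/3        # candidate returned by the 4-breakpoint PWL model
true_obj = f(x0, x1)      # 45.0 here; in general it need not match the PWL value
```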

Nonlinear Constraints

What requires different treatment are PWL approximations of nonlinear constraints, since these approximations can change the original feasible region. We define the feasible set of the problem, similar to the one above, by:

\[F=\bigcap\limits_{i=0}^{m-1}\{x\in\mathbb{R}^n:G_i(x)\leq0\},\]

where

\[G_i(x):=\sum\limits_{j=0}^{n-1}g_{ij}(x_j).\]

Now, suppose (for notational simplicity) that all \(G_i(x)\) are nonlinear functions. Approximate each univariate \(g_{ij}(x_j)\) by a PWL function \(\hat g_{ij}(x_j)\) and define:

\[\hat G_i(x):=\sum\limits_{j=0}^{n-1}\hat g_{ij}(x_j).\]

Then

\[\hat F=\bigcap\limits_{i=0}^{m-1}\{x\in\mathbb{R}^n:\hat G_i(x)\leq0\}\]

is the feasible set of the transformed LP that uses PWL approximations. To avoid truly infeasible solutions, we want

\[\hat F\subseteq F.\]

If, instead, \(F\subseteq\hat F\), then there may exist \(x\in \hat F\setminus F\), meaning the model that uses PWL approximations could accept points that violate the original nonlinear constraints.

Feasible sets (Image by the author).

Given the way we construct the PWL approximations, a sufficient condition for \(\hat F\subseteq F\) is that for constraints of the form \(G_i(x)\leq0\) each \(g_{ij}(x_j)\) is convex, and for constraints of the form \(G_i(x)\geq0\) each \(g_{ij}(x_j)\) is concave.

To see why this holds, take any \(x\in\hat F\). First, consider constraints of the form \(G_i(x)\leq0\) where each \(g_{ij}(x_j)\) is convex. Because PWL approximations overestimate convex functions, for any fixed \(i=0,1,\dots,m-1\), we have:

\[(\forall j,\;\hat g_{ij}(x_j)\geq g_{ij}(x_j))\rightarrow\sum\limits_{j=0}^{n-1}\hat g_{ij}(x_j)\geq\sum\limits_{j=0}^{n-1}g_{ij}(x_j)\rightarrow\hat G_i(x)\geq G_i(x).\]

Since \(x\in\hat F\), it satisfies \(\hat G_i(x)\leq0\). Together with \(\hat G_i(x)\geq G_i(x)\), this implies \(G_i(x)\leq0\).

Next, consider constraints of the form \(G_i(x)\geq0\) where each \(g_{ij}(x_j)\) is concave. Because PWL approximations underestimate concave functions, for any fixed \(i=0,1,\dots,m-1\), we have:

\[(\forall j,\;\hat g_{ij}(x_j)\leq g_{ij}(x_j))\rightarrow\sum\limits_{j=0}^{n-1}\hat g_{ij}(x_j)\leq\sum\limits_{j=0}^{n-1}g_{ij}(x_j)\rightarrow\hat G_i(x)\leq G_i(x).\]

Again, since \(x\in\hat F\), it satisfies \(\hat G_i(x)\geq0\). Combined with \(\hat G_i(x)\leq G_i(x)\), this implies \(G_i(x)\geq0\).
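A numerical illustration of the containment argument, using a hypothetical convex constraint term \(g(x)=x^2\leq4\) on \([0,5]\) (my example, not from the article):

```python
import numpy as np

g = lambda x: x**2                   # convex constraint term (hypothetical)
b = 4.0

a = np.linspace(0.0, 5.0, 4)         # breakpoints
xs = np.linspace(0.0, 5.0, 501)      # dense evaluation grid
g_pwl = np.interp(xs, a, g(a))       # PWL overestimator of g

F_hat = xs[g_pwl <= b]               # points feasible for the PWL constraint
# Every such point also satisfies the original constraint, i.e. F_hat is inside F:
safe = bool(np.all(g(F_hat) <= b + 1e-9))
```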

This motivates the notion of a convex set. A set \(C\) is convex if, for any two points \(x,y\in C\), the entire line segment connecting them lies inside \(C\). Formally, \(C\) is convex if for all \(x,y\in C\) and any \(\lambda\in[0,1]\), the point \(\lambda x+(1-\lambda)y\) is also in \(C\).


Specifically, if \(G_i:\mathbb{R}^n\to\mathbb{R}\) is convex, then

\[S_i:=\{x\in\mathbb{R}^n:G_i(x)\leq b\}\]

is convex for any \(b\in\mathbb{R}\). Proof: take any \(x_1,x_2\in S_i\); then \(G_i(x_1)\leq b\) and \(G_i(x_2)\leq b\). Since \(G_i(x)\) is convex, for any \(\lambda\in[0,1]\) we have that

\[G_i(\lambda x_1+(1-\lambda)x_2)\leq\lambda G_i(x_1)+(1-\lambda)G_i(x_2)\leq\lambda b+(1-\lambda)b=b.\]

Hence, \(\lambda x_1+(1-\lambda) x_2\in S_i\), and so \(S_i\) is convex. A similar argument shows that

\[T_i:=\{x\in\mathbb{R}^n:G_i(x)\geq b\}\]

is convex for any \(b\in\mathbb{R}\) if \(G_i:\mathbb{R}^n\to\mathbb{R}\) is concave.
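The level-set proof above can be spot-checked by random sampling (illustrative code, with \(G(x)=\|x\|^2\) and \(b=1\), both my choices):

```python
import numpy as np

rng = np.random.default_rng(1)
G = lambda x: float(np.sum(x**2))    # convex function on R^2
b = 1.0
in_S = lambda x: G(x) <= b           # level set S = {x : G(x) <= b}

convex_ok = True
for _ in range(1000):
    x, y = rng.uniform(-1, 1, 2), rng.uniform(-1, 1, 2)
    if in_S(x) and in_S(y):
        lam = rng.uniform()
        convex_ok &= in_S(lam*x + (1 - lam)*y)   # segment stays inside S
```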

And since one can easily show that the intersection of any collection of convex sets is also convex,

“for a convex feasible set we wish convex functions for less-than-or-equal-to constraints and concave functions for greater-than-or-equal to constraints.” (Bradley et al., 1977, p.418)

Nonlinear optimization problems in which the goal is to minimize a convex objective (or maximize a concave one) over a convex feasible set are called convex programs. These problems are generally easier to tackle than nonconvex ones.

Indeed, if \(F\subseteq\hat F\), then variable assignments may lead to constraint violations. In these cases, one would need to rely on other techniques, such as the simple MAP modification of the Frank-Wolfe algorithm provided in Bradley et al. (1977), and/or introduce tolerances that, to some extent, allow constraint violations.


Python Implementation

Before presenting the code, it is helpful to consider how our modeling approach scales as the problem size increases. For a separable program with \(n\) decision variables and \(r\) breakpoints per variable, the model introduces \(n\cdot r\) continuous variables (for the convex combinations) and \(n\) SOS type 2 constraints. Both counts grow linearly in \(n\), but the variable count also grows with \(r\), so models can quickly become large when either parameter is increased.

As previously mentioned, we will use Gurobipy to solve the system above numerically. In general, the procedure is as follows:

  1. Select bounds \([l,u]\)
  2. Select breakpoints \(a_i\)
  3. Add \(\theta\) and the convex-combination constraints
  4. Add SOS type 2 constraints
  5. Solve
  6. Evaluate the nonlinear objective at the returned \(x\)
  7. Refine breakpoints if needed

After importing the necessary libraries and dependencies, we declare and instantiate our nonlinear functions \(f_0\), \(f_1\), and \(f_2\):

# Imports
import gurobipy as gp
from gurobipy import GRB
import numpy as np

# Nonlinear functions
f0=lambda x0: 20.0*x0-2.0*x0**2
f1=lambda x1: 16.0*x1-x1**2
f2=lambda x2: -x2**2

Next, we choose the number of breakpoints. To construct the PWL approximations, we need to distribute them across the interval \([0,5]\). Many strategies exist for this. For instance, one could cluster more points near the ends or even rely on adaptive procedures as in Cherry (2016). For simplicity, we stick to uniform sampling:

lb,ub=0.0,5.0
r=4  # Number of breakpoints
a_i=np.linspace(start=lb,stop=ub,num=r)  # Distribute breakpoints uniformly

Then, we can compute the values of the functions \(f_j(x_j)\) at the chosen breakpoints, which define the PWL approximations:

# Compute function values at breakpoints
f0_hat_r=f0(a_i)
f1_hat_r=f1(a_i)
f2_hat_r=f2(a_i)

The next step is to declare and instantiate a Gurobipy optimization model:

model=gp.Model()

After creating our model, we need to add the decision variables. Gurobipy provides methods to add multiple decision variables to a model at once, and we leverage this feature to declare and instantiate some of our decision variables, namely the coefficients of the PWL approximations. Note that all decision variables in our model are continuous and may take any real value within the provided bounds. Such models can be solved efficiently using modern software packages such as Gurobi.

# Decision variables
x0=model.addVar(lb=0.0,ub=5.0,name='x0',vtype=GRB.CONTINUOUS)
x1=model.addVar(lb=0.0,ub=5.0,name='x1',vtype=GRB.CONTINUOUS)
x2=model.addVar(lb=0.0,ub=5.0,name='x2',vtype=GRB.CONTINUOUS)

f0_hat=model.addVar(lb=0.0,ub=50.0,name='f0_hat',vtype=GRB.CONTINUOUS)
f1_hat=model.addVar(lb=0.0,ub=55.0,name='f1_hat',vtype=GRB.CONTINUOUS)
f2_hat=model.addVar(lb=-25.0,ub=0.0,name='f2_hat',vtype=GRB.CONTINUOUS)

theta=model.addVars(range(0,3),range(0,r),lb=0.0,ub=1.0,name='theta',vtype=GRB.CONTINUOUS)

Moreover, we need to add constraints that limit the values our decision variables may take. First, we add the constraints that ensure the correct selection of the variables that yield the approximations of the nonlinear functions:

# Constraints
model.addConstr(x0==gp.quicksum(theta[0,i]*a_i[i] for i in range(0,r)))
model.addConstr(x1==gp.quicksum(theta[1,i]*a_i[i] for i in range(0,r)))
model.addConstr(x2==gp.quicksum(theta[2,i]*a_i[i] for i in range(0,r)))

model.addConstr(f0_hat==gp.quicksum(theta[0,i]*f0_hat_r[i] for i in range(0,r)))
model.addConstr(f1_hat==gp.quicksum(theta[1,i]*f1_hat_r[i] for i in range(0,r)))
model.addConstr(f2_hat==gp.quicksum(theta[2,i]*f2_hat_r[i] for i in range(0,r)))

model.addConstr(gp.quicksum(theta[0,i] for i in range(0,r))==1)
model.addConstr(gp.quicksum(theta[1,i] for i in range(0,r))==1)
model.addConstr(gp.quicksum(theta[2,i] for i in range(0,r))==1)

Second, we add the constraints that define the original problem, namely the restrictions from the nonlinear formulation:

model.addConstr(x0+x1<=5.0)
model.addConstr(x0+x1-x2==0.0)

And finally, we add the constraints that enforce the adjacency condition:

# Special ordered sets
model.addSOS(GRB.SOS_TYPE2,[theta[0,i] for i in range(0,r)])
model.addSOS(GRB.SOS_TYPE2,[theta[1,i] for i in range(0,r)])
model.addSOS(GRB.SOS_TYPE2,[theta[2,i] for i in range(0,r)])

All that remains is to define the objective function, which we would like to maximize:

model.setObjective(f0_hat+f1_hat+f2_hat,GRB.MAXIMIZE)  # Objective function

The model is now complete, and we can use the Gurobi solver to retrieve the optimal value and the optimal variable assignments. The code to do this is shown below:

model.optimize()  # Optimize!

obj_val=model.ObjVal

res={}

all_vars=model.getVars()
values=model.getAttr('X',all_vars)
names=model.getAttr('VarName',all_vars)

# Retrieve values of variables after optimizing
for name,val in zip(names,values):
    res[name]=val

After executing the code above and printing attributes to the screen, the output is as follows:

- Status: 2
- Number of breakpoints: 4
- Optimal objective value: 45.00
- Optimal x0, f0(x0) value (PWL approximation): 1.67, 27.78
- Optimal x1, f1(x1) value (PWL approximation): 3.33, 42.22
- Optimal x2, f2(x2) value (PWL approximation): 5.00, -25.00
PWL approximation curves (3 segments) with optimal values (Image by the author).

The difference between our computed solution and the true optimal value of 139/3 is 4/3. To increase accuracy and obtain a better solution, one could add more breakpoints, thereby increasing the number of segments used to approximate the nonlinear functions. For instance, the output below shows how increasing the number of breakpoints to 16 yields an approximation with practically no error relative to the true optimal value:

- Status: 2
- Number of breakpoints: 16
- Optimal objective value: 46.33
- Optimal x0, f0(x0) value (PWL approximation): 2.33, 35.78
- Optimal x1, f1(x1) value (PWL approximation): 2.67, 35.56
- Optimal x2, f2(x2) value (PWL approximation): 5.00, -25.00
PWL approximation curves (15 segments) with optimal values (Image by the author).
Objective function over the feasible region with optimal value using PWL approximation curves consisting of 15 segments (Animation by the author).
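As a cross-check that does not require a Gurobi license, the same PWL LP can be assembled and solved with SciPy's `linprog` (no SOS constraints are needed here, since the objective terms are concave). This is my own sketch, not the article's code:

```python
import numpy as np
from scipy.optimize import linprog

f0 = lambda x: 20.0*x - 2.0*x**2
f1 = lambda x: 16.0*x - x**2
f2 = lambda x: -x**2

def solve_pwl_lp(r):
    """Maximize f0_hat + f1_hat + f2_hat over convex-combination weights."""
    a = np.linspace(0.0, 5.0, r)
    c = -np.concatenate([f0(a), f1(a), f2(a)])  # linprog minimizes, so negate

    # Equalities: sum_i theta_j^i = 1 for j=0,1,2, and x0 + x1 - x2 = 0
    A_eq = np.zeros((4, 3*r))
    for j in range(3):
        A_eq[j, j*r:(j+1)*r] = 1.0
    A_eq[3, :r], A_eq[3, r:2*r], A_eq[3, 2*r:] = a, a, -a
    b_eq = [1.0, 1.0, 1.0, 0.0]

    # Inequality: x0 + x1 <= 5
    A_ub = np.concatenate([a, a, np.zeros(r)]).reshape(1, -1)
    res = linprog(c, A_ub=A_ub, b_ub=[5.0], A_eq=A_eq, b_eq=b_eq,
                  bounds=(0.0, 1.0))
    return -res.fun

val4, val16 = solve_pwl_lp(4), solve_pwl_lp(16)   # 45.0 and ~139/3
```

With 16 breakpoints the spacing is 1/3, so the true optimizer \((7/3, 8/3, 5)\) lands exactly on breakpoints and the LP recovers the optimal value 139/3, consistent with the 46.33 reported above.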

You can find the complete Python code in this GitHub repository: link.


Further Readings

For a more advanced, robust algorithm that uses PWL functions to approximate the nonlinear objective function, Cherry (2016) is a helpful reference. It presents a procedure that begins with coarse approximations, which are refined iteratively using the optimal solution of the LP relaxation from the previous step.

For a deeper understanding of nonlinear programming, including methods, applied examples, and an explanation of why nonlinear problems are intrinsically harder to solve, the reader may refer to Bradley et al. (1977).

Finally, de Farias et al. (2000) present computational experiments that use SOS constraints within a branch-and-cut scheme to solve a generalized assignment problem arising in fiber-optic cable production scheduling.


Conclusion

Nonlinear programming is a relevant area in numerical optimization, with applications ranging from financial portfolio optimization to chemical process control. This article presented a procedure for tackling separable nonlinear programs by replacing nonlinear terms with PWL approximations and solving the resulting model using LP/MIP solvers. We also provided theoretical background on convex and concave functions and explained how these notions relate to nonlinear programming. With these elements, a reader should be able to model and solve LP/MIP-compatible approximations of many nonlinear programming instances without relying on dedicated nonlinear solvers.

Thanks for reading!


References

Cherry, A., 2016. Texas Tech University. Available at: https://ttu-ir.tdl.org/server/api/core/bitstreams/2ffce0e6-87e9-4e4c-b41a-60ea670c5a13/content (Accessed: 27 January 2026).

Bradley, S. P., Hax, A. C. & Magnanti, T. L., 1977. Applied Mathematical Programming. Reading, MA: Addison-Wesley Publishing Company, pp. 410–464. Available at: https://web.mit.edu/15.053/www/AMP-Chapter-13.pdf (Accessed: 27 January 2026).

de Farias, I. R., Johnson, E. L. & Nemhauser, G. L., 2000. Mathematical Programming, 89(1), pp. 187–203. Available at: https://doi.org/10.1007/PL00011392 (Accessed: 27 January 2026).

