MATLAB TUTORIAL for the Second Course, part 2.5: Newton's methods

Iterations for Nonlinear Systems

We extend the methods studied previously to the case of systems of nonlinear equations. Let us start with planar case and consider two functions

\[ f(x,y) = x^2 - 2\,x - y + 1, \qquad g(x,y) = x^2 + 9\,y^2 - 9 . \]

We seek a method of solution for the system of nonlinear equations

\[ f(x,y) = 0, \qquad g(x,y) = 0 . \]

By plotting implicitly these two curves, we see that the given system of equations has two roots: (0,1) and (1.88241, 0.778642).

Fixed Point Iteration

We consider the problem \( {\bf x} = {\bf g} \left( {\bf x} \right) , \) where \( {\bf g}: \, \mathbb{R}^n \mapsto \mathbb{R}^n \) is the function from n-dimensional space into the same space. The ﬁxed-point iteration is deﬁned by selecting a vector \( {\bf x}_0 \in \mathbb{R}^n \) and deﬁning a sequence \( {\bf x}_{k+1} = {\bf g} \left( {\bf x}_k \right) , \quad k=0,1,2,\ldots . \)

A point \( {\bf x} \in \mathbb{R}^n \) is a ﬁxed point of \( {\bf g} \left( {\bf x} \right) \) if and only if \( {\bf x} = {\bf g} \left( {\bf x} \right) . \) A mapping \( {\bf f} \left( {\bf x} \right) \) is a contractive mapping if there is a positive number L less than 1 such that

\[ \left\| {\bf f} \left( {\bf x} \right) - {\bf f} \left( {\bf y} \right) \right\| \le L\, \| {\bf x} - {\bf y} \| , \qquad 0 < L < 1 . \]

Theorem: Let \( {\bf g} \left( {\bf x} \right) \) be a continuous function on \( \Omega = \left\{ {\bf x} = (x_1 , x_2 , \ldots x_n ) \, : \ a_i < x_i < b_i \right\} \) such that

\( {\bf g} \left( \Omega \right) \subset \Omega ; \)
\( {\bf g} \left( {\bf x} \right) \) is a contraction.

Then

g has a ﬁxed point \( {\bf x} \in \Omega \) and the sequence \( {\bf x}_{k+1} = {\bf g} \left( {\bf x}_k \right) , \quad k=0,1,2,\ldots ; \quad {\bf x}_0 \in \Omega \) converges to x;
\( \| {\bf x}_k - {\bf x} \| \le L^k \,\| {\bf x}_0 - {\bf x} \| . \) ■

Example. We apply the fixed point iteration to find the roots of the system of nonlinear equations

\[ f(x,y) = x^2 - 2\,x - y + 1 =0, \qquad g(x,y) = x^2 + 9\,y^2 - 9 =0. \]

The first equation can be used to solve for either x or y. Depending on our choice, we can add either 18y or 2x to the second equation. So we have two formulas for iteration:

\[ \begin{cases} x &= \frac{1}{2} \left[ x^2 -y+1 \right] , \\ y &= \frac{1}{18} \left[ x^2 + 9\,y^2 -9 + 18\,y \right] , \end{cases} \qquad\mbox{or} \qquad \begin{cases} y &= \left( x -1 \right)^2 , \\ x &= \frac{1}{2} \left[ (x+1)^2 + 9\,y^2 -10 \right] . \end{cases} \]

These two system of equations are used to write the recursive formulas. Start with an initial point (0,0.5), and then compute the sequence \( \left\{ (x_k , y_k ) \right\} \) according to the formula:

\[ \begin{cases} x_{k+1} &= \frac{1}{2} \left[ x^2_k -y_k +1 \right] , \\ y_{k+1} &= \frac{1}{18} \left[ x^2_k + 9\,y^2_k -9 + 18\,y \right] . \end{cases} \]

Then we repeat calculations according to the second formula:

\[ \begin{cases} y_{k+1} &= \left( x_k -1 \right)^2 , \\ x_{k+1} &= \frac{1}{2} \left[ (x_k +1)^2 + 9\,y_k^2 -10 \right] . \end{cases} \]

x[0] = 0; y[0] = 0.5;
Do[{x[k + 1] = N[(x[k]^2 - y[k] + 1)/2], y[k + 1] = N[(x[k]^2 + 9*y[k]^2 - 9 + 18*y[k])/18]}, {k, 0, 50}]
Do[Print["i=", i, " x= ", x[i], " y= ", y[i]], {i, 0, 10}]

x[0] = 0; y[0] = 0.5;
Do[{y[k + 1] = N[(x[k] - 1)^2], x[k + 1] = N[((x[k] + 1)^2 + 9*y[k]^2 - 10)/2]}, {k, 0, 10}]
Do[Print["i=", i, " x= ", x[i], " y= ", y[i]], {i, 0, 10}]

We summarize the results of computations in the table

k	x_k	y_k	k	x_k	y_k
0	0	0.5	0	0	0.5
1	0.25	0.125	1	-3.375	1
2	0.46875	-0.363715	2	2.32031	19.1406
3	0.791721	-0.785364	3	1649.15	1.74323
4	1.20609	-0.942142	4	1.3615*10¹⁰	2.71639*10⁶
5	1.6984	-0.917512	5	3.41314*10¹³	1.85369*10¹²
6	2.40104	-0.836344	6	5.97938*10²⁶	1.16495*10²⁷
7	3.80067	-0.666331
8	8.0557	-0.141829
9	33.0181	2.9734

So we see that both sequences diverge from the fixed point (0,1).

If we use the starting value (1.5, 0.5), then

k	x_k	y_k	k	x_k	y_k
0	1.5	0.5	0	1.5	0.5
1	1.375	0.25	1	-0.75	0.25
2	1.32031	-0.113715	2	-4.6875	3.0625
3	1.42847	-0.510404	3	44.0039	32.3477
4	1.77547	-0.766785	4	5716.34	1849.34
5	2.45953	-0.797679	5	3.17342056*10⁷	3.26652*10⁷
6	3.92349	-0.643461	6	5.3050884*10¹⁵	1.00706*10¹⁵
7	8.5186	-0.0812318	7	1.86357*10³¹	2.8144*10³¹
8	36.8239	3.45355	8
9	676.774	84.2505	9

So both sequences diverge. ■

We want to determine why our iterative equations were not suitable for finding the solution near both fixed points (0, 1) and (1.88241, 0.778642). To answer this question, we need to evaluate the Jacobian matrix:

\[ {\bf J} (x,y) = \begin{bmatrix} \frac{\partial f}{\partial x} & \frac{\partial f}{\partial y} \\ \frac{\partial g}{\partial x} & \frac{\partial g}{\partial y} \end{bmatrix} . \]

Theorem: Assume that the functions f(x,y) and g(x,y) and their first partial derivatives are continuous in a region that contain the fixed point (p,q). If the starting point is chosen sufficiently close to the fixed point and if

\[ \begin{cases} \left\vert \frac{\partial f}{\partial x}(p,q) \right\vert + \left\vert \frac{\partial f}{\partial y}(p,q) \right\vert < 1, \\ \left\vert \frac{\partial g}{\partial x}(p,q) \right\vert + \left\vert \frac{\partial g}{\partial y}(p,q) \right\vert < 1, \end{cases} \]

then the iteration

\[ x_{k+1} = f(x_k , y_k ) , \qquad y_{k+1} = g(x_k , y_k ) , \qquad k=0,1,2,\ldots ; \]

converges to the fixed point (p,q). ■

Seidel Method

The Gauss--Seidel method could be extended for nonlinear systems. Suppose we have fixed point planar problem

\[ x = f(x , y ) , \qquad y = g(x , y ) . \]

We use these equations to calculate the sequence of points \( \left\{ \left( x_k , y_k \right) \right\}_{n\ge 0} \) according to Seidel iterations:

\[ x_{k+1} = f(x_k , y_k ) , \qquad y_{k+1} = g(x_{k+1} , y_k ) , \qquad k=0,1,2,\ldots . \]

The following is matlab code to solve the nonlinear fixed point system \( {\bf x} = {\bf g} \left( {\bf x} \right) , \) given one initial approximation \( {\bf p}_0 = \left( p_1^{(0)} , p_2^{(0)} , \ldots , p_n^{(0)} \right) , \) and generating a sequence \( {\bf p}_k = \left( p_1^{(k)} , p_2^{(k)} , \ldots , p_n^{(k)} \right) \) that converges to the solution x.

function [p,iter] = seidel(g,p,delta,max1)
% Input:    g is the nonlinear system saved in the m-file g.m
%           p is the initial giuess at the solution
%           delta is the error bound
%           max1 is the number of iterations
% Output:   p is the seidel approximation to the solution
%           iter is the number of iterations required
n=length(p);
for k=1:max1
    x=p;
    % x is the k-th approximation to the solution
    for j=1:n
        A=feval('g',x);
        % update the terms of x as they are calculated
        x(j)=A(j);
    end
    err=abs(norm(x-p));
    relerr=err/(norm(x)+eps);
    p=x;
    iter=k;
    if (err < delta) | (relerr < delta)
        break
    end
end

Wegstein Algorithm

Let \( {\bf g} \, : \, \mathbb{R}^n \mapsto \mathbb{R}^n \) be a continuous function. Suppose we are given a system of equations

\[ \begin{split} x_1 &= g_1 \left( x_1 , x_2 , \ldots , x_n \right) , \\ x_2 &= g_2 \left( x_1 , x_2 , \ldots , x_n \right) , \\ \vdots & \qquad \vdots \\ x_n &= g_n \left( x_1 , x_2 , \ldots , x_n \right) . \end{split} \]

Then Wegstein iteration equations can we written as

\[ \begin{split} x_1^{(k+1)} &= q_1 x_1^{(k)} + \left( 1- q_1 \right) g_1 \left( x_1^{(k)} , x_2^{(k)} , \ldots , x_n^{(k)} \right) , \\ x_2^{(k+1)} &= q_2 x_2^{(k)} + \left( 1- q_2 \right) g_2 \left( x_1^{(k)} , x_2^{(k)} , \ldots , x_n^{(k)} \right) , \\ \vdots & \qquad \vdots \\ x_n^{(k+1)} &= q_n x_n^{(k)} + \left( 1- q_n \right) g_n \left( x_1^{(k)} , x_2^{(k)} , \ldots , x_n^{(k)} \right) , \end{split} \]

where

\[ q_i = \dfrac{\partial g_i / \partial x_i}{\partial g_i / \partial x_i -1} \qquad\mbox{evaluated at} \quad \left( x_1^{(k)} , x_2^{(k)} , \ldots , x_n^{(k)} \right) , \qquad i=0,1,2,\ldots n; \]

Here again k denotes the number of the iteration.

In planar case, we have two equations

\[ x= f(x,y), \qquad y= g(x,y) . \]

The Wegstein iterations become

\[ \begin{cases} X_{k+1} &= q_1 X_k + \left( 1-q_1 \right) f\left( X_k , Y_k \right) , \\ Y_{k+1} &= q_2 Y_k + \left( 1-q_2 \right) g\left( X_k , Y_k \right) , \end{cases} \]

where

\[ q_1 = \frac{f_x}{f_x -1} , \qquad q_2 = \frac{g_y}{g_y -1} \]

are evaluated at the point \( \left( x_k , y_k \right) . \) Then

\[ \begin{cases} x_{k+1} &= q_1 x_k + \left( 1-q_1 \right) f\left( X_{k+1} , Y_{k+1} \right) , \\ y_{k+1} &= q_2 y_k + \left( 1-q_2 \right) g\left( X_{k+1} , Y_{k+1} \right) , \qquad k=0,1,2,\ldots . \end{cases} \]

A condition for Wegstein algorithm to be stable at a root is

\[ \left\vert \frac{f_x \,g_y}{\left( f_x -1 \right)\left( g_y -1 \right)} \right\vert < 1 . \]

Example. We apply the Wegstein algorithm to find the roots of the system of nonlinear equations

\[ x^2 - 2\,x - y + 1 =0, \qquad x^2 + 9\,y^2 - 9 =0. \]

By adding and subtracting x and y from each equation, respectively, we reduce the given problem to fixed point system of equations:

\[ x= f(x,y) = x^2 - x - y + 1 , \qquad y = g(x,y) = x^2 + 9\,y^2 - 9 +y . \]

We know that such system of equations has two real fixed points (0,1) and (1.88241, 0.778642). Their partial derivatives are \( f_x = 2x-1 \) and \( g_y = 18\,y +1 . \) At the first fixed point (0,1) we have

\[ \left\vert \frac{f_x \,g_y}{\left( f_x -1 \right)\left( g_y -1 \right)} \right\vert_{x=0, y=1} = \frac{19}{36} < 1 , \]

while at the another point (1.88241, 0.778642) its value is

\[ \left\vert \frac{f_x \,g_y}{\left( f_x -1 \right)\left( g_y -1 \right)} \right\vert_{x=1.88241, y=0.778642} = 1.67841 > 1 . \]

So we expect that the sequence of Wegstein iterations will converge to the fixed point (0,1), but it will diverge from another fixed point (1.88241, 0.778642).

Newton's Methods

The Newton methods have problems with the initial guess which, in general, has to be selected close to the solution. In order to avoid this problem for scalar equations we combine the bisection and Newton's method. First, we apply the bisection method to obtain a small interval that contains the root and then ﬁnish the work using Newton’s iteration.

For systems, it is known the global method called the descent method of which Newton’s iteration is a special case. Newton’s method will be applied once we get close to a root.

Let us consider the system of nonlinear algebraic equations \( {\bf f} \left( {\bf x} \right) =0 , \) where \( {\bf f} \,:\, \mathbb{R}^n \mapsto \mathbb{R}^n , \) and deﬁne the scalar multivariable function

\[ \phi ({\bf x}) = \frac{1}{2}\, \sum_{i=1}^n f_i^2 ({\bf x} ) = \frac{1}{2}\,{\bf f}^T \,{\bf f} \]

The function φ has the following properties.

\( \phi ({\bf x}) \ge 0 \) for all \( {\bf x} \in \mathbb{R}^n . \)
If x is a solution of \( {\bf f} \left( {\bf x} \right) =0 , \) then φ has a local minimum at x.
At an arbitrary point x₀, the vector \( - \nabla \phi ({\bf x}_0 ) \) is the direction of the most rapid decrease of φ.
φ has inﬁnitely many descent directions.
A direction u is descent direction for φ at x if and only if \( {\bf u}^T \nabla \phi ({\bf x}) < 0 . \)
Special descent directions:
(a) Steepest descent method: \( {\bf u}_k = -JF^T ({\bf x}_k )\, {\bf f} ({\bf x}_k ) , \)
(b) Newton’s method: \( {\bf u}_k = -JF^{-1} ({\bf x}_k )\, {\bf f} ({\bf x}_k ) . \) ■

For scalar problems \( f (x) =0 , \) the descent direction is given by \( - \phi' (x) = -2\,f(x) \,f' (x) \) while Newton’s method yields \( -f(x) / f' (x) \) which has the same sign as the steepest descent method.

The question that arises for all descent methods is how far should one go in a given descent direction. To answer this question, there are several techniques and conditions to guarantee convergence to a minimum of φ. For a detailed discussion consult the book by

Numerical Methods for Unconstrained Optimization and Nonlinear Equations, by J. E. Dennis, Jr. and Robert B. Schnabel. Published: 1996, ISBN: 978-0-89871-364-0
http://www.math.vt.edu/people/adjerids/homepage/teaching/F09/Math5465/chapter2.pdf

The following is a Newton's realization of matlab code to solve the nonlinear systems \( {\bf f} \left( {\bf x} \right) = {\bf 0} , \) given one initial approximation \( {\bf p}_0 = \left( p_1^{(0)} , p_2^{(0)} , \ldots , p_n^{(0)} \right) , \) and generating a sequence \( {\bf p}_k = \left( p_1^{(k)} , p_2^{(k)} , \ldots , p_n^{(k)} \right) \) that converges to the solution x.

function [p,iter, err] = newdim(F,JF,p,delta,epsilon,max1)
% Input:    F is the nonlinear system saved in the m-file F.m
%           JF is the Jacobian of F saved as the m-file JF.m
%           p is the initial giuess at the solution
%           delta is the error bound
%           epsilon is the tolerance for F(p)
%           max1 is the number of iterations
% Output:   p is the seidel approximation to the solution
%           iter is the number of iterations required
%           err is the error estimate for p
Y=feval(F,p);
for k=1:max1
    J=feval(JF,p);
    Q=p-(J\Y')';
    Z=feval(F,Q);
    err=norm(Q-p);
    relerr=err/(norm(Q)+epsilon);
    p=Q;
    Y=Z;
    iter=k;
    if (err < delta)|(relerr < delta)|(abs(Y) < epsilon)
        break
    end
end


Complete

MATLAB TUTORIAL for the Second Course. Part 2.5: Newton's Methods

Fixed Point Iterations

Elementary Row Operations

LU Factorization

PLU Factorization

Reflection

Givens rotation

Row Space and Column Space

Null Space or Kernel

Rank and Nullity

Square Matrices

Cramer's rule

Symmetric and self-adjoint matrices

Cholesky decomposition

Projection Operators

Gram-Schmidt process

QR-decomposition

Least Squares Approximation

SVD factorization

Numerical Methods

Markov Chains

Inequalities

Applications