The direct methods (such as Gauss--Jordan) for solving linear systems lead to exact solutions in a finite number of steps, with
guaranteed convergence. They are computationally demanding, especially for large
systems, and necessitate a huge amount of memory. However, given systems of equations are often subject to errors due to roundoff and other
factors. The Gaussian elimination is sensitive to roundoff errors, and this sensitivity can lead to inaccurate or even wildly wrong answers. Also, even if Gaussian elimination does not go astray, we cannot improve on a solution once we have found it. For
example, if we use Gaussian elimination to calculate a solution to four decimal places,
there is no way to obtain the solution to eight decimal places except to start over again
and work with increased accuracy (we do not discuss this topic and closely related rate of convergence at this moment, which is related to numerical methods).
In contrast, we can achieve additional accuracy with iterative methods simply by
doing more iterations.
The methods, starting from an initial guess x^{(0)}, iteratively apply some formulas to detect the solution of the system. For this reason, these methods are often indicated as iterative methods.
In this section, we explore one-step iterative methods that may be beneficial. When applied, they proceed iteratively by successively generating sequences of vectors that approach a solution to a linear system. In many instances (such as when the coefficient matrix is sparse, that is, contains many zero entries), iterative methods can be faster and more accurate than direct methods. Furthermore, the amount of memory required for
a sparse coefficient n×n matrix A is proportional to n. Also, iterative methods can be stopped whenever the approximate solution they generate is sufficiently accurate.
To demonstrate applications of iterative methods in linear algebra,
we consider the system of n linear equations with n unknowns x_{1}, x_{2}, … , x_{n} (here n is a positive integer):
where coefficients 𝑎_{i,j} of the system and entries b_{i} are given (known) scalars, which we assume to be real numbers. It is convenient to rewrite the system in compact matrix/vector form:
where f : ℂ^{n×1} ↦ ℂ^{n×1} is some vector-valued function. The process of transferring linear matrix/vector equation \eqref{EqIterate.1} into fixed point form \eqref{EqIterate.2} is known as preprocessing.
A fixed point of a function f : X ↦ Y is a point x ∈ X that is mapped onto itself. That is, x = f(x).
Not every function has a fixed point, but when f is linear, then equation .\eqref{EqIterate.2} has the fixed point---zero vector.
Let (X, ∥ · ∥) be a normed space. A function f : X ↦ X is said to be a contraction on X if there exists a positive real number α < 1 such that
\[
\parallel f(x) - f(y) \parallel \leqslant \alpha\,\parallel x - y \parallel , \qquad \forall\ x, y \in X .
\]
Banach contraction principle is a crucial mathematical result that guarantees the
existence of a fixed point for contraction mapping defined on a Banach space. Essentially, it provides a powerful mathematical instrument for establishing the existence and uniqueness of solutions in a variety of contexts, including optimization problems, differential equations, and iterative numerical approaches. The importance of this theorem stems from its broad applicability across mathematics as well as its function in illustrating the convergence of iterative algorithms in solving real-world
issues.
Banach fixed point Theorem:
Let (X, ∥ · ∥) be a normed space and
f : X ↦ X be a contraction on X. Then . f has exactly one fixed point.
We will start the proof by defining a Cauchy sequence .{x_{n}} in X using the function f. As X is complete, {x_{n}} will converge to a point x ∈ X. We will show that
x is the unique fixed point for f in X. Choose an
x₀ ∈ X and define
\[
x_{n} = f(x_{n-1}) , \qquad n=1,2,3,\ldots .
\]
First we will show that this sequence is Cauchy. As . f is a contraction, we have
\[
\| x_{n+1} - x_n \| = \| f(x_n ) - f(x_{n-1} \| \le \alpha \,\| x_{n} - x_{n-1} \| .
\]
We repeat the same inequlity and obtain
\[
\| x_{n+1} - x_n \| \le \alpha^n \| x_1 - x_0 \| .
\]
By Triangle inequality and summation formula for geometric series, for n > m, we have
\begin{align*}
\| x_m - x_n \| &\le \| x_m - x_{m+1} \| + \| x_{m=1} - x_{m+2} \| + \cdots + \| x_{n-1} - x_n \|
\\
&\le \left( \alpha^m + \alpha^{m-1} + \cdots + \alpha^{n-1} \right) \| x_1 - x_1 \|
&= \alpha^m \frac{1 - \alpha^{n-m}}{1-\alpha} \, x_{n-1} - x_n \|
\end{align*}
As α is a positive real number with α < 1, we have .1 − α^{n−m} < 1 and hence for all n > m
\[
\| x_m - x_n \| \le \frac{\alpha^m}{1-\alpha} \, \| x_1 - x_0 \| .
\]
Since .0 < α < 1 and ∥x₁ − x₀∥ is fixed, if we choose m sufficiently large, we can
make .∥x_{m} − x_{n}∥ as small as possible. Thus .{x_{n}} is Cauchy and as X is complete
x_{n} → x ∈ X.
Now we will prove that x is the only fixed point of f. Suppose that there exists another fixed point for f in X, say y. That is, we have . f(y) = y. Then
\[
\| x- y \| = \| f(x) - f(y) \| \le \alpha \, \| x-y \|
\]
and this implies ∥x − y∥ = 0. Thus .x =
ỹ and hence f has exactly one fixed point.
Example 1:
Consider f : [−1, 1] → [−1, 1] defined by f(x) = cosx. Choose x₀ = 0.7 (an initial approximation). Now, define .x_{n+1} = cos x_{n}.
Proceeding like this, we can approximate the fixed point of f(x)
= cos x as x ≈
0.739 after a certain number of iterations. The more the number of iterations, the
less will be the error associated with it. Keep in mind that any initial point will give
us the fixed point, however, the number of iterations required may vary.
Roger codes:
■
End of Example 1
If equation \eqref{EqIterate.2} has a fixed point, we can organize an iterating process upon choosing an initial guess x^{0}:
If this sequence x^{(k)} converges, we expect that its limit is a solution not only Eq.\eqref{EqIterate.2}, but also the original linear algebra problem \eqref{EqIterate.1}.
It is our hope that the iterative formula above provides a sequence of numbers { x^{(k)} } that converges to the true solution x = c. However, we can stop iteration when we want to. There are known some sufficient conditions on function f that guarantee convergence of fixed point iteration procedure.
If we consider a system of linear equations A x = b, the starting solution x^{(0)},
and the approximated solution at the k-th step x^{(k)}, the solution x = c of the system A x = b, then an approximated method is said to converge to the solution of the system when
\[
\lim_{k\to\infty} \left\| {\bf x}^{(k)} - {\bf c} \right\| = {\bf 0} .
\]
On the contrary, if the above limit does not exist, then the method is said to diverge.
When a sequence of errors e_{k} = x^{(k)} − c convergences fast enough to zero,
iterative methods are competitive with direct methods, which require much
more memory and have a higher computational complexity.
There are several references on iterative methods to solve a system of linear
equations; for example, P.G. Ciarlet, J.J. Dongarra et al., C.T. Kelley, G. Meurant, and Y. Saad.
There are inﬁnite many ways to introduce an equivalent ﬁxed point
problem for the given equation (1) or (2).
In order to apply an iterative method for solving system of equations (1), or its equivalent compact form (2), we need to rearrange this equation to the fixed point form. This can be achieved by splitting the matrix A = S + (A − S), where S is a nonsingular matrix. S is also called a preconditioner, and its choice is crucial in numerical analysis. Then
the equation A x = b is the same as S x = b + (S −A) x. Application of the inverse matrix
S^{−1} yields the fixed point equation, which becomes
There is no guarantee that this method will work. A successful splitting A = S + (A −I) satisfies two different requirements:
The new vector x^{(k+1)} should be easy to compute. Hence, S should be a simple
(and invertible!) matrix; it may be diagonal or triangular.
The sequence { x^{(k)} } should converge to the true solution x = c. If we subtract the iteration
in equation (5) from the true equation x = S^{−1}b − S^{−1}M x, the result is a formula involving only the errors e_{k} = c − x^{(k)}:
matrix I − S^{−1}A is called the iteration matrix.
Equation (6) is a difference equation of first order. It starts with the initial error e₀ and after k steps it produces the new error e_{k}. The question of convergence is exactly
the same as the question of stability: x^{(k)} → c exactly when e_{k} → 0.
If ∥ · ∥ denotes some vector norm (this topic is covered in Part 5 of this tutorial) on
ℝ^{n} and
the corresponding matrix norm of (−S^{−1}M) is less than 1, we claim that x^{(k)} − c → 0 as
k → ∞. Indeed,
Thus, if ∥ (−S^{−1}M) ∥ = ∥ S^{−1}M ∥ < 1, then ∥ x^{(k)} − c ∥ → 0 as k → ∞.
The foregoing result holds for any standard norm on ℝ^{n} because they all equivalent.
Theorem 2:
Let x^{(0)} be an arbitrary vector in ℝ^{n} and deﬁne x^{(k+1)} = S^{−1}b − S^{−1}Mx^{(k)} for k = 0, 1, 2,, …. If x = c is the solution to Eq.(4), then a necessary and sufficient condition for x^{(k)} → c is that all eigenvalues of matrix S^{−1}M be less than 1, i.e., | λ | < 1.
In other words,
the iterative method in equation (5) is convergent if and only if every
eigenvalue of S^{−1}M satisfies | λ | < 1. Its rate of convergence depends on the maximum size of eigenvalues of matrix S^{−1}M. So the largest eigenvalue will eventually be dominant.
.
We will prove the theorem only in the case where B = −S^{−1}M has n linearly independent eigenvectors. The case where B is not diagonalizable is later. If u_{1}, u_{2}, … , u_{n} are n linearly independent eigenvectors of B, we can write
\[
{\bf x}^{(k)} - {\bf c} = c_1 {\bf u}_1 + c_2 {\bf u}_2 + \cdots + c_n {\bf u}_n
\]
and it follows from x^{(k)} − c = B^{k}(x^{(0)} − c) that
\begin{align*}
{\bf x}^{(k)} - {\bf c} &= {\bf B}^k \left( c_1 {\bf u}_1 + c_2 {\bf u}_2 + \cdots + c_n {\bf u}_n \right)
\\
&= c_1 \lambda_1^k {\bf u}_1 + c_2 \lambda_2^k {\bf u}_2 + \cdots + c_n \lambda_n^k {\bf u}_n .
\end{align*}
Hence, x^{(k)} − c → 0 if and only if
| λ_{i} | < 1 for i = 1, 2, … , n.
In this section, we discuss three versions of stationary iterative method: Jacobi's scheme, the Gauss--Seidel method, and the SOR method.
Since they converge slowly, stationary iterative methods are no longer
used alone, but rather used within other algorithms.
In this exercise, you will see that the order of the equations in a system can affect the performance of an iterative solution method. Upon swapping the order of equations in the previous exercise, we get
\[
\begin{split}
2\,x_1 + 6\, x_2 &= -16 ,
\\
5\, x_1 + x_2 &= 2 .
\end{split}
\]
Compute five iterations with both the Jacobi and Gauss--Seidel methods using zero as an initial guess.
Show that if a 2 x 2 coefficient matrix
\[
{\bf A} = \begin{bmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \end{bmatrix}
\]
is strictly diagonally dominant, then the eigenvalues of the iteration matrices B corresponding to the Jacobi and Gauss-Seidel methods are of magnitude < 1 (which guarantees convergence of iteration methods).
Apply Jacobi's iteration scheme to the system
\[
\begin{split}
x_1 + x_2 & = 0 ,
\\
x_1 - x_2 &= 2 .
\end{split}
\]
Do you think that the iteration method converges to the true solution x₁ = 1, x₂ = −1?
Apply Jacobi's iteration scheme for solving
\[
\begin{bmatrix} 2 & 1 \\ 2 & 3 \end{bmatrix} \begin{pmatrix} x_1 \\ x_2
\end{pmatrix} = \begin{pmatrix} \phantom{-}1 \\ -1 \end{pmatrix} .
\]
Recall that norm of the error is ∥e^{(k)}∥ =
∥Be^{(k-1)}∥ = |λ_{max}| ∥e^{(k-1)}∥, where λ_{max} is the largest (in magnitude) eigenvalue of iteration matrix B. In general (for larger systems), this approximate relationship often is not the case for the first few iterations). Show that the error for this system tends to zero.
Use the SOR method to solve the system
\[
\begin{bmatrix} 2 & 1 \\ 3 & 4 \end{bmatrix} \begin{pmatrix} x_1 \\ x_2
\end{pmatrix} = \begin{pmatrix} \phantom{-}1 \\ -1 \end{pmatrix} .
\]
In the previous exercise, find the characteristic equation for the corresponding matrix B depending on parameter ω. What are its eigenvalues?
Solve the system of equations with a direct method:
\[
\begin{cases}
5 x - 2 y + 3 z &= 8,
\\
3 x + 8 y + 2 z &= 23,
\\
2 x + y + 6 z &= 2.
\end{cases}
\]
In order to understand differences and relative advantages among st the three iterative methods, perform 8 iterations using Jacobi's scheme, the Gauss--Seidel method, and SOR iteration procedure.
Let us consider the following system of linear equations:
\[
\begin{cases}
5 x - 2 y + z &= 2,
\\
9 x -3 y + 2 z &= 7,
\\
4 x + y + z &= 9.
\end{cases}
\]
Show that this system cannot
be tackled by means of Jacobi’s nor Gauss-Seidel’s methods. On the other hand, a
tuning of ω can lead to the detection of a good approximation of the solution.
Consider the system of equations
\[
\begin{bmatrix} 1 & -5 & 2 \\ 7 & 1 & 3 \\ 2 & 1 & 8 \end{bmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} -17 \\ 11 \\ -9 \end{pmatrix} .
\]
Show that the given system is not strictly diagonally dominant. Nonetheless, if we swap the first and second
equations, we obtain an equivalent system whose associated matrix is suitable for Jacobi's iteration scheme. Show this by performing 10 iterations.
Ciarlet, P.G., Introduction to Numerical Linear Algebra and Optimization.
Cambridge University Press, Cambridge, 1989.
Dongarra, J.J., Duff, I.S., Sorensen, D.C., and H. van der Vorst, H., Numerical
Linear Algebra for High-Performance Computers. SIAM, 1998.
Gauss, Carl, (1903), Werke (in German), vol. 9, Göttingen: Köninglichen Gesellschaft der Wissenschaften. direct link
Golub, Gene H.; Van Loan, Charles F. (1996), Matrix Computations (3rd ed.), Baltimore: Johns Hopkins, ISBN 978-0-8018-5414-9.
Kahan, W., Gauss–Seidel methods of solving large systems of linear equations, Doctoral Thesis, University of Toronto,
Toronto, Canada, 1958.
Seidel, Ludwig (1874). "Über ein Verfahren, die Gleichungen, auf welche die Methode der kleinsten Quadrate führt, sowie lineäre Gleichungen überhaupt, durch successive Annäherung aufzulösen" [On a process for solving by successive approximation the equations to which the method of least squares leads as well as linear equations generally]. Abhandlungen der Mathematisch-Physikalischen Klasse der Königlich Bayerischen Akademie der Wissenschaften (in German). 11 (3): 81–108.
Strang, G., Introduction to Linear Algebra, Wellesley-Cambridge Press, 5-th edition, 2016.