The direct methods (such as Gauss--Jordan) for solving linear systems produce the exact solution in a finite number of steps, at least in exact arithmetic. They are computationally demanding, especially for large systems, and require a large amount of memory. Moreover, the given systems of equations are often subject to errors due to roundoff and other factors. Gaussian elimination is sensitive to roundoff errors, and this sensitivity can lead to inaccurate or even wildly wrong answers. Also, even if Gaussian elimination does not go astray, there is no way to improve a solution once it has been computed. For example, if we use Gaussian elimination to calculate a solution to four decimal places, the only way to obtain the solution to eight decimal places is to start over and work with increased precision (we postpone the closely related topic of rate of convergence, which belongs to numerical methods). In contrast, we can achieve additional accuracy with iterative methods simply by performing more iterations.

These methods start from an initial guess x(0) and repeatedly apply a formula to approach the solution of the system; for this reason, they are called iterative methods. In this section, we explore one-step iterative methods that may be beneficial. When applied, they generate a sequence of vectors that approaches a solution of the linear system. In many instances (such as when the coefficient matrix is sparse, that is, contains many zero entries), iterative methods can be faster and more accurate than direct methods. Furthermore, the amount of memory required to store a sparse n×n coefficient matrix A is proportional to n. Also, an iterative method can be stopped as soon as the approximate solution it generates is sufficiently accurate.

# Iterative methods

To demonstrate applications of iterative methods in linear algebra, we consider the system of n linear equations with n unknowns x1, x2, … , xn (here n is a positive integer):
$\begin{split} a_{1,1} x_1 + a_{1,2} x_2 + \cdots + a_{1,n} x_n &= b_1 , \\ a_{2,1} x_1 + a_{2,2} x_2 + \cdots + a_{2,n} x_n &= b_2 , \\ \vdots \qquad \vdots \qquad \vdots & \vdots \\ a_{n,1} x_1 + a_{n,2} x_2 + \cdots + a_{n,n} x_n &= b_n , \end{split}$
where the coefficients ai,j of the system and the entries bi are given (known) scalars, which we assume to be real numbers. It is convenient to rewrite the system in compact matrix/vector form:
$$\label{EqIterate.1} {\bf A}\,{\bf x} = {\bf b} ,$$
where
${\bf A} = \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n,1} & a_{n,2} & \cdots & a_{n,n} \end{bmatrix} \in \mathbb{R}^{n \times n}, \qquad {\bf b} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix} , \qquad {\bf x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \in \mathbb{R}^{n \times 1} .$
Eq.\eqref{EqIterate.1} is not suitable for iteration, so it should be reformulated in fixed point form:
$$\label{EqIterate.2} {\bf x} = f(\mathbf{x}) ,$$
where f : ℂn×1 → ℂn×1 is some vector-valued function. The process of transforming the linear matrix/vector equation \eqref{EqIterate.1} into fixed point form \eqref{EqIterate.2} is known as preprocessing.
A fixed point of a function f : X → X is a point x ∈ X that is mapped to itself; that is, x = f(x).
Not every function has a fixed point, but when f is linear, equation \eqref{EqIterate.2} always has at least one fixed point: the zero vector.
Let (X, ∥ · ∥) be a normed space. A function f : X → X is said to be a contraction on X if there exists a positive real number α < 1 such that $\parallel f(x) - f(y) \parallel \leqslant \alpha\,\parallel x - y \parallel , \qquad \forall\ x, y \in X .$
The Banach contraction principle is a fundamental mathematical result that guarantees the existence of a unique fixed point for a contraction mapping defined on a Banach space. Essentially, it provides a powerful mathematical instrument for establishing the existence and uniqueness of solutions in a variety of contexts, including optimization problems, differential equations, and iterative numerical methods. The importance of this theorem stems from its broad applicability across mathematics as well as its role in establishing the convergence of iterative algorithms for solving real-world problems.
Banach fixed point theorem: Let (X, ∥ · ∥) be a Banach space (a complete normed space) and let f : X → X be a contraction on X. Then f has exactly one fixed point.
We start the proof by defining a Cauchy sequence {xₙ} in X using the function f. Since X is complete, {xₙ} converges to a point x ∈ X, and we then show that x is the unique fixed point of f in X. Choose an x₀ ∈ X and define $x_{n} = f(x_{n-1}) , \qquad n=1,2,3,\ldots .$ First we show that this sequence is Cauchy. As f is a contraction, we have $\| x_{n+1} - x_n \| = \| f(x_n ) - f(x_{n-1}) \| \le \alpha \,\| x_{n} - x_{n-1} \| .$ Repeating this inequality, we obtain $\| x_{n+1} - x_n \| \le \alpha^n \| x_1 - x_0 \| .$ By the triangle inequality and the summation formula for a geometric series, for n > m, we have \begin{align*} \| x_m - x_n \| &\le \| x_m - x_{m+1} \| + \| x_{m+1} - x_{m+2} \| + \cdots + \| x_{n-1} - x_n \| \\ &\le \left( \alpha^m + \alpha^{m+1} + \cdots + \alpha^{n-1} \right) \| x_1 - x_0 \| = \alpha^m \, \frac{1 - \alpha^{n-m}}{1-\alpha} \, \| x_1 - x_0 \| . \end{align*} As α is a positive real number with α < 1, we have 1 − αⁿ⁻ᵐ < 1 and hence for all n > m $\| x_m - x_n \| \le \frac{\alpha^m}{1-\alpha} \, \| x_1 - x_0 \| .$ Since 0 < α < 1 and ∥x₁ − x₀∥ is fixed, by choosing m sufficiently large we can make ∥xₘ − xₙ∥ as small as we please. Thus {xₙ} is Cauchy, and as X is complete, xₙ → x ∈ X. Moreover, a contraction is continuous, so x = lim xₙ₊₁ = lim f(xₙ) = f(x); that is, x is a fixed point of f.

Now we prove that x is the only fixed point of f. Suppose that there exists another fixed point of f in X, say y; that is, f(y) = y. Then $\| x- y \| = \| f(x) - f(y) \| \le \alpha \, \| x-y \| ,$ and since α < 1 this implies ∥x − y∥ = 0. Thus x = y, and hence f has exactly one fixed point.

Example 1: Consider f : [−1, 1] → [−1, 1] defined by f(x) = cos x. Choose x₀ = 0.7 as an initial approximation and define xₙ₊₁ = cos xₙ. Proceeding in this way, we can approximate the fixed point of f(x) = cos x as x ≈ 0.739 after a certain number of iterations. The more iterations we perform, the smaller the associated error. Keep in mind that any initial point leads to the fixed point; however, the number of iterations required may vary.
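A minimal Python sketch of this iteration (the tolerance `tol` and the iteration cap `max_iter` are illustrative choices, not part of the example):

```python
import math

def fixed_point(f, x0, tol=1e-10, max_iter=200):
    """Iterate x_{n+1} = f(x_n) until successive iterates agree within tol."""
    x = x0
    for _ in range(max_iter):
        x_new = f(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

root = fixed_point(math.cos, 0.7)
print(round(root, 3))   # the fixed point of cos x, approximately 0.739
```

Since |cos′(x)| = |sin x| ≤ sin 1 ≈ 0.84 < 1 on [−1, 1], cosine is a contraction there, and the Banach fixed point theorem guarantees that this loop converges from any starting point in the interval.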


End of Example 1

If equation \eqref{EqIterate.2} has a fixed point, we can organize an iterative process upon choosing an initial guess x(0):

$$\label{EqIterate.3} {\bf x}^{(k+1)} = f\left( \mathbf{x}^{(k)} \right) , \qquad k=0,1,2,\ldots .$$
If this sequence x(k) converges, we expect its limit to be a solution not only of Eq.\eqref{EqIterate.2}, but also of the original linear algebra problem \eqref{EqIterate.1}.

It is our hope that the iterative formula above provides a sequence of vectors { x(k) } that converges to the true solution x = c; moreover, we may stop the iteration whenever we choose. Some sufficient conditions on the function f are known that guarantee convergence of the fixed point iteration procedure.

If we consider a system of linear equations A x = b with solution x = c, a starting vector x(0), and the approximate solution x(k) at the k-th step, then the iterative method is said to converge to the solution of the system when $\lim_{k\to\infty} \left\| {\bf x}^{(k)} - {\bf c} \right\| = 0 .$ On the contrary, if this limit does not exist, then the method is said to diverge.

When the sequence of errors ek = x(k)c converges to zero fast enough, iterative methods are competitive with direct methods, which require much more memory and have a higher computational complexity. There are several references on iterative methods for solving a system of linear equations; for example, P.G. Ciarlet, J.J. Dongarra et al., C.T. Kelley, G. Meurant, and Y. Saad.

There are infinitely many ways to introduce an equivalent fixed point problem for the given equation (1) or (2). In order to apply an iterative method for solving the system of equations (1), or its equivalent compact form (2), we need to rearrange the equation into fixed point form. This can be achieved by splitting the matrix as A = S + (AS), where S is a nonsingular matrix. S is also called a preconditioner, and its choice is crucial in numerical analysis. The equation A x = b is then equivalent to S x = b + (SA) x. Applying the inverse matrix S−1 yields the fixed point equation

$$\label{EqJacobi.4} {\bf x} = {\bf S}^{-1} {\bf b} + \left( \mathbf{I} - {\bf S}^{-1} {\bf A} \right) {\bf x} , \qquad \mathbf{B} = \mathbf{I} - {\bf S}^{-1} {\bf A} ,$$
where I is the identity matrix and B = IS−1A is the iteration matrix. Therefore, we can try one-step iteration from x(k) to x(k+1):
$$\label{EqJacobi.5} {\bf x}^{(k+1)} = {\bf S}^{-1} {\bf b} + \left( \mathbf{I} - {\bf S}^{-1} {\bf A} \right) {\bf x}^{(k)} , \qquad k=0,1,2,\ldots .$$
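As a concrete sketch of iteration (5) in Python, we can take S to be the diagonal part of A (the Jacobi choice discussed later in this section); the 2×2 system below is the one from Exercise 1 at the end of this section:

```python
import numpy as np

def splitting_iteration(A, b, S, x0, num_iter=50):
    """One-step iteration x^(k+1) = S^{-1} b + (I - S^{-1} A) x^(k)."""
    S_inv = np.linalg.inv(S)
    B = np.eye(A.shape[0]) - S_inv @ A      # iteration matrix B = I - S^{-1} A
    c = S_inv @ b
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iter):
        x = c + B @ x
    return x

A = np.array([[5.0, 1.0], [2.0, 6.0]])
b = np.array([2.0, -16.0])
S = np.diag(np.diag(A))                     # Jacobi choice: diagonal part of A
x = splitting_iteration(A, b, S, [0.0, 0.0])
print(x)                                    # approaches the true solution (1, -3)
```

Because x(k+1) is obtained from x(k) alone, only the current iterate needs to be stored; in practice one would solve a system with S (here a diagonal matrix) instead of forming S−1 explicitly.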
There is no guarantee that this method will work. A successful splitting A = S + (AS) satisfies two different requirements:
1. The new vector x(k+1) should be easy to compute. Hence, S should be a simple (and invertible!) matrix; it may be diagonal or triangular.
2. The sequence { x(k) } should converge to the true solution x = c. Writing M = AS, so that IS−1A = −S−1M, the fixed point equation (4) reads x = S−1bS−1M x. If we subtract the iteration in equation (5) from this true equation, the result is a formula involving only the errors ek = x(k)c:
$$\label{EqJacobi.6} {\bf e}_{k+1} = \left( \mathbf{I} - {\bf S}^{-1} {\bf A} \right) {\bf e}_k = \left( \mathbf{I} - {\bf S}^{-1} {\bf A} \right)^{k+1} {\bf e}_0 , \qquad k=0,1,2,\ldots .$$
The matrix B = IS−1A is called the iteration matrix. Equation (6) is a first-order difference equation: it starts with the initial error e₀ and after k steps produces the error ek. The question of convergence is exactly the same as the question of stability: x(k)c exactly when ek0.

If ∥ · ∥ denotes some vector norm on ℝn (this topic is covered in Part 5 of this tutorial) and the corresponding matrix norm of B = −S−1M is less than 1, we claim that x(k)c0 as k → ∞. Indeed,

\begin{align*} {\bf x}^{(1)} - {\bf c} &= \left( \left( -{\bf S}^{-1} {\bf M} \right) {\bf x}^{(0)} + {\bf S}^{-1} {\bf b} \right) - \left( \left( -{\bf S}^{-1} {\bf M} \right) {\bf c} + {\bf S}^{-1} {\bf b} \right) = \left( -{\bf S}^{-1} {\bf M} \right) \left( {\bf x}^{(0)} - {\bf c} \right) , \\ {\bf x}^{(2)} - {\bf c} &= \left( \left( -{\bf S}^{-1} {\bf M} \right) {\bf x}^{(1)} + {\bf S}^{-1} {\bf b} \right) - \left( \left( -{\bf S}^{-1} {\bf M} \right) {\bf c} + {\bf S}^{-1} {\bf b} \right) = \left( -{\bf S}^{-1} {\bf M} \right) \left( {\bf x}^{(1)} - {\bf c} \right) \\ &= \left( -{\bf S}^{-1} {\bf M} \right)^2 \left( {\bf x}^{(0)} - {\bf c} \right) , \end{align*}
and so on. In general,
${\bf x}^{(k)} - {\bf c} = \left( -{\bf S}^{-1} {\bf M} \right)^k \left( {\bf x}^{(0)} - {\bf c} \right) , \qquad k=0,1,2,\ldots ,$
and hence
\begin{align*} \left\| {\bf x}^{(k)} - {\bf c} \right\| &= \left\|\left( -{\bf S}^{-1} {\bf M} \right)^k \left( {\bf x}^{(0)} - {\bf c} \right) \right\| \\ &\le \left\| \left( -{\bf S}^{-1} {\bf M} \right)^k \right\| \left\| {\bf x}^{(0)} - {\bf c} \right\| \\ & \le \left\| \left( -{\bf S}^{-1} {\bf M} \right) \right\|^k \left\| {\bf x}^{(0)} - {\bf c} \right\| . \end{align*}
Thus, if ∥ (−S−1M) ∥ = ∥ S−1M ∥ < 1, then ∥ x(k)c ∥ → 0 as k → ∞.

The foregoing result holds for any standard norm on ℝn because they are all equivalent.
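The norm condition can be checked numerically. A short sketch, reusing the illustrative 2×2 system from Exercise 1 with the diagonal (Jacobi) preconditioner S, verifies that ∥B∥ < 1 and that the errors obey the geometric bound derived above:

```python
import numpy as np

A = np.array([[5.0, 1.0], [2.0, 6.0]])
b = np.array([2.0, -16.0])
S = np.diag(np.diag(A))                 # Jacobi preconditioner
S_inv = np.linalg.inv(S)
B = np.eye(2) - S_inv @ A               # iteration matrix B = I - S^{-1} A
c = np.linalg.solve(A, b)               # true solution

norm_B = np.linalg.norm(B, np.inf)      # subordinate infinity norm of B
assert norm_B < 1                       # sufficient condition for convergence

x = np.zeros(2)
errors = []
for k in range(10):
    errors.append(np.linalg.norm(x - c, np.inf))
    x = S_inv @ b + B @ x

# each error obeys the bound ||x^(k) - c|| <= ||B||^k ||x^(0) - c||
for k, e in enumerate(errors):
    assert e <= norm_B**k * errors[0] + 1e-12
```

Here ∥B∥∞ = 1/3, so the error shrinks by at least a factor of 3 per step.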

Theorem 2: Let x(0) be an arbitrary vector in ℝn and define x(k+1) = S−1bS−1Mx(k) for k = 0, 1, 2, …. If x = c is the solution to Eq.(4), then a necessary and sufficient condition for x(k)c is that all eigenvalues of the matrix S−1M be less than 1 in magnitude, i.e., | λ | < 1.

In other words, the iterative method in equation (5) is convergent if and only if every eigenvalue of S−1M satisfies | λ | < 1. Its rate of convergence depends on the maximum magnitude of the eigenvalues of S−1M, since the term with the largest eigenvalue eventually dominates the error.
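A sketch of this eigenvalue test, again for the illustrative Jacobi splitting of the 2×2 matrix from Exercise 1:

```python
import numpy as np

A = np.array([[5.0, 1.0], [2.0, 6.0]])
S = np.diag(np.diag(A))                          # Jacobi preconditioner
B = np.eye(2) - np.linalg.inv(S) @ A             # iteration matrix
spectral_radius = max(abs(np.linalg.eigvals(B)))
print(spectral_radius)                           # about 0.258, so the iteration converges
assert spectral_radius < 1
```

The spectral radius here is √(1/15) ≈ 0.258, which also predicts the asymptotic error reduction per iteration.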


We will prove the theorem only in the case where B = −S−1M has n linearly independent eigenvectors; the case where B is not diagonalizable is treated later. If u1, u2, … , un are n linearly independent eigenvectors of B with corresponding eigenvalues λ1, λ2, … , λn, we can write ${\bf x}^{(0)} - {\bf c} = c_1 {\bf u}_1 + c_2 {\bf u}_2 + \cdots + c_n {\bf u}_n$ and it follows from x(k)c = Bk(x(0)c) that \begin{align*} {\bf x}^{(k)} - {\bf c} &= {\bf B}^k \left( c_1 {\bf u}_1 + c_2 {\bf u}_2 + \cdots + c_n {\bf u}_n \right) \\ &= c_1 \lambda_1^k {\bf u}_1 + c_2 \lambda_2^k {\bf u}_2 + \cdots + c_n \lambda_n^k {\bf u}_n . \end{align*} Hence, x(k)c0 for every choice of x(0) if and only if | λi | < 1 for i = 1, 2, … , n.
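The eigenvector expansion in this proof can be traced numerically; the matrix below is the Jacobi iteration matrix B = IS−1A for the 2×2 system used earlier, and the initial error e₀ is an arbitrary illustrative vector:

```python
import numpy as np

B = np.array([[0.0, -1.0/5.0],
              [-1.0/3.0, 0.0]])                  # Jacobi iteration matrix for the 2x2 system
lam, U = np.linalg.eig(B)                        # eigenvalues and eigenvectors (columns of U)
e0 = np.array([1.0, 0.5])                        # an arbitrary initial error x^(0) - c
coeffs = np.linalg.solve(U, e0)                  # coefficients in e0 = c1 u1 + c2 u2

k = 8
ek_direct = np.linalg.matrix_power(B, k) @ e0    # e_k = B^k e0
ek_eigen = U @ (coeffs * lam**k)                 # e_k = c1 lam1^k u1 + c2 lam2^k u2
assert np.allclose(ek_direct, ek_eigen)          # the two expressions agree
```

Both eigenvalues have magnitude √(1/15) < 1, so each component of the error, and hence the error itself, tends to zero.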
In this section, we discuss three versions of stationary iterative methods: Jacobi's scheme, the Gauss--Seidel method, and the SOR method. Since they converge slowly, stationary iterative methods are no longer used alone, but rather within other algorithms.

1. Let ${\bf A} = \begin{bmatrix} 5 & 1 \\ 2 & 6 \end{bmatrix} , \qquad {\bf b} = \begin{bmatrix} \phantom{-}2 \\ -16 \end{bmatrix} , \qquad {\bf x}^{(0)} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} .$ Use Jacobi iteration to compute x(1) and x(2).
2. In this exercise, you will see that the order of the equations in a system can affect the performance of an iterative solution method. Upon swapping the order of equations in the previous exercise, we get $\begin{split} 2\,x_1 + 6\, x_2 &= -16 , \\ 5\, x_1 + x_2 &= 2 . \end{split}$ Compute five iterations with both the Jacobi and Gauss--Seidel methods using zero as an initial guess.
3. Show that if a 2 × 2 coefficient matrix ${\bf A} = \begin{bmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \end{bmatrix}$ is strictly diagonally dominant, then the eigenvalues of the iteration matrices B corresponding to the Jacobi and Gauss--Seidel methods are of magnitude < 1 (which guarantees convergence of the iteration methods).
4. Apply Jacobi's iteration scheme to the system $\begin{split} x_1 + x_2 & = 0 , \\ x_1 - x_2 &= 2 . \end{split}$ Do you think that the iteration method converges to the true solution x₁ = 1, x₂ = −1?
5. Apply Jacobi's iteration scheme for solving $\begin{bmatrix} 2 & 1 \\ 2 & 3 \end{bmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} \phantom{-}1 \\ -1 \end{pmatrix} .$ Recall that the error satisfies e(k) = Be(k-1), so ∥e(k)∥ ≈ |λmax| ∥e(k-1)∥, where λmax is the largest (in magnitude) eigenvalue of the iteration matrix B. (In general, for larger systems, this approximate relationship often does not hold for the first few iterations.) Show that the error for this system tends to zero.
6. Use the SOR method to solve the system $\begin{bmatrix} 2 & 1 \\ 3 & 4 \end{bmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} \phantom{-}1 \\ -1 \end{pmatrix} .$
7. In the previous exercise, find the characteristic equation for the corresponding matrix B depending on parameter ω. What are its eigenvalues?
8. Solve the system of equations with a direct method: $\begin{cases} 5 x - 2 y + 3 z &= 8, \\ 3 x + 8 y + 2 z &= 23, \\ 2 x + y + 6 z &= 2. \end{cases}$ In order to understand the differences and relative advantages among the three iterative methods, perform 8 iterations using Jacobi's scheme, the Gauss--Seidel method, and the SOR iteration procedure.
9. Let us consider the following system of linear equations: $\begin{cases} 5 x - 2 y + z &= 2, \\ 9 x -3 y + 2 z &= 7, \\ 4 x + y + z &= 9. \end{cases}$ Show that this system cannot be tackled by means of Jacobi's or Gauss--Seidel's method. On the other hand, a tuning of the relaxation parameter ω can lead to the detection of a good approximation of the solution.
10. Consider the system of equations $\begin{bmatrix} 1 & -5 & 2 \\ 7 & 1 & 3 \\ 2 & 1 & 8 \end{bmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} -17 \\ 11 \\ -9 \end{pmatrix} .$ Show that the given system is not strictly diagonally dominant. Nonetheless, if we swap the first and second equations, we obtain an equivalent system whose associated matrix is suitable for Jacobi's iteration scheme. Show this by performing 10 iterations.

1. Ciarlet, P.G., Introduction to Numerical Linear Algebra and Optimization. Cambridge University Press, Cambridge, 1989.
2. Dongarra, J.J., Duff, I.S., Sorensen, D.C., and H. van der Vorst, H., Numerical Linear Algebra for High-Performance Computers. SIAM, 1998.
3. Gauss, C.F., Werke (in German), vol. 9, Göttingen: Königlichen Gesellschaft der Wissenschaften, 1903.
4. Golub, Gene H.; Van Loan, Charles F. (1996), Matrix Computations (3rd ed.), Baltimore: Johns Hopkins, ISBN 978-0-8018-5414-9.
5. Kahan, W., Gauss–Seidel methods of solving large systems of linear equations, Doctoral Thesis, University of Toronto, Toronto, Canada, 1958.
6. Karunanithi, S., Gajalakshmi, N., Malarvizhi, M., Saileshwari, M., A Study on Comparison of Jacobi, Gauss-Seidel and SOR Methods for the Solution in System of Linear Equations, International Journal of Mathematics Trends and Technology (IJMTT) – Volume 56 Issue 4- April 2018.
7. Kelley, C.T., Iterative Methods for Linear and Nonlinear Equations. SIAM, USA, 1995.
8. Meurant, G., Computer solution of large linear systems. North Holland, Amsterdam, 1999.
9. Saad, Y., Iterative Methods for Sparse Linear Systems, 2nd edition. SIAM, Philadelphia, PA, 2003.
10. Seidel, Ludwig (1874). "Über ein Verfahren, die Gleichungen, auf welche die Methode der kleinsten Quadrate führt, sowie lineäre Gleichungen überhaupt, durch successive Annäherung aufzulösen" [On a process for solving by successive approximation the equations to which the method of least squares leads as well as linear equations generally]. Abhandlungen der Mathematisch-Physikalischen Klasse der Königlich Bayerischen Akademie der Wissenschaften (in German). 11 (3): 81–108.
11. Strang, G., Introduction to Linear Algebra, Wellesley-Cambridge Press, 5th edition, 2016.
12. Strong, D.M., Iterative Methods for Solving Ax = b - Gauss--Seidel Method, MAA.
13. Strong, D.M., Iterative Methods for Solving Ax = b - The SOR Method, MAA.