Solving Systems of Linear Equations on a Computer

There is often a gap between mathematical theory and its practical implementation---Gaussian elimination and the reduction of a matrix to reduced row echelon form being good examples. Computers usually operate with floating-point approximations to real numbers. A number is, in general, represented approximately to a fixed number of significant digits (the significand or mantissa) and scaled using an exponent in some fixed base, normally two. For example, the rational number 4.5 is written in normalized binary floating-point format as follows

\[ 4.5 = 100.1_{(2)} = 1.001_{(2)} \times 2^2 , \]
where 1.001 is the normalized mantissa (with fractional part .001) and 2 is the exponent with respect to base 2. The well-known number π is approximated in decimal as 3.1415926, where 0.1415926 is the fractional part of the mantissa. The IEEE 754 double precision format has 52 stored bits for the significand (plus 1 implied leading bit, for 53 significant bits in total), 11 bits for the exponent, and 1 bit for the sign, while single precision has only 23 stored bits for the mantissa, 8 bits for the exponent, and 1 sign bit. Accuracy of a floating-point representation is governed by the number of significand bits, whereas range is limited by the exponent. Not all real numbers can be represented exactly in floating-point format.
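To make these bit fields concrete, the following Python sketch (using only the standard `struct` module) unpacks the IEEE 754 double-precision representation of 4.5; the variable names are mine, chosen for illustration:

```python
import struct

# Reinterpret the 64-bit pattern of a double as an unsigned integer.
bits = struct.unpack('>Q', struct.pack('>d', 4.5))[0]

sign = bits >> 63                          # 1 sign bit
exponent = ((bits >> 52) & 0x7FF) - 1023   # 11 exponent bits, biased by 1023
fraction = bits & ((1 << 52) - 1)          # 52 stored fraction bits (leading 1 is implied)

print(sign, exponent)        # 0 2  -->  4.5 = +1.001_2 x 2^2
print(fraction == 2**49)     # True: the fraction bits are 001 followed by 49 zeros

# Not every real number is exactly representable:
print(0.1 + 0.2 == 0.3)      # False, because 0.1 and 0.2 are already rounded
```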

Therefore, arithmetic operations with these numbers on a computer are not exact and introduce roundoff errors. In practical computation it is not just zero pivots that are unwelcome but also small pivots: small pivots lead to large multipliers, which amplify roundoff error. There are various techniques, such as partial pivoting, for minimizing roundoff error and instability. In computer jargon, an arithmetic operation (+, −, *, ÷) on two floating-point numbers is called a flop, which is an acronym for "floating-point operation." Although different arithmetic operations take different amounts of time on a computer (division typically being the slowest), counting flops gives a useful measure of cost, so we now determine the cost in flops for a computer processor to perform Gaussian elimination.
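A minimal illustration of why small pivots are dangerous, using the classic 2×2 system εx + y = 1, x + y = 2 (this hand-worked example is mine, not taken from the text above; its exact solution is x ≈ 1, y ≈ 1):

```python
eps = 1e-20   # a tiny but nonzero pivot

# Eliminate using the tiny pivot eps: the multiplier 1/eps is huge.
mult = 1.0 / eps
y = (2.0 - mult * 1.0) / (1.0 - mult * 1.0)   # roundoff swamps the 2 and the 1
x = (1.0 - y) / eps
print(x, y)        # 0.0 1.0 -- x is completely wrong

# Exchange the rows first (partial pivoting), so the multiplier is eps.
y = (1.0 - eps * 2.0) / (1.0 - eps * 1.0)
x = 2.0 - y
print(x, y)        # 1.0 1.0 -- correct to machine precision
```

The large multiplier 1/ε turns a harmless rounding of the right-hand side into a total loss of accuracy in x; swapping the rows keeps every multiplier at most 1 in magnitude.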

Consider a linear system of m equations in n unknowns, written in vector form Ax = b, to be solved by Gaussian elimination. Here A is the given m×n matrix and b is the known column vector with m components. We build the m×(n+1) augmented matrix B = [A | b]. Assuming that we perform Gaussian elimination without exchanging rows and that the pivots all appear in the appropriate sequential rows, on the first step we need to eliminate the entries marked ⊗ in the augmented matrix:

\[ {\bf B} = \left[ {\bf A}\, \vert\,{\bf b} \right] = \left[ \begin{array}{cccccc|c} ♦ & \bullet & \bullet & \cdots & \bullet & \bullet & \bullet \\ ⊗ & \bullet & \bullet & \cdots & \bullet & \bullet & \bullet \\ ⊗ & \bullet & \bullet & \cdots & \bullet & \bullet & \bullet \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ ⊗ & \bullet & \bullet & \cdots & \bullet & \bullet & \bullet \end{array} \right] , \]
where ♦ denotes the pivot's position, ⊗ denotes an element to be eliminated, and • denotes a quantity that is not being computed. To eliminate the entry in position (2,1), we need to make n+1 multiplications (by the value in this position), the same number of divisions (by the value of the first pivot), and finally n+1 subtractions. So the total number of operations will be
\[ (n+1)\, M + (n+1)\,S + (n+1)\,D = 3\,(n+1) \quad\mbox{flops} , \]
where M stands for multiplication, S for subtraction, and D for division. Since we have m rows, we need to repeat the same sequence of arithmetic operations for each of the m−1 rows below the pivot. Therefore, the first step requires
\[ (m-1) \left[ (n+1)\, M + (n+1)\,S + (n+1)\,D \right] = 3(m-1)\,(n+1) \quad\mbox{flops} . \]
Now in the second step, we eliminate entries in the second column below the pivot:
\[ {\bf B} = \left[ {\bf A}\, \vert\,{\bf b} \right] \sim \left[ \begin{array}{cccccc|c} ♦ & \bullet & \bullet & \cdots & \bullet & \bullet & \bullet \\ 0 & ♦ & \times & \cdots & \times & \times & \times \\ 0 & ⊗ & \times & \cdots & \times & \times & \times \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & ⊗ & \times & \cdots & \times & \times & \times \end{array} \right] \ \sim \ \left[ \begin{array}{cccccc|c} ♦ & \bullet & \bullet & \cdots & \bullet & \bullet & \bullet \\ 0 & ♦ & \times & \cdots & \times & \times & \times \\ 0 & 0 & ♦ & \cdots & \times & \times & \times \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & ⊗ & \cdots & \times & \times & \times \end{array} \right] , \]
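The elimination pattern sketched above can be coded directly. The following Python routine is a sketch under the text's assumption that no row exchanges are needed (the function name is mine):

```python
def forward_eliminate(B):
    """Reduce augmented matrix B (a list of rows) to row echelon form,
    assuming every pivot B[j][j] is nonzero (no row exchanges)."""
    m, cols = len(B), len(B[0])             # cols = n + 1
    for j in range(min(m - 1, cols - 1)):   # one pass per pivot column
        for i in range(j + 1, m):           # each row below the pivot
            factor = B[i][j] / B[j][j]      # the multiplier
            for k in range(j, cols):
                B[i][k] -= factor * B[j][k]  # update one entry
    return B

B = [[ 2.0,  1.0, -1.0,   8.0],
     [-3.0, -1.0,  2.0, -11.0],
     [-2.0,  1.0,  2.0,  -3.0]]
forward_eliminate(B)
# B is now upper triangular; back substitution would give x = 2, y = 3, z = -1
```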
where × denotes a quantity that is being computed. To achieve this, we need to make
\[ (m-2) \left[ (n)\, M + (n)\,S + (n)\,D \right] = 3(m-2)\,(n) \quad\mbox{flops} . \]
Proceeding in a similar manner through all m−1 elimination steps, we need in total
\[ N (n,m) =\sum_{j=1}^{m-1} \,(m-j) \left[ (n+2-j)\, M + (n+2-j)\,S + (n+2-j)\,D \right] = \sum_{j=1}^{m-1} \,3(m-j)\,(n+2-j) \quad\mbox{flops} . \]
To find the explicit expression for the sum, we use the famous formulas (see V. Dobrushkin's book [2] for their derivations):
\begin{align*} \sum_{j=1}^{m-1} \, j &= \frac{m(m-1)}{2} = \binom{m}{2} , \\ \sum_{j=1}^{m-1} \, j^2 &= \frac{m(m-1)(2m-1)}{6} = \frac{1}{4} \,\binom{2\,m}{3} . \end{align*}
Then we get
\[ N (n,m) =3\,\sum_{j=1}^{m-1} \left[ j^2 - (m+n+2)\,j + m\,(n+2) \right] = \frac{m}{2} \left( m-1 \right) \left( 3n -m +5 \right) \quad\mbox{flops} . \]
When n = m, we have
\[ N (n,n) = \frac{n}{2} \left( n-1 \right) \left( 2\,n +5 \right) \approx n^3 \quad\mbox{flops} . \]
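The closed form can be checked against a direct count of the per-step costs from the summation above (the function names here are mine, for illustration):

```python
def counted_flops(m, n):
    """Sum the per-step costs: step j updates m - j rows,
    each at a cost of 3(n + 2 - j) flops."""
    return sum(3 * (m - j) * (n + 2 - j) for j in range(1, m))

def closed_form(m, n):
    """N(n, m) = (m/2)(m - 1)(3n - m + 5)."""
    return m * (m - 1) * (3 * n - m + 5) // 2

# The direct count and the closed form agree for every shape tried.
for m, n in [(2, 2), (3, 5), (10, 10), (40, 60)]:
    assert counted_flops(m, n) == closed_form(m, n)

print(closed_form(100, 100))   # 1014750, on the order of n^3 = 10^6
```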

  1. Higham, Nicholas, Gaussian Elimination, Manchester Institute for Mathematical Sciences, School of Mathematics, The University of Manchester, 2011.
  2. Dobrushkin, Vladimir, Methods in Algorithmic Analysis, CRC Press, 2010.