Diagonalization

We show how to define a function of a square matrix using a diagonalization procedure. This method applies only to diagonalizable square matrices and is not suitable for defective ones. Recall that a matrix A is called diagonalizable if there exists a nonsingular matrix S such that \( {\bf S}^{-1} {\bf A} {\bf S} = {\bf \Lambda} , \) a diagonal matrix. In other words, the matrix A is similar to a diagonal matrix. An \( n \times n \) square matrix is diagonalizable if and only if it has n linearly independent eigenvectors, that is, if the geometric multiplicity of each eigenvalue equals its algebraic multiplicity. The matrix S is then built from eigenvectors of A, column by column.

Let A be a square \( n \times n \) diagonalizable matrix, and let \( {\bf \Lambda} \) be the corresponding diagonal matrix of its eigenvalues:

\[ {\bf \Lambda} = \begin{bmatrix} \lambda_1 & 0 & 0 & \cdots & 0 \\ 0&\lambda_2 & 0& \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0&0&0& \cdots & \lambda_n \end{bmatrix} , \]

where \( \lambda_1 , \lambda_2 , \ldots , \lambda_n \) are eigenvalues (that may be equal) of the matrix A.

Let \( {\bf x}_1 , {\bf x}_2 , \ldots , {\bf x}_n \) be linearly independent eigenvectors, corresponding to the eigenvalues \( \lambda_1 , \lambda_2 , \ldots , \lambda_n .\) We build the nonsingular matrix S from these eigenvectors (every column is an eigenvector):

\[ {\bf S} = \begin{bmatrix} {\bf x}_1 & {\bf x}_2 & {\bf x}_3 & \cdots & {\bf x}_n \end{bmatrix} . \]
For any reasonable function (we do not make this notion precise; smoothness on the spectrum is sufficient) defined on the spectrum (the set of all eigenvalues) of the diagonalizable matrix A, we define the function of this matrix by the formula:
\[ f \left( {\bf A} \right) = {\bf S} f\left( {\bf \Lambda} \right) {\bf S}^{-1} , \qquad \mbox{where } \quad f\left( {\bf \Lambda} \right) = \begin{bmatrix} f(\lambda_1 ) & 0 & 0 & \cdots & 0 \\ 0 & f(\lambda_2 ) & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0&0&0& \cdots & f(\lambda_n ) \end{bmatrix} . \]
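This construction is easy to automate in Mathematica. Below is a minimal sketch (the helper name matrixFunction is ours, and it silently assumes that its matrix argument is diagonalizable):
matrixFunction[f_, a_?SquareMatrixQ] := Module[{vals, vecs, s},
  {vals, vecs} = Eigensystem[a];    (* eigenvalues and eigenvectors (as rows) *)
  s = Transpose[vecs];              (* the eigenvectors become the columns of S *)
  s.DiagonalMatrix[f /@ vals].Inverse[s]]    (* S f(Lambda) S^(-1) *)
For example, matrixFunction[Sqrt, m] returns one square root of a diagonalizable matrix m, and matrixFunction[Exp[#*t] &, m] returns the exponential matrix function of m.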

 

Example: Consider the \( 3 \times 3 \) matrix \( {\bf A} = \begin{bmatrix} 1&4&16 \\ 18&20&4 \\ -12&-14&-7 \end{bmatrix} \) that has three distinct eigenvalues

A = {{1,4,16},{18,20,4},{-12,-14,-7}}
Eigenvalues[A]
Out[2]= {9, 4, 1}
Eigenvectors[A]
Out[3]= {{1, -2, 1}, {4, -5, 2}, {4, -4, 1}}
Using these eigenvectors as columns, we build the transition matrix S:
\[ {\bf S} = \begin{bmatrix} 1&4&4 \\ -2&-5&-4 \\ 1&2&1 \end{bmatrix} , \quad\mbox{with} \quad {\bf S}^{-1} = \begin{bmatrix} -3&-4&-4 \\ 2&3&4 \\ -1&-2&-3 \end{bmatrix} . \]
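As a quick check (in the same Mathematica session where A was entered above), S indeed diagonalizes A and its inverse is as displayed:
S = {{1, 4, 4}, {-2, -5, -4}, {1, 2, 1}};
Inverse[S]          (* reproduces the matrix S^(-1) above *)
Inverse[S].A.S      (* the diagonal matrix with entries 9, 4, 1 *)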

Then we are ready to construct eight (\( 2^3 = 8 \) roots, because each square root of an eigenvalue has two values; for instance, \( \sqrt{9} = \pm 3 \)) square roots of this matrix with positive eigenvalues:

\[ \sqrt{\bf A} = {\bf S} \sqrt{\Lambda} {\bf S}^{-1} = \begin{bmatrix} 1&4&4 \\ -2&-5&-4 \\ 1&2&1 \end{bmatrix} \begin{bmatrix} \pm 3&0&0 \\ 0&\pm 2&0 \\ 0&0&\pm 1 \end{bmatrix} \begin{bmatrix} -3&-4&-4 \\ 2&3&4 \\ -1&-2&-3 \end{bmatrix} , \]
with appropriate choice of roots on the diagonal. In particular,
\[ \sqrt{\bf A} = \begin{bmatrix} 3&4&8 \\ 2&2&-4 \\ -2&-2&1 \end{bmatrix} , \quad \begin{bmatrix} 21&28&32 \\ -34&-46&-52 \\ 16&22&25 \end{bmatrix} , \quad \begin{bmatrix} -11&-20&-32 \\ 6&14&28 \\ 0&-2&-7 \end{bmatrix} , \quad \begin{bmatrix} 29&44&56 \\ -42&-62&-76 \\ 18&26&31 \end{bmatrix} . \]
We check with Mathematica the root corresponding to the choice 3, 2, and 1 on the diagonal; any other combination of signs in \( \pm 3, \pm 2, \pm 1 \) is handled in the same way.
S = Transpose[Eigenvectors[A]]
square = {{3, 0, 0}, {0, 2, 0}, {0, 0, 1}}
S.square.Inverse[S]
Out[7]= {{3, 4, 8}, {2, 2, -4}, {-2, -2, 1}}
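The remaining sign choices lead to the other roots (the four matrices displayed above together with their negatives). As a sketch of such a check, each displayed matrix squares back to A:
roots = {{{3, 4, 8}, {2, 2, -4}, {-2, -2, 1}}, {{21, 28, 32}, {-34, -46, -52}, {16, 22, 25}}, {{-11, -20, -32}, {6, 14, 28}, {0, -2, -7}}, {{29, 44, 56}, {-42, -62, -76}, {18, 26, 31}}};
Map[#.# == A &, roots]    (* expect {True, True, True, True} *)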
Now we build other matrix functions.
\[ e^{{\bf A}\,t} = {\bf S} \begin{bmatrix} e^{9t}&0&0 \\ 0&e^{4t}&0 \\ 0&0&e^t \end{bmatrix} {\bf S}^{-1} = e^{9t} \begin{bmatrix} -3&-4&-4 \\ 6&8&8 \\ -3&-4&-4 \end{bmatrix} + e^{4t} \begin{bmatrix} 8&12&16 \\ -10&-15&-20 \\ 4&6&8 \end{bmatrix} + e^t \begin{bmatrix} -4&-8&-12 \\ 4&8&12 \\ -1&-2& -3 \end{bmatrix} , \]
which we check with Mathematica standard command:
A = {{1, 4, 16}, {18, 20, 4}, {-12, -14, -7}}
MatrixExp[A*t]
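To compare the built-in result with the spectral decomposition displayed above, one may subtract the two expressions; a quick check in the same session:
expA = Exp[9*t]*{{-3, -4, -4}, {6, 8, 8}, {-3, -4, -4}} + Exp[4*t]*{{8, 12, 16}, {-10, -15, -20}, {4, 6, 8}} + Exp[t]*{{-4, -8, -12}, {4, 8, 12}, {-1, -2, -3}};
Simplify[MatrixExp[A*t] - expA]    (* expect the zero matrix *)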
Recall that the exponential matrix function is the unique solution to the following initial value problem:
\[ \frac{\text d}{{\text d}t}\, {\bf \Phi} = {\bf A}\,{\bf \Phi}(t) = {\bf \Phi}(t)\, {\bf A} , \quad {\bf \Phi}(0) = {\bf I}, \]
where I is the identity square matrix. Two other important matrix functions depending on the time variable t are
\begin{eqnarray*} \frac{\sin \left( \sqrt{\bf A}\,t \right)}{\sqrt{\bf A}} &=& {\bf S} \begin{bmatrix} \frac{\sin 3t}{3}&0&0 \\ 0&\frac{\sin 2t}{2}&0 \\ 0&0&\sin t \end{bmatrix} {\bf S}^{-1} = \sin 3t \begin{bmatrix} -1&-\frac{4}{3} &-\frac{4}{3} \\ 2&\frac{8}{3}&\frac{8}{3} \\ -1&-\frac{4}{3}&-\frac{4}{3} \end{bmatrix} + \sin 2t \begin{bmatrix} 4&6&8 \\ -5&-\frac{15}{2}&-10 \\ 2&3&4 \end{bmatrix} + \sin t \begin{bmatrix} -4&-8&-12 \\ 4&8&12 \\ -1&-2&-3 \end{bmatrix} , \\ \cos \left( \sqrt{\bf A}\,t \right) &=& {\bf S} \begin{bmatrix} \cos 3t&0&0 \\ 0&\cos 2t&0 \\ 0&0&\cos t \end{bmatrix} {\bf S}^{-1} = \cos 3t \begin{bmatrix} -3&-4&-4 \\ 6&8&8 \\ -3&-4&-4 \end{bmatrix} + \cos 2t \begin{bmatrix} 8&12&16 \\ -10&-15&-20 \\ 4&6&8 \end{bmatrix} + \cos t \begin{bmatrix} -4&-8&-12 \\ 4&8&12 \\ -1&-2&-3 \end{bmatrix} . \end{eqnarray*}
S = Transpose[Eigenvectors[A]]
S.{{Sin[3*t]/3, 0, 0}, {0, Sin[2*t]/2, 0}, {0, 0, Sin[t]}}.Inverse[S]
S.{{Cos[3*t], 0, 0}, {0, Cos[2*t], 0}, {0, 0, Cos[t]}}.Inverse[S]
These two matrix functions are unique solutions of the following initial value problems:
\[ \ddot{\bf \Phi} + {\bf A}\, {\bf \Phi} = {\bf 0} , \quad {\bf \Phi}(0) = {\bf 0} , \quad \dot{\bf \Phi}(0) = {\bf I} \qquad \mbox{for} \quad {\bf \Phi}(t) = \frac{\sin \left( \sqrt{\bf A}\,t \right)}{\sqrt{\bf A}} , \]
and
\[ \ddot{\bf \Psi} + {\bf A}\, {\bf \Psi} = {\bf 0} , \quad {\bf \Psi}(0) = {\bf I} , \quad \dot{\bf \Psi}(0) = {\bf 0} \qquad \mbox{for} \quad {\bf \Psi}(t) = \cos \left( \sqrt{\bf A}\,t \right) . \]
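These initial value problems can be verified directly in Mathematica with the matrices computed above; a sketch, continuing the same session:
Phi = S.{{Sin[3*t]/3, 0, 0}, {0, Sin[2*t]/2, 0}, {0, 0, Sin[t]}}.Inverse[S];
Psi = S.{{Cos[3*t], 0, 0}, {0, Cos[2*t], 0}, {0, 0, Cos[t]}}.Inverse[S];
Simplify[D[Phi, {t, 2}] + A.Phi]          (* expect the zero matrix *)
{Phi /. t -> 0, D[Phi, t] /. t -> 0}      (* expect the zero matrix and the identity matrix *)
Simplify[D[Psi, {t, 2}] + A.Psi]          (* expect the zero matrix *)
{Psi /. t -> 0, D[Psi, t] /. t -> 0}      (* expect the identity matrix and the zero matrix *)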

Example: Consider the \( 3 \times 3 \) matrix \( {\bf A} = \begin{bmatrix} -20&-42&-21 \\ 6&13&6 \\ 12&24&13 \end{bmatrix} \) that has two distinct eigenvalues

A = {{-20, -42, -21}, {6, 13, 6}, {12, 24, 13}}
Eigenvalues[A]
Out[2]= {4, 1, 1}
Eigenvectors[A]
Out[3]= {{ -7, 2, 4 }, {-1, 0, 1 }, {-2, 1, 0 }}
Since the double eigenvalue \( \lambda =1 \) has two linearly independent eigenvectors, the given matrix is diagonalizable, and we are able to build the transition matrix of its eigenvectors:
\[ {\bf S} = \begin{bmatrix} -7&-1&-2 \\ 2&0&1 \\ 4&1&0 \end{bmatrix} , \quad\mbox{with} \quad {\bf S}^{-1} = \begin{bmatrix} 1&2&1 \\ -4&-8&-3 \\ -2&-3&-2 \end{bmatrix} . \]
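As before, Mathematica confirms both the inverse and the diagonalization (assuming A is the matrix of this example):
S = Transpose[Eigenvectors[A]]    (* columns are the eigenvectors listed above *)
Inverse[S]                        (* reproduces the matrix S^(-1) above *)
Inverse[S].A.S                    (* the diagonal matrix with entries 4, 1, 1 *)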
For three functions, \( f(\lambda ) = e^{\lambda \,t} , \quad \Phi (\lambda ) = \frac{\sin \left( \sqrt{\lambda} \,t \right)}{\sqrt{\lambda}} , \quad \Psi (\lambda ) = \cos \left( \sqrt{\lambda} \,t \right) \) we construct the corresponding matrix-functions:

\begin{align*} f({\bf A}) &= {\bf S} e^{{\bf \Lambda}\,t} {\bf S}^{-1} = e^{4t} \begin{bmatrix} -7 & -14 & -7 \\ 2&4&2 \\ 4&8&4 \end{bmatrix} + e^t \begin{bmatrix} 8&14&7 \\ -2&-3&-2 \\ -4&-8&-3 \end{bmatrix} , \\ {\bf \Phi} ({\bf A}) &= {\bf S} \frac{\sin \left( \sqrt{\bf \Lambda} \,t \right)}{\sqrt{\bf \Lambda}} {\bf S}^{-1} = \sin 2t \begin{bmatrix} -7/2 & -7 & -7/2 \\ 1&2&1 \\ 2&4&2 \end{bmatrix} + \sin t \begin{bmatrix} 8&14&7 \\ -2&-3&-2 \\ -4&-8&-3 \end{bmatrix} , \\ {\bf \Psi} ({\bf A}) &= {\bf S} \cos \left( \sqrt{\bf \Lambda}\,t \right) {\bf S}^{-1} = \cos 2t \begin{bmatrix} -7 & -14 & -7 \\ 2&4&2 \\ 4&8&4 \end{bmatrix} + \cos t \begin{bmatrix} 8&14&7 \\ -2&-3&-2 \\ -4&-8&-3 \end{bmatrix} . \end{align*}
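These expansions can be checked against the built-in exponential and against the diagonal construction; a sketch, continuing the same session:
S = {{-7, -1, -2}, {2, 0, 1}, {4, 1, 0}};
expA = Exp[4*t]*{{-7, -14, -7}, {2, 4, 2}, {4, 8, 4}} + Exp[t]*{{8, 14, 7}, {-2, -3, -2}, {-4, -8, -3}};
Simplify[MatrixExp[A*t] - expA]           (* expect the zero matrix *)
Phi = S.DiagonalMatrix[{Sin[2*t]/2, Sin[t], Sin[t]}].Inverse[S];
Psi = S.DiagonalMatrix[{Cos[2*t], Cos[t], Cos[t]}].Inverse[S];
Simplify[D[Phi, {t, 2}] + A.Phi]          (* expect the zero matrix *)
Simplify[D[Psi, {t, 2}] + A.Psi]          (* expect the zero matrix *)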

These matrix functions are unique solutions of the following initial value problems:
\[ \frac{\text d}{{\text d}t}\,e^{{\bf A}\,t} = {\bf A}\,e^{{\bf A}\,t} , \qquad \lim_{t\to 0} \,e^{{\bf A}\,t} = {\bf I} , \quad \mbox{where } {\bf I} \mbox{ is the identity matrix}; \]
\[ \frac{{\text d}^2}{{\text d}t^2}\,{\bf \Phi} ({\bf A}) + {\bf A}\,{\bf \Phi} ({\bf A}) = {\bf 0} , \qquad \lim_{t\to 0} \,{\bf \Phi} ({\bf A}) = {\bf 0} , \quad \quad \lim_{t\to 0} \,\dot{\bf \Phi} ({\bf A}) = {\bf I} , \quad \mbox{where } {\bf I} \mbox{ is the identity matrix}; \]
\[ \frac{{\text d}^2}{{\text d}t^2}\,{\bf \Psi} ({\bf A}) + {\bf A}\,{\bf \Psi} ({\bf A}) = {\bf 0} , \qquad \lim_{t\to 0} \,{\bf \Psi} ({\bf A}) = {\bf I} , \quad \quad \lim_{t\to 0} \,\dot{\bf \Psi} ({\bf A}) = {\bf 0} . \]

Example: Consider the \( 3 \times 3 \) matrix \( {\bf A} = \begin{bmatrix} 1 &2&3 \\ 2 &3&4 \\ 2&-6&-4 \end{bmatrix} \) that has two complex conjugate eigenvalues \( \lambda = 1 \pm 2{\bf j} \) and one real eigenvalue \( \lambda = -2 .\) Mathematica confirms:
A = {{1, 2, 3}, {2, 3, 4}, {2, -6, -4}}
Eigenvalues[A]
Out[2]= {1 + 2 I, 1 - 2 I, -2}
Eigenvectors[A]
Out[3]= {{-1 - I, -2 - I, 2}, {-1 + I, -2 + I, 2}, {-7, -6, 11}}
We build the transition matrix of its eigenvectors:
\[ {\bf S} = \begin{bmatrix} -1-{\bf j} & -1+{\bf j} &-7 \\ -2-{\bf j} & -2+{\bf j} &-6 \\ 2&2&11 \end{bmatrix} , \quad \mbox{with} \quad {\bf S}^{-1} = \frac{1}{26} \begin{bmatrix} 11 + 10{\bf j} & -11 + 3{\bf j} & 1 + 8{\bf j} \\ 11 - 10 {\bf j} & -11 - 3{\bf j} & 1 - 8{\bf j} \\ -4 & 4 & 2 \end{bmatrix} . \]
Now we are ready to define a function of the given square matrix. For example, if \( f(\lambda ) = e^{\lambda \, t} , \) we obtain the corresponding exponential matrix:
\begin{align*} e^{{\bf A}\,t} &= {\bf S} \begin{bmatrix} e^{(1+2{\bf j})\,t} & 0&0 \\ 0& e^{(1-2{\bf j})\,t} & 0 \\ 0&0&e^{-2t} \end{bmatrix} {\bf S}^{-1} \\ &= \begin{bmatrix} -1-{\bf j} & -1+{\bf j} &-7 \\ -2-{\bf j} & -2+{\bf j} &-6 \\ 2&2&11 \end{bmatrix} \, \begin{bmatrix} e^{t} \left( \cos 2t + {\bf j}\,\sin 2t \right) & 0&0 \\ 0& e^{t} \left( \cos 2t - {\bf j}\,\sin 2t \right) & 0 \\ 0&0&e^{-2t} \end{bmatrix} \, \frac{1}{26} \begin{bmatrix} 11 + 10{\bf j} & -11 + 3{\bf j} & 1 + 8{\bf j} \\ 11 - 10 {\bf j} & -11 - 3{\bf j} & 1 - 8{\bf j} \\ -4 & 4 & 2 \end{bmatrix} \\ &= \frac{1}{13} \, e^{-2t} \begin{bmatrix} 14 & -14& -7 \\ 12&-12& -6 \\ -22&22&11 \end{bmatrix} + \frac{1}{13} \, e^{t} \,\cos 2t \begin{bmatrix} -1&14&7 \\ -12&25&6 \\ 22&-22&2 \end{bmatrix} + \frac{1}{13} \, e^{t} \,\sin 2t \begin{bmatrix} 21&-8&9 \\ 31&-5&17 \\ -20&-6&-16 \end{bmatrix} . \end{align*}
Here we use Euler's formula: \( e^{a+b{\bf j}} = e^a \left( \cos b + {\bf j} \sin b \right) . \) Mathematica confirms
S = {{-1-I, -1+I, -7}, {-2-I, -2+I, -6}, {2, 2, 11}}
diag = {{Exp[t]*(Cos[2*t] + I*Sin[2*t]), 0, 0} , {0, Exp[t]*(Cos[2*t] - I*Sin[2*t]), 0}, {0, 0, Exp[-2*t]}}
FullSimplify[S.diag.Inverse[S]*13]
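The result can also be compared with the built-in exponential; a quick check in the same session:
FullSimplify[MatrixExp[A*t] - S.diag.Inverse[S]]    (* expect the zero matrix *)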
The matrix function \( e^{{\bf A}\,t} \) is the unique solution of the following matrix initial value problem:
\[ \frac{\text d}{{\text d}t}\,e^{{\bf A}\,t} = {\bf A}\,e^{{\bf A}\,t} , \qquad \lim_{t\to 0} \,e^{{\bf A}\,t} = {\bf I} , \]
where I is the identity matrix. ■

Theorem: For a square matrix A, the geometric multiplicity of any of its eigenvalues is less than or equal to its algebraic multiplicity. ■

Let λ be an eigenvalue of an n×n matrix A, and suppose that the dimension of its eigenspace, ker(λI - A), is k. Let x1, x2, ... , xk be a basis for this eigenspace. We build the n×k matrix X from these eigenvectors:
\[ {\bf X} = \left[ {\bf x}_1 \ {\bf x}_2 \ \cdots \ {\bf x}_k \right] \qquad \Longrightarrow \qquad {\bf A}\,{\bf X} = \left[ {\bf A}\,{\bf x}_1 \ {\bf A}\,{\bf x}_2 \ \cdots \ {\bf A}\,{\bf x}_k \right] = \left[ \lambda{\bf x}_1 \ \lambda{\bf x}_2 \ \cdots \ \lambda{\bf x}_k \right] = \lambda \,{\bf X} . \]
We complete X with an n×(n-k) matrix X' so that the square n×n matrix S = [X X'] becomes invertible. Then
\[ \left[ {\bf S}^{-1} {\bf X}\ {\bf S}^{-1} {\bf X}' \right] = {\bf S}^{-1} {\bf S} = \begin{bmatrix} {\bf I}_{k\times k} & {\bf 0} \\ {\bf 0} & {\bf I}_{(n-k)\times (n-k)} \end{bmatrix} , \]
so
\[ {\bf S}^{-1} {\bf X} = \begin{bmatrix} {\bf I}_{k\times k} \\ {\bf 0} \end{bmatrix} . \]
Compute
\[ {\bf S}^{-1} {\bf A}\, {\bf S} = {\bf S}^{-1} {\bf A} \left[ {\bf X}\ \ {\bf X}' \right] = {\bf S}^{-1} \left[ \lambda {\bf X}\ \ {\bf A}\, {\bf X}' \right] = \left[ \lambda\, {\bf S}^{-1} {\bf X} \ \ {\bf S}^{-1}{\bf A}\, {\bf X}' \right] = \begin{bmatrix} \lambda {\bf I}_{k\times k} & {\bf \ast} \\ {\bf 0} & {\bf C}_{(n-k)\times (n-k)} \end{bmatrix} , \]
for some (n-k)×(n-k) matrix C. Since similar matrices have the same characteristic polynomial, and the determinant of a block upper triangular matrix is the product of the determinants of its diagonal blocks, we get
\[ \chi_{A} (z) = \chi_{S^{-1} A\,S} (z) = \chi_{\lambda\,I_k} (z) \, \chi_{C} (z) = \left( z- \lambda \right)^k \chi_{C} (z) . \]
Consequently, λ is a root of χA(z) = 0 with multiplicity at least k. ■
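For instance, the inequality can be strict: the following defective matrix (an illustrative example, not taken from above) has the eigenvalue 1 with algebraic multiplicity 2, while its eigenspace is only one-dimensional, which Mathematica signals by padding the list of eigenvectors with a zero vector:
B = {{1, 1}, {0, 1}};
Eigenvalues[B]       (* {1, 1} *)
Eigenvectors[B]      (* {{1, 0}, {0, 0}}; the zero row signals the missing second eigenvector *)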
Theorem: Let T be a linear operator on an n-dimensional vector space V. Then T is diagonalizable if and only if its minimal polynomial ψ(λ) is a product of distinct linear factors:
\[ \psi (\lambda ) = \left( \lambda - \lambda_1 \right) \left( \lambda - \lambda_2 \right) \cdots \left( \lambda - \lambda_s \right) , \]
where \( \lambda_1 , \lambda_2 , \ldots , \lambda_s \) are distinct scalars (namely, the distinct eigenvalues of T).     ▣
Suppose that T is diagonalizable. Let \( \lambda_1 , \lambda_2 , \ldots , \lambda_s \) be the distinct eigenvalues of T, and define
\[ p (\lambda ) = \left( \lambda - \lambda_1 \right) \left( \lambda - \lambda_2 \right) \cdots \left( \lambda - \lambda_s \right) . \]
We know that p(λ) divides the minimal polynomial ψ(λ) of T, because every eigenvalue of T is a root of ψ and the λi are distinct. Let \( \beta = \left\{ {\bf v}_1 , {\bf v}_2 , \ldots , {\bf v}_n \right\} \) be a basis for V consisting of eigenvectors of T, and consider one vector vi from β. Then \( \left( \lambda_i I - T \right) {\bf v}_i = 0 \) for some eigenvalue λi. Since λ - λi divides p(λ), there is a polynomial q(λ) such that \( p(\lambda ) = q(\lambda ) \left( \lambda - \lambda_i \right) . \) Hence
\[ p (T ) {\bf v}_i = q(T) \left( T - \lambda_i I \right) {\bf v}_i =0 . \]
It follows that \( p(T) = 0 , \) since p(T) maps each element of the basis β for V to the zero vector. Hence the minimal polynomial ψ(λ) divides p(λ); since p(λ) also divides ψ(λ) and both are monic, ψ(λ) = p(λ), a product of distinct linear factors.

Conversely, suppose that there are distinct scalars \( \lambda_1 , \lambda_2 , \ldots , \lambda_s \) such that the minimal polynomial factors as

\[ \psi (\lambda ) = \left( \lambda - \lambda_1 \right) \left( \lambda - \lambda_2 \right) \cdots \left( \lambda - \lambda_s \right) . \]
According to the previous theorem, all λi are eigenvalues of T. We apply mathematical induction on n = dim(V). Clearly, T is diagonalizable for n = 1. Now suppose that the statement holds for every operator on a space of dimension less than n, where n > 1, and suppose that dim(V) = n. Let U be the range of the transformation λsI - T. Clearly \( U \ne V \) because λs is an eigenvalue of T, so λsI - T is not invertible. If \( U = \{ 0 \} , \) then T = λsI, which is clearly diagonalizable. So suppose that 0 < dim(U) < n. Then U is T-invariant, and every \( {\bf x} \in U \) can be written as \( {\bf x} = \left( \lambda_s I - T \right) {\bf w} \) for some \( {\bf w} \in V , \) so that
\[ \left( T - \lambda_1 I \right) \left( T - \lambda_2 I \right) \cdots \left( T - \lambda_{s-1} I\right) {\bf x} = - \left( T - \lambda_1 I \right) \cdots \left( T - \lambda_{s} I\right) {\bf w} = - \psi (T)\, {\bf w} = 0 . \]

It follows that the minimal polynomial for TU, the restriction of T to the subspace U, divides the polynomial \( \left( \lambda - \lambda_1 \right) \left( \lambda - \lambda_2 \right) \cdots \left( \lambda - \lambda_{s-1} \right) . \) Hence, by the induction hypothesis, TU is diagonalizable. Furthermore, λs is not an eigenvalue of TU. Therefore, \( U \cap \mbox{ker} \left( \lambda_s I - T \right) = \{ 0 \} . \) Now let \( \beta_1 = \left\{ {\bf v}_1 , {\bf v}_2 , \ldots , {\bf v}_m \right\} \) be a basis for U consisting of eigenvectors of TU (and hence of T), and let \( \beta_2 = \left\{ {\bf w}_1 , {\bf w}_2 , \ldots , {\bf w}_k \right\} \) be a basis for the kernel of λsI - T, the eigenspace of T corresponding to λs. Then β1 and β2 are disjoint. Also observe that m + k = n by the dimension theorem applied to λsI - T. We show that \( \beta = \beta_1 \cup \beta_2 \) is linearly independent. Consider scalars \( a_1 , \ldots , a_m , \quad\mbox{and} \quad b_1 , \ldots , b_k \) such that
\[ a_1 {\bf v}_1 + \cdots + a_m {\bf v}_m + b_1 {\bf w}_1 + \cdots + b_k {\bf w}_k = {\bf 0} . \]
Let
\[ {\bf x} = a_1 {\bf v}_1 + \cdots + a_m {\bf v}_m \quad \mbox{and} \quad {\bf y} = b_1 {\bf w}_1 + \cdots + b_k {\bf w}_k . \]
Then \( {\bf x} \in U , \quad {\bf y} \in \mbox{ker} \left( \lambda_s I - T \right) , \) and x + y = 0. It follows that \( {\bf x} = - {\bf y} \in U \cap \mbox{ker} \left( \lambda_s I - T \right) , \) and therefore x = 0, and hence y = 0 as well. Since β1 is linearly independent, we have \( a_1 = \cdots = a_m =0 ; \) similarly, \( b_1 = \cdots = b_k =0 . \) Thus β is a linearly independent set of m + k = n eigenvectors of T, hence a basis for V consisting of eigenvectors of T, and therefore T is diagonalizable. ■
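As an illustration of the criterion, the diagonalizable matrix from the second example above is annihilated by the product of the distinct linear factors (λ - 4)(λ - 1), which is therefore its minimal polynomial; a quick check (the matrix is repeated here under the name A2):
A2 = {{-20, -42, -21}, {6, 13, 6}, {12, 24, 13}};
(A2 - 4*IdentityMatrix[3]).(A2 - IdentityMatrix[3])    (* expect the zero matrix *)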
Theorem: An n × n matrix A is diagonalizable if and only if A has n linearly independent eigenvectors.     ▣
If A has n linearly independent eigenvectors, they can be written as the columns of a nonsingular matrix S, and then S-1 A S is the diagonal matrix having the eigenvalues of A as its diagonal entries, so A is diagonalizable. Conversely, if S-1 A S = Λ is diagonal for some nonsingular S, then A S = S Λ, which says that every column of S is an eigenvector of A; since S is nonsingular, these n columns are linearly independent. Thus, if A does not have n linearly independent eigenvectors, no such matrix S can be formed and A is not diagonalizable.
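In practice, the criterion can be tested by counting independent eigenvectors, for example with MatrixRank; a minimal sketch (the helper name diagonalizableQ is ours):
diagonalizableQ[a_?SquareMatrixQ] := MatrixRank[Eigenvectors[a]] == Length[a]
diagonalizableQ[{{-20, -42, -21}, {6, 13, 6}, {12, 24, 13}}]    (* True: the second example above *)
diagonalizableQ[{{1, 1}, {0, 1}}]                               (* False: a defective matrix *)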