Diagonalization

We show how to define a function of a square matrix using a diagonalization procedure. This method applies only to diagonalizable square matrices and is not suitable for defective matrices. Recall that a matrix A is called diagonalizable if there exists a nonsingular matrix S such that S⁻¹AS = Λ, a diagonal matrix; in other words, the matrix A is similar to a diagonal matrix. An n×n matrix is diagonalizable if and only if it has n linearly independent eigenvectors, which happens exactly when the geometric multiplicity of each eigenvalue equals its algebraic multiplicity. The matrix S can then be built from eigenvectors of A, column by column.

Let A be a square n×n diagonalizable matrix, and let Λ be the corresponding diagonal matrix of its eigenvalues:

\[
\Lambda = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix},
\]

where λ_1, λ_2, …, λ_n are eigenvalues (that may be equal) of the matrix A.

Let x_1, x_2, …, x_n be linearly independent eigenvectors corresponding to the eigenvalues λ_1, λ_2, …, λ_n. We build the nonsingular matrix S from these eigenvectors (every column is an eigenvector):

\[
S = \begin{bmatrix} x_1 & x_2 & x_3 & \cdots & x_n \end{bmatrix}.
\]
For any reasonable function (we do not make this word precise; it is sufficient for the function to be smooth) defined on the spectrum (the set of all eigenvalues) of the diagonalizable matrix A, we define the corresponding function of the matrix by the formula:
\[
f(A) = S\, f(\Lambda)\, S^{-1}, \qquad \text{where} \quad f(\Lambda) = \begin{bmatrix} f(\lambda_1) & 0 & \cdots & 0 \\ 0 & f(\lambda_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & f(\lambda_n) \end{bmatrix}.
\]
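As a quick illustration of this recipe, here is a minimal Mathematica sketch (the helper name matrixFunctionViaDiagonalization is ours, not a built-in); it assumes the input matrix is diagonalizable, and the last line anticipates the first example below:

matrixFunctionViaDiagonalization[f_, a_?MatrixQ] := Module[{vals, vecs, s},
  {vals, vecs} = Eigensystem[a];   (* eigenvalues and eigenvectors, consistently ordered *)
  s = Transpose[vecs];             (* eigenvectors become the columns of S *)
  s . DiagonalMatrix[f /@ vals] . Inverse[s]]    (* S f(Λ) S^{-1} *)
matrixFunctionViaDiagonalization[Sqrt, {{1, 4, 16}, {18, 20, 4}, {-12, -14, -7}}]
(* {{3, 4, 8}, {2, 2, -4}, {-2, -2, 1}}, the principal square root *)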

 

Example: Consider the 3×3 matrix
\[
A = \begin{bmatrix} 1 & 4 & 16 \\ 18 & 20 & 4 \\ -12 & -14 & -7 \end{bmatrix}
\]
that has three distinct eigenvalues:

A = {{1,4,16},{18,20,4},{-12,-14,-7}}
Eigenvalues[A]
Out[2]= {9, 4, 1}
Eigenvectors[A]
Out[3]= {{1, -2, 1}, {4, -5, 2}, {4, -4, 1}}
Using these eigenvectors as columns, we build the transition matrix S:
\[
S = \begin{bmatrix} 1 & 4 & 4 \\ -2 & -5 & -4 \\ 1 & 2 & 1 \end{bmatrix}, \qquad \text{with} \quad S^{-1} = \begin{bmatrix} -3 & -4 & -4 \\ 2 & 3 & 4 \\ -1 & -2 & -3 \end{bmatrix}.
\]

Then we are ready to construct the eight (it is 2³ roots, because each square root of an eigenvalue has two values; for instance, √9 = ±3) square roots of this matrix, all of whose eigenvalues are positive:

\[
\sqrt{A} = S\,\sqrt{\Lambda}\,S^{-1} = \begin{bmatrix} 1 & 4 & 4 \\ -2 & -5 & -4 \\ 1 & 2 & 1 \end{bmatrix} \begin{bmatrix} \pm 3 & 0 & 0 \\ 0 & \pm 2 & 0 \\ 0 & 0 & \pm 1 \end{bmatrix} \begin{bmatrix} -3 & -4 & -4 \\ 2 & 3 & 4 \\ -1 & -2 & -3 \end{bmatrix},
\]
with appropriate choice of roots on the diagonal. In particular,
\[
\sqrt{A} = \begin{bmatrix} 3 & 4 & 8 \\ 2 & 2 & -4 \\ -2 & -2 & 1 \end{bmatrix}, \quad \begin{bmatrix} 21 & 28 & 32 \\ -34 & -46 & -52 \\ 16 & 22 & 25 \end{bmatrix}, \quad \begin{bmatrix} 11 & 20 & 32 \\ -6 & -14 & -28 \\ 0 & 2 & 7 \end{bmatrix}, \quad \begin{bmatrix} -29 & -44 & -56 \\ 42 & 62 & 76 \\ -18 & -26 & -31 \end{bmatrix}.
\]
We check with Mathematica the particular choice of roots 3, 2, and 1 of the eigenvalues; any other combination of the roots ±3, ±2, ±1 can be taken in the same way.
S = Transpose[Eigenvectors[A]]
square = {{3, 0, 0}, {0, 2, 0}, {0, 0, 1}}
S.square.Inverse[S]
Out[7]= {{3, 4, 8}, {2, 2, -4}, {-2, -2, 1}}
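To enumerate all eight square roots at once, one can loop over the sign choices on the diagonal; the following sketch (variable names are ours) also verifies that each candidate squares back to A:

A = {{1, 4, 16}, {18, 20, 4}, {-12, -14, -7}};
S = Transpose[Eigenvectors[A]];
roots = Table[S . DiagonalMatrix[signs {3, 2, 1}] . Inverse[S], {signs, Tuples[{1, -1}, 3]}];
And @@ Table[root . root == A, {root, roots}]   (* True: all eight candidates square to A *)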
Now we build other matrix functions.
\[
e^{At} = S \begin{bmatrix} e^{9t} & 0 & 0 \\ 0 & e^{4t} & 0 \\ 0 & 0 & e^{t} \end{bmatrix} S^{-1} = e^{9t}\begin{bmatrix} -3 & -4 & -4 \\ 6 & 8 & 8 \\ -3 & -4 & -4 \end{bmatrix} + e^{4t}\begin{bmatrix} 8 & 12 & 16 \\ -10 & -15 & -20 \\ 4 & 6 & 8 \end{bmatrix} + e^{t}\begin{bmatrix} -4 & -8 & -12 \\ 4 & 8 & 12 \\ -1 & -2 & -3 \end{bmatrix},
\]
which we check with the standard Mathematica command:
A = {{1, 4, 16}, {18, 20, 4}, {-12, -14, -7}}
MatrixExp[A*t]
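A symbolic cross-check (our own, not part of the original session) that the spectral form above agrees with MatrixExp:

S = Transpose[Eigenvectors[A]];
Simplify[S . DiagonalMatrix[{E^(9 t), E^(4 t), E^t}] . Inverse[S] - MatrixExp[A*t]]   (* zero matrix *)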
Recall that the exponential matrix function is the unique solution to the following initial value problem:
\[
\frac{d}{dt}\,\Phi(t) = A\,\Phi(t) = \Phi(t)\,A, \qquad \Phi(0) = I,
\]
where I is the identity matrix. Two other important matrix functions depending on the time variable t are:
\[
\frac{\sin\left(\sqrt{A}\,t\right)}{\sqrt{A}} = S \begin{bmatrix} \frac{\sin 3t}{3} & 0 & 0 \\ 0 & \frac{\sin 2t}{2} & 0 \\ 0 & 0 & \sin t \end{bmatrix} S^{-1} = \frac{\sin 3t}{3}\begin{bmatrix} -3 & -4 & -4 \\ 6 & 8 & 8 \\ -3 & -4 & -4 \end{bmatrix} + \frac{\sin 2t}{2}\begin{bmatrix} 8 & 12 & 16 \\ -10 & -15 & -20 \\ 4 & 6 & 8 \end{bmatrix} + \sin t \begin{bmatrix} -4 & -8 & -12 \\ 4 & 8 & 12 \\ -1 & -2 & -3 \end{bmatrix},
\]
\[
\cos\left(\sqrt{A}\,t\right) = S \begin{bmatrix} \cos 3t & 0 & 0 \\ 0 & \cos 2t & 0 \\ 0 & 0 & \cos t \end{bmatrix} S^{-1} = \cos 3t\begin{bmatrix} -3 & -4 & -4 \\ 6 & 8 & 8 \\ -3 & -4 & -4 \end{bmatrix} + \cos 2t\begin{bmatrix} 8 & 12 & 16 \\ -10 & -15 & -20 \\ 4 & 6 & 8 \end{bmatrix} + \cos t \begin{bmatrix} -4 & -8 & -12 \\ 4 & 8 & 12 \\ -1 & -2 & -3 \end{bmatrix}.
\]
S = Transpose[Eigenvectors[A]]
S.{{Sin[3*t]/3, 0, 0}, {0, Sin[2*t]/2, 0}, {0, 0, Sin[t]}}.Inverse[S]
S.{{Cos[3*t], 0, 0}, {0, Cos[2*t], 0}, {0, 0, Cos[t]}}.Inverse[S]
These two matrix functions are unique solutions of the following initial value problems:
\[
\ddot{\Phi} + A\,\Phi = 0, \quad \Phi(0) = 0, \quad \dot{\Phi}(0) = I \qquad \text{for} \quad \Phi(t) = \frac{\sin\left(\sqrt{A}\,t\right)}{\sqrt{A}},
\]
and
\[
\ddot{\Psi} + A\,\Psi = 0, \quad \Psi(0) = I, \quad \dot{\Psi}(0) = 0 \qquad \text{for} \quad \Psi(t) = \cos\left(\sqrt{A}\,t\right).
\]
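These two initial value problems can be verified symbolically; the following sketch rebuilds the two matrix functions from the same S and checks the differential equation and the initial data (the names phi and psi are ours):

A = {{1, 4, 16}, {18, 20, 4}, {-12, -14, -7}};
S = Transpose[Eigenvectors[A]];
phi = S . DiagonalMatrix[{Sin[3 t]/3, Sin[2 t]/2, Sin[t]}] . Inverse[S];
psi = S . DiagonalMatrix[{Cos[3 t], Cos[2 t], Cos[t]}] . Inverse[S];
Simplify[{D[phi, {t, 2}] + A . phi, D[psi, {t, 2}] + A . psi}]   (* two zero matrices *)
{phi, D[phi, t], psi, D[psi, t]} /. t -> 0   (* zero matrix, identity, identity, zero matrix *)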

Example: Consider the 3×3 matrix
\[
A = \begin{bmatrix} -20 & -42 & -21 \\ 6 & 13 & 6 \\ 12 & 24 & 13 \end{bmatrix}
\]
that has two distinct eigenvalues:

A = {{-20, -42, -21}, {6, 13, 6}, {12, 24, 13}}
Eigenvalues[A]
Out[2]= {4, 1, 1}
Eigenvectors[A]
Out[3]= {{ -7, 2, 4 }, {-1, 0, 1 }, {-2, 1, 0 }}
Since the double eigenvalue λ=1 has two linearly independent eigenvectors, the given matrix is diagonalizable, and we are able to build the transition matrix of its eigenvectors:
\[
S = \begin{bmatrix} -7 & -1 & -2 \\ 2 & 0 & 1 \\ 4 & 1 & 0 \end{bmatrix}, \qquad \text{with} \quad S^{-1} = \begin{bmatrix} 1 & 2 & 1 \\ -4 & -8 & -3 \\ -2 & -3 & -2 \end{bmatrix}.
\]
For the three functions f(λ) = e^{λt}, Φ(λ) = sin(t√λ)/√λ, and Ψ(λ) = cos(t√λ), we construct the corresponding matrix functions:

\[
f(A) = S\, e^{\Lambda t}\, S^{-1} = e^{4t}\begin{bmatrix} -7 & -14 & -7 \\ 2 & 4 & 2 \\ 4 & 8 & 4 \end{bmatrix} + e^{t}\begin{bmatrix} 8 & 14 & 7 \\ -2 & -3 & -2 \\ -4 & -8 & -3 \end{bmatrix},
\]
\[
\Phi(A) = S\, \frac{\sin\left(\sqrt{\Lambda}\,t\right)}{\sqrt{\Lambda}}\, S^{-1} = \frac{\sin 2t}{2}\begin{bmatrix} -7 & -14 & -7 \\ 2 & 4 & 2 \\ 4 & 8 & 4 \end{bmatrix} + \sin t\begin{bmatrix} 8 & 14 & 7 \\ -2 & -3 & -2 \\ -4 & -8 & -3 \end{bmatrix},
\]
\[
\Psi(A) = S\, \cos\left(\sqrt{\Lambda}\,t\right) S^{-1} = \cos 2t\begin{bmatrix} -7 & -14 & -7 \\ 2 & 4 & 2 \\ 4 & 8 & 4 \end{bmatrix} + \cos t\begin{bmatrix} 8 & 14 & 7 \\ -2 & -3 & -2 \\ -4 & -8 & -3 \end{bmatrix}.
\]

These matrix functions are unique solutions of the following initial value problems:
\[
\frac{d}{dt}\, e^{At} = A\, e^{At}, \qquad \lim_{t\to 0} e^{At} = I,
\]
where I is the identity matrix;
\[
\frac{d^2}{dt^2}\, \Phi(t) + A\,\Phi(t) = 0, \qquad \lim_{t\to 0} \Phi(t) = 0, \qquad \lim_{t\to 0} \dot{\Phi}(t) = I;
\]
\[
\frac{d^2}{dt^2}\, \Psi(t) + A\,\Psi(t) = 0, \qquad \lim_{t\to 0} \Psi(t) = I, \qquad \lim_{t\to 0} \dot{\Psi}(t) = 0.
\]
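These constructions can again be cross-checked in Mathematica, mirroring the session above (the diagonal entries follow the eigenvalue ordering 4, 1, 1 reported by Eigenvalues):

A = {{-20, -42, -21}, {6, 13, 6}, {12, 24, 13}};
S = Transpose[Eigenvectors[A]];
Simplify[S . DiagonalMatrix[{E^(4 t), E^t, E^t}] . Inverse[S] - MatrixExp[A*t]]   (* zero matrix *)
S . DiagonalMatrix[{Sin[2 t]/2, Sin[t], Sin[t]}] . Inverse[S]
S . DiagonalMatrix[{Cos[2 t], Cos[t], Cos[t]}] . Inverse[S]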

Example: Consider the 3×3 matrix
\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 2 & -6 & -4 \end{bmatrix}
\]
that has two complex conjugate eigenvalues λ = 1 ± 2j and one real eigenvalue λ = -2. Mathematica confirms:
A = {{1, 2, 3}, {2, 3, 4}, {2, -6, -4}}
Eigenvalues[A]
Out[2]= {1 + 2 I, 1 - 2 I, -2}
Eigenvectors[A]
Out[3]= {{-1 - I, -2 - I, 2}, {-1 + I, -2 + I, 2}, {-7, -6, 11}}
We build the transition matrix of its eigenvectors:
\[
S = \begin{bmatrix} -1-{\rm j} & -1+{\rm j} & -7 \\ -2-{\rm j} & -2+{\rm j} & -6 \\ 2 & 2 & 11 \end{bmatrix}, \qquad \text{with} \quad S^{-1} = \frac{1}{26}\begin{bmatrix} 11+10{\rm j} & -11+3{\rm j} & 1+8{\rm j} \\ 11-10{\rm j} & -11-3{\rm j} & 1-8{\rm j} \\ -4 & 4 & 2 \end{bmatrix}.
\]
Now we are ready to define a function of the given square matrix. For example, if f(λ)=eλt, we obtain the corresponding exponential matrix:
\[
e^{At} = S\begin{bmatrix} e^{(1+2{\rm j})t} & 0 & 0 \\ 0 & e^{(1-2{\rm j})t} & 0 \\ 0 & 0 & e^{-2t} \end{bmatrix} S^{-1} = \begin{bmatrix} -1-{\rm j} & -1+{\rm j} & -7 \\ -2-{\rm j} & -2+{\rm j} & -6 \\ 2 & 2 & 11 \end{bmatrix} \begin{bmatrix} e^{t}\left(\cos 2t + {\rm j}\sin 2t\right) & 0 & 0 \\ 0 & e^{t}\left(\cos 2t - {\rm j}\sin 2t\right) & 0 \\ 0 & 0 & e^{-2t} \end{bmatrix} \frac{1}{26}\begin{bmatrix} 11+10{\rm j} & -11+3{\rm j} & 1+8{\rm j} \\ 11-10{\rm j} & -11-3{\rm j} & 1-8{\rm j} \\ -4 & 4 & 2 \end{bmatrix}
\]
\[
= \frac{e^{-2t}}{13}\begin{bmatrix} 14 & -14 & -7 \\ 12 & -12 & -6 \\ -22 & 22 & 11 \end{bmatrix} + \frac{e^{t}\cos 2t}{13}\begin{bmatrix} -1 & 14 & 7 \\ -12 & 25 & 6 \\ 22 & -22 & 2 \end{bmatrix} + \frac{e^{t}\sin 2t}{13}\begin{bmatrix} 21 & -8 & 9 \\ 31 & -5 & 17 \\ -20 & -6 & -16 \end{bmatrix}.
\]
Here we use Euler's formula: e^{a+bj} = e^{a}(cos b + j sin b). Mathematica confirms:
S = {{-1 - I, -1 + I, -7}, {-2 - I, -2 + I, -6}, {2, 2, 11}}
diag = {{Exp[t]*(Cos[2*t] + I*Sin[2*t]), 0, 0}, {0, Exp[t]*(Cos[2*t] - I*Sin[2*t]), 0}, {0, 0, Exp[-2*t]}}
FullSimplify[13*S.diag.Inverse[S]]
The matrix function e^{At} is the unique solution of the following matrix initial value problem:
\[
\frac{d}{dt}\, e^{At} = A\, e^{At}, \qquad \lim_{t\to 0} e^{At} = I,
\]
where I is the identity matrix. ■
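As an additional check that avoids complex arithmetic, the real form obtained above can be verified directly against this initial value problem (a sketch; the name expAt is ours):

A = {{1, 2, 3}, {2, 3, 4}, {2, -6, -4}};
expAt = (E^(-2 t)/13) {{14, -14, -7}, {12, -12, -6}, {-22, 22, 11}} +
   (E^t Cos[2 t]/13) {{-1, 14, 7}, {-12, 25, 6}, {22, -22, 2}} +
   (E^t Sin[2 t]/13) {{21, -8, 9}, {31, -5, 17}, {-20, -6, -16}};
Simplify[{D[expAt, t] - A . expAt, expAt /. t -> 0}]   (* zero matrix and identity matrix *)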

Theorem: For a square matrix A, the geometric multiplicity of any of its eigenvalues is less than or equal to its algebraic multiplicity.     ▣

Let λ be an eigenvalue of an n×n matrix A, and suppose that the dimension of its eigenspace, ker(λI - A), is k. Let x_1, x_2, …, x_k be a basis for this eigenspace. We build the n×k matrix X from these eigenvectors:
\[
X = \begin{bmatrix} x_1 & x_2 & \cdots & x_k \end{bmatrix} \quad \Longrightarrow \quad AX = \begin{bmatrix} Ax_1 & Ax_2 & \cdots & Ax_k \end{bmatrix} = \begin{bmatrix} \lambda x_1 & \lambda x_2 & \cdots & \lambda x_k \end{bmatrix} = \lambda X.
\]
We complete the matrix X with an n×(n-k) matrix X′ so that the square n×n matrix S = [X X′] becomes invertible. Then
\[
\begin{bmatrix} S^{-1}X & S^{-1}X' \end{bmatrix} = S^{-1}S = \begin{bmatrix} I_{k\times k} & 0 \\ 0 & I_{(n-k)\times(n-k)} \end{bmatrix},
\]
so
\[
S^{-1}X = \begin{bmatrix} I_{k\times k} \\ 0 \end{bmatrix}.
\]
Compute
\[
S^{-1}A\,S = S^{-1}A\begin{bmatrix} X & X' \end{bmatrix} = S^{-1}\begin{bmatrix} \lambda X & AX' \end{bmatrix} = \begin{bmatrix} \lambda S^{-1}X & S^{-1}AX' \end{bmatrix} = \begin{bmatrix} \lambda I_{k\times k} & B \\ 0 & C_{(n-k)\times(n-k)} \end{bmatrix},
\]
for some k×(n-k) matrix B and (n-k)×(n-k) matrix C. Since similar matrices have the same characteristic polynomial, we get
\[
\chi_A(z) = \chi_{S^{-1}AS}(z) = \chi_{\lambda I_k}(z)\, \chi_C(z) = (z-\lambda)^k\, \chi_C(z).
\]
Consequently, λ is a root of χ_A(z) = 0 with multiplicity at least k. ■
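The inequality can be seen concretely on a small defective matrix (our own 2×2 example, not taken from the text above):

B = {{5, 1}, {0, 5}};
Eigenvalues[B]   (* {5, 5}: algebraic multiplicity 2 *)
NullSpace[5 IdentityMatrix[2] - B]   (* {{1, 0}}: the eigenspace is one-dimensional *)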
Theorem: Let T be a linear operator on an n-dimensional vector space V. Then T is diagonalizable if and only if its minimal polynomial ψ(λ) is a product of distinct linear factors:
\[
\psi(\lambda) = (\lambda - \lambda_1)(\lambda - \lambda_2) \cdots (\lambda - \lambda_s),
\]
where λ_1, λ_2, …, λ_s are distinct scalars (the distinct eigenvalues of T).     ▣
Suppose that T is diagonalizable. Let λ_1, λ_2, …, λ_s be the distinct eigenvalues of T, and define
\[
p(\lambda) = (\lambda - \lambda_1)(\lambda - \lambda_2) \cdots (\lambda - \lambda_s).
\]
We know that p(λ) divides the minimal polynomial ψ(λ) of T. Let β = {v_1, v_2, …, v_n} be a basis for V consisting of eigenvectors of T, and consider one vector v_i from β. Then (λ_iI - T)v_i = 0 for some eigenvalue λ_i. Since λ - λ_i divides p(λ), there is a polynomial q(λ) such that p(λ) = q(λ)(λ - λ_i). Hence
\[
p(T)\, v_i = q(T)\,(T - \lambda_i I)\, v_i = 0.
\]
It follows that p(T) = 0, since p(T) sends each element of the basis β for V to the zero vector. Therefore ψ(λ) divides p(λ); combined with the fact that p(λ) divides ψ(λ), this shows that p(λ) is the minimal polynomial.

Conversely, suppose that there are distinct scalars λ_1, λ_2, …, λ_s such that the minimal polynomial factors as

\[
\psi(\lambda) = (\lambda - \lambda_1)(\lambda - \lambda_2) \cdots (\lambda - \lambda_s).
\]
Every λ_i is an eigenvalue of T, because every root of the minimal polynomial is an eigenvalue. We apply mathematical induction on n = dim(V). Clearly, T is diagonalizable for n = 1. Now suppose that T is diagonalizable whenever dim(V) < n for some n > 1, and suppose that dim(V) = n. Let U be the range of the transformation λ_sI - T. Clearly U ≠ V because λ_s is an eigenvalue of T. If U = {0}, then T = λ_sI, which is clearly diagonalizable. So suppose that 0 < dim(U) < n. Then U is T-invariant, and for any x ∈ U,
\[
(T - \lambda_1 I)(T - \lambda_2 I) \cdots (T - \lambda_{s-1} I)\, x = 0,
\]
because x = (λ_sI - T)y for some y ∈ V, so the left-hand side equals -ψ(T)y = 0.

It follows that the minimal polynomial of T_U, the restriction of T to the subspace U, divides the polynomial (λ - λ_1)(λ - λ_2)⋯(λ - λ_{s-1}). Hence, by the induction hypothesis, T_U is diagonalizable. Furthermore, λ_s is not an eigenvalue of T_U. Therefore, U ∩ ker(λ_sI - T) = {0}. Now let β_1 = {v_1, v_2, …, v_m} be a basis for U consisting of eigenvectors of T_U (and hence of T), and let β_2 = {w_1, w_2, …, w_k} be a basis for ker(λ_sI - T), the eigenspace of T corresponding to λ_s. Then β_1 and β_2 are disjoint. Also observe that m + k = n by the dimension theorem applied to λ_sI - T. We show that β = β_1 ∪ β_2 is linearly independent. Consider scalars a_1, …, a_m and b_1, …, b_k such that
\[
a_1 v_1 + \cdots + a_m v_m + b_1 w_1 + \cdots + b_k w_k = 0.
\]
Let
\[
x = a_1 v_1 + \cdots + a_m v_m \qquad \text{and} \qquad y = b_1 w_1 + \cdots + b_k w_k.
\]
Then x ∈ U, y ∈ ker(λ_sI - T), and x + y = 0. It follows that x = -y ∈ U ∩ ker(λ_sI - T), and therefore x = 0. Since β_1 is linearly independent, we have a_1 = ⋯ = a_m = 0. Similarly, b_1 = ⋯ = b_k = 0, and we conclude that β is a linearly independent subset of V consisting of eigenvectors of T. Since it contains n = m + k vectors, it is a basis of V, and therefore T is diagonalizable. ■
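The theorem can be illustrated with the diagonalizable matrix from the second example above: its distinct eigenvalues are 4 and 1, and the product (A - 4I)(A - I) already annihilates A, so the minimal polynomial is the product of distinct linear factors (λ - 4)(λ - 1):

A = {{-20, -42, -21}, {6, 13, 6}, {12, 24, 13}};
(A - 4 IdentityMatrix[3]) . (A - IdentityMatrix[3])   (* zero matrix *)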
Theorem: An n × n matrix A is diagonalizable if and only if A has n linearly independent eigenvectors.     ▣
If the matrix A has n linearly independent eigenvectors, then these eigenvectors, written as column vectors, form a nonsingular matrix S, and S⁻¹AS is the diagonal matrix having the eigenvalues of A as its diagonal entries; hence A is diagonalizable. Conversely, if A is diagonalizable, so that S⁻¹AS = Λ for some nonsingular S and diagonal Λ, then AS = SΛ, which says that each column of S is an eigenvector of A; since S is nonsingular, these n eigenvectors are linearly independent.
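A short Mathematica confirmation of both directions: the first example matrix has three independent eigenvectors and S⁻¹AS comes out diagonal, while a defective matrix (our own 2×2 example) lacks a full set of eigenvectors:

A = {{1, 4, 16}, {18, 20, 4}, {-12, -14, -7}};
S = Transpose[Eigenvectors[A]];
Inverse[S] . A . S   (* {{9, 0, 0}, {0, 4, 0}, {0, 0, 1}} *)
B = {{5, 1}, {0, 5}};
Eigenvectors[B]   (* {{1, 0}, {0, 0}}: the zero vector signals a missing eigenvector *)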