Matrix Algebra

Sage is a powerful system for studying and exploring many different areas of mathematics. This page presents some topics from linear algebra that are needed for constructing solutions to systems of linear ordinary differential equations, along with some applications.

 

How to Define and Operate on Vectors

A vector usually describes a quantity that has both magnitude and direction. Geometrically, a vector can be represented by a directed line segment---that is, by an arrow---and is denoted either by a boldface lowercase symbol or by a symbol with an arrow over it. A vector whose initial point is A and whose terminal point is B is written \( \vec{AB} .\) Vectors are said to be free, which means that a vector can be moved from one position to another provided that its magnitude (denoted by \( \| \vec{AB} \| \) ) and direction are not changed.

A set of vectors, equipped with the operations described below, is called a vector space (also a linear space), an abstract notion in mathematics. Historically, the first ideas leading to vector spaces can be traced back as far as the 17th century; however, the idea crystallized with the work of the German mathematician Hermann Günther Grassmann (1809--1877), who published a paper in 1862. A vector space is a collection of objects called vectors, which may be added together and multiplied ("scaled") by numbers, called scalars in this context. Scalars are often taken to be real numbers, but there are also vector spaces with scalar multiplication by complex numbers, rational numbers, or generally any field. The operations of vector addition and scalar multiplication must satisfy certain requirements, called axioms.

When a basis in a vector space has been chosen, a vector can be expanded with respect to the basis vectors. For example, when unit vectors in n-dimensional space \( \mathbb{R}^n \) have been chosen, any vector can be uniquely expanded with respect to this basis. In three-dimensional space, it is customary to use the Cartesian coordinate system and to denote these unit vectors by i (abscissa or horizontal axis, usually denoted by x), j (ordinate or vertical axis, usually denoted by y), and k (applicate, usually denoted by z). With respect to these unit vectors, any vector can be written as \( {\bf v} = x\,{\bf i} + y\,{\bf j} + z\,{\bf k} . \) Then the vector v is uniquely defined by an ordered triple of real numbers

\[ {\bf v} = \left[ x, y , z \right] . \]
In general, a vector in n-dimensional space is identified with an n-tuple (an ordered array of numbers), and in an infinite-dimensional space with a sequence of numbers.

We recall basic properties of vectors. First, we define a vector over the field of real numbers:

sage: v = vector(RR, 4, [1, 1/2, 1/3, 1/4])
sage: v
(1.00000000000000, 0.500000000000000, 0.333333333333333, 0.250000000000000)

However, if we define the same vector over the field of rational numbers, we get

sage: v = vector(QQ, 4, [1, 1/2, 1/3, 1/4])
sage: v 
(1, 1/2, 1/3, 1/4)
It is essential that each element knows its classification:
sage: v.parent()
Vector space of dimension 4 over Rational Field
sage: v.degree()
4
sage: v[3]
1/4

Even though we typed v[3], the last command displays the fourth element of the vector v because Sage counts from zero. The degree command provides the number of elements in the vector, that is, its size.

However, if you try to define the same vector over the ring of integers, Sage will complain:

sage: v = vector(ZZ, 4, [1, 1/2, 1/3, 1/4]); v
It raises a TypeError because the entries are not integers; on the other hand, the command below works:
sage: v = vector(ZZ, 4, [1, 2, 3, -4])
sage: v
(1, 2, 3, -4)
To check, you may want to type:
sage: v.parent() 
Ambient free module of rank 4 over the principal ideal domain Integer Ring
In mathematics and applications, it is customary to distinguish column vectors
\[ {\bf v} = \left( \begin{array}{c} v_1 \\ v_2 \\ \vdots \\ v_m \end{array} \right) \qquad \mbox{also written as } \qquad {\bf v} = \left[ \begin{array}{c} v_1 \\ v_2 \\ \vdots \\ v_m \end{array} \right] , \]
for which we use lowercase letters in boldface type, from row vectors (ordered n-tuples)
\[ \vec{v} = \left[ v_1 , v_2 , \ldots , v_n \right] . \]
Here the entries \( v_i \) are known as the components of the vector. Column vectors and row vectors can be defined using the matrix command, as an \( n\times 1 \) matrix and a \( 1\times n \) matrix, respectively:
sage: u = matrix(QQ, 3, 1, [1, 2, 3]); u    # a column vector entered as a 3x1 matrix
[1]
[2]
[3]
sage: v = matrix(QQ, 1, 3, [1, 2, 3]); v    # a row vector entered as a 1x3 matrix
[1 2 3]

Vectors in Sage are built, manipulated, and interrogated much like matrices (see the next subsection). However, as simple lists (“one-dimensional,” not “two-dimensional” like matrices, which look more tabular), they are simpler to construct and manipulate. Sage prints a vector across the screen, even if we wish to interpret it as a column vector. It is delimited by parentheses, which allows us to distinguish a vector from a matrix with just one row, if we look carefully. The number of “slots” in a vector is not referred to in Sage as rows or columns but rather as its “degree.” Here are some examples (remember that Sage counts from zero):

sage: v = vector(QQ, 4, [1, 1/2, 1/3, 1/4])
sage: v
(1, 1/2, 1/3, 1/4)
sage: v.degree()
4
sage: v.parent()
Vector space of dimension 4 over Rational Field
Sage is largely object-oriented, which means many commands apply to an object by using the “dot” notation.

Mathematical objects in Sage often come from sets of similar objects. This set is called the “parent” of the element. We can use this to learn how Sage deduces the type of entries in a matrix. It is possible to also skip specifying the type of numbers used for entries of a matrix, however this is fraught with peril, as Sage will make an informed guess about your intent, which may not be what you want. Consider when you enter the single character “2” into a computer program like Sage. Is this the integer 2, the rational number \( \frac{2}{1}, \) the real number 2.00000, the complex number 2 + 0i, or the polynomial \( p(x)=2? \) In context, we can usually figure it out, but a literal-minded computer is not so smart. It happens that the operations we can perform, and how they behave, are influenced by the type of the entries in a matrix. So it is important to get this right and our advice is to be explicit and be in the habit of always specifying the type of the entries of a matrix you create.
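For instance (a small illustrative sketch with made-up entries), Sage infers the ring of entries when none is given:

sage: m = matrix([[1, 2], [3, 4]])
sage: m.parent()
Full MatrixSpace of 2 by 2 dense matrices over Integer Ring
sage: m = matrix([[1, 2/3], [3, 4]])
sage: m.parent()
Full MatrixSpace of 2 by 2 dense matrices over Rational Field

A single fractional entry is enough to move the whole matrix from the integers to the rationals, which may silently change how later operations behave.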

Sage knows a wide variety of sets of numbers. These are known as “rings” or “fields,” but we will call them “number systems” here. Examples include:
SR to enter symbolic entries,
ZZ is the set of integers,
QQ is the set of rationals,
RR is the real numbers with reasonable precision, and
CC is the complex numbers with reasonable precision.
We will present the theory of linear algebra over the complex numbers. However, in any computer system, there will always be complications surrounding the inability of digital arithmetic to accurately represent all complex numbers. In contrast, Sage can represent rational numbers exactly as the quotient of two (perhaps very large) integers. So our Sage examples will begin by using QQ as our number system and we concentrate on understanding the key concepts.

sage: z = zero_vector(QQ, 5)
sage: z
(0, 0, 0, 0, 0)
Let S be a set of vectors \( {\bf v}_1 , \ {\bf v}_2 , \ \ldots , \ {\bf v}_k .\) A vector v is said to be a linear combination of the vectors from S if and only if there are scalars \( c_1 , \ c_2 , \ \ldots , \ c_k \) such that \( {\bf v} = c_1 {\bf v}_1 + c_2 {\bf v}_2 + \cdots + c_k {\bf v}_k .\) That is, a linear combination of vectors from S is a sum of scalar multiples of those vectors. Let S be a nonempty subset of a vector space V. Then the span of S in V is the set of all possible (finite) linear combinations of the vectors in S (including the zero vector). It is usually denoted by span(S). In other words, the span of a set of vectors in a vector space is the intersection of all subspaces containing that set.

Example. The vector [-2, 8, 5, 0] is a linear combination of the vectors [3, 1, -2, 2], [1, 0, 3, -1], and [4, -2, 1, 0], because

\[ 2\,[3,\, 1,\, -2,\,2] + 4\,[1,\,0,\,3,\,-1] -3\,[4,\,-2,\, 1,\, 0] = [-2,\,8,\, 5,\, 0] . \qquad ■ \]
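This can be verified directly in Sage (a short check of the computation above):

sage: v1 = vector(QQ, [3, 1, -2, 2]); v2 = vector(QQ, [1, 0, 3, -1]); v3 = vector(QQ, [4, -2, 1, 0])
sage: 2*v1 + 4*v2 - 3*v3
(-2, 8, 5, 0)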

Let S be a subset of a vector space V.

(1) S is a linearly independent subset of V if and only if no vector in S can be expressed as a linear combination of the other vectors in S.
(2) S is a linearly dependent subset of V if and only if some vector v in S can be expressed as a linear combination of the other vectors in S.

Theorem: A nonempty set \( S = \{ {\bf v}_1 , \ {\bf v}_2 , \ \ldots , \ {\bf v}_r \} \) in a vector space V is linearly independent if and only if the only coefficients satisfying the vector equation

\[ k_1 {\bf v}_1 + k_2 {\bf v}_2 + \cdots + k_r {\bf v}_r = {\bf 0} \]
are \( k_1 =0, \ k_2 =0, \ \ldots , \ k_r =0 . \)

Theorem: A nonempty set \( S = \{ {\bf v}_1 , \ {\bf v}_2 , \ \ldots , \ {\bf v}_r \} \) in a vector space V is linearly independent if and only if the matrix of the column-vectors from S has rank r. ■
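As an illustration of the last theorem (a sketch reusing the three vectors from the example above), the matrix whose columns are those vectors has rank 3, so the vectors are linearly independent:

sage: v1 = vector(QQ, [3, 1, -2, 2]); v2 = vector(QQ, [1, 0, 3, -1]); v3 = vector(QQ, [4, -2, 1, 0])
sage: matrix([v1, v2, v3]).transpose().rank()
3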

If \( S = \{ {\bf v}_1 , \ {\bf v}_2 , \ \ldots , \ {\bf v}_n \} \) is a set of vectors in a finite-dimensional vector space V, then S is called basis for V if:

  • S spans V;
  • S is linearly independent. ■

Sage has three multiplication commands for vectors: the dot product and the outer product (for arbitrary vectors), and the cross product (for three-dimensional vectors).

The dot product of two vectors of the same size \( {\bf x} = \left[ x_1 , x_2 , \ldots , x_n \right] \) and \( {\bf y} = \left[ y_1 , y_2 , \ldots , y_n \right] \) (regardless of whether they are columns or rows) is the number, denoted either by \( {\bf x} \cdot {\bf y} \) or \( \left\langle {\bf x} , {\bf y} \right\rangle ,\)

\[ \left\langle {\bf x} , {\bf y} \right\rangle = {\bf x} \cdot {\bf y} = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n , \]
when entries are real, or
\[ \left\langle {\bf x} , {\bf y} \right\rangle = {\bf x} \cdot {\bf y} = \overline{x_1} y_1 + \overline{x_2} y_2 + \cdots + \overline{x_n} y_n , \]

when entries are complex. The dot product was first introduced by the American physicist and mathematician Josiah Willard Gibbs (1839--1903) in the 1880s. The outer product of two coordinate vectors \( {\bf u} = \left[ u_1 , u_2 , \ldots , u_m \right] \) and \( {\bf v} = \left[ v_1 , v_2 , \ldots , v_n \right] , \) denoted \( {\bf u} \otimes {\bf v} , \) is the m-by-n matrix W whose entries satisfy \( w_{i,j} = u_i v_j . \) The outer product \( {\bf u} \otimes {\bf v} \) is equivalent to the matrix multiplication \( {\bf u} \, {\bf v}^{\ast} \) (or \( {\bf u} \, {\bf v}^{\mathrm T} \) if the vectors are real), provided that u is represented as a column \( m \times 1 \) vector and v as a column \( n \times 1 \) vector. Here \( {\bf v}^{\ast} = \overline{{\bf v}^{\mathrm T}} . \)

For three-dimensional vectors \( {\bf a} = a_1 \,{\bf i} + a_2 \,{\bf j} + a_3 \,{\bf k} = \left[ a_1 , a_2 , a_3 \right] \) and \( {\bf b} = b_1 \,{\bf i} + b_2 \,{\bf j} + b_3 \,{\bf k} = \left[ b_1 , b_2 , b_3 \right] , \) there is also a special multiplication, called the cross product:
\[ {\bf a} \times {\bf b} = \det \left[ \begin{array}{ccc} {\bf i} & {\bf j} & {\bf k} \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{array} \right] = {\bf i} \left( a_2 b_3 - b_2 a_3 \right) - {\bf j} \left( a_1 b_3 - b_1 a_3 \right) + {\bf k} \left( a_1 b_2 - a_2 b_1 \right) . \]

Example. For instance, if m = 4 and n = 3, then

\[ {\bf u} \otimes {\bf v} = {\bf u} \, {\bf v}^{\mathrm T} = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{bmatrix} \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix} = \begin{bmatrix} u_1 v_1 & u_1 v_2 & u_1 v_3 \\ u_2 v_1 & u_2 v_2 & u_2 v_3 \\ u_3 v_1 & u_3 v_2 & u_3 v_3 \\ u_4 v_1 & u_4 v_2 & u_4 v_3 \end{bmatrix} . \]
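In Sage the dot and outer products are available as vector methods; the following sketch (with made-up entries) shows the outer product of a vector of degree 4 with a vector of degree 3:

sage: u = vector(QQ, [1, 2, 3, 4]); v = vector(QQ, [1, 0, 1])
sage: u.outer_product(v)
[1 0 1]
[2 0 2]
[3 0 3]
[4 0 4]
sage: u.dot_product(u)
30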

An inner product of two vectors of the same size, usually denoted by \( \left\langle {\bf x} , {\bf y} \right\rangle ,\) is a generalization of the dot product; it must satisfy the following properties:

  • \( \left\langle {\bf v}+{\bf u} , {\bf w} \right\rangle = \left\langle {\bf v} , {\bf w} \right\rangle + \left\langle {\bf u} , {\bf w} \right\rangle . \)
  • \( \left\langle {\bf v} , \alpha {\bf u} \right\rangle = \alpha \left\langle {\bf v} , {\bf u} \right\rangle \) for any scalar α.
  • \( \left\langle {\bf v} , {\bf u} \right\rangle = \overline{\left\langle {\bf u} , {\bf v} \right\rangle} , \) where overline means complex conjugate.
  • \( \left\langle {\bf v} , {\bf v} \right\rangle \ge 0 , \) and equal if and only if \( {\bf v} = {\bf 0} . \)

The fourth condition in the list above is known as the positive-definite condition. A vector space together with an inner product is called an inner product space. Every inner product space is a normed space (and hence a metric space); the norm is given by

\[ \| {\bf u} \| = \sqrt{\left\langle {\bf u} , {\bf u} \right\rangle} . \]
The nonzero vectors u and v of the same size are orthogonal (or perpendicular) when their inner product is zero: \( \left\langle {\bf u} , {\bf v} \right\rangle = 0 . \) We abbreviate it as \( {\bf u} \perp {\bf v} . \) If A is an \( n \times n \) positive definite matrix and u and v are n-vectors, then we can define the weighted Euclidean inner product
\[ \left\langle {\bf u} , {\bf v} \right\rangle = {\bf A} {\bf u} \cdot {\bf v} = {\bf u} \cdot {\bf A}^{\ast} {\bf v} \qquad\mbox{and} \qquad {\bf u} \cdot {\bf A} {\bf v} = {\bf A}^{\ast} {\bf u} \cdot {\bf v} . \]
In particular, if w1, w2, ... , wn are positive real numbers, which are called weights, and if u = ( u1, u2, ... , un) and v = ( v1, v2, ... , vn) are vectors in \( \mathbb{R}^n , \) then the formula
\[ \left\langle {\bf u} , {\bf v} \right\rangle = w_1 u_1 v_1 + w_2 u_2 v_2 + \cdots + w_n u_n v_n \]
defines an inner product on \( \mathbb{R}^n , \) that is called the weighted Euclidean inner product with weights w1, w2, ... , wn.

Example. The Euclidean inner product and the weighted Euclidean inner product (when \( \left\langle {\bf u} , {\bf v} \right\rangle = \sum_{k=1}^n a_k u_k v_k \) for some positive numbers \( a_k , \ k=1,2,\ldots , n \) ) are special cases of a general class of inner products on \( \mathbb{R}^n \) called matrix inner products. Let A be an invertible n-by-n matrix. Then the formula

\[ \left\langle {\bf u} , {\bf v} \right\rangle = {\bf A} {\bf u} \cdot {\bf A} {\bf v} = {\bf v}^{\mathrm T} {\bf A}^{\mathrm T} {\bf A} {\bf u} \]
defines an inner product generated by A.

Example. In the set of integrable functions on an interval [a,b], we can define the inner product of two functions f and g as

\[ \left\langle f , g \right\rangle = \int_a^b \overline{f} (x)\, g(x) \, {\text d}x \qquad\mbox{or} \qquad \left\langle f , g \right\rangle = \int_a^b f(x)\,\overline{g} (x) \, {\text d}x . \]
Then the norm \( \| f \| \) (also called the 2-norm) becomes the square root of
\[ \| f \|^2 = \left\langle f , f \right\rangle = \int_a^b \left\vert f(x) \right\vert^2 \, {\text d}x . \]
In particular, the 2-norm of the function \( f(x) = 5x^2 +2x -1 \) on the interval [0,1] is
\[ \| 5 x^2 +2x -1 \| = \sqrt{\int_0^1 \left( 5x^2 +2x -1 \right)^2 {\text d}x } = \sqrt{7} . \]
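This integral can be checked symbolically in Sage:

sage: x = var('x')
sage: sqrt(integrate((5*x^2 + 2*x - 1)^2, x, 0, 1))
sqrt(7)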

Example. Consider a set of polynomials of degree n. If

\[ {\bf p} = p(x) = p_0 + p_1 x + p_2 x^2 + \cdots + p_n x^n \quad\mbox{and} \quad {\bf q} = q(x) = q_0 + q_1 x + q_2 x^2 + \cdots + q_n x^n \]
are two polynomials, and if \( x_0 , x_1 , \ldots , x_n \) are distinct real numbers (called sample points), then the formula
\[ \left\langle {\bf p} , {\bf q} \right\rangle = p(x_0 ) q(x_0 ) + p (x_1 )q(x_1 ) + \cdots + p(x_n ) q(x_n ) \]
defines an inner product, which is called the evaluation inner product at \( x_0 , x_1 , \ldots , x_n . \)

sage: v = vector([1,2,3]); w = vector([0,5,-9])
sage: v.cross_product(v)
(0, 0, 0)
sage: u = v.cross_product(w); u
(-33, 9, 5)
sage: v.dot_product(u)
0

With the dot product, we can assign a length to a vector, which is also called the Euclidean norm or 2-norm:

\[ \| {\bf x} \| = \sqrt{ {\bf x}\cdot {\bf x}} = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2} . \]
In linear algebra, functional analysis, and related areas of mathematics, a norm is a function that assigns a strictly positive length or size to each vector in a vector space—save for the zero vector, which is assigned a length of zero. On an n-dimensional complex space \( \mathbb{C}^n ,\) the most common norm is
\[ \| {\bf z} \| = \sqrt{ {\bf z}\cdot {\bf z}} = \sqrt{\overline{z_1} \,z_1 + \overline{z_2}\,z_2 + \cdots + \overline{z_n}\,z_n} = \sqrt{|z_1|^2 + |z_2 |^2 + \cdots + |z_n |^2} . \]
A unit vector u is a vector whose length equals one: \( {\bf u} \cdot {\bf u} =1 . \) We say that two vectors x and y are perpendicular if their dot product is zero. Many other norms are known; among them we mention the taxicab or Manhattan norm, also called the 1-norm:
\[ \| {\bf x} \| = \sum_{k=1}^n | x_k | = |x_1 | + |x_2 | + \cdots + |x_n | . \]
For the Euclidean norm (and, more generally, for any norm generated by an inner product), the Cauchy--Bunyakovsky--Schwarz (or simply CBS) inequality holds:
\[ | {\bf x} \cdot {\bf y} | \le \| {\bf x} \| \, \| {\bf y} \| . \]
The inequality for sums was published by Augustin-Louis Cauchy (1789--1857) in 1821, while the corresponding inequality for integrals was first proved by Viktor Yakovlevich Bunyakovsky (1804--1889) in 1859. The modern proof of the integral inequality (which essentially repeats Bunyakovsky's) was given by Hermann Amandus Schwarz (1843--1921) in 1888. With the Euclidean norm, we can define the dot product as
\[ {\bf x} \cdot {\bf y} = \| {\bf x} \| \, \| {\bf y} \| \, \cos \theta , \]
where \( \theta \) is the angle between two vectors. ■
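Sage's norm method computes both of these norms (a brief sketch with a made-up vector; the optional argument selects the p-norm):

sage: x = vector(QQ, [3, -4, 12])
sage: x.norm()        # Euclidean 2-norm
13
sage: x.norm(1)       # taxicab 1-norm
19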

 

How to Define Matrices

 

Sage has built-in commands that will solve a linear system of equations, given a coefficient matrix and a vector of constants. We need to learn some more theory before we can fully understand these commands, but we can begin to explore their use. For now, consider these methods experimental, and do not let them replace row-reducing augmented matrices. Consider the linear system written in matrix form

\[ {\bf A}\, {\bf x} = {\bf b} . \]
The command A.solve_right(b) will provide information about solutions to the linear system of equations with coefficient matrix A and vector of constants b. The reason for the “right” (and the corresponding command named with “left”) will have to wait for a while. For now, it is generally correct in this course to use the “right” variant of any Sage linear algebra command that has both “left” and “right” variants. Let's apply the .solve_right() command to a system with no solutions:
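For instance (a small made-up inconsistent system; the exact error wording may differ between Sage versions):

sage: A = matrix(QQ, [[1, 1], [2, 2]])
sage: b = vector(QQ, [1, 3])
sage: A.solve_right(b)
Traceback (most recent call last):
...
ValueError: matrix equation has no solutions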

An important use of Schur complements is the partitioned solution of linear systems. In fact, this is the most important application in the Finite Element Method (FEM for short), in connection with superelement analysis. Suppose that we have the linear system \( {\bf M}\,{\bf z} = {\bf r} , \) in which the given coefficient matrix M is partitioned into 2-by-2 blocks: \( {\bf M} = \begin{bmatrix} {\bf A} & {\bf B} \\ {\bf C} & {\bf D} \end{bmatrix} . \) Suppose that the given right-hand side vector is partitioned accordingly as \( {\bf r} = [{\bf p}\ {\bf q}]^T , \) while \( {\bf z} = [{\bf x}\ {\bf y}]^T \) is unknown. The system is equivalent to the pair of vector equations:

\[ \begin{split} {\bf A}\, {\bf x} + {\bf B}\, {\bf y} = {\bf p} , \\ {\bf C}\, {\bf x} + {\bf D}\, {\bf y} = {\bf q} . \end{split} \]
If D is nonsingular, eliminating y and solving for x yields
\[ {\bf x} = \left( {\bf A} - {\bf B}\,{\bf D}^{-1} {\bf C} \right)^{-1} \left( {\bf p} - {\bf B} \,{\bf D}^{-1} {\bf q} \right) = \left( {\bf M} / {\bf D} \right)^{-1} \left( {\bf p} - {\bf B} \,{\bf D}^{-1} {\bf q} \right) , \]
where \( {\bf M} / {\bf D} = {\bf A} - {\bf B}\,{\bf D}^{-1} {\bf C} \) is the Schur complement. Similarly, if A is nonsingular, eliminating x and solving for y yields
\[ {\bf y} = \left( {\bf D} - {\bf C}\,{\bf A}^{-1} {\bf B} \right)^{-1} \left( {\bf q} - {\bf C} \,{\bf A}^{-1} {\bf p} \right) = \left( {\bf M} / {\bf A} \right)^{-1} \left( {\bf q} - {\bf C} \,{\bf A}^{-1} {\bf p} \right) , \]
where \( {\bf M} / {\bf A} = {\bf D} - {\bf C}\,{\bf A}^{-1} {\bf B} . \)
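The partitioned solution is easy to illustrate in Sage with small made-up blocks (a sketch under the assumption that the chosen blocks make D, M, and the Schur complement M/D invertible):

sage: A = matrix(QQ, [[4, 1], [2, 3]]); B = matrix(QQ, [[1, 0], [0, 1]])
sage: C = matrix(QQ, [[0, 1], [1, 0]]); D = matrix(QQ, [[2, 0], [0, 2]])
sage: M = block_matrix([[A, B], [C, D]], subdivide=False)
sage: p = vector(QQ, [1, 2]); q = vector(QQ, [3, 4])
sage: z = M.solve_right(vector(QQ, list(p) + list(q)))
sage: x = (A - B*D^-1*C).solve_right(p - B*D^-1*q)   # x = (M/D)^(-1) (p - B D^(-1) q)
sage: x == vector(QQ, list(z)[:2])
True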

Eigenvalues and Eigenvectors

If A is a square \( n \times n \) matrix and v is an \( n \times 1 \) column vector, then the product \( {\bf A}\,{\bf v} \) is defined and is another \( n \times 1 \) column vector. It is important in many applications to determine whether there exist nonzero column vectors v such that the product vector \( {\bf A}\,{\bf v} \) is a constant multiple \( \lambda\,{\bf v} \) of v.

Let A be a square \( n \times n \) matrix. A number λ is said to be an eigenvalue of A if there exists a nonzero solution vector v of the linear system

\[ {\bf A}\,{\bf v} = \lambda {\bf v} . \]
The solution vector v is said to be an eigenvector corresponding to the eigenvalue λ. The set of all eigenvectors corresponding to the eigenvalue λ, together with the zero vector, is called the eigenspace of the eigenvalue λ.

The set of all eigenvalues of a square matrix A is called the spectrum of A and is usually denoted by \( \sigma ({\bf A}) . \) In other words, the eigenspace of the eigenvalue λ is the null space (or kernel) of the square matrix \( \lambda {\bf I} - {\bf A} . \)

If A is an \( n \times n \) matrix, then λ is an eigenvalue of A if and only if it satisfies the equation

\[ \det \left( \lambda {\bf I} - {\bf A}\right) = 0 . \]

This is called the characteristic equation and the polynomial \( \chi (\lambda ) = \det \left( \lambda {\bf I} - {\bf A}\right) \) of degree n is called the characteristic polynomial of square matrix A.

An eigenvalue \( \lambda_0 \) of a square matrix A has algebraic multiplicity k if k is the largest integer such that \( (\lambda - \lambda_0 )^k \) is a factor of the characteristic polynomial of A. Equivalently, k is the order of the lowest derivative of the characteristic polynomial with respect to λ that does not vanish at \( \lambda = \lambda_0 . \)

The geometric multiplicity of an eigenvalue \( \lambda_0 \) of a matrix A is the dimension of the eigenspace of \( \lambda_0 , \) that is, the dimension of the null space of \( \lambda_0 {\bf I} - {\bf A} . \) In other words, the geometric multiplicity counts the linearly independent eigenvectors for \( \lambda_0 . \)

Theorem: A square matrix A is invertible if and only if \( \lambda =0 \) is not an eigenvalue.

Theorem: Let A be a nonsingular square matrix. If λ is an eigenvalue of A with corresponding eigenvector v, then \( \lambda^{-1} = 1/\lambda \) is an eigenvalue of \( {\bf A}^{-1} \) with the same corresponding eigenvector v.

Theorem: Let A be a square matrix and \( p(\lambda ) = p_m \,\lambda^m + p_{m-1} \,\lambda^{m-1} + \cdots + p_0 \) be a polynomial of degree m. If λ is an eigenvalue of A with corresponding eigenvector v, then \( p(\lambda ) \) is an eigenvalue of \( p\left( {\bf A} \right) \) with the same corresponding eigenvector v.

Theorem: The eigenvalues of an upper triangular, a lower triangular, or a diagonal matrix are its main diagonal entries.

Theorem: The determinant of a square matrix A is equal to the product of its eigenvalues. The sum of all eigenvalues is equal to the trace of the matrix.

Theorem: If A is a self-adjoint or real symmetric matrix, then the eigenvalues of A are real.

Theorem: Let A be an \( n \times n \) self-adjoint matrix. Then eigenvectors corresponding to distinct (different) eigenvalues are orthogonal.

Theorem: An \( n \times n \) matrix A is orthogonal (that is, \( {\bf A}^{-1} = {\bf A}^T \) ) if and only if its columns \( {\bf c}_1 , \ {\bf c}_2 , \ \ldots , \ {\bf c}_n \) form an orthonormal set. ■
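Sage computes eigenvalues from the characteristic polynomial; here is a quick sketch (the same matrix reappears in the diagonalization example below):

sage: A = matrix(QQ, [[1, 4, 16], [18, 20, 4], [-12, -14, -7]])
sage: A.characteristic_polynomial()
x^3 - 14*x^2 + 49*x - 36
sage: sorted(A.eigenvalues())
[1, 4, 9]
sage: A.trace(), A.det()
(14, 36)

In agreement with the theorem above, the trace 14 = 1 + 4 + 9 is the sum of the eigenvalues, and the determinant 36 = 1 · 4 · 9 is their product.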

 

Diagonalization

We show how to define a function of a square matrix using the diagonalization procedure. This method is applicable only to diagonalizable matrices; it is not suitable for defective matrices. Recall that a matrix A is called diagonalizable if there exists a nonsingular matrix S such that \( {\bf S}^{-1} {\bf A} {\bf S} = {\bf \Lambda} , \) a diagonal matrix. In other words, the matrix A is similar to a diagonal matrix. An \( n \times n \) square matrix is diagonalizable if and only if there exist n linearly independent eigenvectors, so the geometric multiplicity of each eigenvalue equals its algebraic multiplicity. Then the matrix S can be built from eigenvectors of A, column by column.

Let A be a square \( n \times n \) diagonalizable matrix, and let Λ be the corresponding diagonal matrix of its eigenvalues:

\[ {\bf \Lambda} = \begin{bmatrix} \lambda_1 & 0 & 0 & \cdots & 0 \\ 0&\lambda_2 & 0& \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0&0&0& \cdots & \lambda_n \end{bmatrix} , \]

where \( \lambda_1 , \lambda_2 , \ldots , \lambda_n \) are eigenvalues (that may be equal) of the matrix A.

Let \( {\bf x}_1 , {\bf x}_2 , \ldots , {\bf x}_n \) be linearly independent eigenvectors, corresponding to the eigenvalues \( \lambda_1 , \lambda_2 , \ldots , \lambda_n .\) We build the nonsingular matrix S from these eigenvectors (every column is an eigenvector):

\[ {\bf S} = \begin{bmatrix} {\bf x}_1 & {\bf x}_2 & {\bf x}_3 & \cdots & {\bf x}_n \end{bmatrix} . \]
Using this matrix S of eigenvectors, we can diagonalize the matrix
\[ {\bf A} = {\bf S} \, {\bf \Lambda} \, {\bf S}^{-1} \qquad \Longrightarrow \qquad {\bf \Lambda} = {\bf S}^{-1} {\bf A} \, {\bf S} . \]

For any reasonable function (we do not make this precise; smooth is sufficient) defined on the spectrum (the set of all eigenvalues) of the diagonalizable matrix A, we define the function of this matrix by the formula:
\[ f \left( {\bf A} \right) = {\bf S} f\left( {\bf \Lambda} \right) {\bf S}^{-1} , \qquad \mbox{where } \quad f\left( {\bf \Lambda} \right) = \begin{bmatrix} f(\lambda_1 ) & 0 & 0 & \cdots & 0 \\ 0 & f(\lambda_2 ) & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0&0&0& \cdots & f(\lambda_n ) \end{bmatrix} . \]

Example. Consider the \( 3 \times 3 \) matrix \( {\bf A} = \begin{bmatrix} 1&4&16 \\ 18&20&4 \\ -12&-14&-7 \end{bmatrix} \) that has three distinct eigenvalues

A = {{1,4,16},{18,20,4},{-12,-14,-7}}
Eigenvalues[A]
Out[2]= {9, 4, 1}
Eigenvectors[A]
Using these eigenvectors as columns, we build the transition matrix:
\[ {\bf S} = \begin{bmatrix} 1&4&4 \\ -2&-5&-4 \\ 1&2&1 \end{bmatrix} , \quad\mbox{with} \quad {\bf S}^{-1} = \begin{bmatrix} -3&-4&-4 \\ 2&3&4 \\ -1&-2&-3 \end{bmatrix} . \]
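These computations are easy to reproduce in Sage (a short check of the diagonalization):

sage: A = matrix(QQ, [[1, 4, 16], [18, 20, 4], [-12, -14, -7]])
sage: S = matrix(QQ, [[1, 4, 4], [-2, -5, -4], [1, 2, 1]])
sage: S.inverse()
[-3 -4 -4]
[ 2  3  4]
[-1 -2 -3]
sage: S.inverse() * A * S
[9 0 0]
[0 4 0]
[0 0 1]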

Then we are ready to construct eight square roots of this matrix (all of its eigenvalues are positive):

\[ \sqrt{\bf A} = {\bf S} \sqrt{\Lambda} {\bf S}^{-1} = \begin{bmatrix} 1&4&4 \\ -2&-5&-4 \\ 1&2&1 \end{bmatrix} \begin{bmatrix} \pm 3&0&0 \\ 0&\pm 2&0 \\ 0&0&\pm 1 \end{bmatrix} \begin{bmatrix} -3&-4&-4 \\ 2&3&4 \\ -1&-2&-3 \end{bmatrix} , \]
with appropriate choice of roots on the diagonal. In particular,
\[ \sqrt{\bf A} = \begin{bmatrix} 3&4&8 \\ 2&2&-4 \\ -2&-2&1 \end{bmatrix} , \quad \begin{bmatrix} 21&28&32 \\ -34&-46&-52 \\ 16&22&25 \end{bmatrix} , \quad \begin{bmatrix} -11&-20&-32 \\ 6&14&28 \\ 0&-2&-7 \end{bmatrix} , \quad \begin{bmatrix} 29&44&56 \\ -42&-62&-76 \\ 18&26&31 \end{bmatrix} . \]
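One of these square roots can be checked directly in Sage (a brief verification):

sage: A = matrix(ZZ, [[1, 4, 16], [18, 20, 4], [-12, -14, -7]])
sage: R = matrix(ZZ, [[3, 4, 8], [2, 2, -4], [-2, -2, 1]])
sage: R^2 == A
True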

 

Example. Consider the \( 3 \times 3 \) matrix \( {\bf A} = \begin{bmatrix} -20&-42&-21 \\ 6&13&6 \\ 12&24&13 \end{bmatrix} \) that has two distinct eigenvalues

A = {{-20, -42, -21}, {6, 13, 6}, {12, 24, 13}}
Eigenvalues[A]
Out[2]= {4, 1, 1}
Eigenvectors[A]
Out[3]= {{ -7, 2, 4 }, {-1, 0, 1 }, {-2, 1, 0 }}
Since the double eigenvalue \( \lambda_2 =1 \) has two linearly independent eigenvectors, the given matrix is diagonalizable, and we are able to build the transition matrix:
\[ {\bf S} = \begin{bmatrix} -7&-1&-2 \\ 2&0&1 \\ 4&1&0 \end{bmatrix} , \quad\mbox{with} \quad {\bf S}^{-1} = \begin{bmatrix} 1&2&1 \\ -4&-8&-3 \\ -2&-3&-2 \end{bmatrix} . \]
For three functions, \( f(\lambda ) = e^{\lambda \,t} , \quad \Phi (\lambda ) = \frac{\sin \left( \sqrt{\lambda} \,t \right)}{\sqrt{\lambda}} , \quad \Psi (\lambda ) = \cos \left( \sqrt{\lambda} \,t \right) \) we construct the corresponding matrix-functions:
\begin{align*} f({\bf A}) &= {\bf S} e^{{\bf \Lambda}\,t} {\bf S}^{-1} = e^{4t} \begin{bmatrix} -7 & -14 & -7 \\ 2&4&2 \\ 4&8&4 \end{bmatrix} + e^t \begin{bmatrix} 8&14&7 \\ -2&-3&-2 \\ -4&-8&-3 \end{bmatrix} , \\ \Phi ({\bf A}) &= {\bf S} \frac{\sin \left( \sqrt{\bf \Lambda} \,t \right)}{\sqrt{\bf \Lambda}} {\bf S}^{-1} = \sin 2t \begin{bmatrix} -7/2 & -7 & -7/2 \\ 1&2&1 \\ 2&4&2 \end{bmatrix} + \sin t \begin{bmatrix} 8&14&7 \\ -2&-3&-2 \\ -4&-8&-3 \end{bmatrix} , \\ \Psi ({\bf A}) &= {\bf S} \cos \left( \sqrt{\bf \Lambda}\,t \right) {\bf S}^{-1} = \cos 2t \begin{bmatrix} -7 & -14 & -7 \\ 2&4&2 \\ 4&8&4 \end{bmatrix} + \cos t \begin{bmatrix} 8&14&7 \\ -2&-3&-2 \\ -4&-8&-3 \end{bmatrix} . \end{align*}

 

Example. Consider the \( 3 \times 3 \) matrix \( {\bf A} = \begin{bmatrix} 1 &2&3 \\ 2 &3&4 \\ 2&-6&-4 \end{bmatrix} \) that has two complex conjugate eigenvalues \( \lambda = 1 \pm 2{\bf j} \) and one real eigenvalue \( \lambda = -2 .\) A computer algebra system confirms:

A = {{1, 2, 3}, {2, 3, 4}, {2, -6, -4}}
Eigenvalues[A]
Eigenvectors[A]
We build the transition matrix from its eigenvectors:
\[ {\bf S} = \begin{bmatrix} -1-{\bf j} & -1+{\bf j} &-7 \\ -2-{\bf j} & -2+{\bf j} &-6 \\ 2&2&1 \end{bmatrix} , \quad \mbox{with} \quad {\bf S}^{-1} = \frac{1}{676} \begin{bmatrix} 13\left( 14 + 5{\bf j} \right) & 13\left( 9 + 7{\bf j} \right) & 13\left( 4 + 7{\bf j} \right) \\ 82 -175 {\bf j} & -257 -245{\bf j} & -88 -245{\bf j} \\ 4 \left( -12 + 5{\bf j} \right) & 4 \left( 17 + 7{\bf j} \right) & 4 \left( 17 + 7{\bf j} \right) \end{bmatrix} . \]

 

Sylvester's Formula

Suppose that A is a diagonalizable \( n \times n \) matrix; this means that none of its eigenvalues is defective and there exists a basis of n linearly independent eigenvectors. Now assume that its minimal polynomial (the polynomial of least possible degree that annihilates the matrix) is a product of simple terms:

\[ \psi (\lambda ) = \left( \lambda - \lambda_1 \right) \left( \lambda - \lambda_2 \right) \cdots \left( \lambda - \lambda_k \right) , \]
where \( \lambda_1 , \lambda_2 , \ldots , \lambda_k \) are distinct eigenvalues of the matrix A ( \( k \le n \) ).

Let \( f(\lambda ) \) be a function defined on the spectrum of the matrix A. The last condition means that every eigenvalue λi is in the domain of f, and that every eigenvalue λi with multiplicity mi > 1 is in the interior of the domain, with f being (mi - 1) times differentiable at λi. We build a function \( f\left( {\bf A} \right) \) of the diagonalizable square matrix A following James Joseph Sylvester (1814--1897), an English mathematician who worked as a lawyer before devoting himself to mathematics. To define a function of a square matrix, we need to construct k Sylvester auxiliary matrices, one for each distinct eigenvalue \( \lambda_i , \quad i= 1,2,\ldots k: \)

\[ {\bf Z}_{\lambda_i} = {\bf Z}_{i} = \dfrac{\left( {\bf A} - \lambda_1 {\bf I} \right) \cdots \left( {\bf A} - \lambda_{i-1} {\bf I} \right) \left( {\bf A} - \lambda_{i+1} {\bf I} \right) \cdots \left( {\bf A} - \lambda_k {\bf I} \right)}{(\lambda_i - \lambda_1 ) \cdots (\lambda_i - \lambda_{i-1} ) (\lambda_i - \lambda_{i+1} ) \cdots (\lambda_i - \lambda_k )} . \]
Now we define the function \( f\left( {\bf A} \right) \) according to the formula:
\[ f\left( {\bf A} \right) = \sum_{i=1}^k f(\lambda_i )\,{\bf Z}_i . \]

Each Sylvester auxiliary matrix is a projection matrix onto the eigenspace of the corresponding eigenvalue.

Example. Consider a 2-by-2 matrix

\[ {\bf A} = \begin{bmatrix} 4&5 \\ -1& -2 \end{bmatrix} \]
that has two distinct eigenvalues \( \lambda_1 =3 \) and \( \lambda_2 =-1 . \) The eigenvectors corresponding to these eigenvalues are
\[ {\bf v}_1 = \left\langle -5 , \ 1 \right\rangle^T \qquad\mbox{and}\qquad {\bf v}_2 = \left\langle 1 , \ -1 \right\rangle^T , \]

respectively. Since the minimal polynomial \( \psi (\lambda ) = (\lambda -3)(\lambda +1) \) is a product of simple factors, we build Sylvester's auxiliary matrices:
\[ {\bf Z}_3 = \frac{{\bf A} + {\bf I}}{3+1} = \frac{1}{4} \begin{bmatrix} 5&5 \\ -1& -1 \end{bmatrix} \quad\mbox{and} \quad {\bf Z}_{-1} = \frac{{\bf A} -3{\bf I}}{-1-3} = \frac{1}{4} \begin{bmatrix} -1&-5 \\ 1&5 \end{bmatrix} . \]
These matrices are projectors on eigenspaces:
\[ {\bf Z}_3^2 = {\bf Z}_3 \qquad \mbox{and} \qquad {\bf Z}_{-1}^2 = {\bf Z}_{-1} . \]
Let us take an arbitrary vector, say \( {\bf x} = \langle 2, \ 2 \rangle^T , \) and expand it with respect to eigenvectors:
\[ {\bf x} = \begin{bmatrix} 2 \\ 2 \end{bmatrix} = a\, {\bf v}_1 + b\,{\bf v}_2 = a \begin{bmatrix} -5 \\ 1 \end{bmatrix} + b \begin{bmatrix} 1 \\ -1 \end{bmatrix} , \]
where coefficients a and b need to be determined. Of course, we can find them by solving the system of linear equations:
\[ \begin{split} 2 &= a(-5) + b , \\ 2 &= a - b . \end{split} \]
However, the easiest way is to apply Sylvester's auxiliary matrices:
\begin{align*} {\bf Z}_3 \, {\bf x} &= \frac{1}{4} \begin{bmatrix} 5&5 \\ -1& -1 \end{bmatrix} \, \begin{bmatrix} 2\\ 2 \end{bmatrix} = \frac{1}{2} \begin{bmatrix} 5&5 \\ -1& -1 \end{bmatrix} \, \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \frac{1}{2} \begin{bmatrix} 10 \\ -2 \end{bmatrix} = \begin{bmatrix} 5\\ -1 \end{bmatrix} = -{\bf v}_1 , \\ {\bf Z}_{-1}\, {\bf x} &= \frac{1}{4} \begin{bmatrix} -1&-5 \\ 1&5 \end{bmatrix} \, \begin{bmatrix} 2\\ 2 \end{bmatrix} = \begin{bmatrix} -3 \\ 3 \end{bmatrix} = -3\, {\bf v}_2 . \end{align*}
Therefore, coefficients are \( a =-1 \) and \( b =-3 . \) So we get the expansion:
\[ {\bf x} = \begin{bmatrix} 2 \\ 2 \end{bmatrix} = - {\bf v}_1 - 3\,{\bf v}_2 = - \begin{bmatrix} -5 \\ 1 \end{bmatrix} -3 \begin{bmatrix} 1 \\ -1 \end{bmatrix} . \]
We can define some functions, for instance, \( f(\lambda ) = e^{\lambda \,t} \) and \( g(\lambda ) = \cos \left( \lambda \,t\right) . \) Indeed,
\begin{align*} f({\bf A}) &= e^{3t}\, {\bf Z}_3 + e^{-t}\, {\bf Z}_{-1} = e^{3t}\, \frac{1}{4} \begin{bmatrix} 5&5 \\ -1& -1 \end{bmatrix} + e^{-t} \, \frac{1}{4} \begin{bmatrix} -1&-5 \\ 1&5 \end{bmatrix} , \\ g({\bf A}) &= \cos (3t) \, {\bf Z}_3 + \cos\, t\, {\bf Z}_{-1} = \frac{\cos \,3t}{4} \begin{bmatrix} 5&5 \\ -1& -1 \end{bmatrix} + \frac{\cos\, t}{4} \begin{bmatrix} -1&-5 \\ 1&5 \end{bmatrix} .\end{align*}
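These computations are easily verified in Sage (a brief check of the projector properties):

sage: A = matrix(QQ, [[4, 5], [-1, -2]])
sage: I2 = identity_matrix(QQ, 2)
sage: Z3 = (A + I2)/4; Zm1 = (A - 3*I2)/(-4)
sage: Z3^2 == Z3, Zm1^2 == Zm1, Z3*Zm1 == zero_matrix(QQ, 2), Z3 + Zm1 == I2
(True, True, True, True)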

Example 1.7.1: Consider the \( 3 \times 3 \) matrix \( {\bf A} = \begin{bmatrix} 1&4&16 \\ 18&20&4 \\ -12&-14&-7 \end{bmatrix} \) that has three distinct eigenvalues
A = {{1,4,16},{18,20,4},{-12,-14,-7}}
Eigenvalues[A]
Out[2]= {9, 4, 1}
To determine a function of the matrix, \( f ({\bf A} ) , \) we first find Sylvester's auxiliary matrices:
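Here is a short Sage sketch that builds the auxiliary matrices directly from their definition (using the eigenvalues 9, 4, 1 found above) and checks the basic identities:

sage: A = matrix(QQ, [[1, 4, 16], [18, 20, 4], [-12, -14, -7]])
sage: I3 = identity_matrix(QQ, 3)
sage: Z9 = (A - 4*I3)*(A - I3) / ((9-4)*(9-1))
sage: Z4 = (A - 9*I3)*(A - I3) / ((4-9)*(4-1))
sage: Z1 = (A - 9*I3)*(A - 4*I3) / ((1-9)*(1-4))
sage: Z9^2 == Z9, Z9 + Z4 + Z1 == I3, 9*Z9 + 4*Z4 + Z1 == A
(True, True, True)

Any admissible function of A is then \( f({\bf A}) = f(9)\,{\bf Z}_9 + f(4)\,{\bf Z}_4 + f(1)\,{\bf Z}_1 . \)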

 

Example 1.7.2: Consider the \( 3 \times 3 \) matrix \( {\bf A} = \begin{bmatrix} -20&-42&-21 \\ 6&13&6 \\ 12&24&13 \end{bmatrix} \) that has two distinct eigenvalues
A = {{-20, -42, -21}, {6, 13, 6}, {12, 24, 13}}
Eigenvalues[A]
Out[2]= {4, 1, 1}
Then we define the resolvent:
resolvent = FullSimplify[Inverse[lambda*IdentityMatrix[3] - A]]
Out[2]= {{(-25 + lambda)/( 4 - 5 lambda + lambda^2), -(42/(4 - 5 lambda + lambda^2)), -(21/( 4 - 5 lambda + lambda^2))}, {6/(4 - 5 lambda + lambda^2), ( 8 + lambda)/(4 - 5 lambda + lambda^2), 6/( 4 - 5 lambda + lambda^2)}, {12/(4 - 5 lambda + lambda^2), 24/( 4 - 5 lambda + lambda^2), (8 + lambda)/(4 - 5 lambda + lambda^2)}}
Using the resolvent, we define Sylvester's auxiliary matrices (which are actually projection matrices onto the eigenspaces):
Z1 = FullSimplify[(lambda - 1)*resolvent] /. lambda -> 1
Out[3]= {{8, 14, 7}, {-2, -3, -2}, {-4, -8, -3}}
Z4 = FullSimplify[(lambda - 4)*resolvent] /. lambda -> 4
Out[4]= {{-7, -14, -7}, {2, 4, 2}, {4, 8, 4}}
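In Sage, the same projection matrices can be entered and checked (a small verification):

sage: Z1 = matrix(QQ, [[8, 14, 7], [-2, -3, -2], [-4, -8, -3]])
sage: Z4 = matrix(QQ, [[-7, -14, -7], [2, 4, 2], [4, 8, 4]])
sage: A = matrix(QQ, [[-20, -42, -21], [6, 13, 6], [12, 24, 13]])
sage: Z1^2 == Z1, Z4^2 == Z4, Z1*Z4 == zero_matrix(QQ, 3), Z1 + 4*Z4 == A
(True, True, True, True)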

 


 

Resolvent Method

The resolvent method and its applications to partial differential equations were developed by Vladimir Dobrushkin in the 1980s. We show its use for defining a function of a square matrix, which will be called a matrix function.

A general matrix function is a correspondence f that relates to each admissible matrix A of order n a matrix, denoted by \( f({\bf A}) , \) of order n with elements in the complex field. Such a correspondence is assumed to satisfy the following conditions:

  • If \( f(\lambda ) = \lambda , \) then \( f({\bf A}) = {\bf A} ; \)
  • If \( f(\lambda ) = k, \) a constant, then \( f({\bf A}) = k\,{\bf I} , \) where I is the identity matrix;
  • If \( f(\lambda ) = g(\lambda ) + h(\lambda ) , \) then \( f({\bf A}) = g({\bf A}) + h({\bf A}) ; \)
  • If \( f(\lambda ) = g(\lambda ) \cdot h(\lambda ) , \) then \( f({\bf A}) = g({\bf A}) \, h({\bf A}) . \)
These requirements will ensure that the definition, when applied to a polynomial \( p(\lambda ) , \) yields the usual matrix polynomial p(A), and that any rational identity in scalar functions of a complex variable is fulfilled by the corresponding matrix functions. The above conditions are not sufficient for most applications, and a fifth requirement would be highly desirable:
  • If \( f(\lambda ) = h \left( g(\lambda ) \right) , \) then \( f({\bf A}) = h\left( g({\bf A}) \right) . \)
for holomorphic functions and all admissible matrices. The extension of the concept of a function of a complex variable to matrix functions has occupied the attention of a number of mathematicians since 1883. While many approaches to defining a function of a square matrix are known and can be found in the following references,
  • R. A. Frazer, W. J. Duncan, and A. R. Collar. Elementary Matrices and Some Applications to Dynamics and Differential Equations. Cambridge University Press, 1938.
  • R. F. Rinehart. The equivalence of definitions of a matric function. American Mathematical Monthly, 62, No 6, 395--414, 1955.
  • Cleve B. Moler and Charles F. Van Loan. Nineteen dubious ways to compute the exponential of a matrix. SIAM Review, 20, No 4, 801--836, 1978.
  • Nicholas J. Higham, Functions of Matrices. Theory and Computation. SIAM, 2008,
we present another method, which is accessible at the undergraduate level.

Recall that the resolvent of a square matrix A is

\[ {\bf R}_{\lambda} \left( {\bf A} \right) = \left( \lambda {\bf I} - {\bf A} \right)^{-1} , \]
which is a matrix-valued function depending on the parameter λ. In general, the resolvent, after cancellation of all common factors, is the ratio of a matrix polynomial \( {\bf Q}(\lambda ) \) of degree \( k-1 \) to the minimal polynomial \( \psi (\lambda ) \) of degree k:
\[ {\bf R}_{\lambda} \left( {\bf A} \right) = \left( \lambda {\bf I} - {\bf A} \right)^{-1} = \frac{1}{\psi (\lambda )} \, {\bf Q} (\lambda ) . \]
It is assumed that polynomials in \( {\bf Q}(\lambda ) \) and \( \psi (\lambda ) \) are relatively prime.

The residue of the ratio \( f(\lambda ) = P(\lambda )/Q(\lambda ) \) of two polynomials (or entire functions) at the pole \( \lambda_0 \) of multiplicity m is defined by

\[ \mbox{Res}_{\lambda_0} \, \frac{P(\lambda )}{Q(\lambda )} = \left. \frac{1}{(m-1)!} \, \frac{{\text d}^{m-1}}{{\text d} \lambda^{m-1}} \, \frac{(\lambda - \lambda_0 )^{m} P(\lambda )}{Q(\lambda )} \right\vert_{\lambda = \lambda_0} = \lim_{\lambda \mapsto \lambda_0} \left( \frac{1}{(m-1)!} \, \frac{{\text d}^{m-1}}{{\text d} \lambda^{m-1}} \, \frac{(\lambda - \lambda_0 )^{m} P(\lambda )}{Q(\lambda )} \right) . \]
In particular, for a simple pole ( \( m=1 \) ), we have
\[ \mbox{Res}_{\lambda_0} \, \frac{P(\lambda )}{Q(\lambda )} = \frac{P(\lambda_0 )}{Q'(\lambda_0 )} . \]
For a double pole ( \( m=2 \) ), we have
\[ \mbox{Res}_{\lambda_0} \, \frac{P(\lambda )}{Q(\lambda )} = \left. \frac{{\text d}}{{\text d} \lambda} \, \frac{(\lambda - \lambda_0 )^2 \, P(\lambda )}{Q(\lambda )} \right\vert_{\lambda = \lambda_0} , \]
and for a triple pole ( \( m=3 \) ), we get
\[ \mbox{Res}_{\lambda_0} \, \frac{P(\lambda )}{Q(\lambda )} = \left. \frac{1}{2!} \, \frac{{\text d}^{2}}{{\text d} \lambda^{2}} \, \frac{(\lambda - \lambda_0 )^{3} P(\lambda )}{Q(\lambda )} \right\vert_{\lambda = \lambda_0} . \]
If a real-valued function \( f(\lambda ) \) has a pair of complex conjugate poles \( a \pm b {\bf j} \) (here \( {\bf j}^2 =-1 \) ), then
\[ \mbox{Res}_{a+b{\bf j}} f(\lambda ) + \mbox{Res}_{a-b{\bf j}} f(\lambda ) = 2\, \Re \, \mbox{Res}_{a+b{\bf j}} f(\lambda ) , \]
where Re = \( \Re \) stands for the real part of a complex number.

Let \( f(\lambda ) \) be a function defined on the spectrum of a square matrix A. Then

\[ f({\bf A}) = \sum_{\lambda_i \in \sigma ({\bf A})} \mbox{Res}_{\lambda_i} \, f(\lambda ) \,{\bf R}_{\lambda} ({\bf A}) , \]
where the sum ranges over all distinct eigenvalues of A.

Example.
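As an illustration, here is a small Sage sketch that applies this formula to the \( 2 \times 2 \) matrix \( {\bf A} = \begin{bmatrix} 4&5 \\ -1& -2 \end{bmatrix} \) from the Sylvester section, with \( f(\lambda ) = e^{\lambda\,t} \) (it assumes the symbolic residue method, available in recent Sage versions). The result agrees with \( e^{3t}\,{\bf Z}_3 + e^{-t}\,{\bf Z}_{-1} \) obtained earlier:

sage: lam, t = var('lam t')
sage: A = matrix(SR, [[4, 5], [-1, -2]])
sage: R = (lam*identity_matrix(SR, 2) - A).inverse()    # resolvent (lam*I - A)^(-1)
sage: expA = sum((exp(lam*t)*R).apply_map(lambda e: e.residue(lam == ev)) for ev in [3, -1])
sage: (expA - exp(3*t)/4*matrix([[5, 5], [-1, -1]]) - exp(-t)/4*matrix([[-1, -5], [1, 5]])).simplify_full()
[0 0]
[0 0]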


Spectral Decomposition

Originally, spectral decomposition was developed for symmetric or self-adjoint matrices. Following tradition, we first present this method for symmetric/self-adjoint matrices, and later extend it to arbitrary matrices.

A matrix A is said to be unitarily diagonalizable if there is a unitary matrix U such that \( {\bf U}^{\ast} {\bf A}\,{\bf U} = {\bf \Lambda} , \) where Λ is a diagonal matrix and \( {\bf U}^{\ast} = {\bf U}^{-1} . \) A matrix A is said to be orthogonally diagonalizable if there is an orthogonal matrix P such that \( {\bf P}^{\mathrm T} {\bf A}\,{\bf P} = {\bf \Lambda} , \) where Λ is a diagonal matrix and \( {\bf P}^{\mathrm T} = {\bf P}^{-1} . \)

Theorem: A real matrix A is orthogonally diagonalizable if and only if A is symmetric (\( {\bf A} = {\bf A}^{\mathrm T} \) ). ■

Theorem: A matrix A is unitarily diagonalizable if and only if A is normal (\( {\bf A}\, {\bf A}^{\ast} = {\bf A}^{\ast} {\bf A} \) ). ■

Example. The matrix

\[ {\bf A} = \begin{bmatrix} 1&{\bf j}&0 \\ {\bf j}&1&0 \\ 0&0&1 \end{bmatrix} \]
is symmetric, normal, but not self-adjoint. Another matrix
\[ {\bf B} = \begin{bmatrix} 2&1 \\ -1&2 \end{bmatrix} \]
is normal, but not self-adjoint. Therefore, both matrices are unitarily diagonalizable but not orthogonally diagonalizable.

Example. Consider a symmetric matrix

\[ {\bf A} = \begin{bmatrix} 2&1&1 \\ 1&2&1 \\ 1&1&2 \end{bmatrix} \]
that has the characteristic polynomial \( \chi_{A} (\lambda ) = \det \left( \lambda {\bf I} - {\bf A} \right) = \left( \lambda -1 \right)^2 \left( \lambda -4 \right) .\) Thus, the distinct eigenvalues of A are \( \lambda_1 =1, \) which has geometric multiplicity 2, and \( \lambda_3 =4. \) The corresponding eigenvectors are
\[ {\bf u}_1 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} , \quad {\bf u}_2 = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} \qquad \mbox{and} \qquad {\bf u}_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} . \]
The vectors \( {\bf u}_1 , \ {\bf u}_2 \) form a basis for the two-dimensional eigenspace corresponding to \( \lambda_1 =1 , \) while \( {\bf u}_3 \) is an eigenvector corresponding to \( \lambda_3 =4 . \) Applying the Gram--Schmidt process to \( {\bf u}_1 , \ {\bf u}_2 \) yields the following orthogonal basis:
\[ {\bf v}_1 = {\bf u}_1 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} \quad \mbox{and} \quad {\bf v}_2 = {\bf u}_2 - \frac{\langle {\bf u}_2 , {\bf v}_1 \rangle}{\| {\bf v}_1 \|^2} \, {\bf v}_1 = \frac{1}{2} \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix} \]
because \( \langle {\bf u}_2 , {\bf u}_1 \rangle = -1 \) and \( \| {\bf v}_1 \|^2 =2 . \) Normalizing these vectors, we obtain orthonormal basis:
\[ {\bf q}_1 = \frac{{\bf v}_1}{\| {\bf v}_1 \|} = \frac{1}{\sqrt{2}} \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} , \quad {\bf q}_2 = \frac{{\bf v}_2}{\| {\bf v}_2 \|} = \frac{1}{\sqrt{6}} \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix} , \quad {\bf q}_3 = \frac{{\bf u}_3}{\| {\bf u}_3 \|} = \frac{1}{\sqrt{3}} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} . \]
Finally, using \( {\bf q}_1 ,\ {\bf q}_2 , \ {\bf q}_3 \) as column vectors, we obtain the orthogonal matrix
\[ {\bf U} = \begin{bmatrix} \frac{-1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ 0& \frac{-2}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \end{bmatrix} , \]
which orthogonally diagonalizes A. As a check, we confirm
\[ {\bf U}^{\mathrm T} {\bf A} \,{\bf U} = \begin{bmatrix} \frac{-1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{6}} & \frac{-2}{\sqrt{6}} & \frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \end{bmatrix} \, \begin{bmatrix} 2&1&1 \\ 1&2&1 \\ 1&1&2 \end{bmatrix} \, \begin{bmatrix} \frac{-1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ 0& \frac{-2}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \end{bmatrix} = \begin{bmatrix} 1 &0&0 \\ 0 &1&0 \\ 0 &0&4 \end{bmatrix} , \]

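The same verification can be carried out in Sage (a short symbolic check of the hand computation above):

sage: A = matrix(QQ, [[2, 1, 1], [1, 2, 1], [1, 1, 2]])
sage: U = matrix(SR, [[-1/sqrt(2), 1/sqrt(6), 1/sqrt(3)], [0, -2/sqrt(6), 1/sqrt(3)], [1/sqrt(2), 1/sqrt(6), 1/sqrt(3)]])
sage: (U.transpose() * A * U).simplify_full()
[1 0 0]
[0 1 0]
[0 0 4]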

Theorem: Let A be a symmetric or normal \( n \times n \) matrix with eigenvalues \( \lambda_1 , \ \lambda_2 , \ \ldots , \ \lambda_n \) and corresponding orthonormal eigenvectors \( {\bf u}_1 , \ {\bf u}_2 , \ \ldots , \ {\bf u}_n . \) Then

\[ {\bf A} = \begin{bmatrix} \uparrow & \uparrow & \cdots & \uparrow \\ {\bf u}_1 & {\bf u}_2 & \cdots & {\bf u}_n \\ \downarrow & \downarrow & \cdots & \downarrow \end{bmatrix} \, \begin{bmatrix} \lambda_1 &&&0 \\ &\lambda_2 && \\ &&\ddots & \\ 0&&& \lambda_n \end{bmatrix} \, \begin{bmatrix} \longleftarrow & {\bf u}_1^{\ast} & \longrightarrow \\ \longleftarrow & {\bf u}_2^{\ast} & \longrightarrow \\ & \vdots & \\ \longleftarrow & {\bf u}_n^{\ast} & \longrightarrow \end{bmatrix} = \sum_{i=1}^n \lambda_i {\bf u}_i {\bf u}_i^{\ast} , \]
which is called the spectral decomposition of a symmetric/normal matrix A. ■
The term was coined around 1905 by the German mathematician David Hilbert (1862--1943). Denoting the rank-one projection matrix \( {\bf u}_i {\bf u}_i^{\ast} \) by \( {\bf E}_i = {\bf u}_i {\bf u}_i^{\ast} , \) we obtain the spectral decomposition of A:
\[ {\bf A} = \lambda_1 {\bf E}_1 + \lambda_2 {\bf E}_2 + \cdots + \lambda_n {\bf E}_n . \]
This formula allows one to define a function of a square symmetric/self-adjoint matrix:
\[ f\left( {\bf A} \right) = f(\lambda_1 )\, {\bf E}_1 + f(\lambda_2 )\, {\bf E}_2 + \cdots + f(\lambda_n )\,{\bf E}_n \]
because the matrices \( {\bf E}_k , \quad k=1,2,\ldots n, \) are projection matrices:
\[ {\bf E}_i {\bf E}_j = \delta_{i,j} {\bf E}_i = \begin{cases} {\bf E}_i , & \mbox{ if } i=j, \\ {\bf 0} , & \mbox{ if } i \ne j , \end{cases} \qquad i,j =1,2,\ldots n. \]

Example. Consider a self-adjoint 2-by-2 matrix

\[ {\bf A} = \begin{bmatrix} 1 &2+{\bf j} \\ 2- {\bf j} & 5\end{bmatrix} , \]
where \( {\bf j}^2 =-1 . \) We have \( {\bf S}^{\ast} {\bf A} {\bf S} = {\bf S}^{-1} {\bf A} {\bf S} = {\bf \Lambda} \) for the matrices
\[ {\bf S} = \begin{bmatrix} \left( 2+{\bf j} \right) / \sqrt{6} & \left( 2+{\bf j} \right) / \sqrt{30} \\ - 1/\sqrt{6} & 5/\sqrt{30} \end{bmatrix} \quad \mbox{and} \quad {\bf \Lambda} = \begin{bmatrix} 0&0 \\ 0& 6 \end{bmatrix} . \]
Here the column vectors of the matrix S are normalized eigenvectors corresponding to the eigenvalues \( \lambda_1 =0 \) and \( \lambda_2 =6 . \)

Therefore, the spectral decomposition of A becomes \( {\bf A} = 0\,{\bf E}_1 + 6\,{\bf E}_2 , \) which is clearly matrix A itself.

In our case, projection matrices are

\[ {\bf E}_1 = \frac{1}{6} \begin{bmatrix} 5 & -2 - {\bf j} \\ -2+{\bf j} & 1 \end{bmatrix} , \qquad {\bf E}_2 = \frac{1}{6} \begin{bmatrix} 1 &2+{\bf j} \\ 2- {\bf j} & 5\end{bmatrix} = \frac{1}{6}\, {\bf A} . \]
It is easy to check that
\[ {\bf E}_1^2 = {\bf E}_1 , \qquad {\bf E}_2^2 = {\bf E}_2 , \qquad \mbox{and} \qquad {\bf E}_1 {\bf E}_2 = {\bf 0} . \]
The exponential matrix-function is
\[ e^{{\bf A}\,t} = {\bf E}_1 + e^{6t} \,{\bf E}_2 . \]
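A short Sage check over the Gaussian rationals (a verification sketch; the field \( \mathbb{Q}(i) \) is used so that all arithmetic stays exact):

sage: K.<i> = QuadraticField(-1)
sage: A = matrix(K, [[1, 2 + i], [2 - i, 5]])
sage: E1 = matrix(K, [[5, -2 - i], [-2 + i, 1]]) / 6
sage: E2 = A / 6
sage: E1^2 == E1, E2^2 == E2, E1*E2 == zero_matrix(K, 2), 6*E2 == A
(True, True, True, True)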

Example. Consider a symmetric 3-by-3 matrix from the previous example

\[ {\bf A} = \begin{bmatrix} 2&1&1 \\ 1&2&1 \\ 1&1&2 \end{bmatrix} . \]
Its spectral decomposition is \( {\bf A} = 1\,{\bf E}_1 + 1\,{\bf E}_2 + 4\,{\bf E}_3 , \) where the projection matrices \( {\bf E}_i = {\bf q}_i {\bf q}_i^{\ast} \) are obtained from the orthonormal eigenvectors:
\begin{align*} {\bf E}_1 &= \frac{1}{6} \begin{bmatrix} -1 \\ -1 \\ 2 \end{bmatrix} \left[ -1 \ -1 \ 2 \right] = \frac{1}{6} \begin{bmatrix} 1&1& -2 \\ 1&1& -2 \\ -2&-2& 4 \end{bmatrix} , \\ {\bf E}_2 &= \frac{1}{2} \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} \left[ -1 \ 1 \ 0 \right] = \frac{1}{2} \begin{bmatrix} 1&-1&0 \\ -1&1&0 \\ 0&0&0 \end{bmatrix} , \\ {\bf E}_3 &= \frac{1}{3} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \left[ 1 \ 1 \ 1 \right] = \frac{1}{3} \begin{bmatrix} 1&1& 1 \\ 1&1& 1 \\ 1&1& 1 \end{bmatrix} . \end{align*}
Indeed, these projection matrices satisfy the following relations:
\[ {\bf E}_1^2 = {\bf E}_1 , \quad {\bf E}_2^2 = {\bf E}_2 , \quad {\bf E}_3^2 = {\bf E}_3 , \quad {\bf E}_1 {\bf E}_2 = {\bf 0} , \quad {\bf E}_1 {\bf E}_3 = {\bf 0} , \quad {\bf E}_3 {\bf E}_2 = {\bf 0} , \]
and they all have eigenvalues \( \lambda = 1, 0, 0 . \) Using this spectral decomposition, we define two matrix-functions corresponding to \( {\Phi}(\lambda ) = \cos \left( \sqrt{\lambda} \,t \right) \) and \( {\Psi}(\lambda ) = \frac{1}{\sqrt{\lambda}} \,\sin \left( \sqrt{\lambda} \,t \right) \) that do not depend on the choice of branch of the square root:
\begin{align*} {\bf \Phi} (t) &= \cos \left( \sqrt{\bf A} \,t \right) = \cos t\, {\bf E}_1 + \cos t\, {\bf E}_2 + \cos (2t) \,{\bf E}_3 = \frac{\cos t}{3} \, \begin{bmatrix} 2&-1&-1 \\ -1&2&-1 \\ -1&-1& 2 \end{bmatrix} + \frac{\cos 2t}{3} \begin{bmatrix} 1&1& 1 \\ 1&1& 1 \\ 1&1& 1 \end{bmatrix} , \\ {\bf \Psi} (t) &= \frac{1}{\sqrt{\bf A}} \,\sin \left( \sqrt{\bf A} \,t \right) = \sin t\, {\bf E}_1 + \sin t\, {\bf E}_2 + \frac{\sin (2t)}{2} \,{\bf E}_3 = \frac{\sin t}{3} \, \begin{bmatrix} 2&-1&-1 \\ -1&2&-1 \\ -1&-1& 2 \end{bmatrix} + \frac{\sin 2t}{6} \begin{bmatrix} 1&1& 1 \\ 1&1& 1 \\ 1&1& 1 \end{bmatrix} . \end{align*}
These matrix-functions are solutions of the following initial value problems for the second order matrix differential equations:
\begin{align*} & \ddot{\bf \Phi}(t) + {\bf A}\,{\bf \Phi} (t) ={\bf 0} , \qquad {\bf \Phi}(0) = {\bf I}, \quad \dot{\bf \Phi}(0) = {\bf 0}, \\ &\ddot{\bf \Psi}(t) + {\bf A}\,{\bf \Psi} (t) ={\bf 0} , \qquad {\bf \Psi}(0) = {\bf 0}, \quad \dot{\bf \Psi}(0) = {\bf I} . \end{align*}
Since the given matrix A is positive definite, we can define four square roots:
\begin{align*} {\bf R}_1 &= {\bf E}_1 + {\bf E}_2 + 2\,{\bf E}_3 = \frac{1}{3} \begin{bmatrix} 4&1&1 \\ 1&4&1 \\ 1&1&4 \end{bmatrix} , \\ {\bf R}_2 &= {\bf E}_1 + {\bf E}_2 - 2\,{\bf E}_3 = \begin{bmatrix} 0&-1&-1 \\ -1&0&-1 \\ -1&-1&0 \end{bmatrix} , \\ {\bf R}_3 &= {\bf E}_1 - {\bf E}_2 + 2\,{\bf E}_3 = \frac{1}{3} \begin{bmatrix} 1&4&1 \\ 4&1&1 \\ 1&1&4 \end{bmatrix} , \\ {\bf R}_4 &= -{\bf E}_1 + {\bf E}_2 + 2\,{\bf E}_3 = \begin{bmatrix} 1&0&1 \\ 0&1&1 \\ 1&1&0 \end{bmatrix} , \end{align*}

and four others are just the negatives of these four, so the total number of square roots is 8. Note that we cannot obtain \( {\bf R}_3 \) and \( {\bf R}_4 \) using either Sylvester's method or the resolvent method, because those methods are based on the minimal polynomial \( \psi (\lambda ) = (\lambda -1)(\lambda -4) . \)
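A quick Sage check (a small verification sketch) confirms that these combinations are indeed square roots of A:

sage: A = matrix(QQ, [[2, 1, 1], [1, 2, 1], [1, 1, 2]])
sage: R2 = matrix(QQ, [[0, -1, -1], [-1, 0, -1], [-1, -1, 0]])
sage: R4 = matrix(QQ, [[1, 0, 1], [0, 1, 1], [1, 1, 0]])
sage: R2^2 == A, R4^2 == A
(True, True)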


We now extend the spectral decomposition to arbitrary square matrices. Let \( f (\lambda ) \) be an analytic function in a neighborhood of the origin and A be a square \( n \times n \) matrix. We choose the origin as an example; application of the spectral decomposition requires the function to be expressed as a convergent power series in a neighborhood of every eigenvalue. Using a Maclaurin series

\[ f(\lambda ) = f_0 + f_1 \lambda + f_2 \lambda^2 + \cdots + f_k \lambda^k + \cdots = \sum_{k\ge 0} f_k \lambda^k , \]
we can define the matrix-valued function \( f ({\bf A} ) \) as
\[ f({\bf A} ) = \sum_{k\ge 0} f_k {\bf A}^k . \]
Let \( \psi (\lambda ) \) be the minimal polynomial of degree m for the matrix A. Then every power \( {\bf A}^p \) of the matrix A can be expressed as a polynomial in A of degree not higher than \( m-1. \) Therefore,
\[ f({\bf A} ) = \sum_{j= 0}^{m-1} b_j {\bf A}^j , \]
where the coefficients \( b_j , \quad j=0,1,\ldots , m-1, \) should satisfy the following equations
\[ f(\lambda_k ) = \sum_{j= 0}^{m-1} b_j \,\lambda_k^j , \]
for each eigenvalue \( \lambda_k , \quad k=1,2,\ldots , s , \) where s is the number of distinct eigenvalues. If the eigenvalue \( \lambda_k \) has multiplicity \( m_k \) in the minimal polynomial \( \psi (\lambda ) , \) then we need to add \( m_k -1 \) auxiliary algebraic equations
\[ \left. \frac{{\text d}^p f(\lambda )}{{\text d} \lambda^p} \right\vert_{\lambda = \lambda_k} = \left. \frac{{\text d}^p}{{\text d} \lambda^p} \, \sum_{j= 0}^{m-1} b_j \,\lambda^j \right\vert_{\lambda = \lambda_k} , \quad p=1,2,\ldots , m_k -1 . \]

Example. Consider a diagonalizable 4-by-4 matrix

\[ {\bf A} = \begin{bmatrix} -4&7&1&4 \\ 6&-16&-3&-9 \\ 12&-27&-4&-15 \\ -18&43&7&24 \end{bmatrix} . \]
Its characteristic polynomial is
\[ \chi (\lambda ) = \det \left( \lambda {\bf I} - {\bf A} \right) = \lambda \left( \lambda -2 \right) \left( \lambda +1 \right)^2 , \]
but its minimal polynomial is a product of simple terms:
\[ \psi (\lambda ) = \lambda \left( \lambda -2 \right) \left( \lambda +1 \right) . \]
Therefore, the matrix A is diagonalizable because its minimal polynomial is a product of simple (non-repeated linear) factors, and we don't need to find the eigenspaces.
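These claims are easy to confirm in Sage:

sage: A = matrix(QQ, [[-4, 7, 1, 4], [6, -16, -3, -9], [12, -27, -4, -15], [-18, 43, 7, 24]])
sage: A.characteristic_polynomial()
x^4 - 3*x^2 - 2*x
sage: A.minimal_polynomial()
x^3 - x^2 - 2*x
sage: A.is_diagonalizable()
True

The minimal polynomial factors as \( \lambda (\lambda -2)(\lambda +1) , \) with no repeated roots, so A is diagonalizable.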


 

Applications