Matrix Algebra

This page supports the main stream of the web site by providing and recalling some information regarding Matrix Algebra. We demonstrate the capabilities of Sage for this topic.


How to Define and Operate on Vectors

A vector usually describes a quantity that has both magnitude and direction. Geometrically, a vector can be represented by a directed line segment---that is, by an arrow---and is denoted either by a boldface lowercase symbol or by a symbol with an arrow over it. A vector whose initial point is A and whose terminal point is B is written \( \vec{AB} .\) Vectors are said to be free, which means that a vector can be moved from one position to another provided that its magnitude (denoted by \( \| \vec{AB} \| \) ) and direction are not changed.

A set of vectors is usually called a vector space (also a linear space), which is an abstract notion in mathematics. Historically, the first ideas leading to vector spaces can be traced back as far as the 17th century; however, the idea crystallized with the work of the German mathematician Hermann Günther Grassmann (1809--1877), who published a paper in 1862. A vector space is a collection of objects called vectors, which may be added together and multiplied ("scaled") by numbers, called scalars in this context. Scalars are often taken to be real numbers, but there are also vector spaces with scalar multiplication by complex numbers, rational numbers, or generally any field. The operations of vector addition and scalar multiplication must satisfy certain requirements, called axioms.

When a basis in a vector space has been chosen, a vector can be expanded with respect to the basis vectors. For example, when unit vectors in n-dimensional space \( \mathbb{R}^n \) have been chosen, any vector can be uniquely expanded with respect to this basis. In three-dimensional space, it is customary to use the Cartesian coordinate system and denote these unit vectors by i (abscissa or horizontal axis, usually denoted x), j (ordinate or vertical axis, usually denoted by y), and k (applicate, usually denoted by z). With respect to these unit vectors, any vector can be written as \( {\bf v} = x\,{\bf i} + y\,{\bf j} + z\,{\bf k} . \) Then the vector v is uniquely defined by an ordered triple of real numbers

\[ {\bf v} = \left[ x, y , z \right] . \]
In general, a vector in n-dimensional space is identified with an n-tuple (an ordered array of numbers), and in an infinite-dimensional space with a sequence of numbers.

We recall basic properties of vectors. First, we define a vector over the field of real numbers:

sage: v = vector(RR, 4, [1, 1/2, 1/3, 1/4])
sage: v
(1.00000000000000, 0.500000000000000, 0.333333333333333, 0.250000000000000)

However, if we define the same vector over the field of rational numbers, we get

sage: v = vector(QQ, 4, [1, 1/2, 1/3, 1/4])
sage: v 
(1, 1/2, 1/3, 1/4)
It is essential that each element knows its classification:
sage: v.parent()
Vector space of dimension 4 over Rational Field
sage: v.degree()
4
sage: v[3]
1/4

Even though we typed v[3], the last command displays the fourth element of the vector v, because Sage counts from zero. The degree command gives the number of elements in the vector, which is its size.

However, if you try to define the same vector over the ring of natural numbers, Sage will complain:

sage: v = vector(ZZ, 4, [1, 1/2, 1/3, 1/4]); v
Sage raises a TypeError because the entries are not integers; on the other hand, the command below works:
sage: v = vector(ZZ, 4, [1, 2, 3, -4])
sage: v
(1, 2, 3, -4)
To check, you may want to type:
sage: v.parent() 
Ambient free module of rank 4 over the principal ideal domain Integer Ring
In mathematics and applications, it is customary to distinguish column vectors
\[ {\bf v} = \left( \begin{array}{c} v_1 \\ v_2 \\ \vdots \\ v_m \end{array} \right) \qquad \mbox{also written as } \qquad {\bf v} = \left[ \begin{array}{c} v_1 \\ v_2 \\ \vdots \\ v_m \end{array} \right] , \]
for which we use lowercase letters in boldface type, from row vectors (ordered n-tuples)
\[ \vec{v} = \left[ v_1 , v_2 , \ldots , v_n \right] . \]
Here the entries \( v_i \) are known as the components of the vector. Column vectors and row vectors can be defined using the matrix command, as examples of an \( n\times 1 \) matrix and a \( 1\times n \) matrix, respectively:
sage: u = matrix(QQ, 3, 1, [1, 2, 3]); u
[1]
[2]
[3]
sage: v = matrix(QQ, 1, 3, [1, 2, 3]); v
[1 2 3]

Vectors in Sage are built, manipulated, and interrogated similarly to matrices (see the next subsection). However, as simple lists ("one-dimensional," unlike matrices, which are "two-dimensional" and look more tabular), they are simpler to construct and manipulate. Sage will print a vector across the screen, even if we wish to interpret it as a column vector. It will be delimited by parentheses, which, if we look carefully, allows us to distinguish a vector from a matrix with just one row. The number of "slots" in a vector is referred to in Sage not as rows or columns, but rather as its "degree." Here are some examples (remember to start counting from zero):

sage: v = vector(QQ, 4, [1, 1/2, 1/3, 1/4])
sage: v
(1, 1/2, 1/3, 1/4)
sage: v.degree()
4
sage: v.parent()
Vector space of dimension 4 over Rational Field
Sage is largely object-oriented, which means many commands apply to an object by using the “dot” notation.

Mathematical objects in Sage often come from sets of similar objects. This set is called the "parent" of the element. We can use this to learn how Sage deduces the type of entries in a matrix. It is also possible to skip specifying the type of numbers used for the entries of a matrix; however, this is fraught with peril, as Sage will make an informed guess about your intent, which may not be what you want. Consider what happens when you enter the single character "2" into a computer program like Sage. Is this the integer 2, the rational number \( \frac{2}{1}, \) the real number 2.00000, the complex number 2 + 0i, or the polynomial \( p(x)=2? \) In context, we can usually figure it out, but a literal-minded computer is not so smart. It happens that the operations we can perform, and how they behave, are influenced by the type of the entries in a matrix. So it is important to get this right, and our advice is to be explicit and be in the habit of always specifying the type of the entries of a matrix you create.
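For example, we can compare what Sage deduces when the ring is omitted with what it uses when the ring is given explicitly (a minimal illustration with a matrix of our own choosing):

sage: m = matrix(2, 2, [1, 2, 3, 4])
sage: m.parent()
Full MatrixSpace of 2 by 2 dense matrices over Integer Ring
sage: m = matrix(QQ, 2, 2, [1, 2, 3, 4])
sage: m.parent()
Full MatrixSpace of 2 by 2 dense matrices over Rational Field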

Sage knows a wide variety of sets of numbers. These are known as "rings" or "fields," but we will call them "number systems" here. Examples include:
SR for symbolic entries,
ZZ for the set of integers,
QQ for the set of rationals,
RR for the real numbers with reasonable precision, and
CC for the complex numbers with reasonable precision.
We will present the theory of linear algebra over the complex numbers. However, in any computer system there will always be complications surrounding the inability of digital arithmetic to accurately represent all complex numbers. In contrast, Sage can represent rational numbers exactly as the quotient of two (perhaps very large) integers. So our Sage examples will begin by using QQ as our number system, so that we can concentrate on understanding the key concepts.
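For instance, the same arithmetic performed in QQ and in RR illustrates the difference between exact and floating-point computation (the RR digits shown assume Sage's default 53-bit precision):

sage: 1/3 + 1/7
10/21
sage: RR(1)/3 + RR(1)/7
0.476190476190476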

sage: z = zero_vector(QQ, 5)
sage: z
(0, 0, 0, 0, 0)
Let S be a set of vectors \( {\bf v}_1 , \ {\bf v}_2 , \ \ldots , \ {\bf v}_k .\) A vector v is said to be a linear combination of the vectors from S if and only if there are scalars \( c_1 , \ c_2 , \ \ldots , \ c_k \) such that \( {\bf v} = c_1 {\bf v}_1 + c_2 {\bf v}_2 + \cdots + c_k {\bf v}_k .\) That is, a linear combination of vectors from S is a sum of scalar multiples of those vectors. Let S be a nonempty subset of a vector space V. Then the span of S in V is the set of all possible (finite) linear combinations of the vectors in S (including the zero vector). It is usually denoted by span(S). In other words, the span of a set of vectors in a vector space is the intersection of all subspaces containing that set.

Example. The vector [-2, 8, 5, 0] is a linear combination of the vectors [3, 1, -2, 2], [1, 0, 3, -1], and [4, -2, 1, 0], because

\[ 2\,[3,\, 1,\, -2,\,2] + 4\,[1,\,0,\,3,\,-1] -3\,[4,\,-2,\, 1,\, 0] = [-2,\,8,\, 5,\, 0] . \qquad ■ \]
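We can verify this linear combination directly in Sage:

sage: v1 = vector(QQ, [3, 1, -2, 2])
sage: v2 = vector(QQ, [1, 0, 3, -1])
sage: v3 = vector(QQ, [4, -2, 1, 0])
sage: 2*v1 + 4*v2 - 3*v3
(-2, 8, 5, 0)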

Let S be a subset of a vector space V.

(1) S is a linearly independent subset of V if and only if no vector in S can be expressed as a linear combination of the other vectors in S.
(2) S is a linearly dependent subset of V if and only if some vector v in S can be expressed as a linear combination of the other vectors in S.

Theorem: A nonempty set \( S = \{ {\bf v}_1 , \ {\bf v}_2 , \ \ldots , \ {\bf v}_r \} \) in a vector space V is linearly independent if and only if the only coefficients satisfying the vector equation

\[ k_1 {\bf v}_1 + k_2 {\bf v}_2 + \cdots + k_r {\bf v}_r = {\bf 0} \]
are \( k_1 =0, \ k_2 =0, \ \ldots , \ k_r =0 . \)

Theorem: A nonempty set \( S = \{ {\bf v}_1 , \ {\bf v}_2 , \ \ldots , \ {\bf v}_r \} \) in a vector space V is linearly independent if and only if the matrix of the column-vectors from S has rank r. ■
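For example, the three vectors from the previous example are linearly independent, because the matrix having them as columns has rank 3; a quick check in Sage:

sage: A = matrix(QQ, [[3, 1, 4], [1, 0, -2], [-2, 3, 1], [2, -1, 0]])
sage: A.rank()
3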

If \( S = \{ {\bf v}_1 , \ {\bf v}_2 , \ \ldots , \ {\bf v}_n \} \) is a set of vectors in a finite-dimensional vector space V, then S is called a basis for V if:

  • S spans V;
  • S is linearly independent. ■

Sage has three multiplication commands for vectors: the dot and outer products (for arbitrary vectors), and the cross product (for three-dimensional vectors).

The dot product of two vectors of the same size \( {\bf x} = \left[ x_1 , x_2 , \ldots , x_n \right] \) and \( {\bf y} = \left[ y_1 , y_2 , \ldots , y_n \right] \) (regardless of whether they are columns or rows) is the number, denoted either by \( {\bf x} \cdot {\bf y} \) or \( \left\langle {\bf x} , {\bf y} \right\rangle ,\)

\[ \left\langle {\bf x} , {\bf y} \right\rangle = {\bf x} \cdot {\bf y} = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n , \]
when entries are real, or
\[ \left\langle {\bf x} , {\bf y} \right\rangle = {\bf x} \cdot {\bf y} = \overline{x_1} y_1 + \overline{x_2} y_2 + \cdots + \overline{x_n} y_n , \]

when entries are complex. The dot product was first introduced by the American physicist and mathematician Josiah Willard Gibbs (1839--1903) in the 1880s. The outer product of two coordinate vectors \( {\bf u} = \left[ u_1 , u_2 , \ldots , u_m \right] \) and \( {\bf v} = \left[ v_1 , v_2 , \ldots , v_n \right] , \) denoted \( {\bf u} \otimes {\bf v} , \) is their tensor product: the m-by-n matrix W whose entries satisfy \( w_{i,j} = u_i v_j . \) The outer product \( {\bf u} \otimes {\bf v} \) is equivalent to the matrix multiplication \( {\bf u} \, {\bf v}^{\ast} \) (or \( {\bf u} \, {\bf v}^{\mathrm T} \) if the vectors are real), provided that u is represented as an \( m \times 1 \) column vector and v as an \( n \times 1 \) column vector. Here \( {\bf v}^{\ast} = \overline{{\bf v}^{\mathrm T}} . \)

For three-dimensional vectors \( {\bf a} = a_1 \,{\bf i} + a_2 \,{\bf j} + a_3 \,{\bf k} = \left[ a_1 , a_2 , a_3 \right] \) and \( {\bf b} = b_1 \,{\bf i} + b_2 \,{\bf j} + b_3 \,{\bf k} = \left[ b_1 , b_2 , b_3 \right] , \) it is possible to define a special multiplication, called the cross product:
\[ {\bf a} \times {\bf b} = \det \left[ \begin{array}{ccc} {\bf i} & {\bf j} & {\bf k} \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{array} \right] = {\bf i} \left( a_2 b_3 - b_2 a_3 \right) - {\bf j} \left( a_1 b_3 - b_1 a_3 \right) + {\bf k} \left( a_1 b_2 - a_2 b_1 \right) . \]

Example. If m = 4 and n = 3, then

\[ {\bf u} \otimes {\bf v} = {\bf u} \, {\bf v}^{\mathrm T} = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{bmatrix} \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix} = \begin{bmatrix} u_1 v_1 & u_1 v_2 & u_1 v_3 \\ u_2 v_1 & u_2 v_2 & u_2 v_3 \\ u_3 v_1 & u_3 v_2 & u_3 v_3 \\ u_4 v_1 & u_4 v_2 & u_4 v_3 \end{bmatrix} . \]
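Sage computes outer products with the outer_product method; a small illustration with m = 4 and n = 3 (the entries are our own choice):

sage: u = vector(QQ, [1, 2, 3, 4])
sage: v = vector(QQ, [1, -1, 2])
sage: u.outer_product(v)
[1 -1 2]
[2 -2 4]
[3 -3 6]
[4 -4 8]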

An inner product of two vectors of the same size, usually denoted by \( \left\langle {\bf x} , {\bf y} \right\rangle ,\) is a generalization of the dot product; it is required to satisfy the following properties:

  • \( \left\langle {\bf v}+{\bf u} , {\bf w} \right\rangle = \left\langle {\bf v} , {\bf w} \right\rangle + \left\langle {\bf u} , {\bf w} \right\rangle . \)
  • \( \left\langle {\bf v} , \alpha {\bf u} \right\rangle = \alpha \left\langle {\bf v} , {\bf u} \right\rangle \) for any scalar α.
  • \( \left\langle {\bf v} , {\bf u} \right\rangle = \overline{\left\langle {\bf u} , {\bf v} \right\rangle} , \) where overline means complex conjugate.
  • \( \left\langle {\bf v} , {\bf v} \right\rangle \ge 0 , \) with equality if and only if \( {\bf v} = {\bf 0} . \)
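The third property (conjugate symmetry) can be checked in Sage with the hermitian_inner_product method, which conjugates its first argument; a minimal symbolic sketch with vectors of our own choosing:

sage: u = vector(SR, [1, I]); v = vector(SR, [I, 2])
sage: u.hermitian_inner_product(v)
-I
sage: v.hermitian_inner_product(u)
I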

The fourth condition in the list above is known as the positive-definite condition. A vector space together with an inner product is called an inner product space. Every inner product space is a normed space, and hence a metric space. The norm is given by

\[ \| {\bf u} \| = \sqrt{\left\langle {\bf u} , {\bf u} \right\rangle} . \]
The nonzero vectors u and v of the same size are orthogonal (or perpendicular) when their inner product is zero: \( \left\langle {\bf u} , {\bf v} \right\rangle = 0 . \) We abbreviate it as \( {\bf u} \perp {\bf v} . \) If A is an \( n \times n \) matrix and u and v are n-vectors, then
\[ {\bf A} {\bf u} \cdot {\bf v} = {\bf u} \cdot {\bf A}^{\ast} {\bf v} \qquad\mbox{and} \qquad {\bf u} \cdot {\bf A} {\bf v} = {\bf A}^{\ast} {\bf u} \cdot {\bf v} . \]
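For a real matrix, \( {\bf A}^{\ast} = {\bf A}^{\mathrm T} , \) and the first identity is easy to confirm in Sage (with a matrix and vectors of our own choosing):

sage: A = matrix(QQ, [[1, 2], [3, 4]])
sage: u = vector(QQ, [1, -1]); v = vector(QQ, [2, 5])
sage: (A*u).dot_product(v) == u.dot_product(A.transpose()*v)
True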

Example. The Euclidean inner product and the weighted Euclidean inner product (when \( \left\langle {\bf u} , {\bf v} \right\rangle = \sum_{k=1}^n a_k u_k v_k , \) for some positive numbers \( a_k , \ k=1,2,\ldots , n \) ) are special cases of a general class of inner products on \( \mathbb{R}^n \) called matrix inner products. Let A be an invertible n-by-n matrix. Then the formula

\[ \left\langle {\bf u} , {\bf v} \right\rangle = {\bf A} {\bf u} \cdot {\bf A} {\bf v} \]
defines an inner product generated by A.
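A minimal numerical sketch of this construction in Sage, with an invertible matrix A and vectors of our own choosing:

sage: A = matrix(QQ, [[1, 1], [0, 1]])
sage: u = vector(QQ, [1, 2]); v = vector(QQ, [3, 1])
sage: (A*u).dot_product(A*v)
14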

Example. In the set of integrable functions on an interval [a,b], we can define the inner product of two functions f and g as

\[ \left\langle f , g \right\rangle = \int_a^b \overline{f} (x)\, g(x) \, {\text d}x \qquad\mbox{or} \qquad \left\langle f , g \right\rangle = \int_a^b f(x)\,\overline{g} (x) \, {\text d}x . \]
Then the norm \( \| f \| \) (also called the 2-norm) becomes the square root of
\[ \| f \|^2 = \left\langle f , f \right\rangle = \int_a^b \left\vert f(x) \right\vert^2 \, {\text d}x . \]
In particular, the 2-norm of the function \( f(x) = 5x^2 +2x -1 \) on the interval [0,1] is
\[ \| 5 x^2 +2x -1 \| = \sqrt{\int_0^1 \left( 5x^2 +2x -1 \right)^2 {\text d}x } = \sqrt{7} . \]
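We can confirm this value symbolically in Sage:

sage: x = var('x')
sage: sqrt(integrate((5*x^2 + 2*x - 1)^2, x, 0, 1))
sqrt(7)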

Example. Consider the set of polynomials of degree at most n. If

\[ {\bf p} = p(x) = p_0 + p_1 x + p_2 x^2 + \cdots + p_n x^n \quad\mbox{and} \quad {\bf q} = q(x) = q_0 + q_1 x + q_2 x^2 + \cdots + q_n x^n \]
are two polynomials, and if \( x_0 , x_1 , \ldots , x_n \) are distinct real numbers (called sample points), then the formula
\[ \left\langle {\bf p} , {\bf q} \right\rangle = p(x_0 )\, q(x_0 ) + p (x_1 )\, q(x_1 ) + \cdots + p(x_n )\, q(x_n ) \]
defines an inner product, which is called the evaluation inner product at \( x_0 , x_1 , \ldots , x_n . \)
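A small Sage sketch of the evaluation inner product, with hypothetical polynomials p, q and sample points 0, 1, 2:

sage: x = var('x')
sage: p(x) = 1 + x; q(x) = x^2
sage: sum(p(t)*q(t) for t in [0, 1, 2])
14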

sage: v = vector([1,2,3]); w = vector([0,5,-9])
sage: v.cross_product(v)
(0, 0, 0)
sage: u = v.cross_product(w); u
(-33, 9, 5)
sage: v.dot_product(u)
0

The dot product is zero because the cross product u = v × w is orthogonal to each of its factors.

With the dot product, we can assign a length to a vector, which is also called the Euclidean norm or 2-norm:

\[ \| {\bf x} \| = \sqrt{ {\bf x}\cdot {\bf x}} = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2} . \]
In linear algebra, functional analysis, and related areas of mathematics, a norm is a function that assigns a strictly positive length or size to each vector in a vector space—save for the zero vector, which is assigned a length of zero. On an n-dimensional complex space \( \mathbb{C}^n ,\) the most common norm is
\[ \| {\bf z} \| = \sqrt{ {\bf z}\cdot {\bf z}} = \sqrt{\overline{z_1} \,z_1 + \overline{z_2}\,z_2 + \cdots + \overline{z_n}\,z_n} = \sqrt{|z_1|^2 + |z_2 |^2 + \cdots + |z_n |^2} . \]
A unit vector u is a vector whose length equals one: \( {\bf u} \cdot {\bf u} =1 . \) We say that two vectors x and y are perpendicular if their dot product is zero. Many other norms are known; among them we mention the taxicab or Manhattan norm, which is also called the 1-norm:
\[ \| {\bf x} \| = \sum_{k=1}^n | x_k | = |x_1 | + |x_2 | + \cdots + |x_n | . \]
For the Euclidean norm, the Cauchy--Bunyakovsky--Schwarz (or simply CBS) inequality holds:
\[ | {\bf x} \cdot {\bf y} | \le \| {\bf x} \| \, \| {\bf y} \| . \]
The inequality for sums was published by Augustin-Louis Cauchy (1789--1857) in 1821, while the corresponding inequality for integrals was first proved by Viktor Yakovlevich Bunyakovsky (1804--1889) in 1859. The modern proof of the integral inequality (essentially a repetition of Bunyakovsky's) was given by Hermann Amandus Schwarz (1843--1921) in 1888. With the Euclidean norm, we can express the dot product as
\[ {\bf x} \cdot {\bf y} = \| {\bf x} \| \, \| {\bf y} \| \, \cos \theta , \]
where \( \theta \) is the angle between two vectors. ■
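Sage computes these norms with the norm method; for example:

sage: x = vector(QQ, [3, -4, 0])
sage: x.norm()
5
sage: x.norm(1)
7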


How to Define Matrices


Sage has built-in commands that will solve a linear system of equations, given a coefficient matrix and a vector of constants. We need to learn some more theory before we can entirely understand these commands, but we can begin to explore their use. For now, consider these methods experimental and do not let them replace row-reducing augmented matrices.

\[ {\bf A}\, {\bf x} = {\bf b} . \]
The command A.solve_right(b) will provide information about solutions to the linear system of equations with coefficient matrix A and vector of constants b. The reason for the "right" (and the corresponding command named with "left") will have to wait for a while. For now, it is generally correct in this course to use the "right" variant of any Sage linear algebra command that has both "left" and "right" variants. Let's apply the .solve_right() command to a system with no solutions:
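For instance, with an inconsistent system of our own choosing, Sage reports the failure (shown here in doctest style):

sage: A = matrix(QQ, [[1, 2], [2, 4]])
sage: b = vector(QQ, [1, 1])
sage: A.solve_right(b)
Traceback (most recent call last):
...
ValueError: matrix equation has no solutions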

An important use of Schur complements is the partitioned solution of linear systems. In fact, this is the most important application in the Finite Element Method (FEM for short), in connection with superelement analysis. Suppose that we have the linear system \( {\bf M}\,{\bf z} = {\bf r} , \) in which the given coefficient matrix M is partitioned into 2-by-2 blocks: \( {\bf M} = \begin{bmatrix} {\bf A} & {\bf B} \\ {\bf C} & {\bf D} \end{bmatrix} . \) Suppose that the given right-hand side vector is also split as \( {\bf r} = [{\bf p}\ {\bf q}]^T , \) while \( {\bf z} = [{\bf x}\ {\bf y}]^T \) is unknown. The system is equivalent to the pair of vector equations:

\[ \begin{split} {\bf A}\, {\bf x} + {\bf B}\, {\bf y} = {\bf p} , \\ {\bf C}\, {\bf x} + {\bf D}\, {\bf y} = {\bf q} . \end{split} \]
If D is nonsingular, eliminating y and solving for x yields
\[ {\bf x} = \left( {\bf A} - {\bf B}\,{\bf D}^{-1} {\bf C} \right)^{-1} \left( {\bf p} - {\bf B} \,{\bf D}^{-1} {\bf q} \right) = \left( {\bf M} / {\bf D} \right)^{-1} \left( {\bf p} - {\bf B} \,{\bf D}^{-1} {\bf q} \right) , \]
where \( {\bf M} / {\bf D} = {\bf A} - {\bf B}\,{\bf D}^{-1} {\bf C} \) is the Schur complement. Similarly, if A is nonsingular, eliminating x and solving for y yields
\[ {\bf y} = \left( {\bf D} - {\bf C}\,{\bf A}^{-1} {\bf B} \right)^{-1} \left( {\bf q} - {\bf C} \,{\bf A}^{-1} {\bf p} \right) = \left( {\bf M} / {\bf A} \right)^{-1} \left( {\bf q} - {\bf C} \,{\bf A}^{-1} {\bf p} \right) , \]
where \( {\bf M} / {\bf A} = {\bf D} - {\bf C}\,{\bf A}^{-1} {\bf B} . \)
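A small Sage sketch of this partitioned solve, with a hypothetical 4-by-4 matrix M split into 2-by-2 blocks; the answer agrees with solving the full system directly:

sage: M = matrix(QQ, [[2, 0, 1, 0], [0, 2, 0, 1], [1, 0, 3, 0], [0, 1, 0, 3]])
sage: A = M[0:2, 0:2]; B = M[0:2, 2:4]
sage: C = M[2:4, 0:2]; D = M[2:4, 2:4]
sage: p = vector(QQ, [1, 2]); q = vector(QQ, [3, 4])
sage: MD = A - B*D.inverse()*C
sage: x = MD.inverse()*(p - B*D.inverse()*q)
sage: y = D.inverse()*(q - C*x)
sage: x, y
((0, 2/5), (1, 6/5))
sage: M.solve_right(vector(QQ, [1, 2, 3, 4]))
(0, 2/5, 1, 6/5)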