Matrices are both a very ancient and a very current mathematical concept. "Matrix" is the Latin word for womb.
References to matrices and systems of equations can be found in
Chinese manuscripts dating back to around 200 B.C. The term matrix was first used by the English mathematician
James Joseph Sylvester (1814--1897), who introduced it in 1850. Sylvester also coined many
mathematical terms or used them in new or unusual ways, such as graph, discriminant, annihilator,
canonical form, minor, nullity, and many others. Over the years,
mathematicians and scientists have found many applications of matrices.
More recently, the advent of personal and large-scale computers has
increased the use of matrices in a wide variety of applications.
James Sylvester (his original name was James Joseph; he adopted the new
family name "Sylvester" following his brother, who lived in the United States under that
name) was born into a Jewish family in London, and was to become
one of the supreme algebraists of the nineteenth century. At the age of 14,
Sylvester was a student of
Augustus De Morgan at the
University of London. His family withdrew him from the University after he was accused of
stabbing a fellow student with a knife. Despite having studied for several
years at
St John's College, Cambridge, he was not permitted to take his degree there because
he "professed the faith in which the founder of Christianianity was
educated." Therefore, he received his degrees from Trinity College, Dublin.
In 1841, James moved to the United States to become a professor of mathematics at the University of Virginia, but left
after less than four months following a violent encounter with two students he had disciplined. He moved to New York
City and began friendships with the Harvard mathematician Benjamin Peirce and the Princeton physicist Joseph Henry.
However, he left in November 1843 after being denied appointment as Professor of Mathematics at Columbia College
(now University), again because he was Jewish, and returned to England. He was hired in 1844 by the Equity and Law
Life Assurance Society, for which he developed successful actuarial models and served as de facto CEO, a position that
required a law degree. In 1872, he finally received his B.A. and M.A. from Cambridge, the degrees having previously been
withheld because of his faith. In 1876 Sylvester again crossed the Atlantic Ocean to become the inaugural professor of
mathematics at the new Johns Hopkins University in Baltimore, Maryland. Sylvester was an avid poet, prefacing many
of his mathematical papers with examples of his work.
A matrix (plural: matrices) is a rectangular array of numbers, functions, or other symbols. It can be written as
\[
{\bf A} = \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{bmatrix} .
\]
We denote this array by a single letter A (usually a capital boldfaced letter) or by \( \left( a_{i,j} \right) \)
or \( \left[ a_{i,j} \right] , \) depending on which notation (parentheses or brackets) is in use.
The symbol \( a_{i,j} ,\) or sometimes \( a_{ij} ,\) in the ith
row and jth column is called the \( \left( i, \, j \right) \) entry. We say that A has
m rows and n columns, and that it is an \( m \times n \) matrix or
m-by-n matrix, while m and n are called its dimensions. We also
refer to A as a matrix of size \( m \times n . \) Matrices with a single
row are called row vectors, and those with a single
column are called column vectors. Here is an example of a 4×6 matrix.
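The original display of the example matrix is not reproduced here; as a substitute, the following sketch builds a 4×6 matrix in Mathematica with illustrative entries:
A = Table[10 i + j, {i, 4}, {j, 6}];  (* 4 rows, 6 columns; the entry 10*i + j is made up for illustration *)
A // MatrixForm                        (* display as a rectangular array *)
Dimensions[A]                          (* {4, 6}: the dimensions m and n *)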
A matrix with the
same number of rows and columns is called a square matrix.
In particular, a square matrix having all elements equal to zero except those on the principal diagonal is called a
diagonal matrix. Mathematica has two dedicated commands:
DiagonalMatrix[ list ]
This command gives a matrix with the elements of list on the leading diagonal, and 0 elsewhere. Another important command generates
the identity matrix (a particular case of the diagonal matrix in which every diagonal element equals 1):
IdentityMatrix[ n ]
Here n is the dimension of the matrix. For example, the identity matrix in 3D space is
\[
{\bf I}_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} .
\]
That is, \( {\bf I}_n = \left[ \delta_{ij} \right] , \) in which \( \delta_{ij} \) is the
Kronecker delta (which is zero when \( i \ne j \) and 1 otherwise). If the size is clear from
context, we write I in place of I_{n}.
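A quick check of both commands in Mathematica:
DiagonalMatrix[{1, 2, 3}] // MatrixForm   (* 3x3 matrix with 1, 2, 3 on the leading diagonal *)
IdentityMatrix[3] // MatrixForm           (* the 3x3 identity matrix shown above *)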
Before we can discuss arithmetic operations for matrices, we have to define equality for matrices. Two matrices are equal if they have the same size and their corresponding elements are equal. A matrix with elements that are all 0's is called a zero or null matrix.
A null matrix usually is indicated as 0.
As we will see, a matrix can represent many different things. However, in this part, we will focus on how matrices can be used to represent systems of linear equations.
Any \( m \times n \) matrix can be considered as an array of \( n \) columns:
\[
{\bf A} = \left[ {\bf c}_1 \ {\bf c}_2 \ \cdots \ {\bf c}_n \right] .
\]
Here the i-th column vector \( {\bf c}_i = \langle a_{1,i} , a_{2,i} , \ldots , a_{m,i} \rangle^T \) contains the entries of the i-th column of matrix A.
Correspondingly, the j-th row vector \( {\bf r}_j = \langle a_{j,1} , a_{j,2} , \ldots , a_{j,n} \rangle \) contains the entries of the j-th row of matrix A.
Matrices are fundamental objects in linear algebra, so
there are a variety of ways to construct a matrix in Mathematica.
Generally, you need to specify what types of entries the matrix
contains (more on that to come), the number of rows and columns,
and the entries themselves. First, let's dissect an example:
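As a minimal construction sketch (the entries are chosen only for illustration), a matrix in Mathematica is entered as a list of rows, each row itself a list:
A = {{1, 2, 3}, {4, 5, 6}};        (* a 2x3 matrix entered row by row *)
A // MatrixForm                    (* display as a rectangular array *)
B = Table[i + j, {i, 2}, {j, 3}];  (* the same shape generated from a formula *)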
Example 1:
Our first example deals with economics. Let us consider two families, Anderson (A) and Boichuck (B), that have monthly expenses
such as utilities, health, entertainment, food, and so on. Let us restrict ourselves to food, utilities, and health.
How would one represent the data collected? Many ways are available, but one of them has the advantage of combining the data so that they are easy to manipulate.
Indeed, we will write the data as follows:
The size of the matrix, as a block, is defined by the number of rows and the number of columns.
In this case, the above matrix has 2 rows and 3 columns. In general, we say that a matrix is an \( m \times n \) matrix (pronounced m-by-n matrix).
Keep in mind that the first entry (m) is the number of rows, while the second entry (n) is the number of columns. Our matrix above is a (2 × 3) matrix.
Let us assume, for example, that the matrices for the months of July, August, and September are
respectively. The next question may sound easy to answer, but requires a new concept in the matrix context.
What is the matrix-expense for the two families for the summer? The idea is to add the three matrices above by adding the corresponding entries:
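The monthly tables themselves are not reproduced here, so the following Mathematica sketch uses made-up expense figures only to show the entrywise addition:
july      = {{650, 125, 50}, {600, 150, 80}};  (* hypothetical data: rows = families A and B; columns = food, utilities, health *)
august    = {{700, 130, 60}, {620, 140, 90}};
september = {{680, 120, 70}, {610, 145, 85}};
july + august + september    (* {{2030, 375, 180}, {1830, 435, 255}}: matrices of the same size add entrywise *)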
Example 2: Let \( \texttt{D}\,:\,P_3 (\mathbb{R}) \to P_2 (\mathbb{R}) \)
be the linear differential operator from the space of polynomials of degree at most 3 into the space of polynomials of degree at most 2.
This operator acts as \( \texttt{D}\,p = p' , \) for any polynomial
\( p(x) = p_0 + p_1 x + p_2 x^2 + p_3 x^3 . \) Let \( \beta =
\left\{ 1, x, x^2 , x^3 \right\} \) and \( \gamma = \left\{ 1, x, x^2 \right\} \)
be the standard ordered bases for P_{3} and
P_{2}, respectively. Then to the differential
operator corresponds the matrix
\[
\texttt{D} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix} .
\]
Therefore, \( \texttt{D} \) is a 3×4 matrix.
On the other hand, the antiderivative operator
\( \texttt{D}^{-1} \), which assigns to a polynomial
its indefinite integral (omitting the constant of integration), has
the matrix representation:
\[
\texttt{D}^{-1} = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & \frac{1}{2} & 0 \\ 0 & 0 & \frac{1}{3} \end{bmatrix} .
\]
Therefore, \( \texttt{D}^{-1} \) is a 4×3 matrix, mapping a polynomial of the second degree into a polynomial of the third degree
with respect to the previously chosen bases β and γ. Their product is
\[
\texttt{D}\,\texttt{D}^{-1} = {\bf I}_3 ,
\]
the 3×3 identity matrix.
However, the reverse product \( \texttt{D}^{-1} \texttt{D} \) is not the identity: it annihilates the constant term, since differentiation destroys information that the antiderivative cannot restore.
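A quick Mathematica check of these two matrix representations (lowercase names are used because D is a built-in symbol):
d    = {{0, 1, 0, 0}, {0, 0, 2, 0}, {0, 0, 0, 3}};        (* differentiation, 3x4 *)
dInv = {{0, 0, 0}, {1, 0, 0}, {0, 1/2, 0}, {0, 0, 1/3}};  (* antiderivative, 4x3 *)
d . dInv     (* the 3x3 identity matrix *)
dInv . d     (* a 4x4 matrix whose first row is zero: not the identity *)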
■
Now we introduce addition/subtraction on the set of all rectangular matrices,
as well as multiplication by a scalar (real or complex number), so that the
resulting set becomes a vector space. Namely, we consider the set of
rectangular matrices with m rows and n columns; it is natural to
denote this set by M_{m,n}. On the set of m×n matrices, we
impose addition/subtraction by adding (or subtracting) the corresponding
entries:
\[
{\bf A} \pm {\bf B} = \left[ a_{i,j} \pm b_{i,j} \right] ,
\]
where \( {\bf A} = \left[ a_{i,j} \right] , \quad {\bf B} = \left[ b_{i,j} \right] . \)
Multiplication by a scalar (real or complex number) r is straightforward:
\[
r\,{\bf A} = \left[ r\,a_{i,j} \right] .
\]
Therefore, to multiply a matrix by a scalar, one multiplies every entry by this scalar. With these two operations (addition
and multiplication by a scalar), the set M_{m,n} becomes a vector space.
The negative of a matrix M, denoted by \( -{\bf M} ,\) is a matrix with
elements that are the negatives of the elements in M.
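These entrywise operations are exactly what Mathematica's arithmetic does on equally sized lists of lists; a small sketch with illustrative matrices:
A = {{1, 2}, {3, 4}};
B = {{5, 6}, {7, 8}};
A + B    (* entrywise sum: {{6, 8}, {10, 12}} *)
3 A      (* scalar multiple: every entry is tripled *)
-A       (* the negative of A *)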
It should be emphasized that matrices with different dimensions cannot be added/subtracted. Therefore, row-vectors
cannot be added to column-vectors. ■
Now we are going to introduce matrix multiplication, which may at first seem rather strange. We don't
know exactly by whom or when the multiplication of matrices was invented. At least we know that the
1812 work of Jacques Philippe Marie Binet (1786--1856) contains the definition of the product of matrices.
Jacques Binet was a French mathematician, physicist, and astronomer who made significant contributions to number theory
and to the mathematical foundations of matrix algebra, which would later lead to important generalizations by Cayley,
Sylvester, and others. In his memoir on the theory of the conjugate axis and of the moment of inertia of bodies, he
stated the principle now known as Binet's theorem.
When two linear maps are represented by matrices, then the matrix product
represents the composition of the two maps. This concept is due to the German mathematician Gotthold Eisenstein
(1823--1852), a student of Carl Gauss. Although Gotthold was born into a Jewish family, the family converted from
Judaism to Protestantism. His father served in the Prussian army for eight years. Gotthold's mathematical talents
were recognized early. He also showed a considerable talent for music from a young age; he played the
piano and composed music throughout his life. Eisenstein introduced the notation
\( {\bf A} \times {\bf B} \) to denote matrix multiplication around 1844. The idea
was then expanded on and formalized by Cayley in his Memoir on the Theory of Matrices,
which was published in 1858. However, Gotthold, who survived childhood meningitis, suffered bad health his entire life.
Let \( {\bf A} = \left[ a_{ij} \right] \) be an m×n matrix, and let
\( {\bf B} = \left[ b_{ij} \right] \) be an n×s matrix. The matrix product A B is the
m×s matrix \( {\bf C} = \left[ c_{ij} \right] , \) where c_{ij} is the
dot product of the i-th row vector of A and the j-th column vector of B:
\[
c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj} , \qquad 1 \le i \le m , \quad 1 \le j \le s .
\]
In order to multiply A by B from the right, the number of columns of A must be equal to the number of rows of B.
If we write \( {\bf B} = \left[ {\bf b}_{1} \ {\bf b}_2 \ \cdots \ {\bf b}_s \right] \) as
a \( 1 \times s \) array of its columns, then matrix multiplication says that
\[
{\bf A}\,{\bf B} = \left[ {\bf A}\,{\bf b}_{1} \ {\bf A}\,{\bf b}_2 \ \cdots \ {\bf A}\,{\bf b}_s \right] .
\]
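In Mathematica the matrix product is the Dot operator (a period); a small sketch with illustrative matrices:
A = {{1, 2, 3}, {4, 5, 6}};    (* 2x3 *)
B = {{1, 0}, {0, 1}, {1, 1}};  (* 3x2 *)
A . B    (* the 2x2 product {{4, 5}, {10, 11}} *)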
For handy reference, we list the basic properties of matrix operations. These properties are valid for all vectors,
scalars, and matrices for which the indicated quantities are defined. It should be noted that matrix multiplication
is not necessarily commutative; see the sketch after the list.
A + B = B + A; Commutative law of addition
(A + B) + C = A + (B + C); Associative law of addition
A + 0 = 0 + A = A; Identity for addition
r (A + B) = r A + r B; A left distributive law
(r + s) A = r A + s A; A right distributive law
(rs) A = r (sA); Associative law of scalar multiplication
(r A) B = A (r B) = r (AB); Scalars pull through
A (BC) = (AB) C; Associative law of matrix multiplication
I A = A and B I = B; Identity for matrix multiplication
A (B + C)= AB + AC; A left distributive law
(A + B) C = AC + BC; A right distributive law
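To see the failure of commutativity concretely, here is a minimal Mathematica check with two illustrative 2×2 matrices:
A = {{1, 2}, {3, 4}};
B = {{0, 1}, {1, 0}};
A . B    (* {{2, 1}, {4, 3}} *)
B . A    (* {{3, 4}, {1, 2}}: a different matrix, so AB != BA *)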
Example 4:
Let us consider two 3×4 matrices from the previous example.
These matrices can be added/subtracted, but not multiplied, because they do not have matching dimensions. However, if
we consider the transpose, a 4×3 matrix, the products do exist.
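Since those matrices are not reproduced here, this sketch uses two illustrative 3×4 matrices to show the dimension rule:
A = {{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}};  (* 3x4 *)
B = {{1, 1, 1, 1}, {2, 2, 2, 2}, {3, 3, 3, 3}};     (* 3x4 *)
A + B                (* allowed: equal dimensions add entrywise *)
A . Transpose[B]     (* allowed: (3x4).(4x3) gives a 3x3 matrix *)
(* A . B would signal a dimension error: 4 columns cannot match 3 rows *)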
The set ℳ_{m,n} of all m × n matrices over the field of either real or complex numbers is a vector space of dimension m · n.
In order to determine how close two matrices are, or to define the convergence of sequences of matrices, a special concept of matrix norm is employed, with notation \( \| {\bf A} \| . \) A norm
is a function from a real or complex vector space to the nonnegative real numbers that satisfies the following conditions:
Positivity: ‖A‖ ≥ 0,
‖A‖ = 0 iff A = 0.
Homogeneity: ‖kA‖ = |k| ‖A‖ for arbitrary scalar k.
Triangle inequality: ‖A + B‖ ≤ ‖A‖ + ‖B‖.
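A quick numeric sanity check of the triangle inequality, using the Frobenius norm and illustrative matrices:
A = {{1., 2.}, {3., 4.}};
B = {{0., 1.}, {-1., 2.}};
Norm[A + B, "Frobenius"] <= Norm[A, "Frobenius"] + Norm[B, "Frobenius"]   (* True *)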
The norm of a matrix may be thought of as its magnitude or length because it is a nonnegative number. Three kinds of matrix norms are in common use:
The operator norms are norms induced by a matrix considered as a linear operator from ℝ^{n} into ℝ^{m} for real scalars or from ℂ^{n} into ℂ^{m} for complex scalars.
The entrywise norms treat an m-by-n matrix as a vector of length m · n. Therefore, these norms are directly related to norms in a vector space.
The Schatten norms are built from the singular values of the matrix; they are discussed later in this section.
The norm notation ‖ · ‖ is heavily overloaded, obliging readers to disambiguate norms by paying close attention to the type and context of each norm's argument. Therefore, the main norm notation ‖ · ‖ usually carries indices, depending on the author's preference. Moreover, the reader should be aware that different kinds of norms may lead to the same definition (for example, the Euclidean norm is actually the spectral norm in Schatten's sense).
For a rectangular m-by-n matrix A and given norms \( \| \cdot \| \)
in \( \mathbb{R}^n \mbox{ and } \mathbb{R}^m , \) the norm of A is defined as follows:
\[
\| {\bf A} \| = \sup_{{\bf x} \ne {\bf 0}} \frac{\| {\bf A}\,{\bf x} \|}{\| {\bf x} \|} = \sup_{\| {\bf x} \| = 1} \| {\bf A}\,{\bf x} \| .
\]
This matrix norm is called the operator norm or induced norm.
The term "induced" refers to the fact that the definition of a norm for vectors such as A x and x is what enables the definition above of a matrix norm.
This definition of the matrix norm is not computationally friendly, so other options are used in practice.
For a rectangular m×n matrix A, the following popular operator norms are in common use.
‖A‖₁ is the maximum absolute column sum of the matrix:
\[
\| {\bf A} \|_1 = \max_{1 \le j \le n} \sum_{i=1}^{m} | a_{ij} | .
\]
‖A‖₂
is the Euclidean norm, equal to the greatest singular value of
A, which is the square root of the greatest eigenvalue of \( {\bf A}^{\ast} {\bf A} \):
\[
\| {\bf A} \|_2 = \sigma_{\max} \left( {\bf A} \right) = \sqrt{\lambda_{\max} \left( {\bf A}^{\ast} {\bf A} \right)} ,
\]
where \( \lambda_{\max} \left( {\bf A}^{\ast} {\bf A} \right) \) is the maximal
eigenvalue of \( {\bf A}^{\ast} {\bf A} \) (also known as its spectral radius), and
\( \sigma_{\max} \left( {\bf A} \right) \) is the maximal singular value of A.
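These operator norms are available through Mathematica's Norm command; a sketch with an illustrative matrix:
A = {{1., 2., 3.}, {4., 5., 6.}};
Norm[A, 1]          (* maximum absolute column sum: 9. *)
Norm[A, 2]          (* spectral norm: the greatest singular value *)
Norm[A, Infinity]   (* maximum absolute row sum: 15. *)
Max[SingularValueList[A]]   (* the same value as Norm[A, 2] *)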
The induced matrix norms constitute a large and important part of possible matrix norms, but many non-induced norms are known as well.
The following very important non-induced norm is named after Ferdinand Georg Frobenius (1849--1917).
The Frobenius
norm \( \| \cdot \|_F : \mathbb{C}^{m\times n} \to \mathbb{R}_{+} \) is defined for a rectangular m-by-n matrix A by
\[
\| {\bf A} \|_F = \left( \sum_{i=1}^{m} \sum_{j=1}^{n} | a_{ij} |^2 \right)^{1/2} = \sqrt{\mbox{tr} \left( {\bf A}^{\ast} {\bf A} \right)} ,
\]
where A* is the adjoint matrix of A.
Recall that the trace function returns the sum of diagonal entries of a square matrix.
Mathematica has a dedicated command for evaluating the Frobenius norm: Norm[A, "Frobenius"]
One can think of the Frobenius norm as taking the columns of the matrix, stacking them on top of each other to create a vector of length m · n, and then taking the vector 2-norm of the result.
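This stacking description is easy to verify in Mathematica:
A = {{1., 2., 3.}, {4., 5., 6.}};
Norm[A, "Frobenius"]   (* Sqrt[91] ≈ 9.53939 *)
Norm[Flatten[A]]       (* the 2-norm of the stacked entries: the same value *)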
The Frobenius norm is a matrix norm that is unitarily invariant, i.e., it is conserved or invariant under a unitary transformation (such as a rotation).
For a norm to be unitarily invariant, it should depend solely upon the singular values of the matrix.
So if B = R* A R with a unitary (orthogonal if real) matrix R satisfying R* = R^{-1}, then
\[
\| {\bf B} \|_F = \| {\bf R}^{\ast} {\bf A}\,{\bf R} \|_F = \| {\bf A} \|_F .
\]
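A numeric check of this invariance with a 2D rotation matrix (illustrative 2×2 example):
A = {{1., 2.}, {3., 4.}};
R = RotationMatrix[Pi/6];                 (* orthogonal: Transpose[R] equals Inverse[R] *)
Norm[A, "Frobenius"]                      (* Sqrt[30] ≈ 5.47723 *)
Norm[Transpose[R] . A . R, "Frobenius"]   (* the same value *)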
Besides the Frobenius norm, other non-induced norms are known that treat an m-by-n matrix as a vector of length m · n. For example, the following "entrywise" norms are also widely used.
There is an unfortunate but unavoidable overuse of the notation here:
\( \| {\bf A} \|_1 \) is the absolute sum of all elements of A,
\[
\| {\bf A} \|_1 = \sum_{i=1}^{m} \sum_{j=1}^{n} | a_{ij} | ,
\]
which differs from the induced 1-norm defined above.
Robert Schatten (1911--1977) suggested defining matrix norms based on the singular values σ_{i} (equivalently, on the eigenvalues of \( {\bf A}^{\ast} {\bf A} \)); these norms are therefore named after him. We present three of them, with notation that overlaps the notation used previously.
We denote by r the rank of the rectangular m×n matrix A; then the Schatten p-norm is
\[
\| {\bf A} \|_p = \left( \sum_{i=1}^{r} \sigma_i^p \right)^{1/p} .
\]
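The three standard cases (p = 1, 2, ∞) can be computed from the singular values in Mathematica; a sketch with an illustrative matrix:
A = {{1., 2., 3.}, {4., 5., 6.}};
s = SingularValueList[A];
Total[s]           (* Schatten 1-norm, also called the trace or nuclear norm *)
Sqrt[Total[s^2]]   (* Schatten 2-norm, equal to the Frobenius norm *)
Max[s]             (* Schatten infinity-norm, equal to the spectral norm *)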
Again, we double-check this answer with two approaches. First, we calculate the sum of the square roots of the eigenvalues of matrix A₄:
N[Sum[Sqrt[Eigenvalues[A4][[i]]], {i, 1, 4}]]
15.556136510657433
and then we repeat the computation with matrix A₃:
N[Sum[Sqrt[Eigenvalues[A3][[i]]], {i, 1, 3}]]
15.556136510657433
Since the sum of the eigenvalues of a square matrix equals its trace, we first find the square root of matrix A₃. To determine this square root, we use Sylvester's method:
Then we find its eigenvalues and add them. You may want to verify that the sum of the absolute values of the eigenvalues of the 3×3 matrix A₃ is exactly the same as the corresponding sum for the 4×4 matrix A₄.
The square of the Schatten 2-norm is the square of the Frobenius norm:
\[
\sum_{i=1}^{r} \sigma_i^2 = \mbox{tr} \left( {\bf A}^{\ast} {\bf A} \right) = \| {\bf A} \|_F^2 ;
\]
in other words, the Schatten 2-norm coincides with the Frobenius norm.