Matrices

We usually denote vectors with lower case letters and matrices with upper case letters. Say we define a \( 2\times 3 \) matrix (with two rows and three columns) as
A ={{1,2,3},{-1,3,0}}
However, to see the traditional form of the matrix on the screen, one needs to append a corresponding command, either TraditionalForm or MatrixForm. These special commands tell Mathematica that the output should be displayed with the elements of the list arranged in a regular array:
A ={{1,2,3},{-1,3,0}} // MatrixForm
Out[1]= \( \begin{pmatrix} 1&2&3 \\ -1&3&0 \end{pmatrix} \)

Next we define two vectors: a as a flat list and b as a nested list with a single row, so that b is a genuine \( 1 \times 3 \) matrix (a row vector). When we multiply a matrix A by the vector a from the left or from the right, Mathematica treats this flat list either as a \( 3 \times 1 \) column vector or as a \( 1 \times 3 \) row vector, whichever fits:

A ={{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}
a = {1, 0, 2};
b = {{1, 0, 2}};
A.a
Out[7]= {7, 16, 25}
a.A
Out[8]= {15, 18, 21}
However, the row vector b is a true \( 1 \times 3 \) matrix, so Mathematica accepts the product b.A but rejects A.b because the shapes are incompatible:
b.A
Out[9]= {{15, 18, 21}}
A.b
Dot: Tensors {{1,2,3},{4,5,6},{7,8,9}} and {{1,0,2}} have incompatible shapes.

James Sylvester.
Matrices are both a very ancient and a very current mathematical concept. "Matrix" is the Latin word for womb. References to matrices and systems of equations can be found in Chinese manuscripts dating back to around 200 B.C. The term matrix was first used by the English mathematician James Sylvester (1814--1897), who defined the term in 1850. James Joseph Sylvester also coined many mathematical terms or used them in "new or unusual ways" mathematically, such as graph, discriminant, annihilator, canonical form, minor, nullity, and many others. Over the years, mathematicians and scientists have found many applications of matrices. More recently, the advent of personal and large-scale computers has increased the use of matrices in a wide variety of applications.

James Sylvester (his original name was James Joseph; he adopted the family name "Sylvester" following his brother, who lived in the US under that name) was born into a Jewish family in London, and was to become one of the supreme algebraists of the nineteenth century. At the age of 14, Sylvester was a student of Augustus De Morgan at the University of London. His family withdrew him from the University after he was accused of stabbing a fellow student with a knife. Despite having studied for several years at St John's College, Cambridge, he was not permitted to take his degree there because he "professed the faith in which the founder of Christianity was educated." Therefore, he received his degrees from Trinity College, Dublin.

In 1841, James moved to the United States to become a professor of mathematics at the University of Virginia, but left after less than four months following a violent encounter with two students he had disciplined. He moved to New York City and began friendships with the Harvard mathematician Benjamin Peirce and the Princeton physicist Joseph Henry. However, he left in November 1843 after being denied appointment as Professor of Mathematics at Columbia College (now University), again for his Judaism, and returned to England. He was hired in 1844 by the Equity and Law Life Assurance Society for which he developed successful actuarial models and served as de facto CEO, a position that required a law degree. In 1872, he finally received his B.A. and M.A. from Cambridge, having been denied the degrees due to his being a Jew. In 1876 Sylvester again crossed the Atlantic Ocean to become the inaugural professor of mathematics at the new Johns Hopkins University in Baltimore, Maryland. Sylvester was an avid poet, prefacing many of his mathematical papers with examples of his work.

A matrix (plural matrices) is a rectangular array of numbers, functions, or any symbols. It can be written as

\[ {\bf A} = \left[ \begin{array}{cccc} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{array} \right] \qquad \mbox{or} \qquad {\bf A} = \left( \begin{array}{cccc} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{array} \right) . \]

We denote this array by a single letter A (usually a capital boldfaced letter) or by \( \left( a_{i,j} \right) \) or \( \left[ a_{i,j} \right] , \) depending on which notation (parentheses or brackets) is in use. The symbol \( a_{i,j} ,\) or sometimes \( a_{ij} ,\) in the ith row and jth column is called the \( \left( i, \, j \right) \) entry. We say that A has m rows and n columns, and that it is an \( m \times n \) matrix or m-by-n matrix, while m and n are called its dimensions. We also refer to A as a matrix of size \( m \times n . \) Matrices with a single row are called row vectors, and those with a single column are called column vectors. Here is an example of a 4×6 matrix

\[ {\bf M} = \begin{bmatrix} \phantom{-}1&\phantom{-}2&3&4&5&6 \\ \phantom{-}2&\phantom{-}9&8&7&6&5 \\ -7&\phantom{-}8&9&6&4&4 \\ \phantom{-}4&-1&6&7&8&2 \end{bmatrix} \]
that has 4 rows and 6 columns. Its third row is
\[ \left[ -7 \ \ 8 \ \ 9\ \ 6\ \ 4 \ \ 4 \right] . \]
The fourth column of this matrix is
\[ \begin{bmatrix} 4 \\ 7 \\ 6 \\ 7 \end{bmatrix} . \]
A matrix with the same number of rows and columns is called a square matrix. In particular, a square matrix having all elements equal to zero except those on the principal diagonal is called a diagonal matrix. Mathematica has two dedicated commands:

DiagonalMatrix[ list ]
This command gives a matrix with the elements of list on the leading diagonal, and 0 elsewhere. Another important command generates the identity matrix (a particular case of a diagonal matrix in which all diagonal elements are equal to one):
IdentityMatrix[ n ]
Here n is the dimension of the matrix. For example, the identity matrix in 3D space is
\[ {\bf I}_3 = \begin{bmatrix} 1&0&0 \\ 0&1&0 \\ 0&0&1 \end{bmatrix} = {\bf I} . \]
That is, \( {\bf I}_n = \left[ \delta_{ij} \right] , \) in which δij is the Kronecker delta (which is zero when \( i \ne j \) and 1 otherwise). If the size is clear from context, we write I in place of In.
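For instance, here is a minimal illustration of both commands:
DiagonalMatrix[{1, 2, 3}] // MatrixForm
(* a 3×3 matrix with 1, 2, 3 on the leading diagonal and 0 elsewhere *)
IdentityMatrix[3] // MatrixForm
(* the 3×3 identity matrix *)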

Before we can discuss arithmetic operations for matrices, we have to define equality for matrices. Two matrices are equal if they have the same size and their corresponding elements are equal. A matrix with elements that are all 0's is called a zero or null matrix. A null matrix usually is indicated as 0.

As we will see, a matrix can represent many different things. However, in this part, we will focus on how matrices can be used to represent systems of linear equations. Any \( m \times n \) matrix can be considered as an array of \( n \) columns

\[ {\bf A} = \left[ \begin{array}{cccc} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{array} \right] = \left[ \left( \begin{array}{c} a_{1,1} \\ a_{2,1} \\ \vdots \\ a_{m,1} \end{array} \right) , \ \left( \begin{array}{c} a_{1,2} \\ a_{2,2} \\ \vdots \\ a_{m,2} \end{array} \right) , \ \cdots \left( \begin{array}{c} a_{1,n} \\ a_{2,n} \\ \vdots \\ a_{m,n} \end{array} \right) \right] = \left[ {\bf c}_1 , {\bf c}_2 , \ldots {\bf c}_n \right] , \]
or as a collection of m rows
\[ {\bf A} = \left[ \begin{array}{cccc} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{array} \right] = \left( \begin{array}{cccc} \langle a_{1,1} , a_{1,2} , \cdots , a_{1,n} \rangle \\ \langle a_{2,1} , a_{2,2} , \cdots , a_{2,n} \rangle \\ \vdots \\ \langle a_{m,1} , a_{m,2} , \cdots , a_{m,n} \rangle \end{array} \right) = \left[ \begin{array}{c} {\bf r}_1 \\ {\bf r}_2 \\ \vdots \\ {\bf r}_m \end{array} \right] . \]
Here the column vector \( {\bf c}_i = \langle a_{1,i} , a_{2,i} , \ldots , a_{m,i} \rangle^T \) contains the entries of the ith column of matrix A, while the row vector \( {\bf r}_j = \langle a_{j,1} , a_{j,2} , \ldots , a_{j,n} \rangle \) contains the entries of the jth row of A.


Matrices are fundamental objects in linear algebra, so there are a variety of ways to construct a matrix in Mathematica. Generally, you need to specify what types of entries the matrix contains (more on that to come), the number of rows and columns, and the entries themselves. First, let's dissect an example:

Example 1: Our first example deals with economics. Let us consider two families, Anderson (A) and Boichuck (B), that have expenses every month such as utilities, health, entertainment, food, etc. Let us restrict ourselves to food, utilities, and entertainment. How would one represent the data collected? Many ways are available, but one of them has the advantage of combining the data so that it is easy to manipulate. Indeed, we will write the data as follows:

\[ \mbox{Month} = \begin{bmatrix} \mbox{Family} & \mbox{Food} & \mbox{Utilities} & \mbox{Entertainment} \\ \mbox{A} & f_1 & u_1 & e_1 \\ \mbox{B} & f_2 & u_2 & e_2 \end{bmatrix} . \]
If there is no risk of confusing the names of the families with the types of expenses, then we may record the data in matrix form:
\[ \mbox{Month} = \begin{bmatrix} f_1 & u_1 & e_1 \\ f_2 & u_2 & e_2 \end{bmatrix} . \]
The size of the matrix, as a block, is defined by the number of rows and the number of columns. In general, a matrix with m rows and n columns is called an \( m \times n \) matrix (pronounced m-by-n matrix). Keep in mind that the first entry (m) is the number of rows, while the second entry (n) is the number of columns. The above matrix, having 2 rows and 3 columns, is a 2 × 3 matrix.

Let us assume, for example, that the matrices for the months of July, August, and September are

\[ {\bf J} = \begin{bmatrix} 650 & 125 & 50 \\ 600 & 150 & 60 \end{bmatrix} , \qquad {\bf A} = \begin{bmatrix} 700 & 250 & 150 \\ 650 & 200 & 80 \end{bmatrix} , \qquad \mbox{and} \qquad {\bf S} = \begin{bmatrix} 750 & 300 & 200 \\ 650 & 275 & 120 \end{bmatrix} , \]
respectively. The next question may sound easy to answer, but requires a new concept in the matrix context. What is the matrix-expense for the two families for the summer? The idea is to add the three matrices above by adding the corresponding entries:
\begin{align*} \mbox{Summer} &= {\bf J} + {\bf A} + {\bf S} = \begin{bmatrix} 2100 & 675 & 400 \\ 1900 & 625 & 260 \end{bmatrix} . \end{align*}
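This sum is a one-liner in Mathematica, where matrix addition is entrywise by design:
J = {{650, 125, 50}, {600, 150, 60}};
A = {{700, 250, 150}, {650, 200, 80}};
S = {{750, 300, 200}, {650, 275, 120}};
J + A + S
{{2100, 675, 400}, {1900, 625, 260}}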

Example 2: Let \( \texttt{D}\,:\,P_3 (\mathbb{R}) \to P_2 (\mathbb{R}) \) be the linear differential operator from the space of polynomials of degree at most 3 into the space of polynomials of degree at most 2. This operator acts as \( \texttt{D}\,p = p' , \) for any polynomial \( p(x) = p_0 + p_1 x + p_2 x^2 + p_3 x^3 . \) Let \( \beta = \left\{ 1, x, x^2 , x^3 \right\} \) and \( \gamma = \left\{ 1, x, x^2 \right\} \) be the standard ordered bases for P3 and P2, respectively. Then to the differential operator corresponds the matrix
\[ \texttt{D} = \begin{bmatrix} 0&1&0&0 \\ 0&0&2&0 \\ 0&0&0&3 \end{bmatrix} . \]
Therefore, \( \texttt{D} \) is a 3×4 matrix. On the other hand, the antiderivative operator \( \texttt{D}^{-1} \) that assigns to a polynomial its indefinite integral, with the constant of integration omitted, has the matrix representation:
\[ \texttt{D}^{-1} = \begin{bmatrix} 0&0&0 \\ 1&0&0 \\ 0&1/2&0 \\ 0&0&1/3 \end{bmatrix} . \]
Therefore, \( \texttt{D}^{-1} \) is a 4×3 matrix, mapping a polynomial of the second degree into a polynomial of the third degree with respect to the previously chosen bases β and γ. Their product is
\[ \texttt{D}^{-1} \texttt{D} = \begin{bmatrix} 0&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \end{bmatrix} . \]
Mathematica confirms:
{{0, 0, 0}, {1, 0, 0}, {0, 1/2, 0}, {0, 0, 1/3}}.{{0, 1, 0, 0}, {0, 0, 2, 0}, {0, 0, 0, 3}}
{{0, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 0}, {0, 0, 0, 1}}
Note that \( \texttt{D}^{-1} \texttt{D} \) is not the 4×4 identity matrix: differentiation annihilates the constant term, which the antiderivative cannot recover. The reverse product, however, is the 3×3 identity matrix, \( \texttt{D} \,\texttt{D}^{-1} = {\bf I}_3 , \) because differentiating an antiderivative returns the original polynomial.
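A quick check in Mathematica (a sketch with lowercase names d and dInv, since D and I are reserved symbols):
d = {{0, 1, 0, 0}, {0, 0, 2, 0}, {0, 0, 0, 3}};          (* differentiation matrix *)
dInv = {{0, 0, 0}, {1, 0, 0}, {0, 1/2, 0}, {0, 0, 1/3}};  (* antiderivative matrix *)
d.dInv
{{1, 0, 0}, {0, 1, 0}, {0, 0, 1}}
■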

Now we introduce addition/subtraction on the set of all rectangular matrices, as well as multiplication by a scalar (real or complex number), so that the resulting set becomes a vector space. Namely, we consider the set of rectangular matrices with m rows and n columns; it is natural to denote this set by Mm,n. In the set of m×n matrices, we define addition/subtraction by adding (or subtracting) the corresponding entries:

\[ {\bf C} = \left[ c_{i,j} \right] = {\bf A} \pm {\bf B} \qquad \Longleftrightarrow \qquad c_{i,j} = a_{i,j} \pm b_{i,j} , \]
where \( {\bf A} = \left[ a_{i,j} \right] , \quad {\bf B} = \left[ b_{i,j} \right] . \) Multiplication by a scalar (real or complex number) r is obvious:
\[ r\,{\bf A} = r \left[ a_{i,j} \right] = \left[ r\,a_{i,j} \right] . \]

Therefore, to multiply a matrix by a scalar, one multiplies every entry by this scalar. With these two operations (addition and multiplication by a scalar), the set Mm,n becomes a vector space. The negative of a matrix M, denoted by \( -{\bf M} ,\) is the matrix whose elements are the negatives of the elements of M. It should be emphasized that matrices of different dimensions cannot be added or subtracted; in particular, row vectors cannot be added to column vectors.

Example 3: Consider two 3×4 matrices
\[ {\bf A} = \begin{bmatrix} 4&1&4&3 \\ 0&-1&3&2 \\ 1&5&4&-1 \end{bmatrix} \qquad\mbox{and} \qquad {\bf B} = \begin{bmatrix} -2&4&7&3 \\ 1&2&3&4 \\ 3&2&1&0 \end{bmatrix} . \]
Using Mathematica, we can add and subtract these matrices:
A = {{4, 1, 4, 3}, {0, -1, 3, 2}, {1, 5, 4, -1}}
B = {{-2, 4, 7, 3}, {1, 2, 3, 4}, {3, 2, 1, 0}}
A + B
{{2, 5, 11, 6}, {1, 1, 6, 6}, {4, 7, 5, -1}}
A - B
{{6, -3, -3, 0}, {-1, -3, 0, -2}, {-2, 3, 3, -1}}
Finally, we multiply by a scalar:
1.5*A
{{6., 1.5, 6., 4.5}, {0., -1.5, 4.5, 3.}, {1.5, 7.5, 6., -1.5}}


Gotthold Eisenstein

Now we are going to introduce matrix multiplication, which may at first seem rather strange. We don't know exactly by whom or when the multiplication of matrices was invented. At least we know that the 1812 work of Jacques Philippe Marie Binet (1786--1856) contains the definition of the product of matrices. Jacques Binet was a French mathematician, physicist, and astronomer who made significant contributions to number theory and the mathematical foundations of matrix algebra, which would later lead to important generalizations by Cayley, Sylvester, and others. In his memoir on the theory of the conjugate axis and of the moment of inertia of bodies, he enunciated the principle now known as Binet's theorem.

When two linear maps are represented by matrices, the matrix product represents the composition of the two maps. This concept is due to the German mathematician Gotthold Eisenstein (1823--1852), a student of Carl Gauss. Although Gotthold was born into a Jewish family, the family converted from Judaism to Protestantism. His father served in the Prussian army for eight years. Gotthold's mathematical talents were recognized early; he also showed a considerable talent for music from a young age, and he played the piano and composed music throughout his life. Eisenstein introduced the notation \( {\bf A} \times {\bf B} \) for matrix multiplication around 1844. The idea was then expanded and formalized by Cayley in his Memoir on the Theory of Matrices, published in 1858. Gotthold, who survived childhood meningitis, suffered from bad health his entire life.

Let \( {\bf A} = \left[ a_{ij} \right] \) be an m×n matrix, and let \( {\bf B} = \left[ b_{ij} \right] \) be an n×s matrix. The matrix product A B is the m×s matrix \( {\bf C} = \left[ c_{ij} \right] , \) where cij is the dot product of the i-th row vector of A and the j-th column vector of B:
\begin{equation} \label{EqMatrix.1} c_{ij} = \sum_{k=1}^n a_{ik} b_{kj} . \end{equation}
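To make formula \eqref{EqMatrix.1} concrete, here is a small sketch that implements the entry-by-entry definition (the helper name matmul is ours, for illustration only); it agrees with Mathematica's built-in Dot:
matmul[a_, b_] := Table[
  Sum[a[[i, k]] b[[k, j]], {k, Dimensions[a][[2]]}],  (* dot product of row i of a and column j of b *)
  {i, Dimensions[a][[1]]}, {j, Dimensions[b][[2]]}]
matmul[{{1, 2}, {3, 4}}, {{5}, {6}}] == {{1, 2}, {3, 4}}.{{5}, {6}}
True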

In order to multiply A by B from the right, the number of columns of A must be equal to the number of rows of B. If we write \( {\bf B} = \left[ {\bf b}_{1} \ {\bf b}_2 \ \cdots \ {\bf b}_s \right] \) as a \( 1 \times s \) array of its columns, then matrix multiplication says that

\begin{equation} \label{EqMatrix.2} {\bf A}\,{\bf B} = \left[ {\bf A}\,{\bf b}_1 \ {\bf A}\,{\bf b}_2 \ \cdots \ {\bf A}\,{\bf b}_s \right] . \end{equation}
In particular, if B is a column vector x of size n:
\[ {\bf A} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \qquad\mbox{and} \qquad {\bf x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} , \]
then we have two ways to represent the product Ax: by rows or by columns. Multiplication by rows, where each entry is the dot product of a row of A with x:

\[ {\bf A}\, {\bf x} = \begin{bmatrix} \left( \mbox{row 1} \right) \cdot {\bf x} \\ \left( \mbox{row 2} \right) \cdot {\bf x} \\ \vdots \\ \left( \mbox{row m} \right) \cdot {\bf x} \end{bmatrix} . \]

Multiplication by columns, which forms a linear combination of the column vectors of A:

\[ {\bf A}\, {\bf x} = x_1 \, \left( \mbox{ column 1 of } {\bf A} \right) + x_2 \, \left( \mbox{ column 2 of } {\bf A} \right) + \cdots + x_n \, \left( \mbox{ column n of } {\bf A} \right) . \]
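Both pictures are easy to verify in Mathematica; here is a small sketch with an arbitrarily chosen matrix and vector:
A = {{4, 1, 4, 3}, {0, -1, 3, 2}, {1, 5, 4, -1}};
x = {2, -1, 0, 3};
A.x
{16, 7, -6}
Table[A[[i]].x, {i, 3}]          (* multiplication by rows *)
{16, 7, -6}
Sum[x[[j]]*A[[All, j]], {j, 4}]  (* multiplication by columns *)
{16, 7, -6}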

For handy reference, we list the basic properties of matrix operations. These properties are valid for all vectors, scalars, and matrices for which the indicated quantities are defined. It should be noted that matrix multiplication is not necessarily commutative, as the sketch after the list demonstrates.

  1. A + B = B + A;         Commutative law of addition
  2. (A + B) + C = A + (B + C);         Associative law of addition
  3. A + 0 = 0 + A = A;         Identity for addition
  4. r (A + B) = r A + r B;         A left distributive law
  5. (r + s) A = r A + s A;         A right distributive law
  6. (rs) A = r (sA);         Associative law of scalar multiplication
  7. (r A) B = A (r B) = r (AB);         Scalars pull through
  8. A (BC) = (AB) C;         Associative law of matrix multiplication
  9. I A = A and B I = B;         Identity for matrix multiplication
  10. A (B + C)= AB + AC;         A left distributive law
  11. (A + B) C = AC + BC;         A right distributive law
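For instance, associativity holds while commutativity fails already for 2×2 matrices; a minimal sketch:
m1 = {{1, 2}, {3, 4}};
m2 = {{0, 1}, {1, 0}};
m1.m2
{{2, 1}, {4, 3}}
m2.m1
{{3, 4}, {1, 2}}
m1.(m2.m1) == (m1.m2).m1
True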

Example 4: Let us consider two 3×4 matrices from the previous example:

\[ {\bf A} = \begin{bmatrix} 4&1&4&3 \\ 0&-1&3&2 \\ 1&5&4&-1 \end{bmatrix} \qquad\mbox{and} \qquad {\bf B} = \begin{bmatrix} -2&4&7&3 \\ 1&2&3&4 \\ 3&2&1&0 \end{bmatrix} . \]
These matrices can be added/subtracted, but not multiplied because they do not have matching dimensions. However, if we consider the transpose 4×3 matrix
\[ {\bf B}^{\mathrm T} = \begin{bmatrix} -2&1&3 \\ 4&2&2 \\ 7&3&1 \\ 3&4&0 \end{bmatrix} , \]
we will be able to multiply it from the left and right by matrix A:
\[ {\bf A}\, {\bf B}^{\mathrm T} = \begin{bmatrix} 33&30&18 \\ 23&15&1 \\ 43&19&17 \end{bmatrix} \qquad\mbox{and}\qquad {\bf B}^{\mathrm T}{\bf A} = \begin{bmatrix} -5&12&7&-7 \\ 18&12&30&14 \\ 29&9&41&26 \\ 12&-1&24&17 \end{bmatrix} \]
So we see that \( {\bf A}\, {\bf B}^{\mathrm T} \) is a 3×3 matrix, while \( {\bf B}^{\mathrm T} {\bf A} \) is a 4×4 matrix. Mathematica confirms:
A.Transpose[B]
{{33, 30, 18}, {23, 15, 1}, {43, 19, 17}}
Transpose[B].A
{{-5, 12, 7, -7}, {18, 12, 30, 14}, {29, 9, 41, 26}, {12, -1, 24, 17}}
We can also represent BT as an array of column vectors:
\[ {\bf B}^{\mathrm T} = \begin{bmatrix} -2&1&3 \\ 4&2&2 \\ 7&3&1 \\ 3&4&0 \end{bmatrix} = \left[ \begin{bmatrix} -2 \\ 4 \\ 7 \\ 3 \end{bmatrix} , \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix} , \begin{bmatrix} 3 \\ 2 \\ 1 \\ 0 \end{bmatrix} \right] \]
Then:
\[ {\bf A}\, {\bf B}^{\mathrm T} = \left[ {\bf A} \begin{bmatrix} -2 \\ 4 \\ 7 \\ 3 \end{bmatrix} , {\bf A} \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix} , {\bf A} \begin{bmatrix} 3 \\ 2 \\ 1 \\ 0 \end{bmatrix} \right] = \begin{bmatrix} 33&30&18 \\ 23&15&1 \\ 43&19&17 \end{bmatrix} \]
because:
\[ {\bf A}\begin{bmatrix} -2 \\ 4 \\ 7 \\ 3 \end{bmatrix} = \begin{bmatrix} 33 \\ 23 \\ 43 \end{bmatrix} , \quad {\bf A}\begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix} = \begin{bmatrix} 30 \\ 15 \\ 19 \end{bmatrix} , \quad {\bf A}\begin{bmatrix} 3 \\ 2 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 18 \\ 1 \\ 17 \end{bmatrix} . \]


The set ℳm,n of all m × n matrices over the field of either real or complex numbers is a vector space of dimension m · n. In order to determine how close two matrices are, or to define the convergence of sequences of matrices, a special concept of matrix norm is employed, with notation \( \| {\bf A} \| . \) A norm is a function from a real or complex vector space to the nonnegative real numbers that satisfies the following conditions:

  • Positivity:     ‖A‖ ≥ 0,     ‖A‖ = 0 iff A = 0.
  • Homogeneity:     ‖kA‖ = |k| ‖A‖ for arbitrary scalar k.
  • Triangle inequality:     ‖A + B‖ ≤ ‖A‖ + ‖B‖.
The norm of a matrix may be thought of as its magnitude or length because it is a nonnegative number. Three kinds of matrix norms are in common use:
  • The operator norms are norms induced by a matrix considered as a linear operator from ℝn into ℝm for real scalars, or from ℂn into ℂm for complex scalars.
  • The entrywise norms treat an m-by-n matrix as a vector of length m · n. Therefore, these norms are directly related to norms in a vector space.
  • The Schatten norms are based on the singular values σi or eigenvalues of the given matrix.
The norm notation ‖ · ‖ is heavily overloaded, obliging readers to disambiguate norms by paying close attention to the linguistic type and context of each norm's argument. Therefore, the main norm notation ‖ · ‖ is decorated with various indices, depending on the author's preference. Moreover, the reader should be aware that different kinds of norms may lead to the same definition (for example, the Euclidean norm is actually the spectral norm in Schatten's sense).
For a rectangular m-by-n matrix A and given norms \( \| \ \| \) in \( \mathbb{R}^n \mbox{ and } \mathbb{R}^m , \) the norm of A is defined as follows:
\begin{equation} \label{EqMatrix.3} \| {\bf A} \| = \sup_{{\bf x} \ne {\bf 0}} \ \dfrac{\| {\bf A}\,{\bf x} \|_m}{\| {\bf x} \|_n} = \sup_{\| {\bf x} \| = 1} \ \| {\bf A}\,{\bf x} \| . \end{equation}
This matrix norm is called the operator norm or induced norm.

The term "induced" refers to the fact that the definition of a norm for vectors such as A x and x is what enables the definition above of a matrix norm. This definition of matrix norm is not computationally friendly, so we use other options. The following popular norms are listed below.

For a rectangular m×n matrix A, the following operator norms are commonly used.
  • ‖A‖₁ is the maximum absolute column sum of the matrix:
    \begin{equation} \label{EqMatrix.4} \| {\bf A} \|_1 = \max_{1 \le j \le n} \ \sum_{i=1}^m |a_{ij}| . \end{equation}
  • ‖A‖₂ is the Euclidean norm, the greatest singular value of A, which is the square root of the greatest eigenvalue of \( {\bf A}^{\ast} {\bf A} : \)
    \begin{equation} \label{EqMatrix.5} \| {\bf A} \|_2 = \sigma_{\max} \left( {\bf A} \right) = \sqrt{\lambda_{\max} \left( {\bf A}^{\ast} {\bf A} \right)} , \end{equation}
    where \( \lambda_{\max} \left( {\bf A}^{\ast} {\bf A} \right) \) is the maximal eigenvalue of \( {\bf A}^{\ast} {\bf A} \) (its spectral radius), and \( \sigma_{\max} \left( {\bf A} \right) \) is the maximal singular value of A.
  • ‖A‖∞ is the maximum absolute row sum:
    \begin{equation} \label{EqMatrix.6} \| {\bf A} \|_{\infty} = \| {\bf A}^{\ast} \|_{1} = \max_{1 \le i \le m} \, \sum_{j=1}^n |a_{ij} | . \end{equation}
Mathematica has dedicated commands for evaluating the operator norms:
  • Norm[A, 1] for evaluating ‖ · ‖₁;
  • Norm[A] = Norm[A, 2] for evaluating the Euclidean norm;
  • Norm[A, Infinity] for evaluating ‖ · ‖∞.
Note that to a rectangular m×n matrix A corresponds a self-adjoint square (m+n)×(m+n) matrix
\[ {\bf B} = \begin{bmatrix} {\bf 0} & {\bf A}^{\ast} \\ {\bf A} & {\bf 0} \end{bmatrix} . \]
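Such a block matrix is easy to assemble with ArrayFlatten (a sketch using the 3×4 matrix A from the examples above; the scalar 0 stands for a zero block of matching size):
A = {{4, 1, 4, 3}, {0, -1, 3, 2}, {1, 5, 4, -1}};
B = ArrayFlatten[{{0, ConjugateTranspose[A]}, {A, 0}}];
Dimensions[B]
{7, 7}
B == ConjugateTranspose[B]
True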

Theorem 1: For arbitrary square n × n matrices A and B, any induced norm ‖ · ‖, a vector x, and a scalar k, we have

\[ \begin{split} \| {\bf A}\,{\bf x} \| &\le \| {\bf A} \| \,\| {\bf x} \| , \\ \| {\bf A}\,{\bf B} \| &\le \| {\bf A} \|\,\|{\bf B} \| , \\ \| {\bf A} + {\bf B} \| &\le \| {\bf A}\| + \| {\bf B} \| , \\ \left\vert \| {\bf A} \| - \| {\bf B}\| \right\vert &\le \| {\bf A} - {\bf B} \| , \\ \| k\,{\bf A} \| &= |k|\, \| {\bf A} \| . \end{split} \]
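These inequalities are easy to spot-check numerically; a sketch with random matrices and the spectral norm Norm[·] (each test must come out True no matter which matrices are drawn):
{A, B} = Table[RandomReal[{-1, 1}, {3, 3}], {2}];
Norm[A.B] <= Norm[A] Norm[B]
True
Norm[A + B] <= Norm[A] + Norm[B]
True
Abs[Norm[A] - Norm[B]] <= Norm[A - B]
True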

The induced matrix norms constitute a large and important part of possible matrix norms, but many non-induced norms are known as well. The following very important non-induced norm is named after Ferdinand Georg Frobenius (1849--1917).

The Frobenius norm \( \| \cdot \|_F : \mathbb{C}^{m\times n} \to \mathbb{R}_{+} \) is defined for a rectangular m-by-n matrix A by
\begin{equation} \label{EqMatrix.7} \| {\bf A} \|_F = \left( \sum_{i=1}^m \,\sum_{j=1}^n |a_{ij} |^2 \right)^{1/2} = \left( \mbox{tr}\, {\bf A} \,{\bf A}^{\ast} \right)^{1/2} = \left( \mbox{tr}\, {\bf A}^{\ast} {\bf A} \right)^{1/2} , \end{equation}
where A* is the adjoint matrix to A. Recall that the trace function returns the sum of diagonal entries of a square matrix.
Mathematica has a dedicated command for evaluating the Frobenius norm:
Norm[A, "Frobenius"]

One can think of the Frobenius norm as taking the columns of the matrix, stacking them on top of each other to create a vector of length m · n, and then taking the vector 2-norm of the result. The Frobenius norm is unitarily invariant, i.e., it is conserved under a unitary transformation (such as a rotation). For a norm to be unitarily invariant, it should depend solely upon the singular values of the matrix. So if B = R*A R with a unitary (orthogonal if real) matrix R satisfying R* = R-1, then

\[ \| {\bf B} \|_F^2 = \mbox{tr} \left( {\bf B}^{\ast} {\bf B} \right) = \mbox{tr} \left[ \left( {\bf R}^{\ast} {\bf A\,R} \right)^{\ast} \left( {\bf R}^{\ast} {\bf A\,R} \right) \right] = \mbox{tr} \left( {\bf R}^{\ast} {\bf A}^{\ast} {\bf A}\,{\bf R} \right) = \mbox{tr} \left( {\bf A}^{\ast} {\bf A} \right) = \| {\bf A} \|_F^2 , \]
where the last step uses the cyclic property of the trace.
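A numerical sketch of this invariance, using a plane rotation as R:
m = {{1., 2.}, {3., 4.}};
r = RotationMatrix[Pi/5];   (* an orthogonal 2×2 matrix *)
Norm[m, "Frobenius"]
5.47723
Norm[Transpose[r].m.r, "Frobenius"]
5.47723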

There is also another function that provides an infimum of all such norms of a square matrix: \( \rho ({\bf A}) \le \|{\bf A}\| . \)

The spectral radius of a square matrix A is
\begin{equation} \label{EqMatrix.8} \rho ({\bf A}) = \lim_{k\to \infty} \| {\bf A}^k \|^{1/k} = \max \left\{ |\lambda | : \ \lambda \mbox{ is an eigenvalue of }\ {\bf A} \right\} . \end{equation}
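For example, for a triangular matrix the spectral radius is immediate from the diagonal eigenvalues, and the bound ρ(A) ≤ ‖A‖ is easy to observe (a sketch):
m = {{2., 1.}, {0., 3.}};
rho = Max[Abs[Eigenvalues[m]]]
3.
Norm[m, 1] >= rho
True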
Besides the Frobenius norm, other non-induced norms are known that treat an m-by-n matrix as a vector of length m · n. For example, the following "entrywise" norms are also widely used. Note the unfortunate but unavoidable overloading of the notation.

  • \( \| {\bf A} \|_1 \) is the absolute sum of all elements of A:
    \begin{equation} \label{EqMatrix.7a} \| {\bf A} \|_1 = \sum_{i=1}^m \,\sum_{j=1}^n |a_{ij} | . \end{equation}
  • \( \| {\bf A} \|_{\max} \) is the maximum norm, the maximum absolute value among all mn elements of A:
    \begin{equation} \label{EqMatrix.9a} \| {\bf A} \|_{\max} = \max_{i,j} \,|a_{ij} | . \end{equation}
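Both entrywise norms are one-liners in Mathematica (a sketch; Total[..., 2] sums over both levels of the matrix):
A = {{4, 1, 4, 3}, {0, -1, 3, 2}, {1, 5, 4, -1}};
Total[Abs[A], 2]
29
Max[Abs[A]]
5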

Robert Schatten (1911--1977) suggested defining matrix norms based on the singular values σi or eigenvalues; these norms are therefore named after him. We present three of them, with notation overlapping that used previously, and illustrate them in the sketch after this list. We denote by r the rank of the rectangular m×n matrix A.

  • \( \| {\bf A} \|_1 \) is the trace norm:
    \[ \| {\bf A} \|_1 = \sum_{i=1}^r \sigma_i = \sum_{i=1}^r \sqrt{\lambda_i} \left[ = \mbox{trace} \left( \sqrt{{\bf A}^{\ast} {\bf A}} \right) \right] . \]
    The bracketed formula should be used with care: a square root of a general matrix may not exist and, when one exists, it need not be unique; however, for the positive semidefinite matrix A*A the square root is well defined.
  • \( \| {\bf A} \|_2 \) is the Frobenius norm:
    \[ \| {\bf A} \|_2 = \sqrt{ \sum_{i=1}^r \sigma_i^2 } = \sqrt{ \sum_{i=1}^r \lambda_i } = \| {\bf A} \|_F . \]
  • \( \| {\bf A} \|_{\infty} \) is the spectral norm (the square root of the spectral radius of A*A):
    \[ \| {\bf A} \|_{\infty} = \max \left\{ \sigma_1 , \ldots , \sigma_r \right\} = \sigma_{\max} \left( {\bf A} \right) = \sqrt{\lambda_{\max} \left( {\bf A}^{\ast} {\bf A} \right)} . \]
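All three Schatten norms follow from SingularValueList; here is a small sketch with a hypothetical helper schattenNorm (our name, not a built-in):
schattenNorm[a_, p_] := Total[SingularValueList[a]^p]^(1/p)
A = N[{{4, 1, 4, 3}, {0, -1, 3, 2}, {1, 5, 4, -1}}];
schattenNorm[A, 1]   (* trace norm *)
15.5561
schattenNorm[A, 2]   (* Frobenius norm *)
9.94987
Max[SingularValueList[A]]   (* spectral norm, the p → ∞ limit *)
8.30532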

Theorem 2: For m×n matrix A of rank r, the following inequalities hold

  • \[ \| {\bf A} \|_2^2 \le \| {\bf A} \|_1 \| {\bf A} \|_{\infty} . \]
  • \[ \| {\bf A} \|_2 \le \| {\bf A} \|_F \le \sqrt{r} \, \| {\bf A} \|_{2} . \]
  • \[ \| {\bf A} \|_{\max} \le \| {\bf A} \|_2 \le \sqrt{mn} \, \| {\bf A} \|_{\max} .\]
  • \[ \frac{1}{\sqrt{n}} \,\| {\bf A} \|_{\infty} \le \| {\bf A} \|_2 \le \sqrt{m} \, \| {\bf A} \|_{\infty} . \]
  • \[ \frac{1}{\sqrt{m}} \,\| {\bf A} \|_{1} \le \| {\bf A} \|_2 \le \sqrt{n} \, \| {\bf A} \|_{1} . \]


Example 5: Consider the matrix from the previous example:
\[ {\bf A} = \begin{bmatrix} 4&\phantom{-}1&4&\phantom{-}3 \\ 0&-1&3&\phantom{-}2 \\ 1&\phantom{-}5&4&-1 \end{bmatrix} . \]
Summing down the columns of A, we find that
\begin{align*} \| {\bf A} \|_{1} &= \max_{1 \le j \le 4}\, \sum_{i=1}^3 |a_{ij}| \\ &= \max \left\{ 4+0+1, \ 1+1+5 , \ 4+3+4 , \ 3+2+1 \right\} \\ &= \max \left\{ 5, 7, 11, 6 \right\} = 11 . \end{align*}
This answer is checked with Mathematica:
A = {{4, 1, 4, 3}, {0, -1, 3, 2}, {1, 5, 4, -1}};
Norm[A, 1]
11
Now we find its infinity-norm. Summing along the rows of A, we find that
\begin{align*} \| {\bf A} \|_{\infty} &= \max_{1 \le i \le 3}\, \sum_{j=1}^4 |a_{ij}| \\ &= \max \left\{ 4+1+4+3, \ 0+1+3+2, \ 1+5+4+1 \right\} = \max \left\{ 12, 6, 11 \right\} \\ &= 12 . \end{align*}
Norm[A, Infinity]
12
The same answer is obtained when the transposed matrix is used, since \( \| {\bf A} \|_{\infty} = \| {\bf A}^{\mathrm T} \|_1 : \)
A = {{4, 1, 4, 3}, {0, -1, 3, 2}, {1, 5, 4, -1}};
Norm[Transpose[A], 1]
12
For the Euclidean norm, we need to calculate the eigenvalues of the products A*A and AA*:
A4 = Transpose[A].A
A3 = A.Transpose[A]
\[ {\bf A}_4 = {\bf A}^{\ast} {\bf A} = \begin{bmatrix} 17&9&20&11 \\ 9&27&21&-4 \\ 20&21&41&14 \\ 11&-4&14&14 \end{bmatrix} , \qquad {\bf A}_3 = {\bf A}\,{\bf A}^{\ast} = \begin{bmatrix} 42&17 & 22 \\ 17&14& 5 \\ 22& 5 & 43 \end{bmatrix} . \]
Both matrices A₃ and A₄ have the same trace (sum of diagonal entries) 99, and the same largest eigenvalue:
Max[N[Eigenvalues[A4]]]
Max[N[Eigenvalues[A3]]]
68.9783
Taking the square root, we obtain the Euclidean norm:
\[ \| {\bf A} \|_2 = \sigma_{\max} \left( {\bf A} \right) = \sqrt{\lambda_{\max} \left( {\bf A}^{\ast} {\bf A} \right)} \approx 8.30532 . \]
euclid = N[Sqrt[Max[Eigenvalues[A3]]]]
8.30532017177966
We double check our answer with Mathematica by evaluating the maximum singular value first,
N[Max[SingularValueList[A]]]
8.30532
and then by finding the norm of matrix A:
N[Norm[A]]
8.30532

 

Now we turn to non-induced norms. To determine the Frobenius norm, we use its definition.
square = 0;
For[i = 1, i <= 3, i++, For[j = 1, j <= 4, j++, square = square + A[[i, j]]^2]]
Then we extract the square root:
Sqrt[square]
3 Sqrt[11]
N[%]
9.9498743710662
To calculate the Frobenius norm, we again ask Mathematica.
Norm[A, "Frobenius"]
3 Sqrt[11]
N[%]
9.94987

To determine the "entrywise" 1-norm, we add absolute values of all matrix entries:

\[ \| {\bf A} \|_1 = \sum_{i,j} \left\vert a_{i,j} \right\vert = 4+1+4+3 +0+1+3+2+1+5+4+1 = 12+6+11 = 29 . \]
Sum[ Sum[Abs[A[[i, j]]], {j, 1, 4}], {i, 1, 3}]
29
Its maximum norm is
\[ \| {\bf A} \|_{\max} = \max_{i,j} \left\vert a_{i,j} \right\vert = 5 . \]
We find the spectral norm with the aid of Mathematica:
N[SingularValueList[A]]
{8.30532, 4.99188, 2.25894}
Therefore, the spectral norm is 8.30532. To check the answer, we calculate the self-adjoint matrix
\[ {\bf A}^{\ast} {\bf A} = \begin{bmatrix} 17&9&20&11 \\ 9&27&21&-4 \\ 20&21&41&14 \\ 11&-4&14&14 \end{bmatrix} = {\bf A}_4 . \]
Its eigenvalues are obtained with Mathematica:
A4 = Transpose[A].A
{{17, 9, 20, 11}, {9, 27, 21, -4}, {20, 21, 41, 14}, {11, -4, 14, 14}}
N[Eigenvalues[A4]]
{68.9783, 24.9189, 5.1028, 0.}
Sqrt[%]
{8.30532, 4.99188, 2.25894, 0.}

 

Now we calculate the Schatten norms. We start with the first one
\[ \| {\bf A} \|_1 = \sum_{i=1}^r \sigma_i = \sum_{i=1}^r \sqrt{\lambda_i} \left[ = \mbox{trace} \left( \sqrt{{\bf A}^{\ast} {\bf A}} \right) \right] \approx 15.5561 . \]
We verify this value with Mathematica:
N[Sum[SingularValueList[A][[i]], {i, 1, 3}]]
15.556136510657433
Again, we double check this answer with two approaches. First, we calculate the sum of the square roots of the eigenvalues of matrix A₄:
N[Sum[Sqrt[Eigenvalues[A4][[i]]], {i, 1, 4}]]
15.556136510657433
and then repeat with matrix A₃:
N[Sum[Sqrt[Eigenvalues[A3][[i]]], {i, 1, 3}]]
15.556136510657433
Since the sum of the eigenvalues of a square matrix equals its trace, we first find a square root of the matrix A₃; to determine it, we use Sylvester's method:
\[ \sqrt{{\bf A}_3} = \sqrt{{\bf A}\,{\bf A}^{\ast}} = \sqrt{\lambda_1}\,\frac{\left( {\bf A}_3 - \lambda_2 {\bf I} \right) \left( {\bf A}_3 - \lambda_3 {\bf I} \right)}{\left( \lambda_1 - \lambda_2 \right)\left( \lambda_1 - \lambda_3 \right)} + \sqrt{\lambda_2}\,\frac{\left( {\bf A}_3 - \lambda_1 {\bf I} \right) \left( {\bf A}_3 - \lambda_3 {\bf I} \right)}{\left( \lambda_2 - \lambda_1 \right)\left( \lambda_2 - \lambda_3 \right)} + \sqrt{\lambda_3}\,\frac{\left( {\bf A}_3 - \lambda_1 {\bf I} \right) \left( {\bf A}_3 - \lambda_2 {\bf I} \right)}{\left( \lambda_3 - \lambda_1 \right)\left( \lambda_3 - \lambda_2 \right)} . \]
Then find its eigenvalues and add them. You may want to verify that the sum of the square roots of the eigenvalues of the 3×3 matrix A₃ is exactly the same as the corresponding sum for the 4×4 matrix A₄ (the extra eigenvalue of A₄ is zero).
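Alternatively, the built-in MatrixPower can produce a matrix square root directly, so the trace norm can be obtained without Sylvester's formula (a numerical sketch):
A = {{4, 1, 4, 3}, {0, -1, 3, 2}, {1, 5, 4, -1}};
A3 = A.Transpose[A];
N[Tr[MatrixPower[A3, 1/2]]]
15.5561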

The square of the Schatten 2-norm is the square of the Frobenius norm

\[ \| {\bf A} \|_2^2 = \sum_{i=1}^3 \sigma_i^2 \left( {\bf A} \right) = \sum_{i=1}^3 \lambda_i \left( {\bf A}\,{\bf A}^{\ast} \right) = 99 \]
because
N[Sum[Eigenvalues[A3][[i]], {i, 1, 3}]]
99
Then taking a square root, we obtain ‖A‖₂ to be
\[ \| {\bf A} \|_2 = \sqrt{99} = 3\sqrt{11} = \| {\bf A} \|_F , \]
which is the Frobenius norm. The last Schatten norm to consider is the infinity-norm:
\[ \| {\bf A} \|_{\infty} = \max \left\{ \sigma_1, \sigma_2 , \sigma_3 \right\} \approx 8.30532 , \]
N[SingularValueList[A][[1]]]
8.30532017177966
which is the Euclidean norm.

 

