
To enhance pedagogical effectiveness, the treatment of the dot product is presented in several distinct sections:

Metric

Geometrical interpretation

Duality

Orthogonality

Projection

Solvability

We denote by 𝔽 one of the following fields: ℚ, the set of rational numbers; ℝ, the set of real numbers; or ℂ, the set of complex numbers. (Vectors with integer entries from ℤ, the set of integers, also appear in examples, although ℤ is a ring rather than a field.) However, in this section we mostly use only one of them, the set of real numbers, because the definition of length (or norm) involves only ℝ; its extension to ℂ is discussed later, in the inner product section.

This section is devoted to one of the most important operations in all of linear algebra: the dot product. Many operations and algorithms involve the dot product, including convolution, correlation, matrix multiplication, duality, the Fourier transform, signal filtering, and many others.

Dot Product

In previous sections we have met, many times, a special linear combination of numerical vectors. For instance, a linear equation in n unknowns
\begin{equation} \label{EqDot.1} a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = b , \end{equation}
which we prefer to write in the succinct form a • x = b, where a = (𝑎₁, 𝑎₂, … , 𝑎ₙ) and x = (x₁, x₂, … , xₙ) are numerical vectors from 𝔽ⁿ and b ∈ 𝔽 is a scalar. Another widely used application of this peculiar linear combination is observed in the multiplication of matrices.
The dot product of two lists (or arrays) of the same size \( {\bf x} = \left[ x_1 , x_2 , \ldots , x_n \right] \) and \( {\bf y} = \left[ y_1 , y_2 , \ldots , y_n \right] \) is the following expression, denoted by x • y, \begin{equation} \label{EqDot.2} {\bf x} \bullet {\bf y} = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n , \end{equation} provided that all multiplications in \eqref{EqDot.2} make sense and their sum is defined. Expression \eqref{EqDot.2} is naturally called the scalar product when it refers to numerical vectors with entries from the field of scalars 𝔽.
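Definition \eqref{EqDot.2} is easy to turn into code. Here is a minimal sketch in Python, running alongside the Mathematica sessions below; the helper name `dot` is our own choice:

```python
def dot(x, y):
    """Dot product x . y = x1*y1 + x2*y2 + ... + xn*yn of two same-size lists."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same size")
    return sum(xi * yi for xi, yi in zip(x, y))

print(dot([1, 2, 3], [3, 2, 1]))   # 1*3 + 2*2 + 3*1 = 10
```

This mirrors what `a . b` or `Dot[a, b]` does in Mathematica for plain n-tuples.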

Remark 1:    Although textbooks on linear algebra define the dot product for vectors from the same vector space (mostly because this leads to a fruitful theory and geometric applications), our definition extends the dot product to vectors from different vector spaces, as long as they have the same dimension and the same field 𝔽 of scalars. The importance of this definition stems from practical applications; for instance, in calculus you learn that the line integral involves the dot product of a vector field F with the infinitesimal displacement dr:

\[ \int_C \mathbf{F} \bullet {\text d}\mathbf{r} \]
for some path C. Another famous example is the definition of the Laplace operator as the dot product of two gradient operators:
\[ \Delta = \nabla \bullet \nabla = \left( \frac{\partial}{\partial x_1} , \frac{\partial}{\partial x_2} , \ldots , \frac{\partial}{\partial x_n} \right) \bullet \left( \frac{\partial}{\partial x_1} , \frac{\partial}{\partial x_2} , \ldots , \frac{\partial}{\partial x_n} \right) \]
   
Example 1: Suppose we have a vector a = (3,2,1) and a list of three matrices b = (A₁, A₂, A₃), where A₁, A₂, and A₃ are some matrices of the same dimensions. Then their dot product is the matrix \[ \mathbf{a} \bullet \mathbf{b} = 3\,\mathbf{A}_1 + 2\,\mathbf{A}_2 + \mathbf{A}_3 . \] As seen from the relation above, this expression leads to a dot product with values in the set of matrices.

Let us make a numerical experiment by choosing the following matrices: \[ \mathbf{A}_1 = \begin{bmatrix} 1& 2.1 \\ -3& 2.2 \\ -3& -1.5 \end{bmatrix}, \quad \mathbf{A}_2 = \begin{bmatrix} -4& 1.3 \\ 1& 2.6 \\ 5& -3.1 \end{bmatrix}, \quad \mathbf{A}_3 = \begin{bmatrix} 2& 1.7 \\ 2& 6.2 \\ 8& 3.9 \end{bmatrix} . \] Then their dot product will be \[ \mathbf{a} \bullet \mathbf{b} = \begin{bmatrix} -3.& 10.6 \\ -5.& 18. \\ 9.& -6.8 \end{bmatrix} . \]

A1 = {{1, 2.1}, {-3, 2.2}, {-3, -1.5}}; A2 = {{-4, 1.3}, {1, 2.6}, {5, -3.1}}; A3 = {{2, 1.7}, {2, 6.2}, {8, 3.9}}; a = {3, 2, 1}; b = {A1, A2, A3}; a . b
{{-3., 10.6}, {-5., 18.}, {9., -6.8}}
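The same experiment can be reproduced outside Mathematica. Here is a Python sketch under the assumption that the matrices are stored as plain nested lists; the rounding only tidies floating-point noise:

```python
A1 = [[1, 2.1], [-3, 2.2], [-3, -1.5]]
A2 = [[-4, 1.3], [1, 2.6], [5, -3.1]]
A3 = [[2, 1.7], [2, 6.2], [8, 3.9]]
a = [3, 2, 1]          # coefficient vector
b = [A1, A2, A3]       # list of matrices

# a . b = 3*A1 + 2*A2 + 1*A3, formed entry by entry
result = [[round(sum(c * M[i][j] for c, M in zip(a, b)), 6)
           for j in range(len(A1[0]))] for i in range(len(A1))]
print(result)   # [[-3.0, 10.6], [-5.0, 18.0], [9.0, -6.8]]
```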
   ■
End of Example 1

Remark 2:    In applications, numerical vectors are usually associated with measurements and so inherit units. For instance, the integer 5 may stand for 5 million dollars in a Wall Street office, a bank clerk may see it as a 5-dollar bill, a mechanical engineer may read it as 5 centimeters, and computer science folks may treat it as 5 GB. Only mathematicians see in 5 an integer, a number without any unit. Therefore, vectors and scalars in linear algebra are not tied to any specific units of measurement. Now we can all appreciate the beauty of mathematical language when we enter our particular information into a computer: this device recognizes only electric pulses, on or off, with no room for any unit. When Joseph Fourier (1768--1830) introduced the Fourier transform in 1822,

\[ \hat{f}(\xi ) = \int_{\mathbb{R}^n} f(x)\, e^{-{\bf j} 2\pi\,{\bf x}\bullet \xi} {\text d}^n \mathbf{x} , \qquad {\bf j}^2 = -1, \]
he used the dot product, x • ξ = (x₁, x₂, … , xₙ) • (ξ₁, ξ₂, … , ξₙ), of two n-dimensional vectors with different units (say, if x measures time, then ξ corresponds to frequency, because their dot product must be dimensionless; otherwise the exponential term makes no sense).    
Example 2: Numerical vectors (i.e., vectors from 𝔽ⁿ) used in the definition of the dot product, Eq.\eqref{EqDot.2}, may have distinct units depending on the application. We give several examples from physics.

If s represents a displacement (e.g., in SI units of meters) and f represents a force (e.g., with units of newtons), then f • s represents work (in newton-meters, or joules). Therefore, force has units of "joules per meter".

The dot product of force (F) and velocity (v in meter per second) is called instantaneous power (or simply power). It represents the rate at which work is done by a force on an object and is a scalar quantity.

Electric flux density, denoted by D and measured in coulombs per square meter (C/m²), describes how electric displacement is distributed in space. In a linear, isotropic medium it is related to the electric field by \[ \mathbf{D}=\varepsilon_0\mathbf{E}+\mathbf{P}. \] This expression separates the response of the vacuum from the response of the material. The constant ε₀ ≈ 8.854187 × 10⁻¹² F m⁻¹, the permittivity of free space, has units C/(V·m) and determines how strongly the vacuum itself “permits’’ electric field lines. The vector P, the polarization, has units C/m² and represents the dipole moment per unit volume created when the molecules of a dielectric shift slightly under the influence of an electric field. Thus, ε₀E describes the field’s effect in empty space, while P encodes the additional displacement arising from bound charges inside matter.

Electric flux is defined through the dot product of the electric field with the unit normal n to a surface, \[ {\text d}\Phi_E = \mathbf{E}\bullet \hat{\bf n}\, {\text d}A. \] The dot product plays two essential roles. Geometrically, it selects only the component of E perpendicular to the surface, ensuring that tangential field components do not contribute to flux. Dimensionally, it preserves the units of E (V/m), and multiplication by area dA produces flux with units volt·meter (V·m). This construction shows how the combination of vector projection and area naturally yields a physically meaningful scalar quantity.

A parallel structure holds for D. Its surface integral, \[ \iint_S \mathbf{D}\bullet \hat{\bf n}\, {\text d}A = Q_{\mathrm{free}}, \] produces free charge (C). Again, the dot product ensures that only the normal component of D contributes, and the units combine cleanly: C/m² multiplied by m² yields C. This result is the content of Gauss’s law in matter, \[ \nabla \bullet \mathbf{D}=\rho_{\mathrm{free}}, \] which states that the divergence of D counts only free charge, because the bound charge associated with polarization has already been absorbed into P.

In vacuum or in a uniform linear dielectric, one may combine ε₀E and P into a single term ε E, so that D = ε E. In such cases D can be eliminated without loss of generality. However, in real materials—especially those with spatially varying permittivity or nonlinear polarization—D becomes indispensable. It provides a clean separation between the intrinsic electric field E and the material’s response P, with the dot product serving as the mathematical bridge that connects vector fields, geometry, and physical units.

The same geometric and dimensional structure appears in other areas of physics. For instance, magnetic flux (ΦB) measures the total magnetic field B passing through a surface and is defined by \[ {\text d}\Phi _B = \mathbf{B}\bullet \hat{\bf n}\, {\text d}A. \] Its SI unit is the weber (Wb), equal to tesla·meter² (T·m²), since B has units of tesla and dA has units of m². In fluid mechanics, the normal momentum flux is \[ \Pi =\rho \, (\mathbf{v}\bullet \hat{\bf n})^2 , \] with units kg/(m·s²), the same as pressure. In heat transfer, the rate of heat flow through a surface element is \[ {\text d}\dot {Q} = \mathbf{q}\bullet \hat{\bf n}\, {\text d}A, \] where q is the heat‑flux vector with units W/m², so that \( \displaystyle \quad \dot {Q} \quad \) has units of watts (W).

Across electromagnetism, fluid mechanics, and heat transfer, the dot product plays the same conceptual role: it extracts the normal component of a vector field and ensures that the resulting flux carries the correct physical dimensions. This unified structure explains why flux laws across physics share a common mathematical form, even though the underlying fields—electric, magnetic, momentum, or thermal—represent very different physical phenomena.

In continuum mechanics, the stress tensor acts on a surface with unit normal n = (n₁, n₂, n₃) to produce the traction vector t, which is the force per unit area acting on that surface. \[ \mathbf{t}=\left( \begin{matrix}\sigma _{11}n_1+\sigma _{12}n_2+\sigma _{13}n_3\\ \sigma _{21}n_1+\sigma _{22}n_2+\sigma _{23}n_3\\ \sigma _{31}n_1+\sigma _{32}n_2+\sigma _{33}n_3\end{matrix}\right) . \] This is a tensor–vector dot product, producing a vector. In index notation: \[ t_i =\sigma _{ij}n_j . \] The stress tensor has units of N/m², while the normal vector is dimensionless. This tensor-vector product is one of the most important formulas in elasticity and continuum mechanics.    ■

End of Example 2
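The traction formula tᵢ = σᵢⱼ nⱼ from the example above is just three dot products, one per row of the stress tensor. A short Python sketch; the stress values below are our own illustrative choices:

```python
# Stress tensor in N/m^2 (illustrative values, symmetric as in elasticity)
sigma = [[10.0, 2.0, 0.0],
         [ 2.0, 5.0, 1.0],
         [ 0.0, 1.0, 3.0]]
n = [0.0, 0.0, 1.0]    # dimensionless unit normal (the z-direction)

# t_i = sigma_ij n_j : row i of sigma dotted with n
t = [sum(sigma[i][j] * n[j] for j in range(3)) for i in range(3)]
print(t)   # [0.0, 1.0, 3.0] -- traction (force per unit area) on the z-face
```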

Remark 3:    Recall that two vector spaces V and U are isomorphic (denoted V ≅ U) if there is a bijective linear map between them. Such a bijection (a one-to-one and onto mapping) can be constructed by choosing ordered bases α = [ a₁, a₂, … , aₙ ] and β = [ b₁, b₂, … , bₙ ] in the vector spaces V and U, respectively. Then the components of every vector with respect to a chosen ordered basis identify it uniquely with an n-tuple. Therefore, the algebraic formula \eqref{EqDot.2} is essentially applied to two isomorphic copies of the Cartesian product 𝔽ⁿ. The geometric interpretation of the dot product, which is coordinate independent and therefore conveys invariant properties of these products, is given in the Euclidean space section.

Note:    The definition of the dot product does not prevent applying it to two distinct isomorphic versions of the direct product 𝔽ⁿ ≅ 𝔽ⁿˣ¹ ≅ 𝔽¹ˣⁿ. It is the basic computational building block from which many operations and algorithms are built. So you can take the dot product of a row vector with a column vector. However, we try to avoid writing it as a matrix multiplication,

\[ \left[ x_1 , x_2 , \ldots , x_n \right] \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \left[ {\bf x} \bullet {\bf y} \right] \in \mathbb{F}^{1 \times 1} \cong \mathbb{F} , \]
because the right-hand side is a 1×1 matrix, which a computer solver always treats differently from a scalar. However, the dot product can be applied to a row vector and a column vector:
\[ \left[ x_1 , x_2 , \ldots , x_n \right] \bullet \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} \bullet \left[ x_1 , x_2 , \ldots , x_n \right] \in \mathbb{F} . \quad    ▣ \]

Mathematica is smart and, to some extent, does not distinguish rows from columns. The dot product can be computed with either of two Mathematica commands:

a = {1, 2, 3}; b = {3, 2, 1};
Dot[a, b]
a . b
10
However, when you specify one vector as a column, the order matters:
a = {{1}, {2}, {3}}; b = {3, 2, 1}; Dot[a, b]
Dot::dotsh: Tensors {{1},{2},{3}} and {3,2,1} have incompatible shapes.
but
Dot[b, a]
{10}
Josiah Gibbs
The term "dot product" was first introduced by the American physicist and mathematician Josiah Willard Gibbs (1839--1903) in the 1880s. Initially, the scalar product appeared in a pamphlet distributed to his students at Yale University. Gibbs's pamphlet was eventually incorporated into a book entitled Vector Analysis that was published in 1901 and coauthored with one of his students.

One of the main and most fruitful applications of the dot product occurs when the scalar product involves numerical vectors from 𝔽ⁿ or their isomorphic copies. Upon introducing an ordered basis α = [e₁, e₂, … , eₙ] in a finite-dimensional vector space V, every vector v = c₁e₁ + c₂e₂ + ⋯ + cₙeₙ is uniquely identified with the corresponding coordinate vector ⟦v⟧α = (c₁, c₂, … , cₙ) ∈ 𝔽ⁿ.    

Example 3: Two vectors of length two dotted together look like this: \[ \begin{pmatrix} 2 \\ -3 \end{pmatrix} \bullet \begin{pmatrix} 5 \\ 4 \end{pmatrix} = 2 \cdot 5 + (-3) \cdot 4 = -2 . \]
Dot[{2, -3}, {5, 4}]
-2

Calculate the dot product of the two three-dimensional vectors a = (3, 2, 1) and b = (4, −5, 2).

Solution: Using the component formula \eqref{EqDot.2} for the dot product of three-dimensional vectors \[ \mathbf{a} \bullet \mathbf{b} = a_1 b_1 + a_2 b_2 + a_3 b_3 , \] we calculate the dot product to be \[ \mathbf{a} \bullet \mathbf{b} = 3 \cdot 4 - 2 \cdot 5 + 1 \cdot 2 = 4. \]

a = {3, 2, 1}; b = {4, -5, 2}; a . b
Dot[a, b]
4

Mathematica offers two equivalent commands for evaluating scalar products:

a.b
Dot[a , b]
However, vectors for the dot product must be entered into a Mathematica notebook as n-tuples from 𝔽ⁿ, not in matrix form as row vectors or column vectors. So the following commands produce errors.
a = {{3, 2, 1}}; b = {{4, -5, 2}};
a.b
Dot::dotsh: Tensors {{3,2,1}} and {{4,-5,2}} have incompatible shapes.
a = {{3}, {2}, {1}}; b = {{4}, {-5}, {2}};
a.b
Dot::dotsh: Tensors {{3},{2},{1}} and {{4},{-5},{2}} have incompatible shapes.
   ■
   ■
End of Example 3

Not every curvilinear coordinate system supports the dot product, as the following example shows.    

Example 4: Let us consider the plane ℝ² equipped with polar coordinates. Every point P is uniquely identified with a polar pair (r, θ), where r is the distance from P to a reference point O (known as the pole, usually the origin) and θ is the angle formed by the line OP and the polar axis, which is usually the abscissa. For two points P₁(r₁, θ₁) and P₂(r₂, θ₂), you cannot form the dot product \[ P_1 \bullet P_2 = r_1 r_2 + \theta_1 \theta_2 \qquad \mbox{is wrong} \] because its components have different units: distances are measured in meters (in the SI system), while angles are dimensionless.

Definition of dot product in polar coordinates is presented in section "Dot product in coordinate systems" and Example 23.    ■

End of Example 4

 

Properties of dot product


The dot product is not defined for vectors of different dimensions. It does not matter whether the vectors are columns, rows, or n-tuples, so you can evaluate the dot product of a row vector with a column vector; the only requirement is that they come from vector spaces over the same field and have the same number of components. Therefore, the definition is valid not only for n-tuples (elements of 𝔽ⁿ), but also for column vectors and row vectors.

The basic properties (1--4) of the dot product are valid for vectors from the same vector space, while the last one involves compatible vector dimensions. In the properties below, u, v, and w are finite-dimensional vectors, and λ is a number (scalar):

Theorem 1: Let u, v, w be vectors of the same finite size and λ be a scalar. Then the following properties hold:
  1. u • u ≥ 0, and u • u = 0 if and only if u = 0;
  2. u • v = v • u                   (commutative law);
  3. (u + v) • w = u • w + v • w         (distributive law);
  4. (λ u) • v = λ (u • v) = u • (λ v)     (associative law);
  5. for any two column vectors u ∈ ℝⁿˣ¹, v ∈ ℝᵐˣ¹, and a matrix A ∈ ℝᵐˣⁿ, the following equation holds:
    v • (A u) = (Aᵀv) • u,     where Aᵀ (A′) is the transpose of the matrix A.
    A similar relation holds for row vectors: u • (v A) = (u Aᵀ) • v.

  1. This property follows because \[ \mathbf{u} \bullet \mathbf{u} = u_1^2 + u_2^2 + \cdots + u_n^2 \geqslant 0 \] is a sum of squares, which vanishes only when all components of the vector u are zero.
  2. Applying the definition of dot product to u · v and v · u, we obtain \begin{align*} \mathbf{u} \bullet \mathbf{v} &= u_1 v_1 + u_2 v_2 + \cdots + u_n v_n \\ \mathbf{v} \bullet \mathbf{u} &= v_1 u_1 + v_2 u_2 + \cdots + v_n u_n \end{align*} Since product of two numbers from field 𝔽 is commutative, we conclude that u · v = v · u.
  3. Since every finite dimensional vector space is isomorphic to 𝔽ⁿ, we can assume that these vectors u, v, and w belong to the direct product 𝔽ⁿ. Then \[ \mathbf{u} + \mathbf{v} = \left( u_1 , \ldots , u_n \right) + \left( v_1 , \ldots , v_n \right) = \left( u_1 + v_1 , \ldots , u_n + v_n \right) . \] Taking the dot product with w, we get \begin{align*} \left( \mathbf{u} + \mathbf{v} \right) \bullet \mathbf{w} &= \left( u_1 + v_1 , u_2 + v_2 , \ldots , u_n + v_n \right) \bullet \left( w_1 , \ldots , w_n \right) \\ &= u_1 w_1 + v_1 w_1 + \cdots + u_n w_n + v_n w_n \\ &= \mathbf{u} \bullet \mathbf{w} + \mathbf{v} \bullet \mathbf{w} . \end{align*}
  4. The left-hand side is \[ \left( \lambda\,\mathbf{u} \right) \bullet \mathbf{v} = \lambda\,u_1 v_1 + \cdots + \lambda u_n v_n = \lambda \left( u_1 v_1 + \cdots + u_n v_n \right) , \] which equals the right-hand side λ (u • v). The second identity is verified in the same way.
  5. For a matrix A = [𝑎i,j] ∈ ℝᵐˣⁿ, we have \[ \mathbf{v} \bullet \mathbf{A}\,\mathbf{u} = \sum_{i=1}^m v_i \left( \mathbf{A}\,\mathbf{u} \right)_i \] where the i-th component of A u is \[ \left( \mathbf{A}\,\mathbf{u} \right)_i = \sum_{j=1}^n a_{i,j} u_j . \] Changing the order of summation, we get \begin{align*} \mathbf{v} \bullet \mathbf{A}\,\mathbf{u} &= \sum_{i=1}^m v_i \sum_{j=1}^n a_{i,j} u_j \\ &= \sum_{j=1}^n \sum_{i=1}^m v_i a_{i,j} u_j = \sum_{j=1}^n u_j \left( \mathbf{A}^{\mathrm T} \mathbf{v} \right)_j , \end{align*} which is u • (Aᵀv).
    The distributive property says that a dot product can be broken into the sum of two dot products by representing one of the vectors as a sum of two vectors.

    Note that an associative law for the scalar product fails in general: (v • u) • w ≠ v • (u • w); see the following example.    
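All five properties of Theorem 1 can also be spot-checked numerically. Here is a small Python sketch; the sample vectors and the 2×3 matrix are our own choices for illustration:

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def matvec(A, x):                 # (A x)_i = sum_j A[i][j] * x[j]
    return [dot(row, x) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

u, v, w, lam = [1, 2, 3], [4, -6, 5], [2, 0, -1], 7
A = [[1, 2, 3], [4, 5, 6]]        # 2x3 matrix, so A u is a 2-vector
vv = [3, -2]                      # 2-vector paired with A u

assert dot(u, u) > 0                                            # property 1
assert dot(u, v) == dot(v, u)                                   # property 2
assert dot([a + b for a, b in zip(u, v)], w) == dot(u, w) + dot(v, w)  # 3
assert dot([lam * a for a in u], v) == lam * dot(u, v)          # property 4
assert dot(vv, matvec(A, u)) == dot(matvec(transpose(A), vv), u)  # property 5
print("all five properties hold on these samples")
```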

Example 5: Let’s see some quick examples to make sure that all properties are clearly true.
  1. Any nonzero vector will work; for instance, v = (3, −2, 1) ∈ ℝ³. Then \[ \mathbf{v} \bullet \mathbf{v} = 3^2 + (-2)^2 + 1^2 = 9+4+1 = 14 > 0. \]
          {3, -2, 1} . {3, -2, 1}
          14
  2. Commutativity holds because the dot product is computed element-wise, and each element-wise multiplication is simply a product of two scalars. Scalar multiplication is commutative, and therefore the dot product is commutative.

    Let \[ \mathbf{v} = \left( 1, 2, 3 \right) , \quad \mathbf{u} = \left( 4, -6, 5 \right) \in \mathbb{R}^3 . \] Then their scalar product is 7, independently of the order of multiplication, as Mathematica confirms:

          v = {1, 2, 3}; u = {4, -6, 5}; v.u
          7
          Dot[u, v]
          7
  3. Suppose we need to find the dot product v • u of two numerical vectors, one of which has large entries. For instance, \[ \mathbf{v} = \begin{pmatrix} 3791 \\ -5688 \\ 2894 \end{pmatrix} , \quad \mathbf{u} = \begin{pmatrix} 3 \\ 2 \\ 4 \end{pmatrix} . \] The scalar product of these numerical vectors involves large, unpleasant multiplications and summation. Using the distributive property, we break the vector v into a sum of four vectors: \[ \mathbf{v} = \mathbf{v}_1 + \mathbf{v}_2 + \mathbf{v}_3 + \mathbf{v}_4 , \] where \[ \mathbf{v}_1 = \begin{pmatrix} 1 \\ -8 \\ 4 \end{pmatrix} , \ \mathbf{v}_2 = \begin{pmatrix} 90 \\ -80 \\ 90 \end{pmatrix} , \ \mathbf{v}_3 = \begin{pmatrix} 700 \\ -600 \\ 800 \end{pmatrix} , \ \mathbf{v}_4 = \begin{pmatrix} 3000 \\ -5000 \\ 2000 \end{pmatrix} . \] The corresponding four scalar products are not tedious to find:
          u = {3,2,4}; v1 = {1,-8,4}; v2 = {90, -80, 90}; v3 = {700,-600,800}; v4 = {3000,-5000,2000}; d1 = u.v1
          3
          d2 = u.v2
          470
          d3 = u.v3
           4100
          d4 = u.v4
           7000
    Adding these four numbers, we get the required dot product: \begin{align*} \mathbf{v} \bullet \mathbf{u} &= \left( \mathbf{v}_1 + \mathbf{v}_2 + \mathbf{v}_3 + \mathbf{v}_4 \right) \bullet \mathbf{u} = \mathbf{v}_1 \bullet \mathbf{u} + \mathbf{v}_2 \bullet \mathbf{u} + \mathbf{v}_3 \bullet \mathbf{u} + \mathbf{v}_4 \bullet \mathbf{u} \\ &= 3+470+4100+7000 = 11573 . \end{align*}
          d1+d2+d3+d4
          11573
  4. We set λ = 3.1415926, v = (236, −718), u = (892, 435). Without computer assistance, evaluating the corresponding dot products would be time consuming. So we ask Mathematica for help and find that \[ \left( \lambda\mathbf{u} \right) \bullet \mathbf{v} = \lambda \left( \mathbf{u} \bullet \mathbf{v} \right) = \mathbf{u} \bullet \left( \lambda \mathbf{v} \right) \approx -319871. \]
          la =3.1415926; v = {236,-718}; u = {892, 435}; (la*u) . v
          -319871.
          u . (la*v)
          -319871.
          la*(v . u)
          -319871.
  5. Let us take a singular matrix and two column vectors with three components: \[ \mathbf{A} = \begin{bmatrix} 1&2&3 \\ 4&5&6 \\ 7&8&9 \end{bmatrix} , \quad \mathbf{u} = \begin{pmatrix} 35 \\ -11 \\ 17 \end{pmatrix} , \quad \mathbf{v} = \begin{pmatrix} 23 \\ 97 \\ 41 \end{pmatrix} . \]
          A = {{1, 2, 3}, {4, 5, 6}, {7,8 ,9}};
    u = {35,-11,17}; v = {23,97,41};
    We keep the vectors v and u as 3-tuples (elements of ℝ³) rather than as column vectors (elements of ℝ³ˣ¹), because Mathematica is smart enough to treat a tuple as a column vector when the context requires it.
          Dot[u, A . v]
           25049
    and
          Dot[Transpose[A] . u, v]
           25049
    Now we check the same property for row vectors:
           Dot[u . A, v]
           25049
           Dot[v . Transpose[A], u]
           25049

    Rectangular matrix:    we consider the 3-by-2 matrix \[ \mathbf{A} = \begin{bmatrix} 1&2 \\ 3&4 \\ 5&6 \end{bmatrix} \] and two column vectors \[ \mathbf{u} = \begin{pmatrix} a \\ b \end{pmatrix}, \qquad \mathbf{v} = \begin{pmatrix} c \\ d \\ e \end{pmatrix} . \] Then \[ \mathbf{A}\,\mathbf{u} = \begin{bmatrix} 1&2 \\ 3&4 \\ 5&6 \end{bmatrix}\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} 1\cdot a + 2 \cdot b \\ 3 \cdot a + 4 \cdot b \\ 5 \cdot a + 6 \cdot b \end{pmatrix} . \] Its dot product with v becomes \[ \mathbf{v} \bullet \mathbf{A}\,\mathbf{u} = c \left( a + 2\,b \right) + d \left( 3\,a + 4\, b \right) + e \left( 5\,a + 6\, b \right) . \tag{A} \] On the other hand, \[ \mathbf{A}^{\mathrm T} \mathbf{v} = \begin{bmatrix} 1 & 3 & 5 \\ 2&4&6 \end{bmatrix} \begin{pmatrix} c \\ d \\ e \end{pmatrix} = \begin{pmatrix} 1\cdot c + 3\cdot d + 5\cdot e \\ 2 \cdot c + 4 \cdot d + 6 \cdot e \end{pmatrix} . \] Then its dot product with u becomes \[ \mathbf{A}^{\mathrm T} \mathbf{v} \bullet \mathbf{u} = a \left( c + 3\, d + 5\, e \right) + b \left( 2\,c + 4\, d + 6\, e \right) . \tag{B} \] Do you need computer assistance to check that expression (A) is the same as (B)? I don't.

  6. Extra warning:    We demonstrate that the identity   v • (u • w) = (v • u) • w does not hold for arbitrary vectors v, u, and w. It holds only when v is a scalar multiple of w. Indeed, let k = u • w and c = v • u. Then \[ \mathbf{v} \bullet \left( \mathbf{u} \bullet \mathbf{w} \right) = \left( \mathbf{v} \bullet \mathbf{u} \right) \bullet \mathbf{w} \quad \iff \quad \mathbf{v}\, k = c\, \mathbf{w} . \] From the latter it follows that the vectors v and w are collinear (when the scalars k and c are nonzero).

    We choose three vectors (and write them in column form): \[ \mathbf{v} = \begin{pmatrix} 1 \\ 2 \end{pmatrix} , \quad \mathbf{u} = \begin{pmatrix} 3 \\ 4 \end{pmatrix} , \quad \mathbf{w} = \begin{pmatrix} 5 \\ 6 \end{pmatrix} . \] Then \[ \mathbf{v} \bullet \left( \mathbf{u} \bullet \mathbf{w} \right) = \begin{pmatrix} 1 \\ 2 \end{pmatrix} \cdot 39 = \begin{pmatrix} 39 \\ 78 \end{pmatrix} \]

          v = {1, 2}; u = {3, 4}; w = {5, 6};
          u . w
          39
          v*39
          {39, 78}
    and \[ \left( \mathbf{v} \bullet \mathbf{u} \right) \bullet \mathbf{w} = 11 \cdot \begin{pmatrix} 5 \\ 6 \end{pmatrix} = \begin{pmatrix} 55 \\ 66 \end{pmatrix} . \]
          Dot[v, u]
           11
          11*w
          {55, 66}
    However, for some vectors we observe v • (u • w) = (v • u) • w. Let us pick three vectors \[ \mathbf{v} = \begin{pmatrix} 1 \\ 2 \end{pmatrix} , \quad \mathbf{u} = \begin{pmatrix} 3 \\ 5 \end{pmatrix} , \quad \mathbf{w} = \begin{pmatrix} 2 \\ 4 \end{pmatrix} . \] Then \[ \mathbf{v} \bullet \left( \mathbf{u} \bullet \mathbf{w} \right) = \mathbf{v} \,26 = 26 \begin{pmatrix} 1 \\ 2 \end{pmatrix} , \]
           v = {1, 2}; u = {3,5}; w = {2,4}; u.w
           26
    and \[ \left( \mathbf{v} \bullet \mathbf{u} \right) \bullet \mathbf{w} = 13 \mathbf{w} = 13 \begin{pmatrix} 2 \\ 4 \end{pmatrix} . \]
           v.u
           13
    Since \[ 26 \begin{pmatrix} 1 \\ 2 \end{pmatrix} = 13 \begin{pmatrix} 2 \\ 4 \end{pmatrix} , \] we conclude that this identity holds when the vectors v and w are collinear, for arbitrary u.
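The failure of associativity (and the collinear exception) is easy to replay in Python; `scale`, our own helper, multiplies a vector by a scalar:

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def scale(c, x):
    return [c * xi for xi in x]

v, u, w = [1, 2], [3, 4], [5, 6]
left  = scale(dot(u, w), v)    # v (u . w) = 39 v = [39, 78]
right = scale(dot(v, u), w)    # (v . u) w = 11 w = [55, 66]
print(left == right)           # False: no associativity in general

# With w = 2 v (collinear), the two sides coincide:
v, u, w = [1, 2], [3, 5], [2, 4]
print(scale(dot(u, w), v) == scale(dot(v, u), w))   # True: both [26, 52]
```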
   ■
End of Example 5
Theorem 2 (Cauchy inequality): For any two real numerical vectors v and u of the same finite dimension, the following inequality holds:

\begin{equation} \label{EqDot.3} \left( \mathbf{u} \bullet \mathbf{v} \right)^2 \leqslant \left( \mathbf{u} \bullet \mathbf{u} \right) \left( \mathbf{v} \bullet \mathbf{v} \right) . \end{equation} Equality holds in Eq.\eqref{EqDot.3} if and only if u and v are linearly dependent, i.e., u = λv for some scalar λ.

Many proofs of Cauchy's inequality are known (see, for instance, Marcus & Minc). Here is one of them.

It is convenient to introduce the following notation:     ∥v∥² = v • v. The positive square root of this quantity is called the norm in mathematics. Then the Cauchy inequality can be rewritten as \[ \left\vert {\bf u} \bullet {\bf v} \right\vert \le \| {\bf u} \| \cdot \| {\bf v} \| . \] Suppose first that either u or v is zero. Then their dot product is zero and the Cauchy inequality holds.

Now suppose that neither u nor v is zero. It follows that ∥u∥ > 0 and ∥v∥ > 0 because the dot product x • x > 0 for any nonzero vector x. We have \begin{align*} 0 &\le \left( \frac{{\bf u}}{\| {\bf u} \|} + \frac{{\bf v}}{\| {\bf v} \|} \right) \bullet \left( \frac{{\bf u}}{\| {\bf u} \|} + \frac{{\bf v}}{\| {\bf v} \|} \right) \\ &= \left( \frac{{\bf u}}{\| {\bf u} \|} \bullet \frac{{\bf u}}{\| {\bf u} \|} \right) + 2 \left( \frac{{\bf u}}{\| {\bf u} \|} \bullet \frac{{\bf v}}{\| {\bf v} \|} \right) + \left( \frac{{\bf v}}{\| {\bf v} \|} \bullet \frac{{\bf v}}{\| {\bf v} \|} \right) \\ &= \frac{1}{\| {\bf u} \|^2} \left( {\bf u} \bullet {\bf u} \right) + \frac{2}{\| {\bf u} \| \cdot \| {\bf v} \|} \left( {\bf u} \bullet {\bf v} \right) + \frac{1}{\| {\bf v} \|^2} \left( {\bf v} \bullet {\bf v} \right) \\ &= \frac{1}{\| {\bf u} \|^2} \, \| {\bf u} \|^2 + 2 \left( \frac{{\bf u}}{\| {\bf u} \|} \bullet \frac{{\bf v}}{\| {\bf v} \|} \right) + \frac{1}{\| {\bf v} \|^2} \, \| {\bf v} \|^2 \\ &= 1 + 2 \left( \frac{{\bf u}}{\| {\bf u} \|} \bullet \frac{{\bf v}}{\| {\bf v} \|} \right) + 1 . \end{align*} The last line shows that \( \frac{{\bf u}}{\| {\bf u} \|} \bullet \frac{{\bf v}}{\| {\bf v} \|} \geqslant -1 \). Hence,     −∥u∥ · ∥v∥ ≤ u • v. 
Similarly, \begin{align*} 0 &\le \left( \frac{{\bf u}}{\| {\bf u} \|} - \frac{{\bf v}}{\| {\bf v} \|} \right) \bullet \left( \frac{{\bf u}}{\| {\bf u} \|} - \frac{{\bf v}}{\| {\bf v} \|} \right) \\ &= \left( \frac{{\bf u}}{\| {\bf u} \|} \bullet \frac{{\bf u}}{\| {\bf u} \|} \right) - 2 \left( \frac{{\bf u}}{\| {\bf u} \|} \bullet \frac{{\bf v}}{\| {\bf v} \|} \right) + \left( \frac{{\bf v}}{\| {\bf v} \|} \bullet \frac{{\bf v}}{\| {\bf v} \|} \right) \\ &= \frac{1}{\| {\bf u} \|^2} \left( {\bf u} \bullet {\bf u} \right) - \frac{2}{\| {\bf u} \| \cdot \| {\bf v} \|} \left( {\bf u} \bullet {\bf v} \right) + \frac{1}{\| {\bf v} \|^2} \left( {\bf v} \bullet {\bf v} \right) \\ &= \frac{1}{\| {\bf u} \|^2} \, \| {\bf u} \|^2 - 2 \left( \frac{{\bf u}}{\| {\bf u} \|} \bullet \frac{{\bf v}}{\| {\bf v} \|} \right) + \frac{1}{\| {\bf v} \|^2} \, \| {\bf v} \|^2 \\ &= 1 - 2 \left( \frac{{\bf u}}{\| {\bf u} \|} \bullet \frac{{\bf v}}{\| {\bf v} \|} \right) + 1 \end{align*} Therefore,     u • v ≤ ∥u∥ · ∥v∥. By combining the two inequalities, we obtain the Cauchy inequality.

   
Alternatively, since Cauchy's inequality depends on an integer n (the size of the numerical vectors), we can also prove it by mathematical induction.

The case n = 1 is trivially true. When n = 2, Cauchy’s inequality just says \[ \left( a_1 b_1 + a_2 b_2 \right)^2 \leqslant \left( a_1^2 + a_2^2 \right) \left( b_1^2 + b_2^2 \right) . \] Expanding both sides, we find the equivalent inequality \[ 0 \leqslant \left( a_1 b_2 \right)^2 - 2 \left( a_1 b_2 a_2 b_1 \right) + \left(a_2 b_1 \right)^2 . \] From the well-known factorization x² − 2xy + y² = (x − y)² one finds \[ 0 \leqslant \left( a_1 b_2 - a_2 b_1 \right)^2 , \] and the nonnegativity of this term confirms Cauchy's inequality for n = 2.

Now that we have proved a nontrivial case of Cauchy’s inequality, we are ready to look at the induction step. If we let H(n) stand for the hypothesis that Cauchy’s inequality is valid for n, we need to show that H(2) and H(n) imply H(n + 1). With this plan in mind, it is natural to first apply the hypothesis H(n) and then use H(2) to stitch together the two remaining pieces. Specifically, we have \begin{align*} & \quad a_1 b_1 + a_2 b_2 + \cdots + a_n b_n + a_{n+1} b_{n+1} \\ &= \left( a_1 b_1 + a_2 b_2 + \cdots + a_n b_n \right) + a_{n+1} b_{n+1} \\ & \leqslant \left( a_1^2 + a_2^2 + \cdots + a_n^2 \right)^{1/2} \left( b_1^2 + b_2^2 + \cdots + b_n^2 \right)^{1/2} + a_{n+1} b_{n+1} , \\ & \leqslant \left( a_1^2 + a_2^2 + \cdots + a_n^2 + a_{n+1}^2 \right)^{1/2} \left( b_1^2 + b_2^2 + \cdots + b_n^2 + b_{n+1}^2 \right)^{1/2} \end{align*} where in the first inequality we used the inductive hypothesis H(n), and in the second inequality we used H(2) in the form \[ \alpha \beta + a_{n+1} b_{n+1} \leqslant \left( \alpha^2 + a_{n+1}^2 \right)^{1/2} \left( \beta^2 + b_{n+1}^2 \right)^{1/2} , \] with the new variables \[ \alpha = \left( a_1^2 + a_2^2 + \cdots + a_n^2 \right)^{1/2} , \quad \beta = \left( b_1^2 + b_2^2 + \cdots + b_n^2 \right)^{1/2} . \]
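Either proof can be sanity-checked numerically. A brief Python sketch over a few sample pairs of our own choosing (the last pair is linearly dependent, so equality must occur there):

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

pairs = [([9, 8, -2, 5], [-1, 2, -1, 6]),
         ([3, -2, 1], [4, 4, 4]),
         ([1, 2], [2, 4])]        # u = 2 v: the equality case

for u, v in pairs:
    lhs = dot(u, v) ** 2
    rhs = dot(u, u) * dot(v, v)
    assert lhs <= rhs             # Cauchy's inequality, Eq. (3)
    print(lhs, "<=", rhs, "(equality)" if lhs == rhs else "")
```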

   
Example 6: First, we randomly generate two vectors of size 4:
   u = RandomInteger[{-9, 9}, 4]
   {9, 8, -2, 5}
   v = RandomInteger[{-9, 9}, 4]
   {-1, 2, -1, 6}
\[ \mathbf{u} = \left( 9, 8, -2, 5 \right) , \qquad \mathbf{v} = \left( -1, 2, -1, 6 \right) , \] Their dot product is
   Dot[u, v]
   39
Square norms of these two vectors are
   Dot[u,u]
   174
   Dot[v,v]
   42
The left-hand side of Eq.(3) is
   Dot[u,v]^2
   1521
while the right-hand side is
   174*42
   7308
These results confirm the Cauchy inequality for these two randomly chosen vectors.

Now we set vector u to be u = 2.71828 v. Using Mathematica, we repeat all previous calculations for these two vectors.

   u = 2.71828*v; Dot[u, v]^2
   13034.3
The square norm of u is
   Dot[u, u]
   310.34
Its product with ∥v∥² is
   Dot[u, u] * 42
   13034.3
We check it with the previous dot product squared and get
   Dot[u, u] * 42 == Dot[u,v]^2
   True

Equality:    We consider two vectors in ℝ²: \[ \mathbf{v} = \left( 1 , 2 \right) , \qquad \mathbf{u} = \left( 2 , 4 \right) . \] They are linearly dependent because u = 2v. Their scalar product is \[ \left( 1 , 2 \right) \bullet \left( 2 , 4 \right) = 1 \cdot 2 + 2 \cdot 4 = 10. \] I hope that you can square this number without computer assistance. The squared norms of these vectors are \[ \| \mathbf{v} \|^2 = 1^2 + 2^2 = 5 , \qquad \| \mathbf{u} \|^2 = 2^2 + 4^2 = 4 + 16 = 20 . \] Their product becomes \[ \| \mathbf{v} \|^2 \cdot \| \mathbf{u} \|^2 = 5 \cdot 20 = 100 = 10^2 = \left( \mathbf{v} \bullet \mathbf{u} \right)^2 . \]    ■

End of Example 6

Important Note:    Cauchy's inequality is not valid for complex vector spaces, as the following example shows for the 2-vectors u = (1, j) and v = (1, −j), where j is the imaginary unit of the complex plane ℂ, so j² = −1.

\[ \mathbf{u} \bullet \mathbf{u} = 1^2 + \mathbf{j}^2 = 0 ,\quad \mathbf{v} \bullet \mathbf{v} = 1^2 + \left( -\mathbf{j} \right)^2 = 0, \quad \mathbf{u} \bullet \mathbf{v} = 1^2 - \mathbf{j}^2 = 2 . \]
u = {1, I}; v = {1, -I}; u . u
0
v . v
0
Dot[u, v]
2

The inequality \eqref{EqDot.3} is also referred to as the Cauchy--Schwarz inequality, the Cauchy--Bunyakovsky--Schwarz inequality, or simply the CBS inequality.

The inequality \eqref{EqDot.3} was first proved by the French mathematician, engineer, and physicist Baron Augustin-Louis Cauchy in 1821. In 1859, Victor Bunyakovsky extended this inequality to the case of infinite summation; that is, he established the integral version of the Cauchy inequality. The contribution of Hermann Schwarz (1843--1921) to the Cauchy inequality remains unclear to me, except that he married a daughter of the famous mathematician Ernst Eduard Kummer. In 1888, about 30 years after Bunyakovsky's publication, Schwarz presented a proof similar to Bunyakovsky's.

         
 Augustin-Louis Cauchy    Victor Yakovlevich Bunyakovsky    Hermann Amandus Schwarz

The first step toward the Bunyakovsky result is to establish the inequality

\begin{equation} \label{EqDot.4} \left( \sum_{k=1}^{\infty} a_k b_k \right)^2 \leqslant \left( \sum_{i=1}^{\infty} a_i^2 \right) \left( \sum_{i=1}^{\infty} b_i^2 \right) \end{equation}
subject that
\begin{equation} \label{EqDot.5} \sum_{k=1}^{\infty} a_k^2 < \infty , \quad \sum_{i=1}^{\infty} b_i^2 < \infty \qquad \Longrightarrow \qquad \sum_{k=1}^{\infty} a_k b_k < \infty . \end{equation}

Step 1: Finite Cauchy inequality:

For any n ∈ ℕ, define the partial sums \[ S_n = \sum _{k=1}^n a_k b_k. \] By the usual (finite) Cauchy–Schwarz inequality, \[ \left( \sum _{k=1}^na_k b_k\right) ^2 \leq \left( \sum _{k=1}^n a_k^2\right) \left( \sum_{k=1}^n b_k^2\right) . \] So \[ S_n^2 \leq A_n B_n,\quad A_n:=\sum _{k=1}^n a_k^2,\quad B_n :=\sum _{k=1}^n b_k^2. \]

Step 2: Use convergence of \( \displaystyle \quad \sum a_k^2 \quad \& \quad \sum b_k^2 . \)

Assume \[ \sum _{k=1}^{\infty }a_k^2 <\infty ,\qquad \sum _{k=1}^{\infty }b_k^2 <\infty . \] Then (Aₙ) and (Bₙ) are increasing and bounded, hence converge: \[ A_n\rightarrow A:=\sum _{k=1}^{\infty }a_k^2,\qquad B_n\rightarrow B:=\sum _{k=1}^{\infty }b_k^2. \] From the finite inequality, \[ |S_n|\leq \sqrt{A_nB_n}\leq \sqrt{AB}\quad \mathrm{for\ all\ }n. \] So (Sₙ) is a bounded sequence.

Step 3: Show \( \displaystyle \quad \sum a_kb_k \quad \) converges (Cauchy criterion).

For m > n, \[ S_m-S_n=\sum _{k=n+1}^ma_kb_k. \] Apply finite Cauchy–Schwarz to the finite sum from k = n + 1 to m: \[ \left| \sum _{k=n+1}^m a_kb_k\right| \leq \left( \sum _{k=n+1}^ma_k^2\right) ^{1/2}\left( \sum _{k=n+1}^mb_k^2\right) ^{1/2}. \] Let \[ A_{n,m}:=\sum _{k=n+1}^ma_k^2,\quad B_{n,m}:=\sum _{k=n+1}^mb_k^2. \] Then \[ |S_m-S_n|\leq \sqrt{A_{n,m}B_{n,m}}. \] Since \( \displaystyle \quad \sum a_k^2 \quad\& \quad \sum b_k^2 \quad \) converge, their tails go to zero: \[ \sum _{k=n+1}^{\infty }a_k^2\rightarrow 0,\quad \sum _{k=n+1}^{\infty }b_k^2\rightarrow 0\quad \mathrm{as\ }n\rightarrow \infty . \] Thus, for any ε > 0, choose N such that for all n ≥ N, \[ \sum _{k=n+1}^{\infty }a_k^2<\varepsilon ^2,\quad \sum _{k=n+1}^{\infty }b_k^2<\varepsilon ^2. \] Then for all m > n ≥ N, \[ A_{n,m}\leq \sum _{k=n+1}^{\infty }a_k^2<\varepsilon ^2,\quad B_{n,m}\leq \sum _{k=n+1}^{\infty }b_k^2<\varepsilon ^2, \] so \[ |S_m-S_n|\leq \sqrt{A_{n,m}B_{n,m}}<\sqrt{\varepsilon ^2\varepsilon ^2}=\varepsilon ^2. \] Hence, (Sₙ) is a Cauchy sequence, so it converges. Therefore, \[ \sum _{k=1}^{\infty }a_kb_k \] converges, and we may define \[ S := \sum_{k\ge 1} a_k b_k = \lim_{n\to\infty} S_n . \]

Step 4: Pass to the limit in the inequality.

From the finite inequality \[ S_n^2\leq A_nB_n, \] take limits as n → ∞: \[ \left( \sum _{k=1}^{\infty }a_kb_k\right) ^2=\lim _{n\rightarrow \infty }S_n^2\leq \lim _{n\rightarrow \infty }A_nB_n=\left( \sum _{k=1}^{\infty }a_k^2\right) \left( \sum _{k=1}^{\infty }b_k^2\right) . \] So we obtain the desired extension under the assumptions \[ \sum _{k=1}^{\infty }a_k^2<\infty ,\quad \sum _{k=1}^{\infty }b_k^2<\infty . \]
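The four steps above can be illustrated numerically with a concrete pair of square-summable sequences. This pure-Python sketch (the series aₖ = 1/k, bₖ = 1/k² and the helper `partial_sums` are illustrative choices, not from the text) shows the partial sums obeying Sₙ² ≤ AₙBₙ.

```python
# Numerical illustration of the infinite-sum Cauchy inequality (4):
# with a_k = 1/k and b_k = 1/k^2, both sum a_k^2 and sum b_k^2 converge,
# and the partial sums S_n = sum a_k b_k stay bounded by sqrt(A_n B_n).
def partial_sums(n):
    a = [1.0 / k for k in range(1, n + 1)]        # sum a_k^2 converges (= pi^2/6)
    b = [1.0 / (k * k) for k in range(1, n + 1)]  # sum b_k^2 converges as well
    S = sum(x * y for x, y in zip(a, b))          # partial sum of a_k b_k
    A = sum(x * x for x in a)
    B = sum(y * y for y in b)
    return S, A, B

S, A, B = partial_sums(10000)
assert S * S <= A * B        # the finite inequality that survives the limit
```

Here S approaches the convergent series Σ 1/k³, confirming that the product sequence is summable, as Step 3 guarantees.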

Now we establish an interesting additive inequality
\begin{equation} \label{EqDot.6} \sum_{k=1}^{\infty} a_k b_k \leqslant \frac{1}{2} \sum_{i=1}^{\infty} a_i^2 + \frac{1}{2} \sum_{i=1}^{\infty} b_i^2 . \end{equation}
In order to verify the latter, we consider a familiar factorization
\[ 0 \leqslant \left( x - y \right)^2 = x^2 - 2xy + y^2 , \]
from which we observe the bound
\[ xy \leqslant \frac{1}{2}\, x^2 + \frac{1}{2}\, y^2 \qquad\mbox{for all real } x, y . \]
When we apply this inequality to x = 𝑎k and y = bk and then sum over all k, we find Eq.\eqref{EqDot.6}.
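The elementary bound behind Eq.(6) is easy to verify exhaustively on a grid of real values; here is a small pure-Python check (the helper name `additive_bound_ok` is illustrative, not from the text).

```python
# The elementary bound xy <= x^2/2 + y^2/2 used to derive inequality (6);
# it follows from (x - y)^2 >= 0, checked here on a grid of real values.
def additive_bound_ok(x, y):
    # small tolerance so the equality case x = y passes despite rounding
    return x * y <= 0.5 * x * x + 0.5 * y * y + 1e-12

vals = [i * 0.5 for i in range(-10, 11)]
assert all(additive_bound_ok(x, y) for x in vals for y in vals)
```

Equality occurs exactly on the diagonal x = y, matching the factorization (x − y)² = 0.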

A positive definite matrix with real entries is a symmetric matrix all of whose eigenvalues are strictly positive. Equivalently, x • Ax > 0 for every nonzero vector x, so the image Ax always makes an acute angle with x: the transformation never sends a nonzero vector into the opposite half-space and never collapses it to zero.

Theorem 3: If x and y are n-column vectors (so they belong to ℝn×1) and A is an n × n positive definite matrix, then \[ \left( \mathbf{x} \bullet \mathbf{y} \right)^2 \leqslant \left( \mathbf{x} \bullet {\bf A}\,\mathbf{x} \right) \left( \mathbf{y} \bullet {\bf A}^{-1}\mathbf{y} \right) . \]
Since A is positive definite, there exists a nonsingular matrix T such that A = T ′ T, where T ′ = TT is the transpose matrix. Then its inverse is A−1 = T−1(T ′)−1. Defining u = Tx and v = (T ′)−1y, we find that \begin{align*} \left( \mathbf{x} \bullet \mathbf{y} \right)^2 &= \left( \mathbf{x} \bullet {\bf T}' \,{\bf T}'^{-1}\mathbf{y} \right)^2 = \left( {\bf T}\,\mathbf{x} \bullet {\bf T}'^{-1}\mathbf{y} \right)^2 \\ &= \left( \mathbf{u} \bullet \mathbf{v} \right)^2 \leqslant \left( \mathbf{u} \bullet \mathbf{u} \right) \left( \mathbf{v} \bullet \mathbf{v} \right) \\ &= \left( {\bf T}\,\mathbf{x} \bullet {\bf T}\,\mathbf{x} \right) \left( {\bf T}'^{-1} \mathbf{y} \bullet {\bf T}'^{-1} \mathbf{y} \right) = \left( \mathbf{x} \bullet {\bf T}'\,{\bf T}\,\mathbf{x} \right) \left( \mathbf{y} \bullet {\bf T}^{-1} {\bf T}'^{-1} \mathbf{y} \right) \\ &= \left( \mathbf{x} \bullet {\bf A}\,\mathbf{x} \right) \left( \mathbf{y} \bullet {\bf A}^{-1} \mathbf{y} \right) . \end{align*} We have equality if and only if one of the vectors u = Tx and v = (T ′)−1y is a scalar multiple of the other.
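The proof's construction A = T ′ T can be traced numerically. Below is a minimal pure-Python sketch of Theorem 3 for a 2 × 2 case (the document's own sessions use Mathematica); the matrix T and the vectors x, y are arbitrary test data, and A⁻¹ is computed by the adjugate formula.

```python
# Sketch of Theorem 3: build a positive definite A = T^t T from an
# invertible T, invert A by the 2x2 adjugate formula, and verify
# (x . y)^2 <= (x . A x)(y . A^{-1} y).
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

T = [[2.0, 1.0], [0.0, 3.0]]          # nonsingular, so A = T^t T is PD
A = [[T[0][0]**2 + T[1][0]**2, T[0][0]*T[0][1] + T[1][0]*T[1][1]],
     [T[0][0]*T[0][1] + T[1][0]*T[1][1], T[0][1]**2 + T[1][1]**2]]
detA = A[0][0]*A[1][1] - A[0][1]*A[1][0]
Ainv = [[A[1][1]/detA, -A[0][1]/detA], [-A[1][0]/detA, A[0][0]/detA]]

x, y = [1.0, -2.0], [3.0, 5.0]
lhs = dot(x, y) ** 2
rhs = dot(x, matvec(A, x)) * dot(y, matvec(Ainv, y))
assert lhs <= rhs + 1e-9
```

Choosing y proportional to Ax turns the bound into an equality, matching the condition v = λu stated in the proof.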
   
Example 7: First, we generate randomly two vectors of size 4:
   x = RandomInteger[{-9, 9}, 4]
   {-4, 3, 9, 5}
   y = RandomInteger[{-9, 9}, 4]
   {-3, 6, 0, -5}
\[ \mathbf{x} = \left( -4, 3, 9, 5 \right) , \qquad \mathbf{y} = \left( -3, 6, 0, -5 \right) , \] Their squared dot product is
   Dot[x, y]^2
   25
Let A be the following defective matrix, which is not positive definite: \[ \mathbf{A} = \begin{bmatrix} -257& -308& -260& 372 \\ 140& 169& 142& -206 \\ 191& 230& 193& -278 \\ 71& 86& 72& -105 \end{bmatrix} . \] Its inverse is \[ \mathbf{A}^{-1} = \begin{bmatrix} 111& 364& 116& -628 \\ -60& -199& -62& 342 \\ -83& -270& -87& 466 \\ -31& -102& -32& 175 \end{bmatrix} . \]
   A = {{-257, -308, -260, 372}, {140, 169, 142, -206}, {191, 230, 193, -278}, {71, 86, 72, -105}}; Inverse[A]
   {{111, 364, 116, -628}, {-60, -199, -62, 342}, {-83, -270, -87, 466}, {-31, -102, -32, 175}}
Its characteristic polynomial is det(λIA) = (λ² − 1)². Dot products with respect to matrix A of these two vectors are
   Dot[x, A.x]
   5031
   Dot[y, Inverse[A].y]
   24347
Although this matrix A is not positive definite, the inequality of Theorem 3 happens to hold for these particular vectors: \[ 25 = \left( \mathbf{x} \bullet \mathbf{y} \right)^2 < \left( \mathbf{x} \bullet {\bf A}\,\mathbf{x} \right) \left( \mathbf{y} \bullet {\bf A}^{-1}\mathbf{y} \right) = 5031 \cdot 24347. \] For a positive definite matrix A, the inequality becomes an equality when y is chosen proportional to Ax.

Another matrix:    We repeat all previous calculations with another matrix that is not positive definite, \[ \mathbf{A} = \begin{bmatrix} -249& -292& -276& 412 \\ 136& 161& 150& -226 \\ 185& 218& 205& -308 \\ 69& 82& 76& -115 \end{bmatrix} . \] Its characteristic polynomial is det(λI − A) = (λ − 1)³(λ + 1). The corresponding dot products are

   Dot[x, A . x]
   4059
   Dot[y, Inverse[A].y]
   -25297
So we observe that the conclusion of Theorem 3 fails for this matrix; this does not contradict the theorem, since A is not positive definite. Indeed, the right-hand side is negative: \[ 25 = \left( \mathbf{x} \bullet \mathbf{y} \right)^2 > \left( \mathbf{x} \bullet {\bf A}\,\mathbf{x} \right) \left( \mathbf{y} \bullet {\bf A}^{-1}\mathbf{y} \right) = 4059 \cdot \left( -25297 \right) < 0 . \]    ■
End of Example 7

The following statement provides a matrix version of Cauchy's inequality. Recall that |·| = det(·) denotes the determinant of a square matrix.

Theorem 4 (Determinantal Cauchy Inequality for Matrices): Suppose that both A and B are m-by-n matrices with real entries, so A, B ∈ ℝm×n. Then \[ \left\vert \mathbf{A}^{\mathrm T}\mathbf{B} \right\vert^2 \leqslant \left\vert \mathbf{A}^{\mathrm T} \mathbf{A} \right\vert \cdot \left\vert \mathbf{B}^{\mathrm T} \mathbf{B} \right\vert , \] where | · | denotes the determinant. Equality holds if and only if
  • rank(A) < n or rank(B) < n; or
  • B = A C for some nonsingular matrix C.
If |ATB| = 0, then the inequality is trivial. In this case equality holds precisely when at least one of the matrices A or B has rank < n, since then both sides vanish.

For the remainder of the proof assume \[ \left\vert\mathbf{A}^{\mathrm T}\mathbf{B}\right\vert \neq 0. \] In particular, both A and B must have full column rank n.

Step 1: Singular value decompositions yields \[ \mathbf{A} = \mathbf{P}_1 \mathbf{D}_1 \mathbf{Q}_1 , \qquad \mathbf{B} = \mathbf{P}_2 \mathbf{D}_2 \mathbf{Q}_2 , \] where

  • P₁, P₂ are m × n matrices with orthonormal columns, i.e., P₁ᵀP₁ = Iₙ and P₂ᵀP₂ = Iₙ, the n×n identity matrix;
  • Q₁, Q₂ are n-by-n orthogonal matrices;
  • D₁, D₂ are diagonal n-by-n matrices with positive diagonal entries (the singular values).
Then \[ \mathbf{A}^{\mathrm T}\mathbf{B} = \mathbf{Q}'_1 \mathbf{D}_1 \mathbf{P}'_1 \mathbf{P}_2 \mathbf{D}_2 \mathbf{Q}_2 . \] Taking determinants and using |Qi| = ±1, \[ \det \left( \mathbf{A}^{\mathrm T}\mathbf{B} \right) = \left\vert \mathbf{A}^{\mathrm T}\mathbf{B} \right\vert = \left\vert \mathbf{Q}'_1 \mathbf{D}_1 \mathbf{P}'_1 \mathbf{P}_2 \mathbf{D}_2 \mathbf{Q}_2 \right\vert = \pm\left\vert \mathbf{D}_1 \right\vert \left\vert \mathbf{D}_2 \right\vert \left\vert \mathbf{P}'_1 \mathbf{P}_2 \right\vert . \] Thus \[ \left\vert \mathbf{A}^{\mathrm T}\mathbf{B} \right\vert^2 = \left\vert \mathbf{D}_1 \right\vert^2 \left\vert \mathbf{D}_2 \right\vert^2 \left\vert \mathbf{P}_1^{\mathrm T} \mathbf{P}_2 \right\vert^2 . \] On the other hand \[ \left\vert \mathbf{A}^{\mathrm T} \mathbf{A} \right\vert = \left\vert \mathbf{D}_1 \right\vert^2 , \qquad \left\vert \mathbf{B}^{\mathrm T} \mathbf{B} \right\vert = \left\vert \mathbf{D}_2 \right\vert^2 . \] Therefore, the desired inequality is reduced to showing \[ \left\vert \mathbf{P}_1^{\mathrm T} \mathbf{P}_2 \right\vert\leq 1. \]

Step 2: Why |P₁ᵀP₂| ≤ 1.

The matrices P₁ and P₂ have orthonormal columns, so they each represent an orthonormal basis of an n-dimensional subspace of ℝm, W₁ ⊆ ℝm and W₂ ⊆ ℝm, respectively. The matrix \[ \mathbf{M} = \mathbf{P}_1^{\mathrm T} \mathbf{P}_2 \] is the change‑of‑basis matrix between these two orthonormal bases.

A fundamental fact from the theory of principal angles between subspaces states:

The singular values of M are cosθ₁ , … , cosθₙ, where θ₁, … , θₙ are the principal angles between the column spaces of P₁ and P₂.

Thus \[ |\mathbf{M}|=\prod _{k=1}^n\cos \theta _k. \] Since |cosθk| ≤ 1 for each k, \[ \left\vert \mathbf{P}_1^{\mathrm T}\mathbf{P}_2 \right\vert \leq 1. \]

We can also give an algebraic explanation of the inequality |P₁ᵀP₂| ≤ 1, which is based on Hadamard’s inequality: \[ |\mathbf{M}|\leq \prod _{j=1}^n \left\| \mathbf{M}_j \right\| , \qquad \forall \mathbf{M} \in \mathbb{R}^{n\times n} , \] where Mj are the column vectors. Apply this to M = P₁ᵀP₂:

  • Each column of M is P₁ᵀpj, where pj is a column of P₂.
  • Since P₁ has orthonormal columns, P₁ᵀ is a contraction: \[ \| \mathbf{P}_1^{\mathrm T}\mathbf{x}\| \leq \| \mathbf{x}\| . \]
  • Each pj has norm 1 (because columns of P₂ are orthonormal).
Thus, each column of M has norm ≤ 1, and Hadamard gives \[ |\mathbf{M}|\leq 1. \]
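A small numeric illustration of this step (a pure-Python sketch with illustrative data, not from the text): let P₁ span the xy-plane of ℝ³, and tilt the first column of P₂ out of that plane by an angle t. The determinant of M = P₁ᵀP₂ then equals cos t, which never exceeds 1 in absolute value.

```python
# For P1, P2 with orthonormal columns, det(M) with M = P1^T P2 has
# absolute value at most 1; equality occurs when the column spaces agree.
import math

def frame_det(t):
    P1 = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]   # orthonormal columns in R^3
    P2 = [[math.cos(t), 0.0], [0.0, 1.0], [math.sin(t), 0.0]]
    # M[i][j] = (i-th column of P1) . (j-th column of P2)
    M = [[sum(P1[r][i] * P2[r][j] for r in range(3)) for j in range(2)]
         for i in range(2)]
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

assert all(abs(frame_det(0.1 * k)) <= 1.0 + 1e-12 for k in range(63))
assert abs(frame_det(0.0) - 1.0) < 1e-12   # coinciding subspaces: equality
```

The single principal angle that varies here is t itself, so frame_det(t) = cos t reproduces the product-of-cosines formula in the simplest case.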

Step 3: Equality conditions. The equality |P₁ᵀP₂| = 1 occurs if and only if \[ \cos \theta_1 = \cdots = \cos \theta_n = 1, \] i.e., all principal angles are zero. This means the two subspaces coincide and the two orthonormal bases differ only by an orthogonal transformation: \[ \mathbf{P}_2 = \mathbf{P}_1 \mathbf{C},\qquad \mathbf{C}\in O(n). \] Returning to the SVD expressions, this implies \[ \mathbf{B} = \mathbf{A}\left( \mathbf{Q}_1^{\mathrm T}\mathbf{C}{\bf Q}_2 \right) , \] and the matrix in parentheses is nonsingular. Thus, \[ \mathbf{B} = \mathbf{A\,C} \] for some nonsingular matrix C.

Combining this with the earlier rank‑deficient case completes the proof.

Conclusion: The inequality \[ \left\vert \mathbf{A}^{\mathrm T}\mathbf{B}\right\vert^2 \leq \left\vert \mathbf{A}^{\mathrm T}\mathbf{A}\right\vert \, \left\vert \mathbf{B}^{\mathrm T}\mathbf{B} \right\vert \] is a determinantal analogue of the Cauchy–Schwarz inequality, and equality holds precisely when the column spaces of A and B coincide (up to an invertible transformation) or when one of the matrices fails to have full column rank.

   
Example 8: We use Mathematica to generate two pseudorandom matrices.
   matA = RandomInteger[{-6, 6}, {6, 6}]
   {{4, 5, -2, 1, 5, -5}, {3, 3, 5, 6, -2, -6}, {-6, -2, -6, -6, 3, 0}, {0, -3, -3, -1, 6, 2}, {-1, 2, -5, -6, 0, 3}, {-5, 4, -3, 1, 6, 5}}
\[ \mathbf{A} = \begin{bmatrix} 4& 5& -2& 1& 5& -5 \\ 3& 3& 5& 6& -2& -6 \\ -6& -2& -6& -6& 3& 0 \\ 0& -3& -3& -1& 6& 2 \\ -1& 2& -5& -6& 0& 3 \\ -5& 4& -3& 1& 6& 5 \end{bmatrix} , \]
   matB = RandomInteger[{-7, 7}, {6, 6}]
   {{-5, 3, -1, 1, 4, 0}, {-6, 3, -1, -1, -3, -3}, {-6, -2, 6, 2, -6, 5}, {-7, -6, -5, -5, -5, 4}, {7, 4, -1, -5, -5, 1}, {0, 5, 6, 0, -7, -7}}
\[ \mathbf{B} = \begin{bmatrix} -5& 3& -1& 1& 4& 0 \\ -6& 3& -1& -1& -3& -3 \\ -6& -2& 6& 2& -6& 5 \\ -7& -6& -5& -5& -5& 4 \\ 7& 4& -1& -5& -5& 1 \\ 0& 5& 6& 0& -7& -7 \end{bmatrix} , \] Now we calculate determinants:
   Det[Transpose[matA] . matB]^2
   3547645567098071
   Det[Transpose[matA] . matA] * Det[Transpose[matB] . matB]
   354764556709807104
Subtracting these two numbers, we get
   % - 3547645567098071
   351216911142709033
Since this number is positive, we conclude that Theorem 4 is valid for our 6×6 matrices.    ■
End of Example 8
Otto Hölder
The famous German mathematician Ludwig Otto Hölder (1859--1937) is credited with establishing the following inequality in 1889, and it now bears his name. However, this inequality was first found by Leonard James Rogers in 1888.

Its proof is based on Young’s inequality

\[ ab \leqslant \frac{a^p}{p} + \frac{b^q}{q} . \]
Theorem 5 (Hölder's inequality): For two vectors u and v in ℝn, and for any p, q > 1 such that \( \displaystyle \quad \frac{1}{p} + \frac{1}{q} = 1 , \quad \) the inequality \[ \sum_{i=1}^n \left\vert u_i \, v_i \right\vert \leqslant \left( \sum_{i=1}^n \left\vert u_i \right\vert^p \right)^{1/p} \left( \sum_{i=1}^n \left\vert v_i \right\vert^q \right)^{1/q} \] holds, with equality if and only if one of the vectors is a scalar multiple of the other.
Step 1: We begin with the basic inequality for nonnegative real numbers 𝑎, b ≥ 0: \[ ab\leq \frac{a^p}{p}+\frac{b^q}{q}, \] where p, q > 1 and \( \displaystyle \quad \frac{1}{p}+\frac{1}{q}=1. \quad \)

This is a standard form of Young’s inequality. We provide three proofs of this inequality.

A. Proof of Young’s inequality based on convexity of the exponential function et.

Define \[ \phi (t)=e^t. \] This function is convex on ℝ, so for any λ ∈ [0, 1] and any x, y ∈ ℝ, \[ \phi (\lambda x+(1-\lambda )y)\; \leq \; \lambda \phi (x)+(1-\lambda )\phi (y). \] Take \[ \lambda =\frac{1}{p},\quad 1-\lambda =\frac{1}{q}, \] and choose \[ x=p\ln a,\quad y=q\ln b \] (with the convention that if 𝑎 = 0 or b = 0, the inequality is trivial). Then \[ \lambda x+(1-\lambda )y=\frac{1}{p}(p\ln a)+\frac{1}{q}(q\ln b)=\ln a+\ln b=\ln (ab). \] Applying convexity: \[ e^{\ln (ab)}=ab\; \leq \; \frac{1}{p}e^{p\ln a}+\frac{1}{q}e^{q\ln b}=\frac{1}{p}a^p+\frac{1}{q}b^q. \] This is exactly Young’s inequality.

Equality condition: Convexity of et is strict, so equality holds if and only if x = y, that is, \[ p\ln a=q\ln b\quad \Longleftrightarrow \quad \ln (a^p)=\ln (b^q)\quad \Longleftrightarrow \quad a^p=b^q. \]

B: Proof of Young’s inequality based on direct calculus.

Fix b ≥ 0 and define for 𝑎 ≥ 0: \[ f(a)=\frac{a^p}{p}+\frac{b^q}{q}-ab. \] We want to show f(𝑎) ≥ 0 for all 𝑎 ≥ 0. Compute the derivative: \[ f'(a)=a^{p-1}-b. \] Critical point: \[ f'(a)=0\quad \Longleftrightarrow \quad a^{p-1}=b\quad \Longleftrightarrow \quad a=b^{1/(p-1)}. \] Second derivative: \[ f''(a)=(p-1)a^{p-2}\geq 0, \] so f is convex and the critical point is a global minimum. Evaluate f at this point: \[ a=b^{1/(p-1)}. \] Note that q = p/(p-1). Then \[ f(a)=\frac{a^p}{p}+\frac{b^q}{q}-ab=\frac{b^q}{p}+\frac{b^q}{q}-b^{1/(p-1)}b=b^q\left( \frac{1}{p}+\frac{1}{q}-1\right) =b^q(1-1)=0. \] Since this is the global minimum and f(𝑎) ≥ 0 for all 𝑎, we obtain \[ ab\leq \frac{a^p}{p}+\frac{b^q}{q}, \] with equality if and only if 𝑎p-1 = b, i.e., 𝑎p = bq.

C: Proof of Young’s inequality based on the Legendre transformation, which gives this inequality almost immediately.

Consider the convex function \[ \phi (x)=\frac{x^p}{p},\qquad x\geq 0. \] Its Legendre transform φ∗ is \[ \phi ^*(y)=\sup _{x\geq 0}\left( xy-\frac{x^p}{p}\right) . \] To find the maximizing x, differentiate: \[ \frac{\text d}{{\text d}x}\left( xy-\frac{x^p}{p}\right) =y-x^{p-1}. \] Set to zero: \[ y=x^{p-1}\quad \Longrightarrow \quad x=y^{1/(p-1)}. \] Plug this back into the expression: \[ \phi ^*(y)=xy-\frac{x^p}{p}=y^{\frac{1}{p-1}}y-\frac{1}{p}\left( y^{\frac{1}{p-1}}\right) ^p. \] Compute the exponents carefully. Since \[ \frac{1}{p-1}=\frac{q}{p}\quad \mathrm{because}\quad q=\frac{p}{p-1}, \] we have \[ y^{\frac{1}{p-1}}=y^{\frac{q}{p}}, \] and \[ \left( y^{\frac{1}{p-1}}\right) ^p=y^{\frac{p}{p-1}}=y^q. \] So \[ \phi ^*(y)=y^{\frac{q}{p}}y-\frac{1}{p}y^q=y^{\frac{q}{p}+1}-\frac{1}{p}y^q. \] But \[ \frac{q}{p}+1=\frac{q+p}{p}=\frac{\frac{p}{p-1}+p}{p}=\frac{\frac{p+p(p-1)}{p-1}}{p}=\frac{\frac{p+p^2-p}{p-1}}{p}=\frac{\frac{p^2}{p-1}}{p}=\frac{p}{p-1}=q, \] so \[ y^{\frac{q}{p}+1}=y^q. \] Therefore, \[ \phi ^*(y)=y^q-\frac{1}{p}y^q=\left( 1-\frac{1}{p}\right) y^q=\frac{1}{q}y^q. \] Hence, we have explicitly: \[ \phi (x)=\frac{x^p}{p},\qquad \phi ^*(y)=\frac{y^q}{q}. \] Now we apply the Fenchel–Young inequality \[ xy\; \leq \; \phi (x) +\phi ^*(y), \] which is valid for any convex function φ and its Legendre transform φ∗ and for all x, y ≥ 0, with equality if and only if \[ y=x^{p-1}\quad \Longleftrightarrow \quad x^p=y^q. \] Applying this inequality with our φ and φ∗ gives \[ xy\; \leq \; \frac{x^p}{p}+\frac{y^q}{q}. \]    ▣
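All three proofs establish the same scalar bound, which is easy to probe numerically. Here is a minimal pure-Python sketch (the helper `young_gap` and the sample grid are illustrative, not from the text).

```python
# Numerical check of Young's inequality ab <= a^p/p + b^q/q for conjugate
# exponents 1/p + 1/q = 1, together with its equality case a^p = b^q.
def young_gap(a, b, p):
    """Nonnegative slack a^p/p + b^q/q - ab, where q = p/(p-1)."""
    q = p / (p - 1.0)
    return a**p / p + b**q / q - a * b

samples = [(a * 0.3, b * 0.3, p) for a in range(7) for b in range(7)
           for p in (1.5, 2.0, 3.0)]
assert all(young_gap(a, b, p) >= -1e-12 for a, b, p in samples)
# equality when a^p = b^q, e.g. a = b = 1:
assert abs(young_gap(1.0, 1.0, 3.0)) < 1e-12
```

For p = 2 the slack reduces to (a − b)²/2, the additive bound used earlier for Eq.(6).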

We will apply Young’s inequality componentwise.

Step 2: Normalization and application to each component.

Assume first that both u and v are nonzero. Define \[ A=\left( \sum _{i=1}^n|u_i|^p\right) ^{1/p},\qquad B=\left( \sum _{i=1}^n|v_i|^q\right) ^{1/q}. \] If either A = 0 or B = 0, then one of the vectors is the zero vector and the inequality is trivial, with equality.

Now define normalized sequences \[ x_i=\frac{|u_i|}{A},\qquad y_i=\frac{|v_i|}{B}. \] Then \[ \sum _{i=1}^nx_i^p=1,\qquad \sum _{i=1}^ny_i^q=1. \] Apply Young’s inequality to each pair (xi, yi): \[ x_i y_i \leq \frac{x_i^p}{p}+\frac{y_i^q}{q}. \] Summing over i from 1 to n, we obtain \[ \sum _{i=1}^nx_iy_i\; \leq \; \frac{1}{p}\sum _{i=1}^nx_i^p+\frac{1}{q}\sum _{i=1}^ny_i^q=\frac{1}{p}\cdot 1+\frac{1}{q}\cdot 1=1. \] Thus, \[ \sum _{i=1}^n\frac{|u_i|}{A}\cdot \frac{|v_i|}{B}\; \leq \; 1, \] or equivalently, \[ \sum _{i=1}^n|u_iv_i|\; \leq \; AB=\left( \sum _{i=1}^n|u_i|^p\right) ^{1/p}\left( \sum _{i=1}^n|v_i|^q\right) ^{1/q}. \] This is precisely Hölder’s inequality.

Step 3: Equality conditions.

We now analyze when equality holds. From the derivation, equality in Hölder’s inequality requires equality in Young’s inequality for each index i with xiyi ≠ 0. For a fixed i, equality in \[ x_iy_i\leq \frac{x_i^p}{p}+\frac{y_i^q}{q} \] holds if and only if \[ x_i^p=y_i^q. \] Thus, for all i such that ui ≠ 0 and vi ≠ 0, we must have \[ \left( \frac{|u_i|}{A}\right) ^p=\left( \frac{|v_i|}{B}\right) ^q. \] Equivalently, \[ \frac{|u_i|^p}{A^p}=\frac{|v_i|^q}{B^q}. \] Since \( \displaystyle \quad\sum _i|u_i|^p = A^p \quad\mbox{and}\quad \sum _i|v_i|^q = B^q, \quad \) this condition implies that the ratios |ui|p and |vi|q are proportional across all indices where both are nonzero. In particular, there exists a constant λ > 0 such that \[ |u_i|^p=\lambda \, |v_i|^q\quad \mathrm{for\ all\ }i\mathrm{\ with\ }u_iv_i\neq 0. \] Taking p-th and q-th roots, this means \[ |u_i|=c\, |v_i|\quad \mathrm{for\ some\ }c>0\mathrm{\ and\ all\ relevant\ }i. \] Including signs, this says that u and v are proportional componentwise, i.e., there exists a scalar α ∈ ℝ such that \[ u_i=\alpha \, v_i\quad \mathrm{for\ all\ }i, \] with the understanding that if some components vanish, the relation still holds globally.

Thus equality in Hölder’s inequality holds if and only if one vector is a scalar multiple of the other (or one of them is the zero vector, which is a degenerate case of the same statement).

Summary:

  • Hölder’s inequality in ℝn follows from applying Young’s inequality componentwise to normalized sequences.
  • The inequality \[ \sum _{i=1}^n|u_iv_i|\; \leq \; \left( \sum _{i=1}^n|u_i|^p\right) ^{1/p}\left( \sum _{i=1}^n|v_i|^q\right) ^{1/q} \] is thus a direct consequence of convexity.
  • Equality holds precisely when u and v are linearly dependent, i.e. one is a scalar multiple of the other.
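Hölder's inequality itself can be checked numerically for several exponents at once. The following pure-Python sketch (an illustrative alternative to the Mathematica session in Example 9 below; the helper name `holder_ok` is not from the text) reuses the Example 9 vectors.

```python
# Hölder's inequality checked numerically for a pair of test vectors and a
# range of exponents p, with q the conjugate exponent p/(p-1).
def holder_ok(u, v, p):
    q = p / (p - 1.0)
    lhs = sum(abs(a * b) for a, b in zip(u, v))
    rhs = (sum(abs(a)**p for a in u))**(1/p) * (sum(abs(b)**q for b in v))**(1/q)
    return lhs <= rhs + 1e-9      # tolerance for the near-equality cases

u = [0, -5, 2, 8, -6, 7, -2, -6, -2]   # the vectors from Example 9
v = [-8, -6, 2, 4, -7, 2, -1, 1, 5]
assert all(holder_ok(u, v, p) for p in (1.5, 2.0, 4.0, 10.0))
```

Note that the left-hand side uses Σ|uᵢvᵢ|, which dominates |u • v|, so the inequality also bounds the plain dot product used in Example 9.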
   
Example 9: To verify Hölder's inequality, we generate randomly two vectors of length 9:
u = RandomInteger[{-8, 8}, {1, 9}]
{{0, -5, 2, 8, -6, 7, -2, -6, -2}}
v = RandomInteger[{-8, 8}, {1, 9}]
{{-8, -6, 2, 4, -7, 2, -1, 1, 5}}
\begin{align*} \mathbf{u} &= \begin{pmatrix} 0& -5& 2& 8& -6& 7& -2& -6& -2 \end{pmatrix} , \\ \mathbf{v} &= \begin{pmatrix} -8& -6& 2& 4& -7& 2& -1& 1& 5 \end{pmatrix} . \end{align*} Observe that Mathematica represents these vectors as row matrices, so u and v ∈ ℝ1×9. Their dot product is \[ \mathbf{u} \bullet \mathbf{v} = 108 . \]
v . Transpose[u]
{{108}}
For p = 2, we get the Euclidean norms: \[ \| \mathbf{u} \|_2 = \sqrt{222} , \qquad \| \mathbf{v} \|_2 = 10\sqrt{2} \]
Norm[u]
Sqrt[222]
Norm[v]
10 Sqrt[2]
and their product becomes \[ \| \mathbf{u} \|_2 \cdot \| \mathbf{v} \|_2 \approx 210.713 , \] which confirms Cauchy's inequality (Hölder's inequality with p = 2), since 108 ≤ 210.713.
N[Norm[u]*Norm[v]]
210.713
For arbitrary p, say p = 4, we have \[ \| \mathbf{u} \|_{4} \cdot \| \mathbf{v} \|_{4/3} \approx 220.24 . \] First, we check whether the row matrices u and v are treated as vectors by Mathematica:
VectorQ[u]
False
VectorQ[v]
False
So Mathematica treats these vectors as matrices. We are forced to redefine these vectors as 9-tuples
u = {0, -5, 2, 8, -6, 7, -2, -6, -2}; v = {-8, -6, 2, 4, -7, 2, -1, 1, 5};
Now we ask Mathematica to calculate the product of norms:
N[Norm[u, 4]*Norm[v, 4/3]]
220.24
If we exchange vectors u and v, Hölder's inequality remains valid but with different numbers: \[ 108 = \mathbf{u} \bullet \mathbf{v} \leqslant \| \mathbf{v} \|_{4} \cdot \| \mathbf{u} \|_{4/3} \approx 226.995 . \]
N[Norm[v, 4]*Norm[u, 4/3]]
226.995
   ■
End of Example 9
Corollary 1: Suppose that A and B are square nonnegative definite matrices and α is a scalar satisfying 0 < α < 1. Then \[ \left\vert \mathbf{A} \right\vert^{\alpha} \left\vert \mathbf{B} \right\vert^{1-\alpha} \leqslant \left\vert \alpha\,\mathbf{A} + \left( 1 - \alpha \right) \mathbf{B} \right\vert , \] with equality if and only if A = B or αA + (1 − α)B is singular.
Since αA + (1 − α)B is also nonnegative definite, Hölder's inequality for matrices formulated in Corollary 1 clearly holds when A or B is singular, with equality if and only if αA + (1 − α)B is also singular. For the remainder of the proof, we assume that both A and B are positive definite.
Using the decomposition theorem:
Let A and B be m × m symmetric matrices with B being positive definite. Let Λ = diag(λ₁, λ₂, … , λm), where λ₁, λ₂, … , λm are the eigenvalues of B−1A. Then a nonsingular matrix C exists, such that \[ \mathbf{C}\,\mathbf{A}\,\mathbf{C}^{\mathrm T} = \Lambda , \quad \mathbf{C}\,\mathbf{B}\,\mathbf{C}^{\mathrm T} = \mathbf{I}_m . \]

We can write A = TΛTt and B = TTt, where T is a nonsingular matrix, Λ = diag(λ₁, λ₂, … , λm), and λ₁, λ₂, … , λm are the eigenvalues of B−1A, and Tt is the transpose matrix. Thus, the proof will be complete if we can show that \[ \left\vert \Lambda \right\vert^{\alpha} = \prod_{i=1}^m \lambda_i^{\alpha} \leqslant \left\vert \alpha \Lambda + \left( 1 - \alpha \right) \mathbf{I}_m \right\vert = \prod_{i=1}^m \left( \alpha \lambda_i + 1 - \alpha \right) , \] with equality if and only if Λ = Im. This result is easily confirmed by showing that the function g(λ) = αλ + 1 − α − λα is nonnegative for λ > 0 and attains its minimum value g(1) = 0 at λ = 1 when 0 < α < 1.
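The scalar fact invoked at the end of this proof is easy to test; here is a small pure-Python check (the function name `g` matches the proof's notation, with λ and α written out).

```python
# The scalar fact ending the proof of Corollary 1: for 0 < alpha < 1 and
# lambda > 0, g(lambda) = alpha*lambda + 1 - alpha - lambda**alpha >= 0,
# with the minimum value g(1) = 0 (a case of weighted AM-GM).
def g(lam, alpha):
    return alpha * lam + 1.0 - alpha - lam**alpha

lams = [0.05 * k for k in range(1, 200)]       # grid over (0, 10)
assert all(g(lam, 0.25) >= -1e-12 for lam in lams)
assert abs(g(1.0, 0.25)) < 1e-12               # minimum at lambda = 1
```

Multiplying the componentwise bounds λᵢᵅ ≤ αλᵢ + 1 − α over all eigenvalues yields exactly the determinantal product inequality above.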

   
Example 10: Using Mathematica, we randomly generate two matrices.
T = RandomInteger[{-7, 7}, {5, 5}]
{{5, 5, 6, 0, -7}, {-2, 5, 1, -7, 6}, {-3, -2, -3, 2, 3}, {-1, 6, 7, 3, 5}, {4, 4, 1, 3, 3}}
A = Transpose[T] . T
{{55, 31, 34, 17, -49}, {31, 106, 87, -9, 31}, {34, 87, 96, 11, -7}, {17, -9, 11, 71, -12}, {-49, 31, -7, -12, 128}}
\[ \mathbf{A} = \begin{bmatrix} 55& 31& 34& 17& -49 \\ 31& 106& 87& -9& 31 \\ 34& 87& 96& 11& -7 \\ 17& -9& 11& 71& -12 \\ -49& 31& -7& -12& 128 \end{bmatrix} ; \]
T = RandomInteger[{-7, 8}, {5, 5}]
{{-1, 1, 2, -6, 5}, {4, 7, 6, 7, 7}, {7, -2, -5, -7, -7}, {1, 7, 6, 5, -1}, {7, -6, -2, 1, 2}}
B = Transpose[T] . T
{{116, -22, -21, -3, -13}, {-22, 139, 108, 86, 49}, {-21, 108, 105, 93, 77}, {-3, 86, 93, 160, 65}, {-13, 49, 77, 65, 128}}
\[ \mathbf{B} = \begin{bmatrix} 116& -22& -21& -3& -13 \\ -22& 139& 108& 86& 49 \\ -21& 108& 105& 93& 77 \\ -3& 86& 93& 160& 65 \\ -13& 49& 77& 65& 128 \end{bmatrix} . \] Both symmetric matrices are positive definite because their eigenvalues are all positive:
N[Eigenvalues[A]]
{202.75, 161.15, 69.4332, 21.6113, 1.05561}
N[Eigenvalues[B]]
{378.699, 117.667, 86.0813, 61.9575, 3.59505}
Next, we calculate their determinants:
Det[A]
51753636
Det[B]
854392900
\[ \det\left( \mathbf{A} \right) = 51753636 , \qquad \det\left( \mathbf{B} \right) = 854392900 . \] Choosing α = ¼, we calculate the matrix C = ¼A + ¾B:
mat = (1/4)*A + (3/4)*B
{{403/4, -(35/4), -(29/4), 2, -22}, {-(35/4), 523/4, 411/4, 249/4, 89/2}, {-(29/4), 411/4, 411/4, 145/2, 56}, {2, 249/4, 145/2, 551/4, 183/4}, {-22, 89/2, 56, 183/4, 128}}
\[ \mathbf{C} = \frac{1}{4}\,\mathbf{A} + \frac{3}{4}\,\mathbf{B} = \begin{bmatrix} 403/4& -35/4& -29/4& 2& -22 \\ -35/4& 523/4& 411/4& 249/4& 89/2 \\ -29/4& 411/4& 411/4& 145/2& 56 \\ 2& 249/4& 145/2& 551/4& 183/4 \\ -22& 89/2& 56& 183/4& 128 \end{bmatrix} . \] Its determinant is \[ \det\left( \mathbf{C} \right) \approx 2.11706 \times 10^9 . \]
Det[mat]
541968030603/256
N[%]
2.11706*10^9
The left-hand side of the inequality is \[ \left\vert \mathbf{A} \right\vert^{1/4} \cdot \left\vert \mathbf{B} \right\vert^{3/4} \approx 4.23866 \times 10^8 , \] which is indeed smaller than det(C) ≈ 2.11706 × 10⁹, confirming Corollary 1.
N[Det[A]^(1/4) * Det[B]^(3/4)]
4.23866*10^8
   ■
End of Example 10

 

Dot Product and Linear Transformations


The multiplication of scalars is an operation formally described by a mapping
\[ \mathbb{R}^2 \ni (x, y) \mapsto x\,y \in \mathbb{R} , \]
which is not linear. However, if we fix one variable, say y = u, then the map x ↦ ux : ℝ → ℝ is linear. Hence, this multiplication gives rise to two families of linear maps, x ↦ ux and y ↦ vy : ℝ → ℝ. For this reason, we say that the dot product ℝ × ℝ → ℝ is bilinear. The graph of this function is a surface containing two families of lines.
Plot3D[x*y, {x, -5, 5}, {y, -5, 5}]
Figure 1: The Dot Product in Plane Geometry

The fundamental significance of the dot product is that it is a linear transformation of vectors in each multiplier. This means that the function f(v) = uv is a linear functional for any fixed vector u. Then scalar product can be defined as a bilinear form:

\[ \left. \begin{array} {ccc} U \times V & \rightarrow & \mathbb{F} \\ (\mathbf{u}, \mathbf{v}) & \mapsto & \mathbf{u} \bullet \mathbf{v} \in \mathbb{F} \end{array} \right\} \qquad \mathbf{u} \in U, \ \mathbf{v} \in V . \]
Hence, the dot product is bilinear and this can be applied to arbitrary linear combinations, so that
\[ \left( \sum_i \alpha_i \mathbf{x}_i \right) \bullet \left( \sum_j \beta_j \mathbf{y}_j \right) = \sum_{i,j} \alpha_i \beta_j \left( \mathbf{x}_i \bullet \mathbf{y}_j \right) , \qquad \forall \alpha_i , \beta_j \in \mathbb{R} . \]
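The bilinearity identity above can be verified on concrete data. This pure-Python sketch (the vectors and coefficients are arbitrary test values, not from the text) expands both sides and compares them.

```python
# Bilinearity of the dot product: expanding
# (sum_i alpha_i x_i) . (sum_j beta_j y_j) termwise gives
# sum_{i,j} alpha_i beta_j (x_i . y_j).
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

xs = [[1.0, 2.0, 0.0], [0.0, -1.0, 3.0]]
ys = [[2.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
alphas, betas = [2.0, -1.0], [0.5, 3.0]

# left side: form the two linear combinations first, then one dot product
left_vec = [sum(a * x[k] for a, x in zip(alphas, xs)) for k in range(3)]
right_vec = [sum(b * y[k] for b, y in zip(betas, ys)) for k in range(3)]
lhs = dot(left_vec, right_vec)
# right side: expand into the four pairwise dot products
rhs = sum(alphas[i] * betas[j] * dot(xs[i], ys[j])
          for i in range(2) for j in range(2))
assert abs(lhs - rhs) < 1e-12
```

This is exactly the expansion carried out by hand in Example 11 below, where the cross terms vanish because the standard basis is orthonormal.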
   
Example 11: We consider two (pseudo)randomly generated vectors from ℝ³: \[ \mathbf{a} = \begin{pmatrix} 0.116624 \\ 0.364083 \\ 0.571789 \end{pmatrix} , \quad \mathbf{b} = \begin{pmatrix} 0.864995 \\ 0.163004 \\ 0.717815 \end{pmatrix} . \]
a = RandomReal[{0, 1}, 3]
{0.116624, 0.364083, 0.571789}
b = RandomReal[{0, 1}, 3]
{0.864995, 0.163004, 0.717815}
Each of these vectors is a linear combination of standard unit vectors, \[ \mathbf{e}_1 = \mathbf{i} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} , \quad \mathbf{e}_2 = \mathbf{j} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} , \quad \mathbf{e}_3 = \mathbf{k} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} . \] So \[ \mathbf{a} = 0.116624 \mathbf{i} + 0.364083 \mathbf{j} + 0.571789\mathbf{k} \] and \[ \mathbf{b} = 0.864995 \mathbf{i} + 0.163004 \mathbf{j} + 0.717815 \mathbf{k} . \] Since dot products of standard unit vectors are well known because they form orthonormal system, we can find the dot product of these vectors as linear combination: \begin{align*} \mathbf{a} \bullet \mathbf{b} &= \left( 0.116624 \mathbf{i} + 0.364083 \mathbf{j} + 0.571789\mathbf{k} \right) \\ &\bullet \left( 0.864995 \mathbf{i} + 0.163004 \mathbf{j} + 0.717815 \mathbf{k} \right) \\ &= \left( 0.116624 \right) \cdot \left( 0.864995 \right) \mathbf{i} \bullet \mathbf{i} + \left( 0.116624 \right) \cdot \left( 0.163004 \right) \mathbf{i} \bullet \mathbf{j} \\ & \quad + \left( 0.116624 \right) \cdot \left( 0.717815 \right) \mathbf{i} \bullet \mathbf{k} + \left( 0.364083 \right) \cdot \left( 0.864995 \right) \mathbf{j} \bullet \mathbf{i} \\ & \quad + \left( 0.364083 \right) \cdot \left( 0.163004 \right) \mathbf{j} \bullet \mathbf{j} + \left( 0.364083 \right) \cdot \left( 0.717815 \right) \mathbf{j} \bullet \mathbf{k} \\ & \quad + \left( 0.571789 \right) \cdot \left( 0.864995 \right) \mathbf{k} \bullet \mathbf{i} + \left( 0.571789 \right) \cdot \left( 0.163004 \right) \mathbf{k} \bullet \mathbf{j} \\ & \quad + \left( 0.571789 \right) \cdot \left( 0.717815 \right) \mathbf{k} \bullet \mathbf{k} \\ &= \left( 0.116624 \right) \cdot \left( 0.864995 \right) \mathbf{i} \bullet \mathbf{i} \\ & \quad + \left( 0.364083 \right) \cdot \left( 0.163004 \right) \mathbf{j} \bullet \mathbf{j} \\ & \quad + \left( 0.571789 \right) \cdot \left( 0.717815 \right) \mathbf{k} \bullet \mathbf{k} \\ &= 0.570664 . 
\end{align*} because \[ \mathbf{i} \bullet \mathbf{j} = \mathbf{i} \bullet \mathbf{k} = \mathbf{k} \bullet \mathbf{j} = 0 \] and \[ \mathbf{i} \bullet \mathbf{i} = \mathbf{j} \bullet \mathbf{j} = \mathbf{k} \bullet \mathbf{k} = 1 . \]
a . b
0.570664
   ■
End of Example 11

Applications

Scalar products are intimately associated with a variety of physical concepts. For example, if a vector is mean-centered---the average of all vector elements is subtracted from each element---then the dot product of this vector with itself, divided by the number of elements, is called the variance in statistics. So it provides a measurement of dispersion across a data set.
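The variance-as-dot-product idea takes only a few lines of code; here is a pure-Python sketch (the data set is an illustrative example, not from the text), using the population convention of dividing by the number of observations.

```python
# Variance as a dot product: after mean-centering, the dot product of the
# data vector with itself, divided by the number of observations, is the
# (population) variance.
def variance(data):
    mean = sum(data) / len(data)
    centered = [x - mean for x in data]          # mean-centered vector
    return sum(c * c for c in centered) / len(data)

# mean is 5, squared deviations sum to 32, so the variance is 32/8 = 4
assert abs(variance([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]) - 4.0) < 1e-12
```

Dividing by n − 1 instead would give the sample variance, the other common convention.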

Vector fields assign a vector (magnitude and direction) to every point in a space, commonly representing physical quantities like wind velocity, fluid flow, or electromagnetic forces. For simplicity, let us consider a 2D vector field F(x, y) = ⟨ P(x, y), Q(x, y) ⟩ (you can think of it as the velocity of fluid flowing over the plane) and let C be a smooth oriented curve in the plane. At each point of the directed path C we introduce a unit tangent vector t in the direction of the curve C, and a unit normal vector n pointing to our right as we travel along C. The vector field F can be decomposed into tangential and normal components:

\[ \mathbf{F} = \left( \mathbf{F} \bullet \hat{\bf t} \right) \hat{\bf t} + \left( \mathbf{F} \bullet \hat{\bf n} \right) \hat{\bf n} . \]
Only the second of these components carries fluid across C; its integral along the curve is known as the flux. The first component is used to determine work when F represents a force field. The flux across C is
\[ \int _C \mathbf{F}\cdot \hat{\mathbf{n}}\, {\text d}s, \]
where ds is arc length. Suppose C is parametrized by
\[ \mathbf{r}(t)=\langle x(t),\, y(t)\rangle ,\qquad a\leq t\leq b. \]
Then the tangent vector becomes
\[ \mathbf{r}'(t) =\langle x'(t),\, y'(t)\rangle , \]
and two perpendicular unit normals are
\[ \mathbf{n_{\mathrm{left}}}=\langle -T_y,\, T_x\rangle ,\quad \mathbf{n_{\mathrm{right}}}=\langle T_y,\, -T_x\rangle , \]
where \( \displaystyle \quad \mathbf{T}(t)=\frac{\mathbf{r}'(t)}{\| \mathbf{r}'(t)\| }. \quad \) You must choose one consistently—often “outward” relative to a region, or “left normal” if the curve is positively oriented.
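Putting the pieces together, the flux integral can be evaluated numerically. The sketch below (hypothetical helper names; derivatives approximated by central differences) uses the right normal, so that F • n ds reduces to (P y′ − Q x′) dt:

```python
import math

def flux(P, Q, x, y, a, b, n=10000):
    """Approximate the flux of F = <P, Q> across the curve r(t) = (x(t), y(t)),
    a <= t <= b, through the right normal <y', -x'>, using the midpoint rule:
        flux = ∫ F • n ds = ∫ (P*y' - Q*x') dt."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        # derivatives by central differences (assumes a smooth parametrization)
        dx = (x(t + 1e-6) - x(t - 1e-6)) / 2e-6
        dy = (y(t + 1e-6) - y(t - 1e-6)) / 2e-6
        total += (P(x(t), y(t)) * dy - Q(x(t), y(t)) * dx) * h
    return total

# F(x, y) = <x, y> across the unit circle; the right normal points outward,
# so the flux should come out close to 2*pi.
print(flux(lambda x, y: x, lambda x, y: y,
           math.cos, math.sin, 0.0, 2 * math.pi))
```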

The work done by a force applied at a point serves as a primary example of dot product because the work is defined as the product of the displacement and the component of the force in the direction of displacement (i.e., the projection of the force onto the direction of the displacement). Thus, the component of the force perpendicular to the displacement "does no work." If F is the force (in Newtons) and s is the displacement (in meters), then the work W is by definition equal to

\[ W = F_{\parallel} s = F\,s\,\cos\left( {\bf F}, {\bf s} \right) = {\bf F} \bullet {\bf s} \quad \mbox{(in joules)} . \]
Suppose the force makes an obtuse angle with the displacement, so that the force is "resisting." Then the work is regarded as negative, in keeping with formula above.    
Example 12: There are many physical examples of line integrals, but perhaps the most common is the expression for the total work done by a force F when it moves its point of application from a point A to a point B along a given curve C. We allow the magnitude and direction of F to vary along the curve. Let the force act at a point r and consider a small displacement dr along the curve; then the small amount of work done is dW = F • dr (note that dW can be either positive or negative). Therefore, the total work done in traversing the path C is \[ W_C = \int_C {\bf F} \bullet {\text d}{\bf r} . \]

Naturally, other physical quantities can be expressed in such a way. For example, the electrostatic potential energy gained by moving a charge q along a path C in an electric field E is \( -q\int_C {\bf E} \bullet {\text d}{\bf r} \). We may also note that Ampere's law concerning the magnetic field B associated with a current-carrying wire can be written as \[ \oint_C {\bf B} \bullet {\text d}{\bf r} = \mu_0 I , \] where I is the current enclosed by a closed path C traversed in a right-handed sense with respect to the current direction.    ■

End of Example 12
The work W must of course be independent of the coordinate system in which the vectors F and x are expressed. The dot product as we know it from Eq.\eqref{EqDot.7} does not have this property. In general, transforming both vectors by a matrix A, we have
\[ s = {\bf A}\,\mathbf{x} \bullet {\bf A}\,\mathbf{y} = {\bf A}^{\mathrm T} {\bf A}\,\mathbf{x} \bullet \mathbf{y} . \]
Only if A⁻¹ equals Aᵀ (i.e., if we are dealing with orthonormal transformations) will s not change. It appears as if the dot product only describes the physics correctly in a special kind of coordinate system: a system which according to our human perception is ‘rectangular’, and has physical units, i.e. a distance of 1 in coordinate x means indeed 1 meter in the x-direction. An orthonormal transformation produces again such a rectangular ‘physical’ coordinate system. If one has so far always employed such special coordinates anyway, this dot product has always worked properly.
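This is easy to check numerically (a sketch with hypothetical helpers): a rotation, being orthonormal, preserves the dot product, while a plain scaling does not:

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, x) for row in A]

x, y = [1.0, 2.0], [3.0, -1.0]      # dot(x, y) = 1.0

# Rotation by 30 degrees: R^T R = I, so the dot product is preserved.
c, s = math.cos(math.pi / 6), math.sin(math.pi / 6)
R = [[c, -s], [s, c]]
print(dot(matvec(R, x), matvec(R, y)))   # ≈ 1.0, same as dot(x, y)

# A non-orthonormal scaling changes it: Sx = (2, 2), Sy = (6, -1).
S = [[2.0, 0.0], [0.0, 1.0]]
print(dot(matvec(S, x), matvec(S, y)))   # → 10.0
```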

   
Example 13: In geometry, a barycentric coordinate system is a coordinate system in which the location of a point is specified by reference to a simplex (a triangle for points in a plane, a tetrahedron for points in three-dimensional space, etc.). Barycentric coordinates were invented by the German mathematician and theoretical astronomer August Ferdinand Möbius (1790--1868). He introduced them in his work "Der barycentrische Calcül" published in 1827.

2D case:    We start with the flat plane ℝ².

The area of a 2D triangle whose vertices are 𝑎 = (x𝑎, y𝑎), b = (xb, yb), c = (xc, yc) (as shown in figure 1) is given by \[ \mbox{area} = \frac{1}{2} \begin{vmatrix} x_b - x_a & x_c - x_a \\ y_b - y_a & y_c - y_a \end{vmatrix} . \]

pointA = Graphics[{Purple, Disk[{0, 0}, 0.02]}]; pointB = Graphics[{Purple, Disk[{1.1, 0.2}, 0.02]}]; pointC = Graphics[{Purple, Disk[{0.4, 0.9}, 0.02]}]; pointD = Graphics[{Purple, Disk[{0.45, 0.4}, 0.02]}]; line1 = Graphics[{Black, Thick, Line[{{0, 0.0}, {0.4, 0.9}, {1.1, 0.2}, {0, 0}}]}]; line2 = Graphics[{Brown, Thick, Line[{{0, 0.0}, {0.45, 0.4}, {0.4, 0.9}}]}]; line3 = Graphics[{Brown, Thick, Line[{{1.1, 0.2}, {0.45, 0.4}}]}]; txt = Graphics[{Black, Text[Style[Subscript[A, c], FontSize -> 18, Bold], {0.46, 0.2}], Text[Style[Subscript[A, b], FontSize -> 18, Bold], {0.3, 0.4}], Text[Style[Subscript[A, a], FontSize -> 18, Bold], {0.65, 0.45}], Text[Style["a", FontSize -> 18, Bold], {0.0, -0.1}], Text[Style["b", FontSize -> 18, Bold], {1.1, 0.1}], Text[Style["c", FontSize -> 18, Bold], {0.4, 1.0}]}]; Show[line1, pointA, pointB, pointC, pointD, line2, line3, txt]
Figure 1: Barycentric Coordinates

Note that this area is a "signed" area. In other words, it has a sign. To obtain the area the way we are used to define it, we take the absolute value. If the vertices live in 3D space, the area of the corresponding triangle is \[ \mbox{Area} = \frac{1}{2}\,\| (b-a) \times (c-b) \| . \] This always gives a positive answer.

Barycentric coordinates allow us to express the coordinates of p = (x, y) in terms of 𝑎, b, c. More specifically, the barycentric coordinates of p are the numbers β and γ such that \[ p = a + \beta\left( b-a \right) + \gamma \left( c -a \right) \] If we regroup 𝑎, b, c, we obtain \begin{align*} p &= a + \beta\,b - \beta\,a + \gamma\,c - \gamma\,a \\ &= \left( 1 - \beta - \gamma \right) a + \beta\,b + \gamma\,c . \end{align*} It is customary to define a third variable α by \[ \alpha = 1 - \beta - \gamma . \] Then we have \[ p = \alpha\,a + \beta\, b + \gamma\,c , \qquad \alpha + \beta + \gamma = 1 . \] The barycentric coordinates of the point p in terms of the points 𝑎, b, c are the numbers α, β, γ such that p = α𝑎 + βb + γc, with α + β + γ = 1.

Barycentric coordinates are defined for all points in the plane. They have several nice features:

  1. A point p is inside the triangle defined by 𝑎, b, c if and only if \[ 0 < \alpha < 1 , \quad 0 < \beta < 1 , \quad 0 < \gamma < 1 . \] This property provides an easy way to test if a point is inside a triangle.
  2. If one of the barycentric coordinates is 0 and the other two are between 0 and 1, the corresponding point p is on one of the edges of the triangle.
  3. If any barycentric coordinate is less than zero, then p must lie outside of the triangle.
  4. If two of the barycentric coordinates are zero and the third is 1, the point p is at one of the vertices of the triangle.
  5. By changing the values of α, β, γ between 0 and 1, the point p will move smoothly inside the triangle. This can (and will) be applied to other attributes of the vertices, such as color.
  6. The centroid of the triangle is obtained when α = β = γ = ⅓. If the triangle is made of a certain substance which is evenly distributed throughout the triangle, then these values of α, β, γ would give us the center of gravity.
Note that it is sufficient to find two of the parameters α, β, γ because their sum is 1. One way to determine β and γ is to write the equation \[ p = a + \beta\left( b-a \right) + \gamma \left( c -a \right) \] in terms of the coordinates of the various points involved. This gives us the following system \[ \begin{cases} x &= x_a + \beta \left( x_b - x_a \right) + \gamma \left( x_c - x_a \right) , \\y &= y_a + \beta \left( y_b - y_a \right) + \gamma \left( y_c - y_a \right) \end{cases} \] which can be solved using your favorite method.

Let A𝑎, Ab and Ac be as in figure 1 and let A denote the area of the whole triangle. Also note that the point inside the triangle in figure 1 is the point we called p. Consider the triangle with area Ab: its base is the edge ca and its apex is p. For a fixed value of β, the point p moves along a line parallel to ca, so this triangle keeps the same base and height, and hence the same area. Thus, we see that Ab only depends on β, and the dependence is linear: \[ A_b = C\beta \] for some constant C. When p is at b, that is, when β = 1, we have Ab = A. Hence, C = A. Therefore we see that \[ \beta = \frac{A_b}{A} . \] Similarly, we have \[ \alpha = \frac{A_a}{A} , \qquad \gamma = \frac{A_c}{A} . \] In coordinates, these parameters become \[ \beta = \frac{\begin{vmatrix} x_a - x_c & x - x_c \\ y_a - y_c & y - y_c \end{vmatrix}}{\begin{vmatrix} x_b - x_a & x_c - x_a \\ y_b - y_a & y_c - y_a \end{vmatrix}} , \] \[ \gamma = \frac{\begin{vmatrix} x_b - x_a & x - x_a \\ y_b - y_a & y - y_a \end{vmatrix}}{\begin{vmatrix} x_b - x_a & x_c - x_a \\ y_b - y_a & y_c - y_a \end{vmatrix}} . \]
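These determinant formulas translate directly into code. A minimal Python sketch (the function name is hypothetical):

```python
def barycentric_2d(p, a, b, c):
    """Barycentric coordinates (alpha, beta, gamma) of p = (x, y) with respect
    to the triangle a, b, c, via the signed-area (determinant) formulas."""
    def det(u, v):                       # 2x2 determinant with columns u, v
        return u[0] * v[1] - u[1] * v[0]
    area2 = det((b[0] - a[0], b[1] - a[1]), (c[0] - a[0], c[1] - a[1]))
    beta  = det((a[0] - c[0], a[1] - c[1]), (p[0] - c[0], p[1] - c[1])) / area2
    gamma = det((b[0] - a[0], b[1] - a[1]), (p[0] - a[0], p[1] - a[1])) / area2
    alpha = 1.0 - beta - gamma
    return alpha, beta, gamma

# The centroid of any triangle has coordinates (1/3, 1/3, 1/3).
a, b, c = (0.0, 0.0), (1.1, 0.2), (0.4, 0.9)
p = ((a[0] + b[0] + c[0]) / 3, (a[1] + b[1] + c[1]) / 3)
print(barycentric_2d(p, a, b, c))        # each coordinate ≈ 1/3
```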

Let us assume that we are using the RGB model. That is, all colors can be obtained by mixing R (red), G (green) and B (blue). Usually, with such a model, the level of each color channel is a number between 0 and 255. To specify the color at any point, we must specify a triplet (R, G, B) where R, G, and B are integers between 0 and 255. They indicate how much red, green and blue is used in the color. When using Java, there is a built-in class to handle colors. It is called Color. This class has some built-in predefined colors. Here are some examples: Color.black, Color.blue, Color.cyan, Color.gray, Color.green, Color.magenta, Color.orange, Color.pink, Color.red, Color.white, Color.yellow. To get any other color, one uses a statement such as new Color(R,G,B) where R,G,B are integers between 0 and 255.

It is also possible to use a single integer to represent colors. Keeping in mind that an integer has 32 bits, bits 0 − 7 contain the R level, bits 8 − 15 contain the G level and bits 16−23 contain the B level. The remaining bits are unused and set to 0. Let us see now how barycentric coordinates can be used to smoothly color a triangle, given the color of its vertices. Using the notation above, let us assume that C𝑎 is the color of 𝑎, Cb is the color of b and Cc is the color of c. Each color is in fact a triplet. We will use the notation C𝑎 = (R𝑎, G𝑎, B𝑎) and similar notation for the remaining points. We would like to color the triangle so that there is a smooth coloring throughout the triangle. We use the fact that by changing the values of α, β, γ between 0 and 1, the point p = α𝑎 + βb + γc will move smoothly inside the triangle. In other words, small changes in α, β, γ will result in small changes in the location of p. We apply this to colors. We let \[ C = \alpha C_a + \beta C_b + \gamma C_c \] (we really do this for every color channel). Small changes in α, β, γ will result in small changes in the color. Therefore, the color will change smoothly as we move within the triangle. To color smoothly a triangle given the color of its vertices, we can use the following algorithm:

  1. For each point P = (x, y) inside the triangle, find α, β, γ.
  2. Use α, β, γ to interpolate the color of the point from the color of the vertices using relation \[ C = \alpha C_a + \beta C_b + \gamma C_c \]
  3. Plot the point with coordinates (x, y) and color computed above.
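The three steps above can be sketched for a single point as follows (hypothetical helper names; the barycentric step repeats the determinant formulas for β and γ):

```python
def barycentric(p, a, b, c):
    """(alpha, beta, gamma) of p in triangle a, b, c (2D signed-area formulas)."""
    d = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    beta  = ((a[0] - c[0]) * (p[1] - c[1]) - (a[1] - c[1]) * (p[0] - c[0])) / d
    gamma = ((b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])) / d
    return 1.0 - beta - gamma, beta, gamma

def shade(p, a, b, c, Ca, Cb, Cc):
    """Interpolated color at p: C = alpha*Ca + beta*Cb + gamma*Cc,
    applied channel by channel and rounded to integers in 0..255."""
    al, be, ga = barycentric(p, a, b, c)
    return tuple(round(al * x + be * y + ga * z)
                 for x, y, z in zip(Ca, Cb, Cc))

# Vertices colored pure red, green, blue; the centroid mixes them equally.
a, b, c = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)
print(shade((1/3, 1/3), a, b, c, (255, 0, 0), (0, 255, 0), (0, 0, 255)))  # → (85, 85, 85)
```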

3D case:   

We use the same notation as in the 2D case. The only difference is that now points have three coordinates. So, we have 𝑎 = (x𝑎, y𝑎, z𝑎), b = (xb, yb, zb) and c = (xc, yc, zc). Barycentric coordinates are extended naturally to 3D triangles and they have the same properties. In other words, we have the same equation for point p = α𝑎 + βb + γc.

The only difference between 3D and 2D is that the area of the triangle is always positive, regardless of orientation. We define the following quantities:

  • n is the normal to the triangle T with vertices (𝑎, b, c) in counterclockwise order. In other words, n = (b − 𝑎) × (c − 𝑎).
  • n𝑎 is the normal to T𝑎, the triangle with area A𝑎 as shown in figure 1. T𝑎 = (b, c, p) in counterclockwise order. Thus, n𝑎 = (c − b) × (p − b).
  • nb is the normal to Tb, the triangle with area Ab as shown in figure 1. Tb = (c, 𝑎, p) in counterclockwise order. Thus, nb = (𝑎 − c) × (p − c).
  • nc is the normal to Tc, the triangle with area Ac as shown in figure 1. Tc = (𝑎, b, p) in counterclockwise order. Thus, nc = (b − 𝑎) × (p − 𝑎).
The quantity \( \displaystyle \quad \frac{\mathbf{n} \bullet \mathbf{n}_a}{\| \mathbf{n} \| \,\| \mathbf{n}_a \|} = 1 \quad \) if p is inside T, −1 otherwise. The same is true for \( \displaystyle \quad \frac{\mathbf{n} \bullet \mathbf{n}_b}{\| \mathbf{n} \| \,\| \mathbf{n}_b \|} \quad \) and \( \displaystyle \quad \frac{\mathbf{n} \bullet \mathbf{n}_c}{\| \mathbf{n} \| \,\| \mathbf{n}_c \|} .\quad \) Multiplying A𝑎 / A by \( \displaystyle \quad \frac{\mathbf{n} \bullet \mathbf{n}_a}{\| \mathbf{n} \| \,\| \mathbf{n}_a \|} \quad \) will then give us a signed area, depending on whether p is inside T or outside. Since A𝑎 = ½∥n𝑎∥ and A = ½∥n∥, we have \begin{align*} \frac{A_a}{A} \,\frac{\mathbf{n} \bullet \mathbf{n}_a}{\| \mathbf{n} \| \, \| \mathbf{n}_a \|} &= \frac{\| \mathbf{n}_a \|}{\| \mathbf{n} \|}\,\frac{\mathbf{n} \bullet \mathbf{n}_a}{\| \mathbf{n} \| \, \| \mathbf{n}_a \|} \\ &= \frac{\mathbf{n} \bullet \mathbf{n}_a}{\| \mathbf{n} \|^2} . \end{align*} We obtain similar formulas for the other ratios. Thus, in the case of a 3D triangle, we can define the barycentric coordinates by: \[ \alpha = \frac{\mathbf{n} \bullet \mathbf{n}_a}{\| \mathbf{n} \|^2} , \quad\beta = \frac{\mathbf{n} \bullet \mathbf{n}_b}{\| \mathbf{n} \|^2} , \quad \gamma = \frac{\mathbf{n} \bullet \mathbf{n}_c}{\| \mathbf{n} \|^2} . \]    ■
End of Example 13
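The normal-based formulas of the 3D case are easy to check numerically. A minimal Python sketch (hypothetical helpers), assuming p lies in the plane of the triangle:

```python
def sub(u, v): return [u[i] - v[i] for i in range(3)]
def cross(u, v):
    return [u[1]*v[2] - u[2]*v[1], u[2]*v[0] - u[0]*v[2], u[0]*v[1] - u[1]*v[0]]
def dot(u, v): return sum(u[i] * v[i] for i in range(3))

def barycentric_3d(p, a, b, c):
    """Barycentric coordinates of p (in the plane of a, b, c) via
    alpha = n.n_a/|n|^2, beta = n.n_b/|n|^2, gamma = n.n_c/|n|^2."""
    n  = cross(sub(b, a), sub(c, a))     # normal to the whole triangle
    na = cross(sub(c, b), sub(p, b))     # normal to T_a = (b, c, p)
    nb = cross(sub(a, c), sub(p, c))     # normal to T_b = (c, a, p)
    nc = cross(sub(b, a), sub(p, a))     # normal to T_c = (a, b, p)
    n2 = dot(n, n)
    return dot(n, na) / n2, dot(n, nb) / n2, dot(n, nc) / n2

# Midpoint of edge ab has coordinates (1/2, 1/2, 0).
a, b, c = [0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 0.0, 3.0]
print(barycentric_3d([1.0, 0.0, 0.0], a, b, c))   # → (0.5, 0.5, 0.0)
```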

 

Dot product in floating point format


There are only finitely many numbers in a computer. Besides integers, these are the so-called floating-point numbers:
\[ a = \pm\left( \frac{d_1}{b} + \frac{d_2}{b^2} + \cdots + \frac{d_t}{b^t} \right) \cdot b^{\alpha} . \]
Here α, t, b, d₁, d₂, … , dt are integers. The number b > 0 is said to be the base of the computer arithmetic; usually b = 2, as in the IEEE-754 binary format, or a power of 2 in computations, but humans prefer b = 10. (Other bases, such as 4 or 8, were used in the early days of computer arithmetic, but studies performed in the 1970s showed that they are of little interest, except 16.) The rational number in parentheses is the mantissa---it essentially dictates the precision of the floating-point number---and α is the exponent of the floating-point number 𝑎. The numbers di ∈ {0, 1, … , b − 1} are termed digits, with d₁ ≠ 0, and t is the length of the mantissa. Finally, there are integers L and U that bound α: L ≤ α ≤ U. A special floating-point number is 𝑎 = 0.

All arithmetic operations performed by computers over floating-point numbers are subject to a round-off procedure, which is a mapping of real numbers into floating-point numbers. Let fl(x) denote the rounding result for x. For instance, with b = 10 and t = 8, fl(π) = 0.31415927 × 10¹. Then

\[ fl(x) = x \left( 1 + \varepsilon \right) , \]
where |ε| ≤ η as long as fl(x) ≠ 0. We define η as the least upper bound for |ε|:
\[ \eta = \frac{1}{2}\,b^{1-t} . \]
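For IEEE-754 double precision we have b = 2 and t = 53, so η = 2⁻⁵³; the machine epsilon reported by most languages is b^(1−t) = 2η. A quick check in Python (illustrative, not from the text):

```python
import sys

eta = 0.5 * 2.0 ** (1 - 53)                 # eta = (1/2) * b**(1-t) with b = 2, t = 53
print(eta == 2.0 ** -53)                    # → True
print(sys.float_info.epsilon == 2 * eta)    # → True

# 1 + eta rounds back to 1 (round-to-even), but 1 + 2*eta is representable:
print(1.0 + eta == 1.0, 1.0 + 2 * eta == 1.0)   # → True False
```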

The dot product of two vectors, when calculated with floating-point numbers, can be affected by rounding errors. This is because floating-point arithmetic involves approximations, and repeated multiplication and addition can lead to accumulated errors, especially when dealing with very large or very small numbers (such as Avogadro's number, ≈ 6.022 × 10²³, or Planck's constant, ≈ 6.626 × 10⁻³⁴ J·s).

Since xy does not necessarily have a representation in the floating-point format with bounded mantissa, there is no algorithm which always computes dot-products exactly. Thus, one seeks to construct an algorithm which computes a floating-point number with minimal deviation from the exact result where the deviation should not depend on the number of addends.

Although floating-point error analysis properly belongs to numerical linear algebra, we consider a simple algorithm for evaluating the dot product xy:


s = 0;
for i = 1:n
    s = s + x(i)*y(i);
end

At the first step, we have
\[ fl(x_1 y_1 ) = x_1 y_1 \left( 1 + \delta_1 \right) , \qquad\mbox{with} \quad |\delta_1 | \le \eta . \]
The next step is evaluated with 2 rounding errors: the multiplication: (1 + δ₂) and then the addition: (1 + ϵ₂).
\begin{align*} s_2 &= \left( s_1 + x_2 y_2 \,(1 + \delta_2 ) \right) \left( 1 + \epsilon_2 \right) \\ &= x_1 y_1 \left( 1 + \delta_1 \right)\left( 1 + \epsilon_2 \right) + x_2 y_2 \left( 1 + \delta_2 \right)\left( 1 + \epsilon_2 \right) . \end{align*}
The structure becomes clearer with s₃ = (s₂ + x₃y₃(1 + δ₃))(1 + ϵ₃), or
\begin{align*} s_3 &= x_1 y_1 \left( 1 + \delta_1 \right) \left( 1 + \epsilon_2 \right) \left( 1 + \epsilon_3 \right) \\ & \quad + x_2 y_2 \left( 1 + \delta_2 \right) \left( 1 + \epsilon_2 \right) \left( 1+ \epsilon_3 \right) \\ & \quad +x_3 y_3 \left( 1 + \delta_3 \right) \left( 1 + \epsilon_3 \right) . \end{align*}
Since |δi| ≤ η, |ϵi| ≤ η, we can “simplify” a bit:
\begin{align*} s_3 &= x_1 y_1 \left( 1+ \delta_1 + \epsilon_2 + \epsilon_3 \right) + x_2 y_2 \left( 1 + \delta_2 + \epsilon_2 + \epsilon_3 \right) \\ & \quad + x_3 y_3 \left( 1 + \delta_3 + \epsilon_3 \right) + O(\eta^2 ) . \end{align*}
Each term above has one δ and some ϵ’s. The emerging pattern is
\[ s_k = \sum_{i=1}^k x_i y_i \left( 1 + \mbox{up to $k$ rounding terms} \right) + O(\eta^2 ) . \]
The number of ϵ terms goes down as i increases (the last term will only have 1).

Now we apply the triangle inequality (and |δi|, |ϵi| ≤ η) to the difference between the computed and exact values:

\begin{align*} \left\vert s_n - \mathbf{x} \bullet \mathbf{y} \right\vert &= \left\vert \sum_{i=1}^n x_i y_i \left( 1 + \mbox{up to $n$ rounding terms} \right) - \sum_{i=1}^n x_i y_i + O(\eta^2 ) \right\vert \\ &= \left\vert \sum_{i=1}^n x_i y_i \left( \mbox{up to $n$ rounding terms} \right) + O(\eta^2 ) \right\vert \\ & \le \sum_{i=1}^n \left\vert x_i \right\vert \left\vert y_i \right\vert n \eta + O(\eta^2 ) \qquad \left( \mbox{each rounding term is bounded by } \eta \right) \\ &= n\eta\,|\mathbf{x}| \bullet |\mathbf{y}| + O(\eta^2 ) , \end{align*}
where \( \displaystyle \quad |\mathbf{x}| \bullet |\mathbf{y}| = \sum_{i=1}^n \left\vert x_i \right\vert \left\vert y_i \right\vert \quad \) is the dot product of vectors composed of absolute values of their components. As long as xy ≠ 0, we can write this result as
\[ \frac{\left\vert \mathbf{x} \bullet \mathbf{y} - fl (\mathbf{x} \bullet \mathbf{y}) \right\vert}{\left\vert \mathbf{x} \bullet \mathbf{y} \right\vert} \le \eta n\,\frac{|\mathbf{x}| \bullet |\mathbf{y}|}{\left\vert \mathbf{x} \bullet \mathbf{y} \right\vert} + O( \eta^2 ) . \]
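The bound can be observed directly. The following sketch (not from the text) evaluates a dot product with the naive loop above, compares it against exact rational arithmetic via Python's fractions module, and verifies the first-order bound n η |x| • |y|:

```python
from fractions import Fraction

def fl_dot(x, y):
    """The straightforward accumulation loop analyzed above, in double precision."""
    s = 0.0
    for xi, yi in zip(x, y):
        s = s + xi * yi
    return s

n = 1000
x = [1.0 / (i + 1) for i in range(n)]      # 1, 1/2, 1/3, ...
y = [(-1.0) ** i for i in range(n)]        # alternating signs

# Every float is an exact rational, so Fractions give the exact dot product.
exact = float(sum(Fraction(xi) * Fraction(yi) for xi, yi in zip(x, y)))
computed = fl_dot(x, y)

eta = 2.0 ** -53                            # unit roundoff for doubles
bound = n * eta * sum(abs(xi * yi) for xi, yi in zip(x, y))
print(abs(computed - exact) <= bound)       # → True
```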
   
Example 14: We consider two vectors from ℚ⁴: \[ \mathbf{a} = \left( \frac{3}{7} , \ \frac{1}{3},\ \frac{12}{11} , \ \frac{7}{13} \right) , \quad \mathbf{b} = \left( \frac{7}{9} , \ \frac{3}{11} , \ \frac{11}{3} , \ \frac{13}{23} \right) . \] Their dot product in exact arithmetic is \[ \mathbf{a} \bullet \mathbf{b} = \frac{3589}{759} \approx 4.728590250329381 . \]
(3/7)*(7/9) + (1/3)*(3/11) + (12/11)*(11/3) + (7/13)*(13/23)
3589/759
If we round these vectors to two decimal places, we get \[ \mathbf{a}_2 = \left( 0.43 , \ 0.33,\ 1.09 , \ 0.54 \right) , \quad \mathbf{b}_2 = \left( 0.78 , \ 0.27 , \ 3.67 , \ 0.57 \right) . \] Their scalar product is \[ \mathbf{a}_2 \bullet \mathbf{b}_2 = 4.7326 . \]
0.43*0.78 + 0.33*0.27 + 1.09*3.67 + 0.54*0.57
4.7326
As you see, the second decimal place is not correct and error is −0.00400975. Now we consider the same vectors rounded to four decimal places: \[ \mathbf{a}_4 = \left( 0.4286 , \ 0.3333,\ 1.0909 , \ 0.5385 \right) , \quad \mathbf{b}_4 = \left( 0.7778 , \ 0.2727 , \ 3.6667 , \ 0.5652 \right) . \] Their dot product becomes \[ \mathbf{a}_4 \bullet \mathbf{b}_4 = 4.72862 . \] Its error is −0.0000289697.
0.4286*0.7778 + 0.3333*0.2727 + 1.0909*3.6667 + 0.5385*0.5652
4.72862
To convince you that a dot product with at least four terms typically leads (on average) to a change in the last digit when all calculations are performed in floating-point arithmetic, we consider another example.

We consider two numerical vectors with entries from ℚ⁴: \[ \mathbf{a} = \left( \frac{3}{7} , \ \frac{2}{5},\ \frac{6}{13} , \ \frac{11}{23} \right) , \quad \mathbf{b} = \left( \frac{7}{15} , \ \frac{5}{12} , \ \frac{13}{27} , \ \frac{23}{47} \right) . \] Their dot product in exact arithmetic is \[ \mathbf{a} \bullet \mathbf{b} = \frac{3481}{4230} \approx 0.8229314420803783 . \]

(3/7)*(7/15) + (2/5)*(5/12) + (6/13)*(13/27) + (11/23)*(23/47)
3481/4230
If we round these vectors to two decimal places, we get \[ \mathbf{a}_2 = \left( 0.43 , \ 0.4,\ 0.46 , \ 0.48 \right) , \quad \mathbf{b}_2 = \left( 0.47 , \ 0.42 , \ 0.48 , \ 0.49 \right) . \] Their scalar product is \[ \mathbf{a}_2 \bullet \mathbf{b}_2 = 0.8261 \approx 0.83 . \]
0.43*0.47 + 0.4*0.42 + 0.46*0.48 + 0.48*0.49
0.8261
As you see, the second decimal place is not correct and the error is −0.00316856. Now we consider the same vectors rounded to four decimal places: \[ \mathbf{a}_4 = \left( 0.4286 , \ 0.4,\ 0.4615 , \ 0.4783 \right) , \quad \mathbf{b}_4 = \left( 0.4667 , \ 0.4167 , \ 0.4815 , \ 0.4894 \right) . \] Their dot product becomes \[ \mathbf{a}_4 \bullet \mathbf{b}_4 = 0.823 . \]
0.4286*0.4667 + 0.4*0.4167 + 0.4615*0.4815 + 0.4783*0.4894
0.823
   ■
End of Example 14

 

  1. Find the dot product of the following pairs of vectors. \[ {\bf (a)\ \ } \begin{pmatrix} 1 \\ -3 \\ 4 \end{pmatrix} , \quad \begin{pmatrix} 8 \\ 6 \\ 1 \end{pmatrix} \qquad {\bf (b)\ \ } \begin{pmatrix} 3 \\ 5 \\ 6 \end{pmatrix} , \quad \begin{pmatrix} -9 \\ 2 \\ 3 \end{pmatrix} \]
  2. Show that for any vectors x, y ∈ ℝⁿ, we have \[ \| \mathbf{x} + \mathbf{y} \|^2 = \| \mathbf{x} \|^2 + 2\,\mathbf{x} \bullet \mathbf{y} + \| \mathbf{y} \|^2 . \]
  3. For vectors u, v ∈ ℝn, show that \( \displaystyle \quad (\mathbf{u} \bullet \mathbf{v}) = \frac{1}{4} \left( \| \mathbf{u} + \mathbf{v} \|^2 - \| \mathbf{u} - \mathbf{v} \|^2 \right) . \)
  4. Prove the parallelogram identity: \[ \| \mathbf{u} + \mathbf{v} \|^2 + \| \mathbf{u} - \mathbf{v} \|^2 = 2\, \| \mathbf{u} \|^2 + 2\,\| \mathbf{v} \|^2 . \]
  5. What is the angle between the vectors i + j and i + 3j?
  6. What is the area of the quadrilateral with vertices at (1, 1), (4, 2), (3, 7) and (2, 3)?
  7. Find cos(θ) where θ is the angle between the vectors \[ \left( 3, -2, 7 \right) \quad \mbox{and} \quad \left( 5,3,4 \right) . \]
  8. Find cos(θ) where θ is the angle between the vectors \[ \left( \right) \quad \mbox{and} \quad \left( \right) . \]
  9. Verify the Cauchy inequality for vectors \[ \left( \right) \quad \mbox{and} \quad \left( \right) . \]
  10. Find proju(v) where \[ \left( \right) \quad \mbox{and} \quad \left( \right) . \]
  11. Find proju(v) where \[ \left( \right) \quad \mbox{and} \quad \left( \right) . \]
  12. Decompose the vector v into v = v∥ + v⊥, where \[ \left( \right) \quad \mbox{and} \quad \left( \right) . \]
  13. Show that \[ \mathbf{u} \bullet \left( \mathbf{v} - \mbox{proj}_u (\mathbf{v}) \right) = 0 \] and conclude that every vector in ℝⁿ can be written as the sum of two vectors, one of which is orthogonal and the other parallel to the given vector.
  14. Are the vectors u = (2, 3, −1, 4) and v = (?, ?, ?, ?)ᵀ orthogonal?
   
  1. Aldaz, J. M.; Barza, S.; Fujii, M.; Moslehian, M. S. (2015), "Advances in Operator Cauchy—Schwarz inequalities and their reverses", Annals of Functional Analysis, 6 (3): 275–295, doi:10.15352/afa/06-3-20
  2. Bunyakovsky, Viktor (1859), "Sur quelques inégalités concernant les intégrales ordinaires et aux différences finies" (PDF), Mem. Acad. Sci. St. Petersbourg, 7 (1): 6
  3. Cauchy, A.-L. (1821), "Sur les formules qui résultent de l'emploi du signe > ou <, et sur les moyennes entre plusieurs quantités", Cours d'Analyse, 1re Partie: Analyse Algébrique 1821; Oeuvres Ser. 2, III, 373--377
  4. Dray, T. and Manogue, C.A., The Geometry of the Dot and Cross Products, Journal of Online Mathematics and Its Applications, 6.
  5. Gibbs, J.W. and Wilson, E.B., Vector Analysis: A Text-Book for the Use of Students of Mathematics & Physics: Founded Upon the Lectures of J. W. Gibbs, Nabu Press, 2010.
  6. Magnus, J. R. (1988). Linear Structures. Charles Griffin, London.
  7. Marcus, M. & Minc, H. (1992). A Survey of Matrix Theory and Matrix Inequalities. Dover Publications. Corrected reprint of the 1969 edition.
  8. Schwarz, H. A. (1888), "Über ein Flächen kleinsten Flächeninhalts betreffendes Problem der Variationsrechnung" (PDF), Acta Societatis Scientiarum Fennicae, XV: 318, archived (PDF) from the original on 2022-10-09
  9. Solomentsev, E. D. (2001) [1994], "Cauchy inequality", Encyclopedia of Mathematics, EMS Press
  10. Steele, J. M. (2004). The Cauchy–Schwarz Master Class: An Introduction to the Art of Mathematical Inequalities. Cambridge University Press.