Introduction to Linear Algebra with Mathematica

# Preface

A typical Sturm--Liouville problem for a self-adjoint differential operator of second order (this operator is referred to as the Sturm--Liouville operator)
$L\left[ x, \texttt{D} \right] = - \texttt{D}\,p(x) \texttt{D} + q(x)\, \texttt{I} , \qquad \texttt{D} = \frac{\text d}{{\text d}x} , \quad \texttt{I} = \texttt{D}^0 ,$
consists of a differential equation on an interval (𝑎, b) with a parameter λ
$$\label{EqOrtho.1} - \frac{\text d}{{\text d}x} \left( p(x)\,\frac{{\text d}y}{{\text d}x} \right) + q(x)\,y(x) = \lambda\,w(x)\,y(x) \qquad\mbox{or} \qquad \frac{\text d}{{\text d}x} \left( p(x)\,\frac{{\text d}y}{{\text d}x} \right) - q(x) \,y(x) + \lambda\,w(x)\,y(x) = 0 ,$$
subject to some boundary conditions
$$\label{EqOrtho.2} \left( \alpha_0 y(x) - \alpha_1 \frac{{\text d}y}{{\text d}x} \right)_{x=a} = 0, \quad \alpha_0^2 + \alpha_1^2 > 0, \qquad \left( \beta_0 y(x) + \beta_1 \frac{{\text d}y}{{\text d}x} \right)_{x=b} = 0 \quad \beta_0^2 + \beta_1^2 > 0.$$
Here p(x), q(x), and w(x) are continuous functions on an interval (𝑎, b), and p(x) is also assumed to have a continuous derivative on this interval. Boundary conditions of the third kind \eqref{EqOrtho.2} are chosen for illustration and could be of a different type (for example, Fourier series expansion is based on periodic boundary conditions). What is important is that the operator generated by the Sturm--Liouville equation \eqref{EqOrtho.1} and boundary conditions \eqref{EqOrtho.2} should be self-adjoint. So any boundary conditions that support the Lagrange identity (which guarantees that the operator is self-adjoint) can be used instead of \eqref{EqOrtho.2}. The positive function w(x) is called the weight or density function.

The problem of finding a (real or complex) number λ such that the boundary value problem (BVP for short) \eqref{EqOrtho.1}, \eqref{EqOrtho.2} has a nontrivial solution is called the Sturm--Liouville problem. Such values of the parameter λ are called eigenvalues, and the corresponding nontrivial (not identically zero) solutions are called eigenfunctions. When all parameters of the problem are nonnegative/positive, the Sturm--Liouville problem has nonnegative/positive eigenvalues. In this case, the (unbounded) linear differential operator $$L\left[ x, \texttt{D} \right]$$ is called nonnegative/positive.

Assume that the conditions on the parameters of the problem \eqref{EqOrtho.1}, \eqref{EqOrtho.2} are fulfilled to guarantee the existence of discrete eigenvalues 0 ≤ λ0 < λ1 < λ2 < ··· , to which correspond eigenfunctions ϕ0(x), ϕ1(x), ϕ2(x), …. With this in hand, we consider the problem of approximating a given function f(x), 𝑎 ≤ x ≤ b, by a linear combination of the eigenfunctions

$$\label{EqOrtho.3} f(x) = \sum_{n \ge 0} c_n \phi_n (x) , \qquad a \le x \le b .$$
Of course, it is always assumed that the function f(x) in expansion \eqref{EqOrtho.3} satisfies the homogeneous boundary conditions \eqref{EqOrtho.2}---otherwise convergence at the endpoints fails. The word "approximate" can have a variety of meanings. Normally, we want to choose the coefficients cn in such a way that the value of the partial sum
$$\label{EqOrtho.4} S_N (f; x) = \sum_{n=0}^N c_n \phi_n (x)$$
is near that of f(x) for each value of x in the interval [𝑎, b]. In this case we speak of pointwise approximation, so
$\lim_{N\to\infty} S_N (f; x) = f(x) .$

Pointwise convergence is usually needed in order to compute the value of a solution by the method of separation of variables (see next part). However, it is much easier to find coefficients cn in such a way that SN(f; x) only approximates f(x) in the sense of least squares or, more briefly, in the mean.

When the separation of variables method (see next chapter) is applied to partial differential equations, it leads to a corresponding Sturm--Liouville problem. However, this is only one part of constructing the solution to the (initial) boundary value problem. The second important part is the representation of a function as a series over eigenfunctions. In many problems for second order partial differential equations, the set of eigenfunctions is orthogonal and the coefficients in Eq.\eqref{EqOrtho.3} can be calculated explicitly. In the majority of equations of order higher than two, eigenfunctions are either not orthogonal or not complete and, therefore, the separation of variables approach is not successful. Nevertheless, we are going to treat a wide class of partial differential equations of the second order for which separation of variables reduces the problems under consideration to Sturm--Liouville problems having an orthogonal complete set of eigenfunctions.

# Orthogonality

Let us consider a set X of real- or complex-valued functions on a finite interval [𝑎, b]. For positive weight function w and two arbitrary functions f and g from X, we can define an inner product by
$$\label{EqOrtho.5} \langle f\,\vert\, g \rangle = \langle f\,, \, g \rangle = \int_a^b f(x)^{\ast} g(x)\,w(x)\,{\text d} x = \int_a^b \overline{f(x)}\, g(x)\,w(x)\,{\text d} x ,$$
where the asterisk or overline denotes the complex conjugate operation. Note that there are two common ways to denote the inner product: with a vertical bar (in physics) or with a comma (in mathematics). So if $$f(x) = u(x) + {\bf j} v(x)$$ is a complex-valued function, then $$\overline{f(x)} = u(x) - {\bf j} v(x)$$ is its complex conjugate. With the inner product so introduced and the corresponding norm, we can treat the functions in 𝔏² as if they were vectors, establishing a clear analogy with the Euclidean space.

Once an inner product is given, it generates a norm:

If f(x) is a function defined on an interval (finite or not) |𝑎, b|, we define the norm of f to be
$$\label{EqOrtho.6} \| f \| = \left\langle f(x) , f(x) \right\rangle^{1/2} = \left( \int_a^b \overline{f(x)}\,f(x)\,w (x) \,{\text d}x \right)^{1/2} .$$
The set of all functions on the interval |𝑎, b| having finite norm \eqref{EqOrtho.6} is denoted by 𝔏²(|𝑎, b|, w), L²(|𝑎, b|, w), or simply 𝔏² when the weight function w is understood.
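Although this tutorial works in Mathematica, the weighted inner product \eqref{EqOrtho.5} and the norm \eqref{EqOrtho.6} are easy to approximate numerically. The following Python sketch is purely illustrative (the functions f and the unit weight w are assumed examples, not taken from the text); it evaluates both with a simple trapezoid rule:

```python
import numpy as np

def trapezoid(y, x):
    # simple composite trapezoid rule
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

def inner(f, g, w, a, b, n=200001):
    # <f | g> = integral over [a, b] of conj(f(x)) g(x) w(x) dx
    x = np.linspace(a, b, n)
    return trapezoid(np.conj(f(x)) * g(x) * w(x), x)

def norm(f, w, a, b):
    # ||f|| = sqrt(<f, f>); the inner product <f, f> is real and nonnegative
    return np.sqrt(inner(f, f, w, a, b).real)

w = lambda x: np.ones_like(x)        # illustrative unit weight
f = lambda x: np.exp(1j * x)         # complex-valued example
print(norm(f, w, 0.0, 2.0 * np.pi)) # ~ sqrt(2 pi) ~ 2.5066
```

Since |exp(jx)|² = 1, the computed norm on [0, 2π] with unit weight reduces to √(2π), which is easy to check by hand.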

Recall that the integral of a function over an interval [𝑎, b] does not depend on the values of the function at a finite number of points because they do not affect the integral. Therefore, when one integrates a function over [𝑎, b], it does not matter what the values of the function at the endpoints are, or even whether the function is defined at these points. When we don't care about the endpoints, we use a special and convenient notation, |𝑎, b|, which represents one of the following cases:

$|a,b| \quad\mbox{ means } \quad [a,b] \quad\mbox{or}\quad [a,b) \quad\mbox{or}\quad (a,b] \quad\mbox{or}\quad (a,b) .$
Two functions f(x) and g(x) from 𝔏² defined on some interval [𝑎, b] are called orthogonal, if the integral of the product of functions over the interval is zero:
$\langle f\,\vert\, g \rangle = \left\langle f\,, g \right\rangle = \int_a^b \overline{f(x)}\,g(x)\,w(x)\,{\text d}x = \int_a^b {f(x)}^{\ast} g(x)\,w(x) \,{\text d}x =0 .$
This property of functions being orthogonal is abbreviated as f ⊥ g.

Generally speaking, 𝔏²(|𝑎, b|, w) is the completion of the space of continuous functions with respect to this norm, but we will not go into that here. To guarantee completeness of 𝔏²(|𝑎, b|, w), we need the integration in the definition of the norm ∥·∥ to be in the Lebesgue sense rather than the Riemann one. Actually, the set of functions 𝔏² is a Hilbert space, and its norm (a measure of the size of a function) is generated by the inner product \eqref{EqOrtho.5}.

𝔏²(|𝑎, b|, w) is an example of an infinite-dimensional linear vector space, whose elements cannot be built up from linear combinations of a finite number of basis functions. For example, Eq.\eqref{EqOrtho.3} is an infinite sum; when does it converge, and does a convergent sum always give the right answer? Not every infinite collection of orthogonal functions is rich enough to expand every function in 𝔏²(|𝑎, b|, w); this is the question of completeness. How do we generate suitable sets of orthogonal functions? How do we get the coefficients cn when we know the function f and the basis functions { ϕn(x) }? We take up these questions in turn.

We answer the last question first, for an arbitrary set of functions { ϕn(x) }, requiring only the orthogonality property.

A collection of functions { ϕ0(x), ϕ1(x), … , ϕn(x), … } defined on |𝑎, b| is called orthogonal on |𝑎, b| if
$\left\langle \phi_i , \phi_j \right\rangle = \int_a^b \overline{\phi_i (x)}\,\phi_j (x) \,w (x)\,{\text d}x = 0 \qquad \mbox{for } i \ne j.$
A collection of functions { ϕ0(x), ϕ1(x), … , ϕn(x), … } defined on |𝑎, b| is called orthonormal on |𝑎, b| if
$\left\langle \phi_i , \phi_j \right\rangle = \int_a^b \overline{\phi_i (x)}\,\phi_j (x) \,w (x)\,{\text d}x = \begin{cases} 0, & \ \mbox{ if } i \ne j , \\ 1 , & \ \mbox{ when } i=j . \end{cases}$
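As a quick numerical sanity check of these definitions (the sine family and unit weight are an illustrative choice, not part of the text): the functions sin(n x) are orthogonal on [0, π], but not orthonormal, since their squared norms equal π/2 rather than 1.

```python
import numpy as np

x = np.linspace(0.0, np.pi, 100001)

def trapezoid(y, x):
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

def ip(i, j):
    # <phi_i, phi_j> with phi_n(x) = sin(n x) and unit weight w(x) = 1
    return trapezoid(np.sin(i * x) * np.sin(j * x), x)

print(ip(1, 2))   # ~ 0: orthogonal
print(ip(3, 3))   # ~ pi/2: orthogonal but not orthonormal
```

Dividing each sin(n x) by its norm √(π/2) would turn this orthogonal family into an orthonormal one.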

# Least Squares Approximation

Let us denote by 𝔏²([𝑎, b], w) the set of real- or complex-valued functions f(x) having finite norm
$\| f(x) \|_2 = \left( \int_a^{b} \left\vert f(x) \right\vert^2 w (x)\,{\text d} x \right)^{1/2} , \qquad w(x) > 0 \quad\mbox{for } x\in [a,b].$

The least squares approximation means that the integral

$\| f(x) - S_N (f; x) \|^2 = \int_0^{\ell} \left\vert f(x) - S_N (f; x) \right\vert^2 w (x)\,{\text d} x$
approaches zero as N → ∞. The smallness of this integral does not imply that SN(f; x) is close to f(x) for all x. It does mean that SN(f; x) is close to f(x) except for a set of intervals whose total length is small. If
$\lim_{N\to \infty} \| f(x) - S_N (f; x) \|^2 = \lim_{N\to \infty} \int_0^{\ell} \left\vert f(x) - S_N (f; x) \right\vert^2 w (x)\,{\text d} x = 0,$
we say that the sequence SN(f; x) converges in the mean to f(x). This integral is called the mean square deviation of SN(f; x) from f(x).

We consider the problem of approximating f(x) in the sense of least squares for a restricted class of eigenfunctions { ϕn(x) }n≥0. In particular, we assume that this class of eigenfunctions is complete in 𝔏²([0, ℓ], ρ) and that eigenfunctions corresponding to distinct eigenvalues are orthogonal on the interval (0, ℓ) with respect to the positive weight function ρ(x):

$\langle \phi_n (x), \phi_m (x) \rangle = \int_0^{\ell} \overline{\phi_n (x)}\,\phi_m (x) \,\rho (x)\,{\text d}x = 0, \qquad n\ne m.$

Example 1: We consider a set of Zernike polynomials that are orthogonal on the unit disk. They are named after optical physicist Frits Zernike (1888--1966), winner of the 1953 Nobel Prize in Physics and the inventor of phase-contrast microscopy. These polynomials play important roles in various optics branches such as beam optics and imaging.

There are even and odd Zernike polynomials. The odd Zernike polynomials are defined as

$Z_n^{-m} (r, \theta) = R_n^m (r)\,\sin (m\theta )$
(odd in the azimuthal angle θ), where m and n are nonnegative integers with n ≥ m ≥ 0, and the even Zernike polynomials are defined as
$Z_n^m (r, \theta) = R_n^m (r)\,\cos (m\theta ) .$
Here 0 ≤ r ≤ 1, and Rnm(r) is the radial polynomial: for even n − m,
$R_n^m (r) = \sum_{k=0}^{(n-m)/2} \frac{(-1)^k \,(n-k)!}{k!\left( \frac{n+m}{2} - k \right)! \left( \frac{n-m}{2} - k \right)!}\, r^{n-2k} ,$
while Rnm ≡ 0 for odd n − m.
End of Example 1
Using orthogonality of the eigenfunctions (and assuming, for simplicity, that f and the coefficients ck are real), we get
$\| f(x) - S_N (x) \|^2 = \int_0^{\ell} \left\vert f(x) - \sum_{k=0}^N c_k \phi_k (x) \right\vert^2 \rho(x)\,{\text d} x = \int_0^{\ell} \left\vert f(x) \right\vert^2 \rho(x)\,{\text d} x -2 \sum_{k=0}^N c_k \int_0^{\ell} \overline{f(x)}\,\phi_k (x) \rho(x)\,{\text d} x + \sum_{k=0}^N c_k^2 \int_0^{\ell} \left\vert \phi_k (x) \right\vert^2 \rho(x)\,{\text d} x .$
If we complete the square, the right-hand side becomes
$\sum_{k=0}^N \int_0^{\ell} \left\vert \phi_k (x) \right\vert^2 \rho(x)\,{\text d} x \left\{ c_k - \| \phi_k \|^{-2} \int_0^{\ell} \overline{f(x)}\,\phi_k (x)\, \rho(x)\,{\text d} x \right\}^2 + \| f(x) \|^2 - \sum_{k=0}^N \| \phi_k \|^{-2} \left( \int_0^{\ell} \overline{f(x)}\,\phi_k (x)\, \rho(x)\,{\text d} x \right)^2 .$
We rewrite this expression using the inner product:
$\sum_{k=0}^N \| \phi_k (x) \|^2 \left\{ c_k - \frac{1}{\| \phi_k \|^2}\,\langle f(x), \phi_k (x) \rangle \right\}^2 + \| f(x) \|^2 - \sum_{k=0}^N \frac{\langle f(x), \phi_k \rangle^2}{\| \phi_k (x) \|^2} .$
Coefficients ck occur only in the first sum. Since this is a sum of squares, it is clearly minimized by making all its terms zero. In other words, we are forced to set
$$\label{EqOrtho.7} c_k = \frac{\langle f(x), \phi_k \rangle}{\| \phi_k (x) \|^2_2} = \frac{1}{\| \phi_k (x) \|^2_2} \,\int_0^{\ell} \overline{f(x)}\,\phi_k (x) \rho(x)\,{\text d} x , \qquad \| \phi_k (x) \|^2 = \int_0^{\ell} \left\vert \phi_k (x) \right\vert^2 \rho(x)\,{\text d} x .$$
Therefore, this choice of the coefficients minimizes the error $$\| f(x) - \sum_{k=0}^N c_k \phi_k (x) \|^2 .$$ We have shown that formula \eqref{EqOrtho.7} provides the best approximation to f(x) in the sense of least squares. This choice of coefficients is independent of N, the number of terms; this conclusion follows from the orthogonality of the eigenfunctions. The coefficients ck given by \eqref{EqOrtho.7} are called the Fourier coefficients of f(x) with respect to the orthogonal functions { ϕk(x) }. The series $$\displaystyle f(x) \sim \sum_{k\ge 0} c_k \phi_k (x)$$ is called the Fourier series of f(x). Since this series may or may not converge (its meaning will be discussed separately), we cannot claim that $$\displaystyle f(x) = \sum_{k\ge 0} c_k \phi_k (x) .$$ It is common to use the tilde notation
$f(x) \sim \sum_{k\ge 0} c_k \phi_k (x)$
simply to denote the fact that ck are Fourier coefficients defined by Eq.\eqref{EqOrtho.7}, and the Fourier series gives a procedure to recover the function f(x) from its Fourier coefficients.
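To see the recipe in action, here is a small numerical illustration (the choices f(x) = x on [0, π], the sine system ϕk(x) = sin(kx), and unit weight are assumptions for this sketch): the computed ratios ⟨f, ϕk⟩/∥ϕk∥² reproduce the closed-form coefficients 2(−1)^{k+1}/k, and the mean-square error of the partial sum shrinks as more terms are added.

```python
import numpy as np

x = np.linspace(0.0, np.pi, 100001)
f = x.copy()                              # function to expand: f(x) = x

def trapezoid(y, x):
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

def coeff(k):
    # c_k = <f, phi_k> / ||phi_k||^2 with phi_k(x) = sin(k x), unit weight
    phi = np.sin(k * x)
    return trapezoid(f * phi, x) / trapezoid(phi * phi, x)

print([round(coeff(k), 4) for k in (1, 2, 3)])   # ~ [2.0, -1.0, 0.6667]

# partial sum S_50 approximates f in the mean
S = sum(coeff(k) * np.sin(k * x) for k in range(1, 51))
mse = trapezoid((f - S) ** 2, x)
print(mse)   # modest, and it keeps decreasing as more terms are added
```

The error decays slowly here because f(π) = π does not satisfy the homogeneous condition at the right endpoint, exactly the caveat mentioned after Eq.\eqref{EqOrtho.3}.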

Note: Formula \eqref{EqOrtho.7} for the Fourier coefficients can be derived directly if the series $$\displaystyle \sum_{k\ge 0} c_k \phi_k (x)$$ converges to f(x) uniformly. If this were true, we could multiply the series by ϕn(x) ρ(x) and integrate from 0 to ℓ term by term. Because of orthogonality, all the integrals are zero except for k = n. Hence
$\langle f(x), \phi_n (x)\rangle = \int_0^{\ell} \overline{f(x)}\,\phi_n (x)\,\rho (x)\,{\text d}x = c_n \int_0^{\ell} \left\vert \phi_n (x) \right\vert^2 \rho (x) \,{\text d}x = c_n \| \phi_n (x) \|^2_2 ,$
which gives Eq.\eqref{EqOrtho.7}.

# Completeness and the Parseval identity

By substituting the Fourier coefficients \eqref{EqOrtho.7} into the error estimate, we see that
$\| f - S_N \|^2_2 = \int_0^{\ell} \left\vert f(x) - \sum_{k= 0}^N c_k \phi_k (x) \right\vert^2 \rho (x)\,{\text d}x = \int_0^{\ell} \left\vert f(x) \right\vert^2 \rho (x)\,{\text d}x - \sum_{k= 0}^N c_k^2 \int_0^{\ell} \left\vert \phi_k (x) \right\vert^2 \rho (x)\,{\text d}x = \| f(x) \|^2_2 - \sum_{k= 0}^N c_k^2 \| \phi_k (x) \|^2_2 .$
Since the left-hand side is nonnegative, we have
$\sum_{k= 0}^N c_k^2 \int_0^{\ell} \left\vert \phi_k (x) \right\vert^2 \rho (x)\,{\text d}x \le \int_0^{\ell} \left\vert f(x) \right\vert^2 \rho (x)\,{\text d}x = \| f(x) \|^2_2$
for any N. Letting N approach infinity, we obtain the so-called Bessel inequality:
$$\label{EqOrtho.8} \sum_{k\ge 0} c_k^2 \| \phi_k (x) \|^2_2 = \sum_{k\ge 0} c_k^2 \int_0^{\ell} \left\vert \phi_k (x) \right\vert^2 \rho (x)\,{\text d}x \le \int_0^{\ell} \left\vert f(x) \right\vert^2 \rho (x)\,{\text d}x = \| f(x) \|^2_2 .$$
The partial sums corresponding to the left-hand side are nondecreasing in N and are bounded from above by the norm ∥f(x)∥². Therefore, the series converges whenever the norm ∥f(x)∥² is finite, that is, whenever f(x) ∈ 𝔏². We may allow f(x) or ρ(x) to be discontinuous or even unbounded at some points as long as this integral (possibly defined as an improper integral) is finite.
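Bessel's inequality can be strict, and the gap is visible numerically. In this illustrative sketch (the function f(x) = e^x and the cosine-only system on [−π, π] are assumptions, not from the text), the cosine system captures only the even part of f, so the left-hand sum stalls well below ∥f∥²:

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 200001)
f = np.exp(x)                         # illustrative f with a nonzero odd part

def trapezoid(y, x):
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

norm_f_sq = trapezoid(f * f, x)       # ||f||^2 with unit weight

# cosine-only system {1, cos x, cos 2x, ...}: orthogonal on [-pi, pi], not complete
bessel = 0.0
for phi in [np.ones_like(x)] + [np.cos(k * x) for k in range(1, 60)]:
    nk = trapezoid(phi * phi, x)      # ||phi_k||^2
    ck = trapezoid(f * phi, x) / nk   # Fourier coefficient of f
    bessel += ck * ck * nk

print(bessel, norm_f_sq)  # strict inequality: the sine part of f is unaccounted for
```

The retained sum equals the energy of the even part of f, here ∥cosh x∥², which is about half of ∥e^x∥² on this interval.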

Example 3: Consider the collection of cosine functions { 1, cos(x), cos(2x), cos(3x), … } on an interval of length 2π, which we know from the previous sections is orthogonal. We have

$\| 1 \|^2 = \int_{-\pi}^{\pi} 1^2 {\text d}x = 2\pi \qquad\Longrightarrow \qquad \| 1 \| = \sqrt{2\pi} .$
For k ≥ 1, we have
$\| \cos (kx) \|^2 = \int_{-\pi}^{\pi} \cos^2 (kx)\,{\text d}x = \int_{-\pi}^{\pi} \frac{1 + \cos (2kx)}{2}\,{\text d}x = \left[ \frac{x}{2} + \frac{\sin (2kx)}{4k} \right]_{x=-\pi}^{x=\pi} = \pi .$
Hence, $$\| \cos (kx) \| = \sqrt{\pi} .$$ It follows that the collection of functions
$\left\{ \frac{1}{\sqrt{2\pi}} , \frac{\cos x}{\sqrt{\pi}} , \frac{\cos 2x}{\sqrt{\pi}} , \frac{\cos 3x}{\sqrt{\pi}} , \ldots \right\}$
is orthonormal on any interval of length 2π. ■
An orthogonal system of functions { ϕk(x) }k≥0 defined on an interval (finite or not) |𝑎, b| is said to be complete relative to a class of functions if there is no nontrivial function in that class that is orthogonal to each member of the collection. In other words, if f(x) on |𝑎, b| satisfies $$\langle f, \phi_k \rangle =0$$ for all k ≥ 0, then f(x) ≡ 0 on [𝑎, b], or, more precisely, $$\displaystyle \| f \|^2_2 = \int_a^b \left\vert f (x) \right\vert^2 \rho (x)\, {\text d}x =0 .$$

The term "complete" was introduced in 1910 by the Russian mathematician Vladimir Andreevich Steklov (1864--1926), a student of Alexander Lyapunov.

Example 4: The collection of cosine functions { 1, cos(x), cos(2x), cos(3x), … } on an interval of length 2π considered previously is orthogonal but not complete. Indeed, if f is any odd function (for instance, x or sin x), then $$\displaystyle f(-x) = -f(x) .$$ Calculations show that $$\langle f, \cos (kx) \rangle =0$$ because the product of an odd function and an even function is an odd function. ■

Theorem 1: If $$\displaystyle \left\{ \phi_k (x) \right\}_{k\ge 0}$$ is a complete orthogonal system of functions on the interval |𝑎, b|, then every (piecewise continuous or, more generally, square integrable) function f(x) on |𝑎, b| has the expansion
$f(x) = \sum_{n\ge 0} \frac{\langle f, \phi_n \rangle}{\| \phi_n \|^2_2} \, \phi_n (x) .$
The series on the right-hand side converges to f(x) in the 𝔏² sense, which means
$\lim_{N \to \infty} \int_a^b \left\vert f(x) - \sum_{k=1}^N \frac{\langle f, \phi_k \rangle}{\| \phi_k \|^2}\,\phi_k (x) \right\vert^2 \rho (x)\,{\text d}x =0 .$
An important remark:

The orthogonal expansion $$\displaystyle f(x) \sim \sum_{n\ge 0} c_n \phi_n (x)$$ for a complete orthogonal system on [𝑎, b] holds in the 𝔏² sense, but not necessarily pointwise, i.e., for a fixed x∈[𝑎, b] the series on the right-hand side might not converge and, even if it does, it might not converge to f(x).

Theorem 2: Let ϕ0, ϕ1, ϕ2, …, ϕn, … be the eigenfunctions of the self-adjoint Sturm--Liouville problem \eqref{EqOrtho.1}, \eqref{EqOrtho.2}. Let f and its derivative $$f'$$ be piecewise continuous on the closed interval $$0 \le x \le \ell .$$ Then the series
$f(x) = \sum_{n\ge 1} c_n \phi_n (x)$
whose coefficients cn are given by Eq.\eqref{EqOrtho.7} converges to $$\displaystyle \frac{1}{2} \left[ f(x+0) + f(x-0) \right]$$ at each point in the open interval (0,ℓ).

Example 6: Other orthogonal and complete sets of functions are known that are used in areas other than differential equations. In particular, the Li--Torney system of step functions is very useful in computer science. The collection of Walsh functions forms a complete orthogonal set of functions that can be used to represent any discrete function---just like trigonometric functions can be used to represent any continuous function in Fourier analysis. These functions, as well as the Walsh--Hadamard code, are named after the American mathematician Joseph L. Walsh (1895--1973). Applications of the Walsh functions can be found wherever digit representations are used, including speech recognition, medical and biological image processing, and digital holography. The Haar wavelet is a sequence of rescaled "square-shaped" functions which together form a wavelet family or basis. Wavelet analysis is similar to Fourier analysis. The Haar sequence was proposed in 1909 by the Hungarian mathematician Alfréd Haar (1885--1933).

The Faber–Schauder system is a Schauder basis for the space C([0, 1]) of continuous functions on [0, 1]. The Franklin system is obtained from the Faber--Schauder system by the Gram–Schmidt orthonormalization procedure.

Theorem (Parseval): Let $$\displaystyle \left\{ \phi_k (x) \right\}_{k\ge 0}$$ be a complete orthogonal system of functions on the interval |𝑎, b|, and f ∈ 𝔏²([𝑎, b], ρ). Then
$\sum_{k\ge 0} c_k^2 \| \phi_k (x) \|_2^2 = \| f(x) \|_2^2 \qquad\mbox{or} \qquad \sum_{k\ge 0} c_k^2 \int_a^b \left\vert \phi_k (x) \right\vert^2 \rho (x) \,{\text d} x = \int_a^b \left\vert f(x) \right\vert^2 \rho (x) \,{\text d} x .$
This theorem originated in the 1799 work of the French mathematician Marc-Antoine Parseval (1755--1836). It is also known as Rayleigh's energy theorem, or Rayleigh's identity, because the integral $$\displaystyle E[f] = \frac{1}{\ell} \int_{-\ell}^{\ell} \left\vert f (x) \right\vert^2 {\text d}x = \| f \|^2 / \ell$$ usually represents the energy in applications. Therefore, the Parseval theorem can be interpreted as saying that the energy in f equals the energy in its Fourier coefficients. The space formed by all functions with finite energy (that is, square integrable functions, usually denoted by L²) is the archetypal example of a Hilbert space.

Parseval's identity holds if and only if

$\lim_{N\to\infty} \int_a^b \left\vert f(x) - \sum_{n=0}^N c_n \phi_n (x) \right\vert^2 \rho (x) \,{\text d}x = 0.$
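The following sketch illustrates the identity numerically (the choices f(x) = x on [−π, π] and the full trigonometric system are assumptions for this example): the left-hand side of Parseval's identity approaches ∥f∥² from below as terms are added.

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 20001)
f = x.copy()                              # expand f(x) = x in the full trig system

def trapezoid(y, x):
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

norm_f_sq = trapezoid(f * f, x)           # = 2 pi^3 / 3 ~ 20.671

total = trapezoid(f, x) ** 2 / (2.0 * np.pi)   # constant-term share (zero: f is odd)
for n in range(1, 101):
    for phi in (np.cos(n * x), np.sin(n * x)):
        nk = trapezoid(phi * phi, x)
        cn = trapezoid(f * phi, x) / nk
        total += cn * cn * nk

print(total, norm_f_sq)   # partial Parseval sum approaches ||f||^2 from below
```

With 100 harmonics the remaining gap is roughly the tail Σ_{n>100} 4π/n² ≈ 0.12; adding more terms closes it further, as Parseval's identity requires for a complete system.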

Example 7: Expand the function

$f(x) =x, \qquad 0\le x \le 1,$
in terms of the eigenfunctions { ϕn } of the Sturm--Liouville problem
$y'' + \lambda\,y =0, \qquad y(0) -y'(0) =0, \quad y'(1) =0.$
This problem occurs, for instance, in the heat conduction problem in a bar or rod of unit length. The boundary condition at x = 0 corresponds to a rate of heat flow that is proportional to the temperature, and units are chosen so that the constant of proportionality is 1. The boundary condition at x = 1 corresponds to a perfect insulation there.

The solution of the differential equation $$y'' + \lambda\,y =0$$ may have one of three forms, depending on the sign of λ, so it is necessary to consider three cases. Let us start with λ = 0; then the general solution is a linear function

$y = a+ b\,x ,$
with some constants a and b. The boundary conditions require that
$a-b =0 \qquad\mbox{and}\qquad b=0,$
respectively. The only solution is a = b = 0; hence, the boundary value problem has only trivial solution. So λ = 0 is not an eigenvalue.

If λ is negative, we can let λ = -μ² so that μ > 0. Then the differential equation for y becomes

$y'' - \mu^2 y =0$
and its general solution is
$y(x) = a\,\cosh (\mu x) + b\,\sinh (\mu x) ,$
with some constants a and b. From the boundary conditions
$\begin{split} y(0) - y'(0) =0 \qquad \Longrightarrow \qquad a - \mu\,b =0 , \\ y'(1) =0 \qquad \Longrightarrow \qquad a\,\sinh (\mu ) + b\,\cosh (\mu ) =0 . \end{split}$
We express a through b from the first equation, a = μ b, and substitute into the last equation:
$b\,\mu\,\sinh (\mu ) + b\,\cosh (\mu ) =0 \qquad \Longrightarrow \qquad \mu = - \coth (\mu ) .$
From the following figure
Plot[{x, -Coth[x]}, {x, 0, 3}, PlotStyle -> Thick]
it is clear that these two graphs do not intersect for μ > 0 (the line is positive while −coth(μ) is negative), so there are no negative eigenvalues.

Next consider λ > 0, then the general solution is

$y(x) = a\,\cos \left( \sqrt{\lambda}x \right) + b\, \sin \left( \sqrt{\lambda}x \right) , \qquad \lambda > 0.$
The boundary condition at x = 0 requires
$a -b\,\sqrt{\lambda} =0 \qquad \Longrightarrow \qquad a = b\,\sqrt{\lambda} .$
From the boundary condition at x = 1 we obtain
$-a\,\sin \left( \sqrt{\lambda} \right) +b\, \cos \left( \sqrt{\lambda} \right) =0 \qquad \Longrightarrow \qquad \sqrt{\lambda} \,\sin \left( \sqrt{\lambda} \right) = \cos \left( \sqrt{\lambda} \right) .$
For nontrivial solution y we must have b ≠ 0 and λ must satisfy
$\sqrt{\lambda} = \cot \sqrt{\lambda} .$
Solutions of the above transcendental equation can be determined numerically.
FindRoot[Cot[Sqrt[x]] == Sqrt[x], {x, 1}]
{x -> 0.740174}
They can also be found approximately by sketching the graphs of $$f(\mu ) = \cot (\mu )$$ and $$g(\mu ) = \mu ,$$ where $$\mu = \sqrt{\lambda} > 0 .$$
Plot[{x, Cot[x]}, {x, 0, 10}, PlotStyle -> Thick]
Then squaring the abscissas of the points of intersection, we determine the eigenvalues. Using Mathematica, we find the first several roots:
$\lambda_1 \approx 0.740174, \quad \lambda_2 \approx 11.7349, \quad \lambda_3 \approx 41.4388 , \quad \lambda_4 \approx 90.8082, \ldots .$
The other roots are given with reasonable accuracy by
$\lambda_n \approx \left[ (n-1)\pi \right]^2 , \quad n= 5,6,\ldots .$
Finally, the eigenfunction corresponding to the eigenvalue λn is
$\phi_n (x) = \sqrt{\lambda_n} \,\cos \left( \sqrt{\lambda_n} \,x \right) + \sin \left( \sqrt{\lambda_n} \,x \right) , \qquad n=1,2,\ldots .$

To find the expansion for f in terms of the eigenfunctions, we write

$f(x) = \sum_{n\ge 1} c_n \phi_n (x) = \sum_{n\ge 1} c_n \left[ \sqrt{\lambda_n} \,\cos \left( \sqrt{\lambda_n} \,x \right) + \sin \left( \sqrt{\lambda_n} \,x \right) \right] ,$
where the coefficients are given by
$\| \phi_n \|^2 c_n = \int_0^1 f(x)\,\phi_n (x) \,{\text d}x = \int_0^1 x \left[ \sqrt{\lambda_n} \,\cos \left( \sqrt{\lambda_n} \,x \right) + \sin \left( \sqrt{\lambda_n} \,x \right) \right] {\text d}x$
Mathematica will do this tedious job for you:
Integrate[x*(Sqrt[r]*Cos[Sqrt[r]*x] + Sin[Sqrt[r]*x]), {x, 0, 1}]
(-Sqrt[r] + (1 + r) Sin[Sqrt[r]])/r
Integrate[(Sqrt[r]*Cos[Sqrt[r]*x] + Sin[Sqrt[r]*x])^2, {x, 0, 1}]
1/4 (2 + 2 (1 + r) - 2 Cos[2 Sqrt[r]] + ((-1 + r) Sin[2 Sqrt[r]])/ Sqrt[r])
Therefore,
$c_n = 4\,\frac{\left( \frac{1 + \lambda_n}{\lambda_n} \right) \cos \sqrt{\lambda_n} -1}{\left( \lambda_n -1 \right) \sin \left( 2 \sqrt{\lambda_n} \right) + 2 \sqrt{\lambda_n} + 2 \sqrt{\lambda_n} \left( 1 + \lambda_n \right) - 2\sqrt{\lambda_n} \,\cos \left( 2 \sqrt{\lambda_n} \right)} , \quad n=1,2,\ldots .$
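The Mathematica computations above can be cross-checked independently. This Python sketch (a bisection root-finder is an assumption of this sketch, not the method used in the text) recomputes the eigenvalues from √λ = cot √λ and then evaluates the closed-form coefficients cn:

```python
import math

def mu_root(n, tol=1e-12):
    # n-th positive root of cot(mu) = mu, bracketed in ((n-1) pi, (n-1) pi + pi/2),
    # where g(mu) = cot(mu) - mu decreases from +infinity to a negative value
    g = lambda m: 1.0 / math.tan(m) - m
    lo = (n - 1) * math.pi + 1e-9
    hi = (n - 1) * math.pi + math.pi / 2 - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lams = [mu_root(n) ** 2 for n in range(1, 5)]
print([round(lam, 4) for lam in lams])   # ~ [0.7402, 11.7349, 41.4388, 90.8082]

def c(lam):
    # closed-form coefficient c_n derived above (valid at the eigenvalues)
    s = math.sqrt(lam)
    num = 4.0 * ((1.0 + lam) / lam * math.cos(s) - 1.0)
    den = ((lam - 1.0) * math.sin(2.0 * s) + 2.0 * s
           + 2.0 * s * (1.0 + lam) - 2.0 * s * math.cos(2.0 * s))
    return num / den

print([round(c(lam), 4) for lam in lams])
```

The recovered eigenvalues agree with the FindRoot output quoted above, and they approach the asymptotic values [(n−1)π]² as n grows.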

# Riemann--Lebesgue Lemma

Bessel's inequality shows that for any square integrable function f ∈ 𝔏²([0, ℓ], ρ), the series
$\sum_{n\ge 0} c_n \phi_n (x) , \qquad c_n = \frac{\langle f(x), \phi_n (x) \rangle}{\| \phi_n (x) \|^2} ,$
converges. Here { ϕn(x) }n≥0 is any orthogonal system in 𝔏²([0, ℓ], ρ). A general term of any convergent series tends to zero:
$c_n = \frac{\langle f(x), \phi_n (x) \rangle}{\| \phi_n (x) \|^2} \,\to 0 \qquad \mbox{as} \quad n \to\infty .$
We present a slight extension of this conclusion.
Riemann--Lebesgue Lemma: If a normalized orthogonal sequence $$\displaystyle \frac{\phi_n (x)}{\| \phi_n (x) \|_2}$$ is uniformly bounded
$\left\vert \frac{\phi_n (x)}{\| \phi_n (x) \|_2} \right\vert \le K$
and if f(x) is such that the (possibly improper) integral
$\| f(x) \|_1 = \int_a^b \left\vert f(x) \right\vert \rho (x)\,{\text d} x$
is finite, then
$\frac{\langle f(x), \phi_n (x) \rangle}{\| \phi_n (x) \|_2} \,\to 0 \qquad \mbox{as} \quad n \to\infty .$
For a function f(x) ∈ 𝔏¹([𝑎, b], ρ), we define its truncated version for any positive A:
$f_A (x) = \begin{cases} f(x) , & \ \mbox{ if} \quad |f(x) | \le A , \\ 0, & \ \mbox{ if} \quad |f(x) | > A. \end{cases}$
Then by definition of an improper integral
$\lim_{A \to +\infty} \int_a^b \left\vert f(x) - f_A (x) \right\vert \rho (x)\,{\text d}x = 0.$
Therefore, given an ε > 0, we can find an A such that
$\int_a^b \left\vert f(x) - f_A (x) \right\vert \rho (x)\,{\text d}x \le \frac{\epsilon}{2K} .$
Then
$\left\vert\frac{\langle f(x), \phi_n (x) \rangle}{\| \phi_n (x) \|_2} - \frac{\langle f_A (x), \phi_n (x) \rangle}{\| \phi_n (x) \|_2} \right\vert < \frac{\epsilon}{2}$
for all n. Furthermore, since |fA(x)| ≤ A, we have
$\int \left\vert f_A (x) \right\vert^2 \rho (x)\,{\text d} x \le A \int \left\vert f_A (x) \right\vert \rho (x)\,{\text d} x \le A \int \left\vert f (x) \right\vert \rho (x)\,{\text d} x ,$
which is finite. Hence fA ∈ 𝔏², and Bessel's inequality guarantees that its Fourier coefficients tend to zero, so
$\left\vert \frac{\langle f_A (x), \phi_n (x) \rangle}{\| \phi_n (x) \|_2} \right\vert \le \frac{\epsilon}{2} .$
for n sufficiently large. Combining this with the previously established estimate
$\left\vert\frac{\langle f(x), \phi_n (x) \rangle}{\| \phi_n (x) \|_2} - \frac{\langle f_A (x), \phi_n (x) \rangle}{\| \phi_n (x) \|_2} \right\vert < \frac{\epsilon}{2}$
we obtain
$\left\vert \frac{\langle f (x), \phi_n (x) \rangle}{\| \phi_n (x) \|_2} \right\vert \le \epsilon$
for n sufficiently large.
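A numerical illustration of the lemma (the function x^{−1/2}, which lies in 𝔏¹(0, 1) but not in 𝔏²(0, 1), and the sine system on (0, 1) are illustrative assumptions): its sine coefficients still tend to zero, albeit slowly, roughly like n^{−1/2}. The substitution x = t² tames the singularity before integrating.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 400001)

def trapezoid(y, x):
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

def coeff(n):
    # <f, phi_n> for f(x) = x**(-0.5) and phi_n(x) = sin(n pi x) on (0, 1);
    # the substitution x = t**2 turns the improper integral into a proper one:
    # integral of x**(-0.5) sin(n pi x) dx  =  2 * integral of sin(n pi t**2) dt
    return trapezoid(2.0 * np.sin(n * np.pi * t ** 2), t)

vals = [coeff(n) for n in (1, 10, 100, 1000)]
print([round(v, 4) for v in vals])   # slow decay toward zero
```

Since this f is not square integrable, Bessel's inequality alone says nothing here; the decay of the coefficients is exactly what the Riemann--Lebesgue lemma adds.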

# Convergence

Formula \eqref{EqOrtho.7} tells us that there is a mapping
$ℱ: f(x) \mapsto {\bf c} = \left[ c_0 , c_1 , c_2 , \ldots \right]$
of the Hilbert space 𝔏²([𝑎, b], ρ) of square integrable functions into the set ℓ² of all sequences (infinite vectors) such that $$\displaystyle \sum_{k\ge 0} c_k^2 \| \phi_k (x) \|_2^2$$ is finite. Then the infinite series
$\sum_{k\ge 0} c_k \phi_k (x) = \sum_{k\ge 0} \frac{\langle f(x), \phi_k (x) \rangle}{\| \phi_k (x) \|_2^2}\, \phi_k (x)$
provides the inverse transform ℱ−1 that assigns a function to every infinite vector from ℓ². Restoration of a function from its sequence of Fourier coefficients is an ill-posed problem; therefore, several regularizations are known that give meaning to the infinite series above.

It is most natural to consider pointwise convergence, where we examine the discrepancy at every point. It is difficult (maybe impossible) to provide general results about pointwise convergence for general orthogonal functions, but for Fourier series much has been done. We have to ask that f(x) be considerably smoother than merely belonging to 𝔏², which is a "big" space of functions. You might think having f(x) continuous would be enough, but it is not: there exist continuous functions whose Fourier series diverge at some points. Here is a slightly different statement of the conditions, adapted from Keener:

Suppose the function f(x) has a continuous first derivative on the interval, except possibly at a finite number of points at which f(x) has a jump and the left and right derivatives exist. Then the Fourier series of f converges to ½(f(x+)+f(x−)) at every point of the open interval (0,ℓ). At x = 0 and x = ℓ, the series converges to ½(f(0+)+f(ℓ−)).

The most remarkable result regarding pointwise convergence almost everywhere was obtained by Hans Rademacher (1922) and Dmitrii Menshov (1923).

Rademacher–Menshov Theorem: If the coefficients cn of a Fourier series $$\displaystyle \sum_n c_n \phi_n (x)$$ with respect to an orthogonal system { ϕn(x) } on an interval satisfy
$\sum_{n\ge 0} \left\vert c_n \right\vert^2 \log^2 (n+1) < \infty ,$
then the Fourier series converges almost everywhere.

Another form of convergence is uniform convergence. This used to be the gold standard of convergence. For continuous functions, you can measure the maximum difference between the partial sum and the function:

$\lim_{N\to\infty} \| f(x) - S_N (f;x) \|_{\infty} = \lim_{N\to\infty} \sup_x \left\vert f(x) - S_N (f;x) \right\vert = 0$
The maximum discrepancy is another kind of norm, called the uniform norm (also called the sup norm, for supremum, which is the mathematical way of defining a maximum for discontinuous functions). Pointwise convergence does not imply uniform convergence! The Gibbs phenomenon, which is discussed later, is the most famous example of this. But uniform convergence does imply pointwise convergence.
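The contrast between the two norms is easy to observe numerically (a sketch; the square wave sign(x) on [−π, π] is the classical illustrative example, not taken from the text): the mean-square error of the Fourier partial sums goes to zero, while the sup-norm error stays of order one near the jump.

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 100001)
f = np.sign(x)                       # jump at x = 0

def trapezoid(y, x):
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

def partial_sum(N):
    # Fourier sine series of the square wave: coefficients 4/(pi k) for odd k
    return sum(4.0 / (np.pi * k) * np.sin(k * x) for k in range(1, N + 1, 2))

sups, mses = [], []
for N in (9, 99, 999):
    err = f - partial_sum(N)
    sups.append(np.max(np.abs(err)))        # uniform (sup-norm) error
    mses.append(trapezoid(err ** 2, x))     # mean-square error
print([round(s, 3) for s in sups])   # stays ~1 near the jump: no uniform convergence
print([round(m, 5) for m in mses])   # -> 0: convergence in the mean
```

This is exactly the situation described above: the partial sums converge in the mean (and pointwise away from the jump), but never uniformly.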

# Where Do Orthogonal Functions Come From?

The first thing to be looked at is the Gram-Schmidt process.

# Orthogonal Expansions

There are significant differences between the behavior of Fourier-series and power-series expansions. A power series is essentially an expansion about a point, using only information from that point about the function to be expanded (including, of course, the values of its derivatives). We already know that such expansions only converge within a radius of convergence defined by the position of the nearest singularity. In contrast, a Fourier series (or any expansion in orthogonal functions) uses information from the entire expansion interval, and therefore can describe functions that have "nonpathological" singularities within that interval. On the other hand, we also know that the representation of a function by an orthogonal expansion is only guaranteed to converge in the mean. This feature comes into play for the expansion of functions with discontinuities, where there is no unique value to which the expansion must converge. For Fourier series, however, it can be shown that if a function f(x) satisfying the Dirichlet conditions is discontinuous at a point x0, its Fourier series evaluated at that point will be the arithmetic average of the left and right limits.

The system of trigonometric functions
$\left\{ 1, \cos x, \sin x , \cos 2x, \sin 2x, \cos 3x, \sin 3x , \ldots \right\}$
is a complete orthogonal system on [-π, π]. ■

Bessel inequality: Let H be an inner product space (or a Hilbert space) and let { e1, e2, … , en, … } be an orthonormal sequence. Then for every element f ∈ H, we have
$\sum_{k\ge 1} \left\vert \langle e_k , f \rangle \right\vert^2 \le \| f \|^2 = \langle f, f \rangle ,$
where $$\langle \cdot , \cdot \rangle$$ denotes the inner product in the Hilbert space.
Corollary (Bessel inequality): Consider the Fourier series associated with a function f that is square integrable on a symmetric interval [-ℓ,ℓ], namely, $$\displaystyle f(x) \sim \frac{a_0}{2} + \sum_{n\ge 1} \left[ a_n \cos \left( \frac{n\pi x}{\ell} \right) + b_n \sin \left( \frac{n\pi x}{\ell} \right) \right] .$$ Then
$\frac{a_0^2}{2} + \sum_{n\ge 1} \left( a_n^2 + b_n^2 \right) \le \frac{1}{\ell}\,\| f \|^2 = \frac{1}{\ell}\,\langle f, f \rangle = \frac{1}{\ell}\,\int_{-\ell}^{\ell} \left\vert f(x) \right\vert^2 {\text d}x = E[f] ,$
where E[f] is known as the energy of the 2ℓ-periodic function fL²[-ℓ,ℓ].

1. H. Li and D.C. Torney, A complete system of orthogonal step functions, Proceedings of the American Mathematical Society, 132, No. 12, 2004, pp. 3491--3502.
2. D.E. Menchoff, Sur les séries de fonctions orthogonales (Première Partie: La convergence), Fundamenta Mathematicae, 4, 1923, pp. 82--105; doi:10.4064/fm-4-1-82-105.
3. H. Rademacher, Einige Sätze über Reihen von allgemeinen Orthogonalfunktionen, Mathematische Annalen, 87, 1922, pp. 112--138; doi:10.1007/BF01458040.
4. E.C. Titchmarsh, Eigenfunction Expansions Associated with Second-order Differential Equations, Part I, 1946; 2nd edition, 1962.
5. E.C. Titchmarsh, Eigenfunction Expansions Associated with Second-order Differential Equations, Part II, 1958.
6. J.L. Walsh, A closed set of normal orthogonal functions, American Journal of Mathematics, 45, 1923, pp. 5--24.