Preface

This section introduces a particular class of iterative root-finding methods, called open methods. As opposed to closed or bracketing methods (discussed in the previous section), open methods do not necessarily bracket a root; therefore, they may diverge as the computation progresses, but when they do converge, they are usually much faster than bracketing methods. Root-finding methods are applied to solve nonlinear equations of the form f(x) = 0. These are among the problems of applied mathematics that occur most often and for which explicit solutions are not available. We present some iterative algorithms for determining roots of such equations to any specified degree of accuracy. There are two large classes of iterative numerical methods for root finding: those that use derivatives and those that do not.

We begin with Newton's method and its discretization---the secant method---which provides a root-finding algorithm that uses a succession of roots of secant lines to better approximate a root of a function f. The secant method can be thought of as a finite-difference approximation of Newton's method. The section also contains some iterative algorithms of order two and higher for finding a root of a nonlinear equation f(x) = 0.

In this section, we consider iterative methods for finding roots of the equation f(x) = 0, where f(x) is a sufficiently smooth real-valued function. A solution of this equation is called a zero (or null) of the function f. Every root-finding problem can be transformed into any number of fixed point problems. Consider the following example.

Example: Let $$f(x) = x^5 -5x+3$$ and suppose we try to find its nulls, so we need to solve the equation $$x^5 -5x+3 =0 .$$ Mathematica provides the two positive real roots:

FindRoot[x^5 - 5 x + 3 == 0, {x, 0.5}]
Out[1]= {x -> 0.618034}
FindRoot[x^5 - 5 x + 3 == 0, {x, 1.5}]
Out[2]= {x -> 1.27568}
Solving for x, we convert the original equation into
$x = \frac{x^5 +3}{5} \qquad \Longrightarrow \qquad x_{k+1} = \frac{x^5_k +3}{5} , \qquad k=1,2,\ldots .$
When you try the starting point $$x_0 = 0.5 ,$$ the fixed point iteration provides a very good approximation to the null x = 0.618034. However, when you start with $$x_0 = 1.5 ,$$ the fixed point iteration diverges.
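For readers following along outside Mathematica, the behavior of this fixed point iteration can be checked numerically; here is a minimal Python sketch (the function names are ours):

```python
def fixed_point_iterates(g, x0, n):
    """Return the list [x0, g(x0), g(g(x0)), ...] after n iterations."""
    xs = [x0]
    for _ in range(n):
        xs.append(g(xs[-1]))
    return xs

g = lambda x: (x**5 + 3) / 5      # the iteration x_{k+1} = (x_k^5 + 3)/5

good = fixed_point_iterates(g, 0.5, 30)   # settles near 0.618034
bad = fixed_point_iterates(g, 1.5, 5)     # grows rapidly (diverges)
```

The convergent case works because |g'(x)| = |x⁴| < 1 near the fixed point 0.618034, while at the other root 1.27568 the derivative exceeds 1.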

Let us add x to both sides of the equation: $$x^5 -5x+3+x =x .$$ Solving for the isolated x, we derive another fixed point formula

$x = x^5 -4x +3 \qquad \Longrightarrow \qquad x_{k+1} = x^5_k -4x_k +3 , \qquad k=1,2,\ldots .$
Whatever starting point is chosen, the above fixed point iteration diverges. ■

The iterative function ϕ(x) is consistent with the equation f(x) = 0 if its solution is also a solution of the fixed-point problem x = ϕ(x). The iterative function ϕ(x) is completely consistent with the equation f(x) = 0 if and only if these two problems x = ϕ(x) and f(x) = 0 are equivalent.

I. Newton’s Method or Newton--Raphson Method

Solving nonlinear equations is one of the most important topics in applied mathematics and numerical analysis. The following iterative method, named after the English mathematicians Isaac Newton (1642--1726) and Joseph Raphson (1648--1715), is used for solving the equation f(x) = 0:
$x_{k+1} = x_k - \frac{f(x_k)}{f' (x_k )} , \qquad k=0,1,2,\ldots . \tag{Newton.1}$

The concept behind the algorithm (Newton.1) was introduced by Isaac Newton in 1669, who used it to solve a cubic equation numerically. Before Newton, other people also suggested analogous methods. The essential ideas of the method can be found in manuscripts of the fourteenth century written by the Persian astronomer and mathematician Ghiyāth al-Kāshī (1380--1429). His method of root-finding was not known to his contemporary mathematicians. We know that the English mathematician Henry Briggs (1561--1630) also published a work involving the numerical solution of polynomial equations in 1633. In 1685, John Wallis (1616--1703) described a method which---Wallis said---was invented by I. Newton. However, the first systematic description of the method was given by J. Raphson in Aequationum Universalis, seu Ad Aequationes Algebraicas Resolvendas Methodus Generalis, & Expedita, Ex Nova Infinitarum Serierum Methodo, Deducta et Demonstrata (London, 1690).

The basic idea behind Newton's method is quite straightforward. Let xn denote the most recent approximation to a zero α of the function f. Replace f by its tangent line approximation at the point xn and take the x-intercept of the tangent line as the next approximation xn+1 to the root.
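The tangent-line step translates directly into code; a minimal Python sketch (names are ours), applied here to the polynomial from the opening example:

```python
def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """Newton--Raphson: repeatedly replace x by the x-intercept of the tangent."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# root of x^5 - 5x + 3 near 0.618 (cf. the example above)
root = newton(lambda x: x**5 - 5*x + 3, lambda x: 5*x**4 - 5, 0.5)
```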

plot = Plot[x^2, {x, 0, 1}, AspectRatio -> 1,
   PlotStyle -> {Thickness[0.01], Blue}, Axes -> False,
   PlotRange -> {{-0.5, 1.2}, {-0.2, 1.4}}];
arrow = Graphics[{Arrowheads[0.07], Arrow[{{-0.2, 0.3}, {1.2, 0.3}}]}];
l1 = Graphics[{Thick, Dashed, Line[{{1, 1}, {0.75, 0.3}}]}];
l2 = Graphics[{Thick, Dashed, Line[{{0.75, 0.55}, {0.6, 0.3}}]}];
l3 = Graphics[{Thick, Dashed, Line[{{1, 1}, {1, 0.3}}]}];
l4 = Graphics[{Thick, Dashed, Line[{{0.75, 0.55}, {0.75, 0.3}}]}];
t2 = Graphics[{Black, Text[Style[Subscript[x, 0], 16], {1, 0.25}]}];
t3 = Graphics[{Black, Text[Style[Subscript[x, 1], 16], {0.75, 0.25}]}];
t4 = Graphics[{Black, Text[Style[Subscript[x, 2], 16], {0.6, 0.25}]}];
t5 = Graphics[{Black, Text[Style["\[Alpha]", 16], {0.5, 0.35}]}];
t9 = Graphics[{Black, Text[Style["f(x)", Bold, 16], {0.66, 0.7}]}];
Show[arrow, plot, l1, l2, l3, l4, t2, t3, t4, t5, t9,
  Epilog -> {PointSize[0.02], Purple,
    Point[{{0.6, 0.3}, {0.75, 0.555}, {0.75, 0.3}, {1, 0.3}, {1, 1}}]}]
Newton's method. Mathematica code
The tangent line in Newton's method can be seen as the first-degree polynomial approximation of f(x) at x = xn. Newton's formula is simple and fast, with quadratic convergence. However, it may fail miserably if at any stage of the computation the derivative of the function f(x) is either zero or very small. Therefore, it has stability problems, being very sensitive to the initial guess. If the root being sought has multiplicity greater than one, the convergence rate is merely linear. However, if the multiplicity m of the root is known, the following modified algorithm preserves the quadratic convergence rate:
$x_{n+1} = x_n - m\,\frac{f\left( x_n \right)}{f'\left( x_n \right)} .$

Newton's method can be realized with the aid of the FixedPoint command:

NewtonZero[f_, x0_] := FixedPoint[# - f[#]/f'[#] &, x0]
Newton's method can be implemented in the Wolfram Language as
NewtonsMethodList[f_, {x_, x0_}, n_] :=
NestList[# - Function[x, f][#]/Derivative[1][Function[x, f]][#] &, x0, n]

Example: Suppose we want to find a square root of 4.5. First, we apply the NewtonZero command:

NewtonZero[#^2 - 4.5 &, 2.0] (* to solve quadratic equation *)
Out[1]= 2.12132
which is, to full machine precision, 2.1213203435596424.

We can verify our calculations with the Babylonian iteration (derived below) implemented in Mathematica:

A = 4.5;
newton2[x_] := N[1/2 (x + A/x)]
NestList[newton2, 2.0, 10]
■

When Newton's method is applied to find a square root of a positive number A, we get the Babylonian formula
$p_{k+1} = \frac{1}{2} \left( p_k + \frac{A}{p_k} \right) , \qquad k=0,1,2,\ldots . \tag{Newton.2}$
newton2[x_] := N[1/2 (x + A/x)]
NestList[newton2, x0, n]
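The same Babylonian iteration in a Python sketch (the function name is ours; A and the starting guess are parameters):

```python
def babylonian_sqrt(A, p0, n):
    """Iterate p <- (p + A/p)/2, i.e. Newton's method for p^2 - A = 0."""
    p = p0
    for _ in range(n):
        p = 0.5 * (p + A / p)
    return p

approx = babylonian_sqrt(6.0, 2.0, 6)   # converges to sqrt(6) very quickly
```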
Ancient Mesopotamia was a civilization that existed in the area of modern Turkey, Syria, Iraq and Iran, between the Mediterranean Sea and the Persian Gulf. In the period 1900--1600 BC, Babylon was the capital city of Mesopotamia and the mathematics recorded at this time came to be known as Babylonian Mathematics. It turns out that Babylonian mathematicians discovered the Pythagorean theorem about 1000 years before the ancient Greek mathematician Pythagoras (570--495 BC).

The Babylonians had an accurate and simple method for finding the square roots of numbers. This method is also known as Heron's method, after the Greek mathematician who lived in the first century AD. Indian mathematicians also used a similar method as early as 800 BC. The Babylonians are credited with having first invented this square root method, possibly as early as 1900 BC.

Example: We use Newton's method to find a positive square root of 6. Starting with two initial guesses, we get

\begin{align*} p_0 &=2, \quad &p_0 =3.5 , \\ p_1 & = \frac{5}{2} = 2.5, \quad &p_1 = 2.60714, \\ p_2 &= \frac{49}{20} ={\bf 2.4}5 , \quad &p_2 = {\bf 2.4}5426 , \\ p_3 &= \frac{4801}{1960} \approx {\bf 2.4494897}, \quad &p_3 = {\bf 2.4494}943716069653 , \\ p_4 &= \frac{46099201}{18819920} \approx {\bf 2.44948974278317}9 , &p_4 = {\bf 2.44948974278}75517, \\ p_5 &= \frac{4250272665676801}{1735166549767840} \approx {\bf 2.449489742783178} , & p_5 = {\bf 2.449489742783178}. \end{align*}
As we see it converges pretty fast. In fact, this method converges quadratically---the number of correct decimal places approximately doubles with every step!
p[0] = 2
Do[Print["k= ", k, " " , "p= " , p[k] = (p[k - 1] + 6/p[k - 1])/2], {k, 1, 5}]
We can convert the square root problem to a fixed point iteration:
$p_{k+1} = \frac{p_k +6}{p_k +1} , \qquad k=0,1,2,\ldots ;$
which converges to $$\sqrt{6} = 2.449489742783178\ldots ,$$ but very slowly. However, another suggested iteration
$p_{k+1} = \frac{3\,p_k^2 -6}{2\,p_k} , \qquad k=0,1,2,\ldots ;$
diverges.    ■
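The slow linear convergence of the first iteration can be checked numerically; a Python sketch (since |g'(√6)| ≈ 0.42, each step only cuts the error by roughly that factor):

```python
g = lambda p: (p + 6) / (p + 1)   # fixed point exactly at sqrt(6)

p = 2.0
for k in range(60):
    p = g(p)
# p is now close to 2.449489742783178, but only because we took
# many steps: the error shrinks by a factor |g'(sqrt 6)| ~ 0.42 per step
```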

Newton's method is an extremely powerful technique---in general, the convergence is quadratic: as the algorithm approaches the root, the difference between the root and the approximation is squared (the number of accurate digits roughly doubles) at each step. It has been shown that both Newton's method and Steffensen's method are optimal algorithms of second order that require the least possible number of arithmetic operations. Therefore, the Newton--Raphson iteration scheme is very popular in applications, and this method is usually tried first when a root must be determined. However, for a multiple root, Newton's method has only a linear order of convergence. A procedure is known to speed up its order (see Varona's paper).

Theorem: Let f be a twice continuously differentiable function on the interval [a, b] with $$\alpha \in \left( a, b \right) \quad\mbox{and} \quad f(\alpha ) = 0 .$$ Suppose that $$f' (\alpha ) \ne 0.$$ Then there exists a positive number δ such that for any $$x_0 \in \left[ \alpha - \delta , \alpha +\delta \right] ,$$ the sequence $$\left\{ x_n \right\}$$ generated by Newton's algorithm converges to α.

Theorem: Suppose that a real-valued function f(x) of a real variable has two continuous derivatives, fC², in some neighborhood of α, where f(α) = 0 and the derivative f'(α) ≠ 0. Then if x0 is chosen sufficiently close to α, the iterates xk, k ≥ 0 of Newton's iteration scheme (Newton.1) will converge to α and $$\displaystyle \lim_{k\to\infty} \,\frac{\alpha - x_{k+1}}{\left( \alpha - x_k \right)^2} = - \frac{f'' (\alpha )}{2\,f' (\alpha )} .$$
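The limit in the theorem can be observed numerically. As a hypothetical test case (ours, not from the text), take f(x) = x² − 2 with α = √2, so the predicted limit is −f''(α)/(2 f'(α)) = −1/(2√2) ≈ −0.3536; a Python sketch:

```python
import math

f = lambda x: x*x - 2
fp = lambda x: 2*x
alpha = math.sqrt(2)

xs = [1.5]
for _ in range(3):
    xs.append(xs[-1] - f(xs[-1]) / fp(xs[-1]))

e = [alpha - x for x in xs]     # errors alpha - x_k
ratio = e[3] / e[2]**2          # should approach -f''(alpha)/(2 f'(alpha))
```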

Sometimes Newton's method does not converge; the first theorem above guarantees that δ exists under certain conditions, but it could be very small. Newton's method is a good way of approximating solutions, but applying it requires some intelligence. You must beware of getting an unexpected result, or no result at all. So why would Newton's method fail? The derivative may be zero at the root (a degenerate case, where the function has zero slope at one of the iterated points); the function may fail to be continuously differentiable; one of the iterated points xn may be a local minimum or maximum of f; or you may have chosen a bad starting point, one that lies outside the range of guaranteed convergence. It may also happen that the Newton--Raphson method starts at an initial guess close to one root, but later iterates jump to the location of one of several other roots. Depending on the context, each of these may be more or less likely. Degenerate roots (those where the derivative is 0) are "rare" in general, and we do not consider this case.

Finally, there is a chance that Newton's method will cycle back and forth between two values and never converge at all. This failure is illustrated in the figure below.
f[x_] = x^3 - 5*x^2 + 3.05*x + 15
graph = Plot[f[x], {x, 0.5, 4}, PlotRange -> {4, 16}, PlotStyle -> Thick]
a1 = {Arrowheads[Medium], Arrow[{{2.5, 6.875}, {3.5, 4}}]};
a2 = {Arrowheads[Medium], Arrow[{{3.5, 7.125}, {2.5, 4}}]};
b1 = Graphics[{Dashed, Red, a1}];
b2 = Graphics[{Dashed, Blue, a2}];
v1 = {Arrowheads[Medium], Arrow[{{2.5, 4}, {2.5, 6.875}}]};
v2 = {Arrowheads[Medium], Arrow[{{3.5, 4}, {3.5, 7.125}}]};
b11 = Graphics[{Dashed, Red, v1}];
b22 = Graphics[{Dashed, Blue, v2}];
Show[graph, b1, b2, b11, b22]
Newton's method may loop. Mathematica code

For example, take the function $$f(x) = x^3 -2x+2$$ and start with x0 = 0. The first iteration produces 1 and the second iteration returns to 0, so the sequence oscillates between the two without converging to a root.
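The classic two-cycle for f(x) = x³ − 2x + 2 starting from x₀ = 0 can be reproduced numerically; a Python sketch:

```python
f = lambda x: x**3 - 2*x + 2
fp = lambda x: 3*x**2 - 2

x = 0.0
orbit = [x]
for _ in range(6):
    x = x - f(x) / fp(x)   # Newton step
    orbit.append(x)
# orbit == [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
```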

Example: This example is a cautionary one. Consider the function $$f(x) = x\,e^{-x^2}$$ that obviously has a root at x = 0. However, if your starting point exceeds the root of the equation $$f'(x) = 0 ,$$ which is $$x_0 = 1/\sqrt{2} \approx 0.707107 ,$$ then Newton's algorithm diverges.

The divergence is illustrated in the figure below.
f[x_] = x*Exp[-x^2]
D[x*Exp[-x^2], x]
FindRoot[E^-x^2 - 2 E^-x^2 x^2 == 0, {x, 1}]
Plot[x*Exp[-x^2], {x, -3, 3}, PlotStyle -> Thick, PlotLabel -> x*Exp[-x^2]]
p[0] = 0.8
Do[Print["k= ", k, " ", "p= ", p[k] = p[k - 1] - p[k - 1]/(1 - 2*p[k - 1]^2)], {k, 1, 15}]
Graph of the function x*Exp[-x^2]. Mathematica code

The derivative of f(x) is zero at $$x = 1/\sqrt{2} \approx 0.707107 ,$$ and negative for x > 0.707107.
Simplify[D[f[x], x]*Exp[x^2]]
1 - 2 x^2
N[1/Sqrt[2]]
0.707107
For any initial guess x0 > 0.707107, Newton's iterates move away from the zero; the method diverges.    ■
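The divergence for x₀ > 1/√2 can be checked directly; for f(x) = x e^{−x²} the Newton step simplifies to x − x/(1 − 2x²), and a Python sketch of the iterates is:

```python
x = 0.8                        # starting point beyond 1/sqrt(2) ~ 0.7071
iterates = [x]
for _ in range(15):
    x = x - x / (1 - 2*x**2)   # Newton step for f(x) = x exp(-x^2)
    iterates.append(x)
# the iterates increase without bound instead of approaching the root 0
```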

Example: Consider the cubic function $$f(x) = x^3 - 0.926\,x^2 + 0.0371\,x + 0.043 .$$ First we plot the function, and then define the range of x where we want to locate its null.

f[x_] := x^3 - 0.926*x^2 + 0.0371*x + 0.043
Plot[f[x], {x, 0, 1}, PlotStyle -> {Thick, Blue}]
xbegin = 0; xend = 0.5;
curve = Plot[f[x], {x, xbegin, xend}, PlotLabel ->
"Function on given interval", PlotStyle -> {Thick, FontSize -> 11}]
Taking initial guess
x0 = 0.3
we start Newton's iteration. Because this method uses a line tangent to the function at the initial guess, we must calculate the derivative of the function to find the slope of the line at this point. We will denote by g(x) the derivative of the function f(x).
g[x_] := f'[x]
maxi = f[xend]; mini = f[xbegin];
step = (xend - xbegin)/10;
Do[ If[f[i] > maxi, maxi = f[i]];
If[f[i] < mini, mini = f[i]], {i, xbegin, xend, step}];
tot = maxi - mini
mini = mini - 0.1*tot; maxi = maxi + 0.1*tot;
We begin with the first Newton's iteration.
x1 = x0 - f[x0]/g[x0]
Out[23]= 0.291107
Then we plot the tangent line along with the given function and bounds for the root.
ea = Abs[(x1 - x0)/x1*100]
Out[24]= 3.05502
tanline[x_]:=f[x0]+((0-f[x0])/(x1-x0))*(x-x0)
tline = Plot[tanline[x], {x, xbegin, xend}, PlotStyle -> {Thick, Orange}]
Show[Graphics[Line[{{x0, maxi}, {x0, mini}}]], curve,
Graphics[Line[{{x1, maxi}, {x1, mini}}]], tline, Axes -> True,
PlotLabel -> "Function with upper and lower guesses and estimated root", PlotStyle -> {FontSize -> 10}]
Now we make the second iteration.
x2=x1-f[x1]/g[x1]
Out[25]= 0.291096
Then we compare its value with the actual root obtained using a standard Mathematica command.
s = FindRoot[f[x] == 0, {x, 0.3}]
NumberForm[N[s], 20]
Out[26]= {x -> 0.291096}
which is a more accurate approximation to the root: 0.291095502695723. Indeed,
f[0.291095502695723]
Out[28]= 0.
This is the zero within Mathematica accuracy. So we see that two iterations are enough to get at least six accurate decimal places. For instance, the third iteration yields
x3=x2-f[x2]/g[x2]
Out[29]= 0.291095502695723
■
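The iteration in this example is easy to reproduce outside Mathematica; a Python sketch:

```python
f = lambda x: x**3 - 0.926*x**2 + 0.0371*x + 0.043
fp = lambda x: 3*x**2 - 2*0.926*x + 0.0371   # derivative of f

x = 0.3                      # same initial guess as above
for _ in range(5):
    x = x - f(x) / fp(x)
# x agrees with the FindRoot value 0.291095502695723 to high accuracy
```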

Example: We reconsider the function $$f(x) = e^x\, \cos x - x\, \sin x ,$$ which has one root within the interval [0,3]. We define the function and the range of x:

f[x_] := Exp[x]*Cos[x] - x*Sin[x]
xa = 0; xb = 3;
We break the guess interval [0,3] into 10 uniform parts and calculate the values of the given function at each end point of these subintervals.
maxi = f[xa]; mini = f[xb];
step = (xb - xa)/10;
x[0] = xa;
Do[ x[i] = x[i - 1] + step;
  If[f[x[i]] > maxi, maxi = f[x[i]]];
  If[f[x[i]] < mini, mini = f[x[i]]];
  Print["i= ", i, ", x[i]= ", x[i], ", f[x[i]]= ", N[f[x[i]]]], {i, 1, 10}]
So we see that the first root is within the interval [6/5, 3/2]. Let us take the middle of the interval as our first guess:
x0 = (6/5 + 3/2)/2
Out[2]= 27/20
which is 1.35. We set the maximum number of iterations to be 4:
nmaximum = 4
Then we define the derivative of the function f(x) which we denote by g(x).
f[x_] := Exp[x]*Cos[x] - x*Sin[x]
g[x_] := f'[x]
Now we find the actual solution using Mathematica (this is an approximation as well, but with good accuracy)
s = FindRoot[f[x] == 0, {x, 1.35}]
actual = x /. s
NumberForm[N[actual], 20]
Out[7]= 1.2253937841236207
We apply the Newton--Raphson method to generate the values of the roots, true error, absolute relative true error, approximate error, absolute relative approximate error, and the least number of significant digits that are correct in the estimated root as a function of number of iterations.
Clear[xr];
Array[xr, nmaximum];
We generate the sequence of root approximations.
For[i = 1; xr[0] = N[x0], i <= nmaximum, i++,
xr[i] = xr[i - 1] - f[xr[i - 1]]/g[xr[i - 1]]]
Absolute true error:
Array[Et, nmaximum];
For[i = 1, i <= nmaximum, i++, Et[i] = Abs[actual - xr[i]]]
Absolute relative true error:
Array[et, nmaximum];
For[i = 1, i <= nmaximum, i++, et[i] = Abs[Et[i]/actual*100]]
Absolute approximate error:
Array[Ea, nmaximum];
For[i = 1, i <= nmaximum, i++,
If[i <= 1, Ea[i] = Abs[xr[i] - x0], Ea[i] = Abs[xr[i] - xr[i - 1]]]]
Absolute relative approximate error:
Array[ea, nmaximum];
For[i = 1, i <= nmaximum, i++, ea[i] = Abs[Ea[i]/xr[i]*100]]
Significant digits at least correct.
Array[sigdig, nmaximum];
For[i = 1, i <= nmaximum, i++,
If[(ea[i] >= 5) || (i <= 1), sigdig[i] = 0,
sigdig[i] = Floor[(2 - Log[10, Abs[ea[i]/0.5]])]]]
Graphs:
xrplot = Table[xr[i], {i, 1, nmaximum}];
ListPlot[xrplot, Joined -> True, PlotRange -> All,
AxesOrigin -> {1, Min[xrplot]},
PlotLabel -> "Estimated root as a function of number of iterations"]
■

Example: This example shows that Newton's method may converge slowly due to an inflection point occurring in the vicinity of the root. Let us consider the function $$f(x) = (x-0.5)^3 .$$ First we define the function and its derivative:

f[x_] := (x-0.5)^3
g[x_] := f'[x]
Then we plot the function:
xbegin = -2; xend = 3;
curve = Plot[f[x], {x, xbegin, xend},
PlotLabel -> "Function on given interval", PlotStyle -> {FontSize -> 11, Blue}]
Next we define the initial guess and maximum number of iterations.
x0 = -1; nmaximum = 10;
First iteration.
x1 = x0 - f[x0]/g[x0]
Out[16]= -0.5
ea = Abs[(x1 - x0)/x1*100]
Out[17]= 100
tanline[x_] := f[x0] + ((0 - f[x0])/(x1 - x0))*(x - x0)
tline = Plot[tanline[x], {x, -1.5, 0.0}, PlotStyle -> {Orange}];
Show[curve, tline, Graphics[Line[{{x0, 2}, {x0, -15}}]],
Graphics[Line[{{x1, 2}, {x1, -15}}]], Axes -> True, PlotRange -> All,
PlotLabel -> "Function with upper and lower guesses and estimated root", PlotStyle -> {FontSize -> 10}]
We start the second iteration.
x2 = x1 - f[x1]/g[x1]
Out[24]= -0.166667
ea = Abs[(x2 - x1)/x2*100]
Out[25]= 200
Then we plot tangent line.
tanline[x_]:=f[x1]+((0-f[x1])/(x2-x1))*(x-x1)
tline = Plot[tanline[x], {x, -1, 1}, PlotStyle -> {Orange}];
Show[Graphics[Line[{{x1, 0.6}, {x1, -1}}]], curve,
Graphics[Line[{{x2, 0.6}, {x2, -1}}]], tline, Axes -> True,
PlotLabel -> "Second iteration and estimated root", PlotStyle -> {FontSize -> 10}, PlotRange -> {{-1, 1}, {-1, 0.7}}]
Third iteration,
x3 = x2 - f[x2]/g[x2]
Out[38]= 0.0555556
ea = Abs[(x3 - x2)/x3*100]
Out[39]= 400
and then fourth iteration.
x4=x3-f[x3]/g[x3]
Out[40]= 0.203704
ea = Abs[(x4 - x3)/x4*100]
Out[41]= 72.7273
So we see that convergence is very slow. ■
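Since (x − 0.5)³ has a root of multiplicity m = 3, the slow convergence (linear, with rate 2/3) is expected, and the modified step x_{n+1} = x_n − m f(x_n)/f'(x_n) mentioned earlier repairs it; a Python sketch:

```python
f = lambda x: (x - 0.5)**3
fp = lambda x: 3*(x - 0.5)**2

# plain Newton: each step removes only 1/3 of the remaining error
x = -1.0
for _ in range(4):
    x = x - f(x) / fp(x)      # reproduces -0.5, -1/6, 1/18, 11/54

# modified Newton with the known multiplicity m = 3
m = 3
y = -1.0
y = y - m * f(y) / fp(y)      # lands on the root 0.5 in a single step
```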
Newton's method diverges for the cube root, which is continuous and infinitely differentiable, except for x = 0, where its derivative is undefined:
$x_{k+1} = x_k - \frac{x_k^{1/3}}{(1/3)\,x_k^{-2/3}} = -2\,x_k , \qquad k=0,1,2,\ldots ;$
To avoid divergence of Newton's method, it is possible to modify it slightly at no extra cost, using the same two evaluations of the function and its derivative at each point:
$x_{k+1} = x_k - \frac{f(x_k)}{f' (x_k ) - p\,f (x_k )} , \qquad k=0,1,2,\ldots ; \tag{Newton.3}$

where p is a real number. This algorithm coincides with Newton's method for p = 0, and it is of order 2 for general p. However, for $$\displaystyle p = \frac{f'' (x_k )}{2\,f' (x_k )}$$ it is of order 3 and it is known as Halley's method.
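A Python sketch of the weighted iteration (Newton.3) with the Halley choice p = f''(x_k)/(2 f'(x_k)) recomputed at every step (names are ours; the test function x² − 2 is a hypothetical example):

```python
def halley(f, fp, fpp, x0, n=8):
    """Iteration (Newton.3) with p = f''(x)/(2 f'(x)): Halley's method."""
    x = x0
    for _ in range(n):
        p = fpp(x) / (2 * fp(x))
        x = x - f(x) / (fp(x) - p * f(x))
    return x

# cube-order convergence to sqrt(2) via f(x) = x^2 - 2
r = halley(lambda x: x*x - 2, lambda x: 2*x, lambda x: 2.0, 1.5)
```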

Example: Consider the Hermite polynomial of degree 10:

$H_{10} (x) = 1024\,x^{10} - 23040\, x^8 + 161280\,x^6 - 403200\,x^4 + 302400 \,x^2 - 30240.$
We plot the Hermite polynomial with the following Mathematica code:
Plot[HermiteH[10, x], {x, -2, 2}, PlotStyle -> Thick,
 PlotLabel -> Style["Hermite polynomial of degree 10", FontSize -> 16],
 Background -> LightBlue]
Hermite polynomial of degree 10. Mathematica code

Since these polynomials were also discovered by P. Chebyshev, they are called Chebyshev--Hermite polynomials as well. Since the Hermite polynomial has a negative value at x = 0
HermiteH[10, 0]
-30240
and a positive value at x = 1
HermiteH[10, 1]
8224
we know that one of its roots belongs to the interval [0,1]. So we start with the first two approximations x0 = 0 and x1 = 1. The middle point is obviously u = 0.5. According to Ridders' formula, our next iterate is
$x_2 = \frac{1}{2} - \frac{1}{2} \cdot \frac{H_{10}(0.5)}{\sqrt{H_{10}^2 (0.5) - H_{10}(0) \cdot H_{10}(1)}} = 0.090012 .$
x2 = 1/2 - 1/2*HermiteH[10, 0.5]/Sqrt[HermiteH[10, 0.5]^2 - HermiteH[10, 0]*HermiteH[10, 1]]
0.090012
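The value of x₂ can be double-checked with a short Python computation using the coefficients of H₁₀ listed above:

```python
import math

def H10(x):
    """Hermite polynomial of degree 10, written out from its coefficients."""
    return (1024*x**10 - 23040*x**8 + 161280*x**6
            - 403200*x**4 + 302400*x**2 - 30240)

x0, x1 = 0.0, 1.0
u = 0.5 * (x0 + x1)
x2 = u - (u - x0) * H10(u) / math.sqrt(H10(u)**2 - H10(x0) * H10(x1))
# x2 is approximately 0.090012, matching the value above
```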
■

II. Secant Method

Suppose that we are given a smooth real-valued function f : ℝ → ℝ whose derivative does not vary too wildly. Our objective is to determine a root α, with f(α) = 0, using the secant method, which requires the availability of two initial values. These two initial points x0 and x1 do not have to satisfy the bracketing condition $$f(x_0)\cdot f(x_1) < 0 .$$ The iteration we present is based on the zero-crossing of the secant line passing through the two points (x0, f(x0)) and (x1, f(x1)), instead of their middle point.

A palpable drawback of Newton's method is that it requires us to have a formula for the derivative of f. For classroom examples this is not an issue, but in the "real world" it can be. Suppose, for example, that f is instead "defined" by a separate subprogram that uses about 1,000 lines of involved computer code. Even if we could theoretically write a formula for the derivative from which f' could be constructed, is this a practical task to set for ourselves?

One obvious way to deal with this problem is to use an approximation to the derivative in Newton's formula:

$f' (x) \approx \frac{f(x+h) - f(x)}{h} ,$
$x_{k+1} = x_k - f(x_k ) \left[ \frac{h}{f(x_k +h) - f(x_k )} \right] .$
So instead of the derivative we use a secant line (i.e., one passing through the two points xk and xk + h on the curve instead of just one), and no derivative is required. To start this approach, we need two initial points---which serve as two initial guesses---and consider the line that passes through them. Our next iterate is the point where this secant line crosses the abscissa. The formula involved in the secant method is very close to the one used in regula falsi:

$x_{k+1} = x_k - \frac{f(x_k ) \left( x_k - x_{k-1} \right)}{f (x_k ) - f(x_{k-1} )} = \frac{x_{k-1} f (x_k ) - x_k f(x_{k-1} )}{f (x_k ) - f(x_{k-1} )}, \qquad k=1,2,\ldots . \tag{Secant.1}$
plot = Plot[0.7*x^2, {x, 0, 2}, AspectRatio -> 1,
   PlotStyle -> {Thickness[0.01], Blue}, Axes -> False,
   Epilog -> {PointSize[0.03], Pink, Point[{{0, 0}, {2, 2.8}}]},
   PlotRange -> {{-0.5, 2.2}, {-0.2, 4.4}}];
line = Graphics[{Thick, Dashed, Black, Line[{{0, 0}, {2, 2.8}}]}];
arrow = Graphics[{Arrowheads[0.07], Arrow[{{-0.2, 0.3}, {2.2, 0.3}}]}];
l1 = Graphics[{Thick, Dashed, Line[{{0, -0.1}, {0, 0.3}}]}];
l2 = Graphics[{Thick, Dashed, Line[{{0.22, 0.03}, {2, 2.8}}]}];
l3 = Graphics[{Thick, Dashed, Line[{{2, 0.3}, {2, 2.8}}]}];
l4 = Graphics[{Thick, Dashed, Line[{{0.22, 0.03}, {0.22, 0.3}}]}];
l5 = Graphics[{Thick, Dashed, Line[{{0.4, 0.11}, {0.4, 0.3}}]}];
l6 = Graphics[{Thick, Dashed, Line[{{0.4, 0.11}, {2, 2.8}}]}];
t1 = Graphics[{Black, Text[Style[Subscript[x, 0], 16], {0, 0.45}]}];
t2 = Graphics[{Black, Text[Style[Subscript[x, 1], 16], {2, 0.2}]}];
t3 = Graphics[{Black, Text[Style[Subscript[x, 2], 16], {0.22, 0.45}]}];
t4 = Graphics[{Black, Text[Style["\[Alpha]", 16], {0.7, 0.23}]}];
t5 = Graphics[{Black, Text[Style[Subscript[x, 3], 16], {0.4, 0.45}]}];
t8 = Graphics[{Black, Text[Style["f(b)", 16], {1.08, 1}]}];
Show[arrow, plot, line, l1, l2, l3, l4, l5, l6, t1, t2, t3, t4, t5, t8,
  Epilog -> {PointSize[0.02], Purple,
    Point[{{0, 0}, {0, 0.3}, {0.22, 0.3}, {0.22, 0.03}, {0.4, 0.3}, {0.4, 0.11}, {0.52, 0.3}, {2, 0.3}}]}]
Secant algorithm. Mathematica code
The secant algorithm can be implemented in Mathematica:
SecantMethodList[f_, {x_, x0_, x1_}, n_] :=
 NestList[Last[#] - {0, (Function[x, f][Last[#]]*Subtract @@ #)/Subtract @@ (Function[x, f] /@ #)} &, {x0, x1}, n]
The secant algorithm generates a sequence { xn }n≥-1 that starts with two initial values x-1, x0 from some interval and develops as
$x_n = S \left( x_{n-1} , x_{n-2} \right) = \frac{x_{n-2} f \left( x_{n-1} \right) - x_{n-1} f \left( x_{n-2} \right)}{f \left( x_{n-1} \right) - f \left( x_{n-2} \right)} , \qquad n=1,2,\ldots .$

The secant method is consistent with Newton's method, and the error in its derivative approximation is proportional to h = xk+1 - xk. Thus, if we assume that the iteration is converging (so that h = xk+1 - xk → 0), then the secant method becomes more and more like Newton's method, and hence we expect rapid convergence of the secant iterates near the root α. The secant method has a number of advantages over Newton's method. Not only does it not require the derivative, but it can be coded in such a way as to require only a single function evaluation per iteration; Newton's method requires two evaluations, one for the function and one for the derivative. Thus, the secant method is about half as costly per step as Newton's method.

When the secant method is applied to find a square root of a positive number A, we get the formula

$p_{k+1} = p_k - \frac{p_k^2 -A}{p_k + p_{k-1}} , \qquad k=1,2,\ldots . \tag{Secant.2}$
We can also print out the residual values so the convergence will be seen explicitly.
Secant[x0_, x1_, max_] := Module[{},
  k = 1; p0 = N[x0]; p1 = N[x1];
  p2 = p1; p1 = p0;
  While[k < max,
   p0 = p1; p1 = p2;
   p2 = p1 - (f[p1] (p1 - p0))/(f[p1] - f[p0]);
   k = k + 1;
   Print["p =", NumberForm[p2, 16]];
   Print[" Δp=±", Abs[p2 - p1]];
   Print["f[p]=", NumberForm[f[p2], 16]];
  ]]

Input: x0 and x1
Set y0=f(x0) and y1=f(x1)
Do
x=x1-y1*(x1-x0)/(y1-y0)
y=f(x)
Set x0=x1 and x1=x
Set y0=y1 and y1=y
Until |y1| < tol
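The pseudocode above translates directly; here is a Python sketch that, like the loop shown, makes only one new function evaluation per iteration:

```python
def secant(f, x0, x1, tol=1e-12, max_iter=60):
    """Secant iteration; reuses the previous f-value each step."""
    y0, y1 = f(x0), f(x1)
    for _ in range(max_iter):
        x2 = x1 - y1 * (x1 - x0) / (y1 - y0)   # zero of the secant line
        x0, y0 = x1, y1                         # shift the window
        x1, y1 = x2, f(x2)                      # the single new evaluation
        if abs(y1) < tol:
            break
    return x1

r = secant(lambda x: x*x - 6, 2.0, 3.0)   # converges to sqrt(6)
```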

The order of convergence of the secant method is equal to the golden ratio: $$\varphi = \left( 1 + \sqrt{5} \right) /2 \approx 1.618033988749895 ,$$ which is better than the linear convergence of the bisection search, but worse than the quadratic convergence of the Newton--Raphson method. Unfortunately, because the root is not bracketed, convergence is not guaranteed. If at any iteration stage f(xk) = f(xk-1), the approximated derivative is zero, and xk+1 cannot be found. One way to resolve this issue is to combine the bisection search with the secant method, which yields a simplified version of Dekker's method:
$x_{k+1} = \begin{cases} x_k - \frac{f(x_k ) \left( x_k - x_{k-1} \right)}{f (x_k ) - f(x_{k-1} )} , & \ \mbox{if } f(x_k ) \ne f(x_{k-1} ) , \\ \frac{x_k + x_{k-1}}{2} , & \mbox{ otherwise}. \end{cases}$
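A literal Python sketch of this safeguarded step (a simplification; the full Dekker and Brent methods also maintain a bracket around the root):

```python
def safeguarded_step(f, x_prev, x_cur):
    """Secant step when possible, midpoint fallback when f-values coincide."""
    f_prev, f_cur = f(x_prev), f(x_cur)
    if f_cur != f_prev:
        return x_cur - f_cur * (x_cur - x_prev) / (f_cur - f_prev)
    return 0.5 * (x_cur + x_prev)   # bisection-style fallback
```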

Example: Let us find a positive square root of 6. To start the secant method, we need to pick the first two approximations, which we choose by obvious bracketing: $$x_0 =2, \quad x_1 =3 .$$

\begin{align*} x_2 &= 3 - \frac{9-6}{3+2} = \frac{12}{5} =2.4 , \\ x_3 &= 2.4 - \frac{2.4^2 -6}{2.4+3} = \frac{22}{9} = 2.44444 , \\ x_4 &= \frac{22}{9} - \frac{(22/9)^2 -6}{22/9 + 12/5} = \frac{267}{109} \approx 2.44954 , \\ \vdots & \quad \vdots , \\ x_9 &= \frac{1691303970864713862076027918}{690471954760262617049295761} \approx 2.449489742783178 , \end{align*}
which is a number with 16 correct decimal places. We check with our subroutine:
g[x_] = x^2 - 6; SecantMethodList[g[x], {x, 3, 2}, 9]
{{3, 2}, {2, 12/5}, {12/5, 27/11}, {27/11, 218/89}, {218/89, 11760/ 4801}, {11760/4801, 2563707/1046629}, {2563707/1046629, 20099463098/ 8205571449}, {20099463098/8205571449, 103058268481156812/ 42073361925598085}, {103058268481156812/42073361925598085, 2071415864280787851147887283/ 845651985432356931038013959}, {2071415864280787851147887283/ 845651985432356931038013959, 142317688184784610132207753701235906515210962/ 58100952904207433228450975572478447754863921}}

Example: We consider the function $$f(x) = e^x\, \cos x - x\, \sin x$$ that has one root within the interval [0,3]. We define the range of x:

f[x_] := Exp[x]*Cos[x] - x*Sin[x]
xa = 0; xb = 3;
and then plot the function:
curve = Plot[f[x], {x, xa, xb},
PlotLabel -> "Function on given interval", PlotStyle -> {FontSize -> 11, Blue}]
We make lower initial guess and upper initial guess:
xguess1 = 1.0; xguess2 = 2.0;
and then plot the function along with two guess bounds:
Show[Graphics[Line[{{xguess2, maxi}, {xguess2, mini}}]], curve,
Graphics[Line[{{xguess1, maxi}, {xguess1, mini}}]], Axes -> True,
PlotLabel -> "Function with upper and lower guesses",
PlotStyle -> {FontSize -> 10}, PlotRange -> {{0.7, 2}, {-0.2, 0.4}}]
We make the first iteration:
x1 = xguess2 - (f[xguess2]*(xguess1 - xguess2))/(f[xguess1] - f[xguess2])
1.11361
Plot of the line through two guess points:
m = (f[xguess2] - f[xguess1])/(xguess2 - xguess1);
secantline[x_] = m*x + f[xguess2] - m*xguess2;
sline = Plot[secantline[x], {x, xa, x1}, PlotStyle -> {Orange}]
Show[Graphics[Line[{{xguess2, maxi}, {xguess2, mini}}]],
Graphics[Line[{{x1, maxi}, {x1, mini}}]], curve,
Graphics[Line[{{xguess1, maxi}, {xguess1, mini}}]], sline,
Axes -> True, PlotLabel -> "Function with upper and lower guesses and estimated root",
PlotStyle -> {FontSize -> 10}, PlotRange -> {{0.5, 2}, {-0.25, 0.9}}]
It should be noted that these two guesses do not have to bracket the root. We have named the two initial guesses as we did because that will be the format for subsequent iterations. It does not matter which guess is which. Try switching the two initial guesses to see what happens; one choice should converge faster than the other.

We make the second iteration.

x2 = x1 - (f[x1]*(xguess2 - x1))/(f[xguess2] - f[x1])
1.17199
Then we plot bounds.
m = (f[x1] - f[xguess2])/(x1 - xguess2);
secantline[x_] = m*x + f[x1] - m*x1;
sline = Plot[secantline[x], {x, 1, 1.3}, PlotStyle -> {Orange}]
Show[Graphics[Line[{{xguess2, maxi}, {xguess2, mini}}]],
Graphics[Line[{{x1, maxi}, {x1, mini}}]], curve,
Graphics[Line[{{xguess1, maxi}, {xguess1, mini}}]], sline,
Axes -> True, PlotLabel -> "Function with upper and lower guesses and estimated root",
PlotStyle -> {FontSize -> 10}, PlotRange -> {{0.5, 2}, {-0.25, 0.9}}]
The third iteration gives
x3=x2-(f[x2]*(x1-x2))/(f[x1]-f[x2])
1.23113
■

It is possible to combine the secant method and the weighted Newton's method:

$\begin{cases} z_k = x_k - \frac{f(x_k )}{f' (x_k ) - p\, f(x_k )} , & \ \mbox{Newton's part} , \\ x_{k+1} = x_k - \frac{z_k - x_k}{f( z_k ) - f(x_k )} \, f(x_k ) , & (\mbox{ Secant part}). \end{cases} \tag{Secant.3}$
where p is a real number (parameter). We refer to equations (Secant.3) as the leap-frog Newton's method. The two-point algorithm above is of order 3 whenever $$\displaystyle \frac{p\,f(x_k )}{f' (x_k )} < 1 ,$$ where xk are the iterates of the zero of f.
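A Python sketch of (Secant.3), with names ours; for illustration we apply it with p = 0 to the hypothetical test function x² − 2:

```python
def leapfrog_newton(f, fp, x0, p=0.0, n=8):
    """One weighted-Newton half-step z, then a secant step through x and z."""
    x = x0
    for _ in range(n):
        fx = f(x)
        z = x - fx / (fp(x) - p * fx)        # Newton's part
        fz = f(z)
        if fz == fx:                          # converged: secant step undefined
            return z
        x = x - (z - x) / (fz - fx) * fx      # secant part
    return x

r = leapfrog_newton(lambda x: x*x - 2, lambda x: 2*x, 1.5)   # sqrt(2)
```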
plot = Plot[x^2, {x, 0, 1}, AspectRatio -> 1,
   PlotStyle -> {Thickness[0.01], Blue}, Axes -> False,
   Epilog -> {PointSize[0.03], Pink, Point[{{0, 0}, {2, 2.8}}]},
   PlotRange -> {{-0.5, 1.2}, {-0.2, 1.4}}];
arrow = Graphics[{Arrowheads[0.07], Arrow[{{-0.2, 0.3}, {1.2, 0.3}}]}];
l1 = Graphics[{Thick, Dashed, Line[{{1, 1}, {0.75, 0.3}}]}];
l2 = Graphics[{Thickness[0.01], Red, Line[{{1, 1}, {0.59, 0.3}}]}];
l3 = Graphics[{Thick, Dashed, Line[{{1, 1}, {1, 0.3}}]}];
l4 = Graphics[{Thick, Dashed, Line[{{0.75, 0.55}, {0.75, 0.3}}]}];
t2 = Graphics[{Black, Text[Style[Subscript[x, 0], 16], {1, 0.25}]}];
t3 = Graphics[{Black, Text[Style[Subscript[z, 1], 16], {0.75, 0.25}]}];
t4 = Graphics[{Black, Text[Style[Subscript[x, 1], 16], {0.6, 0.25}]}];
t5 = Graphics[{Black, Text[Style["\[Alpha]", 16], {0.5, 0.35}]}];
t9 = Graphics[{Black, Text[Style["f(x)", Bold, 16], {0.66, 0.7}]}];
Show[arrow, plot, l1, l2, l3, l4, t2, t3, t4, t5, t9,
   Epilog -> {PointSize[0.02], Purple, Point[{{0.6, 0.3}, {0.75, 0.555}, {0.75, 0.3}, {1, 0.3}, {1, 1}}]},
   PlotLabel -> "Leap-frog Newton algorithm", LabelStyle -> {FontSize -> 16, Black}]
Leap-frog Newton's algorithm. Mathematica code

Example: Consider the Chebyshev polynomial of the first kind, which is the solution of the initial value problem

$\left( 1 - x^2 \right) y'' -x\,y' + n^2 y = 0, \qquad y(0) = \begin{cases} \phantom{-}1, & \ \mbox{ for } \ n = 4k, \\ -1 , & \ \mbox{ for } \ n = 4k + 2, \\ \phantom{-}0, & \ \mbox{ for } \ n = 2k+1. \end{cases}$
These polynomials are denoted by Tn(x):
$T_n (x) = \begin{cases} \cos \left( n\,\mbox{arccos}\,x \right) , & \ \mbox{ if } \ |x| ≤ 1, \\ \cosh \left( n\,\mbox{arccosh}\,x \right) , & \ \mbox{ if } \ x ≥ 1, \\ (-1)^n \cosh \left( n\,\mbox{arccosh}\,(-x) \right) , & \ \mbox{ if } \ x ≤ -1 . \end{cases}$
We choose n = 12 and plot the Chebyshev polynomial T12(x).
Plot[ChebyshevT[12, x], {x, -1, 1}, PlotStyle -> Thick,
   PlotLabel -> Style["Chebyshev polynomial of the first kind for n=12", FontSize -> 16, Background -> Yellow]]
Chebyshev polynomial of the first kind. Mathematica code

■

III. Steffensen's Iteration Method

We present another optimal two-point iteration scheme without memory, having a quadratic rate of convergence. The algorithm is named after the Danish mathematician, statistician, and actuary Johan Frederik Steffensen (1873--1961). The main advantage of Steffensen's method is that it does not require evaluation of the function's derivative and uses only two function evaluations per iteration step; the secant method, by comparison, needs only one function evaluation per step. Since it is a second-order method, the number of correct digits in the answer roughly doubles with each step. The crucial weakness of Steffensen's method is its sensitivity to the starting value x0.

Suppose that we need to find a root α of a smooth function f, so that f(α) = 0. Given an adequate starting value x0, a sequence of values x1, x2, … , xn, … can be generated using Steffensen's formula below.

$x_{k+1} = x_k - \frac{f^2 \left( x_k \right)}{f \left( x_k + f \left( x_k \right) \right) - f \left( x_k \right)} , \qquad k=0,1,2,\ldots . \tag{Steffensen.1}$

Steffensen's revolutionary idea is based on applying Newton's iteration scheme but replacing the derivative in (Newton.1) by the forward-difference approximation:    $$\displaystyle f' \left( x_k \right) \approx \frac{f \left( x_k + f( x_k )\right) - f \left( x_k \right)}{ f \left( x_k \right)} = \frac{f\left( x_k + h \right) - f\left( x_k \right)}{h} ,$$ where h = f(xk) plays the role of the infinitesimal increment, which is close to zero when xk is near the root α. Now it is clear why Steffensen's method is very sensitive to the initial guess x0: the value f(x0) must be very small. However, a more accurate approximation is obtained by utilizing a symmetric difference quotient

$x_{k+1} = x_k - \frac{2\, f^2 \left( x_k \right)}{f \left( x_k + f \left( x_k \right) \right) - f \left( x_k - f \left( x_k \right)\right)} , \qquad k=0,1,2,\ldots . \tag{Steffensen.2}$
SteffensonMethod[f_, x0_, n_] := NestList[# - (f[#])^2/(f[# + f[#]] - f[#]) &, x0, n]
The symmetric version is listed below:
symSteffensonMethod[f_, x0_, n_] := NestList[# - 2*(f[#])^2/(f[# + f[#]] - f[# - f[#]]) &, x0, n]

Note that only the head of the function is specified when it is passed to the SteffensonMethod subroutine. Steffensen's algorithm can be implemented within the Wolfram language with the aid of NestWhileList command. The nesting iteration will be terminated when the last two iterates become the same, but no more than 20 times (by default n ≤ 20):

SteffensonMethod2[f_, x0_, n_ : 20] := NestWhileList[# - (f[#])^2/(f[# + f[#]] - f[#]) &, x0, UnsameQ, 2, n]

Although Steffensen's method works very well, because of machine limitations there could be situations when the denominator loses significant digits due to the subtraction of two almost equal floating-point terms. To avoid such situations, Steffensen's algorithm can be modified either with Aitken's delta-squared process or by introducing a real parameter p:

$x_{k+1} = x_k - \frac{f^2 \left( x_k \right)}{f \left( x_k + f \left( x_k \right) \right) - f \left( x_k \right) - p\,f^2 (x_k )} , \qquad k=0,1,2,\ldots . \tag{Steffensen.3}$
pSteffenson[f_, x0_, n_, p_] := NestList[# - (f[#])^2/(f[# + f[#]] - f[#] - p*(f[#])^2) &, x0, n]
However, combining Steffensen's algorithm with the secant method leads to the two-point method of order 3:
$\begin{cases} z_k = x_k - \frac{f^2 (x_k )}{f \left( x_k + f(x_k ) \right) - f (x_k ) - p\,f^2 (x_{k} )} , & \ \mbox{Steffensen's part} , \\ x_{k+1} = x_k -\frac{z_k - x_{k}}{f( z_k ) - f(x_k )} \, f(x_k ) , & (\mbox{ Secant part}). \end{cases} \tag{Steffensen.4}$
This two-point algorithm is of order 3 for any real value of the parameter p. Note that there are many other iterative methods devoted to finding roots of nonlinear equations without derivatives (see a family of such algorithms in the paper by Kung and Traub). For example, with a real parameter γ,
$x_{k+1} = x_k - \frac{\gamma \,f^2 \left( x_k \right)}{f \left( x_k + \gamma \,f \left( x_k \right) \right) - f \left( x_k \right)} , \qquad k=0,1,2,\ldots . \tag{Steffensen.5}$
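The Mathematica one-liners above translate readily to other languages. Here is a minimal Python sketch of the γ-parameterized scheme (Steffensen.5), which reduces to the classical formula (Steffensen.1) at γ = 1; the test function x^5 - 5x + 3 and the starting value 0.5 are illustrative:

```python
def steffensen(f, x0, gamma=1.0, tol=1e-12, maxit=50):
    """Derivative-free iteration (Steffensen.5); gamma = 1 recovers
    the classical Steffensen formula (Steffensen.1)."""
    x = x0
    for _ in range(maxit):
        fx = f(x)
        if fx == 0.0:
            return x
        denom = f(x + gamma * fx) - fx
        if denom == 0.0:                  # step stalled
            return x
        x_new = x - gamma * fx * fx / denom
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# the fixed-point example from the introduction; root near 0.618034
r = steffensen(lambda x: x**5 - 5*x + 3, 0.5)
```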

Example: Consider the Chebyshev polynomial of the second kind that is the polynomial solution of the initial value problem

$\left( 1 - x^2 \right) y'' -3x\,y' + n \left( n +2 \right) y = 0, \qquad y(0) = \begin{cases} \phantom{-}1, & \ \mbox{ for } \ n = 4k, \\ -1 , & \ \mbox{ for } \ n = 4k + 2, \\ \phantom{-}0, & \ \mbox{ for } n = 2k+1, \end{cases}$
These polynomials are denoted by Un(x):
$U_n (x) = \frac{\sin \left( (n+1)\,\mbox{arccos}\, x \right)}{\sin \left( \mbox{arccos}\, x \right)} , \qquad n=0,1,2,\ldots .$
We choose n = 11 and plot the Chebyshev polynomial U11(x).
Plot[ChebyshevU[11, x], {x, -1, 1}, PlotStyle -> Thick,
   PlotLabel -> Style["Chebyshev polynomial of the second kind", FontSize -> 16, Background -> Cyan]]
Chebyshev polynomial of the second kind. Mathematica code

First, we bracket a positive root of the Chebyshev polynomial:
N[ChebyshevU[11, 1/10]]
-0.937464
N[ChebyshevU[11, 1/3]]
0.854319
So we know that a positive zero of U11(x) is somewhere in the interval [0.1, 0.33]. Then we find its root with the standard Mathematica command:
FindRoot[ChebyshevU[11, x], {x, 0.3}]
{x -> 0.25881904510252074}

We make a few steps manually to see how Steffensen's algorithm works. With an initial approximation x0 = 0.3, we get

f[x_] = ChebyshevU[11, x]; a = 0.3; p0=f[a];
x1 = a - p0^2/(ChebyshevU[11, a + p0] - p0)
0.0426704
Since the values of the function U11(0.3) = 0.516061 and U11(0.1) = -0.937464 at the endpoints cannot be considered «infinitesimal», we expect Steffensen's method to diverge. We make the second step and obtain
p0 = ChebyshevU[11, x1]; p02 = p0^2; x2 = x1 - p02/(ChebyshevU[11, x1 + p0] - p0)
1.04959
So we see that if we go far away from the root, Steffensen's method diverges. If we use a symmetric version (Steffensen.2), the algorithm also diverges, but slowly.
f[x_] = ChebyshevU[11, x]; a = 0.1;
i = 1; NN = 30; tol = 1/10^7; p0 = N[f[a]]; p02 = p0*p0; x0 = a;
While[i < NN,
x1 = x0 - 2*p02/(f[x0 + p0] - f[x0 - p0]);
x0 = x1; p0 = N[f[x0]]; p02 = (f[x0])^2;
Print["i= ", i, ", x= ", x1, ", f= ", p0]; i++]
Instead of the classical algorithm (Steffensen.1), we consider a modified one (Steffensen.3), with p = 6.

On the first step, we get

p0 = ChebyshevU[11, 0.1];
x1 = 0.1 - p0^2/(ChebyshevU[11, 0.1 + p0] - p0 - 6*p0^2)
0.261254
The second step gives
p0=ChebyshevU[11, x1]; p02=p0^2;
x2 = x1 - p02/(ChebyshevU[11, x1 + p0] - p0 - 6*p02)
0.25874
The third step yields
p0=ChebyshevU[11, x2]; p02=p0^2;
x3 = x2 - p02/(ChebyshevU[11, x2 + p0] - p0 - 6*p02)
0.258819
Since x3 agrees with the true root in all displayed decimal places, there is no need for further iterations, and we claim that three iterations of the modified Steffensen method suffice to obtain the root of the equation U11(x) = 0.

We write a loop:

f[x_] = ChebyshevU[11, x]; a = 0.1;
i = 1; NN = 5; tol = 1/10^7; p0 = N[f[a]]; p02=p0*p0; x0 = a;
While[i < NN,
x1 = x0 - p02/(f[x0 + p0] - p0 - 6*p02);
x0 = x1; p0 = N[f[x0]]; p02 = (f[x0])^2;
Print["i= ", i, ", x= ", x1, ", f= ", p0]; i++
]
Next, we check the Steffensen-secant algorithm (Steffensen.4) with p = 6:
f[x_] = ChebyshevU[11, x]; a = 0.1;
i = 1; NN = 5; tol = 1/10^7; p0 = N[f[a]]; p02=p0*p0; x0 = a;
While[i <= NN,
z1 = x0 - p02/(f[x0 + p0] - p0 - 6*p02);
x1=x0 - (z1-x0)*p0/(f[z1]-p0);
x0 = x1; p0 = N[f[x0]]; p02 = (f[x0])^2;
Print["i= ", i, ", x= ", x1, ", f= ", p0]; i++
]
So we get six correct decimal places on the second step. However, with p = 0, the Steffensen--secant algorithm diverges because the secant line becomes horizontal.

We compare our results with Newton's method (Newton.1):

NewtonsMethodList[ChebyshevU[11, x], {x, 0.1}, 5]
{0.1, -0.109996, 0.2054, 0.26945, 0.258806, 0.258819}
Hence, we see that Newton's method needs five iterations to achieve the required accuracy.

Now we investigate the sensitivity of Steffensen's algorithm to the starting value. Using ListLinePlot makes it easy to see how quickly the solution stabilizes. The dashed thick red line shows the solution as determined by the built-in Mathematica command FindRoot.

(* making a list of some starting values *)
x0List = Range[0.2, 0.3, 0.02]
{0.2, 0.22, 0.24, 0.26, 0.28, 0.3}
Then we map the function across the list of starting values. The maximum number of iterations is 10 simply for aesthetic purposes.
root = FindRoot[f[x], {x, 0.3}];
ListLinePlot[SteffensonMethod2[f, #, 10] & /@ x0List,
   (* the root determined by FindRoot is shown as a thick red dashed line *)
   Epilog -> {Thick, Dashed, Red, Line[{{1, x}, {8, x}}] /. root},
   (* adding labels and legends, making the plot look nicer *)
   PlotLabels -> x0List, ImageSize -> Large,
   AxesLabel -> {"Iterations", "Result"},
   PlotRange -> {{1, Automatic}, Automatic}, PlotLegends -> x0List,
   PlotLabel -> "Solution from FindRoot is " <> ToString[x /. root]]
Sensitivity of Steffensen's algorithm to the starting value. Mathematica code

■

IV. Chebyshev Iteration Scheme

The success of the quadratically convergent Newton's method rests on the employment of a derivative in the iteration scheme for determining the solutions of the equation f(x) = 0. Studying calculus at college, P. Chebyshev became an avid follower of Isaac Newton. Interestingly, Chebyshev used seven different spellings of his last name, which had the effect of mirroring Newton, who was known for concealing his name in many of his letters and publications. In 1838, Chebyshev proposed an improvement to Newton's iterative method (a fact not published until 1951) by additionally using a second derivative. This iteration scheme has a cubic rate of convergence. Later, it was extended to systems of nonlinear equations, and it now bears the name of the Russian mathematician Pafnuty Chebyshev (1821--1894).
$x_{k+1} = x_k - \frac{f \left( x_k \right)}{f' \left( x_k \right)} - \frac{f^2 \left( x_k \right) f'' \left( x_k \right)}{2\left( f' \left( x_k \right) \right)^3} , \qquad k=0,1,2,\ldots . \tag{Chebyshev.1}$
This method admits a geometric derivation from a parabola in the form $$a\,y^2 + y + b\, x + c = 0 ;$$ see details in the article by V. Kanwar et al. It can be implemented in the Wolfram language as
ChebyshevMethodList[f_, {x_, x0_}, n_] :=
NestList[# - Function[x, f][#]/Derivative[1][Function[x, f]][#] -
Function[x, f^2][#]* Derivative[2][Function[x, f]][#]/
2/(Derivative[1][Function[x, f]][#])^3 &, x0, n]
When Chebyshev's iteration scheme is applied to solve a quadratic equation $$x^2 -A =0,$$ it leads to
$p_{k+1} = \frac{1}{2} \left( p_k + \frac{A}{p_k} \right) - \frac{\left( p_k^2 -A \right)^2}{8\,p_k^3} , \qquad k=0,1,2,\ldots . \tag{Chebyshev.2}$
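As a quick check, the recurrence (Chebyshev.2) can be coded in a few lines of any language. A Python sketch (the value A = 6 anticipates the example below):

```python
def chebyshev_sqrt(A, p, iterations=5):
    """Square-root iteration (Chebyshev.2), cubically convergent."""
    for _ in range(iterations):
        p = (p + A / p) / 2 - (p * p - A) ** 2 / (8 * p ** 3)
    return p

# the first iterate from p0 = 2 with A = 6 is 39/16 = 2.4375
p = chebyshev_sqrt(6.0, 2.0)
```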

Example: Let us reconsider the problem of determining the positive square root of 6 using the Chebyshev iteration scheme:

$p_{k+1} = \frac{1}{2} \left( p_k + \frac{6}{p_k} \right) - \frac{\left( p_k^2 -6 \right)^2}{8\,p_k^3} , \qquad k=0,1,2,\ldots .$
\begin{align*} p_0 &=2, \qquad &p_0 =3, \\ p_1 &= \frac{39}{16} = 2.4375 , \quad &p_1 = \frac{59}{24} \approx 2.4583\overline{3} , \\ p_2 &= \frac{2066507}{843648} \approx {\bf 2.449489}597557 , \quad & p_2 = \frac{32196721}{13144256} \approx {\bf 2.4494897}999 , \\ p_3 &= \frac{48631344989193667537677361}{19853663454796665627720704} \approx {\bf 2.449489742783178}, \quad &p_3 = \frac{8596794666982120560353042061123}{3509626726264305284645257635328} \approx {\bf 2.449489742783178} . \end{align*}
p[0] = 2
Do[Print["k= ", k, " " , "p= " , p[k] = (p[k - 1] + 6/p[k - 1])/ 2 - (p[k - 1]^2 - 6)^2/(8*p[k - 1]^3)], {k, 1, 5}]
So we see that the third iteration gives an approximation with 16 correct decimal places.
For the cube root, we apply the Chebyshev method (Chebyshev.1) to the function f(x) = x³ -A and obtain
$x_{k+1} = \frac{2\,x_k^3 +A}{3\,x^2_k} - \frac{x_k \left( x_k^3 - A \right)^2}{9\,x_k^{6}} , \qquad k=0,1,2,\ldots . \tag{Chebyshev.3}$
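The cube-root recurrence (Chebyshev.3) translates just as directly; in this Python sketch the value A = 2 and the starting point 1.0 are illustrative:

```python
def chebyshev_cbrt(A, x, iterations=6):
    """Cube-root iteration (Chebyshev.3) for f(x) = x**3 - A."""
    for _ in range(iterations):
        x = (2 * x**3 + A) / (3 * x**2) - x * (x**3 - A) ** 2 / (9 * x**6)
    return x

# converges to the cube root of 2
x = chebyshev_cbrt(2.0, 1.0)
```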

Example: Let us find a root of the Laguerre polynomial L21(x) of order 21 that can be defined by the Rodrigues formula

$L_n (x) = \frac{e^x}{n!}\,\frac{{\text d}^n}{{\text d} x^n} \left( e^{-x} x^n \right) = \frac{1}{n!} \left( \frac{\text d}{{\text d}x} - 1 \right)^n x^n , \qquad n=0,1,2,\ldots .$
We plot the Laguerre polynomial to identify the interval containing the root.
Plot[LaguerreL[21, x], {x, 0, 3}, PlotStyle -> {Thick, Magenta}]
Graph of the Laguerre polynomial L21(x). Mathematica code

Since these polynomials were discovered by P. Chebyshev, they are also called Chebyshev--Laguerre polynomials, after Edmond Laguerre (1834--1886).

To find its first positive root, we use the standard Mathematica command

FindRoot[LaguerreL[21, x], {x, 0}]
{x -> 0.0672578}
Now we apply the Chebyshev iterative procedure:
N[ChebyshevMethodList[LaguerreL[21, x], {x, 0}, 3]]
{0., 0.0589569, 0.0672194, 0.0672578}
and with three iterations, we obtain the correct answer with 10 decimal places. However, with Newton's method,
N[NewtonsMethodList[LaguerreL[21, x], {x, 0}, 4]]
{0., 0.047619, 0.0649839, 0.0672227, 0.0672578}
we obtain 9 correct decimal places with four iterations.    ■

V. Halley's Method

The English astronomer, geophysicist, mathematician, meteorologist, and physicist Edmund/Edmond Halley (1656--1741) discovered in 1694 the following iteration algorithm for solving the equation f(x) = 0:
$x_{k+1} = x_k - \frac{f\left( x_k \right)}{f'\left( x_k \right)} - \frac{f \left( x_k \right) f'' \left( x_k \right)}{2\left( f' \left( x_k \right)\right)^2 - f \left( x_k \right) f'' \left( x_k \right) } \cdot \frac{f\left( x_k \right)}{f'\left( x_k \right)} = x_k - \frac{2\,f \left( x_k \right) f' \left( x_k \right)}{2\left( f' \left( x_k \right)\right)^2 - f \left( x_k \right) f'' \left( x_k \right) } , \qquad k=0,1,2,\ldots . \tag{Halley.1}$
HalleyMethodList[f_, {x_, x0_}, n_] :=
NestList[# - Function[x, f][#]/Derivative[1][Function[x, f]][#] -
Function[x, f^2][#]* Derivative[2][
Function[x, f]][#]/(2*(Derivative[1][Function[x, f]][#])^2 -
Function[x, f][#] * Derivative[2][Function[x, f]][#])/
Derivative[1][Function[x, f]][#] &, x0, n]
The iteration algorithm above is called Halley's method. Edmund Halley is well known for first predicting the orbit of the comet that bears his name. Halley's formula, which is of the third order, usually makes the convergence of the iteration process extremely rapid when the first approximation is fairly close to the null. For example, application of Halley's method to solving the quadratic equation $$x^2 -A =0$$ leads to the recurrence
$x_{k+1} = \frac{1}{2} \left( x_k + \frac{A}{x_k} \right) - \frac{\left( x_k^2 -A \right)^2}{2x_k \left( 3\,x_k^2 + A \right)} , \qquad k=0,1,2,\ldots . \tag{Halley.2}$
The idea behind Halley's method is to interpolate the function f(x) by a hyperbola in the form $$a\,xy + y + b\,x + c = 0 .$$ See details in the article by V. Kanwar et al.

There is another version of the preceding algorithm that is also of order 3; it is called the super-Halley method:

$x_{k+1} = x_k - \frac{f\left( x_k \right)}{f'\left( x_k \right)} - \frac{f(x_k )\, f'' (x_k )}{\left( f' (x_k ) \right)^2 - f(x_k )\, f'' (x_k )} \cdot \frac{f\left( x_k \right)}{2\,f'\left( x_k \right)} , \qquad k=0,1,2,\ldots . \tag{Halley.3}$
superHalleyList[f_, {x_, x0_}, n_] :=
NestList[# - Function[x, f][#]/Derivative[1][Function[x, f]][#] -
Function[x, f^2][#]* Derivative[2][Function[x, f]][#]/
2/((Derivative[1][Function[x, f]][#])^2 -
Function[x, f][#] * Derivative[2][Function[x, f]][#])/
Derivative[1][Function[x, f]][#] &, x0, n]
For example, application of the super-Halley's method to solving the quadratic equation $$x^2 -A =0$$ leads to the recurrence
$x_{k+1} = \frac{1}{2} \left( x_k + \frac{A}{x_k} \right) - \frac{\left( x_k^2 -A \right)^2}{4x_k \left( x_k^2 + A \right)} , \qquad k=0,1,2,\ldots . \tag{Halley.4}$
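The two square-root recurrences (Halley.2) and (Halley.4) differ only in the correction term, as this Python sketch shows (A = 6 matches the example below):

```python
def halley_sqrt(A, p, iterations=5):
    """Square-root iteration (Halley.2)."""
    for _ in range(iterations):
        p = (p + A / p) / 2 - (p * p - A) ** 2 / (2 * p * (3 * p * p + A))
    return p

def super_halley_sqrt(A, p, iterations=5):
    """Square-root iteration (Halley.4): the same Newton part,
    a slightly different correction term."""
    for _ in range(iterations):
        p = (p + A / p) / 2 - (p * p - A) ** 2 / (4 * p * (p * p + A))
    return p

# the first Halley iterate from p0 = 2 with A = 6 is 22/9
```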

Example: Let us compute a few iterations for the determination of the square root of 6:

\begin{align*} p_0 &=2, \qquad &p_0 =3, \\ p_1 &= \frac{22}{9} \approx 2.4\overline{4} , \quad &p_1 = \frac{27}{11} \approx 2.45\overline{45} , \\ p_2 &= \frac{21362}{8721} \approx {\bf 2.4494897}37 , \quad & p_2 = \frac{26163}{10681} \approx {\bf 2.44948974}81 , \\ p_3 &= \frac{19496458483942}{7959395846169} \approx {\bf 2.449489742783178}, \quad &p_3 = \frac{23878187538507}{9748229241971} \approx {\bf 2.449489742783178} . \end{align*}
p[0] = 2
Do[Print["k= ", k, " " , "p= " , p[k] = p[k - 1] - 2*p[k - 1]*(p[k - 1]^2 - 6)/(3*p[k - 1]^2 + 6)], {k, 1, 5}]
So we see that the third iteration gives an approximation with 16 correct decimal places.

Example: Let us find a root of the Legendre polynomial P20(x) of order 20 that can be defined by the Rodrigues formula

$P_n (x) = \frac{1}{2^n n!} \,\frac{{\text d}^n}{{\text d} x^n} \left( x^2 - 1 \right)^n , \qquad n=0,1,2,\ldots .$
Legendre polynomials are named after Adrien-Marie Legendre (1752--1833), who discovered them in 1782. We plot the Legendre polynomial to identify the interval containing the root.
Plot[LegendreP[20, x], {x, 0, 3}, PlotStyle -> {Thick, Magenta}]
Graph of the Legendre polynomial P20(x). Mathematica code

First, we find the second positive zero of the Legendre polynomial with the aid of the standard Mathematica command
FindRoot[LegendreP[20, x], {x, 0.2}]
{x -> 0.227786}
Then we apply the Halley method
N[HalleyMethodList[LegendreP[20, x], {x, 0.2}, 3]]
{0.2, 0.225813, 0.227785, 0.227786}
So three iterations are enough to get the correct answer with 16 correct decimal places. To obtain the same accuracy, Newton's method requires four iterations. On the other hand, the super-Halley scheme gives only 15 correct decimal places with 3 iterations.    ■

VI. Dekker's Method

As we saw previously, combining different iteration algorithms may accelerate convergence or make it more robust. It was the Dutch mathematician Theodorus Jozef Dekker (born in 1927) who proposed an algorithm in 1960 to combine an open iterative method (not restricted to an interval, usually fast, but with no guarantee of finding a root) with a closed iterative method (which uses a bounded interval and converges slowly but surely to a root if one exists). In particular, he suggested alternating the robust but slow bisection method with the fast open secant iterative algorithm. Unfortunately, his original paper is not available on the web, but his later paper published with J. Bus in 1975 can be found on the Internet. Theodorus Jozef Dekker completed his Ph.D. at the University of Amsterdam in 1958. Later, Dekker's method was modified, and its latest version, known as the Brent or van Wijngaarden--Dekker--Brent method, is discussed in the next subsection. Our presentation provides a rough outline of the basic algorithm and does not include possible improvements.

The advantages of Dekker's method were outlined by George E. Forsythe, the founder of the computer science department at Stanford University (and the Ph.D. adviser of Cleve Moler, the inventor of MATLAB):

• it is as good as the bisection method, so it always converges to the root;
• there are no assumptions about the second derivative of the function;
• it is as fast as the secant method;
• unlike the secant method, it is not troubled by multiple roots.

Suppose that we want to solve the equation f(x) = 0, where f:  ℝ → ℝ is a smooth real-valued function. As with the bisection method, we need to initialize Dekker's method with two points, say 𝑎0 and b0, such that $$f \left( a_0 \right) \quad\mbox{and} \quad f \left( b_0 \right)$$ have opposite signs. If f is continuous on $$\left[ a_0 , b_0 \right] ,$$ the intermediate value theorem guarantees the existence of a solution between a0 and b0.

The Algorithm

Two points are involved in every iteration: 𝑎 and b, where b is the current guess for the root α of f(x).

• Dekker's method starts with two points that embrace a root α, for which f(α) = 0. So there exists an interval that includes the root:    α ∈ [𝑎, b], with f(𝑎) × f(b) < 0.
• 𝑎k is the "contrapoint," i.e., a point such that $$f \left( a_k \right) \quad\mbox{and} \quad f \left( b_k \right)$$ have opposite signs, so the interval $$\left[ a_k , b_k \right]$$ contains the solution. Furthermore, $$\left\vert f \left( b_k \right) \right\vert$$ should be less than or equal to $$\left\vert f \left( a_k \right) \right\vert ,$$ so that bk is a better guess for the unknown solution than 𝑎k.
• At every step of iteration, the following conditions are maintained:
$f\left( a_k \right) \times f\left( b_k \right) \le 0, \quad \left\vert f\left( b_k \right) \right\vert \le \left\vert f\left( a_k \right) \right\vert , \quad \left\vert a_k - b_k \right\vert \le 2\,\delta (b) ,$
where δ is the required tolerance.
• bk-1 is the previous iterate (for the first iteration, we set bk-1 = 𝑎0).
• Calculate two values: one using the secant method
$s = b_k - \frac{b_k - b_{k-1}}{f\left( b_k \right) - f\left( b_{k-1} \right)}\, f\left( b_k \right) , \qquad \mbox{if } f\left( b_k \right) \ne f\left( b_{k-1} \right) ,$
and the other using bisection (the midpoint of the current bracket):
$m = \frac{a_k + b_k}{2} \qquad \mbox{if } f\left( b_k \right) = f\left( b_{k-1} \right) .$
• If the value s from the secant algorithm lies strictly between bk and m, then it becomes the next iterate: bk+1 = s.
• Otherwise, bk+1 = m.
• The new contrapoint is chosen such that f(𝑎k+1) and f(bk+1) have opposite signs. If f(𝑎k) and f(bk+1) have opposite signs, then the contrapoint remains the same: 𝑎k+1 = 𝑎k. Otherwise, f(bk+1) and f(bk) have opposite signs, so the new contrapoint becomes 𝑎k+1 = bk.
• Finally, if |f(𝑎k+1)| < |f(bk+1)|, then 𝑎k+1 is probably a better guess for the required zero than bk+1, and hence the values of 𝑎k+1 and bk+1 are swapped.
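The steps above can be collected into a short program. The following Python sketch is a bare-bones reading of the outline (it omits the tolerance scaling and floating-point safeguards of production implementations):

```python
def dekker(f, a, b, tol=1e-12, maxit=100):
    """Bare-bones Dekker iteration: b is the current best guess,
    a the contrapoint; take the secant step if it falls between b
    and the midpoint, the bisection step otherwise."""
    fa, fb = f(a), f(b)
    assert fa * fb < 0, "initial points must bracket a root"
    if abs(fa) < abs(fb):                 # make b the better guess
        a, b, fa, fb = b, a, fb, fa
    b_prev, fb_prev = a, fa               # previous iterate of b
    for _ in range(maxit):
        if abs(fb) < tol or abs(b - a) < tol:
            return b
        m = (a + b) / 2                   # bisection candidate
        if fb != fb_prev:                 # secant candidate
            s = b - fb * (b - b_prev) / (fb - fb_prev)
        else:
            s = m
        b_new = s if min(b, m) < s < max(b, m) else m
        f_new = f(b_new)
        b_prev, fb_prev = b, fb
        if fa * f_new < 0:                # a and b_new bracket the root
            b, fb = b_new, f_new
        else:                             # the bracket is [b, b_new]
            a, fa, b, fb = b, fb, b_new, f_new
        if abs(fa) < abs(fb):             # keep b the better guess
            a, b, fa, fb = b, a, fb, fa
    return b

# the example below: f(x) = 1/x - 1 on [0.1, 2], root at x = 1
r = dekker(lambda x: 1 / x - 1, 0.1, 2.0)
```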
■     ■     ■

Example: We demonstrate Dekker's method with the following pathological example of a function that has a singularity. So we consider the problem of finding the inverse of an arbitrary number 𝑎:

$f(x) = \frac{1}{x-k} - a = 0 \qquad \Longrightarrow \qquad x = k + \frac{1}{a} .$

To simplify our job, we set k = 0 and 𝑎 = 1. We embrace the root α = 1 by the interval [0.1, 2]. The function $$f(x) = x^{-1} -1$$ has the following values at its endpoints:

f[x_] = 1/x - 1; f[0.1]
9.
f[2.0]
-0.5

The Dekker method improves the secant method, making it more stable by combining it with the bisection method. This combination allows the Dekker method to converge for any continuous function. In each iteration of the algorithm, both methods are evaluated. If the estimate of the root obtained by the secant method lies between the last value calculated by the algorithm and the new estimate obtained by the bisection method, the secant value is used as one of the boundaries for the next iteration. If not, this may indicate that the secant method is diverging, and the bisection value is used as a boundary for the next iteration. The bisection method needs a sign check on the two initial values, so that the function values have opposite signs, and the Dekker algorithm includes this check as well to make sure that bisection remains a valid method in the next iteration.

Initially, we choose b = 2 and 𝑎 = 0.1 because the magnitude of the function f is smaller at the right endpoint. To make the first step, we calculate their mean

m = (0.1+2)/2
1.05
and secant value:
a=0.1; b=2;
s = (a*f[b] - b*f[a])/(f[b] - f[a])
1.9
Comparing the values of the function at these two points,
f[m]
-0.047619
f[s]
-0.473684
Since |f(m)| is less than |f(s)|, we set b = m = 1.05, and 𝑎 = 0.1 remains the same to bracket the root. On the second step, we calculate the mean
b=1.05; a=0.1; m = (a+b)/2
0.575
and secant value:
s = (a*f[b] - b*f[a])/(f[b] - f[a])
1.045
Comparison of the values of the function at these two points
f[m]
0.73913
f[s]
-0.0430622
reveals that s is closer to the root. Hence, we set b2 = s = 1.045 and 𝑎 = 0.575. The third iterative step gives
b=1.045; a=0.575; m = (a+b)/2
0.81
s = (a*f[b] - b*f[a])/(f[b] - f[a])
1.01913
Comparison of the values of the function at these two points
f[m]
0.234568
f[s]
-0.0187661
shows that s is closer to the root. We set b3 = s = 1.01913 and 𝑎 = m = 0.81. Next iteration gives:
b = s; a = m; m = (a+b)/2
0.914563
s = (a*f[b] - b*f[a])/(f[b] - f[a])
1.00363
Comparing the values of the function at these two points, we set b4 = s = 1.00363 and 𝑎 = m = 0.914563. The next iteration gives:
b = s; a = m; m = (a+b)/2
0.959098
s = (a*f[b] - b*f[a])/(f[b] - f[a])
1.00031
Upon setting b5 = s = 1.00031, we obtain four correct decimal places on the fifth iteration. Our numerical experiment shows almost linear convergence.    ■
As a rule, Dekker's method behaves badly if there is more than one root in the given interval.

Example: Use the Bus--Dekker method to find a root of the equation

$e^{0.1\,x} - e^{0.7\,x -1} = 0$
in the interval [1,2]. First, we plot the given function f
Plot[Exp[0.1*x] - Exp[0.7*x - 1], {x, 1, 2}, PlotStyle -> {Thick, Magenta}]
Graph of the function f(x) = exp(0.1*x) - exp(0.7*x - 1). Mathematica code

We observe that f(1) is positive while f(2) is negative, and hence there is at least one root in [1,2]. The basis of the Bus--Dekker algorithm is to use the secant method while at the same time ensuring a reduction of the interval containing a zero. We start with guesses x = 1.0 and x = 1.5. Since f(1.5) is positive, we can immediately reduce the interval containing a zero from the initial interval [1,2] to [1.5,2]. So we set b1 = 1.5 and 𝑎 = 2.
f[x_] = Exp[0.1*x] - Exp[0.7*x - 1];
b=1.5; a=2;
m = (a+b)/2
1.75
and secant value:
s = (a*f[b] - b*f[a])/(f[b] - f[a])
1.6451
Comparison of the values of the function at these two points
f[m]
-0.0610765
f[s]
0.0151544
reveals that these points bracket the root. So we set for the second iteration b2 = s = 1.6451 and 𝑎 = m = 1.75. Next iteration gives
b = s; a = m;
m = (a+b)/2
1.69755
and secant value:
s = (a*f[b] - b*f[a])/(f[b] - f[a])
1.66596
Then we evaluate the values at these points:
f[s]
0.000504238
f[m]
-0.0221636
Since these two points embrace the root, we set b3 = s = 1.66596 and 𝑎 = m = 1.69755. The third iteration gives
b = s; a = m;
m = (a+b)/2
1.68175
and secant value:
s = (a*f[b] - b*f[a])/(f[b] - f[a])
1.66666
So with three iterations, we get five correct decimal places of the root:
FindRoot[f[x], {x, 0.6}]
1.6666666666666667
■

VII. Brent's Method

Dekker's method performs well if the function f is reasonably well behaved. However, there are circumstances in which every iteration employs the secant method, yet the iterates bk converge very slowly (in particular, $$\left\vert b_k - b_{k-1} \right\vert$$ may be arbitrarily small). Dekker's method requires far more iterations than the bisection method in this case. Although the algorithm guarantees convergence and its asymptotic behavior is completely satisfactory, the number of function evaluations it requires may be prohibitively large, in particular when the zero is a multiple one.

In 1971, the Australian mathematician Richard Pierce Brent (born in 1946, Melbourne) proposed a modification of Dekker's method. With the additional inclusion of inverse quadratic interpolation, Brent's root-finding algorithm is completely robust and usually very efficient. The method is the basis of MATLAB's fzero routine (see its explanation on Cleve's Corner). The algorithm tries to use the potentially fast-converging secant method or inverse quadratic interpolation if possible, but it falls back to the more robust bisection method if necessary. The method underwent several modifications, improvements, and implementations. Consequently, the algorithm is also known as the van Wijngaarden--Dekker--Brent method.

For Brent's original algorithm, the upper bound on the number of function evaluations needed equals (t + 1)² - 2, where t is the number of function evaluations needed by bisection. Later, J.C.P. Bus & T.J. Dekker proposed another modification of Brent's algorithm, requiring at most 4t function evaluations. This is achieved by inserting steps in which rational interpolation or bisection is performed. Some other variations of Brent's method are known with a similar asymptotic behavior. Our presentation outlines the basic ideas and does not follow any particular version of Brent's algorithm. The example presented illustrates Brent's method by showing all steps.

The inverse quadratic interpolation method approximates the function y = f(x) by a quadratic curve that goes through the three most recent points { xi, f(xi) }, i = k, k-1, k-2. As the function may be better approximated by a quadratic curve than by a straight line, the iteration is likely to converge more quickly. Such a quadratic approximation is as follows:

$y = q(x) = \frac{\left( x - x_1 \right)\left( x - x_2 \right)}{\left( x_0 - x_1 \right)\left( x_0 - x_2 \right)} \, y_0 + \frac{\left( x - x_0 \right)\left( x - x_2 \right)}{\left( x_1 - x_0 \right)\left( x_1 - x_2 \right)} \, y_1 + \frac{\left( x - x_0 \right)\left( x - x_1 \right)}{\left( x_2 - x_0 \right)\left( x_2 - x_1 \right)} \, y_2 .$
However, here we use the interpolation $$x = q^{-1} (y)$$ to approximate the inverse (horizontal) function $$x = f^{-1} (y) :$$
$x = q^{-1} (y) = \frac{\left( y - y_1 \right)\left( y - y_2 \right)}{\left( y_0 - y_1 \right)\left( y_0 - y_2 \right)} \, x_0 + \frac{\left( y - y_0 \right)\left( y - y_2 \right)}{\left( y_1 - y_0 \right)\left( y_1 - y_2 \right)} \, x_1 + \frac{\left( y - y_0 \right)\left( y - y_1 \right)}{\left( y_2 - y_0 \right)\left( y_2 - y_1 \right)} \, x_2 .$
This form is also called a Lagrange polynomial. When y = 0, the corresponding x is the zero-crossing of $$x = f^{-1} (y) :$$
$x_3 = \left. q^{-1} (y) \right\vert_{y=0} = \frac{y_1 y_2}{\left( y_0 - y_1 \right)\left( y_0 - y_2 \right)}\, x_0 + \frac{y_0 y_2}{\left( y_1 - y_0 \right)\left( y_1 - y_2 \right)} \, x_1 + \frac{y_0 y_1}{\left( y_2 - y_0 \right)\left( y_2 - y_1 \right)} \, x_2 .$
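The zero-crossing formula is easy to check numerically. In this small Python sketch, the function f(x) = x² - 2 and the three sample points are arbitrary:

```python
def iqi_step(f, x0, x1, x2):
    """Inverse quadratic interpolation: evaluate the Lagrange
    polynomial for x = q^{-1}(y) at y = 0."""
    y0, y1, y2 = f(x0), f(x1), f(x2)
    return (x0 * y1 * y2 / ((y0 - y1) * (y0 - y2))
            + x1 * y0 * y2 / ((y1 - y0) * (y1 - y2))
            + x2 * y0 * y1 / ((y2 - y0) * (y2 - y1)))

# one step toward sqrt(2) = 1.41421... from three rough points
x3 = iqi_step(lambda x: x * x - 2, 1.0, 1.5, 2.0)
```

The single step lands noticeably closer to the root than any of the three input points.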

The Algorithm

Three points are involved in every iteration: 𝑎, b, and c, where b is the current guess for the root α of f(x). Since there are many checks at every iteration step, we give only the basic ideas involved in Brent's algorithm.

• Brent's method starts with two points that bracket a root α, for which f(α) = 0. So there exists an interval that includes the root:    α ∈ [b, c], with f(b) × f(c) < 0.
• Initiation: of the two endpoints that bracket the root α, set b to be the one that gives the smaller function value:    |f(b)| ≤ |f(c)|.
• Set 𝑎 = c and determine a new iterate b by linear (secant) interpolation
$b= b_1 = \frac{a\,f(b) - b\,f(a)}{f(b) - f(a)} .$
Then set c to be the previous value of b if the function f changes sign there; if not, swap it with 𝑎.
• If the three current points 𝑎, b, and c are distinct, find another point s by inverse quadratic interpolation.
• If that fails, compute a secant intercept s using b and c. Test whether s lies between b and (3𝑎 + b)/4; otherwise set s = m, the midpoint.
• Since b is the most recent approximation to the root and 𝑎 is the previous value of b, apply linear interpolation if |f(b)| ≥ |f(𝑎)|. Otherwise, do bisection.
• Make a new estimate of the root using the obtained points through either the bisection, secant, or IQI step, and set b to the point closest to the root. Use bisection when the condition |s - b| < 0.5 |b - c| fails.
• Check for convergence at the new calculated value. If reached, end the algorithm.
• Check whether f(new value) and f(b) have different signs. If so, they will be the two estimates for the next iteration. If not, the values would be 𝑎 and the newly calculated value.
• Refresh the values of the points so that |f(b)| < |f(𝑎)|.
• Start next iteration.
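The bookkeeping above is easier to follow in code. The following Python sketch keeps the bracketing invariant and the IQI/secant/bisection choice, but replaces Brent's full battery of floating-point safeguards and the (3𝑎 + b)/4 test with a plain midpoint test; it illustrates the control flow rather than reproducing any published version of Brent's algorithm:

```python
def brent_sketch(f, a, b, tol=1e-12, max_iter=100):
    """Simplified Brent-style root finder on a bracket [a, b].
    Illustrative only; production code needs Brent's full
    tolerance and progress checks."""
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("root is not bracketed")
    if abs(fa) < abs(fb):            # keep b as the best iterate
        a, b, fa, fb = b, a, fb, fa
    c, fc = a, fa                    # c holds the previous iterate
    for _ in range(max_iter):
        if fb == 0 or abs(b - a) < tol:
            return b
        if fa != fc and fb != fc:    # three distinct ordinates: IQI
            s = (a * fb * fc / ((fa - fb) * (fa - fc))
                 + b * fa * fc / ((fb - fa) * (fb - fc))
                 + c * fa * fb / ((fc - fa) * (fc - fb)))
        else:                        # otherwise a secant step
            s = b - fb * (b - a) / (fb - fa)
        m = 0.5 * (a + b)
        # accept s only if it lies between b and the midpoint;
        # otherwise fall back to bisection
        if not (min(b, m) < s < max(b, m)):
            s = m
        fs = f(s)
        c, fc = b, fb                # remember the previous iterate
        if fa * fs < 0:              # keep the sign change inside [a, b]
            b, fb = s, fs
        else:
            a, fa = s, fs
        if abs(fa) < abs(fb):        # restore |f(b)| <= |f(a)|
            a, b, fa, fb = b, a, fb, fa
    return b
```

The `c` variable plays the role of the third point used by the IQI step; when two of the three ordinates coincide, the sketch degrades gracefully to the secant step, and the midpoint guard supplies the bisection fallback.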

Brent's method contains many comparisons and conditions that make it complicated to follow. It has various implementations; for instance, Bus and Dekker suggest inserting a bisection step at every fifth iterate. A great deal of attention has been paid to considerations of floating-point arithmetic (overflow and underflow, accuracy of computed expressions, etc.). Let us outline the main check-in points.

Two inequalities must be simultaneously satisfied. Given a specified numerical tolerance δ, if the previous step used the bisection method, the inequality

$|\delta | < \left\vert b_k - b_{k-1} \right\vert$
must hold to perform interpolation, otherwise the bisection method is performed and its result is used for the next iteration.

If the previous step performed interpolation, then the inequality

$|\delta | < \left\vert b_{k-1} - b_{k-2} \right\vert$
is used instead to choose the next action: interpolation (when the inequality is true) or the bisection method (when it is not).

Also, if the previous step used the bisection method, the inequality

$\left\vert s- b_k \right\vert < \frac{1}{2} \left\vert b_k - b_{k-1} \right\vert$
must hold, otherwise the bisection method is performed and its result used for the next iteration. If the previous step performed interpolation, then the inequality
$\left\vert s- b_k \right\vert < \frac{1}{2} \left\vert b_{k-1} - b_{k-2} \right\vert$
must hold instead.
Brent proved that his method requires at most N² iterations, where N denotes the number of iterations for the bisection method. If the function f is well-behaved, then Brent's method will usually proceed by either inverse quadratic or linear interpolation, in which case it will converge superlinearly.

Furthermore, Brent's method uses inverse quadratic interpolation (IQI for short) instead of linear interpolation (as used by the secant method) whenever $$f(b_k ), \ f(a_k ) , \mbox{ and } f(b_{k-1})$$ are distinct, which slightly increases the efficiency. As a consequence, the condition for accepting s (the value proposed by either linear interpolation or inverse quadratic interpolation) has to be changed: s has to lie between $$\left( 3 a_k + b_k \right) /4$$ and bk.

Example: Consider the Airy function f(-x) = Ai(-x) of the first kind for negative argument:

$\mbox{Ai}(-x) = \frac{1}{\pi} \int_0^{\infty} \cos \left( \frac{t^3}{3} - xt \right) {\text d} t = \frac{\sqrt{x}}{3} \left[ J_{1/3} \left( \frac{2}{3} \,x^{3/2} \right) + J_{-1/3} \left( \frac{2}{3} \,x^{3/2} \right) \right] , \qquad x > 0,$
where Jν(z) is the Bessel function of the first kind. We first plot this function.
Plot[AiryAi[-x], {x, 0, 4}, PlotStyle -> Thick, PlotLabel -> Style["Airy function Ai(-x) of the first kind", FontSize -> 16, Background -> Yellow]]
Figure: Airy function Ai(-x) of the first kind.

First, we find the root using the standard Mathematica command
FindRoot[AiryAi[-x], {x, 2}]
{x -> 2.33811}
From the graph of the Airy function, we see that its first root belongs to the interval [1, 3]. Upon calculating the values of the function at these endpoints, we set b = 3 and c = 𝑎 = 1.
f[x_] = AiryAi[-x];
N[f[1]]
0.535561
N[f[3]]
-0.378814
In the first iteration, we use linear interpolation between (1, 0.535561) and (3, -0.378814), which yields b = 2.17142.
b = (N[f[3]] - 3*f[1])/(f[3] - f[1])
2.17142
The value of the function at this point is
b2=f[b]
0.115663
In the second iteration, we use inverse quadratic interpolation (IQI) between three points
$(a,f(a)) = (1,0.535561), \quad (b, f(b)) = (2.17142, 0.115663), \quad (c,f(c)) = (3,-0.378814) .$
a = 1; c = 3;
A = f[b]*f[c]*a/(a - b)/(a - c) + f[a]*f[c]*b/(b - a)/(b - c) + f[a]*f[b]*c/(c - a)/(c - b)
0.180425
Since
f[A]
0.401355
is larger than f(b), we use the bisection method.
m=(b+c)/2; f[m]
-0.169288
So we bracket the root with points b = 2.17142 and c = 2.58571, with f(c) = -0.169288. Then we use the IQI and set
a=3; b=2.17142; c=m;
A = f[b]*f[c]*a/(a - b)/(a - c) + f[a]*f[c]*b/(b - a)/(b - c) + f[a]*f[b]*c/(c - a)/(c - b)
0.00762649
Since
f[%]
0.441646
we use linear interpolation with b = 2.17142 and c = 2.58571:
b = (N[f[3]] - 3*f[1])/(f[3] - f[1]); c=2.58571;
s = (c*f[b] - b*f[c])/(f[b] - f[c])
2.33959
Since
f[s]
-0.00103677
is negative, we set b = s = 2.33959, c = 2.17142, and 𝑎 = 2.58571. Then, with the aid of IQI, we determine the approximation
a= 2.58571; b = 2.33959; c=(N[f[3]] - 3*f[1])/(f[3] - f[1]);
A = f[b]*f[c]*a/(a - b)/(a - c) + f[a]*f[c]*b/(b - a)/(b - c) + f[a]*f[b]*c/(c - a)/(c - b)
1.10925
Since
f[%]
0.533359
exceeds the previous value |f(b)| = 0.00103677, we again use the linear approximation
s= (c*f[b]-b*f[c])/(f[b]-f[c])
2.33809
with
f[s]
0.0000108358
is positive, while f(2.33959) is negative, so these two points bracket the root. Using linear approximation, we get
b= 2.33809; c=2.33959; s= (c*f[b]-b*f[c])/(f[b]-f[c])
2.33811
Since
f[s]
-9.18718*10^-12
we claim that the root is α = 2.33811, which is obtained on the fourth iteration.    ■

VIII. Ostrowski's Method

Alexander Markowich Ostrowski (1893---1986) was a talented Russian mathematician gifted with a phenomenal memory. He was born to a Jewish family in Kiev, Russian Empire. Ostrowski attended the Kiev College of Commerce rather than a high school, and thus lacked the formal qualifications to be admitted to a university. However, his talent did not remain undetected: his mentor, Dmitry Grave, a professor at the University of Kiev, wrote to the German mathematicians Edmund Landau (1877--1938) and Kurt Hensel (1861--1941) for help. Subsequently, Ostrowski began to study mathematics at Marburg University in Germany under Hensel's supervision in 1912. When World War I broke out, he was interned as a hostile foreigner, although during this period he was able to obtain limited access to the university library. After the war he resumed his studies at the University of Göttingen and was awarded a doctorate in mathematics in 1920, graduating summa cum laude. He worked at several universities before moving to Switzerland to teach at the University of Basel shortly before the outbreak of World War II. Ostrowski fared well living in that neutral country during the war. He taught in Basel until he retired in 1958 and remained very active in mathematics until late in his life.

This gifted and prolific mathematician was engaged in various mathematical problems. The advent of computers catapulted Ostrowski to delve into numerical analysis. He studied iterative solutions of large linear systems. Ostrowski passed away in 1986 in Montagnola, Lugano, Switzerland. He had lived there with his wife during his retirement.

Ostrowski tackled the problem of calculating a root of a single-variable nonlinear function in a new way. Most of the methods we know perform a single refinement of the root estimate in each iteration. Ostrowski’s novel approach was to obtain an interim refinement of the root and then enhance it further before the end of each iteration. The two-step iterations in Ostrowski’s method use the following two equations:

$\begin{split} y_{n} &= x_n - f(x_n )/f' (x_n ) , \\ x_{n+1} &= y_n - \frac{f(y_n ) \left( x_n - y_n \right)}{\left( f(x_n ) - 2\,f(y_n ) \right)} = x_n - \frac{f\left( x_n \right)}{f' \left( x_n \right)} \cdot \frac{f\left( x_n \right) - f\left( y_n \right)}{f\left( x_n \right) - 2\,f\left( y_n \right)} , \end{split} \qquad n=0,1,2,\ldots . \tag{Ostrowski.1}$

The first equation applies the basic Newton step to calculate yn, which Ostrowski uses as an interim refinement of the root. The iteration’s additional refinement comes from the second equation. Ostrowski’s algorithm has fourth-order convergence, compared with the second-order convergence of Newton’s algorithm; it is an optimal method in the sense of the Kung--Traub conjecture. Thus, Ostrowski’s method converges to a root faster than Newton’s method. There is the extra cost of one additional function call to evaluate f(yn), but this cost is well worth it, since, in general, the total number of function calls for Ostrowski’s method is smaller than that for Newton’s method.
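The two-step iteration (Ostrowski.1) is short enough to state as code. A minimal Python sketch (the function and parameter names are ours, and the stopping test is the simplest possible):

```python
def ostrowski(f, df, x0, tol=1e-14, max_iter=50):
    """Ostrowski's two-step, fourth-order method (Ostrowski.1).
    f: the function, df: its derivative, x0: initial guess."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        y = x - fx / df(x)               # Newton half-step (interim refinement)
        fy = f(y)
        denom = fx - 2.0 * fy
        if denom == 0.0:                 # guard against breakdown
            return y
        x_new = y - fy * (x - y) / denom # Ostrowski correction
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```

Each pass costs two evaluations of f and one of f′, in exchange for fourth-order convergence.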

Ostrowski’s method has recently inspired quite a number of new algorithms that speed up convergence. These algorithms calculate two and even three interim root refinements in each iteration. While these methods succeed in reducing the number of iterations, they fail to consistently reduce the total number of function calls, compared to the basic Ostrowski’s method. For example, the second equation can be chosen as

$x_{n+1} = y_n - \frac{\delta \left( y_n - x_n \right) f(y_n )}{f(y_n ) - \gamma\,f(x_n )}$
for some real numbers δ and γ. In particular, if δ = γ = 1, we get the Newton-Secant method. Taking δ = γ = 1/2 leads to the well-known Ostrowski's method. For example, if we apply Newton's method again, we obtain the eighth order three-step iterative method with five function evaluations:
\begin{align*} y_{n} &= x_n - \frac{f(x_n )}{f' (x_n )} , \\ z_n &= x_n - \frac{f\left( x_n \right)}{f' \left( x_n \right)} \cdot \frac{f\left( x_n \right) - f\left( y_n \right)}{f\left( x_n \right) - 2\,f\left( y_n \right)} , \tag{Ostrowski.2} \\ x_{n+1} &= z_n - \frac{f(z_n )}{f' (z_n )} , \qquad n=0,1,2,\ldots . \end{align*}

Example: Let us consider the Jacobi elliptic cosine function cn(x, k), which is a solution of the boundary value problem

$\frac{{\text d}^2 y}{{\text d} x^2} = \left( 2k^2 - 1 \right) y - 2k^2 y^3 , \qquad y(0) = 1, \quad y(K(k)) = 0,$
where
$K(k) = \int_0^1 \frac{{\text d}t}{\sqrt{\left( 1 - t^2 \right) \left( 1 - k^2t^2 \right)}} = \int_0^{\pi/2} \frac{{\text d}\theta}{\sqrt{1 - k^2 \sin^2 \theta}}$
is the complete elliptic integral of the first kind. The Jacobi elliptic cosine function can also be defined implicitly:
$\mbox{cn}(u,k) = \cos \phi , \qquad u = \int_0^{\phi} \frac{{\text d}\theta}{\sqrt{1 - k^2 \sin^2 \theta}} \qquad 0 < k < 1.$
The parameter k is referred to as the elliptic modulus; we set k = 0.05.
Upon plotting the elliptic cosine function, we determine an interval [1, 2] containing its zero.
Plot[JacobiCN[x, 1/20], {x, 0, 2}, PlotStyle -> {Thickness[0.01], Magenta}, PlotLabel -> Style["Jacobi elliptic function cn(x,0.05)", FontSize -> 16, Background -> Yellow]]
Figure: Jacobi elliptic cosine function cn(x, 0.05).

First, we find its root using the standard Mathematica command
FindRoot[JacobiCN[x, 1/20], {x, 2}]
{x -> 1.5910034537907924}
Since we want to apply Ostrowski's method, we need the derivative of the elliptic cosine function:
D[JacobiCN[x, 1/20], x]
-JacobiDN[x, 1/20] JacobiSN[x, 1/20]
$\frac{{\text d}\,{\mbox cn}(x,k)}{{\text d}x} = - \mbox{sn}(x,k)\,{\mbox dn}(x,k) .$
We choose an initial guess point to be x0 = 1 and find the first iterate.
f[x_] = JacobiCN[x, 1/20];
f1[x_] = D[JacobiCN[x, 1/20], x];
Newton's approximation is
y1=1.0-f[1.0]/f1[1.0]
1.6635 + 0. I
Then the first value becomes
x1 = 1.0 -f[1.0]/f1[1.0]*(f[1.0]-f[y1])/(f[1.0]-2*f[y1])
1.59533 + 0. I
For the second iterate, we have
y2=x1-f[x1]/f1[x1]
1.591 + 0. I
and
x2 = x1 -f[x1]/f1[x1]*(f[x1]-f[y2])/(f[x1]-2*f[y2])
1.5910034537907236 +0. I
So we need only two iterations, which gives us 13 correct decimal places!

If we apply the Chebyshev iteration scheme, we need four iterations to obtain the same accuracy:

f[x_] = JacobiCN[x, 1.0/20];
ChebyshevMethodList[f[x], {x, 1}, 4]
Re[%]
{1, 1.5277, 1.59097, 1.591, 1.591}
On the other hand, Newton's method requires five iterations:
NewtonsMethodList[f[x], {x, 1.0}, 5]
Re[%]
{1., 1.6635, 1.59089, 1.591, 1.591, 1.591}
■
• Gautschi, Walter, "Alexander M. Ostrowski (1893–1986): His life, work, and students", can be downloaded from http://www.cs.purdue.edu/homes/wxg/AMOengl.pdf.
• Grau-Sánchez, M., Grau, A., Noguera, M., "Ostrowski type methods for solving systems of nonlinear equations," Applied Mathematics and Computation, 2011, Vol. 218, pp. 2377–2385.
• Shammas, N., Ostrowski’s Method for Finding Roots

IX. Muller’s Method

Muller’s method is a generalization of the secant method, in the sense that it does not require the derivative of the function. Muller's method finds a root of the equation $$f(x) =0 .$$ It is an iterative method that requires three starting points $$\left( p_0 , f(p_0 ) \right) , \ \left( p_1 , f(p_1 ) \right) , \ \left( p_2 , f(p_2 ) \right) .$$ A parabola is constructed that passes through the three points; then the quadratic formula is used to find a root of the quadratic for the next approximation. If the equation f(x) = 0 has a simple root, then it has been proved that Muller’s method converges faster than the secant method and almost as fast as Newton’s method. The method can be used to find real or complex zeros of a function and can be programmed to use complex arithmetic.

David Eugene Muller (1924--2008) was a professor of mathematics and computer science at the University of Illinois (1953--1992), after which he became an emeritus professor, and was an adjunct professor of mathematics at New Mexico State University (1995--2008). Muller received his BS in 1947 and his PhD in 1951 in physics from Caltech; an honorary PhD was conferred on him by the University of Paris in 1989. He was the inventor of the Muller C-element (or Muller C-gate), a device used to implement asynchronous circuitry in electronic computers. He also co-invented the Reed–Muller codes.

Without loss of generality, we assume that p2 is the best approximation to the root and consider the parabola through the three starting values. Make the change of variable

$t= x- p_2 ,$
and use the differences
$h_0 = p_0 - p_2 , \qquad h_1 = p_1 - p_2 .$
Consider the quadratic polynomial involving the variable t:
$y = at^2 +bt +c$
that passes through these three initial points:
\begin{align*} \mbox{at } t&= h_0: \qquad ah_0^2 +bh_0 +c = f_0 = f(p_0 ) , \\ \mbox{at } t&= h_1: \qquad ah_1^2 +bh_1 +c = f_1 = f(p_1 ) , \\ \mbox{at } t&= 0: \qquad a\cdot 0^2 +b\cdot 0 +c = f_2 = f(p_2 ) . \end{align*}
Solving these three equations, we find
$a= \frac{\left( f_0 -c \right) h_1 - \left( f_1 -c \right) h_0}{h_1 h_0^2 - h_1^2 h_0} , \qquad b = \frac{\left( f_1 -c \right) h_0^2 - \left( f_0 -c \right) h_1^2}{h_1 h_0^2 - h_1^2 h_0} , \qquad c= f_2 .$
The quadratic formula is used to find the roots $$t = z_1 , \ z_2$$ of the equation $$at^2 + bt +c =0 :$$
$z = \frac{-2c}{b \pm \sqrt{b^2 - 4ac}} .$
The sign in the denominator is chosen to agree with the sign of b, so that the denominator has the larger magnitude and z is the root of smaller absolute value; the next iterate is then p3 = p2 + z. To update the iterates, throw out the one of the three points that is farthest away. Although a lot of auxiliary calculations are done in Muller’s method, it requires only one new function evaluation per iteration.

If Muller’s method is used to find the real roots of $$f(x) =0 ,$$ one may encounter complex approximations, because the roots of the quadratic $$at^2 + bt +c =0$$ might be complex (nonzero imaginary components). In these cases the imaginary components will have a small magnitude and can be set equal to zero so that the calculations proceed with real numbers.
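The whole iteration fits in a few lines of Python. In this sketch (names ours), all three function values are recomputed each pass for clarity, whereas a careful implementation reuses two of them and needs only one new evaluation per iteration; for simplicity the oldest point is discarded rather than the farthest one, and complex arithmetic is used throughout so that complex intercepts are handled automatically:

```python
import cmath

def muller(f, p0, p1, p2, tol=1e-12, max_iter=50):
    """Muller's method: fit a parabola through the three latest
    points and take the quadratic root closest to p2."""
    for _ in range(max_iter):
        h0, h1 = p0 - p2, p1 - p2              # shifts t = x - p2
        f0, f1, c = f(p0), f(p1), f(p2)
        denom = h1 * h0**2 - h1**2 * h0
        a = ((f0 - c) * h1 - (f1 - c) * h0) / denom
        b = ((f1 - c) * h0**2 - (f0 - c) * h1**2) / denom
        # stable quadratic formula: pick the sign giving the
        # larger-magnitude denominator, hence the smaller root
        disc = cmath.sqrt(b * b - 4 * a * c)
        d = b + disc if abs(b + disc) > abs(b - disc) else b - disc
        t = -2 * c / d
        p0, p1, p2 = p1, p2, p2 + t            # discard the oldest point
        if abs(t) < tol:
            return p2
    return p2
```

Starting from real points the iterates may pick up small imaginary parts; as noted above, these can be discarded when only real roots are sought.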

Example: We consider the Bernoulli polynomial of order n:

$B_n (x) = \sum_{k=0}^n \binom{n}{k} B_k x^{n-k} = \frac{{\texttt D}}{e^{\texttt D} -1} \, x^n ,$
where
$B_k = \sum_{i=0}^k \frac{1}{i+1} \sum_{j=0}^i (-1)^j \binom{i}{j} j^k$
is the Bernoulli number.
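The double sum for Bk can be evaluated exactly in rational arithmetic. A short Python check (the helper name `bernoulli` is ours), using the convention B1 = -1/2 that matches the Bernoulli polynomials via Bn(0) = Bn:

```python
from fractions import Fraction
from math import comb

def bernoulli(k):
    """Bernoulli number B_k from the explicit double sum
    B_k = sum_{i=0}^k 1/(i+1) sum_{j=0}^i (-1)^j C(i,j) j^k,
    with the convention 0^0 = 1 (so B_1 = -1/2)."""
    total = Fraction(0)
    for i in range(k + 1):
        inner = sum((-1)**j * comb(i, j) * Fraction(j**k) for j in range(i + 1))
        total += inner / (i + 1)
    return total
```

For instance, bernoulli(2) returns 1/6 and all odd-index values beyond B1 vanish, as expected.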
Upon plotting the Bernoulli polynomial, we determine an interval [0.1, 0.5] containing its zero.
Plot[BernoulliB[22, x], {x, 0, 1}, PlotStyle -> {Thickness[0.01], Magenta}, PlotLabel -> Style["Bernoulli polynomial \!$$\*SubscriptBox[\(B$$, \ $$22$$]\)(x)", FontSize -> 16, Background -> LightBlue]]
Figure: Bernoulli polynomial of order 22.

First, we pick three points (x1, y1), (x2, y2), and (x3, y3), where
$x_1 = 0.1, \quad x_2 = 0.3, \quad x_3 = 0.5, \qquad y_i = B_{22} \left( x_i \right) , \quad i =1,2,3.$
So y1 = 5009.53, y2 = -1913.47, and y3 = -6192.12. Then the next iterate x4 is a solution of the quadratic equation
$a\,x^2 + b\, x + c = 0 ,$
where
\begin{align*} a &= \frac{\left( y_1 - y_2 \right) - \left( y_2 - y_3 \right)\left( x_1 - x_2 \right) / \left( x_2 - x_3 \right)}{x_1^2 - x_2^2 - \left( x_2^2 - x_3^2 \right) \left( x_1 - x_2 \right) / \left( x_2 - x_3 \right)} = 33054.4, \\ b &= \frac{\left( y_1 - y_2 \right) - a \left( x_1^2 - x_2^2 \right)}{x_1 - x_2} = -47836.8, \\ c &= y_1 - a\,x_1^2 - b\,x_1 = 9462.67 . \end{align*}
x1 = 0.1; x2 = 0.3; x3 = 0.5;
y1 = BernoulliB[22, x1]; y2 = BernoulliB[22, x2]; y3 = BernoulliB[22, x3];
a = (y1 - y2 - (y2 - y3)*(x1 - x2)/(x2 - x3))/(x1^2 - x2^2 - (x2^2 - x3^2)*(x1 - x2)/(x2 - x3));
b = (y1 - y2 - a*(x1^2 - x2^2))/(x1 - x2);
c = y1 - a*x1^2 - b*x1;
root = FindRoot[a*x^2 + b*x + c, {x, 0.1}]
x4 = x /. root
0.23644
We remove the point (x1, y1), which has the largest function value, and use the new point instead; so we set x1 = 0.23644. Now we can proceed with the next iteration.
\begin{align*} a &= \frac{\left( y_1 - y_2 \right) - \left( y_2 - y_3 \right)\left( x_1 - x_2 \right) / \left( x_2 - x_3 \right)}{x_1^2 - x_2^2 - \left( x_2^2 - x_3^2 \right) \left( x_1 - x_2 \right) / \left( x_2 - x_3 \right)} = 64509.1, \\ b &= \frac{\left( y_1 - y_2 \right) - a \left( x_1^2 - x_2^2 \right)}{x_1 - x_2} = -73000.5, \\ c &= y_1 - a\,x_1^2 - b\,x_1 = 14180.9 . \end{align*}
x1 = x4; y1 = BernoulliB[22, x4];
a = (y1 - y2 - (y2 - y3)*(x1 - x2)/(x2 - x3))/(x1^2 - x2^2 - (x2^2 - x3^2)*(x1 - x2)/(x2 - x3));
b = (y1 - y2 - a*(x1^2 - x2^2))/(x1 - x2);
c = y1 - a*x1^2 - b*x1;
root = FindRoot[a*x^2 + b*x + c, {x, 0.1}]
x4 = x /. root
0.249082
The value of the Bernoulli polynomial at this point is
BernoulliB[22, x4]
35.7025
which is too large. Therefore, we need to make a third iteration, taking x4 as the new x3:
x3 = x4; y3 = BernoulliB[22, x4];
a = (y1 - y2 - (y2 - y3)*(x1 - x2)/(x2 - x3))/(x1^2 - x2^2 - (x2^2 - x3^2)*(x1 - x2)/(x2 - x3));
b = (y1 - y2 - a*(x1^2 - x2^2))/(x1 - x2);
c = y1 - a*x1^2 - b*x1;
root = FindRoot[a*x^2 + b*x + c, {x, 0.1}]
x4 = x /. root
0.249082
which is close to the true answer provided by Mathematica:
FindRoot[BernoulliB[22, x], {x, 0.1}]
{x -> 0.25}
■

1. Brent, R.P., An algorithm with guaranteed convergence for finding a zero of a function, Computer Journal, 1971, Vol. 14, Issue 4, pp. 422-425.
2. Bus, J.C.P., Dekker, T.J., Two efficient algorithms with guaranteed convergence for finding a zero of a function, ACM Transactions on Mathematical Software (TOMS), 1975, Vol. 1, No. 4, pp. 330--345.
3. Chun, C., Ham, Y.M., Some fourth-order modifications of Newton's method, Applied Mathematics and Computation, 2008, Vol. 197, pp. 654--658.
4. Dekker, T.J., Finding a zero by means of successive linear interpolation, Constructive Aspects of the Fundamental Theorem of Algebra, B. Dejon and P. Henrici, Eds., Wiley Interscience, London, 1969, pp. 37--51.
5. Feng, X.L., He, Y.N., High order iterative methods without derivatives for solving nonlinear equations, Applied Mathematics and Computation, 2007, Vol. 186, pp. 1617--1623.
6. Jarratt, P., Some fourth order multipoint iterative methods for solving equations, Mathematics of Computation, 1966, Vol. 20, No. 95, pp. 434--437.
7. King, R.F., A Family of Fourth Order Methods for Nonlinear Equations, SIAM Journal on Numerical Analysis, 1973, Volume 10, Issue 5, pp. 876--879. https://doi.org/10.1137/0710072
8. Kou, J., Li, Y., The improvements of Chebyshev–Halley methods with fifth-order convergence, Applied Mathematics and Computation, 2007, Volume 188, Issue 1, pp. 143--147. https://doi.org/10.1016/j.amc.2006.09.097
9. Kung, H.T., Traub, J.F., Optimal order of one-point and multipoint iteration, Journal of the Association for Computing Machinery (ACM), 1974, Vol. 21, No. 4, pp. 643--651. https://doi.org/10.1145/321850.321860
10. Muller, D.E., A method for solving algebraic equations using an automatic computer, Mathematical Tables and Other Aids to Computation, 1956, Vol. 10, No. 56, pp. 208--215.
11. Ostrowski, A.M., Solution of Equations in Euclidean and Banach Space, Academic Press, New York, 1973.
12. Parhi, S.K., Gupta, D.K., A sixth order method for nonlinear equations, Applied Mathematics and Computation, 2008, Vol. 203, pp. 50--55.
13. Traub, J.F., Iterative Methods for the Solution of Equations, American Mathematical Society, Second Edition, 1982.
14. Wang, X., Kou, J., Li, Y., A variant of Jarratt method with sixth-order convergence, Applied Mathematics and Computation, 2008, Vol. 204, pp. 14--19.