
Consider the nonlinear equation below. Let us discuss how to solve it:

\[f(x) = 0\tag 1\]

Bisection Method

In the bisection method, the interval is repeatedly split into two halves, and the root is approximated step by step, based on the following theorem:

Let $f$ be a continuous function on the interval $[a, b]$. If $f(a)f(b) < 0$, then $f$ has a root between $a$ and $b$.

The typical bisection method computation process is as follows:


Take the midpoint $x_0 = \frac{a+b}{2}$ of the interval $[a, b]$ and compute $f(x_0)$:

  • If $f(a)f(x_0) = 0$, then the root is found
  • If $f(a)f(x_0) > 0$, then the interval containing the root is $[x_0, b]$, and the next iteration is performed on this interval
  • If $f(a)f(x_0) < 0$, then the interval containing the root is $[a, x_0]$, and the next iteration is performed on this interval

This process is repeated until the stopping condition is met.


In fact, computer calculations rarely converge to exactly zero, so a precision $\varepsilon$ is usually set, and when $\vert \alpha - x_n \vert < \varepsilon$, it is considered that $x_n$ is the root.

Using this precision, we can also predict the number of iterations required to converge:

\[\vert \alpha - x_n \vert \leqslant \frac{b_n-a_n}{2} = \frac{b_0-a_0}{2^{n+1}} \tag2\] \[n = \frac{(\ln(b_0-a_0)-\ln\varepsilon)}{\ln 2}-1 \tag3\]
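As a concrete illustration, here is a minimal Python sketch of the bisection method with this tolerance-based stopping rule (the test function $x^3-x-2$ is my own choice for the example):

```python
def bisect(f, a, b, eps=1e-8):
    """Bisection: halve [a, b] until (b - a)/2 < eps, as in eq. (2)."""
    if f(a) * f(b) > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while (b - a) / 2 > eps:
        x0 = (a + b) / 2          # midpoint of the current interval
        if f(a) * f(x0) <= 0:     # root lies in [a, x0] (or x0 is the root)
            b = x0
        else:                     # root lies in [x0, b]
            a = x0
    return (a + b) / 2

# Example: f(x) = x^3 - x - 2 changes sign on [1, 2]; root ~ 1.5213797
print(bisect(lambda x: x**3 - x - 2, 1.0, 2.0))
```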

The bisection method is easy to understand, its computation is simple, and convergence is guaranteed. However, its disadvantages are also obvious: convergence is slow, and it cannot find roots of even multiplicity (where $f$ does not change sign), among others.

Below, we introduce a commonly used approach in practical applications: the iterative method.

Iterative Method - Concept

The main idea of the iterative method is to convert the solution of the nonlinear equation into a fixed-point problem: construct a recursive relationship, compute a sequence of approximations, and make it converge to the root of equation (1):

\[x = G(x)\tag 4\]

At the same time, select an initial approximate value for a root and perform iterative refinement.

\[x_{i+1}=G(x_i), i=0,1,2,\dots \tag 5\]

At this point, a question naturally arises: how can we ensure that the recursive relation we construct converges? If we obtain a recursive formula of the form $x_{i+1}=2x_i+1$, it clearly never converges. To guarantee the convergence of the constructed recursion, we have the following theorems:

Convergence


$Theorem 1:$

$\text{1-1: If }\forall x\in[a,b],\ G(x)\in[a,b], \text{ then } G \text{ has at least one fixed point in } [a,b].$

$\text{1-2: If }\forall x,y\in[a,b],\ \vert G(x)-G(y)\vert \leqslant L\vert x-y\vert,\ 0<L<1, \text{ then the fixed point is unique.}$


Here, let me briefly mention the proof ideas for Theorem 1. For part 1-1, consider $\varphi(x)=G(x)-x$: since $G(a)\geqslant a$ and $G(b)\leqslant b$, we have $\varphi(a)\geqslant 0$ and $\varphi(b)\leqslant 0$, so by the Intermediate Value Theorem $\varphi$ has a root, i.e., $G$ has a fixed point. For part 1-2, we use proof by contradiction: assuming two distinct fixed points $\alpha$ and $\beta$, we get $\vert\alpha-\beta\vert = \vert G(\alpha)-G(\beta)\vert \leqslant L\vert\alpha-\beta\vert < \vert\alpha-\beta\vert$, which forces $\alpha=\beta$.


$Theorem 2:$

$\text{If Theorem 1 holds, then for any } x_0\in[a,b] \text{ the sequence } \{x_k\} \text{ generated by } x_{k+1}=G(x_k) \text{ converges to the unique fixed point } \alpha.$


Theorem 2 is easy to prove: subtracting the fixed point from both sides of the $k$-th recursion gives $\vert x_{k+1}-\alpha\vert = \vert G(x_k)-G(\alpha)\vert \leqslant L\vert x_k-\alpha\vert$, so $\vert x_{k+1}-\alpha\vert \leqslant L^{k+1}\vert x_0-\alpha\vert$. Since $0<L<1$, the right-hand side naturally approaches $0$ as $k$ approaches infinity.

This situation is referred to as global convergence. In fact, Theorem 1 can also be expressed in derivative form using the Mean Value Theorem. That is,

$\text{if } \forall x\in[a,b],\ \vert G'(x) \vert \leqslant L<1, \text{ then the iteration converges to the unique fixed point.}$

There is also the case of local convergence: the iteration converges in a neighborhood of the fixed point, but not necessarily when the initial value is far away.

Convergence Speed

We previously mentioned that the bisection method has a slow convergence speed. In practical applications, we typically use iterative methods. Regarding the convergence speed of iterative methods, we can elaborate on the following two theorems:


$\text{Theorem 3:} $

$\text{If Theorem 1 is satisfied, then } \vert \alpha-x_{k+1}\vert \leqslant \frac{L}{1-L} \vert x_{k+1}-x_k \vert $


Theorem 3 shows that the smaller $L$ is, the faster the convergence. It also justifies using the distance between successive iterates, $\vert x_{k+1}-x_k\vert$, as a practical stopping criterion.


$\text{Theorem 4:} $

$\text{If Theorem 1 is satisfied and } x_{k+1}=G(x_k), \text{ then if} $

$(1)\ G'(\alpha) = G''(\alpha)=\dots = G^{(p-1)}(\alpha) = 0 $

$(2)\ G^{(p)}(\alpha)\neq0 $

$\text{then } x_k \text{ converges to } \alpha \text{ with order } p\text{, and conversely.}$


The proof method involves performing a Taylor expansion of $G(x_k)$ at the fixed point, which yields the following expression: $\lim\limits_{k\rightarrow\infty}\frac{x_{k+1}-\alpha}{(x_k-\alpha)^p}=\frac{G^{(p)}(\alpha)}{p!}$, which is not equal to $0$.

Iterative Method—Simple Iteration

This means directly constructing a single-point fixed-point iteration of the form in equation (4). It is important to note that, as mentioned earlier, a directly constructed recursion may not converge. Therefore, we first need to use Theorem 1, or the derivative form obtained from the Mean Value Theorem, to determine whether the recursion converges.
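For instance, here is a minimal Python sketch of simple iteration (the helper name `fixed_point` and the example $x=\cos x$ are my own choices). On $[0,1]$, $G(x)=\cos x$ maps the interval into itself and $\vert G'(x)\vert=\vert\sin x\vert\leqslant\sin 1<1$, so the convergence criteria above are satisfied:

```python
import math

def fixed_point(G, x0, eps=1e-10, max_iter=1000):
    """Simple iteration x_{k+1} = G(x_k), eq. (5)."""
    x = x0
    for _ in range(max_iter):
        x_new = G(x)
        if abs(x_new - x) < eps:   # successive iterates close enough
            return x_new
        x = x_new
    raise RuntimeError("did not converge")

# Solve x = cos(x); the fixed point is ~0.7390851
print(fixed_point(math.cos, x0=0.5))
```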

Iterative Method—Newton’s Method

The theorems above provide conditions for the convergence and convergence speed of iterative methods, but one of the core steps in the computation, the construction of the iteration function, has not yet been addressed. Below, we discuss the classic Newton's Method, outlining the steps for solving with it and its subsequent variations.

Newton’s Method constructs the iterative formula using Taylor expansion, truncating to the first-order expansion:

\[f(x)\approx f(x_0)+f'(x_0)(x-x_0) \tag 6\]

Combining with $(1)$, i.e., setting the expansion to zero and solving for $x$:

\[x_{k+1}=x_k-\frac{f(x_k)}{f'(x_k)} \tag 7\]
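A minimal Python sketch of iteration (7) (the test function $f(x)=x^2-2$ is my own choice; its positive root is $\sqrt2$):

```python
def newton(f, df, x0, eps=1e-12, max_iter=50):
    """Newton's method, eq. (7): x_{k+1} = x_k - f(x_k)/f'(x_k)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < eps:
            return x
    raise RuntimeError("did not converge")

# Compute sqrt(2) as the root of f(x) = x^2 - 2
print(newton(lambda x: x**2 - 2, lambda x: 2 * x, x0=1.0))
```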

Convergence Analysis

Take the derivative of $G(x)=x-\frac{f(x)}{f'(x)}$ and evaluate it at the root $\alpha$:

\[G'(\alpha) = \frac{f(\alpha)f''(\alpha)}{(f'(\alpha))^2} \tag 8\]

When $f'(\alpha) \neq 0$, since $f(\alpha)=0$ we have $G'(\alpha)=0$, so by Theorem 4 the convergence near $\alpha$ is at least quadratic; that is, Newton iteration converges at least quadratically for simple roots. But what if $\alpha$ is a multiple root? In that case, $f$ can be written in the form of equation (9).

\[f(x)=(x-\alpha)^mg(x) \tag 9\]

so

\[G(x)=x-\frac{(x-\alpha)^mg(x)}{m(x-\alpha)^{m-1}g(x)+(x-\alpha)^mg'(x)}=x-\frac{(x-\alpha)g(x)}{mg(x)+(x-\alpha)g'(x)} \tag {10}\] \[G'(\alpha)=1-\frac1m<1 \tag{11}\]

In the case of multiple roots, the convergence is therefore only linear. In fact, equation (11) also shows that when $m=1$ (i.e., for a simple root), the derivative is $0$, so convergence is at least quadratic.

Variations

Here are a few examples of variations of Newton’s Method:

  1. Simplified Newton's method: $x_{k+1}=x_k-\frac{f(x_k)}{c}$. That is, the derivative in the original Newton's method is replaced by a constant.

  2. Newton down-hill method: $x_{k+1}=x_k-\lambda \frac{f(x_k)}{f'(x_k)},\ 0<\lambda\leqslant1$. That is, $x_{k+1}$ is a weighted average of $x_k$ and the full Newton iterate, with $\lambda$ chosen so that $\vert f(x_{k+1})\vert < \vert f(x_k)\vert$.

  3. Secant method: $x_{k+1}=x_k- \frac{x_k-x_{k-1}}{f(x_k)-f(x_{k-1})}f(x_k)$. The derivative is replaced by the slope of the secant line through the two most recent iterates, so no derivative evaluations are needed.

Here, we do not discuss their time efficiency and convergence in detail.
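Still, as a quick illustration, here is a minimal sketch of the secant method (the test function $x^2-2$ is my own choice):

```python
def secant(f, x0, x1, eps=1e-10, max_iter=100):
    """Secant method: Newton's derivative is replaced by the slope
    of the line through the last two iterates."""
    for _ in range(max_iter):
        fx0, fx1 = f(x0), f(x1)
        x2 = x1 - (x1 - x0) / (fx1 - fx0) * fx1
        if abs(x2 - x1) < eps:
            return x2
        x0, x1 = x1, x2
    raise RuntimeError("did not converge")

# Again sqrt(2): two starting points replace the derivative
print(secant(lambda x: x**2 - 2, 1.0, 2.0))
```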

Multiple roots

  • When we know the multiplicity $m$ of the root, we can use the improved iteration
\[x_{k+1}=x_k-m\frac{f(x_k)}{f'(x_k)} \tag {12}\]

With this correction, the convergence is again quadratic.

  • However, in most cases we do not know the multiplicity of the root. One approach is to take $u(x)=\frac{f(x)}{f'(x)}$, which has a simple root at $\alpha$ regardless of $m$, and apply Newton's method to $u$:
\[x_{k+1}=x_k-\frac{u(x_k)}{u'(x_k)} \tag {13}\]

This method requires computing the second derivative of the original function $f(x)$, which lowers time efficiency, and $u(x)$ may no longer be a continuous function.
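A minimal sketch of this variant, equation (13) (the helper name `newton_on_u` and the test problem $(x-1)^2(x+2)$, which has a double root at $1$, are my own choices):

```python
def newton_on_u(f, df, d2f, x, eps=1e-10, max_iter=50):
    """Newton applied to u(x) = f(x)/f'(x), eq. (13); u has a simple
    root at alpha whatever the multiplicity of f, but f'' is needed."""
    for _ in range(max_iter):
        if df(x) == 0.0:          # landed exactly on the root
            break
        u = f(x) / df(x)
        du = 1.0 - f(x) * d2f(x) / df(x) ** 2   # u'(x)
        step = u / du
        x -= step
        if abs(step) < eps:
            break
    return x

# f(x) = (x - 1)^2 (x + 2): double root at x = 1
f   = lambda x: (x - 1) ** 2 * (x + 2)
df  = lambda x: 3 * x ** 2 - 3
d2f = lambda x: 6 * x
print(newton_on_u(f, df, d2f, x=2.0))   # ~1.0, quadratic convergence
```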

Another method is Steffensen's acceleration iteration, which can estimate the multiplicity of the root, after which equation (12) is used to compute the converged value.

Steffensen’s acceleration iteration

Let $\varepsilon_i = x_i - \alpha$ denote the error of the plain Newton iteration. For a root of multiplicity $r$,

\[\lim\limits_{i\rightarrow\infty} \frac{\varepsilon_{i+1}}{\varepsilon_i}=1-\frac{1}{r} \tag{14}\] \[r\approx \frac{\alpha-x_{i+1}}{x_{i+2}-x_{i+1}}=\frac{x_ix_{i+2}-x_{i+1}^2}{x_{i+2}-2x_{i+1}+x_i}\frac{1}{x_{i+2}-x_{i+1}}-\frac{x_{i+1}}{x_{i+2}-x_{i+1}} \tag{15}\]

where $r$ is the multiplicity of the root; the second expression in (15) replaces the unknown $\alpha$ with Aitken's $\Delta^2$ estimate built from three consecutive iterates.
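As an illustration, here is a small Python sketch (the helper name `estimate_multiplicity` and the test function $(x-1)^3$ are my own choices) that runs a few plain Newton steps and then applies equation (15):

```python
def estimate_multiplicity(f, df, x0, steps=8):
    """Run plain Newton a few steps, then estimate the multiplicity r
    from three consecutive iterates via eq. (15)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - f(xs[-1]) / df(xs[-1]))
    x0, x1, x2 = xs[-3:]
    alpha = (x0 * x2 - x1 ** 2) / (x2 - 2 * x1 + x0)  # Aitken estimate
    return (alpha - x1) / (x2 - x1)                   # r from eq. (15)

f  = lambda x: (x - 1) ** 3          # triple root at x = 1
df = lambda x: 3 * (x - 1) ** 2
r = estimate_multiplicity(f, df, x0=2.0)
print(round(r))   # ~3; now iterate with eq. (12) using m = 3
```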

System of nonlinear equations

Consider now the case of a system of nonlinear equations,

\[F(x)=0,F(x)=(f_1,f_2,\dots,f_n)^T,x=(x_1,x_2,\dots,x_n)^T \tag{16}\]

This can be solved with a Quasi-Newton Method. Newton's Method generalizes to systems by computing the Jacobian matrix of $F(x)$; however, since every step requires not only calculating the Jacobian but also inverting it, the computational load is quite heavy. Therefore, we consider approximation methods of the following form:

\[\left\{ \begin{aligned} &x^{i+1} = x^i -H_iF(x^i), \\ &H_{i+1}(F(x^{i+1})-F(x^{i})) = x^{i+1}-x^i \\ &H_{i+1} = H_i+\Delta H_i \end{aligned} \right.\tag {17}\]

where $H_i$ approximates the inverse Jacobian, $H \approx A^{-1}$ with $A = F'$.

In equation (17), we have not specified the value of $\Delta H_i$. The method that takes $rank(\Delta H_i)=1$ is called the rank-1 Quasi-Newton Method. Here, we have the Broyden rank-1 method and the inverse Broyden rank-1 method. Due to space limitations, I won’t elaborate further (Yes, I’ve reached a point of exhaustion and can’t write any more).
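Still, a minimal, hedged sketch of the inverse-update idea may help: the rank-1 correction below keeps the secant condition from equation (17) satisfied (the identity initial guess $H_0=I$ and the toy system are my own choices, not from the original text):

```python
import numpy as np

def broyden_inverse(F, x0, eps=1e-10, max_iter=100):
    """A rank-1 quasi-Newton sketch in the sense of eq. (17):
    H approximates the inverse Jacobian and gets a rank-1 correction
    so that H_{i+1} y_i = s_i, with s_i = x^{i+1} - x^i and
    y_i = F(x^{i+1}) - F(x^i)."""
    x = np.asarray(x0, dtype=float)
    H = np.eye(len(x))            # crude initial guess for (F')^{-1}
    Fx = F(x)
    for _ in range(max_iter):
        x_new = x - H @ Fx        # quasi-Newton step
        F_new = F(x_new)
        s, y = x_new - x, F_new - Fx
        x, Fx = x_new, F_new
        if np.linalg.norm(s) < eps:
            break
        Hy = H @ y
        H = H + np.outer(s - Hy, s @ H) / (s @ Hy)   # rank-1 update
    return x

# Toy system: x^2 + y^2 = 4, x*y = 1 (a root near (1.93, 0.52))
F = lambda v: np.array([v[0] ** 2 + v[1] ** 2 - 4, v[0] * v[1] - 1])
print(broyden_inverse(F, [2.0, 0.5]))
```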

Summary

1. Convergence Criteria

  • Determining Convergence: Distinguish between global and local convergence.
  • Assessing Convergence Speed: Evaluate using the order of convergence and the factor $L$.
  • Computational Efficiency: How to represent the cost of each method.

2. Iterative Methods

  • Bisection Method:
    • Methodology
    • Convergence speed
    • Drawbacks
  • Simple Iteration Method:
    • Convergence criteria
  • Newton’s Method:
    • Methodology
    • Issue with convergence order for multiple roots
    • Variations like the secant method
    • Improved Newton’s method for multiple roots
    • Introduction to Steffensen’s accelerated iteration
  • Rank-1 Quasi-Newton Method for nonlinear systems