Download a free evaluation copy of NeuroSolutions to discover how to apply neural network technology to your artificial intelligence application.

contents.gifindex.gifprev1.gifnext1.gif

1.8 Newton's Method

If you are familiar with numerical analysis, you may be asking why we aren't using Newton's method for the search. Newton's method is known to find the roots of quadratic equations in one iteration. The minimum of the performance surface can be equated to the root of the gradient equation Eq.1.32, as is outlined by Eq.1.31. Hence, Newton's method can also be used in search. The adaptive weight equation using the Newton's method is Newton's Derivation

NEURAL AND ADAPTIVE SYSTEMS00000106.gif (1.43)

Compare this with Eq.1.33 and note that the gradient information is weighted by the inverse of the correlation matrix of the input and that NEURAL AND ADAPTIVE SYSTEMS00090002.gif is equal to 1. This means that Newton's method corrects the direction of the search such that it always points to the minimum, while the gradient descent points to the maximum direction of change. These two directions may or may not coincide (Figure 1-18).

NEURAL AND ADAPTIVE SYSTEMS00000107.gif

Figure 1-18 Directions of the steepest descent and Newton's method


They coincide when the contour plots are circles, that is, when the largest and the smallest eigenvalue of the correlation matrix are the same. When the ratio of the largest to the smallest eigenvalue (the eigenvalue spread) increases, the slope of the performance surface in the two directions differs more and more. Thus, for large eigenvalue spreads the optimization path taken by gradient descent is normally much longer than the path taken by Newton's method. This implies that Newton's method will be faster than LMS when the input data correlation matrix has a large eigenvalue spread.

Another advantage of Newton's method versus the steepest descent method is in the time constant of adaptation. When the gradient is multiplied by RNEURAL AND ADAPTIVE SYSTEMS00000108.gif not only the direction of the gradient is being changed but also the different eigenvalues in each direction are being equalized. What this means is that Newton's method is automatically correcting the time constant of adaptation for each direction such that all the weights converge at the same rate. Hence Newton's method has a single time constant of adaptation, unlike the steepest descent method.

These advantages of the Newton's method should not come as a surprise, because Newton's method uses much more information about the performance surface (the curvature). In fact, to implement Newton's method we need to compute the inverse of the correlation matrix, which takes significantly longer than the single multiplication required by the LMS method and also requires global information. Newton's method is also brittle, that is, if the surface is not exactly quadratic, the method may diverge. This is the reason Newton's method is normally modified to also include a small step size NEURAL AND ADAPTIVE SYSTEMS00090002.gif instead of using NEURAL AND ADAPTIVE SYSTEMS00090002.gif=1 as in Eq.1.43.

NEURAL AND ADAPTIVE SYSTEMS00000109.gif (1.44)

Note that x(k) is a vector and RNEURAL AND ADAPTIVE SYSTEMS00000108.gif is a matrix, so the update for one weight influences all the other inputs in the system. This is the reason that the computations are no longer local to each weight. However, they are not difficult if one assumes that the inverse of R is known a priori. The algorithm or Eq. 1.44 is called the LMS/Newton algorithm. The case where RNEURAL AND ADAPTIVE SYSTEMS00000108.gif has to be estimated on-line is much more involved and leads to the recursive least squares (RLS) algorithm.

Alternatively, to improve convergence speed with the LMS, we can implement an orthogonalizing transformation of the input correlation function followed by an equalization of the eigenvalues which is called a whitening transformation. (see Appendix A section 3.18). Since Newton's method coincides with the steepest descent for performance surfaces that are symmetric, this preprocessing will make the LMS perform as Newton's method.

NEUROSOLUTIONS EXAMPLE 1.20

Newton's Method

In this example, we implement Newton's method with a custom DLL. For this example, we must compute RNEURAL AND ADAPTIVE SYSTEMS00000108.gif and apply Eq. 1.44 to the simulator. The autocorrelation function for this example is

NEURAL AND ADAPTIVE SYSTEMS00000110.gif

so RNEURAL AND ADAPTIVE SYSTEMS00000108.gif becomes (see the Appendix, section 3.11)

NEURAL AND ADAPTIVE SYSTEMS00000111.gif

By applying Newton's method to the learning algorithm, we have essentially compensated for the eigenvalue spread. This means that Newton's method behaves as the steepest descent for a circular performance surface where the steepest descent direction always points directly to the optimal value. Thus, although the calculations are more complicated and more demanding (we need to know compute RNEURAL AND ADAPTIVE SYSTEMS00000108.gif), the convergence is much faster (in fact, the algorithm can converge in one epoch). When we run the simulator, notice that no matter where the algorithm starts, it always heads directly toward the optimum.

NeuroSolutions Example

Go to next section