Convergence for Multiple Weights Case

One can show that the condition that guarantees convergence (see Widrow and Stearns) is

$$\lim_{k\to\infty} (I - \eta\Lambda)^k = 0 \qquad \text{(1B.20)}$$

where $\Lambda$ is the eigenvalue matrix,

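$$(I - \eta\Lambda)^k = \operatorname{diag}\big((1-\eta\lambda_1)^k,\ (1-\eta\lambda_2)^k,\ \ldots,\ (1-\eta\lambda_D)^k\big) \qquad \text{(1B.21)}$$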

which means that in every principal direction of the performance surface (given by the eigenvectors of the input correlation matrix R), we must have

$$0 < \eta < \frac{2}{\lambda_i} \qquad \text{(1B.22)}$$

where $\lambda_i$ is the corresponding eigenvalue. This equation also means that with a single step size $\eta$ each weight $w_i(k)$ approaches its optimal value $w_i^*$ with a different time constant ("speed"), so the weight tracks bend, and the path is no longer a straight line toward the minimum.
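As a short worked step, the "different time constant" statement can be made explicit. The symbols $v_i(k)$ (the weight error measured along the $i$-th eigenvector) and $\tau_i$ are introduced here for illustration only, and the sketch assumes the gradient descent update contracts each eigendirection by the factor $(1-\eta\lambda_i)$ per iteration, with $0 < \eta\lambda_i < 1$ so that the factor is positive.

```latex
% Assumed notation: v_i(k) is the weight error along eigenvector i at iteration k.
v_i(k) = (1 - \eta\lambda_i)^k \, v_i(0)

% Matching the geometric decay to an exponential envelope e^{-k/\tau_i}:
(1 - \eta\lambda_i)^k = e^{-k/\tau_i}
\quad\Longrightarrow\quad
\tau_i = \frac{-1}{\ln(1 - \eta\lambda_i)}
\approx \frac{1}{\eta\lambda_i}
\quad \text{for } \eta\lambda_i \ll 1
```

Under these assumptions, directions with large eigenvalues converge quickly and directions with small eigenvalues converge slowly, which is why a single $\eta$ produces curved weight tracks.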

This is the mathematical description of our earlier statement that the gradient descent algorithm behaves like many one-dimensional (single-variable) algorithms along the eigenvector directions. Notice that the matrix in Eq. 1B.21 is diagonal, so there is no cross-coupling between time constants along the eigenvector directions.

In any other direction of the space there will be coupling. However, we can still decompose the overall weight track as a combination of weight tracks along each eigendirection, as we did in Figure 1-16. Eq. 1B.22 shows that the step size along each direction obeys the same rule as the unidimensional case (Eq. 1.17).
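The decoupling along eigendirections can be checked numerically. The following is a minimal sketch, not the book's own code: the 2x2 correlation matrix `R`, the cross-correlation vector `P`, and the helper names `descend`, `modal_error`, and `optimal_weights` are all hypothetical choices made for illustration. It assumes the quadratic performance surface whose gradient is `R w - P`, so that the update is `w(k+1) = w(k) - eta * (R w(k) - P)`.

```python
import math

# Hypothetical example values (not taken from the text).
R = [[2.0, 1.0], [1.0, 2.0]]      # input correlation matrix, eigenvalues 1 and 3
P = [1.0, 0.0]                    # cross-correlation vector
EIGENVALUES = [1.0, 3.0]
S = 1.0 / math.sqrt(2.0)
EIGENVECTORS = [[S, -S], [S, S]]  # unit eigenvectors, in the same order as EIGENVALUES


def optimal_weights():
    """Solve R w* = P for the 2x2 case using the closed-form inverse."""
    det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
    return [
        (R[1][1] * P[0] - R[0][1] * P[1]) / det,
        (R[0][0] * P[1] - R[1][0] * P[0]) / det,
    ]


def descend(eta, steps, w0=(0.0, 0.0)):
    """Run `steps` iterations of gradient descent with step size `eta`."""
    w = list(w0)
    for _ in range(steps):
        grad = [
            R[0][0] * w[0] + R[0][1] * w[1] - P[0],
            R[1][0] * w[0] + R[1][1] * w[1] - P[1],
        ]
        w = [w[0] - eta * grad[0], w[1] - eta * grad[1]]
    return w


def modal_error(w):
    """Project the weight error w - w* onto each eigenvector."""
    w_star = optimal_weights()
    v = [w[0] - w_star[0], w[1] - w_star[1]]
    return [q[0] * v[0] + q[1] * v[1] for q in EIGENVECTORS]


if __name__ == "__main__":
    eta, steps = 0.4, 10
    start = modal_error([0.0, 0.0])
    end = modal_error(descend(eta, steps))
    for lam, e0, ek in zip(EIGENVALUES, start, end):
        predicted = (1.0 - eta * lam) ** steps * e0
        print(f"lambda={lam}: simulated={ek:.3e}, predicted={predicted:.3e}")
```

In this example the largest eigenvalue is 3, so the bound of Eq. 1B.22 gives a maximum step size of 2/3: calling `descend(0.6, 200)` lands on the optimum, while `descend(0.7, 200)` grows without bound along the eigenvector associated with the eigenvalue 3.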
