Convergence for the Multiple-Weight Case
One can show that the condition that guarantees convergence (see Widrow and Stearns) is
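$$\lim_{k \to \infty} (I - \eta \Lambda)^k = 0 \qquad (1B.20)$$
where $\eta$ is the step size and $\Lambda$ is the eigenvalue matrix, so that
$$(I - \eta \Lambda)^k = \mathrm{diag}\bigl((1 - \eta\lambda_1)^k, \ldots, (1 - \eta\lambda_D)^k\bigr) \qquad (1B.21)$$
which means that in every principal direction of the performance surface
(given by the eigenvectors of the input correlation matrix R), we must have
$$0 < \eta < \frac{2}{\lambda_i} \qquad (1B.22)$$
where $\lambda_i$ is the corresponding eigenvalue. This equation also means that, with a single step size $\eta$, each weight $w_i(k)$ approaches its optimal value $w_i^*$ with a different time constant ("speed"), so the weight tracks bend and the
path is no longer a straight line toward the minimum.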
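The decoupling behind this condition can be sketched in a few lines. The derivation below is an illustration under stated assumptions, not a reproduction of the cited proof: it assumes the steepest descent update $w(k+1) = w(k) - \eta\,(R\,w(k) - p)$ on a quadratic performance surface with minimum $w^*$ (so $R\,w^* = p$), and a decomposition $R = Q \Lambda Q^T$ with orthonormal eigenvectors in the columns of $Q$ and positive eigenvalues. The symbols $p$, $Q$ and $v$ are introduced only for this sketch; a different scaling of the cost changes the constant in the bound but not the argument.

```latex
\begin{align*}
w(k+1) - w^* &= (I - \eta R)\,\bigl(w(k) - w^*\bigr) \\
v(k) = Q^T \bigl(w(k) - w^*\bigr) \;&\Rightarrow\; v(k+1) = (I - \eta\Lambda)\, v(k) \\
v_i(k) &= (1 - \eta\lambda_i)^k \, v_i(0) \\
|1 - \eta\lambda_i| < 1 \;&\iff\; 0 < \eta < \frac{2}{\lambda_i}
\end{align*}
```

Each rotated coordinate $v_i$ decays geometrically at its own rate $1 - \eta\lambda_i$, and since one step size serves all directions, the largest eigenvalue sets the binding limit.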
This result is the mathematical description of our earlier statement that the
gradient descent algorithm behaves like many one-dimensional, single-variable algorithms
along the eigenvector directions. Notice that the matrix in Eq. 1B.21 is diagonal, so there is
no cross-coupling between time constants along the eigenvector directions.
In any other direction of the space there will be coupling. However, we can
still decompose the overall weight track as a combination of weight tracks along
each eigendirection, as we did in Figure 1-16. Eq. 1B.22 shows that the step
size along each direction obeys the same rule as the unidimensional case (Eq. 1.17).
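A small numerical experiment can make the decomposition concrete. The sketch below is illustrative only: the 2x2 matrix `R`, the vector `p`, the starting point, and the function names are hypothetical choices, and the update rule is assumed to be plain steepest descent on a quadratic surface. It runs the descent with a step size just below and just above 2 divided by the largest eigenvalue, and checks that the stable weight track equals the sum of independent one-dimensional tracks along the eigendirections.

```python
import numpy as np

# Hypothetical 2-weight example: R is an assumed input correlation matrix and
# p an assumed cross-correlation vector (neither value comes from the text).
R = np.array([[2.0, 0.8],
              [0.8, 1.0]])
p = np.array([1.0, 0.5])
w_star = np.linalg.solve(R, p)      # minimum of the quadratic surface

lam, Q = np.linalg.eigh(R)          # eigenvalues and orthonormal eigenvectors of R
eta_max = 2.0 / lam.max()           # the largest eigenvalue sets the stability bound


def run_gradient_descent(eta, steps=200):
    """Steepest descent w(k+1) = w(k) - eta * (R w(k) - p), starting at w = 0."""
    w = np.zeros(2)
    track = [w.copy()]
    for _ in range(steps):
        w = w - eta * (R @ w - p)
        track.append(w.copy())
    return np.array(track)


def modal_prediction(eta, steps=200):
    """Same track built mode by mode: v_i(k) = (1 - eta * lam_i)**k * v_i(0)."""
    v0 = Q.T @ (np.zeros(2) - w_star)       # initial error in eigen-coordinates
    k = np.arange(steps + 1)[:, None]       # column of iteration indices
    v = (1.0 - eta * lam) ** k * v0         # each mode decays independently
    return v @ Q.T + w_star                 # rotate back to weight coordinates


stable = run_gradient_descent(0.9 * eta_max)
unstable = run_gradient_descent(1.05 * eta_max)

err_stable = np.linalg.norm(stable[-1] - w_star)
err_unstable = np.linalg.norm(unstable[-1] - w_star)
modes_match = np.allclose(stable, modal_prediction(0.9 * eta_max))

print("final error, eta below bound:", err_stable)
print("final error, eta above bound:", err_unstable)
print("track equals sum of eigen-direction tracks:", modes_match)
```

Under these assumptions the run below the bound settles on the minimum, the run above it grows without limit along the eigenvector of the largest eigenvalue, and the mode-by-mode reconstruction reproduces the stable track.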