| Download a free evaluation copy of NeuroSolutions to discover how to apply neural network technology to your artificial intelligence application. |
1.9 Analytic versus Iterative Solutions
Selecting a search procedure to find the optimal weights is a drastic conceptual change from the analytic least square solution, albeit an equivalent procedure. In learning systems the iterative solution is the most common for several reasons.
When
working with learning systems, the interest is very often in
on-line solutions, that is, solutions that can be implemented
sample-by-sample. The analytic solution requires data to be
available beforehand to compute the autocorrelation matrix R
and cross-correlation vector p. Fast computers are
required to crank out the solution (the inverse of R and
product with p). The method produces a value that
immediately gives the best possible performance. But several
problems may surface when applying the analytic approach. If the
matrix R is ill-conditioned, the computation
of
may not be very accurate. Moreover, the
analytic solution also requires a great deal of computation time
(Computation of a matrix inverse is proportional to the square of
the number of columns D of the matrix. In the big O notation this means
O(Dē)).
The
iterative solution is not free from shortcomings. We already saw
that there is no guarantee that the solution is close to the
optimal weight w* when all the input samples are used by
the algorithm. This depends on the data and on a judicious
selection of the step size
. The accuracy of the iterative solution is not
directly dependent on the condition number of R, but
matrices with large eigenvalue spread produce slow convergence
because the gradient descent adaptation is coupled. As we said
previously, the slowest mode controls the speed of adaptation,
while the largest step size is constrained by the largest
eigenvalue.
The great appeal of the iterative approach to optimization is that very efficient algorithms exist to estimate the gradient (e.g., the LMS algorithm). Only two multiplications per weight are necessary, so the computation scales proportional to the number of weights D (i.e., O(D) time). Moreover, the method can be readily extended to nonlinear systems, while the analytic approach for most cases of practical relevance cannot be computed.