Download a free evaluation copy of NeuroSolutions to discover how to apply neural network technology to your artificial intelligence application.

contents.gifindex.gifprev1.gifnext1.gif

1.9 Analytic versus Iterative Solutions

Selecting a search procedure to find the optimal weights is a drastic conceptual change from the analytic least square solution, albeit an equivalent procedure. In learning systems the iterative solution is the most common for several reasons.

When working with learning systems, the interest is very often in on-line solutions, that is, solutions that can be implemented sample-by-sample. The analytic solution requires data to be available beforehand to compute the autocorrelation matrix R and cross-correlation vector p. Fast computers are required to crank out the solution (the inverse of R and product with p). The method produces a value that immediately gives the best possible performance. But several problems may surface when applying the analytic approach. If the matrix R is ill-conditioned, the computation of NEURAL AND ADAPTIVE SYSTEMS00000112.gifmay not be very accurate. Moreover, the analytic solution also requires a great deal of computation time (Computation of a matrix inverse is proportional to the square of the number of columns D of the matrix. In the big O notation this means O(Dē)).

The iterative solution is not free from shortcomings. We already saw that there is no guarantee that the solution is close to the optimal weight w* when all the input samples are used by the algorithm. This depends on the data and on a judicious selection of the step size NEURAL AND ADAPTIVE SYSTEMS00090002.gif. The accuracy of the iterative solution is not directly dependent on the condition number of R, but matrices with large eigenvalue spread produce slow convergence because the gradient descent adaptation is coupled. As we said previously, the slowest mode controls the speed of adaptation, while the largest step size is constrained by the largest eigenvalue.

The great appeal of the iterative approach to optimization is that very efficient algorithms exist to estimate the gradient (e.g., the LMS algorithm). Only two multiplications per weight are necessary, so the computation scales proportional to the number of weights D (i.e., O(D) time). Moreover, the method can be readily extended to nonlinear systems, while the analytic approach for most cases of practical relevance cannot be computed.

Go to next section