is set too high, the learning can diverge. The selection of the decay
parameter of the learning-rate schedule can be even trickier than the
selection of the initial learning rate
because it is highly dependent on the performance surface. If the decay
is too large, the learning rate shrinks before the weights have moved far
enough toward the minimum, and the adaptation may stall. If the decay
is too small, the search may reach the global minimum quickly and must then
wait a long time before the learning rate decreases enough to minimize the rattling.
There are other (more automatic) methods for adapting the learning rate that we
discuss later in the book.
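
As a rough illustration of these trade-offs, the sketch below runs gradient descent on a one-dimensional quadratic performance surface. It assumes a linear schedule in which the learning rate is reduced by a fixed amount at every iteration; that schedule, the function name `descend`, and all parameter values are assumptions chosen for illustration and are not taken from the text.

```python
def descend(eta0, beta, steps=200, w0=5.0, w_star=0.0, curvature=1.0):
    """Gradient descent on J(w) = 0.5 * curvature * (w - w_star)**2.

    Assumed schedule (hypothetical): eta(n+1) = max(eta(n) - beta, 0).
    Returns the final weight and the final learning rate.
    """
    w, eta = w0, eta0
    for _ in range(steps):
        grad = curvature * (w - w_star)   # dJ/dw
        w -= eta * grad                   # weight update
        eta = max(eta - beta, 0.0)        # linear decay, clipped at zero
    return w, eta


# Decay too large: the learning rate hits zero after two steps,
# so the weight stalls far from the minimum at w_star = 0.
w_stall, eta_stall = descend(eta0=0.1, beta=0.05)

# Gentler decay: the weight has time to approach the minimum.
w_ok, eta_ok = descend(eta0=0.1, beta=0.0005)

# Initial learning rate too high for this curvature (eta0 > 2 / curvature):
# every update overshoots and the weight diverges.
w_div, _ = descend(eta0=2.5, beta=0.0)

print("stalled weight:  ", w_stall)
print("converged weight:", w_ok)
print("diverged weight: ", w_div)
```

Because this surface is noiseless, the sketch does not show rattling; with a noisy gradient estimate, a decay that is too small would leave the weight jittering around the minimum until the learning rate finally becomes small.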