Download a free evaluation copy of NeuroSolutions to discover how to apply neural network technology to your artificial intelligence application.

contents.gifindex.gifprev1.gifnext1.gif

1.4 Adaptive Linear Systems

1.4.1 Least Squares as a Search for the Parameters of a Linear System

The purpose of least squares is to find parameters (b, w) that minimize the difference between the system output yi and the desired response di. Regression is effectively computing the optimal parameters of an interpolating system (linear in this case) which predicts the value of d from the value of x.

NEURAL AND ADAPTIVE SYSTEMS00000034.gif

Figure 1-6 Regression as a linear system design problem


Figure 1-6 shows graphically the operation of adapting the parameters of the linear system. The system output y is always a linear combination of the input x with the bias, so it has to lie on a straight line of equation y = wx+b. Changing b modifies the y intercept, while changing w modifies the slope. Therefore we conclude that the goal of linear regression is to adjust the position of the line to minimize the average square difference between the y values (on the line) and the cloud of points di (i.e. the criterion J).

The key point is to recognize that the error contains information that can be used to optimally place the line. Figure 1-6 shows this by including a subsystem that accepts the error as input and modifies the parameters of the system. Thus, the error NEURAL AND ADAPTIVE SYSTEMS00090000.gifi is fed back to the system and indirectly affects the output through a change in the parameters (b, w). Effectively the system is made "aware" of its performance through the error. With the incorporation of the mechanism that automatically modifies the system parameters, a very powerful linear system can be built that will constantly seek optimal parameters. Such systems are called neural and adaptive systems and are the focus of this book.

1.4.2 Neural and Adaptive Systems

Before pursuing the study of adaptive systems, it is important to reflect briefly on the implications of neural and adaptive systems in engineering design. System design usually begins with specifications. First, the problem domain is studied and modeled, specifications are established, and then a system is built to meet the specifications. The key point is that the system is built to meet the current specifications and will always use the designed set of parameters, even if the external conditions change.

Here we are proposing a very different system design approach based on adaptation, which has a biological flavor to it. In the beginning the system parameters may be way off, creating a large error. However, through the feedback from the error, the system can change its parameters to decrease the error as much as possible. The system's "experience" with the data designs the best set of parameters. An adaptive system is more complex because it has to not only accomplish the desired task, but also be equipped with a subsystem that adapts its parameters. But notice that even if the data changes in the future, this design methodology will modify the system parameters so that the best possible performance is obtained. Additionally, the same system can be used for multiple problems.

There are basically two ways to adapt the system parameters: supervised learning and unsupervised learning. The method described until now belongs to supervised learning because there is a desired response. Later on in the book we will find other methods that also adapt the system parameters, but using only an internal rule. Since there is no desired response, these methods are called unsupervised. We will concentrate here on supervised learning methods.

The ingredients of supervised adaptive system design are

A system (linear in this case) with adaptive parameters

The existence of a desired or target response d

An optimality criterion (the MSE in this case) to be minimized

A method (subsystem) to compute the optimal parameters.

The method of least squares finds the optimal parameters (b, w) analytically. Our goal is to find alternate ways of computing the same parameters using a search procedure.

1.4.3 Analysis of the Error in the Space of the Parameters: The Performance Surface

Let us analyze the mean square error (J) as we change the parameters of the system (w and b). Without loss of generality, we are going to assume that b =0 (or equivalently, that the mean of x and d have been removed), such that J becomes a function of the single variable w

NEURAL AND ADAPTIVE SYSTEMS00000035.gif (1.9)

If w is treated as the variable and all other parameters are held constant, we can immediately see that J is quadratic on w with the coefficient of w² (i.e. xi²) being always positive. In the space of the possible w values, J is a parabola facing upward (J is always positive since it is a sum of squares). The function J(w) is called the performance surface for the regression problem (Figure 1-7). The performance surface is an important tool that helps us visualize how the adaptation of the weights affects the mean square error.

NEURAL AND ADAPTIVE SYSTEMS00000036.gif

Figure 1-7 The performance surface for the regression problem


NEUROSOLUTIONS EXAMPLE 1.4

Plotting the Performance Surface

The performance surface is just a plot of the error criterion (J) versus the value of the weights, so what we will do during the simulation is to vary the Synapse weight (which corresponds to the slope parameter of the linear regressor) between two appropriate values. We can imagine that the error will be minimum at an intermediate value of the weight and that it will increase for both lower values and higher values.

To modify the Synapse weight incrementally we attach a Linear Scheduler to the Synapse and place the Matrix Viewer on it so we can see how the weight is changing. To visualize the MSE we will bring another Scatter Plot to the L2 Criterion. This will allow us to plot the cost versus weight (performance surface).

NEURAL AND ADAPTIVE SYSTEMS00000037.gif

Run the example and see how the slope parameter of the linear PE affects the mean square error of the linear regressor. As we are going to see the input and desired signals tremendously affect the shape of the performance surface. But how can we change the shape of the performance curve without touching the data files? Let us substitute the L2 Criterion by the Lp Criterion. Go to Palettes and open the Error Criteria menu. Click on the Lp Criterion and bring the pointer to the Breadboard. Notice that the pointer changed to a stamper. If you left-click on the L2 Criterion component, the L2 is substituted by the Lp component, which computes a cost given by the p norm

NEURAL AND ADAPTIVE SYSTEMS00000038.gif

By default the norm is p =5. Run the simulation again. What do you see? Does the location of the minimum change appreciably? What about the shape of the performance surface? Do you now better understand the function of the cost criterion?

NeuroSolutions Example

Using the performance surface, we can develop a geometric method for finding the value of w, here denoted by w*, which minimizes the performance criterion. Previously (Eq. 1.5), we computed w* by setting to zero the derivative of J with respect to w.

The gradient of the performance surface is a vector (with the dimension of w) that always points toward the direction of maximum J change and with a magnitude equal to the slope of the tangent of the performance surface (Figure 1-8). If you visualize the performance surface as a hillside, each point on the hill will have a gradient arrow that points in the direction of steepest ascent at that point, with larger magnitudes for steeper slopes. A ball rolling down the hill will always attempt to roll in the direction opposite to the gradient arrow (the steepest descent). The slope at the bottom is zero, so the gradient is also zero (that is the reason the ball stops there).

In our special case the gradient has just one component along the weight axis w, NEURAL AND ADAPTIVE SYSTEMS00090001.gifJ = NEURAL AND ADAPTIVE SYSTEMS00090001.gifwJ given by

NEURAL AND ADAPTIVE SYSTEMS00000039.gif (1.10)

(see the gradient definition and construction). A graphical way to construct NEURAL AND ADAPTIVE SYSTEMS00090001.gifwJ at a point w0 is to first find the level curve (curve of constant J value) that passes through the point (also called the contour plot). Then take the tangent to the level curve at w0. The gradient component NEURAL AND ADAPTIVE SYSTEMS00090001.gifwJ is always perpendicular to the contour curve at w0, with a magnitude given by the partial derivative of J with respect to the weight w (Eq. 1.10). For one weight (one-dimensional problem), as in Figure 1-8 the construction is simplified and we have to only find the direction of the gradient on the axis.

NEURAL AND ADAPTIVE SYSTEMS00000040.gif

Figure 1-8 Performance surface and its gradient


At the bottom of the bowl, the gradient is zero, because the parabola has slope zero at the vertex. Thus, for a parabolic performance surface, computing the gradient and equating it to zero finds the value of the coefficients that minimize the cost, just as we did in Eq.1.6. The important observation is that the analytical solution found by the least squares coincides with the minimum of the performance surface. Substituting the value of w* into Eq.1.9, the minimum value of the error (Jmin) can be computed (properties of the performance surface).

NEUROSOLUTIONS EXAMPLE 1.5

Comparison of Performance Curves for Different Data Sets

In this example we provide two sets of input files and two sets of output files. By changing the input data, we find that the minimum error, its location in the weight space (a simple line for this 1D example), and the shape of the performance surface change. On the other hand, if we change the desired signal, only the minimum value of the performance and its location change, but the overall shape remains the same.

NeuroSolutions Example

1.4.4 Search of the Performance Surface with Steepest Descent

Since the performance surface is a paraboloid, which has a single minimum, an alternative procedure to find the best value of the coefficient w is to search the performance surface instead of computing the best coefficient analytically by Eq.1.6. The search for the minimum of a function can be done efficiently using a broad class of methods based on gradient information. The gradient has two main advantages for the search:

The gradient can be computed locally.

The gradient always points in the direction of maximum change.

If the goal is to reach the minimum, the search must be in the direction opposite to the gradient. Thus, the overall method of searching can be stated in the following way.

Start the search with an arbitrary initial weight w(0), where the iteration number is denoted by the index in parentheses. Then compute the gradient of the performance surface at w(0), and modify the initial weight proportionally to the negative of the gradient at w(0). This changes the operating point to w(1). Then compute the gradient at the new position w(1), and apply the same procedure again, that is,

NEURAL AND ADAPTIVE SYSTEMS00000041.gif (1.11)

where NEURAL AND ADAPTIVE SYSTEMS00090002.gif is a small constant and NEURAL AND ADAPTIVE SYSTEMS00090001.gifJ(k) denotes the gradient of the performance surface at the kth iteration. The constant NEURAL AND ADAPTIVE SYSTEMS00090002.gif is used to maintain stability in the search by ensuring that the operating point does not move too far along the performance surface. This search procedure is called the steepest descent method. Figure 1-9 illustrates the search procedure.

NEURAL AND ADAPTIVE SYSTEMS00000042.gif

Figure 1-9 The search using the gradient information


If we trace the path of the weights from iteration to iteration, intuitively we see that if the constant NEURAL AND ADAPTIVE SYSTEMS00090002.gif is small, eventually the best value for the coefficient w* will be found. Whenever w >w*, we decrease w, and whenever w <w*, we increase w.

Go to next section