Correlation Coefficient and Linear Regression

Consider estimating a random variable d by a constant b in the mean square sense, that is, choosing b to minimize the mean square error

$$ \min_b E[(d - b)^2] $$

where E[.] is the expected value operator (see Appendix A). The best b is obtained by taking the derivative with respect to b and setting the result to zero, yielding b* = E[d]. If we substitute this value in the equation, we obtain the variance of d as the smallest error.
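
The claim that the mean is the best constant estimate can be checked numerically. The sketch below is only an illustration: the Gaussian sample, its parameters, and the helper name `mse_constant` are assumptions, not part of the text, and sample averages stand in for E[.].

```python
import random

random.seed(0)
# Hypothetical sample standing in for the random variable d.
d = [random.gauss(3.0, 2.0) for _ in range(10_000)]

def mse_constant(samples, b):
    """Sample estimate of E[(d - b)^2] for a constant estimate b."""
    return sum((v - b) ** 2 for v in samples) / len(samples)

mean_d = sum(d) / len(d)         # sample version of b* = E[d]
var_d = mse_constant(d, mean_d)  # error at b* is the (divide-by-N) variance

# Any other constant adds the squared offset from the mean to the error.
shifted_error = mse_constant(d, mean_d + 0.5)
print(var_d, shifted_error)
```

For sample averages the error at any b equals the variance plus (b - mean)^2, which is why the mean is the minimizer.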

Now if we approximate d by wx + b instead, the problem becomes

$$ \min_{w,b} E[(d - (wx + b))^2] $$

and, for any given w, the best b is b* = E[d] - wE[x]. Substituting b*, the best w can be found by solving the problem

$$ \min_w E[((d - E[d]) - w(x - E[x]))^2] $$

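It is easy to show by differentiation that the best w is

$$ w^* = \frac{E[(d - E[d])(x - E[x])]}{\sigma_x^2} = r \frac{\sigma_d}{\sigma_x} $$ (1B.4)

where $\sigma_x$ and $\sigma_d$ are the standard deviations of x and d, and $r = E[(d - E[d])(x - E[x])] / (\sigma_x \sigma_d)$ is the correlation coefficient between x and d.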

Therefore the minimum mean square error linear estimator for d is

$$ \hat{d} = r \sigma_d \left( \frac{x - E[x]}{\sigma_x} \right) + E[d] $$ (1B.5)
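
As a rough illustration of Eqs. 1B.4 and 1B.5, the sketch below computes the slope, the bias, and the estimator from sample moments. The synthetic data (slope 0.7, offset -1.5, unit noise) and the names `mean`, `estimate`, and `mse` are assumptions made for the example; sample averages replace the expectations.

```python
import math
import random

random.seed(1)
# Hypothetical data: d depends linearly on x plus independent noise.
x = [random.gauss(1.0, 2.0) for _ in range(5_000)]
d = [0.7 * xi - 1.5 + random.gauss(0.0, 1.0) for xi in x]

def mean(v):
    return sum(v) / len(v)

mx, md = mean(x), mean(d)
cov_xd = mean([(xi - mx) * (di - md) for xi, di in zip(x, d)])
sx = math.sqrt(mean([(xi - mx) ** 2 for xi in x]))  # sigma_x
sd = math.sqrt(mean([(di - md) ** 2 for di in d]))  # sigma_d
r = cov_xd / (sx * sd)                              # correlation coefficient

w_star = r * sd / sx       # slope, as in Eq. 1B.4
b_star = md - w_star * mx  # b* = E[d] - w E[x]

def estimate(xi):
    """Estimator in the form of Eq. 1B.5: scaled, normalized x plus E[d]."""
    return r * sd * (xi - mx) / sx + md

def mse(w, b):
    """Sample mean square error of the linear estimate w*x + b."""
    return mean([(di - (w * xi + b)) ** 2 for xi, di in zip(x, d)])

print(w_star, b_star, mse(w_star, b_star))
```

With this many samples the recovered slope should land close to the 0.7 used to generate the data, and perturbing either `w_star` or `b_star` increases the error.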

The term in parentheses is a zero-mean, unit-variance version of the input x, so the product with $\sigma_d$ rescales it to the standard deviation of d. The term E[d] just guarantees the correct mean.

It is interesting to note that if d and x are uncorrelated (r = 0), the best estimate of d is its mean. However, if x and d are exactly correlated ($r = \pm 1$ in Eq. 1B.5), the estimate reproduces d with zero error. The minimum mean square error is

$$ E[(d - \hat{d})^2] = \sigma_d^2 (1 - r^2) $$

This equation shows that $r^2$ can in fact be interpreted as the fraction of the variance of d that is captured by the linear model.
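
The relation between the minimum error and the squared correlation coefficient can be verified on data. This is a sketch under assumed conditions: the synthetic data (true slope 2.0, unit noise) and all variable names are hypothetical, and sample moments are used in place of expectations.

```python
import random

random.seed(2)
# Hypothetical correlated data, chosen only for illustration.
x = [random.gauss(0.0, 1.0) for _ in range(5_000)]
d = [2.0 * xi + random.gauss(0.0, 1.0) for xi in x]

def mean(v):
    return sum(v) / len(v)

mx, md = mean(x), mean(d)
var_x = mean([(xi - mx) ** 2 for xi in x])
var_d = mean([(di - md) ** 2 for di in d])
cov_xd = mean([(xi - mx) * (di - md) for xi, di in zip(x, d)])

r_squared = cov_xd ** 2 / (var_x * var_d)
w_star = cov_xd / var_x    # same slope as r * sigma_d / sigma_x
b_star = md - w_star * mx

# Error left after the best linear fit, and the value sigma_d^2 (1 - r^2).
residual_mse = mean([(di - (w_star * xi + b_star)) ** 2 for xi, di in zip(x, d)])
predicted_mse = var_d * (1.0 - r_squared)

# Fraction of the variance of d removed by the linear model.
explained_fraction = 1.0 - residual_mse / var_d
print(residual_mse, predicted_mse, explained_fraction, r_squared)
```

The two error values agree up to rounding, and the explained fraction coincides with `r_squared`.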

There is a very interesting interpretation of the mean square estimation solution. Note that

$$ E[\{d - (w^* x + b^*)\}\, x] = 0 $$ (1B.6)

which means that the error (the quantity inside the curly braces) is orthogonal to the input.
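
The orthogonality condition can also be observed numerically. In the sketch below the data and names are again assumptions for illustration; the average of error times input plays the role of the expectation in Eq. 1B.6.

```python
import random

random.seed(3)
# Hypothetical data with a nonzero-mean input.
x = [random.gauss(2.0, 1.5) for _ in range(5_000)]
d = [-0.5 * xi + 4.0 + random.gauss(0.0, 0.8) for xi in x]

def mean(v):
    return sum(v) / len(v)

mx, md = mean(x), mean(d)
w_star = mean([(xi - mx) * (di - md) for xi, di in zip(x, d)]) / mean(
    [(xi - mx) ** 2 for xi in x]
)
b_star = md - w_star * mx

# Error of the optimal linear estimate and its average product with the input.
errors = [di - (w_star * xi + b_star) for xi, di in zip(x, d)]
error_dot_x = mean([e * xi for e, xi in zip(errors, x)])
error_mean = mean(errors)

# A deliberately non-optimal slope leaves the error correlated with x.
bad_errors = [di - ((w_star + 0.1) * xi + b_star) for xi, di in zip(x, d)]
bad_dot_x = mean([e * xi for e, xi in zip(bad_errors, x)])
print(error_dot_x, error_mean, bad_dot_x)
```

At the optimum both `error_dot_x` and `error_mean` vanish up to rounding, while the perturbed slope gives a clearly nonzero product.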
