Authors: Utans, Joachim; Moody, John
Source: Proceedings of the First International Conference on Artificial Intelligence Applications on Wall Street, IEEE Computer Society Press, Los Alamitos, CA, 1991
The notion of generalization can be defined precisely as the prediction risk, the expected performance of an estimator on new observations. In this paper, we propose the prediction risk as a measure of the generalization ability of multi-layer perceptron networks and use it to select the optimal network architecture. The prediction risk must be estimated from the available data; here we approximate it by v-fold cross-validation and by the asymptotic estimates given by generalized cross-validation or Akaike's final prediction error. We apply the technique to the problem of predicting corporate bond ratings. This problem is attractive as a case study because the available data are limited and there is no complete a priori information that could be used to impose a structure on the network architecture.
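The estimators named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes squared error as the loss, uses a straight-line least-squares fit as a stand-in for a multi-layer perceptron, and takes p to be the plain parameter count. The names `fpe`, `gcv`, `fit_line`, and `v_fold_cv` are hypothetical. The two asymptotic formulas are the standard textbook forms, FPE = ASE (N + p) / (N - p) and GCV = ASE / (1 - p/N)^2, where ASE is the average squared error on the N training points.

```python
from typing import Callable, List, Sequence

Model = Callable[[float], float]


def fpe(ase: float, n: int, p: int) -> float:
    """Akaike's final prediction error: ASE * (N + p) / (N - p)."""
    return ase * (n + p) / (n - p)


def gcv(ase: float, n: int, p: int) -> float:
    """Generalized cross-validation: ASE / (1 - p/N)^2."""
    return ase / (1.0 - p / n) ** 2


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Least-squares fit of y = a + b*x (assumed stand-in for a network)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = mean_y - b * mean_x
    return lambda x: a + b * x


def v_fold_cv(
    xs: Sequence[float],
    ys: Sequence[float],
    fit: Callable[[List[float], List[float]], Model],
    v: int,
) -> float:
    """Average squared error over v held-out folds."""
    n = len(xs)
    total = 0.0
    for k in range(v):
        held_out = set(range(k, n, v))  # interleaved fold k
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        model = fit(train_x, train_y)  # refit without the held-out fold
        total += sum((ys[i] - model(xs[i])) ** 2 for i in held_out)
    return total / n


if __name__ == "__main__":
    xs = [float(i) for i in range(10)]
    ys = [2.0 * x + 1.0 for x in xs]
    print(v_fold_cv(xs, ys, fit_line, v=5))
    print(fpe(1.0, n=10, p=2), gcv(1.0, n=10, p=2))
```

Under these assumptions, architecture selection amounts to computing one of the estimates for each candidate model and keeping the candidate with the smallest value.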