7.13 Bibliographic Notes

The key references for cross-validation are Stone (1974) [1], Stone (1977) [2] and Allen (1974) [3].

AIC was proposed by Akaike (1973) [4], while BIC was introduced by Schwarz (1978) [5].
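For reference, both criteria penalize the maximized log-likelihood with a complexity term; in their generic forms (stated here for convenience, not quoted from either paper; $d$ is the number of fitted parameters and $N$ the sample size),

$$\mathrm{AIC} = -2\,\mathrm{loglik} + 2d, \qquad \mathrm{BIC} = -2\,\mathrm{loglik} + (\log N)\,d.$$

BIC imposes the heavier penalty once $\log N > 2$ (that is, $N > e^2 \approx 7.4$), which is why it tends to select simpler models than AIC on all but tiny samples.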

Madigan and Raftery (1994) [6] give an overview of Bayesian model selection.

The MDL criterion is due to Rissanen (1983) [7].

Cover and Thomas (1991) [8] contains a good description of coding theory and complexity.

The VC dimension is described in Vapnik (1996) [9].

Stone (1977) [2] showed that AIC and leave-one-out cross-validation are asymptotically equivalent.

Generalized cross-validation is described by Golub et al. (1979) [10] and Wahba (1980) [11]; see also Chapter 3 of Hastie and Tibshirani (1990) [12].
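To make the criterion concrete, here is a minimal sketch (my own illustration; the ridge setup, the name `gcv_ridge`, and the synthetic data are assumptions, not taken from the references). For a linear fit $\hat{y} = Sy$, leave-one-out cross-validation has the closed form $\frac{1}{N}\sum_i [(y_i - \hat{y}_i)/(1 - S_{ii})]^2$; GCV replaces each $S_{ii}$ by the average $\mathrm{trace}(S)/N$:

```python
import numpy as np

def gcv_ridge(X, y, lam):
    """GCV score for ridge regression, in the spirit of Golub et al. (1979).

    The fit is linear, yhat = S y with S = X (X'X + lam I)^{-1} X', and
    GCV = (1/N) * sum_i [(y_i - yhat_i) / (1 - trace(S)/N)]**2.
    """
    n, p = X.shape
    S = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    resid = y - S @ y
    return np.mean((resid / (1.0 - np.trace(S) / n)) ** 2)

# Choose the ridge penalty that minimizes GCV on synthetic data.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, -1.0]) + rng.standard_normal(50)
lam_best = min(np.logspace(-3, 3, 25), key=lambda lam: gcv_ridge(X, y, lam))
```

Because only the trace of $S$ enters, GCV can be much cheaper than exact leave-one-out in settings where the individual $S_{ii}$ are hard to compute.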

The bootstrap is due to Efron (1979) [13]; see Efron and Tibshirani (1993) [14] for an overview. Efron (1983) [15] proposes a number of bootstrap estimates of prediction error, including the optimism and .632 estimates.
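As a sketch of how the .632 estimate combines its two ingredients (my own illustration; `fit_predict`, the parameter names, and the choice of 0-1 loss are assumptions, not Efron's code):

```python
import numpy as np

def err_632(X, y, fit_predict, B=100, seed=0):
    """Efron's .632 estimate of prediction error under 0-1 loss.

    fit_predict(X_train, y_train, X_test) -> predicted labels; it stands
    in for whatever classifier is being assessed.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    # Apparent (training) error: fit on the full sample, predict it back.
    err_bar = np.mean(fit_predict(X, y, X) != y)
    # Leave-one-out bootstrap error: each observation is predicted only
    # by models whose bootstrap sample happens to exclude it.
    errs, counts = np.zeros(n), np.zeros(n)
    for _ in range(B):
        idx = rng.integers(0, n, size=n)       # bootstrap indices
        out = np.setdiff1d(np.arange(n), idx)  # left-out observations
        if out.size == 0:
            continue
        pred = fit_predict(X[idx], y[idx], X[out])
        errs[out] += (pred != y[out])
        counts[out] += 1
    seen = counts > 0
    err_1 = np.mean(errs[seen] / counts[seen])
    # Pull the upward-biased leave-one-out bootstrap error back toward
    # the downward-biased training error.
    return 0.368 * err_bar + 0.632 * err_1
```

The weight is the limiting probability that a given observation appears in a bootstrap sample, $1 - e^{-1} \approx 0.632$. The fixed weight fails for heavily overfit rules such as 1-nearest-neighbor, whose training error is zero; that failure motivates the .632+ refinement mentioned below.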

Efron (1986) [16] compares CV and GCV with bootstrap estimates of the error rate. The use of cross-validation and the bootstrap for model selection is studied by Breiman and Spector (1992) [17], Breiman (1992) [18], Shao (1996) [19], Zhang (1993) [20] and Kohavi (1995) [21]. The .632+ estimator was proposed by Efron and Tibshirani (1997) [22].
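Paraphrasing the standard formulation of that estimator (not a verbatim quotation): with $\overline{\mathrm{err}}$ the training error, $\widehat{\mathrm{Err}}^{(1)}$ the leave-one-out bootstrap error, and $\hat{\gamma}$ the no-information error rate, the fixed weight 0.632 is replaced by a data-driven one,

$$\widehat{\mathrm{Err}}^{(.632+)} = (1-\hat{w})\,\overline{\mathrm{err}} + \hat{w}\,\widehat{\mathrm{Err}}^{(1)}, \qquad \hat{w} = \frac{0.632}{1 - 0.368\,\hat{R}}, \qquad \hat{R} = \frac{\widehat{\mathrm{Err}}^{(1)} - \overline{\mathrm{err}}}{\hat{\gamma} - \overline{\mathrm{err}}},$$

so that the weight moves from 0.632 (no overfitting, $\hat{R}=0$) toward 1 (severe overfitting, $\hat{R}=1$).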

Cherkassky and Ma (2003) [23] published a study on the performance of SRM for model selection in regression, in response to our study described in Section 7.9.1. They complained that we had treated SRM unfairly because we had not applied it properly. Our response can be found in the same issue of the journal (Hastie et al. (2003) [24]).


[1] Stone, M. (1974). Cross-validatory choice and assessment of statistical predictions, Journal of the Royal Statistical Society Series B 36: 111–147.

[2] Stone, M. (1977). An asymptotic equivalence of choice of model by cross-validation and Akaike's criterion, Journal of the Royal Statistical Society Series B 39: 44–47.

[3] Allen, D. (1974). The relationship between variable selection and data augmentation and a method of prediction, Technometrics 16: 125–127.

[4] Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle, Second International Symposium on Information Theory, pp. 267–281.

[5] Schwarz, G. (1978). Estimating the dimension of a model, Annals of Statistics 6(2): 461–464.

[6] Madigan, D. and Raftery, A. (1994). Model selection and accounting for model uncertainty using Occam's window, Journal of the American Statistical Association 89: 1535–1546.

[7] Rissanen, J. (1983). A universal prior for integers and estimation by minimum description length, Annals of Statistics 11: 416–431.

[8] Cover, T. and Thomas, J. (1991). Elements of Information Theory, Wiley, New York.

[9] Vapnik, V. (1996). The Nature of Statistical Learning Theory, Springer, New York.

[10] Golub, G., Heath, M. and Wahba, G. (1979). Generalized cross-validation as a method for choosing a good ridge parameter, Technometrics 21: 215–224.

[11] Wahba, G. (1980). Spline bases, regularization, and generalized cross-validation for solving approximation problems with large quantities of noisy data, Proceedings of the International Conference on Approximation Theory in Honour of George Lorenz, Academic Press, Austin, Texas, pp. 905–912.

[12] Hastie, T. and Tibshirani, R. (1990). Generalized Additive Models, Chapman and Hall, London.

[13] Efron, B. (1979). Bootstrap methods: another look at the jackknife, Annals of Statistics 7: 1–26.

[14] Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap, Chapman and Hall, London.

[15] Efron, B. (1983). Estimating the error rate of a prediction rule: some improvements on cross-validation, Journal of the American Statistical Association 78: 316–331.

[16] Efron, B. (1986). How biased is the apparent error rate of a prediction rule?, Journal of the American Statistical Association 81: 461–470.

[17] Breiman, L. and Spector, P. (1992). Submodel selection and evaluation in regression: the X-random case, International Statistical Review 60: 291–319.

[18] Breiman, L. (1992). The little bootstrap and other methods for dimensionality selection in regression: X-fixed prediction error, Journal of the American Statistical Association 87: 738–754.

[19] Shao, J. (1996). Bootstrap model selection, Journal of the American Statistical Association 91: 655–665.

[20] Zhang, P. (1993). Model selection via multifold cross-validation, Annals of Statistics 21: 299–311.

[21] Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection, International Joint Conference on Artificial Intelligence (IJCAI), Morgan Kaufmann, pp. 1137–1143.

[22] Efron, B. and Tibshirani, R. (1997). Improvements on cross-validation: the .632+ bootstrap method, Journal of the American Statistical Association 92: 548–560.

[23] Cherkassky, V. and Ma, Y. (2003). Comparison of model selection for regression, Neural Computation 15(7): 1691–1714.

[24] Hastie, T., Tibshirani, R. and Friedman, J. (2003). A note on “Comparison of model selection for regression” by Cherkassky and Ma, Neural Computation 15(7): 1477–1480.