[2] S. An, F. Boussaid, and M. Bennamoun. How Can Deep Rectifier Networks Achieve Linear Separability and Preserve Distances? In International Conference on Machine Learning, pages 514–523, June 2015.
[3] C. Anil, J. Lucas, and R. Grosse. Sorting Out Lipschitz Function Approximation. In International Conference on Machine Learning, pages 291–301, May 2019.
[4] P. Bartlett, S. Evans, and P. Long. Representing smooth functions as compositions of near-identity functions with implications for deep network optimization. arXiv, 2018.
[5] P. L. Bartlett and M. H. Wegkamp. Classification with a Reject Option using a Hinge Loss. Journal of Machine Learning Research, 9(Aug):1823–1840, 2008.
[6] J. Behrmann, W. Grathwohl, R. T. Q. Chen, D. Duvenaud, and J.-H. Jacobsen. Invertible Residual Networks. In International Conference on Machine Learning, pages 573–582, May 2019.
[7] A. Bendale and T. E. Boult. Towards Open Set Deep Networks. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[8] J. O. Berger. Statistical Decision Theory and Bayesian Analysis. Springer Series in Statistics. Springer-Verlag, New York, 2nd edition, 1985.
[9] A. Blum. Random Projection, Margins, Kernels, and Feature-Selection. In C. Saunders, M. Grobelnik, S. Gunn, and J. Shawe-Taylor, editors, Subspace, Latent Structure and Feature Selection, Lecture Notes in Computer Science, pages 52–68, Berlin, Heidelberg, 2006. Springer.
[10] J. Bradshaw, A. G. d. G. Matthews, and Z. Ghahramani. Adversarial Examples, Uncertainty, and Transfer Testing Robustness in Gaussian Process Hybrid Deep Networks. arXiv:1707.02476 [stat], July 2017.
[11] J. Bröcker. Reliability, sufficiency, and the decomposition of proper scores. Quarterly Journal of the Royal Meteorological Society, 135(643):1512–1519, 2009.
[12] R. Calandra, J. Peters, C. E. Rasmussen, and M. P. Deisenroth. Manifold Gaussian Processes for regression. 2016 International Joint Conference on Neural Networks (IJCNN), 2016.
[13] D. Cer, M. Diab, E. Agirre, I. Lopez-Gazpio, and L. Specia. SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 1–14, Vancouver, Canada, Aug. 2017. Association for Computational Linguistics.
[14] A. Chernodub and D. Nowicki. Norm-preserving Orthogonal Permutation Linear Unit Activation Functions (OPLU). arXiv:1604.02313 [cs], Jan. 2017.
[15] C. Cortes, M. Mohri, and A. Rostamizadeh. Learning Non-Linear Combinations of Kernels. In Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 396–404. Curran Associates, Inc., 2009.
[16] J. Daunizeau. Semi-analytical approximations to statistical moments of sigmoid and softmax mappings of normal variables. Feb. 2017.
[17] G. P. Dehaene. A deterministic and computable Bernstein-von Mises theorem. arXiv, 2019.
[18] J. S. Denker and Y. LeCun. Transforming Neural-Net Output Levels to Probability Distributions. In R. P. Lippmann, J. E. Moody, and D. S. Touretzky, editors, Advances in Neural Information Processing Systems 3, pages 853–859. Morgan-Kaufmann, 1991.
[19] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805 [cs], Oct. 2018.
[20] L. Dinh, D. Krueger, and Y. Bengio. NICE: Non-linear Independent Components Estimation. arXiv:1410.8516 [cs], Oct. 2014.
[21] L. Dinh, J. Sohl-Dickstein, and S. Bengio. Density estimation using Real NVP. arXiv:1605.08803 [cs, stat], May 2016.
[22] M. Dusenberry, G. Jerfel, Y. Wen, Y. Ma, J. Snoek, K. Heller, B. Lakshminarayanan, and D. Tran. Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors. In Proceedings of the International Conference on Machine Learning, 2020.
[23] D. Feng, L. Rosenbaum, and K. Dietmayer. Towards Safe Autonomous Driving: Capture Uncertainty in the Deep Neural Network For Lidar 3D Vehicle Detection. Apr. 2018.
[24] D. Freedman. Wald Lecture: On the Bernstein-von Mises theorem with infinite-dimensional parameters. The Annals of Statistics, 27(4):1119–1141, Aug. 1999.
[25] T. Gneiting, F. Balabdaoui, and A. E. Raftery. Probabilistic forecasts, calibration and sharpness. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(2):243–268, Apr. 2007.
[26] T. Gneiting and A. E. Raftery. Strictly Proper Scoring Rules, Prediction, and Estimation. Journal of the American Statistical Association, 102(477):359–378, Mar. 2007.
[27] H. Gouk, E. Frank, B. Pfahringer, and M. Cree. Regularisation of Neural Networks by Enforcing Lipschitz Continuity. Apr. 2018.
[28] P. D. Grünwald and A. P. Dawid. Game theory, maximum entropy, minimum discrepancy and robust Bayesian decision theory. Annals of Statistics, 32(4):1367–1433, Aug. 2004.
[29] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville. Improved training of Wasserstein GANs. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, pages 5769–5779, Long Beach, California, USA, Dec. 2017. Curran Associates Inc.
[30] C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger. On Calibration of Modern Neural Networks. In International Conference on Machine Learning, pages 1321–1330, July 2017.
[31] D. Hafner, D. Tran, T. Lillicrap, A. Irpan, and J. Davidson. Reliable Uncertainty Estimates in Deep Neural Networks using Noise Contrastive Priors. July 2018.
[32] R. E. Harang and E. M. Rudd. Principled Uncertainty Estimation for Deep Neural Networks, 2018.
[33] M. Hauser and A. Ray. Principles of Riemannian Geometry in Neural Networks. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 2807–2816. Curran Associates, Inc., 2017.
[34] M. Hein, M. Andriushchenko, and J. Bitterwolf. Why ReLU Networks Yield High-Confidence Predictions Far Away From the Training Data and How to Mitigate the Problem. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 41–50, 2019.
[35] D. Hendrycks and T. Dietterich. Benchmarking Neural Network Robustness to Common Corruptions and Perturbations. Sept. 2018.
[36] D. Hendrycks, K. Lee, and M. Mazeika. Using Pre-Training Can Improve Model Robustness and Uncertainty. In International Conference on Machine Learning, pages 2712–2721, May 2019.
[37] D. Hendrycks, N. Mu, E. D. Cubuk, B. Zoph, J. Gilmer, and B. Lakshminarayanan. AugMix: A Simple Method to Improve Robustness and Uncertainty under Data Shift. In International Conference on Learning Representations, 2020.
[38] J.-H. Jacobsen, J. Behrmann, R. Zemel, and M. Bethge. Excessive Invariance Causes Adversarial Vulnerability. Sept. 2018.
[39] J.-H. Jacobsen, J. Behrmann, N. Carlini, F. Tramèr, and N. Papernot. Exploiting Excessive Invariance caused by Norm-Bounded Adversarial Robustness. Mar. 2019.
[40] J.-H. Jacobsen, A. W. M. Smeulders, and E. Oyallon. i-RevNet: Deep Invertible Networks. Feb. 2018.
[42] A. Kristiadi, M. Hein, and P. Hennig. Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks. arXiv:2002.10118 [cs, stat], Feb. 2020.
[43] B. Lakshminarayanan, A. Pritzel, and C. Blundell. Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 6402–6413. Curran Associates, Inc., 2017.
[44] J. Landes. Probabilism, entropies and strictly proper scoring rules. International Journal of Approximate Reasoning, 63:1–21, Aug. 2015.
[45] S. Larson, A. Mahendran, J. J. Peper, C. Clarke, A. Lee, P. Hill, J. K. Kummerfeld, K. Leach, M. A. Laurenzano, L. Tang, and J. Mars. An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction. arXiv:1909.02027 [cs], Sept. 2019.
[46] N. D. Lawrence and J. Q. Candela. Local Distance Preservation in the GP-LVM Through Back Constraints. Jan. 2006.
[47] L. LeCam. Convergence of Estimates Under Dimensionality Restrictions. The Annals of Statistics, 1(1):38–53, Jan. 1973.
[48] K. Lee, H. Lee, K. Lee, and J. Shin. Training Confidence-calibrated Classifiers for Detecting Out-of-Distribution Samples. In International Conference on Learning Representations, 2018.
[49] D. Macêdo, T. I. Ren, C. Zanchettin, A. L. I. Oliveira, A. Tapp, and T. Ludermir. Isotropic Maximization Loss and Entropic Score: Fast, Accurate, Scalable, Unexposed, Turnkey, and Native Neural Networks Out-of-Distribution Detection. arXiv:1908.05569 [cs, stat], Feb. 2020.
[50] D. J. C. MacKay. A practical Bayesian framework for backpropagation networks. Neural Computation, 4(3):448–472, May 1992.
[51] A. Malinin and M. Gales. Predictive Uncertainty Estimation via Prior Networks. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 7047–7058. Curran Associates, Inc., 2018.
[52] A. Malinin and M. Gales. Prior Networks for Detection of Adversarial Attacks. arXiv:1812.02575 [cs, stat], Dec. 2018.
[53] A. Meinke and M. Hein. Towards neural networks that provably know when they don’t know. In International Conference on Learning Representations, 2020.
[54] T. P. Minka. A family of algorithms for approximate Bayesian inference. PhD thesis, Massachusetts Institute of Technology, USA, 2001. AAI0803033.
[55] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida. Spectral Normalization for Generative Adversarial Networks. In International Conference on Learning Representations, 2018.
[56] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng. Reading Digits in Natural Images with Unsupervised Feature Learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning 2011, 2011.
[57] J. Nixon, M. W. Dusenberry, L. Zhang, G. Jerfel, and D. Tran. Measuring calibration in deep learning. In CVPR Workshop, 2019.
[58] M. Panov and V. Spokoiny. Finite Sample Bernstein von Mises Theorem for Semiparametric Problems. Bayesian Analysis, 10(3):665–710, Sept. 2015.
[59] M. Parry, A. P. Dawid, and S. Lauritzen. Proper local scoring rules. Annals of Statistics, 40(1):561–592, Feb. 2012.
[60] D. C. Perrault-Joncas. Metric Learning and Manifolds: Preserving the Intrinsic Geometry. 2017.
[61] A. Rahimi and B. Recht. Random Features for Large-Scale Kernel Machines. In J. C. Platt, D. Koller, Y. Singer, and S. T. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 1177–1184. Curran Associates, Inc., 2008.
[62] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. The MIT Press, 2006.
[63] C. Riquelme, G. Tucker, and J. Snoek. Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling. In International Conference on Learning Representations, 2018.
[64] H. Ritter, A. Botev, and D. Barber. A Scalable Laplace Approximation for Neural Networks. In International Conference on Learning Representations, 2018.
[65] F. Rousseau, L. Drumetz, and R. Fablet. Residual Networks as Flows of Diffeomorphisms. Journal of Mathematical Imaging and Vision, 62(3):365–375, Apr. 2020.
[66] W. Ruan, X. Huang, and M. Kwiatkowska. Reachability analysis of deep neural networks with provable guarantees. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI’18, pages 2651–2659, Stockholm, Sweden, July 2018. AAAI Press.
[67] W. J. Scheirer, L. P. Jain, and T. E. Boult. Probability Models for Open Set Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(11):2317–2324, Nov. 2014.
[68] M. Ó Searcóid. Metric Spaces. Springer London, London, 2006.
[69] M. Sensoy, L. Kaplan, and M. Kandemir. Evidential Deep Learning to Quantify Classification Uncertainty. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 3179–3189. Curran Associates, Inc., 2018.
[70] L. Shu, H. Xu, and B. Liu. DOC: Deep Open Classification of Text Documents. arXiv:1709.08716 [cs], Sept. 2017.
[71] J. Snoek, O. Rippel, K. Swersky, R. Kiros, N. Satish, N. Sundaram, M. M. A. Patwary, Prabhat, and R. P. Adams. Scalable Bayesian Optimization Using Deep Neural Networks. arXiv:1502.05700 [stat], Feb. 2015.
[72] J. Sokolic, R. Giryes, G. Sapiro, and M. R. D. Rodrigues. Robust Large Margin Deep Neural Networks. IEEE Transactions on Signal Processing, 2017.
[74] L. Tierney, R. E. Kass, and J. B. Kadane. Approximate Marginal Densities of Nonlinear Functions. Biometrika, 76(3):425–433, 1989.
[75] G.-L. Tran, E. V. Bonilla, J. Cunningham, P. Michiardi, and M. Filippone. Calibrating Deep Convolutional Gaussian Processes. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 1554–1563, Apr. 2019.
[76] Y. Tsuzuku, I. Sato, and M. Sugiyama. Lipschitz-Margin Training: Scalable Certification of Perturbation Invariance for Deep Neural Networks. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 6541–6550. Curran Associates, Inc., 2018.
[77] B. van Aken, J. Risch, R. Krestel, and A. Löser. Challenges for Toxic Comment Classification: An In-Depth Error Analysis. In Proceedings of the 2nd Workshop on Abusive Language Online (ALW2), pages 33–42, Brussels, Belgium, Oct. 2018. Association for Computational Linguistics.
[78] J. van Amersfoort, L. Smith, Y. W. Teh, and Y. Gal. Simple and Scalable Epistemic Uncertainty Estimation Using a Single Deep Deterministic Neural Network. arXiv:2003.02037 [cs, stat], Mar. 2020.
[79] N. Vedula, N. Lipka, P. Maneriker, and S. Parthasarathy. Towards Open Intent Discovery for Conversational Text. arXiv:1904.08524 [cs], Apr. 2019.
[80] Y. Wen, D. Tran, and J. Ba. BatchEnsemble: an Alternative Approach to Efficient Ensemble and Lifelong Learning. In International Conference on Learning Representations, 2020.
[81] T.-W. Weng, H. Zhang, P.-Y. Chen, J. Yi, D. Su, Y. Gao, C.-J. Hsieh, and L. Daniel. Evaluating the Robustness of Neural Networks: An Extreme Value Theory Approach. In International Conference on Learning Representations, 2018.
[82] A. G. Wilson, Z. Hu, R. Salakhutdinov, and E. P. Xing. Stochastic Variational Deep Kernel Learning. In Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS’16, pages 2594–2602, USA, 2016. Curran Associates Inc.
[83] M.-A. Yaghoub-Zadeh-Fard, B. Benatallah, F. Casati, M. Chai Barukh, and S. Zamanirad. User Utterance Acquisition for Training Task-Oriented Bots: A Review of Challenges, Techniques and Opportunities. IEEE Internet Computing, pages 1–1, 2020.
[84] S. Zagoruyko and N. Komodakis. Wide Residual Networks. arXiv:1605.07146 [cs], June 2017.
[85] Y. Zheng, G. Chen, and M. Huang. Out-of-Domain Detection for Natural Language Understanding in Dialog Systems. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28:1198–1209, 2020.