🔥 A Survey of Uncertainty in Deep Neural Networks
【Abstract】 Over the past decade, neural networks have reached almost every field of science and become a crucial part of a wide range of real-world applications. With their growing spread and use, confidence in neural network predictions has become increasingly important. However, basic neural networks either provide no uncertainty estimates or suffer from over- or under-confidence. To overcome this, many researchers have worked on understanding and quantifying predictive uncertainty in neural networks. Prior work has identified different types and sources of uncertainty and proposed a variety of approaches for estimating and quantifying uncertainty in neural networks. This article gives a comprehensive overview of uncertainty estimation in neural networks, reviews recent advances in the field, highlights current challenges, and identifies potential research opportunities. It aims to provide a broad overview and introduction for anyone interested in uncertainty estimation in neural networks, without assuming prior knowledge of the field. To this end, the paper first gives a thorough introduction to the key factor of uncertainty sources, separating them into (reducible) model uncertainty and (irreducible) data uncertainty. It presents four approaches to modeling uncertainty, based on single deterministic neural networks, Bayesian neural networks, neural network ensembles, and test-time data augmentation, and discusses the different branches and latest developments in each. On the practical side, it covers various uncertainty measures and methods for calibrating neural networks, and surveys existing baselines and available implementations. Examples from medical image analysis, robotics, Earth observation, and other fields illustrate the needs and challenges that practical applications pose for uncertainty in neural networks. The article also discusses the practical limitations of uncertainty estimation methods in mission- and safety-critical real-world applications of neural networks, and gives an outlook on next steps toward broader use of such methods.
【Source】 Gawlikowski, Jakob, et al. “A survey of uncertainty in deep neural networks.” arXiv preprint arXiv:2107.03342 (2021).
【Takeaways】 This article reviews the topics around predictive uncertainty in neural networks quite comprehensively: from the sources and classification of uncertainty, through estimation methods, to quantitative evaluation and calibration. The content is broad yet concise and well worth a careful read. Points I personally found most valuable: (1) it systematically organizes the sources and types of uncertainty, which earlier papers mostly mention only in passing, yet which is essential for a correct understanding of uncertainty; (2) its coverage of deep-learning uncertainty estimation methods is more complete: earlier surveys centered on Bayesian neural networks and only briefly mentioned other methods (e.g., deep ensembles), whereas this paper systematically divides estimation methods into four classes: single deterministic neural networks, deep ensembles, Bayesian neural networks, and test-time data augmentation; (3) it covers uncertainty measurement and calibration, hot topics in recent years that earlier articles lacked; (4) it includes the latest progress on uncertainty baselines from the past couple of years, showing that the authors track this area closely; (5) notably, the authors come from the spatial data science community.
1 Overview
Over the past decade, deep neural networks (DNNs) have made tremendous progress, driving their application across many research fields, including the modeling and understanding of complex systems such as Earth observation, medical image analysis, and robotics. Although DNNs are attractive in high-risk domains such as medical image analysis [1][2][3][4][5][6] and autonomous vehicle control [7][8][9][10], their deployment in safety-critical real-world applications remains limited. The main factors behind this limitation are:
- The inference models of DNNs `lack expressiveness and transparency`, making it hard to trust their results [2]
- The inability to distinguish `in-distribution` from `out-of-distribution` samples [11][12], and sensitivity to `distribution shift` [13]
- The lack of methods that provide reliable uncertainty estimates for DNN decisions [14], together with overconfident predictions [15][16]
- Sensitivity to `adversarial examples`, which makes DNNs vulnerable to attacks [17][18][19]
The factors above stem mainly from uncertainty already contained in the data (**data uncertainty**) or from a lack of knowledge about the neural network (**model uncertainty**). To overcome these limitations, providing uncertainty estimates is essential, so that `predictive uncertainty` can either be discounted or passed on to a human expert for the corresponding decision [20].
Predictive uncertainty is used in this article as a single, self-contained term: it refers specifically to the uncertainty present in the prediction output of a machine learning model, and it is the main subject of this paper. There are many ways to obtain predictive uncertainty (see Section 3), but from a Bayesian perspective they fall into Bayesian and non-Bayesian methods. The former first obtain a posterior distribution over the model parameters and then derive the prediction by marginalization; the latter estimate the uncertainty of the prediction output directly by various means. For example, the current state-of-the-art deep ensemble method averages the outputs of different sub-models to form a prediction, while other methods assume the prediction output follows some distribution whose parameters are then found by optimization.
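As a minimal illustration of the ensemble averaging just described, the sketch below (plain NumPy; the function name and the member probabilities are my own made-up example, not code from the survey) averages the softmax outputs of several hypothetical sub-models:

```python
import numpy as np

def ensemble_predict(member_probs):
    """Average the softmax outputs of the ensemble members.

    member_probs: array of shape (M, K) holding the class probabilities
    produced by each of the M sub-models for a single input.
    Returns the averaged predictive distribution and the predicted class.
    """
    mean_probs = np.mean(member_probs, axis=0)
    return mean_probs, int(np.argmax(mean_probs))

# Three hypothetical sub-models that partly disagree on a 3-class input:
member_probs = np.array([
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.6, 0.3, 0.1],
])
mean_probs, label = ensemble_predict(member_probs)
```

Disagreement between the rows is exactly what the averaged distribution (and the measures discussed later, such as mutual information) can expose as model uncertainty.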
Note, however, that Wilson et al. argue in a blog post that deep ensembles are not non-Bayesian at all; on the contrary, they demonstrate the effectiveness of the Bayesian approach, since the defining feature of Bayesian methods is marginalization rather than the posterior itself, and deep ensembles are precisely marginalizing over plausible models to obtain more reliable predictions.
Model-induced uncertainty is also called model uncertainty or epistemic uncertainty.
Data-induced uncertainty is also called data uncertainty or aleatoric uncertainty.
Note:
Readers familiar with Bayesian statistics usually understand inference as inference about the model parameters, but in this article it more often refers to inference about the prediction.
My understanding:
Distribution shift can be understood as a discrepancy, caused by time and other factors, between the collected or observed training data and reality; a model fit to such incomplete training data cannot fully reflect the real world.
Eq. (7) actually corresponds to the Bayesian approach: first infer the posterior distribution of the model parameters, then obtain the predictive distribution by marginalization; this predictive distribution contains all of the uncertainty. Eq. (8) is the point-estimate prediction under the single most certain parameter setting. Eq. (9) and Eq. (10) represent the non-Bayesian methods, which aim to obtain the predictive distribution and point estimate directly from the samples.
The article "Safe Artificial Intelligence Requires Bayesian Deep Learning" points out another kind of data uncertainty, not mentioned here, that arises in multi-task learning: uncertainty caused by differing tasks on the same input data. It accordingly divides data uncertainty into two kinds:
Heteroscedastic (data-dependent) uncertainty: depends on the input data and propagates into the model's predictions.
Homoscedastic (task-dependent) uncertainty: does not depend on the input data but on the specific task. It differs between tasks, but for all inputs within a given task it is a constant. Task-dependent uncertainty does not propagate into the model output, but it can be used to characterize task-related uncertainty; for example, the authors of that article suggest using estimated task uncertainties to construct a weighted multi-task loss.
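The uncertainty-weighted multi-task loss mentioned above can be sketched in a few lines (an illustrative NumPy sketch of the general idea, with my own function name; not the exact loss from that article): each task gets a learnable log-variance, high-uncertainty tasks are down-weighted, and an additive penalty keeps the variances from growing without bound.

```python
import numpy as np

def multitask_loss(task_losses, log_vars):
    """Weight per-task losses by learned homoscedastic (task) uncertainty.

    Each task i gets a learnable log-variance s_i = log(sigma_i^2);
    the combined loss is  sum_i exp(-s_i) * L_i + s_i,
    so noisy tasks are automatically down-weighted, while the +s_i term
    penalizes letting the variances grow without bound.
    """
    task_losses = np.asarray(task_losses, dtype=float)
    log_vars = np.asarray(log_vars, dtype=float)
    return float(np.sum(np.exp(-log_vars) * task_losses + log_vars))

# Two hypothetical tasks with equal raw losses, but task 2 deemed noisier:
total = multitask_loss([1.0, 1.0], [0.0, 1.0])
```

In a real system the `log_vars` would be trainable parameters optimized jointly with the network weights.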
Because model uncertainty captures uncertainty due to lack of knowledge, it can capture in-domain, domain-shift, and out-of-domain uncertainty. In contrast, data uncertainty only captures uncertainty caused by the training data itself, and can therefore only capture in-domain uncertainty, such as overlapping samples or systematic label noise.
> **Table 1: Overview of the four method families covered in this article** (`Bayesian neural networks`, `ensemble methods`, `single deterministic neural networks`, `test-time data augmentation`). The `High/Low` labels are relative to the other methods and based on the underlying ideas.

![](https://xishansnowblog.oss-cn-beijing.aliyuncs.com/images/images/bayes_20220323_132337_6bbe.webp)

### 3.1 Single deterministic neural network methods

The parameters of a deterministic neural network are deterministic: repeated forward passes on the same input produce the same result. Here we collect and summarize all methods that compute the predictive uncertainty of $y^\ast$ from forward passes of a single deterministic neural network.
(1) "Deterministic" is the opposite of "stochastic": in a deterministic neural network, the weights are treated as unknown but fixed values. (2) "Single" is in contrast to the multiple networks of ensemble methods: only one network model is involved.
A question: how do the following English terms differ? What is the true distribution? And what are the predictive distribution and the predicted distribution?
After repeated reading and checking of the original literature: the "predicted distribution" here has the same meaning as the "predictive distribution" of traditional Bayesian methods. In the latter, the distribution of the response variable, the predictive distribution, is obtained by marginalizing over the posterior of the model parameters, and its expectation is then taken as the point estimate. The idea here, by contrast, is not to model the parameters at all, but to fit a parametric approximation to the network's predictive distribution directly (somewhat like variational inference applied to the output variable): training finds the optimal parameters of the approximating distribution, which then replaces the predictive distribution, and its expectation serves as the point estimate. Since the model parameters are not modeled and the network weights are deterministic rather than stochastic, the method is non-Bayesian.
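For regression, directly parameterizing the predictive distribution typically means a deterministic network with two output heads $(\mu, \log\sigma^2)$ trained on the Gaussian negative log-likelihood, with the predicted variance read off as a data-uncertainty estimate. Below is a minimal sketch of such a loss (my own illustration in NumPy, not code from the survey):

```python
import numpy as np

def gaussian_nll(y, mu, log_var):
    """Negative log-likelihood of y under N(mu, exp(log_var)).

    A deterministic network with two output heads (mu, log_var) trained
    on this loss learns to parameterize the predictive distribution
    directly; exp(log_var) is then reported as the data uncertainty.
    Predicting log-variance rather than variance keeps it positive.
    """
    var = np.exp(log_var)
    return float(np.mean(0.5 * (np.log(2 * np.pi * var) + (y - mu) ** 2 / var)))

base = gaussian_nll(0.0, 0.0, 0.0)   # perfect mean, unit variance
off = gaussian_nll(1.0, 0.0, 0.0)    # biased mean -> larger loss
```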
The reason is that Bayesian methods divide uncertainty only into data uncertainty and model uncertainty, with distributional uncertainty treated as part of model uncertainty. If model uncertainty is estimated only in this coarse way, the result often assigns overly high predictive confidence to out-of-distribution samples.
Note: this is also one of the main motivations of that paper, namely effective out-of-distribution detection.
The explanation here seems somewhat off: the Dirichlet distribution is a probability density over continuous random variables, while the Categorical distribution is a probability mass function over discrete random variables. When a model's likelihood is Categorical, the Dirichlet distribution is its conjugate prior, so the posterior has a closed-form solution. What is actually done here is to treat the output of the deterministic network as a random vector following a Categorical distribution, with the Dirichlet distribution playing the role of the model parameter; its posterior is inferred from the training data by Bayesian means, and marginalization finally yields the distribution of the likelihood.
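The Dirichlet–Categorical conjugacy referred to above can be checked in a few lines (an illustrative NumPy sketch with my own function names): observing class counts under a Categorical likelihood turns a Dirichlet(α) prior into a Dirichlet(α + n) posterior in closed form.

```python
import numpy as np

def dirichlet_posterior(prior_alpha, counts):
    """Conjugate update: Dirichlet prior + Categorical likelihood.

    Observing class counts n_k turns Dirichlet(alpha) into
    Dirichlet(alpha + n) -- the closed-form posterior the note refers to.
    """
    return np.asarray(prior_alpha, float) + np.asarray(counts, float)

def expected_probs(alpha):
    """Mean of a Dirichlet: alpha_k / sum(alpha)."""
    alpha = np.asarray(alpha, float)
    return alpha / alpha.sum()

# Uniform prior, then 10 observations heavily favoring class 0:
post = dirichlet_posterior([1.0, 1.0, 1.0], [8, 1, 1])
p = expected_probs(post)
```

The concentration `post.sum()` grows with the evidence, which is exactly what Dirichlet-based networks exploit: small concentrations signal distributional uncertainty.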
For the definition of accuracy, see any machine learning textbook. For the definition of calibration, see Section 5 on calibrating predictive confidence.
In stochastic variational inference, the mini-batch stochastic gradient estimate suffers from high variance. Variance-reduction techniques are needed to speed up gradient descent; typical ones are the score-function estimator and the reparameterization trick. See Section 4.2 of "A Brief Introduction to Bayesian Neural Network Techniques".
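The reparameterization trick can be sketched for a single Gaussian weight (illustrative NumPy code, not from the survey): instead of sampling w ~ N(μ, σ²) directly, which blocks gradient flow, sample ε ~ N(0, 1) and set w = μ + σ·ε, so the sample is a deterministic, differentiable function of (μ, log σ²).

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var, rng):
    """Reparameterization trick for a Gaussian variational posterior.

    eps ~ N(0, 1) carries all the randomness; w = mu + sigma * eps is
    then differentiable w.r.t. the variational parameters (mu, log_var),
    which is what makes low-variance pathwise gradients possible.
    """
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps

# Empirically the samples should follow N(mu=2, sigma=1):
samples = np.array([reparameterize(2.0, 0.0, rng) for _ in range(20000)])
```

A real variational BNN would apply this per weight inside the training loop; here it only demonstrates the sampling identity.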
A few easily confused terms:
Estimation of uncertainty: the process or method of estimating uncertainty, e.g., Bayesian methods (MCMC, variational inference, etc.) or ensemble methods.
The uncertainty estimates: the results produced by such a process or method, which can be visualized as a posterior-predictive plot (see figure below).
Uncertainty measures: metrics that quantitatively characterize the composition and magnitude of uncertainty; a larger value reflects greater uncertainty. Examples include mutual information, entropy, and predictive variance. Their main purpose is to let users quantitatively analyze the uncertainty in an individual prediction.
Quality of the uncertainty estimates: distinguishing better from worse estimation methods by analyzing their estimates. In theory this requires comparison against the true uncertainty (i.e., comparing the uncertainty estimates with the ground-truth uncertainty; see figure below), but in practice the true uncertainty is unobtainable (one of the practical difficulties), so a converged Monte Carlo estimate is usually used as a stand-in for the ground truth.
The figure below illustrates why designing new uncertainty measures matters:
Posterior-predictive plots of different estimation methods; the blue regions are two-standard-deviation uncertainty intervals. (1) A converged, well-mixed HMC posterior predictive serves as the baseline (a stand-in for the ground truth); note the clearly increased uncertainty in the middle interval where samples are lacking, as one would expect. (2) The posterior predictives produced by the other methods (PBP, MVG, BBH) incorrectly show low variance in that sample-poor interval, i.e., they are overconfident there (their uncertainty estimates are inaccurate). (3) Yet the log-likelihoods of these three methods are comparable to HMC's, which shows that traditional sample-based metrics (log-likelihood in classification, RMSE in regression, etc.) cannot effectively evaluate the quality of uncertainty estimates, hence the need for new measures.
```mermaid
graph LR
  M[Uncertainty measures] --> C[Classification]
  M --> R[Regression]
  M --> S[Segmentation]
  C --> CD[Data uncertainty]
  C --> CM[Model uncertainty]
  C --> CX[Distributional uncertainty]
  C --> CS[Multi-sample uncertainty]
  R --> RD[Data uncertainty]
  R --> RM[Model uncertainty]
  R --> RX[Distributional uncertainty]
  R --> RS[Multi-sample uncertainty]
  CD --> MP[Max probability]
  CD --> EN[Entropy]
  CM --> MI[Mutual information]
  CM --> KL[Expected KL divergence]
  CM --> PV[Predictive variance]
```
Data uncertainty here is per data point: the measures are computed from the Categorical distribution output by the softmax, so the predictive uncertainty at a data point can be expressed as a single scalar.
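The entropy- and mutual-information-based measures in the diagram above can be computed directly from a set of stochastic softmax outputs (MC-dropout samples or ensemble members). A minimal NumPy sketch (function names are mine):

```python
import numpy as np

def predictive_entropy(probs):
    """Entropy of a class-probability vector (total uncertainty)."""
    p = np.clip(np.asarray(probs, float), 1e-12, 1.0)
    return float(-np.sum(p * np.log(p)))

def mutual_information(sample_probs):
    """Model-uncertainty measure: I = H[E p] - E H[p].

    sample_probs: (S, K) class probabilities from S stochastic forward
    passes. Vanishes when all samples agree and grows when they disagree,
    which is why it is used to separate model from data uncertainty.
    """
    mean_p = sample_probs.mean(axis=0)
    expected_h = float(np.mean([predictive_entropy(p) for p in sample_probs]))
    return predictive_entropy(mean_p) - expected_h

# Two forward passes that agree vs. two that disagree:
agree = np.array([[0.9, 0.1], [0.9, 0.1]])
disagree = np.array([[0.9, 0.1], [0.1, 0.9]])
```

Both cases have identical per-sample entropy (data uncertainty), yet only the disagreeing one yields nonzero mutual information.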
It helps to first be familiar with the common quality metrics in machine learning, such as accuracy, error rate, precision, recall, and the combined F-measure, as well as common plots such as the ROC curve and the PR curve.
Accuracy is the number of correctly predicted samples (TP+TN) divided by the total number of samples (TP+FP+TN+FN); it can be misleading when the classes are imbalanced.
True positive rate (TPR) is the number of samples correctly predicted positive (TP) divided by the total number of truly positive samples (TP+FN); it expresses sensitivity to positives, i.e., if a patient is positive, roughly how likely the condition is to be detected.
False positive rate (FPR) is the number of samples incorrectly predicted positive (FP) divided by the total number of truly negative samples (TN+FP); it relates to specificity for negatives, i.e., if a patient is negative, roughly how likely they are to be misclassified.
Precision is the number of samples correctly predicted positive (TP) divided by the total number of samples predicted positive (TP+FP); also called the positive predictive value, it measures the signal-to-noise ratio of the predictions.
Recall is the number of samples correctly predicted positive (TP) divided by the total number of truly positive samples (TP+FN); it measures the coverage of the predictions.
F-Score is the combined metric $(1+ \beta^2)\frac{Recall \cdot Precision}{Recall+ \beta^2 \cdot Precision}$; with $\beta=1$ it is called the F1-Score, which weights precision and recall equally. Set $\beta < 1$ if precision matters more, and $\beta > 1$ if recall matters more.
The real meaning of each metric depends on how the positive class is defined. For example, if spam is defined as positive and normal mail as negative, then precision tells you whether you need to manually re-check the spam folder to avoid losing normal mail, while recall tells you how much spam remains in the inbox.
Different applications care about different metrics. For example:
- In tumor diagnosis and earthquake prediction, high recall is required: every tumor or earthquake should be detected if at all possible.
- In spam detection, high precision is required: whatever lands in the spam folder should contain as little normal mail as possible.
A worked example:
A cancer-screening dataset has $10000$ samples, of which $10$ truly have cancer and the rest do not. Suppose the classifier predicts correctly on $9980$ of the $9990$ cancer-free samples and on $9$ of the $10$ cancer samples. Then TP $= 9$, TN $= 9980$, FP $= 10$, FN $= 1$.
Therefore:
- $Accuracy = (9+9980)/10000 = 99.89\%$
- $Precision = 9/(9+10) = 47.37\%$
- $Recall = 9/(9+1) = 90\%$
- $F1\text{-}score = 2 \times (47.37\% \times 90\%)/(47.37\% + 90\%) = 62.07\%$
- $F2\text{-}score = 5 \times (47.37\% \times 90\%)/(4 \times 47.37\% + 90\%) = 76.27\%$
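The worked example above can be verified in a few lines of Python (a sketch with a hypothetical function name):

```python
def classification_metrics(tp, tn, fp, fn, beta=1.0):
    """Compute the metrics defined above from a confusion matrix."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_beta = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
    return accuracy, precision, recall, f_beta

# The cancer-screening example: TP=9, TN=9980, FP=10, FN=1.
acc, prec, rec, f1 = classification_metrics(9, 9980, 10, 1)
_, _, _, f2 = classification_metrics(9, 9980, 10, 1, beta=2.0)
```

Note how the near-perfect accuracy (99.89%) coexists with a precision below 50%, illustrating why accuracy alone is misleading on imbalanced data.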
Most of the measures introduced in the previous section are statistics obtained by treating the expected prediction as a putative ground truth; whether such a statistic actually carries real meaning must be guaranteed by calibration. For example, if a predictor uses predictive variance as its uncertainty measure but the variances it reports do not match the true uncertainty, then the measure is uncalibrated; an uncalibrated uncertainty measure is, in a sense, no predictive uncertainty at all, and can sometimes even mislead downstream tasks.
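A standard explicit check of calibration is the Expected Calibration Error (ECE): bin predictions by confidence and average the gap between each bin's empirical accuracy and its mean confidence. A minimal NumPy sketch of the measure (my own illustration, not code from the survey):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected Calibration Error.

    confidences: predicted max-class probabilities in [0, 1]
    correct:     1 if the prediction was right, else 0
    ECE = sum_b (|B_b|/N) * |acc(B_b) - conf(B_b)| over confidence bins;
    a perfectly calibrated predictor has ECE = 0.
    """
    confidences = np.asarray(confidences, float)
    correct = np.asarray(correct, float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return float(ece)

# Ten predictions at 85% confidence, 8 of them correct:
# all fall in one bin, so ECE = |0.80 - 0.85| = 0.05.
ece = expected_calibration_error([0.85] * 10, [1] * 8 + [0] * 2)
```

Calibration methods such as temperature scaling are typically tuned to drive exactly this kind of gap toward zero on a validation set.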
(1) Calibration error stems mainly from model uncertainty.
(2) Most uncertainty estimation methods model the model uncertainty and therefore contain some implicit calibration ability.
(3) Practice shows that this implicit calibration ability is not enough to remove calibration error, especially in deep neural networks; explicit calibration is therefore necessary.
(4) Calibration does not reduce model uncertainty; it only re-scales the portion of model uncertainty that propagates into the predictive uncertainty.
(1) Regularization methods are essentially a kind of single deterministic neural network method.
(2) Their basic idea is to drive model uncertainty to a minimum during training via regularization.
(3) At prediction time, such methods cannot compute model uncertainty separately, because they assume it has already been eliminated by the regularization.
## References
- [1] T. Nair, D. Precup, D. L. Arnold, and T. Arbel, “Exploring uncertainty measures in deep networks for multiple sclerosis lesion detection and segmentation,” Medical image analysis, vol. 59, p. 101557, 2020.
- [2] A. G. Roy, S. Conjeti, N. Navab, C. Wachinger, A. D. N. Initiative et al., “Bayesian quicknat: Model uncertainty in deep whole-brain segmentation for structure-wise quality control,” NeuroImage, vol. 195, pp. 11–22, 2019.
- [3] P. Seeböck, J. I. Orlando, T. Schlegl, S. M. Waldstein, H. Bogunović, S. Klimscha, G. Langs, and U. Schmidt-Erfurth, “Exploiting epistemic uncertainty of anatomy segmentation for anomaly detection in retinal oct,” IEEE transactions on medical imaging, vol. 39, no. 1, pp. 87–98, 2019.
- [4] T. LaBonte, C. Martinez, and S. A. Roberts, “We know where we don’t know: 3d bayesian cnns for credible geometric uncertainty,” arXiv preprint arXiv:1910.10793, 2019.
- [5] J. C. Reinhold, Y. He, S. Han, Y. Chen, D. Gao, J. Lee, J. L. Prince, and A. Carass, “Validating uncertainty in medical image translation,” in 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI). IEEE, 2020, pp. 95–98.
- [6] S. Eggenreich, C. Payer, M. Urschler, and D. Štern, “Variational inference and bayesian cnns for uncertainty estimation in multi-factorial bone age prediction,” arXiv preprint arXiv:2002.10819, 2020.
- [7] D. Feng, L. Rosenbaum, and K. Dietmayer, “Towards safe autonomous driving: Capture uncertainty in the deep neural network for lidar 3d vehicle detection,” in 2018 21st International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2018, pp. 3266–3273.
- [8] J. Choi, D. Chun, H. Kim, and H.-J. Lee, “Gaussian yolov3: An accurate and fast object detector using localization uncertainty for autonomous driving,” in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 502–511.
- [9] A. Amini, A. Soleimany, S. Karaman, and D. Rus, “Spatial uncertainty sampling for end-to-end control,” arXiv preprint arXiv:1805.04829, 2018.
- [10] A. Loquercio, M. Segu, and D. Scaramuzza, “A general framework for uncertainty estimation in deep learning,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 3153–3160, 2020.
- [11] K. Lee, H. Lee, K. Lee, and J. Shin, “Training confidence-calibrated classifiers for detecting out-of-distribution samples,” in International Conference on Learning Representations, 2018.
- [12] J. Mitros and B. Mac Namee, “On the validity of bayesian neural networks for uncertainty estimation,” arXiv preprint arXiv:1912.01530, 2019.
- [13] Y. Ovadia, E. Fertig, J. Ren, Z. Nado, D. Sculley, S. Nowozin, J. Dillon, B. Lakshminarayanan, and J. Snoek, “Can you trust your model’s uncertainty? evaluating predictive uncertainty under dataset shift,” in Advances in Neural Information Processing Systems, 2019, pp. 13 991– 14 002.
- [14] M. S. Ayhan and P. Berens, “Test-time data augmentation for estimation of heteroscedastic aleatoric uncertainty in deep neural networks,” in Medical Imaging with Deep Learning Conference, 2018.
- [15] C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger, “On calibration of modern neural networks,” in International Conference on Machine Learning. PMLR, 2017, pp. 1321–1330.
- [16] A. G. Wilson and P. Izmailov, “Bayesian deep learning and a probabilistic perspective of generalization,” in Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin, Eds., vol. 33, 2020, pp. 4697–4708.
- [17] M. Rawat, M. Wistuba, and M.-I. Nicolae, “Harnessing model uncertainty for detecting adversarial examples,” in NIPS Workshop on Bayesian Deep Learning, 2017.
- [18] A. C. Serban, E. Poll, and J. Visser, “Adversarial examples-a complete characterisation of the phenomenon,” arXiv preprint arXiv:1810.01185, 2018.
- [19] L. Smith and Y. Gal, “Understanding measures of uncertainty for adversarial example detection,” in Proceedings of the Conference on Uncertainty in Artificial Intelligence, 2018, pp. 560–569.
- [20] Y. Gal and Z. Ghahramani, “Dropout as a bayesian approximation: Representing model uncertainty in deep learning,” in international conference on machine learning, 2016, pp. 1050–1059.
- [21] M. Rußwurm, S. M. Ali, X. X. Zhu, Y. Gal, and M. Körner, “Model and data uncertainty for satellite time series forecasting with deep recurrent models,” in 2020 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2020.
- [22] J. Gawlikowski, S. Saha, A. Kruspe, and X. X. Zhu, “Out-of-distribution detection in satellite image classification,” in RobustML workshop at ICLR 2021. ICRL, 2021, pp. 1–5.
- [23] Y. Gal, R. Islam, and Z. Ghahramani, “Deep bayesian active learning with image data,” in International Conference on Machine Learning. PMLR, 2017, pp. 1183–1192.
- [24] K. Chitta, J. M. Alvarez, and A. Lesnikowski, “Large-scale visual active learning with deep probabilistic ensembles,” arXiv preprint arXiv:1811.03575, 2018.
- [25] J. Zeng, A. Lesnikowski, and J. M. Alvarez, “The relevance of bayesian layer positioning to model uncertainty in deep bayesian active learning,” arXiv preprint arXiv:1811.12535, 2018.
- [26] V.-L. Nguyen, S. Destercke, and E. Hüllermeier, “Epistemic uncertainty sampling,” in International Conference on Discovery Science. Springer, 2019, pp. 72–86.
- [27] W. Huang, J. Zhang, and K. Huang, “Bootstrap estimated uncertainty of the environment model for model-based reinforcement learning,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 2019, pp. 3870–3877.
- [28] G. Kahn, A. Villaflor, V. Pong, P. Abbeel, and S. Levine, “Uncertainty-aware reinforcement learning for collision avoidance,” arXiv preprint arXiv:1702.01182, 2017.
- [29] B. Lotjens, M. Everett, and J. P. How, “Safe reinforcement learning with model uncertainty estimates,” in 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019, pp. 8662–8668.
- [30] C. Blundell, J. Cornebise, K. Kavukcuoglu, and D. Wierstra, “Weight uncertainty in neural networks,” in Proceedings of the 32nd International Conference on International Conference on Machine LearningVolume 37, 2015, pp. 1613–1622.
- [31] B. Lakshminarayanan, A. Pritzel, and C. Blundell, “Simple and scalable predictive uncertainty estimation using deep ensembles,” in Advances in neural information processing systems, 2017, pp. 6402–6413.
- [32] A. Malinin and M. Gales, “Predictive uncertainty estimation via prior networks,” in Advances in Neural Information Processing Systems, 2018, pp. 7047–7058.
- [33] X. Zhao, Y. Ou, L. Kaplan, F. Chen, and J.-H. Cho, “Quantifying classification uncertainty using regularized evidential neural networks,” arXiv preprint arXiv:1910.06864, 2019.
- [34] Q. Wu, H. Li, W. Su, L. Li, and Z. Yu, “Quantifying intrinsic uncertainty in classification via deep dirichlet mixture networks,” arXiv preprint arXiv:1906.04450, 2019.
- [35] J. Van Amersfoort, L. Smith, Y. W. Teh, and Y. Gal, “Uncertainty estimation using a single deep deterministic neural network,” in Proceedings of the 37th International Conference on Machine Learning. PMLR, 2020, pp. 9690–9700.
- [36] T. Ramalho and M. Miranda, “Density estimation in representation space to predict model uncertainty,” in Engineering Dependable and Secure Machine Learning Systems: Third International Workshop, EDSMLS 2020, New York City, NY, USA, February 7, 2020, Revised Selected Papers, vol. 1272. Springer Nature, 2020, p. 84.
- [37] A. Mobiny, H. V. Nguyen, S. Moulik, N. Garg, and C. C. Wu, “Dropconnect is effective in modeling uncertainty of bayesian deep networks,” arXiv preprint arXiv:1906.04569, 2019.
- [38] D. Krueger, C.-W. Huang, R. Islam, R. Turner, A. Lacoste, and A. Courville, “Bayesian hypernetworks,” arXiv preprint arXiv:1710.04759, 2017.
- [39] M. Valdenegro-Toro, “Deep sub-ensembles for fast uncertainty estimation in image classification,” in Bayesian Deep Learning Workshop at Neural Information Processing Systems 2019, 2019.
- [40] Y. Wen, D. Tran, and J. Ba, “Batchensemble: an alternative approach to efficient ensemble and lifelong learning,” in 8th International Conference on Learning Representations, 2020.
- [41] C. Shorten and T. M. Khoshgoftaar, “A survey on image data augmentation for deep learning,” Journal of Big Data, vol. 6, no. 1, pp. 1–48, 2019.
- [42] Q. Wen, L. Sun, X. Song, J. Gao, X. Wang, and H. Xu, “Time series data augmentation for deep learning: A survey,” arXiv preprint arXiv:2002.12478, 2020.
- [43] T. Tsiligkaridis, “Information robust dirichlet networks for predictive uncertainty estimation,” arXiv preprint arXiv:1910.04819, 2019.
- [44] M. Sensoy, L. Kaplan, and M. Kandemir, “Evidential deep learning to quantify classification uncertainty,” in Advances in Neural Information Processing Systems, 2018, pp. 3179–3189.
- [45] A. Malinin, B. Mlodozeniec, and M. Gales, “Ensemble distribution distillation,” in 8th International Conference on Learning Representations, 2020.
- [46] M. Raghu, K. Blumer, R. Sayres, Z. Obermeyer, B. Kleinberg, S. Mullainathan, and J. Kleinberg, “Direct uncertainty prediction for medical second opinions,” in International Conference on Machine Learning. PMLR, 2019, pp. 5281–5290.
- [47] J. Wenger, H. Kjellström, and R. Triebel, “Non-parametric calibration for classification,” in International Conference on Artificial Intelligence and Statistics, 2020, pp. 178–190.
- [48] J. Zhang, B. Kailkhura, and T. Y.-J. Han, “Mix-n-match: Ensemble and compositional methods for uncertainty calibration in deep learning,” in International Conference on Machine Learning. PMLR, 2020, pp. 11 117–11 128.
- [49] R. Ghanem, D. Higdon, and H. Owhadi, Handbook of uncertainty quantification. Springer, 2017, vol. 6.
- [50] Y. Gal, “Uncertainty in deep learning,” Ph.D. dissertation, University of Cambridge, 2016.
- [51] A. G. Kendall, “Geometry and uncertainty in deep learning for computer vision,” Ph.D. dissertation, University of Cambridge, 2019.
- [52] A. Malinin, “Uncertainty estimation in deep learning with application to spoken language assessment,” Ph.D. dissertation, University of Cambridge, 2019.
- [53] H. Wang and D.-Y. Yeung, “Towards bayesian deep learning: A framework and some existing methods,” IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 12, pp. 3395–3408, 2016.
- [54] ——, “A survey on bayesian deep learning,” ACM Computing Surveys (CSUR), vol. 53, no. 5, pp. 1–37, 2020.
- [55] N. Ståhl, G. Falkman, A. Karlsson, and G. Mathiason, “Evaluation of uncertainty quantification in deep learning,” in Information Processing and Management of Uncertainty in Knowledge-Based Systems. Springer International Publishing, 2020, pp. 556–568.
- [56] F. K. Gustafsson, M. Danelljan, and T. B. Schon, “Evaluating scalable bayesian deep learning methods for robust computer vision,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020, pp. 318–319.
- [57] E. Hüllermeier and W. Waegeman, “Aleatoric and epistemic uncertainty in machine learning: An introduction to concepts and methods,” Machine Learning, vol. 110, no. 3, pp. 457–506, 2021.
- [58] M. Abdar, F. Pourpanah, S. Hussain, D. Rezazadegan, L. Liu, M. Ghavamzadeh, P. Fieguth, X. Cao, A. Khosravi, U. R. Acharya et al., “A review of uncertainty quantification in deep learning: Techniques, applications and challenges,” Information Fusion, 2021.
- [59] P. W. Battaglia, J. B. Hamrick, V. Bapst, A. Sanchez-Gonzalez, V. Zambaldi, M. Malinowski, A. Tacchetti, D. Raposo, A. Santoro, R. Faulkner et al., “Relational inductive biases, deep learning, and graph networks,” arXiv preprint arXiv:1806.01261, 2018.
- [60] A. Kendall and Y. Gal, “What uncertainties do we need in bayesian deep learning for computer vision?” in Advances in neural information processing systems, 2017, pp. 5574–5584.
- [61] Y. Gal and Z. Ghahramani, “Bayesian convolutional neural networks with bernoulli approximate variational inference,” arXiv preprint arXiv:1506.02158, 2015.
- [62] C. Bishop, Pattern Recognition and Machine Learning. Springer Verlag New York, 2006.
- [63] H. Ritter, A. Botev, and D. Barber, “A scalable laplace approximation for neural networks,” in 6th International Conference on Learning Representations, vol. 6. International Conference on Representation Learning, 2018.
- [64] J. Nandy, W. Hsu, and M. L. Lee, “Towards maximizing the representation gap between in-domain & out-of-distribution examples,” in Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin, Eds., 2020, pp. 9239–9250.
- [65] A. Ashukha, A. Lyzhov, D. Molchanov, and D. Vetrov, “Pitfalls of in-domain uncertainty estimation and ensembling in deep learning,” in International Conference on Learning Representations, 2020.
- [66] T. DeVries and G. W. Taylor, “Improved regularization of convolutional neural networks with cutout,” arXiv preprint arXiv:1708.04552, 2017.
- [67] D. Hendrycks and K. Gimpel, “A baseline for detecting misclassified and out-of-distribution examples in neural networks,” in 5th International Conference on Learning Representations, 2017.
- [68] S. Liang, Y. Li, and R. Srikant, “Enhancing the reliability of out-of-distribution image detection in neural networks,” in 6th International Conference on Learning Representations, 2018.
- [69] A. Shafaei, M. Schmidt, and J. J. Little, “A less biased evaluation of out-of-distribution sample detectors,” in British Machine Vision Conference 2019, 2019.
- [70] M. Mundt, I. Pliushch, S. Majumder, and V. Ramesh, “Open set recognition through deep neural network uncertainty: Does out-of-distribution detection require generative classifiers?” in Proceedings of the IEEE International Conference on Computer Vision Workshops, 2019.
- [71] P. Oberdiek, M. Rottmann, and H. Gottschalk, “Classification uncertainty of deep neural networks based on gradient information,” in IAPR Workshop on Artificial Neural Networks in Pattern Recognition. Springer, 2018, pp. 113–125.
- [72] J. Lee and G. AlRegib, “Gradients as a measure of uncertainty in neural networks,” in 2020 IEEE International Conference on Image Processing. IEEE, 2020, pp. 2416–2420.
- [73] G. E. Hinton and D. Van Camp, “Keeping the neural networks simple by minimizing the description length of the weights,” in Proceedings of the sixth annual conference on Computational learning theory, 1993, pp. 5–13.
- [74] D. Barber and C. M. Bishop, “Ensemble learning in bayesian neural networks,” Nato ASI Series F Computer and Systems Sciences, vol. 168, pp. 215–238, 1998.
- [75] A. Graves, “Practical variational inference for neural networks,” in Advances in neural information processing systems, 2011, pp. 2348– 2356.
- [76] C. Louizos, K. Ullrich, and M. Welling, “Bayesian compression for deep learning,” in Advances in neural information processing systems, 2017, pp. 3288–3298.
- [77] D. Rezende and S. Mohamed, “Variational inference with normalizing flows,” in International Conference on Machine Learning, 2015, pp.1530–1538.
- [78] R. M. Neal, “Bayesian training of backpropagation networks by the hybrid monte carlo method,” Citeseer, Tech. Rep., 1992.
- [79] ——, “An improved acceptance procedure for the hybrid monte carlo algorithm,” Journal of Computational Physics, vol. 111, no. 1, pp. 194– 203, 1994.
- [80] ——, “Bayesian learning for neural networks,” Ph.D. dissertation, University of Toronto, 1995.
- [81] M. Welling and Y. W. Teh, “Bayesian learning via stochastic gradient langevin dynamics,” in Proceedings of the 28th international conference on machine learning, 2011, pp. 681–688.
- [82] C. Nemeth and P. Fearnhead, “Stochastic gradient markov chain monte carlo,” Journal of the American Statistical Association, pp. 1–18, 2020.
- [83] T. Salimans and D. P. Kingma, “Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks,” in Advances in Neural Information Processing Systems 29. Curran Associates, Inc., 2016, pp. 901–909.
- [84] J. Lee, M. Humt, J. Feng, and R. Triebel, “Estimating model uncertainty of neural networks in sparse information form,” in International Conference on Machine Learning. PMLR, 2020, pp. 5702–5713.
- [85] O. Achrack, O. Barzilay, and R. Kellerman, “Multi-loss sub-ensembles for accurate classification with uncertainty estimation,” arXiv preprint arXiv:2010.01917, 2020.
- [86] G. Huang, Y. Li, G. Pleiss, Z. Liu, J. E. Hopcroft, and K. Q. Weinberger, “Snapshot ensembles: Train 1, get m for free,” in International conference on learning representations, 2017.
- [87] G. D. Cavalcanti, L. S. Oliveira, T. J. Moura, and G. V. Carvalho, “Combining diversity measures for ensemble pruning,” Pattern Recognition Letters, vol. 74, pp. 38–45, 2016.
- [88] H. Guo, H. Liu, R. Li, C. Wu, Y. Guo, and M. Xu, “Margin & diversity based ordering ensemble pruning,” Neurocomputing, vol. 275, pp. 237– 246, 2018.
- [89] W. G. Martinez, “Ensemble pruning via quadratic margin maximization,” IEEE Access, vol. 9, pp. 48 931–48 951, 2021.
- [90] J. Lindqvist, A. Olmin, F. Lindsten, and L. Svensson, “A general framework for ensemble distribution distillation,” in 2020 IEEE 30th International Workshop on Machine Learning for Signal Processing (MLSP). IEEE, 2020, pp. 1–6.
- [91] D. Molchanov, A. Lyzhov, Y. Molchanova, A. Ashukha, and D. Vetrov, “Greedy policy search: A simple baseline for learnable test-time augmentation,” arXiv preprint arXiv:2002.09103, vol. 2, no. 7, 2020.
- [92] M. Możejko, M. Susik, and R. Karczewski, “Inhibited softmax for uncertainty estimation in neural networks,” arXiv preprint arXiv:1810.01861, 2018.
- [93] L. Oala, C. Heiß, J. Macdonald, M. März, W. Samek, and G. Kutyniok, “Interval neural networks: Uncertainty scores,” arXiv preprint arXiv:2003.11566, 2020.
- [94] A. Malinin and M. Gales, “Reverse kl-divergence training of prior networks: Improved uncertainty and adversarial robustness,” in Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. Fox, and R. Garnett, Eds., 2019, pp. 14 547–14 558.
- [95] V. T. Vasudevan, A. Sethy, and A. R. Ghias, “Towards better confidence estimation for neural models,” in ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019, pp. 7335–7339.
- [96] M. Hein, M. Andriushchenko, and J. Bitterwolf, “Why relu networks yield high-confidence predictions far away from the training data and how to mitigate the problem,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 41–50.
- [97] T. Joo, U. Chung, and M.-G. Seo, “Being bayesian about categorical probability,” in International Conference on Machine Learning. PMLR, 2020, pp. 4950–4961.
- [98] T. Tsiligkaridis, “Failure prediction by confidence estimation of uncertainty-aware dirichlet networks,” in 2021 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2021, pp. 3525–3529.
- [99] ——, “Information robust dirichlet networks for predictive uncertainty estimation,” arXiv preprint arXiv:1910.04819, 2019.
- [100] A. P. Dempster, “A generalization of bayesian inference,” Journal of the Royal Statistical Society: Series B (Methodological), vol. 30, no. 2, pp. 205–232, 1968.
- [101] A. Amini, W. Schwarting, A. Soleimany, and D. Rus, “Deep evidential regression,” in Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin, Eds., vol. 33. Curran Associates, Inc., 2020, pp. 14 927–14 937.
- [102] B. Charpentier, D. Zügner, and S. Günnemann, “Posterior network: Uncertainty estimation without ood samples via density-based pseudocounts,” in Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin, Eds., vol. 33, 2020, pp. 1356–1367.
- [103] N. Tagasovska and D. Lopez-Paz, “Single-model uncertainties for deep learning,” in Advances in Neural Information Processing Systems, 2019, pp. 6417–6428.
- [104] T. Kawashima, Q. Yu, A. Asai, D. Ikami, and K. Aizawa, “The aleatoric uncertainty estimation using a separate formulation with virtual residuals,” in 2020 25th International Conference on Pattern Recognition (ICPR). IEEE, 2021, pp. 1438–1445.
- [105] Y.-C. Hsu, Y. Shen, H. Jin, and Z. Kira, “Generalized odin: Detecting out-of-distribution image without learning from out-of-distribution data,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 10 951–10 960.
- [106] J. Denker, D. Schwartz, B. Wittner, S. Solla, R. Howard, L. Jackel, and J. Hopfield, “Large automatic learning, rule extraction, and generalization,” Complex systems, vol. 1, no. 5, pp. 877–922, 1987.
- [107] N. Tishby, E. Levin, and S. A. Solla, “Consistent inference of probabilities in layered networks: Predictions and generalization,” in International Joint Conference on Neural Networks, vol. 2, 1989, pp. 403–409.
- [108] W. L. Buntine and A. S. Weigend, “Bayesian back-propagation,” Complex systems, vol. 5, no. 6, pp. 603–643, 1991.
- [109] D. J. C. MacKay, “Bayesian model comparison and backprop nets,” in Advances in neural information processing systems, 1992, pp. 839–846.
- [110] M.-A. Sato, “Online model selection based on the variational bayes,” Neural computation, vol. 13, no. 7, pp. 1649–1681, 2001.
- [111] A. Corduneanu and C. M. Bishop, “Variational bayesian model selection for mixture distributions,” in Artificial intelligence and Statistics, vol. 2001. Morgan Kaufmann Waltham, MA, 2001, pp. 27–34.
- [112] S. Ghosh, J. Yao, and F. Doshi-Velez, “Model selection in bayesian neural networks via horseshoe priors,” Journal of Machine Learning Research, vol. 20, no. 182, pp. 1–46, 2019.
- [113] M. Federici, K. Ullrich, and M. Welling, “Improved bayesian compression,” arXiv preprint arXiv:1711.06494, 2017.
- [114] J. Achterhold, J. M. Koehler, A. Schmeink, and T. Genewein, “Variational network quantization,” in International Conference on Learning Representations, 2018.
- [115] D. J. MacKay, “Information-based objective functions for active data selection,” Neural computation, vol. 4, no. 4, pp. 590–604, 1992.
- [116] A. Kirsch, J. van Amersfoort, and Y. Gal, “Batchbald: Efficient and diverse batch acquisition for deep bayesian active learning,” in Advances in Neural Information Processing Systems, 2019, pp. 7026– 7037.
- [117] C. V. Nguyen, Y. Li, T. D. Bui, and R. E. Turner, “Variational continual learning,” in International Conference on Learning Representations, 2018.
- [118] S. Ebrahimi, M. Elhoseiny, T. Darrell, and M. Rohrbach, “Uncertainty-guided continual learning with bayesian neural networks,” in International Conference on Learning Representations, 2020.
- [119] S. Farquhar and Y. Gal, “A unifying bayesian view of continual learning,” arXiv preprint arXiv:1902.06494, 2019.
- [120] H. Li, P. Barnaghi, S. Enshaeifar, and F. Ganz, “Continual learning using bayesian neural networks,” IEEE Transactions on Neural Networks and Learning Systems, 2020.
- [121] M. E. E. Khan, A. Immer, E. Abedi, and M. Korzepa, “Approximate inference turns deep networks into gaussian processes,” in Advances in neural information processing systems, 2019, pp. 3094–3104.
- [122] J. S. Denker and Y. LeCun, “Transforming neural-net output levels to probability distributions,” in Advances in neural information processing systems, 1991, pp. 853–859.
- [123] D. J. MacKay, “A practical bayesian framework for backpropagation networks,” Neural computation, vol. 4, no. 3, pp. 448–472, 1992.
- [124] J. Hernandez-Lobato, Y. Li, M. Rowland, T. Bui, D. Hernández-Lobato, and R. Turner, “Black-box alpha divergence minimization,” in International Conference on Machine Learning, 2016, pp. 1511–1520.
- [125] Y. Li and Y. Gal, “Dropout inference in bayesian neural networks with alpha-divergences,” in International Conference on Machine Learning, 2017, pp. 2052–2061.
- [126] T. Minka et al., “Divergence measures and message passing,” Technical report, Microsoft Research, Tech. Rep., 2005.
- [127] T. P. Minka, “Expectation propagation for approximate bayesian inference,” in Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence, 2001, pp. 362–369.
- [128] J. Zhao, X. Liu, S. He, and S. Sun, “Probabilistic inference of bayesian neural networks with generalized expectation propagation,” Neurocomputing, vol. 412, pp. 392–398, 2020.
- [129] J. M. Hernández-Lobato and R. Adams, “Probabilistic backpropagation for scalable learning of bayesian neural networks,” in International Conference on Machine Learning, 2015, pp. 1861–1869.
- [130] D. Tran, A. Kucukelbir, A. B. Dieng, M. Rudolph, D. Liang, and D. M. Blei, “Edward: A library for probabilistic modeling, inference, and criticism,” arXiv preprint arXiv:1610.09787, 2016.
- [131] D. Tran, M. D. Hoffman, R. A. Saurous, E. Brevdo, K. Murphy, and D. M. Blei, “Deep probabilistic programming,” in International Conference on Machine Learning, 2016.
- [132] E. Bingham, J. P. Chen, M. Jankowiak, F. Obermeyer, N. Pradhan, T. Karaletsos, R. Singh, P. Szerlip, P. Horsfall, and N. D. Goodman, “Pyro: Deep universal probabilistic programming,” The Journal of Machine Learning Research, vol. 20, no. 1, pp. 973–978, 2019.
- [133] R. Cabañas, A. Salmerón, and A. R. Masegosa, “Inferpy: Probabilistic modeling with tensorflow made easy,” Knowledge-Based Systems, vol. 168, pp. 25–27, 2019.
- [134] Y. Ito, C. Srinivasan, and H. Izumi, “Bayesian learning of neural networks adapted to changes of prior probabilities,” in International Conference on Artificial Neural Networks. Springer, 2005, pp. 253–259.
- [135] S. Sun, G. Zhang, J. Shi, and R. Grosse, “Functional variational bayesian neural networks,” in International Conference on Learning Representations, 2018.
- [136] S. Depeweg, J. M. Hernández-Lobato, S. Udluft, and T. Runkler, “Sensitivity analysis for predictive uncertainty in bayesian neural networks,” arXiv preprint arXiv:1712.03605, 2017.
- [137] S. Farquhar, L. Smith, and Y. Gal, “Try depth instead of weight correlations: Mean-field is a less restrictive assumption for deeper networks,” arXiv preprint arXiv:2002.03704, 2020.
- [138] J. Postels, F. Ferroni, H. Coskun, N. Navab, and F. Tombari, “Sampling-free epistemic uncertainty estimation using approximated variance propagation,” in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 2931–2940.
- [139] J. Gast and S. Roth, “Lightweight probabilistic deep networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 3369–3378.
- [140] S. Depeweg, J.-M. Hernandez-Lobato, F. Doshi-Velez, and S. Udluft, “Decomposition of uncertainty in bayesian deep learning for efficient and risk-sensitive learning,” in International Conference on Machine Learning. PMLR, 2018, pp. 1184–1193.
- [141] ——, “Decomposition of uncertainty in Bayesian deep learning for efficient and risk-sensitive learning,” in Proceedings of the 35th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, J. Dy and A. Krause, Eds., vol. 80. PMLR, 10–15 Jul 2018, pp. 1184–1193.
- [142] D. P. Kingma, T. Salimans, and M. Welling, “Variational dropout and the local reparameterization trick,” in Advances in neural information processing systems, 2015, pp. 2575–2583.
- [143] A. Wu, S. Nowozin, E. Meeds, R. Turner, J. Hernández-Lobato, and A. Gaunt, “Deterministic variational inference for robust bayesian neural networks,” in International Conference on Learning Representations, 2019.
- [144] C. Louizos and M. Welling, “Structured and efficient variational deep learning with matrix gaussian posteriors,” in International Conference on Machine Learning, 2016, pp. 1708–1716.
- [145] G. Zhang, S. Sun, D. Duvenaud, and R. Grosse, “Noisy natural gradient as variational inference,” in International Conference on Machine Learning, 2018, pp. 5852–5861.
- [146] S. Sun, C. Chen, and L. Carin, “Learning structured weight uncertainty in bayesian neural networks,” in Artificial Intelligence and Statistics, 2017, pp. 1283–1292.
- [147] J. Bae, G. Zhang, and R. Grosse, “Eigenvalue corrected noisy natural gradient,” arXiv preprint arXiv:1811.12565, 2018.
- [148] A. Mishkin, F. Kunstner, D. Nielsen, M. Schmidt, and M. E. Khan, “Slang: Fast structured covariance approximations for bayesian deep learning with natural gradient,” in Advances in Neural Information Processing Systems, 2018, pp. 6245–6255.
- [149] C. Louizos and M. Welling, “Multiplicative normalizing flows for variational bayesian neural networks,” in International Conference on Machine Learning, 2017, pp. 2218–2227.
- [150] K. Osawa, S. Swaroop, M. E. E. Khan, A. Jain, R. Eschenhagen, R. E. Turner, and R. Yokota, “Practical deep learning with bayesian principles,” in Advances in neural information processing systems, 2019, pp. 4287–4299.
- [151] Y. Gal, J. Hron, and A. Kendall, “Concrete dropout,” in Advances in neural information processing systems, 2017, pp. 3581–3590.
- [152] Z. Eaton-Rosen, F. Bragman, S. Bisdas, S. Ourselin, and M. J. Cardoso, “Towards safe deep learning: accurately quantifying biomarker uncertainty in neural network predictions,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2018, pp. 691–699.
- [153] C. R. N. Tassi, “Bayesian convolutional neural network: Robustly quantify uncertainty for misclassifications detection,” in Mediterranean Conference on Pattern Recognition and Artificial Intelligence. Springer, 2019, pp. 118–132.
- [154] P. McClure and N. Kriegeskorte, “Robustly representing uncertainty through sampling in deep neural networks,” arXiv preprint arXiv:1611.01639, 2016.
- [155] M. Khan, D. Nielsen, V. Tangkaratt, W. Lin, Y. Gal, and A. Srivastava, “Fast and scalable bayesian deep learning by weight-perturbation in adam,” in International Conference on Machine Learning. PMLR, 2018, pp. 2611–2620.
- [156] M. E. Khan, Z. Liu, V. Tangkaratt, and Y. Gal, “Vprop: Variational inference using rmsprop,” in Advances in neural information processing systems, 2017, pp. 3288–3298.
- [157] A. Atanov, A. Ashukha, D. Molchanov, K. Neklyudov, and D. Vetrov, “Uncertainty estimation via stochastic batch normalization,” in International Symposium on Neural Networks. Springer, 2019, pp. 261–269.
- [158] S. Duane, A. D. Kennedy, B. J. Pendleton, and D. Roweth, “Hybrid monte carlo,” Physics letters B, vol. 195, no. 2, pp. 216–222, 1987.
- [159] R. M. Neal et al., “Mcmc using hamiltonian dynamics,” Handbook of markov chain monte carlo, vol. 2, no. 11, p. 2, 2011.
- [160] K. A. Dubey, S. J. Reddi, S. A. Williamson, B. Poczos, A. J. Smola, and E. P. Xing, “Variance reduction in stochastic gradient langevin dynamics,” in Advances in neural information processing systems, 2016, pp. 1154–1162.
- [161] B. Leimkuhler and S. Reich, Simulating hamiltonian dynamics. Cambridge university press, 2004, vol. 14.
- [162] P. J. Rossky, J. Doll, and H. Friedman, “Brownian dynamics as smart monte carlo simulation,” The Journal of Chemical Physics, vol. 69, no. 10, pp. 4628–4633, 1978.
- [163] G. O. Roberts and O. Stramer, “Langevin diffusions and metropolis-hastings algorithms,” Methodology and computing in applied probability, vol. 4, no. 4, pp. 337–357, 2002.
- [164] H. Kushner and G. G. Yin, Stochastic approximation and recursive algorithms and applications. Springer Science & Business Media, 2003, vol. 35.
- [165] I. Goodfellow, Y. Bengio, A. Courville, and Y. Bengio, Deep learning. MIT press Cambridge, 2016, vol. 1, no. 2.
- [166] Y.-A. Ma, T. Chen, and E. Fox, “A complete recipe for stochastic gradient mcmc,” in Advances in Neural Information Processing Systems, 2015, pp. 2917–2925.
- [167] G. Marceau-Caron and Y. Ollivier, “Natural langevin dynamics for neural networks,” in International Conference on Geometric Science of Information. Springer, 2017, pp. 451–459.
- [168] Z. Nado, J. Snoek, R. B. Grosse, D. Duvenaud, B. Xu, and J. Martens, “Stochastic gradient langevin dynamics that exploit neural network structure,” in International Conference on Learning Representations (Workshop), 2018.
- [169] U. Simsekli, R. Badeau, A. T. Cemgil, and G. Richard, “Stochastic quasi-newton langevin monte carlo,” in Proceedings of the 33rd International Conference on International Conference on Machine Learning-Volume 48, 2016, pp. 642–651.
- [170] Y. Zhang and C. A. Sutton, “Quasi-newton methods for markov chain monte carlo,” in Advances in Neural Information Processing Systems, 2011, pp. 2393–2401.
- [171] T. Fu, L. Luo, and Z. Zhang, “Quasi-newton hamiltonian monte carlo.” in Conference on Uncertainty in Artificial Intelligence, 2016.
- [172] C. Li, C. Chen, D. Carlson, and L. Carin, “Preconditioned stochastic gradient langevin dynamics for deep neural networks,” in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. AAAI Press, 2016, pp. 1788–1794.
- [173] S. Ahn, A. K. Balan, and M. Welling, “Bayesian posterior sampling via stochastic gradient fisher scoring,” in International Conference on Learning Representations, 2012.
- [174] S. Patterson and Y. W. Teh, “Stochastic gradient riemannian langevin dynamics on the probability simplex,” in Advances in neural information processing systems, 2013, pp. 3102–3110.
- [175] N. Ye and Z. Zhu, “Stochastic fractional hamiltonian monte carlo,” in IJCAI, 2018, pp. 3019–3025.
- [176] N. Ding, Y. Fang, R. Babbush, C. Chen, R. D. Skeel, and H. Neven, “Bayesian sampling using stochastic gradient thermostats,” in Advances in neural information processing systems, 2014, pp. 3203–3211.
- [177] X. Shang, Z. Zhu, B. Leimkuhler, and A. J. Storkey, “Covariance-controlled adaptive langevin thermostat for large-scale bayesian sampling,” in Advances in Neural Information Processing Systems, 2015, pp. 37–45.
- [178] B. Leimkuhler and X. Shang, “Adaptive thermostats for noisy gradient systems,” SIAM Journal on Scientific Computing, vol. 38, no. 2, pp. A712–A736, 2016.
- [179] S. Ahn, B. Shahbaba, and M. Welling, “Distributed stochastic gradient mcmc,” in International conference on machine learning, 2014, pp. 1044–1052.
- [180] K.-C. Wang, P. Vicol, J. Lucas, L. Gu, R. Grosse, and R. Zemel, “Adversarial distillation of bayesian neural network posteriors,” in International Conference on Machine Learning, 2018, pp. 5190–5199.
- [181] A. K. Balan, V. Rathod, K. P. Murphy, and M. Welling, “Bayesian dark knowledge,” in Advances in Neural Information Processing Systems, 2015, pp. 3438–3446.
- [182] D. Zou, P. Xu, and Q. Gu, “Stochastic variance-reduced hamilton monte carlo methods,” in International Conference on Machine Learning, 2018, pp. 6028–6037.
- [183] A. Durmus, U. Simsekli, E. Moulines, R. Badeau, and G. Richard, “Stochastic gradient richardson-romberg markov chain monte carlo,” in Advances in Neural Information Processing Systems, 2016, pp. 2047–2055.
- [184] A. Durmus, E. Moulines et al., “High-dimensional bayesian inference via the unadjusted langevin algorithm,” Bernoulli, vol. 25, no. 4A, pp. 2854–2882, 2019.
- [185] I. Sato and H. Nakagawa, “Approximation analysis of stochastic gradient langevin dynamics by using fokker-planck equation and ito process,” in International Conference on Machine Learning, 2014, pp. 982–990.
- [186] C. Chen, N. Ding, and L. Carin, “On the convergence of stochastic gradient mcmc algorithms with high-order integrators,” in Advances in Neural Information Processing Systems, 2015, pp. 2278–2286.
- [187] Y. W. Teh, A. H. Thiery, and S. J. Vollmer, “Consistency and fluctuations for stochastic gradient langevin dynamics,” The Journal of Machine Learning Research, vol. 17, no. 1, pp. 193–225, 2016.
- [188] C. Li, A. Stevens, C. Chen, Y. Pu, Z. Gan, and L. Carin, “Learning weight uncertainty with stochastic gradient mcmc for shape classification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 5666–5675.
- [189] F. Wenzel, K. Roth, B. Veeling, J. Swiatkowski, L. Tran, S. Mandt, J. Snoek, T. Salimans, R. Jenatton, and S. Nowozin, “How good is the bayes posterior in deep neural networks really?” in International Conference on Machine Learning. PMLR, 2020, pp. 10248–10259.
- [190] N. Ye, Z. Zhu, and R. K. Mantiuk, “Langevin dynamics with continuous tempering for training deep neural networks,” in Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017, pp. 618–626.
- [191] R. Chandra, K. Jain, R. V. Deo, and S. Cripps, “Langevin-gradient parallel tempering for bayesian neural learning,” Neurocomputing, vol. 359, pp. 315–326, 2019.
- [192] A. Botev, H. Ritter, and D. Barber, “Practical gauss-newton optimisation for deep learning,” in Proceedings of the 34th International Conference on Machine Learning-Volume 70, 2017, pp. 557–565.
- [193] J. Martens and R. Grosse, “Optimizing neural networks with kronecker-factored approximate curvature,” in Proceedings of the 32nd International Conference on Machine Learning, 2015, pp. 2408–2417.
- [194] S. Becker and Y. LeCun, “Improving the convergence of back-propagation learning with second-order methods,” in Proceedings of the 1988 Connectionist Models Summer School, San Mateo, D. Touretzky, G. Hinton, and T. Sejnowski, Eds. Morgan Kaufmann, 1989, pp. 29–37.
- [195] D. C. Liu and J. Nocedal, “On the limited memory bfgs method for large scale optimization,” Mathematical Programming, vol. 45, pp. 503–528, 1989.
- [196] P. Hennig, “Fast probabilistic optimization from noisy gradients,” in Proceedings of the 30th International Conference on Machine Learning, vol. 28-1. PMLR, 2013, pp. 62–70.
- [197] N. L. Roux and A. W. Fitzgibbon, “A fast natural newton method,” in Proceedings of the International Conference on Machine Learning, 2010.
- [198] R. B. Grosse and J. Martens, “A kronecker-factored approximate fisher matrix for convolution layers,” in Proceedings of the 33rd International Conference on Machine Learning, 2016, pp. 573–582.
- [199] S.-W. Chen, C.-N. Chou, and E. Chang, “Bda-pch: Block-diagonal approximation of positive-curvature hessian for training neural networks,” CoRR, abs/1802.06502, 2018.
- [200] J. Ba, R. Grosse, and J. Martens, “Distributed second-order optimization using kronecker-factored approximations,” in International Conference on Learning Representations, 2017.
- [201] T. George, C. Laurent, X. Bouthillier, N. Ballas, and P. Vincent, “Fast approximate natural gradient descent in a kronecker factored eigenbasis,” in Advances in Neural Information Processing Systems, 2018, pp. 9573–9583.
- [202] Y. LeCun, J. S. Denker, and S. A. Solla, “Optimal brain damage,” in Advances in Neural Information Processing Systems 2, 1990, pp. 598–605.
- [203] J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska et al., “Overcoming catastrophic forgetting in neural networks,” Proceedings of the national academy of sciences, vol. 114, no. 13, pp. 3521–3526, 2017.
- [204] A. Kristiadi, M. Hein, and P. Hennig, “Being bayesian, even just a bit, fixes overconfidence in relu networks,” in International Conference on Machine Learning. PMLR, 2020, pp. 5436–5446.
- [205] M. Humt, J. Lee, and R. Triebel, “Bayesian optimization meets laplace approximation for robotic introspection,” arXiv preprint arXiv:2010.16141, 2020.
- [206] A. Kristiadi, M. Hein, and P. Hennig, “Learnable uncertainty under laplace approximations,” arXiv preprint arXiv:2010.02720, 2020.
- [207] K. Shinde, J. Lee, M. Humt, A. Sezgin, and R. Triebel, “Learning multiplicative interactions with bayesian neural networks for visual inertial odometry,” in Workshop on AI for Autonomous Driving at the 37th International Conference on Machine Learning, 2020.
- [208] J. Feng, M. Durner, Z.-C. Marton, F. Balint-Benczedi, and R. Triebel, “Introspective robot perception using smoothed predictions from bayesian neural networks,” in International Symposium on Robotics Research, 2019.
- [209] A. Y. Foong, Y. Li, J. M. Hernández-Lobato, and R. E. Turner, “‘In-between’ uncertainty in bayesian neural networks,” arXiv preprint arXiv:1906.11537, 2019.
- [210] A. Immer, M. Korzepa, and M. Bauer, “Improving predictions of bayesian neural nets via local linearization,” in Proceedings of The 24th International Conference on Artificial Intelligence and Statistics. PMLR, 2021, pp. 703–711.
- [211] M. Hobbhahn, A. Kristiadi, and P. Hennig, “Fast predictive uncertainty for classification with bayesian deep networks,” arXiv preprint arXiv:2003.01227, 2020.
- [212] E. Daxberger, E. Nalisnick, J. U. Allingham, J. Antorán, and J. M. Hernández-Lobato, “Expressive yet tractable bayesian deep learning via subnetwork inference,” arXiv preprint arXiv:2010.14689, 2020.
- [213] W. J. Maddox, P. Izmailov, T. Garipov, D. P. Vetrov, and A. G. Wilson, “A simple baseline for bayesian uncertainty in deep learning,” in Advances in Neural Information Processing Systems, 2019, pp. 13153–13164.
- [214] J. Mukhoti, P. Stenetorp, and Y. Gal, “On the importance of strong baselines in bayesian deep learning,” arXiv preprint arXiv:1811.09385, 2018.
- [215] A. Filos, S. Farquhar, A. N. Gomez, T. G. Rudner, Z. Kenton, L. Smith, M. Alizadeh, A. de Kroon, and Y. Gal, “A systematic comparison of bayesian deep learning robustness in diabetic retinopathy tasks,” arXiv preprint arXiv:1912.10481, 2019.
- [216] J. Mukhoti and Y. Gal, “Evaluating bayesian deep learning methods for semantic segmentation,” arXiv preprint arXiv:1811.12709, 2018.
- [217] O. Sagi and L. Rokach, “Ensemble learning: A survey,” WIREs Data Mining and Knowledge Discovery, vol. 8, no. 4, p. e1249, 2018.
- [218] L. K. Hansen and P. Salamon, “Neural network ensembles,” IEEE transactions on pattern analysis and machine intelligence, vol. 12, no. 10, pp. 993–1001, 1990.
- [219] Y. Cao, T. A. Geddes, J. Y. H. Yang, and P. Yang, “Ensemble deep learning in bioinformatics,” Nature Machine Intelligence, pp. 1–9, 2020.
- [220] L. Nanni, S. Ghidoni, and S. Brahnam, “Ensemble of convolutional neural networks for bioimage classification,” Applied Computing and Informatics, 2020.
- [221] L. Wei, S. Wan, J. Guo, and K. K. Wong, “A novel hierarchical selective ensemble classifier with bioinformatics application,” Artificial intelligence in medicine, vol. 83, pp. 82–90, 2017.
- [222] F. Lv, M. Han, and T. Qiu, “Remote sensing image classification based on ensemble extreme learning machine with stacked autoencoder,” IEEE Access, vol. 5, pp. 9021–9031, 2017.
- [223] X. Dai, X. Wu, B. Wang, and L. Zhang, “Semisupervised scene classification for remote sensing images: A method based on convolutional neural networks and ensemble learning,” IEEE Geoscience and Remote Sensing Letters, vol. 16, no. 6, pp. 869–873, 2019.
- [224] E. Marushko and A. Doudkin, “Methods of using ensembles of heterogeneous models to identify remote sensing objects,” Pattern Recognition and Image Analysis, vol. 30, no. 2, pp. 211–216, 2020.
- [225] T. Kurutach, I. Clavera, Y. Duan, A. Tamar, and P. Abbeel, “Model-ensemble trust-region policy optimization,” in International Conference on Learning Representations, 2018.
- [226] A. Rajeswaran, S. Ghotra, B. Ravindran, and S. Levine, “Epopt: Learning robust neural network policies using model ensembles,” in International Conference on Learning Representations, 2017.
- [227] S. Fort, H. Hu, and B. Lakshminarayanan, “Deep ensembles: A loss landscape perspective,” arXiv preprint arXiv:1912.02757, 2019.
- [228] A. Renda, M. Barsacchi, A. Bechini, and F. Marcelloni, “Comparing ensemble strategies for deep learning: An application to facial expression recognition,” Expert Systems with Applications, vol. 136, pp. 1–11, 2019.
- [229] E. J. Herron, S. R. Young, and T. E. Potok, “Ensembles of networks produced from neural architecture search,” in International Conference on High Performance Computing. Springer, 2020, pp. 223–234.
- [230] S. Lee, S. Purushwalkam, M. Cogswell, D. Crandall, and D. Batra, “Why m heads are better than one: Training a diverse ensemble of deep networks,” arXiv preprint arXiv:1511.06314, 2015.
- [231] I. E. Livieris, L. Iliadis, and P. Pintelas, “On ensemble techniques of weight-constrained neural networks,” Evolving Systems, pp. 1–13, 2020.
- [232] L. Nanni, S. Brahnam, and G. Maguolo, “Data augmentation for building an ensemble of convolutional neural networks,” in Innovation in Medicine and Healthcare Systems, and Multimedia. Singapore: Springer Singapore, 2019, pp. 61–69.
- [233] J. Guo and S. Gould, “Deep cnn ensemble with data augmentation for object detection,” arXiv preprint arXiv:1506.07224, 2015.
- [234] R. Rahaman and A. H. Thiery, “Uncertainty quantification and deep ensembles,” stat, vol. 1050, p. 20, 2020.
- [235] Y. Wen, G. Jerfel, R. Muller, M. W. Dusenberry, J. Snoek, B. Lakshminarayanan, and D. Tran, “Combining ensembles and data augmentation can harm your calibration,” in International Conference on Learning Representations, 2021.
- [236] W. Kim, B. Goyal, K. Chawla, J. Lee, and K. Kwon, “Attention-based ensemble for deep metric learning,” in Proceedings of the European Conference on Computer Vision, 2018.
- [237] J. Yang and F. Wang, “Auto-ensemble: An adaptive learning rate scheduling based deep learning model ensembling,” IEEE Access, vol. 8, pp. 217499–217509, 2020.
- [238] M. Leutbecher and T. N. Palmer, “Ensemble forecasting,” Journal of computational physics, vol. 227, no. 7, pp. 3515–3539, 2008.
- [239] W. S. Parker, “Ensemble modeling, uncertainty and robust predictions,” Wiley Interdisciplinary Reviews: Climate Change, vol. 4, no. 3, pp. 213–223, 2013.
- [240] W. H. Beluch, T. Genewein, A. Nürnberger, and J. M. Köhler, “The power of ensembles for active learning in image classification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 9368–9377.
- [241] A. Vyas, N. Jammalamadaka, X. Zhu, D. Das, B. Kaul, and T. L. Willke, “Out-of-distribution detection using an ensemble of self supervised leave-out classifiers,” in Proceedings of the European Conference on Computer Vision, 2018, pp. 550–564.
- [242] J. Kocić, N. Jovičić, and V. Drndarević, “An end-to-end deep neural network for autonomous driving designed for embedded automotive platforms,” Sensors, vol. 19, no. 9, p. 2064, 2019.
- [243] G. Martínez-Muñoz, D. Hernández-Lobato, and A. Suárez, “An analysis of ensemble pruning techniques based on ordered aggregation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 245–259, 2008.
- [244] C. Buciluǎ, R. Caruana, and A. Niculescu-Mizil, “Model compression,” in Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, 2006, pp. 535–541.
- [245] G. Hinton, O. Vinyals, and J. Dean, “Distilling the knowledge in a neural network,” stat, vol. 1050, p. 9, 2015.
- [246] E. Englesson and H. Azizpour, “Efficient evaluation-time uncertainty estimation by improved distillation,” in Workshop on Uncertainty and Robustness in Deep Learning at International Conference on Machine Learning, 2019.
- [247] S. Reich, D. Mueller, and N. Andrews, “Ensemble distillation for structured prediction: Calibrated, accurate, fast—choose three,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020, pp. 5583–5595.
- [248] G. Wang, W. Li, S. Ourselin, and T. Vercauteren, “Automatic brain tumor segmentation using convolutional neural networks with test-time augmentation,” in International MICCAI Brainlesion Workshop. Springer, 2018, pp. 61–72.
- [249] G. Wang, W. Li, M. Aertsen, J. Deprest, S. Ourselin, and T. Vercauteren, “Aleatoric uncertainty estimation with test-time augmentation for medical image segmentation with convolutional neural networks,” Neurocomputing, vol. 338, pp. 34–45, 2019.
- [250] N. Moshkov, B. Mathe, A. Kertesz-Farkas, R. Hollandi, and P. Horvath, “Test-time augmentation for deep learning-based cell segmentation on microscopy images,” Scientific reports, vol. 10, no. 1, pp. 1–7, 2020.
- [251] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention. Springer, 2015, pp. 234–241.
- [252] D. Shanmugam, D. Blalock, G. Balakrishnan, and J. Guttag, “When and why test-time augmentation works,” arXiv preprint arXiv:2011.11156, 2020.
- [253] I. Kim, Y. Kim, and S. Kim, “Learning loss for test-time augmentation,” in Advances in Neural Information Processing Systems, 2020, pp. 4163–4174.
- [254] Q. Yu and K. Aizawa, “Unsupervised out-of-distribution detection by maximum classifier discrepancy,” in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 9518–9526.
- [255] J. Ren, P. J. Liu, E. Fertig, J. Snoek, R. Poplin, M. Depristo, J. Dillon, and B. Lakshminarayanan, “Likelihood ratios for out-of-distribution detection,” in Advances in Neural Information Processing Systems, 2019, pp. 14707–14718.
- [256] J. Yao, W. Pan, S. Ghosh, and F. Doshi-Velez, “Quality of uncertainty quantification for bayesian neural network inference,” arXiv preprint arXiv:1906.09686, 2019.
- [257] X. Huang, J. Yang, L. Li, H. Deng, B. Ni, and Y. Xu, “Evaluating and boosting uncertainty quantification in classification,” arXiv preprint arXiv:1909.06030, 2019.
- [258] J. Davis and M. Goadrich, “The relationship between precision-recall and roc curves,” in Proceedings of the 23rd international conference on Machine learning, 2006, pp. 233–240.
- [259] T. Pearce, A. Brintrup, M. Zaki, and A. Neely, “High-quality prediction intervals for deep learning: A distribution-free, ensembled approach,” in International Conference on Machine Learning. PMLR, 2018, pp. 4075–4084.
- [260] D. Su, Y. Y. Ting, and J. Ansel, “Tight prediction intervals using expanded interval minimization,” arXiv preprint arXiv:1806.11222, 2018.
- [261] P. McClure, N. Rho, J. A. Lee, J. R. Kaczmarzyk, C. Y. Zheng, S. S. Ghosh, D. M. Nielson, A. G. Thomas, P. Bandettini, and F. Pereira, “Knowing what you know in brain segmentation using bayesian deep neural networks,” Frontiers in neuroinformatics, vol. 13, p. 67, 2019.
- [262] A. P. Soleimany, H. Suresh, J. J. G. Ortiz, D. Shanmugam, N. Gural, J. Guttag, and S. N. Bhatia, “Image segmentation of liver stage malaria infection with spatial uncertainty sampling,” arXiv preprint arXiv:1912.00262, 2019.
- [263] R. D. Soberanis-Mukul, N. Navab, and S. Albarqouni, “Uncertainty-based graph convolutional networks for organ segmentation refinement,” in Medical Imaging with Deep Learning. PMLR, 2020, pp. 755–769.
- [264] P. Seeböck, J. I. Orlando, T. Schlegl, S. M. Waldstein, H. Bogunovic, S. Klimscha, G. Langs, and U. Schmidt-Erfurth, “Exploiting epistemic uncertainty of anatomy segmentation for anomaly detection in retinal oct,” IEEE Transactions on Medical Imaging, vol. 39, pp. 87–98, 2020.
- [265] V. Kuleshov, N. Fenner, and S. Ermon, “Accurate uncertainties for deep learning using calibrated regression,” in International Conference on Machine Learning. PMLR, 2018, pp. 2796–2804.
- [266] S. Seo, P. H. Seo, and B. Han, “Learning for single-shot confidence calibration in deep neural networks through stochastic inferences,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 9030–9038.
- [267] Z. Li and D. Hoiem, “Improving confidence estimates for unfamiliar examples,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2020, pp. 2686–2695.
- [268] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 2818–2826.
- [269] G. Pereyra, G. Tucker, J. Chorowski, Ł. Kaiser, and G. Hinton, “Regularizing neural networks by penalizing confident output distributions,” arXiv preprint arXiv:1701.06548, 2017.
- [270] R. Müller, S. Kornblith, and G. E. Hinton, “When does label smoothing help?” in Advances in Neural Information Processing Systems, 2019, pp. 4694–4703.
- [271] B. Venkatesh and J. J. Thiagarajan, “Heteroscedastic calibration of uncertainty estimators in deep learning,” arXiv preprint arXiv:1910.14179, 2019.
- [272] P. Izmailov, W. J. Maddox, P. Kirichenko, T. Garipov, D. Vetrov, and A. G. Wilson, “Subspace inference for bayesian deep learning,” in Uncertainty in Artificial Intelligence. PMLR, 2020, pp. 1169–1179.
- [273] Z. Zhang, A. V. Dalca, and M. R. Sabuncu, “Confidence calibration for convolutional neural networks using structured dropout,” arXiv preprint arXiv:1906.09551, 2019.
- [274] M.-H. Laves, S. Ihler, K.-P. Kortmann, and T. Ortmaier, “Well-calibrated model uncertainty with temperature scaling for dropout variational inference,” arXiv preprint arXiv:1909.13550, 2019.
- [275] A. Mehrtash, W. M. Wells, C. M. Tempany, P. Abolmaesumi, and T. Kapur, “Confidence calibration and predictive uncertainty estimation for deep medical image segmentation,” IEEE Transactions on Medical Imaging, 2020.
- [276] B. Zadrozny and C. Elkan, “Obtaining calibrated probability estimates from decision trees and naive bayesian classifiers,” in International Conference on Machine Learning, vol. 1. Citeseer, 2001, pp. 609–616.
- [277] D. Hendrycks, M. Mazeika, and T. Dietterich, “Deep anomaly detection with outlier exposure,” in International Conference on Learning Representations, 2019.
- [278] S. Thulasidasan, G. Chennupati, J. A. Bilmes, T. Bhattacharya, and S. Michalak, “On mixup training: Improved calibration and predictive uncertainty for deep neural networks,” in Advances in Neural Information Processing Systems, 2019, pp. 13888–13899.
- [279] J. Maroñas, D. Ramos, and R. Paredes, “Improving calibration in mixup-trained deep neural networks through confidence-based loss functions,” arXiv preprint arXiv:2003.09946, 2020.
- [280] K. Patel, W. Beluch, D. Zhang, M. Pfeiffer, and B. Yang, “On-manifold adversarial data augmentation improves uncertainty calibration,” in 2020 25th International Conference on Pattern Recognition (ICPR). IEEE, 2021, pp. 8029–8036.
- [281] D. Comaniciu, V. Ramesh, and P. Meer, “Real-time tracking of non-rigid objects using mean shift,” in Proceedings IEEE Conference on Computer Vision and Pattern Recognition, vol. 2. IEEE, 2000, pp. 142–149.
- [282] H. Zhang, M. Cisse, Y. N. Dauphin, and D. Lopez-Paz, “mixup: Beyond empirical risk minimization,” in International Conference on Learning Representations, 2018.
- [283] M. P. Naeini, G. F. Cooper, and M. Hauskrecht, “Obtaining well calibrated probabilities using bayesian binning,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 2015, 2015, p. 2901.
- [284] M. Kull, M. Perelló-Nieto, M. Kängsepp, T. de Menezes e Silva Filho, H. Song, and P. A. Flach, “Beyond temperature scaling: Obtaining well-calibrated multi-class probabilities with dirichlet calibration,” in Advances in Neural Information Processing Systems, 2019, pp. 12295–12305.
- [285] D. Levi, L. Gispan, N. Giladi, and E. Fetaya, “Evaluating and calibrating uncertainty prediction in regression tasks,” arXiv preprint arXiv:1905.11659, 2019.
- [286] J. Vaicenavicius, D. Widmann, C. Andersson, F. Lindsten, J. Roll, and T. Schön, “Evaluating model calibration in classification,” in The 22nd International Conference on Artificial Intelligence and Statistics. PMLR, 2019, pp. 3459–3467.
- [287] M. H. DeGroot and S. E. Fienberg, “The comparison and evaluation of forecasters,” Journal of the Royal Statistical Society: Series D (The Statistician), vol. 32, no. 1-2, pp. 12–22, 1983.
- [288] J. Nixon, M. W. Dusenberry, L. Zhang, G. Jerfel, and D. Tran, “Measuring calibration in deep learning,” in CVPR Workshops, 2019, pp. 38–41.
- [289] A. Ghandeharioun, B. Eoff, B. Jou, and R. Picard, “Characterizing sources of uncertainty to proxy calibration and disambiguate annotator and data bias,” in 2019 IEEE/CVF International Conference on Computer Vision Workshop. IEEE, 2019, pp. 4202–4206.
- [290] F. J. Pulgar, A. J. Rivera, F. Charte, and M. J. del Jesus, “On the impact of imbalanced data in convolutional neural networks performance,” in International Conference on Hybrid Artificial Intelligence Systems. Springer, 2017, pp. 220–232.
- [291] K. Lee, K. Lee, H. Lee, and J. Shin, “A simple unified framework for detecting out-of-distribution samples and adversarial attacks,” in Advances in Neural Information Processing Systems, 2018, pp. 7167–7177.
- [292] M. L. Iuzzolino, T. Umada, N. R. Ahmed, and D. A. Szafir, “In automation we trust: Investigating the role of uncertainty in active learning systems,” arXiv preprint arXiv:2004.00762, 2020.
- [293] B. Settles, “Active learning literature survey,” University of Wisconsin-Madison Department of Computer Sciences, Tech. Rep., 2009.
- [294] R. Pop and P. Fulop, “Deep ensemble Bayesian active learning: Addressing the mode collapse issue in Monte Carlo dropout via ensembles,” arXiv preprint arXiv:1811.03897, 2018.
- [295] M. Ghavamzadeh, S. Mannor, J. Pineau, and A. Tamar, “Bayesian reinforcement learning: A survey,” Foundations and Trends® in Machine Learning, vol. 8, no. 5-6, pp. 359–483, 2015.
- [296] S. Hu, D. Worrall, S. Knegt, B. Veeling, H. Huisman, and M. Welling, “Supervised uncertainty quantification for segmentation with multiple annotations,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2019, pp. 137–145.
- [297] F. C. Ghesu, B. Georgescu, E. Gibson, S. Guendel, M. K. Kalra, R. Singh, S. R. Digumarthy, S. Grbic, and D. Comaniciu, “Quantifying and leveraging classification uncertainty for chest radiograph assessment,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2019, pp. 676–684.
- [298] M. S. Ayhan, L. Kuehlewein, G. Aliyeva, W. Inhoffen, F. Ziemssen, and P. Berens, “Expert-validated estimation of diagnostic uncertainty for deep neural networks in diabetic retinopathy detection,” Medical Image Analysis, p. 101724, 2020.
- [299] N. Sünderhauf, O. Brock, W. Scheirer, R. Hadsell, D. Fox, J. Leitner, B. Upcroft, P. Abbeel, W. Burgard, M. Milford et al., “The limits and potentials of deep learning for robotics,” The International Journal of Robotics Research, vol. 37, no. 4-5, pp. 405–420, 2018.
- [300] S. Thrun, “Probabilistic robotics,” Communications of the ACM, vol. 45, no. 3, pp. 52–57, 2002.
- [301] D. Fox, “Markov localization: A probabilistic framework for mobile robot localization and navigation,” Ph.D. dissertation, University of Bonn, 1998.
- [302] D. Fox, W. Burgard, H. Kruppa, and S. Thrun, “A probabilistic approach to collaborative multi-robot localization,” Autonomous robots, vol. 8, no. 3, pp. 325–344, 2000.
- [303] S. Thrun, D. Fox, W. Burgard, and F. Dellaert, “Robust monte carlo localization for mobile robots,” Artificial intelligence, vol. 128, no. 1-2, pp. 99–141, 2001.
- [304] H. Durrant-Whyte and T. Bailey, “Simultaneous localization and mapping: part i,” IEEE robotics & automation magazine, vol. 13, no. 2, pp. 99–110, 2006.
- [305] T. Bailey and H. Durrant-Whyte, “Simultaneous localization and mapping (slam): Part ii,” IEEE robotics & automation magazine, vol. 13, no. 3, pp. 108–117, 2006.
- [306] M. Montemerlo, S. Thrun, D. Koller, and B. Wegbreit, “Fastslam: A factored solution to the simultaneous localization and mapping problem,” Aaai/iaai, vol. 593598, 2002.
- [307] M. Kaess, V. Ila, R. Roberts, and F. Dellaert, “The bayes tree: An algorithmic foundation for probabilistic robot mapping,” in Algorithmic Foundations of Robotics IX. Springer, 2010, pp. 157–173.
- [308] F. Dellaert and M. Kaess, “Factor graphs for robot perception,” Foundations and Trends in Robotics, vol. 6, no. 1-2, pp. 1–139, 2017.
- [309] H. A. Loeliger, “An introduction to factor graphs,” IEEE Signal Processing Magazine, vol. 21, no. 1, pp. 28–41, 2004.
- [310] D. Silver and J. Veness, “Monte-Carlo planning in large POMDPs,” in Advances in Neural Information Processing Systems, 2010.
- [311] S. Ross, J. Pineau, S. Paquet, and B. Chaib-Draa, “Online planning algorithms for POMDPs,” Journal of Artificial Intelligence Research, vol. 32, pp. 663–704, 2008.
- [312] S. M. Richards, F. Berkenkamp, and A. Krause, “The Lyapunov neural network: Adaptive stability certification for safe learning of dynamical systems,” in Conference on Robot Learning. PMLR, 2018, pp. 466–476.
- [313] F. Berkenkamp, A. P. Schoellig, and A. Krause, “Safe controller optimization for quadrotors with gaussian processes,” in 2016 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2016, pp. 491–496.
- [314] F. Berkenkamp, M. Turchetta, A. P. Schoellig, and A. Krause, “Safe model-based reinforcement learning with stability guarantees,” in Advances in Neural Information Processing Systems, 2017.
- [315] H. Grimmett, R. Triebel, R. Paul, and I. Posner, “Introspective classification for robot perception,” The International Journal of Robotics Research, vol. 35, no. 7, pp. 743–762, 2016.
- [316] R. Bajcsy, “Active perception,” Proceedings of the IEEE, vol. 76, no. 8, pp. 966–1005, 1988.
- [317] R. Triebel, H. Grimmett, R. Paul, and I. Posner, “Driven learning for driving: How introspection improves semantic mapping,” in Robotics Research. Springer, 2016, pp. 449–465.
- [318] A. Narr, R. Triebel, and D. Cremers, “Stream-based active learning for efficient and adaptive classification of 3d objects,” in 2016 IEEE International Conference on Robotics and Automation. IEEE, 2016, pp. 227–233.
- [319] D. A. Cohn, Z. Ghahramani, and M. I. Jordan, “Active learning with statistical models,” Journal of artificial intelligence research, vol. 4, pp. 129–145, 1996.
- [320] A. Nguyen, J. Yosinski, and J. Clune, “Deep neural networks are easily fooled: High confidence predictions for unrecognizable images,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 427–436.
- [321] K. Wong, S. Wang, M. Ren, M. Liang, and R. Urtasun, “Identifying unknown instances for autonomous driving,” in Conference on Robot Learning. PMLR, 2020, pp. 384–393.
- [322] W. Boerdijk, M. Sundermeyer, M. Durner, and R. Triebel, “‘What’s this?’ – Learning to segment unknown objects from manipulation sequences,” in International Conference on Robotics and Automation (ICRA), 2021.
- [323] C. Richter and N. Roy, “Safe visual navigation via deep learning and novelty detection,” in Robotics: Science and Systems (RSS), 2017.
- [324] V. Peretroukhin, M. Giamou, D. M. Rosen, W. N. Greene, N. Roy, and J. Kelly, “A smooth representation of belief over SO(3) for deep rotation learning with uncertainty,” arXiv preprint arXiv:2006.01031, 2020.
- [325] B. Lütjens, M. Everett, and J. P. How, “Safe reinforcement learning with model uncertainty estimates,” in 2019 International Conference on Robotics and Automation. IEEE, 2019, pp. 8662–8668.
- [326] G. Kahn, A. Villaflor, B. Ding, P. Abbeel, and S. Levine, “Self-supervised deep reinforcement learning with generalized computation graphs for robot navigation,” in 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018, pp. 5129–5136.
- [327] F. Stulp, E. Theodorou, J. Buchli, and S. Schaal, “Learning to grasp under uncertainty,” in 2011 IEEE International Conference on Robotics and Automation. IEEE, 2011, pp. 5703–5708.
- [328] V. Tchuiev and V. Indelman, “Inference over distribution of posterior class probabilities for reliable Bayesian classification and object-level perception,” IEEE Robotics and Automation Letters, vol. 3, no. 4, pp. 4329–4336, 2018.
- [329] Y. Feldman and V. Indelman, “Bayesian viewpoint-dependent robust classification under model and localization uncertainty,” in 2018 IEEE International Conference on Robotics and Automation. IEEE, 2018, pp. 3221–3228.
- [330] N. Yang, L. von Stumberg, R. Wang, and D. Cremers, “D3VO: Deep depth, deep pose and deep uncertainty for monocular visual odometry,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 1281–1292.
- [331] S. Wang, R. Clark, H. Wen, and N. Trigoni, “DeepVO: Towards end-to-end visual odometry with deep recurrent convolutional neural networks,” in 2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2017, pp. 2043–2050.
- [332] C. Gurău, C. H. Tong, and I. Posner, “Fit for purpose? Predicting perception performance based on past experience,” in International Symposium on Experimental Robotics. Springer, 2016, pp. 454–464.
- [333] S. Daftry, S. Zeng, J. A. Bagnell, and M. Hebert, “Introspective perception: Learning to predict failures in vision systems,” in 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2016, pp. 1743–1750.
- [334] M. Netzband, W. L. Stefanov, and C. Redman, Applied remote sensing for urban planning, governance and sustainability. Springer Science & Business Media, 2007.
- [335] C. Giardino, M. Bresciani, P. Villa, and A. Martinelli, “Application of remote sensing in water resource management: The case study of Lake Trasimeno, Italy,” Water Resources Management, vol. 24, no. 14, pp. 3885–3899, 2010.
- [336] C. J. Van Westen, “Remote sensing for natural disaster management,” International Archives of Photogrammetry and Remote Sensing, vol. 33, no. B7, pp. 1609–1617, 2000.
- [337] X. X. Zhu, D. Tuia, L. Mou, G.-S. Xia, L. Zhang, F. Xu, and F. Fraundorfer, “Deep learning in remote sensing: A comprehensive review and list of resources,” IEEE Geoscience and Remote Sensing Magazine, vol. 5, no. 4, pp. 8–36, 2017.
- [338] J. Gawlikowski, M. Schmitt, A. Kruspe, and X. X. Zhu, “On the fusion strategies of sentinel-1 and sentinel-2 data for local climate zone classification,” in IEEE International Geoscience and Remote Sensing Symposium, 2020, pp. 2081–2084.
- [339] ESA, “European Space Agency (ESA) developed Earth observation satellites,” 2019. Available: http://www.esa.int:8080/ESA_Multimedia/Images/2019/05/ESA-developed_Earth_observation_missions
- [340] Z. Nado, N. Band, M. Collier, J. Djolonga, M. W. Dusenberry, S. Farquhar, A. Filos, M. Havasi, R. Jenatton, G. Jerfel et al., “Uncertainty baselines: Benchmarks for uncertainty & robustness in deep learning,” arXiv preprint arXiv:2106.04015, 2021.
- [341] M. Kull and P. A. Flach, “Reliability maps: a tool to enhance probability estimates and improve classification accuracy,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 2014, pp. 18–33.
- [342] J. Antorán, U. Bhatt, T. Adel, A. Weller, and J. M. Hernández-Lobato, “Getting a clue: A method for explaining uncertainty estimates,” in International Conference on Learning Representations, 2021.
- [343] M. Reichstein, G. Camps-Valls, B. Stevens, M. Jung, J. Denzler, and N. Carvalhais, “Deep learning and process understanding for data-driven earth system science,” Nature, vol. 566, no. 7743, pp. 195–204, 2019.
- [344] J. Willard, X. Jia, S. Xu, M. Steinbach, and V. Kumar, “Integrating physics-based modeling with machine learning: A survey,” arXiv preprint arXiv:2003.04919, 2020.
- [345] E. De Bézenac, A. Pajot, and P. Gallinari, “Deep learning for physical processes: Incorporating prior scientific knowledge,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2019, no. 12, p. 124009, 2019.