{ "cells": [ { "cell_type": "markdown", "id": "deffff80", "metadata": {}, "source": [ "# 附录 F:贝叶斯深度学习编程初步\n", "\n", "[原文](https://twiecki.io/blog/2016/06/01/bayesian-deep-learning/)\n", "\n", "\n", "\n", "\n", "\n", "## 1 概述\n", "\n", "### 1.1 机器学习的当前趋势\n", "\n", "目前机器学习有三大趋势:`概率编程`、`深度学习`和`大数据`。在概率编程中,很多创新使用`变分推断`以使事物能够适应规模。在本文中,我将展示如何在 `PyMC3` 中使用`变分推断`来拟合一个简单的贝叶斯神经网络。我还将讨论如何将概率编程和深度学习联系起来,以便在未来研究中开辟非常有趣的途径。\n", "\n", "### 1.2 大规模概率编程\n", "\n", "概率编程允许灵活地创建自定义概率模型,并且主要关注洞察力和从数据中学习。该方法本质上是贝叶斯的,因此可以指定先验来约束模型,并以后验分布的形式获得不确定性估计。使用 `MCMC` 采样算法,可以从后验中抽取样本来灵活估计这些模型。 `PyMC3` 和 `Stan` 是当前构建和估计此类模型的领先工具。然而,采样法的主要缺点是非常慢,尤其是对于高维模型。这就是为什么最近`变分推断算法`与 `MCMC` 一样灵活,但速度要快得多。\n", "\n", "变分算法不从后验中抽取样本,而是将某种分布(例如正态)拟合到后验上,将采样问题转化为优化问题。 自动微分变分推断( `ADVI` )方法在 `PyMC3` 、 `Stan` 和 `Edward`(现在改为 `TensorFlow Probability -- TFP`)中被作为变分推断的主要算法实现。\n", "\n", "不幸的是:在传统的分类或回归等机器学习问题中,概率编程通常在非参数学习、集成学习(例如随机森林或梯度提升回归树)等算法中扮演着次要角色。\n", "\n", "### 1.3 深度学习 \n", "\n", "现在深度学习已经成为热点,支配了几乎所有的物体识别基准,在 [`Atari` 游戏](https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf)中获胜,并且[战胜了世界围棋冠军李世石](http://www.nature.com/nature/journal/v529/n7587/full/nature16961.html)。从统计学角度看,神经网络非常擅长非线性函数逼近和表征学习。深度学习不仅用于分类任务,还可以通过自编码器和其他各种有趣的方法扩展到非监督学习,例如:采用 [循环神经网络 `RNN`](https://en.wikipedia.org/wiki/Recurrent_neural_network),或[混合密度神经网络 `MDN`](http://cbonnett.github.io/MDN_EDWARD_KERAS_TF.html) 来估计多模态分布。其效果为何如此好?没有人真正知道原因,因为有些统计特性仍不为人完全理解。\n", "\n", "深度学习的优势之一是可以训练极其复杂的模型。这依赖于几个基础:\n", "\n", "- **速度:** 提高GPU性能获得更快的处理。\n", "- **软件:** 像[Theano](http://deeplearning.net/software/theano/)和[TensorFlow](https://www.tensorflow.org/)这样的框架允许灵活创建抽象模型,然后可以对其优化并编译到 `CPU` 或 `GPU` 上。\n", "- **学习算法:** 在数据子集上作随机梯度下降可以让我们在海量数据上训练这些模型。使用 `drop-out` 等技术可避免过拟合。\n", "- **架构:** 大量创新都是改变输入层,比如卷积神经网络,或改变输出层,比如[MDN](http://cbonnett.github.io/MDN_EDWARD_KERAS_TF.html)。\n", "\n", "\n", "### 1.4 连接深度学习和概率编程\n", "\n", "一方面,概率编程允许我们以非常有原则和易于理解的方式构建相当小但聚焦的模型,以深入了解数据;另一方面,深度学习使用启发式方法来训练巨大且高度复杂的模型,并在预测方面非常出色。\n", "\n", "然而,最新的 `变分推断` 方法允许概率编程扩展模型复杂性和数据大小。因此,我们正处于能够将两种方法结合起来以开启机器学习新创新的风口浪尖。如需更多动机,另请参阅 `Dustin Tran` 最近的 [博客文章](http://dustintran.com/blog/a-quick-update-edward-and-some-motivations/)。\n", "\n", "虽然这种变化将使概率编程能够应用于更广泛的问题,但我相信这种变化也为深度学习的创新带来了巨大可能性。一些基本的想法包括: \n", "\n", "- **预测的不确定性:** 正如下面将看到的,贝叶斯神经网络能够得到其预测的不确定性。我认为不确定性在机器学习中是一个被低估的概念,因为它不仅对现实世界的应用很重要,而且在训练中也很有用。例如,可以专门针对最不确定的样本训练模型。\n", "\n", "- **表征的不确定性:** 通过贝叶斯神经网络能够获得权重的不确定性估计,进而告诉我们表征学习的稳定性。\n", "\n", "- **使用先验进行正则化:** 权重通常被 `L2 正则化`以避免过度拟合,这等效于为权重参数设置高斯先验。但我们可以设想其他先验,比如用`尖板分布(spike-and-slab)` 来强化稀疏性(更像使用 L1 正则)。\n", "\n", "- **有先验知识的迁移学习:** 如果想在新对象识别数据集上训练网络,我们可以为网络权重设置以某些预训练网络(如 `GoogLeNet`)权重为中心的先验,进而引导新网络的训练和学习。\n", "\n", "- **分层神经网络:** 概率编程中一个非常强大的方法是分层模型,它允许将在子组上学到的东西汇集到总体中(参见 `PyMC3` 中的[分层线性回归教程](https://twiecki.github.io/blog/2014/03/17/bayesian-glms-3/))。而结合神经网络,可以在分层数据集中训练单个神经网络以专注于子组,同时仍然了解整体人口的表征。例如:想象一个经过训练的网络,可以从汽车图片中对汽车模型进行分类。我们可以设置一个分层神经网络,其中训练子神经网络来区分单个制造商的模型。直觉是来自某个制造商的汽车都有某些相似之处,因此训练针对品牌的专门子网络是有意义的。但由于各网络在更高层存在连接,它们仍会与其他专门子网络共享有关所有品牌的有用信息。有趣的是,神经网络的不同层可以通过层次结构的各层来通知,例如:提取线条的网络层在所有子网络中可能是相同的,而高阶表征会有所不同。分层模型将从数据中学习所有这些信息。\n", "\n", "- **其他混合架构:** 我们可以更自由地构建各种神经网络。例如,非参数的贝叶斯模型可用于灵活调整隐藏层的大小和形状,以便在训练期间针对手头问题优化网络架构。目前,这需要代价高昂的超参数优化和大量知识。\n", "\n", "## 2 PyMC3 中的贝叶斯神经网络\n", "\n", "### 2.1 生成数据\n", "\n", "首先,生成一些有关玩具的数据,用于分析一个非线性可分的简单二分类问题。" ] }, { "cell_type": "code", "execution_count": 1, "id": "1457b596", "metadata": {}, "outputs": [ { "ename": "AttributeError", "evalue": "module 'numpy.distutils.__config__' has no attribute 'blas_opt_info'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mNoSectionError\u001b[0m Traceback (most recent call last)", "File \u001b[0;32m/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/theano/configparser.py:238\u001b[0m, in \u001b[0;36mTheanoConfigParser.fetch_val_for_key\u001b[0;34m(self, key, delete_key)\u001b[0m\n\u001b[1;32m 237\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m--> 238\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_theano_cfg\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mget\u001b[49m\u001b[43m(\u001b[49m\u001b[43msection\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43moption\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 239\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m InterpolationError:\n", "File \u001b[0;32m/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/configparser.py:781\u001b[0m, in \u001b[0;36mRawConfigParser.get\u001b[0;34m(self, section, option, raw, vars, fallback)\u001b[0m\n\u001b[1;32m 780\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m--> 781\u001b[0m d \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_unify_values\u001b[49m\u001b[43m(\u001b[49m\u001b[43msection\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mvars\u001b[39;49m\u001b[43m)\u001b[49m\n\u001b[1;32m 782\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m NoSectionError:\n", "File \u001b[0;32m/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/configparser.py:1149\u001b[0m, in \u001b[0;36mRawConfigParser._unify_values\u001b[0;34m(self, section, vars)\u001b[0m\n\u001b[1;32m 1148\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m section \u001b[38;5;241m!=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mdefault_section:\n\u001b[0;32m-> 1149\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m NoSectionError(section) \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;28mNone\u001b[39m\n\u001b[1;32m 1150\u001b[0m \u001b[38;5;66;03m# Update with the entry specific variables\u001b[39;00m\n", "\u001b[0;31mNoSectionError\u001b[0m: No section: 'blas'", "\nDuring handling of the above exception, another exception occurred:\n", "\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)", "File \u001b[0;32m/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/theano/configparser.py:354\u001b[0m, in \u001b[0;36mConfigParam.__get__\u001b[0;34m(self, cls, type_, delete_key)\u001b[0m\n\u001b[1;32m 353\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m--> 354\u001b[0m val_str \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mcls\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mfetch_val_for_key\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mname\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdelete_key\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mdelete_key\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 355\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mis_default \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mFalse\u001b[39;00m\n", "File \u001b[0;32m/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/theano/configparser.py:242\u001b[0m, in \u001b[0;36mTheanoConfigParser.fetch_val_for_key\u001b[0;34m(self, key, delete_key)\u001b[0m\n\u001b[1;32m 241\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m (NoOptionError, NoSectionError):\n\u001b[0;32m--> 242\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mKeyError\u001b[39;00m(key)\n", "\u001b[0;31mKeyError\u001b[0m: 'blas__ldflags'", "\nDuring handling of the above exception, another exception occurred:\n", "\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)", "Input \u001b[0;32mIn [1]\u001b[0m, in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m get_ipython()\u001b[38;5;241m.\u001b[39mrun_line_magic(\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mmatplotlib\u001b[39m\u001b[38;5;124m'\u001b[39m, \u001b[38;5;124m'\u001b[39m\u001b[38;5;124minline\u001b[39m\u001b[38;5;124m'\u001b[39m)\n\u001b[0;32m----> 2\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mtheano\u001b[39;00m \n\u001b[1;32m 3\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mpymc3\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m \u001b[38;5;21;01mpm\u001b[39;00m\n\u001b[1;32m 4\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01msklearn\u001b[39;00m\n", "File \u001b[0;32m/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/theano/__init__.py:83\u001b[0m, in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 75\u001b[0m \u001b[38;5;66;03m# This is the api version for ops that generate C code. External ops\u001b[39;00m\n\u001b[1;32m 76\u001b[0m \u001b[38;5;66;03m# might need manual changes if this number goes up. An undefined\u001b[39;00m\n\u001b[1;32m 77\u001b[0m \u001b[38;5;66;03m# __api_version__ can be understood to mean api version 0.\u001b[39;00m\n\u001b[1;32m 78\u001b[0m \u001b[38;5;66;03m#\u001b[39;00m\n\u001b[1;32m 79\u001b[0m \u001b[38;5;66;03m# This number is not tied to the release version and should change\u001b[39;00m\n\u001b[1;32m 80\u001b[0m \u001b[38;5;66;03m# very rarely.\u001b[39;00m\n\u001b[1;32m 81\u001b[0m __api_version__ \u001b[38;5;241m=\u001b[39m \u001b[38;5;241m1\u001b[39m\n\u001b[0;32m---> 83\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mtheano\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m scalar, tensor\n\u001b[1;32m 84\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mtheano\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mcompile\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m (\n\u001b[1;32m 85\u001b[0m In,\n\u001b[1;32m 86\u001b[0m Mode,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 93\u001b[0m shared,\n\u001b[1;32m 94\u001b[0m )\n\u001b[1;32m 95\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mtheano\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mcompile\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mfunction\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m function, function_dump\n", "File \u001b[0;32m/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/theano/tensor/__init__.py:20\u001b[0m, in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 9\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mtheano\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mcompile\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m SpecifyShape, specify_shape\n\u001b[1;32m 10\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mtheano\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mgradient\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m (\n\u001b[1;32m 11\u001b[0m Lop,\n\u001b[1;32m 12\u001b[0m Rop,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 18\u001b[0m verify_grad,\n\u001b[1;32m 19\u001b[0m )\n\u001b[0;32m---> 20\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mtheano\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mtensor\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m nnet \u001b[38;5;66;03m# used for softmax, sigmoid, etc.\u001b[39;00m\n\u001b[1;32m 21\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mtheano\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mtensor\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m sharedvar \u001b[38;5;66;03m# adds shared-variable constructors\u001b[39;00m\n\u001b[1;32m 22\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mtheano\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mtensor\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m (\n\u001b[1;32m 23\u001b[0m blas,\n\u001b[1;32m 24\u001b[0m blas_c,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 29\u001b[0m xlogx,\n\u001b[1;32m 30\u001b[0m )\n", "File \u001b[0;32m/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/theano/tensor/nnet/__init__.py:3\u001b[0m, in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mwarnings\u001b[39;00m\n\u001b[0;32m----> 3\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01m.\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m opt\n\u001b[1;32m 4\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mabstract_conv\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m conv2d \u001b[38;5;28;01mas\u001b[39;00m abstract_conv2d\n\u001b[1;32m 5\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mabstract_conv\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m conv2d_grad_wrt_inputs, conv3d, separable_conv2d\n", "File \u001b[0;32m/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/theano/tensor/nnet/opt.py:32\u001b[0m, in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 24\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mtheano\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mtensor\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mnnet\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mblocksparse\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m (\n\u001b[1;32m 25\u001b[0m SparseBlockGemv,\n\u001b[1;32m 26\u001b[0m SparseBlockOuter,\n\u001b[1;32m 27\u001b[0m sparse_block_gemv_inplace,\n\u001b[1;32m 28\u001b[0m sparse_block_outer_inplace,\n\u001b[1;32m 29\u001b[0m )\n\u001b[1;32m 31\u001b[0m \u001b[38;5;66;03m# Cpu implementation\u001b[39;00m\n\u001b[0;32m---> 32\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mtheano\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mtensor\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mnnet\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mconv\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m ConvOp, conv2d\n\u001b[1;32m 33\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mtheano\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mtensor\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mnnet\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mcorr\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m CorrMM, CorrMM_gradInputs, CorrMM_gradWeights\n\u001b[1;32m 34\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mtheano\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mtensor\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mnnet\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mcorr3d\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m Corr3dMM, Corr3dMMGradInputs, Corr3dMMGradWeights\n", "File \u001b[0;32m/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/theano/tensor/nnet/conv.py:20\u001b[0m, in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 18\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mtheano\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mgraph\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mbasic\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m Apply\n\u001b[1;32m 19\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mtheano\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mgraph\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mop\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m OpenMPOp\n\u001b[0;32m---> 20\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mtheano\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mtensor\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m blas\n\u001b[1;32m 21\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mtheano\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mtensor\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mbasic\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m (\n\u001b[1;32m 22\u001b[0m NotScalarConstantError,\n\u001b[1;32m 23\u001b[0m as_tensor_variable,\n\u001b[1;32m 24\u001b[0m get_scalar_constant_value,\n\u001b[1;32m 25\u001b[0m patternbroadcast,\n\u001b[1;32m 26\u001b[0m )\n\u001b[1;32m 27\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mtheano\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mtensor\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mnnet\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mabstract_conv\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m get_conv_output_shape, get_conv_shape_1axis\n", "File \u001b[0;32m/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/theano/tensor/blas.py:163\u001b[0m, in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 161\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mtheano\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mscalar\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;28mbool\u001b[39m \u001b[38;5;28;01mas\u001b[39;00m bool_t\n\u001b[1;32m 162\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mtheano\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mtensor\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m basic \u001b[38;5;28;01mas\u001b[39;00m tt\n\u001b[0;32m--> 163\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mtheano\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mtensor\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mblas_headers\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m blas_header_text, blas_header_version\n\u001b[1;32m 164\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mtheano\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mtensor\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mopt\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m in2out, local_dimshuffle_lift\n\u001b[1;32m 165\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mtheano\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mtensor\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mtype\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m values_eq_approx_remove_inf_nan\n", "File \u001b[0;32m/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/theano/tensor/blas_headers.py:1016\u001b[0m, in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 997\u001b[0m header \u001b[38;5;241m+\u001b[39m\u001b[38;5;241m=\u001b[39m textwrap\u001b[38;5;241m.\u001b[39mdedent(\n\u001b[1;32m 998\u001b[0m \u001b[38;5;124;03m\"\"\"\\\u001b[39;00m\n\u001b[1;32m 999\u001b[0m \u001b[38;5;124;03m static float sdot_(int* Nx, float* x, int* Sx, float* y, int* Sy)\u001b[39;00m\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 1010\u001b[0m \u001b[38;5;124;03m \"\"\"\u001b[39;00m\n\u001b[1;32m 1011\u001b[0m )\n\u001b[1;32m 1013\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m header \u001b[38;5;241m+\u001b[39m blas_code\n\u001b[0;32m-> 1016\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[43mconfig\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mblas__ldflags\u001b[49m:\n\u001b[1;32m 1017\u001b[0m _logger\u001b[38;5;241m.\u001b[39mwarning(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mUsing NumPy C-API based implementation for BLAS functions.\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m 1020\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mmkl_threads_text\u001b[39m():\n", "File \u001b[0;32m/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/theano/configparser.py:358\u001b[0m, in \u001b[0;36mConfigParam.__get__\u001b[0;34m(self, cls, type_, delete_key)\u001b[0m\n\u001b[1;32m 356\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mKeyError\u001b[39;00m:\n\u001b[1;32m 357\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m callable(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mdefault):\n\u001b[0;32m--> 358\u001b[0m val_str \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mdefault\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 359\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m 360\u001b[0m val_str \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mdefault\n", "File \u001b[0;32m/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/theano/link/c/cmodule.py:2621\u001b[0m, in \u001b[0;36mdefault_blas_ldflags\u001b[0;34m()\u001b[0m\n\u001b[1;32m 2617\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m 2618\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mhasattr\u001b[39m(numpy\u001b[38;5;241m.\u001b[39mdistutils, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m__config__\u001b[39m\u001b[38;5;124m\"\u001b[39m) \u001b[38;5;129;01mand\u001b[39;00m numpy\u001b[38;5;241m.\u001b[39mdistutils\u001b[38;5;241m.\u001b[39m__config__:\n\u001b[1;32m 2619\u001b[0m \u001b[38;5;66;03m# If the old private interface is available use it as it\u001b[39;00m\n\u001b[1;32m 2620\u001b[0m \u001b[38;5;66;03m# don't print information to the user.\u001b[39;00m\n\u001b[0;32m-> 2621\u001b[0m blas_info \u001b[38;5;241m=\u001b[39m \u001b[43mnumpy\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mdistutils\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m__config__\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mblas_opt_info\u001b[49m\n\u001b[1;32m 2622\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m 2623\u001b[0m \u001b[38;5;66;03m# We do this import only here, as in some setup, if we\u001b[39;00m\n\u001b[1;32m 2624\u001b[0m \u001b[38;5;66;03m# just import theano and exit, with the import at global\u001b[39;00m\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 2630\u001b[0m \u001b[38;5;66;03m# This happen with Python 2.7.3 |EPD 7.3-1 and numpy 1.8.1\u001b[39;00m\n\u001b[1;32m 2631\u001b[0m \u001b[38;5;66;03m# isort: off\u001b[39;00m\n\u001b[1;32m 2632\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mnumpy\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mdistutils\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01msystem_info\u001b[39;00m \u001b[38;5;66;03m# noqa\u001b[39;00m\n", "\u001b[0;31mAttributeError\u001b[0m: module 'numpy.distutils.__config__' has no attribute 'blas_opt_info'" ] } ], "source": [ "%matplotlib inline\n", "import theano \n", "import pymc3 as pm\n", "import sklearn\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "from warnings import filterwarnings\n", "filterwarnings('ignore')\n", "sns.set_style('white')\n", "from sklearn import datasets\n", "from sklearn.preprocessing import scale\n", "from sklearn.cross_validation import train_test_split\n", "from sklearn.datasets import make_moons\n", "\n", "X, Y = make_moons(noise=0.2, random_state=0, n_samples=1000)\n", "X = scale(X)\n", "X = X.astype(floatX)\n", "Y = Y.astype(floatX)\n", "X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=.5)\n", "\n", "fig, ax = plt.subplots(figsize=(12, 8))\n", "ax.scatter(X[Y==0, 0], X[Y==0, 1], label='Class 0')\n", "ax.scatter(X[Y==1, 0], X[Y==1, 1], color='r', label='Class 1')\n", "sns.despine(); ax.legend()\n", "ax.set(xlabel='X', ylabel='Y', title='Toy binary classification data set');" ] }, { "cell_type": "markdown", "id": "a52c9f21", "metadata": {}, "source": [ "### 2.2 模型定义\n", "\n", "该模型对应的神经网络非常简单。基本单元是一个 `logistic 回归` 的感知机。我们并行使用多个感知机,然后将它们堆叠起来以获得隐藏层。此处将使用 2 个隐藏层,每个隐藏层有 5 个神经元,这足以解决本例的简单问题。" ] }, { "cell_type": "code", "execution_count": null, "id": "4c43c607", "metadata": {}, "outputs": [], "source": [ "def construct_nn(ann_input, ann_output):\n", " n_hidden = 5\n", " \n", " # Initialize random weights between each layer\n", " init_1 = np.random.randn(X.shape[1], n_hidden).astype(floatX)\n", " init_2 = np.random.randn(n_hidden, n_hidden).astype(floatX)\n", " init_out = np.random.randn(n_hidden).astype(floatX)\n", " \n", " with pm.Model() as neural_network:\n", " # Weights from input to hidden layer\n", " weights_in_1 = pm.Normal('w_in_1', 0, sd=1, \n", " shape=(X.shape[1], n_hidden), \n", " testval=init_1)\n", " \n", " # Weights from 1st to 2nd layer\n", " weights_1_2 = pm.Normal('w_1_2', 0, sd=1, \n", " shape=(n_hidden, n_hidden), \n", " testval=init_2)\n", " \n", " # Weights from hidden layer to output\n", " weights_2_out = pm.Normal('w_2_out', 0, sd=1, \n", " shape=(n_hidden,), \n", " testval=init_out)\n", " \n", " # Build neural-network using tanh activation function\n", " act_1 = pm.math.tanh(pm.math.dot(ann_input, \n", " weights_in_1))\n", " act_2 = pm.math.tanh(pm.math.dot(act_1, \n", " weights_1_2))\n", " act_out = pm.math.sigmoid(pm.math.dot(act_2, \n", " weights_2_out))\n", " \n", " # Binary classification -> Bernoulli likelihood\n", " out = pm.Bernoulli('out', \n", " act_out,\n", " observed=ann_output,\n", " total_size=Y_train.shape[0] # IMPORTANT for minibatches\n", " )\n", " return neural_network\n", "\n", "# Trick: Turn inputs and outputs into shared variables. \n", "# It's still the same thing, but we can later change the values of the shared variable \n", "# (to switch in the test-data later) and `PyMC3` will just use the new data. \n", "# Kind-of like a pointer we can redirect.\n", "# For more info, see: http://deeplearning.net/software/ `Theano` /library/compile/shared.html\n", "ann_input = `Theano` .shared(X_train)\n", "ann_output = `Theano` .shared(Y_train)\n", "neural_network = construct_nn(ann_input, ann_output)" ] }, { "cell_type": "markdown", "id": "264e449f", "metadata": {}, "source": [ "正态先验有助于正则化权重。通常我们会在输入中添加一个常量 `b`,但在这里省略了它以保持代码清晰。\n", "\n", "### 2.3 `变分推断`:提升模型的复杂性\n", "\n", "现在可以运行像 `NUTS` 这样的随机采样器,由于模型并不复杂,所以工作得很好,但随着将模型扩展到具有更多层的深度架构,随机方法将变得非常缓慢。此时,使用 OPVI 框架中的ADVI 自动变分推断算法,就会快得多,并可更好地扩展。\n", "\n", "```{admonition} 请注意\n", "ADVI 采用平均场近似,因此此处忽略了后验中的相关性。\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "e79e6069", "metadata": {}, "outputs": [], "source": [ "from PyMC3.Theanof import set_tt_rng, MRG_RandomStreams\n", "set_tt_rng(MRG_RandomStreams(42))" ] }, { "cell_type": "code", "execution_count": null, "id": "dae966ab", "metadata": {}, "outputs": [], "source": [ "%%time\n", "\n", "with neural_network:\n", " inference = pm.ADVI()\n", " approx = pm.fit(n=50000, method=inference)" ] }, { "cell_type": "markdown", "id": "921e03fd", "metadata": {}, "source": [ "下面为了让它真正快起来,我们可能希望在 `GPU` 上运行神经网络。\n", "\n", "由于样本更方便后续处理,因此可以从变分近似推断结果中采样(此处仅简单从推断得到得正态分布中采样),而这与 `MCMC` 方法完全不同:" ] }, { "cell_type": "code", "execution_count": null, "id": "dbb60dfe", "metadata": {}, "outputs": [], "source": [ "trace = approx.sample(draws=5000)" ] }, { "cell_type": "markdown", "id": "477dc18e", "metadata": {}, "source": [ "绘制目标函数(ELBO),可以看到随着时间推移,优化算法在逐渐改善拟合。" ] }, { "cell_type": "code", "execution_count": null, "id": "55001edf", "metadata": {}, "outputs": [], "source": [ "plt.plot(-inference.hist)\n", "plt.ylabel('ELBO')\n", "plt.xlabel('iteration');" ] }, { "cell_type": "markdown", "id": "f30ffe8a", "metadata": {}, "source": [ "现在已经训练了模型,可以使用后验预测检查来预测验证集。" ] }, { "cell_type": "code", "execution_count": null, "id": "40846dc0", "metadata": {}, "outputs": [], "source": [ "# Replace arrays our NN references with the test data\n", "ann_input.set_value(X_test)\n", "ann_output.set_value(Y_test)\n", "\n", "with neural_network:\n", " ppc = pm.sample_ppc(trace, samples=500, progressbar=False)\n", "\n", "# Use probability of > 0.5 to assume prediction of class 1\n", "pred = ppc['out'].mean(axis=0) > 0.5" ] }, { "cell_type": "markdown", "id": "7d08f810", "metadata": {}, "source": [ "看看预测结果:" ] }, { "cell_type": "code", "execution_count": null, "id": "3a68f04c", "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots()\n", "ax.scatter(X_test[pred==0, 0], X_test[pred==0, 1])\n", "ax.scatter(X_test[pred==1, 0], X_test[pred==1, 1], color='r')\n", "sns.despine()\n", "ax.set(title='Predicted labels in testing set', xlabel='X', ylabel='Y');" ] }, { "cell_type": "code", "execution_count": null, "id": "8876b16f", "metadata": {}, "outputs": [], "source": [ "print('Accuracy = {}%'.format((Y_test == pred).mean() * 100))" ] }, { "cell_type": "markdown", "id": "05876c85", "metadata": {}, "source": [ "可以看出,我们的神经网络做得很好!\n", "\n", "\n", "### 2.4 分类器到底学到了什么?\n", "\n", "我们在整个输入空间的网格上评估类概率的预测。" ] }, { "cell_type": "code", "execution_count": null, "id": "80f6b0d8", "metadata": {}, "outputs": [], "source": [ "grid = pm.floatX(np.mgrid[-3:3:100j,-3:3:100j])\n", "grid_2d = grid.reshape(2, -1).T\n", "dummy_out = np.ones(grid.shape[1], dtype=np.int8)" ] }, { "cell_type": "code", "execution_count": null, "id": "f5bd6c92", "metadata": {}, "outputs": [], "source": [ "ann_input.set_value(grid_2d)\n", "ann_output.set_value(dummy_out)\n", "\n", "with neural_network:\n", " ppc = pm.sample_ppc(trace, samples=500, progressbar=False)" ] }, { "cell_type": "markdown", "id": "fb879d84", "metadata": {}, "source": [ "### 2.5 概率曲面" ] }, { "cell_type": "code", "execution_count": null, "id": "ad10ea09", "metadata": {}, "outputs": [], "source": [ "cmap = sns.diverging_palette(250, 12, s=85, l=25, as_cmap=True)\n", "fig, ax = plt.subplots(figsize=(14, 8))\n", "contour = ax.contourf(grid[0], grid[1], ppc['out'].mean(axis=0).reshape(100, 100), cmap=cmap)\n", "ax.scatter(X_test[pred==0, 0], X_test[pred==0, 1])\n", "ax.scatter(X_test[pred==1, 0], X_test[pred==1, 1], color='r')\n", "cbar = plt.colorbar(contour, ax=ax)\n", "_ = ax.set(xlim=(-3, 3), ylim=(-3, 3), xlabel='X', ylabel='Y');\n", "cbar.ax.set_ylabel('Posterior predictive mean probability of class label = 0');" ] }, { "cell_type": "markdown", "id": "f6266609", "metadata": {}, "source": [ "### 2.6 预测值的不确定性\n", "\n", "目前展示的一切,都可以用非贝叶斯神经网络来完成。每个类别标签的后验预测值的平均值应当与最大似然预测值相同。但,在贝叶斯神经网络中,我们还可以查看后验预测值的标准差,来了解预测的不确定性:" ] }, { "cell_type": "code", "execution_count": null, "id": "49f3ac25", "metadata": {}, "outputs": [], "source": [ "cmap = sns.cubehelix_palette(light=1, as_cmap=True)\n", "fig, ax = plt.subplots(figsize=(14, 8))\n", "contour = ax.contourf(grid[0], grid[1], ppc['out'].std(axis=0).reshape(100, 100), cmap=cmap)\n", "ax.scatter(X_test[pred==0, 0], X_test[pred==0, 1])\n", "ax.scatter(X_test[pred==1, 0], X_test[pred==1, 1], color='r')\n", "cbar = plt.colorbar(contour, ax=ax)\n", "_ = ax.set(xlim=(-3, 3), ylim=(-3, 3), xlabel='X', ylabel='Y');\n", "cbar.ax.set_ylabel('Uncertainty (posterior predictive standard deviation)');" ] }, { "cell_type": "markdown", "id": "57f07fd9", "metadata": {}, "source": [ "可以看到,在非常接近决策边界时,预测标签类别的不确定性最高。可以想象,将预测与不确定性关联起来是许多应用(如医疗保健)的关键特性。为进一步提高精度,我们可能希望主要根据高不确定性区域的样本来训练模型。\n", "\n", "### 2.7 小批量 ADVI\n", "\n", "由于训练数据集规模较小,前面采用的方法直接在所有数据上同时训练模型。但这显然无法扩展到类似 `ImageNet` 这种海量数据集上。此外,小批量数据训练的随机梯度下降方法可有效避免局部极小值,并更快的收敛。因此,贝叶斯神经网络应当具备支持小批量训练的能力。\n", "\n", "幸运的是,`ADVI` 支持小批量运行。只需做一些简单设置:" ] }, { "cell_type": "code", "execution_count": null, "id": "aa4f209d", "metadata": {}, "outputs": [], "source": [ "minibatch_x = pm.Minibatch(X_train, batch_size=32)\n", "minibatch_y = pm.Minibatch(Y_train, batch_size=32)\n", "\n", "neural_network_minibatch = construct_nn(minibatch_x, minibatch_y)\n", "with neural_network_minibatch:\n", " inference = pm.ADVI()\n", " approx = pm.fit(40000, method=inference)" ] }, { "cell_type": "code", "execution_count": null, "id": "9512da77", "metadata": {}, "outputs": [], "source": [ " plt.plot(-inference.hist)\n", " plt.ylabel('ELBO')\n", " plt.xlabel('iteration')" ] }, { "cell_type": "markdown", "id": "f015d531", "metadata": {}, "source": [ "小批量 `ADVI` 的运行时间短得多,其收敛得也更快,并且可以查看迹图。而最关键的是同时能够得到神经网络权重的不确定性度量。" ] }, { "cell_type": "code", "execution_count": null, "id": "ec9562fc", "metadata": {}, "outputs": [], "source": [ " pm.traceplot(trace)" ] }, { "cell_type": "markdown", "id": "2659a997", "metadata": {}, "source": [ "注意:\n", "\n", "```{note} \n", "您在 GPU 上运行上述示例,但需要在 .theanorc 文件中设置 device = GPU 和 floatX = float32 。您可能会说,上述模型并不是真正的深层网络,但实际上我们可以轻松地将其扩展到更多层(包括卷积层),以便在更具挑战性的数据集上进行训练。\n", "```\n", "\n", "我还在 `PyData London` 上介绍了一些这方面的工作,请观看以下 [视频](https://www.youtube.com/watch?v=LlzVlqVzeD8&feature=emb_imp_woyt)。此外,你可以从 [此处](https://github.com/twiecki/WhileMyMCMCGentlySamples/blob/master/content/downloads/notebooks/bayesian_neural_network.ipynb) 下载本例对应的 NoteBook。\n", "\n", "## 3 使用 PyMC3 和 Lasagne 构建更为复杂的分层神经网络\n", "\n", "`Lasagne` 是一个灵活的 `Theano` 库,用于构建各种类型的神经网络。`PyMC3` 也在使用 `Theano` ,因此将两者结合用于构建复杂的贝叶斯神经网络是可能的。可以利用 `Lasagne` 构建人工神经网络(`ANN`),并在参数上设置贝叶斯先验,然后利用 `PyMC3` 的 `ADVI 变分推断` 来估计模型。令人兴奋的是,这个想法不仅可能,而且非常直接。\n", "\n", "下面,首先展示如何集成 `PyMC3` 和 `Lasagne` ,以构建一个密集的 2 层人工神经网络;然后使用`小批量 ADVI` 在 `MNIST` 手写数字数据集上拟合模型;最后实现分层贝叶斯神经网络模型。由于 `Lasagne` 的强大能力,我们同样可以轻松构建具有最大池层的分层贝叶斯卷积神经网络,并在 `MNIST` 上实现 98% 的准确率。\n", "\n", "此处使用的大部分代码都是从 [`Lasagne` 教程](http:// `Lasagne` .readthedocs.io/en/latest/user/tutorial.html) 中借用而来。" ] }, { "cell_type": "code", "execution_count": null, "id": "9a37713c", "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "import seaborn as sns\n", "sns.set_style('white')\n", "sns.set_context('talk')\n", "\n", "import PyMC3 as pm\n", "import Theano.tensor as T\n", "import Theano \n", "from scipy.stats import mode, chisquare\n", "from sklearn.metrics import confusion_matrix, accuracy_score\n", "import Lasagne " ] }, { "cell_type": "markdown", "id": "8bf5558d", "metadata": {}, "source": [ "### 3.1 数据集:MNIST\n", "\n", "我们将使用手写数字的经典 MNIST 数据集。与我上一篇仅限于玩具数据集的博文相反,MNIST 实际上是一项具有挑战性的 ML 任务(当然不像 ImageNet 那样具有挑战性),具有合理数量的维度和数据点。" ] }, { "cell_type": "code", "execution_count": null, "id": "f1c384ce", "metadata": {}, "outputs": [], "source": [ "import sys, os\n", "\n", "def load_dataset():\n", " # We first define a download function, supporting both Python 2 and 3.\n", " if sys.version_info[0] == 2:\n", " from urllib import urlretrieve\n", " else:\n", " from urllib.request import urlretrieve\n", "\n", " def download(filename, source='http://yann.lecun.com/exdb/mnist/'):\n", " print(\"Downloading %s\" % filename)\n", " urlretrieve(source + filename, filename)\n", "\n", " # We then define functions for loading MNIST images and labels.\n", " # For convenience, they also download the requested files if needed.\n", " import gzip\n", "\n", " def load_mnist_images(filename):\n", " if not os.path.exists(filename):\n", " download(filename)\n", " # Read the inputs in Yann LeCun's binary format.\n", " with gzip.open(filename, 'rb') as f:\n", " data = np.frombuffer(f.read(), np.uint8, offset=16)\n", " # The inputs are vectors now, we reshape them to monochrome 2D images,\n", " # following the shape convention: (examples, channels, rows, columns)\n", " data = data.reshape(-1, 1, 28, 28)\n", " # The inputs come as bytes, we convert them to float32 in range [0,1].\n", " # (Actually to range [0, 255/256], for compatibility to the version\n", " # provided at http://deeplearning.net/data/mnist/mnist.pkl.gz.)\n", " return data / np.float32(256)\n", "\n", " def load_mnist_labels(filename):\n", " if not os.path.exists(filename):\n", " download(filename)\n", " # Read the labels in Yann LeCun's binary format.\n", " with gzip.open(filename, 'rb') as f:\n", " data = np.frombuffer(f.read(), np.uint8, offset=8)\n", " # The labels are vectors of integers now, that's exactly what we want.\n", " return data\n", "\n", " # We can now download and read the training and test set images and labels.\n", " X_train = load_mnist_images('train-images-idx3-ubyte.gz')\n", " y_train = load_mnist_labels('train-labels-idx1-ubyte.gz')\n", " X_test = load_mnist_images('t10k-images-idx3-ubyte.gz')\n", " y_test = load_mnist_labels('t10k-labels-idx1-ubyte.gz')\n", "\n", " # We reserve the last 10000 training examples for validation.\n", " X_train, X_val = X_train[:-10000], X_train[-10000:]\n", " y_train, y_val = y_train[:-10000], y_train[-10000:]\n", "\n", " # We just return all the arrays in order, as expected in main().\n", " # (It doesn't matter how we do this as long as we can read them again.)\n", " return X_train, y_train, X_val, y_val, X_test, y_test\n", "\n", "print(\"Loading data...\")\n", "X_train, y_train, X_val, y_val, X_test, y_test = load_dataset()" ] }, { "cell_type": "code", "execution_count": null, "id": "82fbb783", "metadata": {}, "outputs": [], "source": [ "# Building a `Theano` .shared variable with a subset of the data to make construction of the model faster.\n", "# We will later switch that out, this is just a placeholder to get the dimensionality right.\n", "input_var = `Theano` .shared(X_train[:500, ...].astype(np.float64))\n", "target_var = `Theano` .shared(y_train[:500, ...].astype(np.float64))" ] }, { "cell_type": "markdown", "id": "4fe44901", "metadata": {}, "source": [ "### 3.2 模型定义\n", "\n", "我认为,仅仅因为意大利宽面条和 `PyMC3` 都依赖于西亚诺,就有可能将它们连接起来。然而,不清楚这到底有多困难。幸运的是,第一个实验效果很好,但有一些潜在的方法可以使这变得更容易。我在 `Lasagne` 的回购协议上发行了一期 GitHub,几天后,PR695 被合并,这使得两者能够更好地融合,如下所示。OSS 万岁。\n", "\n", "首先, `Lasagne` 函数用于创建一个 ANN,其中有两个完全连接的隐藏层,每个层有 800 个神经元,这是纯 `Lasagne` 代码,几乎直接取自教程。在用 `Lasagne` 创建图层时会用到技巧。层。DenseLayer,在这里我们可以传入一个函数 init,它必须返回一个 `Theano` 表达式作为权重和偏差矩阵。这就是我们将传递 `PyMC3` 创建的优先级的地方,这些优先级也是无表达式:" ] }, { "cell_type": "code", "execution_count": null, "id": "3f42a5a0", "metadata": {}, "outputs": [], "source": [ "def build_ann(init):\n", " l_in = `Lasagne` .layers.InputLayer(shape=(None, 1, 28, 28),\n", " input_var=input_var)\n", "\n", " # Add a fully-connected layer of 800 units, using the linear rectifier, and\n", " # initializing weights with Glorot's scheme (which is the default anyway):\n", " n_hid1 = 800\n", " l_hid1 = `Lasagne` .layers.DenseLayer(\n", " l_in, num_units=n_hid1,\n", " nonlinearity= `Lasagne` .nonlinearities.tanh,\n", " b=init,\n", " W=init\n", " )\n", "\n", " n_hid2 = 800\n", " # Another 800-unit layer:\n", " l_hid2 = `Lasagne` .layers.DenseLayer(\n", " l_hid1, num_units=n_hid2,\n", " nonlinearity= `Lasagne` .nonlinearities.tanh,\n", " b=init,\n", " W=init\n", " )\n", "\n", " # Finally, we'll add the fully-connected output layer, of 10 softmax units:\n", " l_out = `Lasagne` .layers.DenseLayer(\n", " l_hid2, num_units=10,\n", " nonlinearity= `Lasagne` .nonlinearities.softmax,\n", " b=init,\n", " W=init\n", " )\n", " \n", " prediction = `Lasagne` .layers.get_output(l_out)\n", " \n", " # 10 discrete output classes -> `PyMC3` categorical distribution\n", " out = pm.Categorical('out', \n", " prediction,\n", " observed=target_var)\n", " \n", " return out" ] }, { "cell_type": "markdown", "id": "c5c5b3a6", "metadata": {}, "source": [ " 接下来是为 ANN 创建权重的函数。因为 `PyMC3` 要求每个随机变量都有不同的名称,所以我们将创建一个类来创建唯一命名的优先级。\n", "\n", " 先验值在这里充当正则化器,以尝试保持 ANN 的权重较小。这在数学上相当于将惩罚较大权重的 L2 损失项放入目标函数中,就像通常所做的那样。" ] }, { "cell_type": "code", "execution_count": null, "id": "5812e58a", "metadata": {}, "outputs": [], "source": [ "class GaussWeights(object):\n", "def __init__(self):\n", " self.count = 0\n", "def __call__(self, shape):\n", " self.count += 1\n", " return pm.Normal('w%d' % self.count, mu=0, sd=.1, \n", " testval=np.random.normal(size=shape).astype(np.float64),\n", " shape=shape)" ] }, { "cell_type": "markdown", "id": "c00bc3dc", "metadata": {}, "source": [ " 如果你把我们到目前为止所做的与之前的博客文章相比较,很明显,使用 `Lasagne` 要舒服得多。我们不必手动跟踪单个矩阵的形状,也不必处理底层矩阵的数学运算以使其全部匹配在一起。\n", "\n", " 接下来是一些设置 mini-batch-ADVI 的函数,您可以在 [前面的博文](https://twiecki.github.io/blog/2016/06/01/bayesian-deep-learning/) 中找到更多信息。" ] }, { "cell_type": "code", "execution_count": null, "id": "be045fa2", "metadata": {}, "outputs": [], "source": [ "from six.moves import zip\n", "\n", "# Tensors and RV that will be using mini-batches\n", "minibatch_tensors = [input_var, target_var]\n", "\n", "# Generator that returns mini-batches in each iteration\n", "def create_minibatch(data, batchsize=500):\n", " \n", " rng = np.random.RandomState(0)\n", " start_idx = 0\n", " while True:\n", " # Return random data samples of set size batchsize each iteration\n", " ixs = rng.randint(data.shape[0], size=batchsize)\n", " yield data[ixs]\n", "\n", "minibatches = zip(\n", " create_minibatch(X_train, 500),\n", " create_minibatch(y_train, 500),\n", ")\n", "\n", "total_size = len(y_train)\n", "\n", "def run_advi(likelihood, advi_iters=50000):\n", " # Train on train data\n", " input_var.set_value(X_train[:500, ...])\n", " target_var.set_value(y_train[:500, ...])\n", " \n", " v_params = pm.variational.advi_minibatch(\n", " n=advi_iters, minibatch_tensors=minibatch_tensors, \n", " minibatch_RVs=[likelihood], minibatches=minibatches, \n", " total_size=total_size, learning_rate=1e-2, epsilon=1.0\n", " )\n", " trace = pm.variational.sample_vp(v_params, draws=500)\n", " \n", " # Predict on test data\n", " input_var.set_value(X_test)\n", " target_var.set_value(y_test)\n", " \n", " ppc = pm.sample_ppc(trace, samples=100)\n", " y_pred = mode(ppc['out'], axis=0).mode[0, :]\n", " \n", " return v_params, trace, ppc, y_pred" ] }, { "cell_type": "markdown", "id": "db1a3f66", "metadata": {}, "source": [ "### 3.3 模型训练\n", "\n", "让我们使用`小批量 ADVI` 运行模型:" ] }, { "cell_type": "code", "execution_count": null, "id": "4e237a2f", "metadata": {}, "outputs": [], "source": [ "with pm.Model() as neural_network:\n", " likelihood = build_ann(GaussWeights())\n", " v_params, trace, ppc, y_pred = run_advi(likelihood)" ] }, { "cell_type": "markdown", "id": "99020958", "metadata": {}, "source": [ "确保所有内容都收敛:" ] }, { "cell_type": "code", "execution_count": null, "id": "cc9e116c", "metadata": {}, "outputs": [], "source": [ "plt.plot(v_params.elbo_vals[10000:])\n", "sns.despine()" ] }, { "cell_type": "code", "execution_count": null, "id": "2270ae16", "metadata": {}, "outputs": [], "source": [ "sns.heatmap(confusion_matrix(y_test, y_pred))" ] }, { "cell_type": "code", "execution_count": null, "id": "de28d519", "metadata": {}, "outputs": [], "source": [ "print('Accuracy on test data = {}%'.format(accuracy_score(y_test, y_pred) * 100))" ] }, { "cell_type": "markdown", "id": "0116d71c", "metadata": {}, "source": [ "性能并不是非常高,但嘿,它似乎真的起作用了。\n", "\n", "### 3.4 分层贝叶斯神经网络\n", "\n", "`L2 惩罚项`强度之前的重量标准偏差之间的联系引出了一个有趣的想法。上面我们刚刚修正了 sd=0。1 表示所有层,但第一层的值可能与第二层的值不同。也许是 0。1 开始时太小或太大。在贝叶斯建模中,在这样的情况下,通常只需放置超先验,然后从数据中学习要应用的最佳正则化。这使我们不用在代价高昂的超参数优化中调整该参数。有关分层建模的更多信息,请参阅我的 [另一篇博文](https://twiecki.io/blog/2016/07/05/bayesian-deep-learning/)。" ] }, { "cell_type": "code", "execution_count": null, "id": "a707bca3", "metadata": {}, "outputs": [], "source": [ "class GaussWeightsHierarchicalRegularization(object):\n", " def __init__(self):\n", " self.count = 0\n", " def __call__(self, shape):\n", " self.count += 1\n", " \n", " regularization = pm.HalfNormal('reg_hyper%d' % self.count, sd=1)\n", " \n", " return pm.Normal('w%d' % self.count, mu=0, sd=regularization, \n", " testval=np.random.normal(size=shape),\n", " shape=shape)\n", " \n", "minibatches = zip(\n", " create_minibatch(X_train, 500),\n", " create_minibatch(y_train, 500),\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "88df3075", "metadata": {}, "outputs": [], "source": [ "with pm.Model() as neural_network_hier:\n", " likelihood = build_ann(GaussWeightsHierarchicalRegularization())\n", " v_params, trace, ppc, y_pred = run_advi(likelihood)" ] }, { "cell_type": "code", "execution_count": null, "id": "7dff0f2c", "metadata": {}, "outputs": [], "source": [ "print('Accuracy on test data = {}%'.format(accuracy_score(y_test, y_pred) * 100))" ] }, { "cell_type": "markdown", "id": "486d37df", "metadata": {}, "source": [ "我们得到了一个小但很好的提高精度。让我们看看超参数的后验值:" ] }, { "cell_type": "code", "execution_count": null, "id": "8f0f19be", "metadata": {}, "outputs": [], "source": [ "pm.traceplot(trace, varnames=['reg_hyper1', 'reg_hyper2', 'reg_hyper3', 'reg_hyper4', 'reg_hyper5', 'reg_hyper6']);" ] }, { "cell_type": "markdown", "id": "94e274dd", "metadata": {}, "source": [ "有趣的是,它们都非常不同,这表明改变在网络的每一层应用的正则化量是有意义的。\n", "\n", "### 3.5 卷积神经网络的贝叶斯模型\n", "\n", "这很好,但正如我在上一篇文章中所展示的,到目前为止,在 `PyMC3` 中直接实现的一切都非常简单。真正有趣的是,我们现在可以构建更复杂的人工神经网络,比如卷积神经网络:" ] }, { "cell_type": "code", "execution_count": null, "id": "a2a8b657", "metadata": {}, "outputs": [], "source": [ "def build_ann_conv(init):\n", " network = `Lasagne` .layers.InputLayer(shape=(None, 1, 28, 28),\n", " input_var=input_var)\n", "\n", " network = `Lasagne` .layers.Conv2DLayer(\n", " network, num_filters=32, filter_size=(5, 5),\n", " nonlinearity= `Lasagne` .nonlinearities.tanh,\n", " W=init)\n", "\n", " # Max-pooling layer of factor 2 in both dimensions:\n", " network = `Lasagne` .layers.MaxPool2DLayer(network, pool_size=(2, 2))\n", "\n", " # Another convolution with 32 5x5 kernels, and another 2x2 pooling:\n", " network = `Lasagne` .layers.Conv2DLayer(\n", " network, num_filters=32, filter_size=(5, 5),\n", " nonlinearity= `Lasagne` .nonlinearities.tanh,\n", " W=init)\n", " \n", " network = `Lasagne` .layers.MaxPool2DLayer(network, \n", " pool_size=(2, 2))\n", " \n", " n_hid2 = 256\n", " network = `Lasagne` .layers.DenseLayer(\n", " network, num_units=n_hid2,\n", " nonlinearity= `Lasagne` .nonlinearities.tanh,\n", " b=init,\n", " W=init\n", " )\n", "\n", " # Finally, we'll add the fully-connected output layer, of 10 softmax units:\n", " network = `Lasagne` .layers.DenseLayer(\n", " network, num_units=10,\n", " nonlinearity= `Lasagne` .nonlinearities.softmax,\n", " b=init,\n", " W=init\n", " )\n", " \n", " prediction = `Lasagne` .layers.get_output(network)\n", " \n", " return pm.Categorical('out', \n", " prediction,\n", " observed=target_var)" ] }, { "cell_type": "code", "execution_count": null, "id": "38abfda1", "metadata": {}, "outputs": [], "source": [ "print('Accuracy on test data = {}%'.format(accuracy_score(y_test, y_pred) * 100)) " ] }, { "cell_type": "markdown", "id": "44aad8f0", "metadata": {}, "source": [ "更高的准确度——很好。我也尝试了分层模型,但它的准确率较低(95%),我认为这是由于过度拟合。\n", "\n", "让我们更多地利用我们处于贝叶斯框架中的事实,探索我们预测中的不确定性。由于我们的预测是分类的,我们不能简单地计算后验预测标准差。相反,我们计算卡方统计量,它告诉我们样本是多么均匀。越是统一,我们的不确定性就越高。我不太确定这是否是最好的方法,如果有一个更成熟的方法我不知道,请留下评论。\n", "\n", "正如我们所见,当模型出错时,答案的不确定性要大得多(即提供的答案更加一致)。你可能会说,你从一个常规的人工神经网络得到的多项式预测得到了同样的效果,然而,[事实并非如此](http://mlg.eng.cam.ac.uk/yarin/blog_3d801aa532c1ce.html)。" ] }, { "cell_type": "code", "execution_count": null, "id": "553cdee0", "metadata": {}, "outputs": [], "source": [ "miss_class = np.where(y_test != y_pred)[0]\n", "corr_class = np.where(y_test == y_pred)[0]" ] }, { "cell_type": "code", "execution_count": null, "id": "74b9fd4d", "metadata": {}, "outputs": [], "source": [ "preds = pd.DataFrame(ppc['out']).T" ] }, { "cell_type": "code", "execution_count": null, "id": "311c4f92", "metadata": {}, "outputs": [], "source": [ "chis = preds.apply(lambda x: chisquare(x).statistic, axis='columns')" ] }, { "cell_type": "code", "execution_count": null, "id": "203e3f10", "metadata": {}, "outputs": [], "source": [ "sns.distplot(chis.loc[miss_class].dropna(), label='Error')\n", "sns.distplot(chis.loc[corr_class].dropna(), label='Correct')\n", "plt.legend()\n", "sns.despine()\n", "plt.xlabel('Chi-Square statistic');" ] }, { "cell_type": "markdown", "id": "dc2f2832", "metadata": {}, "source": [ "### 3.6 结论\n", "\n", "通过连接 `Lasagne` 和 `PyMC3` ,并使用 mini-batch ADVI 在大小适中的复杂数据集(MNIST)上训练贝叶斯神经网络,我们朝着实际问题的贝叶斯深度学习迈出了一大步。\n", "\n", " `Lasagne` 开发人员设计了 API,使得为这个不常见的应用程序集成变得微不足道,这是他们的荣幸。他们也非常乐于助人,乐于助人。\n", "\n", "最后,我还认为这显示了 `PyMC3` 的好处。通过依赖一种常用语言(Python)和抽象计算后端( `Theano` ),我们能够非常轻松地利用该生态系统的强大功能,并以创建 `PyMC3` 时从未想到的方式使用 `PyMC3` 。我期待着将其扩展到新的领域。\n", "\n", "这篇博文写在一本 Jupyter 笔记本上。你可以在这里访问 [笔记本](https://github.com/twiecki/twiecki.github.io/blob/master/downloads/notebooks/bayesian_neural_network_ `Lasagne` .ipynb),并在 Twitter 上关注 [我](https://twitter.com/twiecki) 以了解最新情况。\n", "\n", "## 4 致谢\n", "\n", "`Taku Yoshioka` 在 `PyMC3` 中对最初的 ADVI 实现做了大量工作。我还要感谢斯坦兄弟(特别是阿尔普·库库克尔比尔和丹尼尔·李)推导出了 ADVI 并教给我们关于它的知识。还要感谢克里斯·丰内斯贝克、安德鲁·坎贝尔、吉冈拓谷和皮达尔·科伊尔对早期草稿的有用评论" ] } ], "metadata": { "jupytext": { "formats": "ipynb,md:myst", "text_representation": { "extension": ".md", "format_name": "myst", "format_version": 0.13, "jupytext_version": "1.11.5" } }, "kernelspec": { "display_name": "Python 3", "language": "python3", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.12" }, "source_map": [ 13, 75, 102, 108, 157, 169, 174, 180, 186, 188, 192, 196, 200, 210, 214, 222, 224, 233, 239, 245, 249, 258, 264, 273, 283, 293, 297, 301, 303, 321, 336, 342, 401, 406, 414, 454, 460, 469, 475, 518, 524, 528, 532, 537, 541, 543, 551, 570, 576, 578, 582, 584, 592, 637, 639, 647, 652, 656, 660, 666 ] }, "nbformat": 4, "nbformat_minor": 5 }