点参考数据的贝叶斯建模软件spBayes

原文： Finley, A. O., Banerjee, S., & E.Gelfand, A. (2015). SpBayes for Large Univariate and Multivariate Point-Referenced Spatio-Temporal Data Models. Journal of Statistical Software, 63(13). https://doi.org/10.18637/jss.v063.i13

Andrew O. Finley，密歇根州立大学
Sudipto Banerjee，加州大学洛杉矶分校
Alan E. Gelfand，杜克大学

1 模型框架的定义

贝叶斯高斯空间回归模型是一个分层建模框架：

p(\boldsymbol{\theta}) \times \mathcal{N}(\boldsymbol{\beta} | \boldsymbol{\mu}_{\beta},\Sigma_{\beta}) \times \mathcal{N}(\boldsymbol{\alpha} | 0, \boldsymbol{K}(\boldsymbol{\theta})) \times \mathcal{N} (\boldsymbol{y} | \boldsymbol{X \beta} + \boldsymbol{Z}(\boldsymbol{\theta}) \boldsymbol{\alpha}, \boldsymbol{D}(\boldsymbol{\theta}) ) \tag{1}

上式从右往左看：

$\\boldsymbol{y}$ 是一个 $n \times 1$ 维的观测向量（可能分布并不规则）
$\boldsymbol{X}$ 是已知的 $n \times p$ 维回归变量矩阵（通常 $p < n$ ，否则无法得出最优解）
$\boldsymbol{K}(\boldsymbol{\theta})$ 是由空间过程参数 $\boldsymbol{\theta}$ 索引（或定义）的随机效应协方差矩阵，其大小为是 $r \times r$
$\boldsymbol{D}(\boldsymbol{\theta})$ 是由空间过程参数 $\boldsymbol{\theta}$ 索引（或定义）的白噪声协方差矩阵，其大小为 $n \leq n$
$\boldsymbol{Z}(\boldsymbol{\theta})$ 是一个由空间过程参数 $\boldsymbol{\theta}$ 索引（或定义）的 $n \times r$ 降秩矩阵，其中 $r \leq n$ 为秩
随机向量 $\boldsymbol{\alpha} \sim \mathcal{N}(\boldsymbol{0},\boldsymbol{K}(\boldsymbol{\theta}))$ 为一个 $r \times 1$ 的向量，服从均值为 $0$ 的多元高斯分布
斜率 $\boldsymbol{\beta} \sim \mathcal{N}(\boldsymbol{\mu}_{\beta},\boldsymbol{\Sigma}_{\beta})$ 是一个 $p \times 1$ 的向量，其中均值向量 $\boldsymbol{\mu}_{\beta}$ 和协方差矩阵 $\boldsymbol{\Sigma}_{\beta}$ ，服从多元高斯分布
空间过程参数 $\boldsymbol{\theta}$ 的先验为 $\boldsymbol{\theta} \sim p(\boldsymbol{theta})$

模型解释：

$\boldsymbol{y,\beta,\alpha,\theta}$ 是模型中的随机变量，其中 $\boldsymbol{y}$ 为数据，通常被假设为高斯分布（但不限于），但模型参数 $\boldsymbol{\beta,\alpha,\theta}$ 大部分情况下被假设为高斯分布，也是模型推断的主要对象。
$\boldsymbol{K}(\boldsymbol{\theta})，\boldsymbol{D}(\boldsymbol{\theta})，\boldsymbol{Z}(\boldsymbol{\theta})$ 可以被视为在指定参数 $\boldsymbol{\theta}$ 时的某种确定性运算结果，而不是随机变量。
之所以被称为分层模型，是因为随机效应变量 $\boldsymbol{\alpha}$ 同样被 $\boldsymbol{\theta}$ 参数化，而参数 $\boldsymbol{\theta}$ 自身也被视为随机变量
注意：模型中并没有看到空间信息的任何表示，Why？因为空间信息被隐含在随机效应的协方差矩阵定义中了，通常和距离有关。

2 模型的任务

根据观测数据 $\boldsymbol{y}$ 推断参数 $\{ \boldsymbol{\beta,\alpha,\theta} \}$ 的概率分布，然后给出新观测 $y^*$ 预测值。

也就是说，预测任务涉及从 $\{ \boldsymbol{\beta,\alpha,\theta} \}$ 的后验（边缘）分布中采样，直接计算算法复杂度比较高，通常需要对稠密矩阵求逆和做乘法，还需要计算矩阵行列式。在软件开发中，需要注意避免冗余操作并确保数值稳定性。因此，在随后的部分中，我们将描述如何使用 Cholesky 分解、求解三角系统以及最小化昂贵的矩阵运算（例如，密集矩阵乘法）来执行所有计算。

3 模型的推断

过程参数 $\boldsymbol{\theta}$ 的采样

从 公式 1 中采样，可以采用 MCMC 方法，尤其是吉布斯采样和随机游走 Metropolis 相结合。

why not NUTS， NUTS has been proposed in 2011。

斜率 $\boldsymbol{\beta}$ 和随机效应 $\boldsymbol{\alpha}$ 的采样

4 模型的实例

下面，在上述贝叶斯高斯空间回归总体框架下，介绍几种比较常见的具体模型。首先设定一些基础约束：

白噪声矩阵 $\boldsymbol{D}(\boldsymbol{\theta})$ 总是被认为是对角线或块对角线（对于多元模型）。
假设空间随机效应 $\boldsymbol{\alpha}$ 来自空间过程的部分实现，空间协方差矩阵 $\boldsymbol{K}(\boldsymbol{\theta})$ 由协方差函数构造指定该空间过程。准确地说，如果 $\left\{w(s): s \in \Re^d\right\}$ 是具有正定协方差函数的高斯空间过程 $C(\boldsymbol{s}, \boldsymbol{t } ; \boldsymbol{\theta})$ （参见，例如，Bochner 1955）如果 $\left\{\boldsymbol{s}_1, \boldsymbol{s}_2, \ldots, \boldsymbol{s}_r\right \}$ 是 $\mathcal{D}$ 中任意 $r$ 个位置的集合，则 $\boldsymbol{\alpha}=\left(w\left(\boldsymbol{s}_1\right), w\left (\boldsymbol{s}_2\right), \ldots, w\left(\boldsymbol{s}_r\right)\right)^{\top}$ 和 $\boldsymbol{K}(\boldsymbol{\theta} )$ 是它的 $r \times r$ 协方差矩阵。

4.1 单变量（满秩）高斯空间回归模型（Gaussian Spatial Regression）

地统计模型通常用有空间参考的预测变量 $\boldsymbol{x}(\boldsymbol{s})$ （一个 $p \times 1$ 的向量）来回归相应空间参考的响应变量 $y(\boldsymbol{s})$ 为：

y(\boldsymbol{s})=\boldsymbol{x}(\boldsymbol{s})^{\top} \boldsymbol{\beta}+w(\boldsymbol{s})+\varepsilon(\boldsymbol{s} ) \tag{12}

其中:

$s \in \mathcal{D} \subseteq \Re^2$ 是一个位置。
残差包括两部分，其中一个是空间过程 $w(s)$ ，另外一个是独立的白噪声过程 $\varepsilon(s)$ 。
白噪声过程 $\varepsilon(s)$ 捕获测量误差或细微的尺度变化。对于任何 $n$ 个位置的集合，比如 $\mathcal{S}=\left\{\boldsymbol{s}_1, \ldots, \boldsymbol{s}_n\right\}$ ，我们假设 $\varepsilon\left(s_i\right)$ 独立同分布且服从正态分布 $N\left(0, \tau^2\right)$ ，其中 $\tau^2$ 被称为块金。
空间过程 $w\left(\boldsymbol{s}_i\right)$ 提供对均值的局部调整（具有结构化的相关性）并通过空间模式来捕获不可测量或不可观测回归量的随机效应。

常用设置：

平稳性假设：通常假设空间过程是平稳的。这意味着相关性 $C(\boldsymbol{s}, \boldsymbol{t})=C(\boldsymbol{s}-\boldsymbol{t})$ ，即相关性是站点之间距离的函数。
各向同性假设：更进一步，指定 $C(\boldsymbol{s}, \boldsymbol{t})=C(\|s-\boldsymbol{t}\|)$ ，其中 $|s-\boldsymbol{t}| $ 是站点 $s$ 和 $\boldsymbol{t}$ 之间的欧几里得距离。
相关函数：我们进一步依据空间过程参数来指定相关性的参数化形式为 $C(\boldsymbol{s}, \boldsymbol{t})=$ $\sigma^2 \rho(\boldsymbol{s}, \boldsymbol{t} ; \boldsymbol{\phi})$ ，其中 $\boldsymbol{\phi}$ ， $\rho(\cdot ; \boldsymbol{\phi})$ 是相关函数， $\phi$ 包括量化衰减率和 $w(s)$ 表面平滑度的参数。 $\operatorname{Var}(w(s))=\sigma^2$ 表示一个空间方差分量。除了常采用的外形式为 $\rho(\boldsymbol{s}, \boldsymbol{t} ; \boldsymbol{\phi})=\exp (-\phi\|\boldsymbol{s}-\boldsymbol{t}\| )$ 的指数族函数，和形式为 $\rho(\boldsymbol{s}, \boldsymbol{t} ; \boldsymbol{\phi})=\exp \left(-\phi\|\boldsymbol{s}-\ boldsymbol{t}\|^\alpha\right)$ 的幂指数族函数外，还可以采用如下的 Matérn 相关函数：

\rho(\|s-\boldsymbol{t}\| ; \phi)=\frac{1}{2^{\nu-1} \Gamma(\nu)}(\|s-\boldsymbol{t} \| \phi)^\nu \mathcal{K}_\nu(\|s-\boldsymbol{t}\| \phi) ; \quad \phi>0, \nu>0 \tag{13}

其中 $\phi=\{\phi, \nu\}$ ， $\phi$ 控制空间相关性的衰减率， $\nu$ 控制空间过程的平滑度。具体来说，如果 $\nu$ 介于正整数 $m$ 和 $(m+1)$ 之间，则空间过程 $w(\boldsymbol{s})$ 是 $m$ 次但不是 $ m+1$ 次均方可微的。此外， $\Gamma$ 为常用的 Gamma 函数，而 $\mathcal{K}_\nu$ 是阶为 $\nu$ 的二类修正的贝塞尔函数。

从 (12) 构建的层次模型作为 (1) 的一个特例出现，其中：

$\boldsymbol{y}$ 是元素为 $y\left(\boldsymbol{s}_i\right)$ 的 $n \times 1$ 向量
$\boldsymbol{X}$ 是行为 $\boldsymbol{x}\left(\boldsymbol{s}_i\right)^{\top}$ 的 $n \times p$ 矩阵
$\boldsymbol{\alpha}$ 是元素为 $w\left(\boldsymbol{s}_i\right), \boldsymbol{Z}(\boldsymbol{\theta})=\boldsymbol{I}_n$ 的 $n \times 1$ 向量
$\boldsymbol{ K}(\boldsymbol{\theta})$ 是元素为 $C\left(\boldsymbol{s}_i, \boldsymbol{s}_j ; \boldsymbol{\theta}\right)$ 的 $n \times n$ 矩阵
$\boldsymbol{D}(\boldsymbol{\theta})=\tau^2 \boldsymbol{I}_n$

可以用 $\boldsymbol{\theta}$ 表示 $\boldsymbol{K}(\boldsymbol{\theta})$ 和 $\boldsymbol{D}(\boldsymbol{\theta})$ 中的过程参数集合。如果采用 公式 (13) 中的 Matérn 协方差函数，就可以得到 $\boldsymbol{\theta}=\left\{\sigma^2, \phi, \nu, \tau^2\right\}$ 。

4.2 单变量（低秩）高斯预测过程模型（Predictive Process Models）

可以由用户在 公式 (1) 的模型框架内选择和设定低秩模型，即 $r \ll n$ 。给定 第 4.1 节 相同的建模场景，但用户可以选择 $r$ 个位置， $\mathcal{S}^* = \{\boldsymbol{s}^*_1, \{\boldsymbol{s}^*_2, \ldots, \{\boldsymbol{s}^*_r\}$ ，同时将空间过程定义为：

\tilde{w}(\boldsymbol{s}) = \mathbb{E}[w(\boldsymbol{s}) \mid w(\boldsymbol{s}^*_i), i=1,2,\ldots,r] \tag{14}

[@Banerjee2008Gaussian] 将 ̃ $\tilde w()\boldsymbol{s}$ 称为预测过程。将 公式 (12) 中的 $w(\boldsymbol{s})$ 替换为 ̃ $\tilde w(\boldsymbol{s})$ 会产生一个单变量高斯空间回归模型的变种。

预测过程实质上产生了一个低秩模型，可以转换为 公式（1）。例如，如果将 $r × 1$ 的随机向量 $\boldsymbol{\alpha}$ 的元素取为 $w\left(s_i^*\right)$ ，则 公式 (12) 对应的预测过程变种可以从 公式 (1) 中获得，其中：

$\boldsymbol{D}(\boldsymbol{\theta})=\tau^2 \boldsymbol{I}$
$\boldsymbol{K}(\boldsymbol{\theta})=\boldsymbol{C}^*(\boldsymbol{\theta})$
$\boldsymbol{Z}(\boldsymbol{\theta})=\mathcal{C}(\boldsymbol{\theta})^{\top} \boldsymbol{C}^*(\boldsymbol{\theta})^{-1}$ , 其中 $\mathcal{C}(\boldsymbol{\theta})^{\top}$ 大小为 $n \times r$ ，其元素为 $w\left(\boldsymbol{s}_i\right)^{\text {'s }}$ 和 $w\left(\boldsymbol{s}_j^*\right)^{\prime}$ 之间的协方差，而 $\boldsymbol{C}^*(\boldsymbol{\theta})^{-1}$ 是 $w\left(\boldsymbol{s}_i^*\right)$ 的 $r \times r$ 协方差矩阵。

当使用第 2.3 节中描述的通用低秩模型计算策略时，可以等价地通过令 $\boldsymbol{K}(\boldsymbol{\theta})=C^*(\boldsymbol {\theta})^{-1}$ 和 $\boldsymbol{Z}(\boldsymbol{\theta})=\mathcal{C}(\boldsymbol{\theta})^{\top}$ 实现参数化。这可以避免计算 $C^*(\theta)^{-1}$ ，虽然该计算对于低秩模型来说可能并不昂贵，但它可能会根据协方差函数的不同选择而在数值上变得不稳定。现在 $\boldsymbol{\alpha} \sim N\left(\mathbf{0}, \boldsymbol{C}^*(\boldsymbol{\theta})^{-1}\right)$ 不再是在结点上的过程实现，但它仍然是满足合法概率定律的 $r \times 1$ 随机向量。如果需要结点上的空间效应，可以很容易地从 $\boldsymbol{\alpha}$ 和 $\boldsymbol{\theta}$ 的后验样本中获得 $\boldsymbol{C}^*(\boldsymbol{ \theta}) \boldsymbol{\alpha}$ .

我们还提供了对预测过程的改进，它试图通过调整残差方差来从低秩近似中捕获残差（例如，参见 Finley、Sang、Banerjee 和 Gelfand 2009）。满秩模型（公式 12）和低秩模型的空间协方差矩阵之间的差异是： $\boldsymbol{C}_w(\boldsymbol{\theta})-\boldsymbol{Z}(\boldsymbol{\theta} ) \boldsymbol{K}(\boldsymbol{\theta}) \boldsymbol{Z}(\boldsymbol{\theta})^{\top}$ ，其中 $\boldsymbol{C}_w(\boldsymbol{\theta}) $ 是 公式 (12) 的空间随机效应的 $n \times n$ 协方差矩阵。

修改后的预测过程模型通过将其对角元素吸收到 $\boldsymbol{D}(\boldsymbol{\theta})$ 中来近似这个“残差”协方差矩阵。因此， $\boldsymbol{D}(\boldsymbol{\theta})=\operatorname{diag}\left\{\boldsymbol{C}_w(\boldsymbol{\theta})-\boldsymbol{Z}(\boldsymbol{ \theta}) \boldsymbol{K}(\boldsymbol{\theta}) \boldsymbol{Z}(\boldsymbol{\theta})^{\top}\right\}+$ $\tau^2 \boldsymbol{I }_n$ ，其中 $\operatorname{diag}(\boldsymbol{A})$ 表示由 $\boldsymbol{A}$ 的对角元素形成的对角矩阵。 (1) 中 $\boldsymbol{Z}(\boldsymbol{\theta})、\boldsymbol{K}(\boldsymbol{\theta})$ 和 $\boldsymbol{\alpha}$ 的其余规范与对于预测过程。

我们经常将修改后的预测过程称为 $\tilde{w}_{\varepsilon}(s)=\tilde{w}(s)+\tilde{\epsilon}(s)$ ，其中 $\tilde{w }(s)$ 是预测过程， $\tilde{\epsilon}(s)$ 是一个均值和方差为零的独立过程，由 $\operatorname{var}\{w(s)\}-\operatorname{ 给出var}\{\tilde{w}(\boldsymbol{s})\}$ .就 $w(\boldsymbol{s})$ 的协方差函数而言， $\tilde{\epsilon}(\boldsymbol{s})$ 的方差为 $C(s, s ; \boldsymbol{\theta} )-\boldsymbol{c}(\boldsymbol{s}, \boldsymbol{\theta})^{\top} \boldsymbol{C}^*(\boldsymbol{\theta})^{-1} \boldsymbol{c }(\boldsymbol{s})$ ，其中 $\boldsymbol{c}(\boldsymbol{s})$ 是 $w(\boldsymbol{s})$ 和 $w 之间的协方差 $r \times 1$ 向量\left(s^* j\right)$ 作为它的条目。此外， $\boldsymbol{w}^*、\tilde{\boldsymbol{w}}$ 和 $\tilde{\boldsymbol{w}}_\epsilon$ 表示 $w\left(s_i^*\right )$ 在 $r$ 结上， $\tilde{w}\left(s_i\right)$ 在 $n$ 个位置和 $\tilde{w}_\epsilon\left(s_i\right) $ 分别在 $n$ 个位置上。

低秩模型的一个关键问题是结点的选择。给定一个计算上可行的 $r$ ，可以在域范围内的使用网格来固定结点位置、或采用空间覆盖设计、或采用以最小化预测方差为准则的更复杂方法。在实践中，如果观测的位置均匀分布在整个域中，我们发现基于使用网格、空间覆盖设计或其他准则选择的结点位置的推断差异相对较小。相反，结点位置的数量对参数估计和后续预测的影响更大。因此，我们经常在计算可行的范围内研究推断对不同结点强度的敏感性。

5 多变量高斯空间回归模型

Multivariate spatial regression models consider $m$ point-referenced outcomes that are regressed, at each location, on a known set of predictors

y_j(\boldsymbol{s})=\boldsymbol{x}_j(\boldsymbol{s})^{\top} \boldsymbol{\beta}_j+w_j(\boldsymbol{s})+\epsilon_j(\boldsymbol{s}), \text { for } j=1,2, \ldots, m,

where $\boldsymbol{x}_j(\boldsymbol{s})$ is a $p_j \times 1$ vector of predictors associated with outcome $j, \boldsymbol{\beta}_j$ is the $p_j \times 1$ slope, $w_j(s)$ and $\epsilon_j(s)$ are the spatial and random error processes associated with outcome $y_j(s)$ . Customarily, we assume the unstructured residuals $\varepsilon(s)=\left(\epsilon_1(\boldsymbol{s}), \epsilon_2(\boldsymbol{s}), \ldots, \epsilon_m(\boldsymbol{s})\right)^{\top}$ follow a zero-centered multivariate normal distribution with zero mean and an $m \times m$ dispersion matrix $\Psi$ .

Spatial variation is modeled using an $m \times 1$ Gaussian process $\boldsymbol{w}(\boldsymbol{s})=\left(w_1(\boldsymbol{s}), \ldots, w_m(\boldsymbol{s})\right)^{\top}$ , specified by a zero mean and a cross-covariance matrix $C_w(s, t)$ with entries being covariance between $w_i(\boldsymbol{s})$ and $w_j(\boldsymbol{t})$ . spBayes uses the linear model of coregionalization (LMC) to specify the cross-covariance. This assumes that $C_w(\boldsymbol{s}, \boldsymbol{t})=\boldsymbol{A} \boldsymbol{M}(\boldsymbol{s}, \boldsymbol{t}) \boldsymbol{A}^{\top}$ , where $\boldsymbol{A}$ is $m \times m$ lowertriangular and $\boldsymbol{M}(\boldsymbol{s}, \boldsymbol{t})$ is $m \times m$ diagonal with each diagonal entry a spatial correlation function endowed with its own set of process parameters.

Suppose we have observed the $m$ outcomes in each of $b$ locations. Let $\boldsymbol{y}$ be $n \times 1$ , where $n=m b$ , obtained by stacking up the $\boldsymbol{y}\left(\boldsymbol{s}_i\right)$ 's over the $b$ locations. Let $\boldsymbol{X}$ be the $n \times p$ matrix of predictors associated with $\boldsymbol{y}$ , where $p=\sum_{j=1}^m p_j$ , and $\boldsymbol{\beta}$ is $p \times 1$ with the $\boldsymbol{\beta}_j$ 's stacked correspondingly. Then, the hierarchical multivariate spatial regression models arise from (1) with the following specifications: $\boldsymbol{D}(\boldsymbol{\theta})=\boldsymbol{I}_b \otimes \boldsymbol{\Psi}, \boldsymbol{\alpha}$ is $n \times 1$ formed by stacking the $\boldsymbol{w}_i$ 's and $\boldsymbol{K}(\boldsymbol{\theta})$ is $n \times n$ partitioned into $m \times m$ blocks given by $\boldsymbol{A} \boldsymbol{M}\left(\boldsymbol{s}_i, \boldsymbol{s}_j\right) \boldsymbol{A}^{\top}$ . The positive-definiteness of $\boldsymbol{K}(\boldsymbol{\theta})$ is ensured by the linear model of coregionalization (Gelfand, Schmidt, Banerjee, and Sirmans 2004). spBayes also offers low rank multivariate models involving the predictive process and the modified predictive process that can be estimated using strategies analogous to Section 2.3. Both the full rank multivariate Gaussian model and its predictive process counterpart are implemented in the spMvLM function. Notation and additional background for fitting these models is given by Banerjee et al. (2008) and Finley et al. (2009) as well as example code in the spMvLM documentation examples.

6 非高斯模型

Two typical non-Gaussian first stage settings are implemented in spBayes: $i$ ) binary response at locations modeled using logit or probit regression, and; ii) count data at locations modeled using Poisson regression. Diggle, Moyeed, and Tawn (1998) unify the use of generalized linear models in spatial data contexts. See also Lin, Wahba, Xiang, Gao, Klein, and Klein (2000), Kammann and Wand (2003) and Banerjee et al. (2004). Here we replace the Gaussian likelihood in (1) with the assumption that $\mathrm{E}[y(s)]$ is linear on a transformed scale, i.e., $\eta(s) \equiv$ $g(\mathrm{E}(y(\boldsymbol{s})))=\boldsymbol{x}(\boldsymbol{s})^{\top} \boldsymbol{\beta}+w(\boldsymbol{s})$ where $g(\cdot)$ is a suitable link function. We refer to these as spatial generalized linear models (GLMs).

With the Gaussian first stage, we can marginalize over the spatial effects and implement our MCMC over a reduced parameter space. With a binary or Poisson first stage, such marginalization is precluded and we have to update the spatial effects in running our Gibbs sampler. We offer both the traditional random-walk Metropolis as well as the adaptive random-walk Metropolis (Roberts and Rosenthal 2009) to update the spatial effects. spBayes also provides low-rank predictive process versions for spatial GLMs. The analogue of (1) is

p(\boldsymbol{\theta}) \times N\left(\boldsymbol{\beta} \mid \boldsymbol{\mu}_\beta, \boldsymbol{\Sigma}_\beta\right) \times N(\boldsymbol{\alpha} \mid \mathbf{0}, \boldsymbol{K}(\boldsymbol{\theta})) \times \prod_{i=1}^n f\left(y\left(\boldsymbol{s}_i\right) \mid \eta\left(\boldsymbol{s}_i\right) \equiv \boldsymbol{x}\left(\boldsymbol{s}_i\right)^{\top} \boldsymbol{\beta}+\boldsymbol{z}_i(\boldsymbol{\theta})^{\top} \boldsymbol{\alpha}\right),

where $f(\cdot)$ represents a Bernoulli or Poisson density with $\eta(\boldsymbol{s})$ represents the mean of $y(\boldsymbol{s})$ on a transformed scale. This model and its predictive process counterpart is implemented in the spgLM function. These models are extended to accommodate multivariate settings, outlined in Section 5, using the spMvGLM function.

7 动态时空模型

There are many different flavors of spatio-temporal data and an extensive statistical literature that addresses the most common settings. The approach adopted here applies to the setting where space is viewed as continuous, but time is assumed to be discrete. Put another way, we view the data as a time series of spatial process realizations and work in the setting of dynamic models. Building upon previous work in the setting of dynamic models by West and Harrison (1997), several authors, including Stroud, Müler, and Sansó (2001) and Gelfand, Banerjee, and Gamerman (2005), proposed dynamic frameworks to model residual spatial and temporal dependence. These proposed frameworks are flexible and easily extended to accommodate nonstationary and multivariate outcomes.

Dynamic linear models, or state-space models, have gained tremendous popularity in recent years in fields as disparate as engineering, economics, genetics, and ecology. They offer a versatile framework for fitting several time-varying models (West and Harrison 1997). Gelfand et al. (2005) adapted the dynamic modeling framework to spatio-temporal models with spatially varying coefficients. Alternative adaptations of dynamic linear models to space-time data can be found in Stroud et al. (2001).

7.1. Model specification

spBayes offers a relatively simple version of the dynamic models in Gelfand et al. (2005). Suppose, $y_t(\boldsymbol{s})$ denotes the observation at location $s$ and time $t$ . We model $y_t(\boldsymbol{s})$ through a measurement equation that provides a regression specification with a space-time varying intercept and serially and spatially uncorrelated zero-centered Gaussian disturbances as measurement error $\epsilon_t(\boldsymbol{s})$ . Next a transition equation introduces a $p \times 1$ coefficient vector, say $\boldsymbol{\beta}_t$ , which is a purely temporal component (i.e., time-varying regression parameters), and a spatio-temporal component $u_t(\boldsymbol{s})$ . Both these are generated through transition equations, capturing their Markovian dependence in time. While the transition equation of the purely temporal component is akin to usual state-space modeling, the spatio-temporal component is generated using Gaussian spatial processes. The overall model is written as

\begin{aligned} y_t(\boldsymbol{s}) &=\boldsymbol{x}_t(\boldsymbol{s})^{\top} \boldsymbol{\beta}_t+u_t(\boldsymbol{s})+\epsilon_t(\boldsymbol{s}), \quad \epsilon_t(\boldsymbol{s}) \stackrel{i n d .}{\sim} N\left(0, \tau_t^2\right) \\ \boldsymbol{\beta}_t &=\boldsymbol{\beta}_{t-1}+\boldsymbol{\eta}_t, \quad \boldsymbol{\eta}_t \stackrel{i . i . d .}{\sim} N\left(0, \boldsymbol{\Sigma}_\eta\right) \\ u_t(\boldsymbol{s}) &=u_{t-1}(\boldsymbol{s})+w_t(\boldsymbol{s}), \quad w_t(\boldsymbol{s}) \stackrel{i n d .}{\sim} G P\left(\mathbf{0}, C_t\left(\cdot, \boldsymbol{\theta}_t\right)\right), \quad t=1,2, \ldots, N_t, \end{aligned}

where the abbreviations ind. and i.i.d are independent and independent and identically distributed, respectively. Here $\boldsymbol{x}_t(\boldsymbol{s})$ is a $p \times 1$ vector of predictors and $\boldsymbol{\beta}_t$ is a $p \times 1$ vector of coefficients. In addition to an intercept, $\boldsymbol{x}_t(\boldsymbol{s})$ can include location specific variables useful for explaining the variability in $y_t(\boldsymbol{s})$ . The $G P\left(\mathbf{0}, C_t\left(\cdot, \boldsymbol{\theta}_t\right)\right)$ denotes a spatial Gaussian process with covariance function $C_t\left(\cdot ; \boldsymbol{\theta}_t\right)$ . We customarily specify $C_t\left(\boldsymbol{s}_1, \boldsymbol{s}_2 ; \boldsymbol{\theta}_t\right)=\sigma_t^2 \rho\left(\boldsymbol{s}_1, \boldsymbol{s}_2 ; \phi_t\right)$ , where $\boldsymbol{\theta}_t=\left\{\sigma_t^2, \phi_t\right\}$ and $\rho(\cdot ; \phi)$ is a correlation function with $\phi$ controlling the correlation decay and $\sigma_t^2$ represents the spatial variance component. We further assume $\boldsymbol{\beta}_0 \sim N\left(\boldsymbol{m}_0, \boldsymbol{\Sigma}_0\right)$ and $u_0(s) \equiv 0$ , which completes the prior specifications leading to a well-identified Bayesian hierarchical model with reasonable dependence structures. In practice, estimation of model parameters are usually very robust to these hyper-prior specifications. Also note that (17) reduces to a simple spatial regression model for $t=1$ .

We consider settings where the inferential interest lies in spatial prediction or interpolation over a region for a set of discrete time points. We also assume that the same locations are monitored for each time point resulting in a space-time matrix whose rows index the locations and columns index the time points, i.e., the $(i, j)$ -th element is $y_j\left(s_i\right)$ . Our algorithm will accommodate the situation where some cells of the space-time data matrix may have missing observations, as is common in monitoring environmental variables.

Conducting full Bayesian inference for (17) is computationally onerous and spBayes also offers a modified predictive process counterpart of (17). This is achieved by replacing $u_t(\boldsymbol{s})$ in (17) with $\tilde{u}_t(\boldsymbol{s})=\sum_{k=1}^t\left[\tilde{w}_k(\boldsymbol{s})+\tilde{\epsilon}_k(\boldsymbol{s})\right]$ , where $\tilde{w}_k(\boldsymbol{s})$ is the predictive process as defined in (14) and the “adjustment” $\tilde{\epsilon}_t(s)$ compensates for the oversmoothing by the conditional expectation component and the consequent underestimation of spatial variability (see Finley, Banerjee, and Gelfand 2012) for details.