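【Reading note】 Spatial data sets are usually classified into three types: point-referenced data, areal data, and point pattern data. This article focuses on the formal definition of areal data.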

【Citation information】

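By convention, spatial data sets are classified into three basic types:

  • Point-referenced data

    where $$Y(\mathbf{s})$$ is a random vector at a location $$\mathbf{s} \in \mathit{R}^r$$, and $$\mathbf{s}$$ varies continuously over $$D$$, a fixed subset of $$\mathit{R}^r$$ that contains an $$r$$-dimensional rectangle of positive volume;

  • Areal data

    where $$D$$ is again a fixed subset of $$\mathit{R}^r$$, of regular or irregular shape, but is now partitioned into a finite number of areal units with well-defined boundaries;

  • Point pattern data

    where $$D$$ is itself random; its index set gives the locations of the random events that make up the spatial point pattern. $$Y(\mathbf{s})$$ itself can simply equal $$1$$ for all $$\mathbf{s} \in D$$ (indicating that an event occurred), or it may carry additional covariate information (producing a marked point pattern process).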

In models for areal data, the geographic regions or blocks (zip codes, counties, etc.) are denoted by $B_{i}$, and the data are typically sums or averages of variables over these blocks. To introduce spatial association, we define a neighborhood structure based on the arrangement of the blocks in the map. Once the neighborhood structure is defined, models resembling autoregressive time series models are considered. Two very popular models that incorporate such neighborhood information are the simultaneously and conditionally autoregressive models (abbreviated SAR and CAR), originally developed by Whittle (1954) and Besag (1974), respectively. The SAR model is computationally convenient for use with likelihood methods. By contrast, the CAR model is computationally convenient for Gibbs sampling used
in conjunction with Bayesian model fitting, and in this regard is often used to incorporate spatial correlation through a vector of spatially varying random effects $\phi=\left(\phi_{1}, \ldots, \phi_{n}\right)^{T}$. For example, writing $Y_{i} \equiv Y\left(B_{i}\right)$, we might assume $Y_{i} \stackrel{ind}{\sim} N\left(\phi_{i}, \sigma^{2}\right)$, and then impose the CAR model
$$
\phi_{i} \mid \phi_{(-i)} \sim N\left(\mu+\sum_{j=1}^{n} a_{i j}\left(\phi_{j}-\mu\right), \tau_{i}^{2}\right) \tag{1.3}
$$
where $\phi_{(-i)}=\left\{\phi_{j}: j \neq i\right\}$, $\tau_{i}^{2}$ is the conditional variance, and the $a_{i j}$ are known or unknown constants such that $a_{i i}=0$ for $i=1, \ldots, n$. Letting $A=\left(a_{i j}\right)$ and $M=\operatorname{Diag}\left(\tau_{1}^{2}, \ldots, \tau_{n}^{2}\right)$, by Brook's Lemma (cf. Section 4.2), we can show that
$$
p(\phi) \propto \exp \left\{-(\phi-\mu \mathbf{1})^{T} M^{-1}(I-A)(\phi-\mu \mathbf{1}) / 2\right\}
$$
where $\mathbf{1}$ is an $n$-vector of 1's, and $I$ is an $n \times n$ identity matrix.
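As a numerical sanity check on the link between the conditional specification and the joint density, the sketch below builds a small example and confirms that a Gaussian with precision matrix $M^{-1}(I-A)$ has exactly the CAR full conditionals. All numbers are illustrative assumptions, not values from the text. The example also assumes the symmetry condition $a_{ij}/\tau_i^2 = a_{ji}/\tau_j^2$, which is needed for $M^{-1}(I-A)$ to be a symmetric (and here positive definite) precision matrix. The matrix `S` and both helper functions are hypothetical names introduced for this illustration.

```python
import numpy as np

# Illustrative values only (not from the text): three regions with
# conditional variances tau_i^2 and a symmetric matrix S with zero diagonal.
tau2 = np.array([1.0, 2.0, 0.5])
S = np.array([[0.0, 0.2, 0.1],
              [0.2, 0.0, 0.3],
              [0.1, 0.3, 0.0]])
mu = 1.5

M = np.diag(tau2)
A = M @ S                                  # a_ij = tau_i^2 * s_ij, so a_ij / tau_i^2 is symmetric
Q = np.linalg.inv(M) @ (np.eye(3) - A)     # joint precision M^{-1}(I - A)

def conditional_from_joint(Q, phi, mu, i):
    """Mean and variance of phi_i | phi_(-i) for a N(mu * 1, Q^{-1}) vector."""
    others = np.arange(len(phi)) != i
    var = 1.0 / Q[i, i]
    mean = mu - var * (Q[i, others] @ (phi[others] - mu))
    return mean, var

def conditional_from_car(A, tau2, phi, mu, i):
    """Mean and variance read directly off the CAR conditional (a_ii = 0)."""
    return mu + A[i] @ (phi - mu), tau2[i]

phi = np.array([0.3, -1.2, 2.0])           # arbitrary test point
joint = [conditional_from_joint(Q, phi, mu, i) for i in range(3)]
car = [conditional_from_car(A, tau2, phi, mu, i) for i in range(3)]
print(np.allclose(joint, car))
```

The check relies only on the standard formula for Gaussian full conditionals in terms of the precision matrix; it does not reproduce the Brook's Lemma argument itself.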
A common way to construct $A$ and $M$ is to let $A=\rho \operatorname{Diag}\left(1 / w_{i+}\right) W$ and $M^{-1}=\tau^{-2} \operatorname{Diag}\left(w_{i+}\right)$. Here $\rho$ is referred to as the spatial correlation parameter, and $W=\left(w_{i j}\right)$ is a neighborhood matrix for the areal units, which can be defined as
$$
w_{i j}=\left\{\begin{array}{ll}
1 & \text{if subregions } i \text{ and } j \text{ share a common boundary, } i \neq j \\
0 & \text{otherwise}
\end{array}\right. \tag{1.5}
$$
Thus $\operatorname{Diag}\left(w_{i+}\right)$ is a diagonal matrix with $(i, i)$ entry equal to $w_{i+}=\sum_{j} w_{i j}$. Letting $\boldsymbol{\alpha} \equiv\left(\rho, \tau^{2}\right)$, the covariance matrix of $\phi$ then becomes $C(\boldsymbol{\alpha})=\tau^{2}\left[\operatorname{Diag}\left(w_{i+}\right)-\rho W\right]^{-1}$, where the inverse exists for an appropriate range of $\rho$ values; see Subsection 4.3.1.
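To make the construction concrete, here is a minimal sketch that builds $W$ for a hypothetical map of four regions arranged in a 2×2 grid, forms $C(\boldsymbol{\alpha})$, and checks it against the $A$ and $M$ route. The map, the values of $\rho$ and $\tau^2$, and the function name `car_covariance` are assumptions made for illustration. For a 0-1 adjacency matrix like this one, $|\rho|<1$ is sufficient for invertibility because $\operatorname{Diag}(w_{i+})-\rho W$ is then strictly diagonally dominant; the full admissible range is the subject of the subsection cited above.

```python
import numpy as np

def car_covariance(W, rho, tau2):
    """C(alpha) = tau^2 [Diag(w_i+) - rho W]^{-1} for a neighborhood matrix W."""
    D = np.diag(W.sum(axis=1))
    return tau2 * np.linalg.inv(D - rho * W)

# Hypothetical map: four regions in a 2x2 grid, neighbors share an edge.
#   0 1
#   2 3
W = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
rho, tau2 = 0.9, 2.0                       # assumed values
C = car_covariance(W, rho, tau2)

# Same matrix obtained through A and M as defined in the text.
w_plus = W.sum(axis=1)
A = rho * np.diag(1.0 / w_plus) @ W
M_inv = np.diag(w_plus) / tau2
C_alt = np.linalg.inv(M_inv @ (np.eye(4) - A))

# Smallest eigenvalue of Diag(w_i+) - rho W: positive for rho = 0.9,
# numerically zero for rho = 1 (the matrix becomes singular).
min_eig_proper = np.linalg.eigvalsh(np.diag(w_plus) - rho * W).min()
min_eig_rho_one = np.linalg.eigvalsh(np.diag(w_plus) - W).min()
print(np.allclose(C, C_alt), min_eig_proper, min_eig_rho_one)
```

With $\rho=1$ the matrix $\operatorname{Diag}(w_{i+})-W$ has a zero eigenvalue (its rows sum to zero), so no covariance matrix exists in that case.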

In the context of Bayesian hierarchical areal modeling, when choosing a prior distribution $\pi(\phi)$ for a vector of spatial random effects $\phi$, the CAR distribution (1.3) is often used with the 0-1 weight (or adjacency) matrix $W$ in (1.5) and $\rho=1$. While this results in an improper (nonintegrable) prior distribution, this problem is remedied by imposing a sum-to-zero constraint on the $\phi_{i}$ (which turns out to be easy to implement numerically using Gibbs sampling). In this case the more general conditional form (1.3) is replaced by
$$
\phi_{i} \mid \phi_{(-i)} \sim N\left(\bar{\phi}_{i}, \tau^{2} / m_{i}\right),
$$
where $\bar{\phi}_{i}$ is the average of the $\phi_{j \neq i}$ that are adjacent to $\phi_{i}$, and $m_{i}$ is the number of these adjacencies (see, e.g., Besag, York, and Mollié, 1991). We discuss areal models in greater detail in Chapters 4 and 6.
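The conditional form above translates almost directly into code. The sketch below computes $\bar{\phi}_i$ and $\tau^2/m_i$ from an adjacency matrix and runs one Gibbs pass over the prior alone. The text does not say how the sum-to-zero constraint is imposed; recentering the vector after each sweep is one common approach and is used here as an assumption. The function names, the 2×2 grid map, and the numerical values are hypothetical. In a full hierarchical model each update would also include the likelihood contribution of $Y_i$, which is omitted here.

```python
import numpy as np

def icar_conditional(W, phi, tau2, i):
    """Mean and variance of phi_i | phi_(-i) under the rho = 1 CAR prior."""
    m_i = W[i].sum()                 # number of regions adjacent to region i
    phi_bar_i = W[i] @ phi / m_i     # average of the adjacent phi_j (w_ii = 0)
    return phi_bar_i, tau2 / m_i

def gibbs_sweep(W, phi, tau2, rng):
    """One pass over all regions, then recenter so that sum(phi) = 0."""
    phi = phi.copy()
    for i in range(len(phi)):
        mean, var = icar_conditional(W, phi, tau2, i)
        phi[i] = rng.normal(mean, np.sqrt(var))
    return phi - phi.mean()          # assumed way of enforcing the constraint

# Hypothetical 2x2 grid map with 0-1 adjacency weights.
W = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
tau2 = 2.0
phi0 = np.array([1.0, 2.0, 3.0, 4.0])

mean0, var0 = icar_conditional(W, phi0, tau2, 0)   # neighbors of region 0 are 1 and 2
rng = np.random.default_rng(0)
phi1 = gibbs_sweep(W, phi0, tau2, rng)
print(mean0, var0, phi1.sum())
```

For region 0 the neighbors carry values 2 and 3, so the conditional mean is their average and the conditional variance is $\tau^2$ divided by the two adjacencies.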