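【Reading guide】 Spatial data sets are usually classified into three types: point-referenced data, areal data, and point pattern data. This article focuses on the formal definition of areal data models.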

【Citation information】

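By convention, spatial data sets are classified into three basic types:

  • Point-referenced data

    where $Y(\mathbf{s})$ is a random vector at a location $\mathbf{s} \in \mathbb{R}^r$, and $\mathbf{s}$ varies continuously over $D$, a fixed subset of $\mathbb{R}^r$ that contains an $r$-dimensional rectangle of positive volume;

  • Areal data

    where $D$ is again a fixed subset of $\mathbb{R}^r$, of regular or irregular shape, but is now partitioned into a finite number of areal units with well-defined boundaries;

  • Point pattern data

    where $D$ is itself random; its index set gives the locations of the random events that make up the spatial point pattern. $Y(\mathbf{s})$ itself can simply equal $1$ for all $\mathbf{s} \in D$ (indicating that an event occurred), or it may give some additional covariate information (producing a marked point pattern process).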

In models for areal data, the geographic regions or blocks (zip codes, counties, etc.) are denoted by $B_i$, and the data are typically sums or averages of variables over these blocks. To introduce spatial association, we define a neighborhood structure based on the arrangement of the blocks in the map. Once the neighborhood structure is defined, models resembling autoregressive time series models are considered. Two very popular models that incorporate such neighborhood information are the simultaneously and conditionally autoregressive models (abbreviated SAR and CAR), originally developed by Whittle (1954) and Besag (1974), respectively. The SAR model is computationally convenient for use with likelihood methods. By contrast, the CAR model is computationally convenient for Gibbs sampling used in conjunction with Bayesian model fitting, and in this regard is often used to incorporate spatial correlation through a vector of spatially varying random effects $\phi = (\phi_1, \ldots, \phi_n)^T$. For example, writing $Y_i \equiv Y(B_i)$, we might assume $Y_i \stackrel{ind}{\sim} N(\phi_i, \sigma^2)$, and then impose the CAR model

$$\phi_i \mid \phi_{(-i)} \sim N\left(\mu + \sum_{j=1}^{n} a_{ij}\left(\phi_j - \mu\right),\ \tau_i^2\right) \tag{1.3}$$

where $\phi_{(-i)} = \{\phi_j : j \neq i\}$, $\tau_i^2$ is the conditional variance, and the $a_{ij}$ are known or unknown constants such that $a_{ii} = 0$ for $i = 1, \ldots, n$. Letting $A = (a_{ij})$ and $M = \operatorname{Diag}(\tau_1^2, \ldots, \tau_n^2)$, by Brook's Lemma (cf. Section 4.2), we can show that

$$p(\phi) \propto \exp\left\{-(\phi - \mu\mathbf{1})^T M^{-1}(I - A)(\phi - \mu\mathbf{1}) / 2\right\}$$

where $\mathbf{1}$ is an $n$-vector of 1's, and $I$ is an $n \times n$ identity matrix.

A common way to construct $A$ and $M$ is to let $A = \rho \operatorname{Diag}(1 / w_{i+}) W$ and $M^{-1} = \tau^{-2} \operatorname{Diag}(w_{i+})$. Here $\rho$ is referred to as the spatial correlation parameter, and $W = (w_{ij})$ is a neighborhood matrix for the areal units, which can be defined as

$$w_{ij} = \begin{cases} 1 & \text{if subregions } i \text{ and } j \text{ share a common boundary, } i \neq j \\ 0 & \text{otherwise} \end{cases} \tag{1.5}$$

Thus $\operatorname{Diag}(w_{i+})$ is a diagonal matrix with $(i, i)$ entry equal to $w_{i+} = \sum_j w_{ij}$. Letting $\boldsymbol{\alpha} \equiv (\rho, \tau^2)$, the covariance matrix of $\phi$ then becomes $C(\boldsymbol{\alpha}) = \tau^2 \left[\operatorname{Diag}(w_{i+}) - \rho W\right]^{-1}$, where the inverse exists for an appropriate range of $\rho$ values; see Subsection 4.3.1.
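The construction above can be checked numerically. The following is a minimal sketch, not code from the source: the helper names `adjacency_from_edges` and `car_precision`, the 2 x 2 grid of regions, and the values of `rho` and `tau2` are all assumptions chosen for illustration. It builds $W$, forms $M^{-1}(I - A)$ from the definitions of $A$ and $M$, and confirms that this equals $\tau^{-2}[\operatorname{Diag}(w_{i+}) - \rho W]$.

```python
import numpy as np

def adjacency_from_edges(n, edges):
    """Build the symmetric 0-1 neighborhood matrix W from (i, j) pairs of adjacent regions."""
    W = np.zeros((n, n))
    for i, j in edges:
        if i != j:  # the diagonal stays zero: a region is not its own neighbor
            W[i, j] = W[j, i] = 1.0
    return W

def car_precision(W, rho, tau2):
    """Return M^{-1}(I - A) with A = rho Diag(1/w_i+) W and M^{-1} = Diag(w_i+) / tau2.

    Assumes every region has at least one neighbor, so that w_i+ > 0.
    """
    w_plus = W.sum(axis=1)
    A = rho * np.diag(1.0 / w_plus) @ W
    M_inv = np.diag(w_plus) / tau2
    return M_inv @ (np.eye(len(W)) - A)

# Hypothetical map: four regions in a 2 x 2 grid; regions sharing an edge are neighbors.
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
W = adjacency_from_edges(4, edges)
rho, tau2 = 0.9, 2.0  # illustrative values only

Q = car_precision(W, rho, tau2)                        # precision from the definitions
Q_closed = (np.diag(W.sum(axis=1)) - rho * W) / tau2   # closed form
C = np.linalg.inv(Q_closed)                            # covariance C(alpha)

eig_proper = np.linalg.eigvalsh(Q_closed)
eig_rho_one = np.linalg.eigvalsh(np.diag(W.sum(axis=1)) - W)  # rho = 1, tau2 = 1

print("definitions match closed form:", np.allclose(Q, Q_closed))
print("smallest eigenvalue, rho = 0.9:", eig_proper.min())
print("smallest eigenvalue, rho = 1:  ", eig_rho_one.min())
```

For this small map the precision matrix is positive definite at `rho = 0.9`, while at `rho = 1` its smallest eigenvalue is numerically zero, so the matrix cannot be inverted.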

In the context of Bayesian hierarchical areal modeling, when choosing a prior distribution $\pi(\phi)$ for a vector of spatial random effects $\phi$, the CAR distribution (1.3) is often used with the 0-1 weight (or adjacency) matrix $W$ in (1.5) and $\rho = 1$. While this results in an improper (nonintegrable) prior distribution, this problem is remedied by imposing a sum-to-zero constraint on the $\phi_i$ (which turns out to be easy to implement numerically using Gibbs sampling). In this case the more general conditional form (1.3) is replaced by

$$\phi_i \mid \phi_{(-i)} \sim N\left(\bar{\phi}_i,\ \tau^2 / m_i\right),$$

where $\bar{\phi}_i$ is the average of the $\phi_{j \neq i}$ that are adjacent to $\phi_i$, and $m_i$ is the number of these adjacencies (see, e.g., Besag, York, and Mollié, 1991). We discuss areal models in greater detail in Chapter 4 and Section 6.6.
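To make the conditional form with $\rho = 1$ concrete, here is a minimal sketch, again under stated assumptions rather than the authors' implementation. The function names `icar_conditional` and `gibbs_sweep`, the four-region map, and the starting values are hypothetical. The sweep draws from the prior conditionals only (no data term), and it enforces the sum-to-zero constraint by recentering after each pass, which is one simple way to implement it.

```python
import numpy as np

def icar_conditional(phi, W, tau2, i):
    """Mean and variance of phi_i given the other effects: N(average of neighbors, tau2 / m_i)."""
    neighbors = np.flatnonzero(W[i])  # indices j with w_ij = 1
    m_i = len(neighbors)              # number of adjacencies of region i
    return phi[neighbors].mean(), tau2 / m_i

def gibbs_sweep(phi, W, tau2, rng):
    """Update each phi_i from its conditional, then recenter so the effects sum to zero."""
    phi = phi.copy()
    for i in range(len(phi)):
        mean, var = icar_conditional(phi, W, tau2, i)
        phi[i] = rng.normal(mean, np.sqrt(var))
    return phi - phi.mean()  # assumed implementation of the sum-to-zero constraint

# Hypothetical map: four regions in a row, so 0-1, 1-2 and 2-3 are the adjacent pairs.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
phi = np.array([1.0, -1.0, 3.0, -3.0])

# Region 1 has neighbors 0 and 2, so its conditional mean is (1 + 3) / 2 and m_1 = 2.
mean_1, var_1 = icar_conditional(phi, W, tau2=2.0, i=1)

rng = np.random.default_rng(0)
phi_new = gibbs_sweep(phi, W, tau2=2.0, rng=rng)
print(mean_1, var_1, phi_new.sum())
```

In a full hierarchical model, each update would also include the likelihood contribution from $Y_i$; the sketch isolates the prior to show how $\bar{\phi}_i$ and $m_i$ arise from the adjacency matrix.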