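【Abstract】 This article focuses on spatial heterogeneity. Spatial heterogeneity is a statistical term meaning that one or more statistical properties of interest are not the same across all subsets of a population. It conflicts with the assumption of independent and identically distributed (i.i.d.) observations, because the observations are not identically distributed, and this undermines many methods built on that assumption. If the study region is large and physically or socio-economically diverse, or is observed at high spatial resolution, the assumption that all subsets of the data share the same statistical properties is quite likely invalid, so the issue deserves attention. The article outlines three basic types of spatial heterogeneity: heterogeneity of the spatial mean, heterogeneity of the spatial autocorrelation structure (including heteroscedasticity), and spatially stratified heterogeneity. The first two are relatively mature, and the text gives links to related material for them; the article therefore concentrates on the definition, testing and modelling of spatially stratified heterogeneity.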

【Sources】

  • J. Wang, R. Haining, T. Zhang, C. Xu, and M. Hu, “Statistics for spatially stratified heterogeneous data,” arXiv preprint arXiv:2211.16918, 2022.
  • R. P. Haining and G. Li, Modelling spatial and spatial-temporal data: a Bayesian approach. Boca Raton: CRC Press, Taylor & Francis, 2020.
  • P. R. L. Dutilleul, Spatio-Temporal Heterogeneity: Concepts and Analysis. Cambridge: Cambridge University Press, 2011.

## 1 Introduction

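This article focuses on spatial heterogeneity. Spatial heterogeneity is a statistical term meaning that one or more statistical properties of interest are not the same across all subsets of a population. Its presence conflicts with the i.i.d. assumption, because the observations are not "identically distributed", and this causes problems for many methods based on that assumption. If the study region is large and physically or socio-economically diverse, or is observed at high spatial resolution, then the assumption that all subsets of the data share the same statistical properties may well be invalid. In such cases, assuming that not all subsets of the data share the same statistical properties may be a safer starting point.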

## 2 Types of spatial heterogeneity

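Dutilleul (2011) [13] describes two commonly encountered forms of spatial heterogeneity: heterogeneity of the mean (first-order heterogeneity) and second-order heterogeneity, which covers heterogeneity of the variance (heteroscedasticity) and heterogeneity related to the autocorrelation structure of the data. Wang et al. (2016) [62] propose that a third form exists, `spatially stratified heterogeneity (SSH)`.

### 2.1 Heterogeneity of the mean

**(1) Test methods**

Heterogeneity of the mean (first-order heterogeneity) can be examined with a chi-square test for count data and with analysis of variance (ANOVA) for continuous data (see Haining and Li 2020, Chapter 6 [26]).

**(2) Modelling methods**

Heterogeneity of the mean may result from variation in a set of explanatory variables. If those variables can be correctly specified and their relationships are structurally stable over space (the model parameters are constant across the study region), fitting a regression model can account for the heterogeneity of the mean.

See also:

- Areal data: the global regression part of [A review of spatial regression models](7b8d20a2.html)
- Point-referenced data: material on linear regression, generalised linear models, trend surface analysis and related topics in multivariate statistical analysis

If the regression parameters also vary over space, the relationship is called "structurally unstable" or "spatially varying". Geographically weighted regression (Fotheringham et al. 2000 [16]), spatially varying coefficient models and related methods give data analysts ways to explore and model this form of mean heterogeneity (Lloyd 2010 [39]; Haining and Li 2020 [26], Chapters 6 and 9).

See also:

- [A review of spatially varying parameter models](24233ee7.html)
- [Geographically weighted regression](292f7d71.html)
- Spatial basis expansion methods
- [Bayesian spatially varying coefficient models](42cfdcdb.html)
- Spatial regimes (different regression coefficients in different regions)
- Spatial heteroscedasticity methods

### 2.2 Heterogeneity of the variance

#### 2.2.1 Heteroscedasticity

#### 2.2.2 Heterogeneity of the spatial autocorrelation structure

Heterogeneity related to the spatial autocorrelation structure is common in spatial data. It usually appears as "local clusters" of high (or low) values, in sharp contrast to spatial autocorrelation of high (low) values considered as a global property. The former highlights variability between clusters, whereas the latter focuses on correlation within clusters, which is why this form of heterogeneity usually appears together with spatial autocorrelation.

**(1) Test methods**

Many statistical tests can detect this form of heterogeneity, including local indicators of spatial association (LISA) such as the local Moran's I (Anselin 1995 [3]), the Gi and Gi* statistics (Getis and Ord 1992 [19]) and the spatial scan statistic (Kulldorff 1997 [34]). These tests are widely used, although some of them suffer from the multiple testing problem. Put simply, the more tests that are carried out simultaneously on one sample (for example, $n$ tests, one in each of $n$ sub-areas of a region), the greater the probability of rejecting at least one null hypothesis even when all of them are true, so the probability of a Type I error (rejecting the null hypothesis when it is true) exceeds the significance level chosen by the researcher (e.g., 5% or 10%). For details, see Haining and Li 2020, Chapter 6 [26].

**(2) Modelling methods**

The key to modelling this form of heterogeneity is describing the structure present in the spatial autocorrelation. For example:

- for point-referenced data, ordinary kriging, universal kriging, co-kriging and other methods that describe spatial autocorrelation;
- for areal data, models that describe spatial spillover effects: the spatial lag model (SLM), the spatial lag of X model (SLX), the spatial error model (SEM), the spatial Durbin model (SDM) and others.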
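Moran's I, named among the test methods above and the basis of the local Moran's I in LISA, can be illustrated with a minimal plain-Python sketch of its global form plus a permutation test for positive autocorrelation. This is an illustration under stated assumptions, not code from the cited sources: the binary contiguity weights, the hypothetical helper `chain_weights` (units in a row, each bordering its immediate neighbours) and the one-sided 999-permutation test are choices made only for this example.

```python
import random
from typing import List, Sequence, Tuple


def chain_weights(n: int) -> List[List[float]]:
    """Hypothetical helper: binary weights for n units in a row, where each
    unit neighbours the unit before and after it (contiguity on a line)."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = w[i + 1][i] = 1.0
    return w


def morans_i(x: Sequence[float], w: Sequence[Sequence[float]]) -> float:
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(x)
    mean = sum(x) / n
    z = [xi - mean for xi in x]              # deviations from the mean
    s0 = sum(sum(row) for row in w)          # S0, the total of all weights
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi * zi for zi in z)


def permutation_test(x: Sequence[float], w: Sequence[Sequence[float]],
                     n_perm: int = 999, seed: int = 0) -> Tuple[float, float]:
    """One-sided permutation test for positive spatial autocorrelation.

    Shuffling the values over the units keeps their distribution but destroys
    any spatial arrangement, which yields the null reference distribution."""
    rng = random.Random(seed)
    observed = morans_i(x, w)
    values = list(x)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(values)
        if morans_i(values, w) >= observed:
            at_least_as_large += 1
    p_value = (at_least_as_large + 1) / (n_perm + 1)   # pseudo p-value
    return observed, p_value


if __name__ == "__main__":
    w = chain_weights(10)
    smooth = [float(v) for v in range(10)]            # steadily increasing
    alternating = [float(v % 2) for v in range(10)]   # low, high, low, ...
    print(permutation_test(smooth, w))
    print(permutation_test(alternating, w))
```

On this chain the increasing series gives $I = 7/9 \approx 0.78$ with a small pseudo p-value, while the alternating series gives $I = -1$, the most negative value those values can reach here. Local indicators apply the same idea unit by unit, which is where the multiple-testing problem described above arises.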


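The growth of the Type I error probability described for local tests can be quantified with simple arithmetic. The sketch below is a simplified illustration, not taken from the cited sources: it assumes the $n$ local tests are independent and every null hypothesis is true, in which case the probability of at least one false rejection is $1-(1-\alpha)^n$. The Bonferroni correction (testing each sub-area at $\alpha/n$) is shown as one common, conservative remedy; the function names are hypothetical.

```python
def family_wise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false rejection among n_tests independent
    tests, each run at level alpha, when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_level(alpha: float, n_tests: int) -> float:
    """Per-test level alpha / n_tests; by the union bound this keeps the
    family-wise error rate at or below alpha even for dependent tests."""
    return alpha / n_tests


if __name__ == "__main__":
    for n in (1, 10, 50, 100):
        naive = family_wise_error_rate(0.05, n)
        # evaluated under the same independence assumption as `naive`
        corrected = family_wise_error_rate(bonferroni_level(0.05, n), n)
        print(f"{n:>3} tests: FWER {naive:.3f} uncorrected, {corrected:.3f} with Bonferroni")
```

With $\alpha = 0.05$, ten independent tests already carry about a 40% chance of at least one false rejection, and 100 tests more than 99%. Tests on neighbouring sub-areas are usually correlated, so the independence formula is only a rough guide; the Bonferroni bound holds regardless of dependence but can be very conservative.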
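### 2.3 Spatially stratified heterogeneity

Another important form of heterogeneity is spatially stratified heterogeneity (SSH). When a region made up of a set of contiguous spatial units can be partitioned into distinct spatial segments (strata), stratified heterogeneity may exist between the strata: within each stratum (each containing several spatial units) the mean of a variable, or the association between variables, is the same, so every stratum is internally homogeneous, while compared with the other strata these statistical properties together display between-strata heterogeneity (Wang et al. 2016 [62]).

Compared with the other forms of local spatial heterogeneity described above, SSH does not seem to have received systematic attention. Part of the reason may be that methods for identifying homogeneous regions are limited or complicated. Ignoring SSH is nonetheless costly. When a global model that assumes homogeneity is applied to an SSH population, the stratification becomes a source of confounding (Simpson's paradox); even when the heterogeneity is recognised, there may not be enough data to give good parameter estimates for every stratum with conventional methods (data sparsity); and in severe cases some strata are not sampled at all, producing sample bias (Wang et al. 2018 [61]; Xu et al. 2018 [65]; Haining and Li 2020 [26]).

Another reason SSH may be overlooked is that the many available classification algorithms appear to solve a similar problem. Wang et al. (2016) [62], however, regard SSH as a major source of sample bias, statistical bias, modelling confounding and misleading CI, and argue that robust solutions are needed to overcome its negative effects.

Modelling SSH offers four potential benefits:

- creating an identical PDF (probability density function) within each stratum;
- random sampling within strata;
- the spatial patterns within strata and the boundaries between strata can serve as a specific kind of information about nonlinear causal relationships;
- general interaction, obtained by overlaying two spatial patterns.

**(1) Test methods**

- q-statistic (Q index)

For a variable observed on $N$ units partitioned into $L$ strata, where stratum $h$ has $N_h$ units and variance $\sigma_h^2$ and $\sigma^2$ is the variance over the whole region, Wang et al. (2016) [62] define the q-statistic as

$$q = 1 - \frac{\sum_{h=1}^{L} N_h \sigma_h^2}{N \sigma^2}.$$

It ranges from 0 to 1: $q = 0$ when the stratification explains none of the variance (all stratum means are equal), and $q = 1$ when values are identical within every stratum; larger values indicate stronger SSH.

**(2) Modelling methods**

## 3 Equations of spatially stratified heterogeneity

To be added.

## 4 Inference under spatially stratified heterogeneity

This section considers examples of statistical analysis when SSH is present in a data set.

## References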
  • [1] Aiello, F., Ricotta, F. 2016. Firm heterogeneity in productivity across Europe: evidence from multilevel models. Economics of Innovation and New Technology 25(1): 57-89.
  • [2] Akaike, H. 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control AC-19: 716-723.
  • [3] Anselin, L. 1995. Local indicators of spatial association – LISA. Geographical Analysis 27(2): 93–115.
  • [4] Anselin, L. 2006. Spatial Heterogeneity. In B. Warf (Ed.), Encyclopedia of Human Geography. Thousand Oaks, CA: Sage Publications, pp. 452-453.
  • [5] Atkinson, P., Tate, N. 2000. Spatial scale problems and geostatistical solutions: a review. The Professional Geographer 52(4): 607-623.
  • [6] Bradley, V. C., S. Kuriwaki, M. Isakov, D. Sejdinovic, X. L. Meng, and S. Flaxman. 2021. Unrepresentative big surveys significantly overestimated US vaccine uptake. Nature 600(7890): 695-700.
  • [7] Buttle, J. M., D. M. Allen, D. Caissie, B. Davison, M. Hayashi, D. L. Peters, J. W. Pomeroy, S. Simonovic, A. St-Hilaire, and P. H. Whitfield. 2016. Flood processes in Canada: Regional and special aspects. Canadian Water Resources Journal. DOI:10.1080/07011784.2015.1131629.
  • China National Cancer Centre. 2019. Cancer Atlas in China. Beijing: Sinomap Press.
  • [8] Christakos, G. 1992. Random Field Models in Earth Sciences. CA, San Diego: Academic Press.
  • [9] Cliff, A.D., J. K. Ord. 1981. Spatial Processes: Models and Applications. London: Pion.
  • [10] Cressie, N. 1993. Statistics for Spatial Data. New York: Wiley.
  • [11] Dormann, C. F., J. M. McPherson, M. B. Araujo, R. Bivand, J. Bolliger, G. Carl, R. D. Davies, A. Hirzel, W. Jetz, W. D. Kissling, I. Kuhn, R. Ohlemuller, P. R. Peres-Neto, B. Reineking, B. Schroder, F. M. Schurr, and R. Wilson. 2007. Methods to account for spatial autocorrelation in the analysis of species distributional data: a review. Ecography 30: 609-628.
  • [12] Dunn, R., and A. R. Harrison. 1993. Two dimensional systematic sampling of land use. Applied Statistics 42: 585-601.
  • [13] Dutilleul, P. R. L. 2011. Spatio-Temporal Heterogeneity: Concepts and Analysis. Cambridge: Cambridge University Press.
  • [14] Everitt, B. S., and A. Skrondal. 2010. The Cambridge Dictionary of Statistics. 4th Ed. Cambridge University Press.
  • [15] Fazio, G., Piacentino, D. 2010. A spatial multilevel analysis of Italian SMEs’ productivity. Spatial Economic Analysis 5(3): 299-316.
  • [16] Fotheringham, A. S., C. Brunsdon, M. Charlton. 2000. Quantitative Geography: Perspectives on Spatial Data Analysis. London: Sage.
  • [17] Gao, B. B., J. F. Wang, M. G. Hu, H. M. Fan, and K. Xu. 2015. A stratified optimization method for a multivariate marine environmental monitoring network in the Yangtze River estuary and its adjacent sea. International Journal of Geographical Information Science 29(8): 1332-1349.
  • [18] Ge, Y., Y. Jin, A. Stein, Y. H. Chen, J. H. Wang, J. F. Wang, Q. M. Cheng, H. X. Bai, M. X. Liu, P. Atkinson. 2019. Principles and methods of scaling geospatial earth science data. Earth-Science Reviews 197: 102897.
  • [19] Getis, A., and J. K. Ord. 1992. The analysis of spatial association by use of distance statistics. Geographical Analysis 24: 189-206 (with correction, 1993, 25, p. 276).
  • [20] Goldstein, H. 2011. Multilevel Statistical Models, 4th Edition. Wiley.
  • [21] Goodchild, M. F., L. Anselin, and U. Deichmann. 1993. A framework for the areal interpolation of socioeconomic data. Environment and Planning A 25: 383-397.
  • [22] Goodchild, M., and R. Haining. 2004. GIS and spatial data analysis: converging perspectives. Papers in Regional Science 83: 363-385.
  • [23] Griffith, D. A. 2003. Spatial Autocorrelation and Spatial Filtering, Gaining Understanding Through Theory and Visualization. Springer-Verlag, Berlin.
  • [24] Gujarati, D. N., D. C. Porter. 2009. Basic Econometrics, 5th Edition. McGraw-Hill.
  • [25] Haining, R. 2003. Spatial Data Analysis: Theory and Practice. Cambridge: Cambridge University Press.
  • [26] Haining, R., and G. Q. Li. 2020. Modelling Spatial and Spatial-Temporal Data: A Bayesian Approach. CRC.
  • [27] Heckman, J. J. 1979. Sample selection bias as a specification error. Econometrica 47(1): 153-161.
  • [28] Hox, J. J. 2010. Multilevel Analysis: Techniques and Applications, 2nd Edition. Routledge, p. 3.
  • [29] Hu, M. G., and J. F. Wang. 2011. A meteorological network optimization package using MSN theory. Environmental Modelling & Software 26: 546-548.
  • [30] Hu, M. G., J. F. Wang, and Y. Zhao. 2013. A B-SHADE based best linear unbiased estimation tool for biased samples. Environmental Modelling & Software 48: 93-97.
  • [31] Isaaks, E., and R. Srivastava. 1989. Applied Geostatistics. Oxford University Press.
  • [32] Kolasa, J., C. D. Rollo. 1991. Introduction: The heterogeneity of heterogeneity: A glossary. 1-23 in: J. Kolasa & S. T. A. Pickett [eds.] Ecological Heterogeneity. Springer-Verlag, New York.
  • [33] Kong, L. B., J. Y. Xin, W. Y. Zhang, and Y. S. Wang. 2016. The empirical correlations between PM2.5, PM10 and AOD in the Beijing metropolitan region and the PM2.5, PM10 distributions retrieved by MODIS. Environmental Pollution 216: 350-360.
  • [34] Kulldorff, M. 1997. A spatial scan statistic. Communications in Statistics - Theory and Methods 26: 1481-1496.
  • [35] Lee, D., and R. Mitchell. 2013. Locally adaptive spatial smoothing using conditional auto-regressive models. Applied Statistics 62, Part 4: 593-608.
  • [36] Li, J. M., J. F. Wang, Z. P. Ren, D. Yang, Y. P. Wang, Y. Mu, X. H. Li, M. R. Li, Y. M. Guo, J. Zhu. 2020. Spatiotemporal trends in maternal mortality ratios in 2205 Chinese counties from 2010-2013 and ecological determinants: A Bayesian modelling analysis. PLOS Medicine 17(5): e1003114.
  • [37] Lindley, D. V., M. R. Novick. 1981. The role of exchangeability in inference. Annals of Statistics 9: 45-58.
  • [38] Liu, T. J., J. F. Wang, C. Xu, J. Q. Ma, C. D. Xu, and H. Y. Zhang. 2018. Sandwich mapping of rodent density in Jilin Province, China. Journal of Geographical Sciences 28(4): 445-458.
  • [39] Lloyd, C. D. 2010. Local Models for Spatial Analysis, 2nd Edition. CRC.
  • [40] Longley, P. A., M. F. Goodchild, D. J. Maguire, D. W. Rind. 2005. Geographical Information Systems and Science, 2nd Edition. John Wiley & Sons Ltd.
  • [41] Matheron, G. 1963. Principles of Geostatistics. Economic Geology 58: 1246-1266.
  • [42] MacEachren, A.M. 1982. Map complexity: Comparison and measurement. The American Cartographer 9(1): 31-46.
  • [43] Monmonier, M.S. 1974. Measures of pattern complexity for choroplethic maps. The American Cartographer 1(2): 159-169.
  • [44] O’Connell, P. E., R. J. Gurney, D. A. Jones, J. B. Miller, C. A. Nicholas, and M. R. Senior. 1979. A case study of rationalization of a rain gauge network in SW England. Water Resources Research 15: 1813-1822.
  • [45] Openshaw S. 1984. The Modifiable Areal Unit Problem. CATMOG 38. Norwich: GeoAbstracts.
  • [46] Osborne, P. E., G. M. Foody, S. Suárez-Seoane. 2007. Non-stationarity and local approaches to modelling the distributions of wildlife. Diversity and Distributions 13: 313–323.
  • [47] Pearson, K. 1895. Notes on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London 58: 240–242.
  • [48] Rao, J. N. K. 2003. Small Area Estimation. New York: John Wiley.
  • [49] Rao, J. N. K. 2014. Small-Area Estimation. Wiley StatsRef: Statistics Reference Online, 1-8.
  • [50] Ripley, B. D. 1981. Spatial Statistics. Wiley.
  • [51] Rodriguez-Iturbe, I., and J. M. Mejia. 1974. The design of rainfall networks in time and space. Water Resources Research 10: 713–728.
  • [52] Snijders, T.A.B., Bosker, R.J. 2011. Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling, 2nd Edition. Sage.
  • [53] Sui, D. 2006. Tobler’s First Law of Geography. In B. Warf (Ed.), Encyclopedia of Human Geography. Thousand Oaks, CA: Sage Publications, p. 454.
  • [54] Thompson, S. K. 2012. Sampling, 3rd Edition. Wiley.
  • [55] Wang, J. F., Christakos, G., Hu, M. G. 2009. Modeling spatial means of surfaces with stratified non-homogeneity. IEEE Transactions on Geoscience and Remote Sensing 47(12): 4167-4174.
  • [56] Wang, J. F., Haining, R. and Z. D. Cao. 2010a. Sample surveying to estimate the mean of a heterogeneous surface: reducing the error variance through zoning. International Journal of Geographical Information Science 24(4): 523-543.
  • [57] Wang, J. F., X. H. Li, G. Christakos, Y. L. Liao, T. Zhang, X. Gu, X. Y. Zheng. 2010b. Geographical detectors-based health risk assessment and its application in the neural tube defects study of the Heshun Region, China. International Journal of Geographical Information Science 24(1): 107-127.
  • [58] Wang, J. F., B. Y. Reis, M. G. Hu, G. Christakos, W. Z. Yang, Q. Sun, Z. J. Li, X. Z. Li, S. J. Lai, H. Y. Chen, D. C. Wang. 2011. Area disease estimation based on sentinel hospital records. PLoS ONE 6(8): e23428.
  • [59] Wang, J. F., M. G. Hu, C. D. Xu, G. Christakos, Y. Zhao. 2013a. Estimation of citywide air pollution in Beijing. PLoS ONE 8(1): e53400.
  • [60] Wang, J. F., R. Haining, T. J. Liu, L. F. Li, C. S. Jiang. 2013b. Sandwich estimation for multi-unit reporting on a stratified heterogeneous surface. Environment and Planning A 45(10): 2515-2534.
  • [61] Wang, J. F., C. D. Xu, M. G. Hu, Q. X. Li, Z. W. Yan, P. Jones. 2018. Global land surface air temperature dynamics since 1880. International Journal of Climatology 38: e466-e474.
  • [62] Wang, J. F., T. L. Zhang, B. J. Fu. 2016. A measure of spatial stratified heterogeneity. Ecological Indicators 67: 250-256.
  • [63] Wang, J. X., M. G. Hu, B. B. Gao, H. M. Fan, J. F. Wang. 2019. A spatiotemporal interpolation method for the assessment of pollutant concentrations in the Yangtze River estuary and adjacent areas from 2004 to 2013. Environmental Pollution. DOI: 10.1016/j.envpol.2019.05.132.
  • [64] Xu, C. D., J. F. Wang, M. G. Hu, Q. X. Li. 2013. Interpolation of missing temperature data at meteorological stations using P-BSHADE. Journal of Climate 26: 7452-7463.
  • [65] Xu, C. D., J. F. Wang, Q. X. Li. 2018. A new method for temperatures spatial interpolation based on sparse historical stations. Journal of Climate 31: 1757-1770.
  • [66] Xu, L., Liu, Q. Y., Stige, L. C., T. B. Ari, X. Y. Fang, K. S. Chan, S. C. Wang, N. C. Stenseth, Z. B. Zhang. 2011. Nonlinear effect of climate on plague during the third pandemic in China. PNAS 108(25): 10214-10219.
  • [67] Yin, Q., J. F. Wang, Z. P. Ren, J. Li, Y. M. Guo. 2019. Mapping the increased minimum mortality temperatures in the context of global climate change. Nature Communications 10: 4640.
  • [68] Zhang, D., Brecke, P., Lee, H.F., He, Y.Q., Zhang, J. 2007. Global climate change, war, and population decline in recent human history. Proceedings of the National Academy of Sciences of the United States of America 104(49): 19214–19219.