```python
with pm.Model() as gkm:
    # Priors for the four g-and-k parameters
    a = pm.HalfNormal("a", sigma=1)
    b = pm.HalfNormal("b", sigma=1)
    g = pm.HalfNormal("g", sigma=1)
    k = pm.HalfNormal("k", sigma=1)

    # Simulator node: compare octile summaries of simulated and observed data
    s = pm.Simulator("s", gk_simulator,
                     params=[a, b, g, k],
                     sum_stat=octo_summary,
                     epsilon=0.1,
                     observed=bsas_co)

    trace_gk = pm.sample_smc(kernel="ABC", parallel=True)
```
`select_model` returns the index of the best model (starting from $0$) and the estimated posterior probabilities of the models. For our example we get a probability of $0.68$ for model $0$. In this example, `LOO` and the random forest method agree on which model to select, and even on the relative weights of the models, which is reassuring.
## 8 Choosing Priors for Approximate Bayesian Computation
Not having a closed-form likelihood makes it harder to get good models, so ABC methods are generally more fragile than other approximations. We should therefore be extra careful about certain modeling choices, including the choice of priors, and perform more rigorous model evaluation than when an explicit likelihood is available. These are the costs we must pay in exchange for approximating the likelihood.
Compared with other methods, a more careful choice of priors may be more valuable in ABC than elsewhere. If we lose information when approximating the likelihood, we would like to partially compensate for that loss by including more information through the priors. In addition, better priors generally save us from wasting computational resources and time. For ABC rejection methods this is evident, since we use the prior as the sampling distribution. But it is also true for SMC methods, especially when the simulator is sensitive to the input parameters. For example, when using ABC to perform inference on ordinary differential equations, some parameter combinations may be difficult to simulate numerically, leading to extremely slow simulations. Another problem with vague priors shows up in the weighted sampling procedure of SMC and SMC-ABC: when evaluated against the tempered posterior, all but a few prior samples receive very small weights. This causes the SMC particles to become degenerate after a few steps (because only the few samples with large weights get selected). This phenomenon is known as weight collapse, a well-known problem for particle methods [108].
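To make the weight-collapse intuition concrete, below is a minimal sketch (not code from the book) using plain importance weights in one dimension: particles drawn from a vague prior receive almost uniformly negligible weights under a narrower target, which the effective sample size, $1/\sum_i \tilde{w}_i^2$, immediately reveals. The target distribution, prior scales, and particle count are arbitrary choices for illustration only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_particles = 1_000
target = stats.norm(1, 0.5)  # stand-in for one tempered-posterior stage

def effective_sample_size(particles):
    """Normalize the importance weights and return 1 / sum(w_tilde**2)."""
    w = target.pdf(particles)
    w /= w.sum()
    return 1 / np.sum(w**2)

vague_prior = rng.normal(0, 100, n_particles)       # very wide prior
informative_prior = rng.normal(0, 2, n_particles)   # moderately informative prior

print(f"ESS with vague prior:       {effective_sample_size(vague_prior):.1f}")
print(f"ESS with informative prior: {effective_sample_size(informative_prior):.1f}")
```

With the vague prior only a handful of the 1,000 particles land where the target has appreciable density, so the effective sample size collapses to a few particles; the more informative prior keeps it in the hundreds.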
Good priors can lower the computational cost, and thus to some extent allow us to fit more complex models with SMC and SMC-ABC. Beyond providing more informative priors and following the general advice on prior selection and evaluation discussed elsewhere in this book, we have no further recommendations specific to ABC methods.
## Exercises
**8E1.** In your own words explain how ABC is approximate. What object or quantity is approximated, and how?
**8E2.** In the context of ABC, what is the problem that SMC is trying to solve compared to rejection sampling?
**8E3.** Write a Python function to compute the Gaussian kernel as in the Gaussian kernel equation from this chapter, but without the summation.
Generate two random samples of size 100 from the same distribution. Use the implemented function to compute the distances between those two random samples. You will get two distributions each of size 100. Show the differences using a KDE plot, the mean and the standard deviation.
**8E4.** What do you expect the results to be, in terms of accuracy and convergence of the sampler, if in the `gauss` model from Code Block [gauss_abc](gauss_abc) we had used `sum_stat="identity"`? Justify your answer.
**8E5.** Refit the `gauss` model from Code Block [gauss_abc](gauss_abc) using `sum_stat="identity"`.
Evaluate the results using:
1. Trace Plot
2. Rank Plot
3. $\hat R$
4. The mean and HDI for the parameters $\mu$ and $\sigma$.
Compare the results with those from the example in the book (i.e. using `sum_stat="sort"`).
**8E6.** Refit the `gauss` model from Code Block [gauss_abc](gauss_abc) using quintiles as summary statistics.
1. How do the results compare with the example in the book?
2. Try other values for `epsilon`. Is 1 a good choice?
**8E7.** Use the `g_and_k_quantile` class to generate a sample (n=500) from a g-and-k distribution with parameters a=0, b=1, g=0.4, k=0. Then use the `gkm` model to fit it using 3 different values of $\epsilon$ (0.05, 0.1, 0.5). Which value of $\epsilon$ do you think is the best for this problem? Use diagnostic tools to help you answer this question.
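As a reference, the sketch below shows how such a sample could be generated directly from the standard g-and-k quantile function [99] via inverse-transform sampling. It is an illustration only, not the `g_and_k_quantile` class from the book's code; the constant $c=0.8$ is the conventional choice and the seed is arbitrary.

```python
import numpy as np
from scipy import stats

def gk_quantile(u, a, b, g, k, c=0.8):
    """Standard g-and-k quantile function evaluated at probabilities u."""
    z = stats.norm.ppf(u)  # standard normal quantiles
    return a + b * (1 + c * np.tanh(g * z / 2)) * z * (1 + z**2) ** k

# Inverse-transform sampling: evaluate the quantile function at uniform draws
rng = np.random.default_rng(1346)
sample = gk_quantile(rng.uniform(size=500), a=0, b=1, g=0.4, k=0)
```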
**8E8.** Use the sample from the previous exercise and the `gkm` model. Fit it using the summary statistics `octo_summary`, the `octile-vector` (i.e. the quantiles 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875) and `sum_stat="sort"`. Compare the results with the known parameter values; which option provides higher accuracy and lower uncertainty?
**8M9.** In the GitHub repository you will find a dataset of the distribution of citations of scientific papers. Use SMC-ABC to fit a g-and-k distribution to this dataset. Perform all the necessary steps to find a suitable value for `epsilon`, ensuring that the model converges and the results provide a suitable fit.
**8M10.** The Lotka-Volterra model is a well-known biological model describing how the numbers of individuals of two species change when there is a predator-prey interaction [109]. Basically, as the population of prey increases there is more food for the predators, which leads to an increase in the predator population. But a large number of predators produces a decline in the number of prey, which in turn produces a decline in the predators as food becomes scarce. Under certain conditions this leads to a stable cyclic pattern for both populations (the classical equations are sketched after this exercise).
In the GitHub repository you will find a Lotka-Volterra simulator with unknown parameters and the dataset `Lotka-Volterra_00`. Assume the unknown parameters are positive. Use an SMC-ABC model to find the posterior distribution of the parameters.
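For reference, here is a minimal sketch of the classical Lotka-Volterra dynamics [109]; the parameter names, initial populations, and time grid are illustrative assumptions and do not correspond to the repository's simulator or dataset.

```python
import numpy as np
from scipy.integrate import odeint

def lotka_volterra(populations, t, alpha, beta, gamma, delta):
    """Classical predator-prey equations:
    d(prey)/dt     =  alpha*prey - beta*prey*predator
    d(predator)/dt = -gamma*predator + delta*prey*predator
    """
    prey, predator = populations
    dprey = alpha * prey - beta * prey * predator
    dpredator = -gamma * predator + delta * prey * predator
    return [dprey, dpredator]

# Illustrative run: 10 prey and 5 predators simulated over 20 time units
time = np.linspace(0, 20, 200)
trajectory = odeint(lotka_volterra, y0=[10, 5], t=time, args=(0.5, 0.05, 0.5, 0.05))
```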
**8H11.** Continuing with the Lotka-Volterra example, the dataset `Lotka-Volterra_01` includes data for a predator-prey system with the twist that at some point a disease suddenly decimates the prey population. Expand the model to allow for a "switchpoint", i.e. a point that marks two different predator-prey dynamics (and hence two different sets of parameters).
**8H12.** This exercise is based on the sock problem formulated by Rasmus Bååth. The problem goes like this: we get 11 socks out of the laundry and, to our surprise, we find that they are all unique, that is, we cannot pair them. What is the total number of socks that we laundered? Let us assume that the laundry contains both paired and unpaired socks, and that we do not have more than two socks of the same kind. That is, we have either 1 or 2 socks of each kind.
Assume the number of socks follows a $\text{NB}(30, 4.5)$ distribution and that the proportion of unpaired socks follows a $\text{Beta}(15, 2)$ distribution.
Generate a simulator suitable for this problem and create an SMC-ABC model to compute the posterior distribution of the number of socks, the proportion of unpaired socks, and the number of pairs.
## References
[1] John Salvatier, Thomas V Wiecki, and Christopher Fonnesbeck. Probabilistic programming in python using pymc3. PeerJ Computer Science, 2:e55, 2016.
[2] Joshua V. Dillon, Ian Langmore, Dustin Tran, Eugene Brevdo, Srinivas Vasudevan, Dave Moore, Brian Patton, Alex Alemi, Matt Hoffman, and Rif A Saurous. Tensorflow distributions. arXiv preprint arXiv:1711.10604, 2017.
[3] Ravin Kumar, Colin Carroll, Ari Hartikainen, and Osvaldo Martin. Arviz a unified library for exploratory analysis of bayesian models in python. Journal of Open Source Software, 4(33):1143, 2019.
[4] P. Westfall and K.S.S. Henning. Understanding Advanced Statistical Methods. Chapman & Hall/CRC Texts in Statistical Science. Taylor & Francis, 2013. ISBN 9781466512108.
[5] J.K. Blitzstein and J. Hwang. Introduction to Probability, Second Edition. Chapman & Hall/CRC Texts in Statistical Science. CRC Press, 2019. ISBN 9780429766732.
[6] Nicholas Metropolis, Arianna W Rosenbluth, Marshall N Rosenbluth, Augusta H Teller, and Edward Teller. Equation of state calculations by fast computing machines. The journal of chemical physics, 21(6):1087–1092, 1953.
[7] W. K. Hastings. Monte carlo sampling methods using markov chains and their applications. Biometrika, 57(1):97–109, 1970.
[8] Marshall N Rosenbluth. Genesis of the monte carlo algorithm for statistical mechanics. In AIP Conference Proceedings, volume 690, 22–30. American Institute of Physics, 2003.
[9] R. McElreath. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Chapman & Hall/CRC Texts in Statistical Science. CRC Press, 2020. ISBN 9781482253481.
[10] Daniel Lakens, Federico G. Adolfi, Casper J. Albers, Farid Anvari, Matthew A. J. Apps, Shlomo E. Argamon, Thom Baguley, Raymond B. Becker, Stephen D. Benning, Daniel E. Bradford, Erin M. Buchanan, Aaron R. Caldwell, Ben Van Calster, Rickard Carlsson, Sau-Chin Chen, Bryan Chung, Lincoln J. Colling, Gary S. Collins, Zander Crook, Emily S. Cross, Sameera Daniels, Henrik Danielsson, Lisa DeBruine, Daniel J. Dunleavy, Brian D. Earp, Michele I. Feist, Jason D. Ferrell, James G. Field, Nicholas W. Fox, Amanda Friesen, Caio Gomes, Monica Gonzalez-Marquez, James A. Grange, Andrew P. Grieve, Robert Guggenberger, James Grist, Anne-Laura van Harmelen, Fred Hasselman, Kevin D. Hochard, Mark R. Hoffarth, Nicholas P. Holmes, Michael Ingre, Peder M. Isager, Hanna K. Isotalus, Christer Johansson, Konrad Juszczyk, David A. Kenny, Ahmed A. Khalil, Barbara Konat, Junpeng Lao, Erik Gahner Larsen, Gerine M. A. Lodder, Jiří Lukavský, Christopher R. Madan, David Manheim, Stephen R. Martin, Andrea E. Martin, Deborah G. Mayo, Randy J. McCarthy, Kevin McConway, Colin McFarland, Amanda Q. X. Nio, Gustav Nilsonne, Cilene Lino de Oliveira, Jean-Jacques Orban de Xivry, Sam Parsons, Gerit Pfuhl, Kimberly A. Quinn, John J. Sakon, S. Adil Saribay, Iris K. Schneider, Manojkumar Selvaraju, Zsuzsika Sjoerds, Samuel G. Smith, Tim Smits, Jeffrey R. Spies, Vishnu Sreekumar, Crystal N. Steltenpohl, Neil Stenhouse, Wojciech Swiatkowski, Miguel A. Vadillo, Marcel A. L. M. Van Assen, Matt N. Williams, Samantha E. Williams, Donald R. Williams, Tal Yarkoni, Ignazio Ziano, and Rolf A. Zwaan. Justify your alpha. Nature Human Behaviour, 2(3):168–171, 2018.
[11] David Deming. Do extraordinary claims require extraordinary evidence? Philosophia, 44(4):1319–1331, 2016.
[12] Andrew Gelman, Daniel Simpson, and Michael Betancourt. The prior can often only be understood in the context of the likelihood. Entropy, 19(10):555, 2017.
[13] John W. Tukey. Exploratory Data Analysis. Addison-Wesley, 1977.
[14] Persi Diaconis. Theories of Data Analysis: From Magical Thinking Through Classical Statistics, chapter 1, pages 1–36. John Wiley & Sons, Ltd, 2006.
[15] Jonah Gabry, Daniel Simpson, Aki Vehtari, Michael Betancourt, and Andrew Gelman. Visualization in bayesian workflow. Journal of the Royal Statistical Society: Series A (Statistics in Society), 182(2):389–402, 2019.
[16] Andrew Gelman, Aki Vehtari, Daniel Simpson, Charles C Margossian, Bob Carpenter, Yuling Yao, Lauren Kennedy, Jonah Gabry, Paul-Christian Bürkner, and Martin Modrák. Bayesian workflow. arXiv preprint arXiv:2011.01808, 2020.
[17] A. Gelman, J.B. Carlin, H.S. Stern, D.B. Dunson, A. Vehtari, and D.B. Rubin. Bayesian Data Analysis, Third Edition. Chapman & Hall/CRC Texts in Statistical Science. Taylor & Francis, 2013. ISBN 9781439840955.
[18] Aki Vehtari, Andrew Gelman, Daniel Simpson, Bob Carpenter, and Paul-Christian Bürkner. Rank-Normalization, Folding, and Localization: An Improved $\widehat R$ for Assessing Convergence of MCMC. Bayesian Analysis, pages 1 – 38, 2021. URL: https://doi.org/10.1214/20-BA1221, doi:10.1214/20-BA1221.
[19] Stephan Hoyer and Joe Hamman. Xarray: nd labeled arrays and datasets in python. Journal of Open Research Software, 2017.
[20] Tilmann Gneiting and Adrian E Raftery. Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association, 102(477):359–378, 2007.
[21] Aki Vehtari, Daniel Simpson, Andrew Gelman, Yuling Yao, and Jonah Gabry. Pareto smoothed importance sampling. arXiv preprint arXiv:1507.02646, 2021.
[22] Topi Paananen, Juho Piironen, Paul-Christian Bürkner, and Aki Vehtari. Implicitly adaptive importance sampling. Statistics and Computing, 31(2):1–19, 2021.
[23] Jennifer A Hoeting, David Madigan, Adrian E Raftery, and Chris T Volinsky. Bayesian model averaging: a tutorial (with comments by M. Clyde, David Draper and E. I. George, and a rejoinder by the authors). Statistical science, 14(4):382–417, 1999.
[24] Yuling Yao, Aki Vehtari, Daniel Simpson, and Andrew Gelman. Using stacking to average bayesian predictive distributions (with discussion). Bayesian Analysis, 13(3):917–1007, 2018.
[25] Donald B Rubin. Estimation in parallel randomized experiments. Journal of Educational Statistics, 6(4):377–401, 1981.
[26] Allison Marie Horst, Alison Presmanes Hill, and Kristen B Gorman. palmerpenguins: Palmer Archipelago (Antarctica) penguin data. 2020. R package version 0.1.0. URL: https://allisonhorst.github.io/palmerpenguins/, doi:10.5281/zenodo.3960218.
[27] Dan Piponi, Dave Moore, and Joshua V. Dillon. Joint distributions for tensorflow probability. arXiv preprint arXiv:2001.11819, 2020.
[28] Junpeng Lao, Christopher Suter, Ian Langmore, Cyril Chimisov, Ashish Saxena, Pavel Sountsov, Dave Moore, Rif A Saurous, Matthew D Hoffman, and Joshua V. Dillon. Tfp.mcmc: modern markov chain monte carlo tools built for modern hardware. arXiv preprint arXiv:2002.01184, 2020.
[29] J. Fox. Applied Regression Analysis and Generalized Linear Models. SAGE Publications, 2015. ISBN 9781483321318.
[30] Kristen B Gorman, Tony D Williams, and William R Fraser. Ecological sexual dimorphism and environmental variability within a community of antarctic penguins (genus pygoscelis). PloS one, 9(3):e90081, 2014.
[31] Tomás Capretto, Camen Piho, Ravin Kumar, Jacob Westfall, Tal Yarkoni, and Osvaldo A Martin. Bambi: a simple interface for fitting bayesian linear models in python. arXiv preprint arXiv:2012.10754, 2020.
[32] Douglas Bates, Martin Mächler, Ben Bolker, and Steve Walker. Fitting linear mixed-effects models using lme4. arXiv preprint arXiv:1406.5823, 2014.
[33] Jose Pinheiro, Douglas Bates, Saikat DebRoy, Deepayan Sarkar, and R Core Team. nlme: Linear and Nonlinear Mixed Effects Models. 2020. R package version 3.1-151. URL: https://CRAN.R-project.org/package=nlme.
[34] Jonah Gabry and Ben Goodrich. Estimating generalized (non-)linear models with group-specific terms with rstanarm. 6 2020. URL: https://mc-stan.org/rstanarm/articles/glmer.html.
[35] Paul-Christian Bürkner. Brms: an r package for bayesian multilevel models using stan. Journal of statistical software, 80(1):1–28, 2017.
[36] C. Davidson-Pilon. Bayesian Methods for Hackers: Probabilistic Programming and Bayesian Inference. Addison-Wesley Data & Analytics Series. Pearson Education, 2015. ISBN 9780133902921.
[37] Brian Greenhill, Michael D Ward, and Audrey Sacks. The separation plot: a new visual method for evaluating the fit of binary models. American Journal of Political Science, 55(4):991–1002, 2011.
[38] A. Gelman, J. Hill, and A. Vehtari. Regression and Other Stories. Analytical Methods for Social Research. Cambridge University Press, 2020. ISBN 9781107023987.
[39] O. Martin. Bayesian Analysis with Python: Introduction to Statistical Modeling and Probabilistic Programming Using PyMC3 and ArviZ, 2nd Edition. Packt Publishing, 2018. ISBN 9781789341652.
[40] Frank E Grubbs. Procedures for detecting outlying observations in samples. Technometrics, 11(1):1–21, 1969.
[41] Michael Betancourt. Towards a principled bayesian workflow. https://betanalpha.github.io/assets/case_studies/principled_bayesian_workflow.html, 4 2020.
[42] Andrew Gelman. Analysis of variance—why it is more important than ever. The annals of statistics, 33(1):1–53, 2005.
[43] Radford M Neal. Slice sampling. The annals of statistics, 31(3):705–767, 2003.
[44] Omiros Papaspiliopoulos, Gareth O Roberts, and Martin Sköld. A general framework for the parametrization of hierarchical models. Statistical Science, pages 59–73, 2007.
[45] Michael Betancourt. Hierarchical modeling. https://betanalpha.github.io/assets/case_studies/hierarchical_modeling.html, 11 2020.
[46] Nathan P Lemoine. Moving beyond noninformative priors: why and how to choose weakly informative priors in bayesian analyses. Oikos, 128(7):912–928, 2019.
[47] S.N. Wood. Generalized Additive Models: An Introduction with R, Second Edition. Chapman & Hall/CRC Texts in Statistical Science. CRC Press, 2017. ISBN 9781498728379.
[48] Catherine Potvin, Martin J Lechowicz, and Serge Tardif. The statistical analysis of ecophysiological response curves obtained from experiments involving repeated measures. Ecology, 71(4):1389–1400, 1990.
[49] Eric J Pedersen, David L Miller, Gavin L Simpson, and Noam Ross. Hierarchical generalized additive models in ecology: an introduction with mgcv. PeerJ, 7:e6876, 2019.
[50] Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. The MIT Press, Cambridge, Mass, 2005. ISBN 978-0-262-18253-9.
[51] Sean J Taylor and Benjamin Letham. Forecasting at scale. The American Statistician, 72(1):37–45, 2018.
[52] Ryan Prescott Adams and David JC MacKay. Bayesian online changepoint detection. arXiv preprint arXiv:0710.3742, 2007.
[53] G. Strang. Introduction to Linear Algebra. Wellesley-Cambridge Press, 2009. ISBN 9780980232714.
[54] Andrew C Harvey and Neil Shephard. Structural time series models. Handbook of Statistics (edited by G. S. Maddala, C. R. Rao and H. D. Vinod), 11:261–302, 1993.
[55] G.E.P. Box, G.M. Jenkins, and G.C. Reinsel. Time Series Analysis: Forecasting and Control. Wiley Series in Probability and Statistics. Wiley, 2008. ISBN 9780470272848.
[56] R. Shumway and D. Stoffer. Time Series: A Data Analysis Approach Using R. Chapman & Hall/CRC Texts in Statistical Science. CRC Press, 2019. ISBN 9781000001563.
[57] M. West and J. Harrison. Bayesian Forecasting and Dynamic Models. Springer Series in Statistics. Springer New York, 2013. ISBN 9781475793659.
[58] S. Särkkä. Bayesian Filtering and Smoothing. Bayesian Filtering and Smoothing. Cambridge University Press, 2013. ISBN 9781107030657.
[59] James Durbin and Siem Jan Koopman. Time series analysis by state space methods. Oxford university press, 2012.
[60] Mohinder S Grewal and Angus P Andrews. Kalman filtering: Theory and Practice with MATLAB. John Wiley & Sons, 2014.
[61] N. Chopin and O. Papaspiliopoulos. An Introduction to Sequential Monte Carlo. Springer Series in Statistics. Springer International Publishing, 2020. ISBN 9783030478445.
[62] Paul-Christian Bürkner, Jonah Gabry, and Aki Vehtari. Approximate leave-future-out cross-validation for bayesian time series models. Journal of Statistical Computation and Simulation, 90(14):2499–2523, 2020.
[63] Carlos M. Carvalho, Nicholas G. Polson, and James G. Scott. The horseshoe estimator for sparse signals. Biometrika, 97(2):465–480, 2010.
[64] Juho Piironen, Aki Vehtari, and others. Sparsity information and regularization in the horseshoe and other shrinkage priors. Electronic Journal of Statistics, 11(2):5018–5051, 2017.
[65] Juho Piironen and Aki Vehtari. On the hyperprior choice for the global shrinkage parameter in the horseshoe prior. In Artificial Intelligence and Statistics, 905–913. PMLR, 2017.
[66] Gabriel Riutort-Mayol, Paul-Christian Bürkner, Michael R Andersen, Arno Solin, and Aki Vehtari. Practical hilbert space approximate bayesian gaussian processes for probabilistic programming. arXiv preprint arXiv:2004.11408, 2020.
[67] Leo Breiman. Statistical modeling: the two cultures (with comments and a rejoinder by the author). Statistical science, 16(3):199–231, 2001.
[68] Z.H. Zhou. Ensemble Methods: Foundations and Algorithms. Chapman & Hall/CRC data mining and knowledge discovery series. CRC Press, 2012. ISBN 9781439830055.
[69] Hugh A. Chipman, Edward I. George, and Robert E. McCulloch. Bart: bayesian additive regression trees. The Annals of Applied Statistics, 4(1):266–298, 2010.
[70] Veronika Ročková and Enakshi Saha. On theory for bart. In The 22nd International Conference on Artificial Intelligence and Statistics, 2839–2848. PMLR, 2019.
[71] Balaji Lakshminarayanan, Daniel Roy, and Yee Whye Teh. Particle gibbs for bayesian additive regression trees. In Artificial Intelligence and Statistics, 553–561. PMLR, 2015.
[72] C. Molnar. Interpretable Machine Learning. Lulu.com, 2020. ISBN 9780244768522.
[73] Christoph Molnar, Giuseppe Casalicchio, and Bernd Bischl. Interpretable machine learning–a brief history, state-of-the-art and challenges. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 417–431. Springer, 2020.
[74] Jerome H Friedman. Greedy function approximation: a gradient boosting machine. Annals of statistics, pages 1189–1232, 2001.
[75] Alex Goldstein, Adam Kapelner, Justin Bleich, and Emil Pitkin. Peeking inside the black box: visualizing statistical learning with plots of individual conditional expectation. journal of Computational and Graphical Statistics, 24(1):44–65, 2015.
[76] Yi Liu, Veronika Ročková, and Yuexi Wang. Variable selection with abc bayesian forests. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2019.
[77] Colin J. Carlson. Embarcadero: species distribution modelling with bayesian additive regression trees in r. Methods in Ecology and Evolution, 11(7):850–858, 2020.
[78] Justin Bleich, Adam Kapelner, Edward I George, and Shane T Jensen. Variable selection for bart: an application to gene regulation. The Annals of Applied Statistics, pages 1750–1781, 2014.
[79] Leo Breiman. Random forests. Machine learning, 45(1):5–32, 2001.
[80] Matej Balog and Yee Whye Teh. The mondrian process for machine learning. arXiv preprint arXiv:1507.05181, 2015.
[81] Daniel M Roy and Yee Whye Teh. The mondrian process. In Proceedings of the 21st International Conference on Neural Information Processing Systems, 1377–1384. 2008.
[82] Mikael Sunnåker, Alberto Giovanni Busetto, Elina Numminen, Jukka Corander, Matthieu Foll, and Christophe Dessimoz. Approximate bayesian computation. PLoS computational biology, 9(1):e1002803, 2013.
[83] Ritabrata Dutta, Marcel Schoengens, Jukka-Pekka Onnela, and Antonietta Mira. Abcpy: a user-friendly, extensible, and parallel library for approximate bayesian computation. In Proceedings of the platform for advanced scientific computing conference, 1–9. 2017.
[84] Jarno Lintusaari, Henri Vuollekoski, Antti Kangasraasio, Kusti Skytén, Marko Jarvenpaa, Pekka Marttinen, Michael U Gutmann, Aki Vehtari, Jukka Corander, and Samuel Kaski. Elfi: engine for likelihood-free inference. Journal of Machine Learning Research, 19(16):1–7, 2018.
[85] Emmanuel Klinger, Dennis Rickert, and Jan Hasenauer. Pyabc: distributed, likelihood-free inference. Bioinformatics, 34(20):3591–3593, 2018.
[86] Georges Darmois. Sur les lois de probabilité à estimation exhaustive. CR Acad. Sci. Paris, 260(1265):85, 1935.
[87] Bernard Osgood Koopman. On distributions admitting a sufficient statistic. Transactions of the American Mathematical society, 39(3):399–409, 1936.
[88] Edwin James George Pitman. Sufficient statistics and intrinsic accuracy. In Mathematical Proceedings of the cambridge Philosophical society, volume 32, 567–579. Cambridge University Press, 1936.
[89] Erling Bernhard Andersen. Sufficiency and exponential families for discrete sample spaces. Journal of the American Statistical Association, 65(331):1248–1255, 1970.
[90] Fernando Pérez-Cruz. Kullback-leibler divergence estimation of continuous distributions. In 2008 IEEE international symposium on information theory, 1666–1670. IEEE, 2008.
[91] Bai Jiang. Approximate bayesian computation with kullback-leibler divergence as data discrepancy. In International conference on artificial intelligence and statistics, 1711–1721. PMLR, 2018.
[92] Espen Bernton, Pierre E Jacob, Mathieu Gerber, and Christian P Robert. Approximate bayesian computation with the wasserstein distance. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 81(2):235–269, 2019.
[93] Jon Louis Bentley. Multidimensional binary search trees used for associative searching. Communications of the ACM, 18(9):509–517, 1975.
[94] S.A. Sisson, Y. Fan, and M. Beaumont. Handbook of Approximate Bayesian Computation. Chapman & Hall/CRC Handbooks of Modern Statistical Methods. CRC Press, 2018. ISBN 9781439881514.
[95] Mark A. Beaumont, Wenyang Zhang, and David J Balding. Approximate bayesian computation in population genetics. Genetics, 162(4):2025–2035, 2002.
[96] Mark A. Beaumont. Approximate bayesian computation in evolution and ecology. Annual review of ecology, evolution, and systematics, 41:379–406, 2010.
[97] Pierre Pudlo, Jean-Michel Marin, Arnaud Estoup, Jean-Marie Cornuet, Mathieu Gautier, and Christian P Robert. Reliable abc model choice via random forests. Bioinformatics, 32(6):859–866, 2016.
[98] John W. Tukey. Modern Techniques in Data Analysis. In Proceedings of the Sponsored Regional Research Conference. 1977.
[99] Glen D Rayner and Helen L MacGillivray. Numerical maximum likelihood estimation for the g-and-k and generalized g-and-h distributions. Statistics and Computing, 12(1):57–75, 2002.
[100] Dennis Prangle. Gk: an r package for the g-and-k and generalised g-and-h distributions. arXiv preprint arXiv:1706.06889, 2017.
[101] Christopher C Drovandi and Anthony N Pettitt. Likelihood-free bayesian estimation of multivariate quantile distributions. Computational Statistics & Data Analysis, 55(9):2541–2556, 2011.
[102] A.L. Bowley. Elements of Statistics. Number v. 2 in Elements of Statistics. P.S. King, 1920.
[103] JJA Moors. A quantile alternative for kurtosis. Journal of the Royal Statistical Society: Series D (The Statistician), 37(1):25–32, 1988.
[104] Jean-Michel Marin, Pierre Pudlo, Christian P Robert, and Robin J Ryder. Approximate bayesian computational methods. Statistics and Computing, 22(6):1167–1180, 2012.
[105] Mark A. Beaumont. Approximate bayesian computation. Annual review of statistics and its application, 6:379–403, 2019.
[106] Christian P Robert, Jean-Marie Cornuet, Jean-Michel Marin, and Natesh S Pillai. Lack of confidence in approximate bayesian computation model choice. Proceedings of the National Academy of Sciences, 108(37):15112–15117, 2011.
[107] François-David Collin, Arnaud Estoup, Jean-Michel Marin, and Louis Raynal. Bringing abc inference to the machine learning realm: abcranger, an optimized random forests library for abc. In JOBIM 2020, volume 2020. 2020.
[108] Peter Bickel, Bo Li, Thomas Bengtsson, and others. Sharp failure rates for the bootstrap particle filter in high dimensions. In Pushing the limits of contemporary statistics: Contributions in honor of Jayanta K. Ghosh, pages 318–329. Institute of Mathematical Statistics, 2008.
[109] S.P. Otto and T. Day. A Biologist's Guide to Mathematical Modeling in Ecology and Evolution. Princeton University Press, 2011. ISBN 9781400840915.
[110] D. R. Cox. Principles of statistical inference. Cambridge University Press, 2006. ISBN 978-0521685672.
[111] Andrew Gelman. The folk theorem of statistical computing. https://statmodeling.stat.columbia.edu/2008/05/13/the_folk_theore/, 5 2008.
[112] Matthew P. A. Clark and Brian D. Westerberg. How random is the toss of a coin? CMAJ, 181(12):E306–E308, 2009.
[113] James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang. JAX: composable transformations of Python+NumPy programs. 2018. URL: http://github.com/google/jax.
[114] Wally R Gilks, Andrew Thomas, and David J Spiegelhalter. A language and program for complex bayesian modelling. Journal of the Royal Statistical Society: Series D (The Statistician), 43(1):169–177, 1994.
[115] Bob Carpenter, Andrew Gelman, Matthew D Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. Stan: a probabilistic programming language. Journal of statistical software, 76(1):1–32, 2017.
[116] Maria I Gorinova, Andrew D Gordon, and Charles Sutton. Probabilistic programming with densities in slicstan: efficient, flexible, and deterministic. Proceedings of the ACM on Programming Languages, 3(POPL):1–30, 2019.
[117] Max Kochurov, Colin Carroll, Thomas Wiecki, and Junpeng Lao. Pymc4: exploiting coroutines for implementing a probabilistic programming framework. Program Transformations for ML Workshop at NeurIPS, 2019. URL: https://openreview.net/forum?id=rkgzj5Za8H.
[118] George Papamakarios, Eric Nalisnick, Danilo Jimenez Rezende, Shakir Mohamed, and Balaji Lakshminarayanan. Normalizing flows for probabilistic modeling and inference. arXiv preprint arXiv:1912.02762, 2019.
[119] Brandon T Willard. Minikanren as a tool for symbolic computation in python. arXiv preprint arXiv:2005.11644, 2020.
[120] Ohad Kammar, Sam Lindley, and Nicolas Oury. Handlers in action. ACM SIGPLAN Notices, 48(9):145–158, 2013.
[121] Frank Wood, Jan Willem Meent, and Vikash Mansinghka. A new approach to probabilistic programming inference. In Artificial Intelligence and Statistics, 1024–1032. PMLR, 2014.
[122] David Tolpin, Jan-Willem van de Meent, Hongseok Yang, and Frank Wood. Design and implementation of probabilistic programming language anglican. In Proceedings of the 28th Symposium on the Implementation and Application of Functional programming Languages, 1–12. 2016.
[123] Eli Bingham, Jonathan P Chen, Martin Jankowiak, Fritz Obermeyer, Neeraj Pradhan, Theofanis Karaletsos, Rohit Singh, Paul Szerlip, Paul Horsfall, and Noah D Goodman. Pyro: deep universal probabilistic programming. The Journal of Machine Learning Research, 20(1):973–978, 2019.
[124] Du Phan, Neeraj Pradhan, and Martin Jankowiak. Composable effects for flexible and accelerated probabilistic programming in numpyro. arXiv preprint arXiv:1912.11554, 2019.
[125] Dustin Tran, Matthew Hoffman, Dave Moore, Christopher Suter, Srinivas Vasudevan, Alexey Radul, Matthew Johnson, and Rif A Saurous. Simple, distributed, and accelerated probabilistic programming. arXiv preprint arXiv:1811.02091, 2018.
[126] Dave Moore and Maria I Gorinova. Effect handling for composable program transformations in edward2. arXiv preprint arXiv:1811.06150, 2018.
[127] Maria Gorinova, Dave Moore, and Matthew Hoffman. Automatic reparameterisation of probabilistic programs. In International Conference on Machine Learning, 3648–3657. PMLR, 2020.
[128] Jan-Willem van de Meent, Brooks Paige, Hongseok Yang, and Frank Wood. An introduction to probabilistic programming. arXiv preprint arXiv:1809.10756, 2018.
[129] Allen B. Downey. Think Stats: Exploratory Data Analysis. O'Reilly Media;, 2014.
[130] Peter H Westfall. Kurtosis as peakedness, 1905–2014. R.I.P. The American Statistician, 68(3):191–195, 2014.
[131] T.M. Cover and J.A. Thomas. Elements of Information Theory. Wiley, 2012. ISBN 9781118585771.
[132] Hirotogu Akaike. Information theory and an extension of the maximum likelihood principle. In Selected papers of hirotugu akaike, pages 199–213. Springer, 1998.
[133] Sumio Watanabe and Manfred Opper. Asymptotic equivalence of bayes cross validation and widely applicable information criterion in singular learning theory. Journal of machine learning research, 2010.
[134] Aki Vehtari, Andrew Gelman, and Jonah Gabry. Practical bayesian model evaluation using leave-one-out cross-validation and waic. Statistics and computing, 27(5):1413–1432, 2017.
[135] W.R. Gilks, S. Richardson, and D. Spiegelhalter. Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC Interdisciplinary Statistics. CRC Press, 1995. ISBN 9781482214970.
[136] Nial Friel and Jason Wyse. Estimating the evidence–a review. Statistica Neerlandica, 66(3):288–308, 2012.
[137] Radford M Neal. Contribution to the discussion of "approximate bayesian inference with the weighted likelihood bootstrap" by Michael A. Newton and Adrian E. Raftery. Journal of the Royal Statistical Society. Series B (Methodological), 56:41–42, 1994.
[138] Quentin F Gronau, Alexandra Sarafoglou, Dora Matzke, Alexander Ly, Udo Boehm, Maarten Marsman, David S Leslie, Jonathan J Forster, Eric-Jan Wagenmakers, and Helen Steingroever. A tutorial on bridge sampling. Journal of mathematical psychology, 81:80–97, 2017.
[139] Danielle Navarro. A personal essay on bayes factors. PsyArXiv, 2020.
[140] Daniel J Schad, Bruno Nicenboim, Paul-Christian Bürkner, Michael Betancourt, and Shravan Vasishth. Workflow techniques for the robust use of bayes factors. arXiv preprint arXiv:2103.08744, 2021.
[141] E.A. Abbott and R. Jann. Flatland: A Romance of Many Dimensions. Oxford World's Classics. OUP Oxford, 2008. ISBN 9780199537501.
[142] Gael M Martin, David T Frazier, and Christian P Robert. Computing bayes: bayesian computation from 1763 to the 21st century. arXiv preprint arXiv:2004.06425, 2020.
[143] Simon Duane, Anthony D Kennedy, Brian J Pendleton, and Duncan Roweth. Hybrid monte carlo. Physics letters B, 195(2):216–222, 1987.
[144] S. Brooks, A. Gelman, G. Jones, and X.L. Meng. Handbook of Markov Chain Monte Carlo. Chapman & Hall/CRC Handbooks of Modern Statistical Methods. CRC Press, 2011. ISBN 9781420079425.
[145] Michael Betancourt. A conceptual introduction to hamiltonian monte carlo. arXiv preprint arXiv:1701.02434, 2017.
[146] Yuling Yao, Aki Vehtari, Daniel Simpson, and Andrew Gelman. Yes, but did it work?: evaluating variational inference. In International Conference on Machine Learning, 5581–5590. PMLR, 2018.
[147] David M. Blei, Alp Kucukelbir, and Jon D. McAuliffe. Variational inference: a review for statisticians. Journal of the American statistical Association, 112(518):859–877, 2017.
[148] Alp Kucukelbir, Dustin Tran, Rajesh Ranganath, Andrew Gelman, and David M. Blei. Automatic differentiation variational inference. The Journal of Machine Learning Research, 18(1):430–474, 2017.
[149] J.K. Kruschke. Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan. Academic Press. Academic Press, 2015. ISBN 9780124058880.