This article introduces a Bayesian nonparametric method for quantifying the relative evidence in a dataset in favour of the dependence or independence of two variables conditional on a third. The approach uses Pólya tree priors on spaces of conditional probability densities, accounting nonparametrically for uncertainty about the form of the underlying distributions. The Bayesian perspective provides an inherently symmetric probability measure of conditional dependence or independence, a feature particularly advantageous in causal discovery and not offered by existing procedures of this type.
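Because Pólya tree priors admit closed-form marginal likelihoods, the model comparison described above can be illustrated compactly. The Python snippet below is a minimal sketch, not the article's implementation: it compares a joint bivariate Pólya tree against the product of two univariate Pólya trees to produce a posterior probability of (unconditional) dependence between X and Y. The tree depth, the Beta/Dirichlet concentration c·m² at level m, the rank transform to (0, 1), and the even prior over the two hypotheses are assumptions made for this example; the article's test extends the same marginal-likelihood comparison to conditional densities given a third variable.

```python
# Minimal sketch (illustrative assumptions, not the article's implementation):
# closed-form Polya tree marginal likelihoods and the resulting posterior
# probability that two variables are dependent.
import numpy as np
from scipy.special import betaln, expit, gammaln


def to_unit_interval(x):
    """Rank-transform a sample to (0, 1) so trees on [0, 1] apply."""
    ranks = np.argsort(np.argsort(np.asarray(x))) + 1.0
    return ranks / (len(x) + 1.0)


def log_ml_pt_1d(u, depth=8, c=1.0):
    """Log marginal likelihood of a depth-limited Polya tree on [0, 1] with a
    uniform base measure and Beta(c*m^2, c*m^2) branch priors at level m."""
    u = np.asarray(u)
    n = len(u)
    log_ml = n * depth * np.log(2.0)  # uniform base density over the finest dyadic cells
    for m in range(1, depth + 1):
        a = c * m ** 2
        cells = (u * 2 ** m).astype(int)                              # dyadic cell of each point at level m
        counts = np.bincount(cells, minlength=2 ** m).reshape(-1, 2)  # sibling pairs per parent node
        log_ml += np.sum(betaln(a + counts[:, 0], a + counts[:, 1]) - betaln(a, a))
    return log_ml


def log_ml_pt_2d(u, v, depth=8, c=1.0):
    """Log marginal likelihood of a bivariate Polya tree on [0, 1]^2 built on a
    quadtree partition with symmetric Dirichlet(c*m^2) priors on each 4-way split."""
    u, v = np.asarray(u), np.asarray(v)
    n = len(u)
    log_ml = n * depth * np.log(4.0)
    cells = np.zeros(n, dtype=int)
    for m in range(1, depth + 1):
        a = c * m ** 2
        bu = (u * 2 ** m).astype(int) % 2                             # m-th binary digit of u
        bv = (v * 2 ** m).astype(int) % 2                             # m-th binary digit of v
        cells = 4 * cells + 2 * bv + bu                               # Morton index: siblings are contiguous
        counts = np.bincount(cells, minlength=4 ** m).reshape(-1, 4)
        log_ml += np.sum(gammaln(4 * a) - 4 * gammaln(a)
                         + gammaln(a + counts).sum(axis=1)
                         - gammaln(4 * a + counts.sum(axis=1)))
    return log_ml


def prob_dependent(x, y, depth=8, c=1.0, prior_dep=0.5):
    """Posterior probability of dependence: joint tree versus product of marginal trees."""
    u, v = to_unit_interval(x), to_unit_interval(y)
    log_dep = log_ml_pt_2d(u, v, depth, c) + np.log(prior_dep)
    log_ind = log_ml_pt_1d(u, depth, c) + log_ml_pt_1d(v, depth, c) + np.log(1.0 - prior_dep)
    return expit(log_dep - log_ind)                                   # logistic of the log posterior odds


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=500)
    print(prob_dependent(x, rng.normal(size=500)))                    # independent pair
    print(prob_dependent(x, x ** 2 + 0.1 * rng.normal(size=500)))     # nonlinearly dependent pair
```

The product-of-marginals model and the joint quadtree model both define densities on the unit square after the rank transform, so their marginal likelihoods are directly comparable and the posterior odds follow from Bayes' rule.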
Figure 1. Construction of a Pólya tree distribution on
Figure 3. Application of the proposed Bayesian testing procedure to four synthetic datasets supported on
Figure 5. Example pairwise dependence graphs output by the Bayesian conditional independence test for five variables from the CalCOFI dataset, conditional on
Figure 6. Box plots giving the output posterior probability of conditional dependence
Figure 7. Top left: heat map of conditional marginal likelihood values for the three constituent models over