
# A Bayesian nonparametric test for conditional independence

Supported by EPSRC grant EP/R013519/1
This article introduces a Bayesian nonparametric method for quantifying the relative evidence in a dataset in favour of the dependence or independence of two variables conditional on a third. The approach uses Pólya tree priors on spaces of conditional probability densities, accounting for uncertainty in the form of the underlying distributions in a nonparametric way. The Bayesian perspective provides an inherently symmetric probability measure of conditional dependence or independence, a feature that is particularly advantageous in causal discovery and is not offered by existing procedures of this type.
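To make the "symmetric probability measure" concrete: once the marginal likelihoods of the data $W$ under conditional independence ($H_0$) and conditional dependence ($H_1$) are available, the posterior probability $p(H_1|W)$ follows from Bayes' rule, and $p(H_0|W) = 1 - p(H_1|W)$, so evidence for either hypothesis is reported on the same scale. The sketch below assumes the two log marginal likelihoods have already been computed (how they are obtained from the Pólya tree models is not reproduced here) and assumes a prior probability of 0.5 for each hypothesis unless another value is given; the function name is hypothetical.

```python
import math


def posterior_prob_dependence(log_ml_h0: float, log_ml_h1: float, prior_h1: float = 0.5) -> float:
    """Return p(H1 | W) given log p(W | H0), log p(W | H1) and the prior p(H1).

    Assumption: both log marginal likelihoods are supplied by the caller.
    """
    # Log posterior odds *against* dependence:
    # log p(W|H0) + log p(H0) - log p(W|H1) - log p(H1).
    log_odds_against = (log_ml_h0 + math.log(1.0 - prior_h1)) - (log_ml_h1 + math.log(prior_h1))
    # p(H1|W) = 1 / (1 + exp(log_odds_against)), evaluated in a numerically stable way.
    if log_odds_against >= 0:
        z = math.exp(-log_odds_against)
        return z / (1.0 + z)
    return 1.0 / (1.0 + math.exp(log_odds_against))


if __name__ == "__main__":
    # Illustrative numbers only: the data are e^2 times more likely under H1.
    print(posterior_prob_dependence(-10.0, -8.0))  # ≈ 0.881
```

Swapping the two arguments gives $p(H_0|W)$, which is why the output can be read as evidence in either direction rather than only as a failure to reject $H_0$.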

Mathematics Subject Classification: Primary: 62-08; Secondary: 62G10.


Figure 1.  Construction of a Pólya tree distribution on $\Omega = [0,1]$. From each set $C_\ast$, a particle of probability mass passes to the left with (random) probability $\theta_{\ast0}$ and to the right with probability $\theta_{\ast1} = 1-\theta_{\ast0}$, with all $\theta_\ast$ being independently Beta-distributed as described in the main text
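The construction in Figure 1 can be simulated directly by splitting each dyadic interval of $[0,1]$ in two and sending a Beta-distributed fraction $\theta_{\ast0}$ of its mass to the left child. The Beta parameters are specified in the main text rather than here, so this sketch assumes the common choice $\theta_{\ast0} \sim \mathrm{Beta}(c\,m^2, c\,m^2)$ at level $m$, which is known to yield absolutely continuous random densities; the scale parameter `c`, the truncation `depth` and the function name are illustrative assumptions, not the article's settings.

```python
import numpy as np


def sample_polya_tree_density(depth: int, c: float = 1.0, rng=None) -> np.ndarray:
    """Draw a piecewise-constant random density on [0, 1] from a Pólya tree truncated at `depth` levels.

    Assumption: at level m every split uses theta_{*0} ~ Beta(c * m**2, c * m**2).
    Returns the density value on each of the 2**depth dyadic intervals, left to right.
    """
    rng = np.random.default_rng() if rng is None else rng
    mass = np.ones(1)  # all probability mass starts in the root set [0, 1]
    for m in range(1, depth + 1):
        a = c * m ** 2
        theta0 = rng.beta(a, a, size=mass.size)  # probability of passing to the left child
        children = np.empty(2 * mass.size)
        children[0::2] = mass * theta0            # left children
        children[1::2] = mass * (1.0 - theta0)    # right children, theta_{*1} = 1 - theta_{*0}
        mass = children
    # Each finest interval has width 2**-depth, so density = mass / width.
    return mass * 2 ** depth
```

Larger `c` makes every $\theta_\ast$ concentrate around $1/2$, so draws stay close to the uniform density; smaller `c` allows rougher, more concentrated densities.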

Figure 2.  Pseudocode for the proposed Bayesian nonparametric test for conditional independence

Figure 3.  Application of the proposed Bayesian testing procedure to four synthetic datasets supported on $[0,1]^3$, chosen such that all combinations of unconditional and conditional dependence/independence are represented. The final column gives the ensemble of probabilities of conditional dependence $p(H_1|W)$ output by the test over 100 repetitions at varying values of data size $N$, with the blue line representing the median, and the dark and light shaded regions representing the (25, 75)-percentile and (5, 95)-percentile ranges respectively
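The summary bands described for Figure 3 (median, (25, 75)- and (5, 95)-percentile ranges over repeated runs at each data size $N$) can be computed as below. This assumes the outputs $p(H_1|W)$ are stored in an array with one row per repetition and one column per value of $N$; the function name and layout are hypothetical.

```python
import numpy as np


def summarise_repetitions(probs: np.ndarray) -> dict:
    """Summarise p(H1 | W) over repetitions.

    probs: array of shape (n_repetitions, n_sizes), one column per data size N.
    """
    # Percentiles are taken across repetitions (axis 0), separately for each N.
    q = np.percentile(probs, [5, 25, 50, 75, 95], axis=0)
    return {
        "median": q[2],
        "band_25_75": (q[1], q[3]),
        "band_5_95": (q[0], q[4]),
    }
```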

Figure 4.  Marginal scatter plots from the CalCOFI Bottle dataset showing the pairwise relationships between $\texttt{Salnty}$, $\texttt{Oxy\_µmol.Kg}$ and $\texttt{T\_degC}$. The nonlinear nature of the dependences is immediately apparent

Figure 5.  Example pairwise dependence graphs output by the Bayesian conditional independence test for five variables from the CalCOFI dataset, conditional on $\texttt{T\_degC}$, for four different sizes of subsample drawn from the complete dataset. The numbers associated with each edge are the posterior probabilities of conditional dependence $p(H_1|W^{(N)})$ and are given to two decimal places; where no edge is shown, this indicates $p(H_1|W^{(N)})<0.005$
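The display rule in Figure 5 (draw an edge only when $p(H_1|W^{(N)}) \geq 0.005$ and label it to two decimal places) amounts to a simple filter over the pairwise test outputs. In this sketch the input is assumed to be a dictionary keyed by variable pairs; the variable names and probabilities in the usage line are made up for illustration.

```python
def dependence_edges(post: dict, threshold: float = 0.005) -> dict:
    """Keep pairs whose posterior probability of conditional dependence is at least `threshold`.

    post: maps a (variable, variable) pair to p(H1 | W^(N)).
    Returns the retained pairs with probabilities rounded to two decimal places.
    """
    return {pair: round(p, 2) for pair, p in post.items() if p >= threshold}


if __name__ == "__main__":
    # Hypothetical outputs for three variables A, B, C.
    print(dependence_edges({("A", "B"): 0.973, ("A", "C"): 0.001, ("B", "C"): 0.42}))
```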

Figure 6.  Box-plots giving the output posterior probability of conditional dependence $p(H_1|W^{(N)})$ for 100 repetitions of the Bayesian conditional independence test applied to randomly drawn subsamples of various sizes $N$ from the CalCOFI dataset. The left-hand plot gives a representative example of a pair of variables conditionally dependent given $\texttt{T\_degC}$, while the right-hand plot gives a representative conditionally independent pair

Figure 7.  Top left: Heat map of conditional marginal likelihood values for the three constituent models over $\Omega_X$, $\Omega_Y$ and $\Omega_{XY}$ for the second and third models of Figure 3. Top right: 'Slices' from this heat map with $\rho = 0.5$. Bottom: Test outputs for 100 repetitions of the second and third models of Figure 3. Red plots fix $c = 1$ (output identical to Figure 3), while the blue plots use the optimising values $\hat{c}$ from the plot above
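Figure 7 compares a fixed $c = 1$ with values $\hat{c}$ chosen by maximising a marginal likelihood. The sketch below illustrates that empirical Bayes idea in the simplest setting only: a univariate, unconditional Pólya tree on $[0,1]$ with a uniform base measure, dyadic partitions, $\mathrm{Beta}(c\,m^2, c\,m^2)$ splits at level $m$ and truncation at a fixed depth, all of which are assumptions rather than the article's conditional construction. Under these assumptions the marginal likelihood has the closed form $2^{nM}\prod_{\text{nodes}} B(a_m + n_0, a_m + n_1)/B(a_m, a_m)$, where $n_0, n_1$ count the points falling in the left and right child of each node; $\hat{c}$ is then picked by grid search. Function names are hypothetical.

```python
import math

import numpy as np


def log_beta(x: float, y: float) -> float:
    """log B(x, y) via log-gamma functions."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def polya_tree_log_ml(x: np.ndarray, c: float, depth: int = 8) -> float:
    """Log marginal likelihood of data in [0, 1] under a truncated Pólya tree (assumed settings)."""
    n = x.size
    # Finest-level dyadic bin of each point (x = 1.0 goes into the last bin).
    idx = np.minimum((x * 2 ** depth).astype(int), 2 ** depth - 1)
    counts = np.bincount(idx, minlength=2 ** depth)
    # Factor 2^(n * depth) converts interval masses into densities w.r.t. Lebesgue measure.
    log_ml = n * depth * math.log(2.0)
    # Walk up the tree; pairs at level m are the children of one set at level m - 1.
    for m in range(depth, 0, -1):
        left, right = counts[0::2], counts[1::2]
        a = c * m ** 2
        for nl, nr in zip(left, right):
            if nl or nr:  # empty nodes contribute log(1) = 0
                log_ml += log_beta(a + nl, a + nr) - log_beta(a, a)
        counts = left + right
    return log_ml


def empirical_bayes_c(x: np.ndarray, grid, depth: int = 8):
    """Return the grid value of c maximising the log marginal likelihood, plus all scores."""
    scores = [polya_tree_log_ml(x, c, depth) for c in grid]
    return grid[int(np.argmax(scores))], scores
```

As a sanity check, a single observation always has log marginal likelihood 0, because the prior mean of the random density is uniform. Data concentrated in a small region favour small $c$, which permits rougher densities.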

