# American Institute of Mathematical Sciences

June 2020, 2(2): 155-172. doi: 10.3934/fods.2020009

## A Bayesian nonparametric test for conditional independence

 Department of Mathematics, Imperial College London, UK

Published July 2020

Fund Project: Supported by EPSRC grant EP/R013519/1

This article introduces a Bayesian nonparametric method for quantifying the relative evidence in a dataset in favour of the dependence or independence of two variables conditional on a third. The approach uses Pólya tree priors on spaces of conditional probability densities, accounting for uncertainty in the form of the underlying distributions in a nonparametric way. The Bayesian perspective provides an inherently symmetric probability measure of conditional dependence or independence, a feature particularly advantageous in causal discovery and not employed in existing procedures of this type.
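The central output of the method is a posterior probability of dependence, obtained by comparing Pólya tree marginal likelihoods. The idea can be illustrated in the simpler *unconditional* setting. The minimal Python sketch below is not the authors' conditional test. It compares two hypotheses using the standard closed-form Pólya tree marginal likelihood:

- $H_0$: X and Y are independent, each with its own one-dimensional dyadic Pólya tree on $[0,1]$.
- $H_1$: X and Y share a joint Pólya tree on $[0,1]^2$ that splits each cell into four quadrants.

The following are illustrative assumptions, not details taken from the paper: the function names, the truncation depth, the uniform base measure, and the choice $\alpha = c\,m^2$ at level $m$. Because $p(H_0 \mid W) = 1 - p(H_1 \mid W)$, the output expresses evidence for either hypothesis on the same scale.

```python
import numpy as np
from scipy.special import betaln, expit, gammaln


def log_ml_1d(x, depth=8, c=1.0):
    """Log marginal likelihood of x in [0, 1] under a dyadic Pólya tree.

    Assumes a uniform base measure and Beta(c*m^2, c*m^2) splits at level m
    (an illustrative choice); the tree is truncated at `depth` levels.
    """
    x = np.asarray(x, dtype=float)
    total = 0.0
    for m in range(1, depth + 1):
        a = c * m**2
        k = 2**m
        # index of the level-m interval holding each point (x = 1 goes to the last one)
        idx = np.minimum((x * k).astype(int), k - 1)
        counts = np.bincount(idx, minlength=k)
        n_left, n_right = counts[0::2], counts[1::2]  # children of each level-(m-1) set
        total += np.sum(betaln(a + n_left, a + n_right) - betaln(a, a))
    return total


def log_mbeta(a):
    """Log multivariate Beta function along the last axis."""
    return np.sum(gammaln(a), axis=-1) - gammaln(np.sum(a, axis=-1))


def log_ml_2d(x, y, depth=8, c=1.0):
    """Log marginal likelihood of (x, y) in [0, 1]^2 under a quadrant Pólya tree.

    Each cell splits into four children with Dirichlet(a, a, a, a) masses,
    a = c*m^2 at level m (same illustrative assumptions as log_ml_1d).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    total = 0.0
    for m in range(1, depth + 1):
        a = c * m**2
        k = 2**m
        ix = np.minimum((x * k).astype(int), k - 1)
        iy = np.minimum((y * k).astype(int), k - 1)
        parent = (ix // 2) * (k // 2) + (iy // 2)  # level-(m-1) cell
        child = 2 * (ix % 2) + (iy % 2)            # quadrant within that cell
        counts = np.zeros(((k // 2) ** 2, 4))
        np.add.at(counts, (parent, child), 1.0)
        total += np.sum(log_mbeta(a + counts) - log_mbeta(np.full(4, a)))
    return total


def prob_dependence(x, y, depth=8, c=1.0, prior_h1=0.5):
    """Posterior probability p(H1 | data) that x and y are dependent."""
    log_h0 = log_ml_1d(x, depth, c) + log_ml_1d(y, depth, c)
    log_h1 = log_ml_2d(x, y, depth, c)
    log_odds = log_h1 - log_h0 + np.log(prior_h1 / (1.0 - prior_h1))
    return float(expit(log_odds))  # numerically stable logistic function
```

Consider `depth=6` and a few hundred points. A near-deterministic relationship, such as `y = x` plus small noise, should push the probability close to 1. Independent uniform samples should typically yield a much smaller value.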

Citation: Onur Teymur, Sarah Filippi. A Bayesian nonparametric test for conditional independence. Foundations of Data Science, 2020, 2 (2) : 155-172. doi: 10.3934/fods.2020009
##### Figures:
Figure 1. Construction of a Pólya tree distribution on $\Omega = [0,1]$. From each set $C_\ast$, a particle of probability mass passes to the left with (random) probability $\theta_{\ast0}$ and to the right with probability $\theta_{\ast1} = 1-\theta_{\ast0}$. All $\theta_\ast$ are independently Beta-distributed, as described in the main text.

Figure 2. Pseudocode for the proposed Bayesian nonparametric test for conditional independence.

Figure 3. Application of the proposed Bayesian testing procedure to four synthetic datasets supported on $[0,1]^3$. The datasets are chosen so that all combinations of unconditional and conditional dependence/independence are represented. The final column shows the ensemble of posterior probabilities of conditional dependence $p(H_1|W)$ output by the test over 100 repetitions at varying data sizes $N$. The blue line is the median; the dark and light shaded regions are the (25, 75)-percentile and (5, 95)-percentile ranges respectively.

Figure 4. Marginal scatter plots from the CalCOFI Bottle dataset showing the pairwise relationships between $\texttt{Salnty}$, $\texttt{Oxy\_µmol.Kg}$ and $\texttt{T\_degC}$. The nonlinear nature of the dependences is immediately apparent.

Figure 5. Example pairwise dependence graphs output by the Bayesian conditional independence test for five variables from the CalCOFI dataset, conditional on $\texttt{T\_degC}$, for four different sizes of subsample drawn from the complete dataset. The number on each edge is the posterior probability of conditional dependence $p(H_1|W^{(N)})$, given to two decimal places. A missing edge indicates $p(H_1|W^{(N)})<0.005$.

Figure 6. Box plots of the posterior probability of conditional dependence $p(H_1|W^{(N)})$ output by 100 repetitions of the Bayesian conditional independence test. The test is applied to randomly drawn subsamples of various sizes $N$ from the CalCOFI dataset. The left-hand plot shows a representative pair of variables that are conditionally dependent given $\texttt{T\_degC}$; the right-hand plot shows a representative conditionally independent pair.

Figure 7. Top left: heat map of conditional marginal likelihood values for the three constituent models over $\Omega_X$, $\Omega_Y$ and $\Omega_{XY}$, for the second and third models of Figure 3. Top right: 'slices' from this heat map at $\rho = 0.5$. Bottom: test outputs for 100 repetitions of the second and third models of Figure 3. Red plots fix $c = 1$ (output identical to Figure 3), while blue plots use the optimising values $\hat{c}$ from the plots above.
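The mass-splitting construction in the first caption above (the Pólya tree on $\Omega = [0,1]$) can be simulated directly. This minimal Python sketch truncates the tree at a finite depth. As an assumption, it draws each $\theta_{\ast0}$ from $\mathrm{Beta}(c\,m^2, c\,m^2)$ at level $m$, because the caption defers the actual Beta parameters to the main text. The function name `sample_polya_tree` is hypothetical.

```python
import numpy as np


def sample_polya_tree(depth=10, c=1.0, rng=None):
    """Sample the masses a random Pólya tree assigns to the 2**depth dyadic
    subintervals of [0, 1], truncating the tree at `depth` levels.

    Assumes theta_{e0} ~ Beta(c*m^2, c*m^2) for every set at level m
    (an illustrative choice, not necessarily the paper's parameterisation).
    """
    rng = np.random.default_rng() if rng is None else rng
    mass = np.array([1.0])  # level 0: all mass sits in Omega = [0, 1]
    for m in range(1, depth + 1):
        a = c * m**2
        # independent probability of passing to the LEFT child, one per set
        theta0 = rng.beta(a, a, size=mass.size)
        children = np.empty(2 * mass.size)
        children[0::2] = mass * theta0          # left child C_{e0}
        children[1::2] = mass * (1.0 - theta0)  # right child C_{e1}
        mass = children
    return mass


rng = np.random.default_rng(1)
mass = sample_polya_tree(depth=10, c=1.0, rng=rng)
density = mass * mass.size  # piecewise-constant density on the 1024 intervals
```

Because $\theta_{\ast1} = 1-\theta_{\ast0}$, every split conserves mass, so the returned masses always sum to one. Larger values of $c$ concentrate each $\theta_{\ast0}$ near 1/2, which pulls sampled densities towards the uniform base measure.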