
Causal discovery in machine learning: Theories and applications

  • * Corresponding author: Ana Rita Nogueira

The first author is supported by Fundação para a Ciência e Tecnologia (FCT) (Portugal) PhD grant SFRH/BD/146197/2019
  • Determining the cause of a particular event has been a subject of study for many researchers over the years. Knowing why an event happens (its cause) means that, for example, removing the cause can stop the effect from happening, and replicating the cause can reproduce the effect. Causality can be seen as a means of predicting the future from information about past events and, with that, of preventing or altering future outcomes. This temporal notion of past and future is often one of the critical points in discovering the causes of a given event. The purpose of this survey is to present a cross-sectional view of the causal discovery domain, with an emphasis on the machine learning/data mining area.

    Mathematics Subject Classification: Primary: 58F15, 58F17; Secondary: 53C35.


  • Figure 1.  Overview of the evolution of the term "causality" and the main contributors [48]

    Figure 2.  Example of a DAG

    Figure 3.  Example of a v-structure

    Figure 4.  Example of a Causal Neural Network with one hidden layer [100]

    Figure 5.  Example of a Causal Decision Tree [56] and comparison with a normal Decision Tree

    Figure 6.  GeNie [38]

    Figure 7.  Tetrad [21]

    Table 1.  Survey studies overview

    Survey Title Reference Causal Bayesian Networks Non-Bayesian methods Causal discovery over Time Causal discovery in statistics Tools/Frameworks for causal discovery Evaluation Metrics Possible Applications
    Reference Assumptions Constraint-Based BN Score-Based BN
    Review of Causal Discovery Methods Based on Graphical Models [30]
    A Review on Algorithms for Constraint-based Causal Discovery [104]
    A review of causal inference for biomedical informatics [50]
    Causal discovery and inference: concepts and recent methodological advances [89]
    A Survey of Learning Causality with Data: Problems and Methods [34]
    Causality and Statistical Learning [27]
    Machine learning for causal inference in Biostatistics [79]
    Causal Interpretability for Machine Learning - Problems, Methods and Evaluation [65]
    *metrics to measure how explainable an algorithm is

    Table 2.  Example of a partial contingency table (where $ c_k = \{A = a_1, B = b_1\} $)

    $ c_k=\{A,B\} $ $ C=c_1 $ $ C=c_2 $ Total
    $ D=d_1 $ $ n_{11k} $ $ n_{12k} $ $ n_{1.k} $
    $ D=d_2 $ $ n_{21k} $ $ n_{22k} $ $ n_{2.k} $
    Total $ n_{.1k} $ $ n_{.2k} $ $ n_{..k} $
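
The dot notation in Table 2 can be read as summing over the replaced index within one stratum $ c_k $. The following is a minimal Python sketch of that bookkeeping only; the function name `stratum_totals` and the counts are hypothetical, chosen for illustration, and are not taken from the survey.

```python
from typing import Dict, Tuple


def stratum_totals(
    counts: Dict[Tuple[int, int], int]
) -> Tuple[Dict[int, int], Dict[int, int], int]:
    """Marginal totals of one 2x2 partial contingency table.

    counts[(i, j)] holds n_{ijk}: the number of records in stratum c_k
    with D = d_i and C = c_j (i, j in {1, 2}).
    """
    # Row totals n_{i.k}: sum over the values of C.
    row = {i: counts[(i, 1)] + counts[(i, 2)] for i in (1, 2)}
    # Column totals n_{.jk}: sum over the values of D.
    col = {j: counts[(1, j)] + counts[(2, j)] for j in (1, 2)}
    # Grand total n_{..k}: every record in the stratum.
    total = row[1] + row[2]
    return row, col, total


# Hypothetical counts for a single stratum c_k (illustration only).
counts = {(1, 1): 12, (1, 2): 8, (2, 1): 5, (2, 2): 15}
row_totals, col_totals, n_k = stratum_totals(counts)
print(row_totals, col_totals, n_k)
```

One such table exists per stratum $ c_k $, so the same function would be applied once for each combination of values of the conditioning variables.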
  • [1] J. Abellán, M. Gómez-Olmedo and S. Moral, Some variations on the PC algorithm, Proceedings of the Third European Workshop on Probabilistic Graphical Models (PGM' 06), 1–8.
    [2] A. Agresti and M. Kateri, Categorical Data Analysis, in International encyclopedia of statistical science, Springer, 2011
    [3] C. F. Aliferis, I. Tsamardinos and A. Statnikov, HITON: A novel Markov Blanket algorithm for optimal variable selection, AMIA Annual Symposium Proceedings / AMIA Symposium. AMIA Symposium, 2003 (2003), 21-25.
    [4] B. Andrews, J. Ramsey and G. F. Cooper, Learning high-dimensional directed acyclic graphs with mixed data-types, in Proceedings of Machine Learning Research (eds. T. D. Le, J. Li, K. Zhang, E. K. P. Cui and A. Hyvärinen), vol. 104 of Proceedings of Machine Learning Research, PMLR, Anchorage, Alaska, USA, (2019), 4–21.
    [5] B. Badsha and A. Q. Fu, Learning causal biological networks with the principle of Mendelian randomization, Frontiers in Genetics, 10 (2019). doi: 10.3389/fgene.2019.00460.
    [6] J. Barnes et al., Complete Works of Aristotle, Volume 1: The Revised Oxford Translation, vol. 1, Princeton University Press, 2014.
    [7] M. W. Birch, The detection of partial association, I: The 2$\times$ 2 case, Journal of the Royal Statistical Society. Series B (Methodological), 26 (1964), 313-324.  doi: 10.1111/j.2517-6161.1964.tb00564.x.
    [8] J. Bollen, H. Mao and X. Zeng, Twitter mood predicts the stock market, Journal of Computational Science, 2 (2011), 1-8.  doi: 10.1016/j.jocs.2010.12.007.
    [9] S. L. Bressler and A. K. Seth, Wiener-Granger Causality: A well established methodology, NeuroImage, 58 (2011), 323-329.  doi: 10.1016/j.neuroimage.2010.02.059.
    [10] P. Bühlmann, M. Kalisch and M. H. Maathuis, Variable selection in high-dimensional linear models: Partially faithful distributions and the pc-simple algorithm, Biometrika, 97 (2010), 261-278.  doi: 10.1093/biomet/asq008.
    [11] B. W. Carlson, Simpson's paradox | Definition, Example, and Explanation, Encyclopedia Britannica, (2019).
    [12] W. Chen, Y. Hu, X. Zhang, L. Wu, K. Liu, J. He, Z. Tang, X. Song, L. R. Waitman and M. Liu, Causal risk factor discovery for severe acute kidney injury using electronic health records, BMC Medical Informatics and Decision Making, 18 (2018), 13. doi: 10.1186/s12911-018-0597-7.
    [13] D. M. Chickering, Learning equivalence classes of bayesian-network structures, J. Mach. Learn. Res., 2 (2002), 445-498. 
    [14] T. Claassen and T. Heskes, A structure independent algorithm for causal discovery, Computational Intelligence, 27–29.
    [15] T. Claassen and T. Heskes, Bayesian probabilities for constraint-based causal discovery, IJCAI International Joint Conference on Artificial Intelligence, 2992–2996.
    [16] D. Colombo and M. H. Maathuis, Order-independent constraint-based causal structure learning, J. Mach. Learn. Res., 15 (2014), 3741-3782. 
    [17] D. Colombo, M. H. Maathuis, M. Kalisch and T. S. Richardson, Learning high-dimensional directed acyclic graphs with latent and selection variables, Annals of Statistics, 40 (2012), 294-321.  doi: 10.1214/11-AOS940.
    [18] A. P. Dawid, Beware of the dag!, in NIPS Causality: Objectives and Assessment, (2008).
    [19] R. De Maesschalck, D. Jouan-Rimbaud and D. L. Massart, The Mahalanobis distance, Chemometrics and Intelligent Laboratory Systems, 50 (2000), 1-18.  doi: 10.1016/S0169-7439(99)00047-7.
    [20] M. Ding, Y. Chen and S. L. Bressler, Granger causality: Basic theory and application to neuroscience, Handbook of Time Series Analysis: Recent Theoretical Developments and Applications, 437.
    [21] Center for Causal Discovery, CCD-2015-1, Summer Workshop - 2015.
    [22] F. K. Došilović, M. Brčić and N. Hlupić, Explainable artificial intelligence: A survey, in 2018 41st International convention on information and communication technology, electronics and microelectronics (MIPRO), IEEE, (2018), 0210–0215.
    [23] M. J. Druzdzel, SMILE: Structural Modeling, Inference, and Learning Engine and GeNie: A Development Environment for Graphical Decision-Theoretic Models, Technical report, 1999.
    [24] I. Ebert-Uphoff and Y. Deng, Causal discovery for climate research using graphical models, Journal of Climate, 25 (2012), 5648-5665.  doi: 10.1175/JCLI-D-11-00387.1.
    [25] A. Falcon, Aristotle on causality, in The Stanford Encyclopedia of Philosophy (ed. E. N. Zalta), spring 2015 edition, Metaphysics Research Lab, Stanford University, (2015).
    [26] J. L. Fleiss, B. Levin and M. C. Paik, Statistical Methods for Rates and Proportions, John Wiley & Sons, 2003. doi: 10.1002/0471445428.
    [27] A. Gelman, Causality and statistical learning, American Journal of Sociology, 117 (2011), 955-966. 
    [28] D. E. Giles, L. M. Tedds and G. Werkneh, The Canadian underground and measured economies: Granger causality results, Applied Economics, 34 (2002), 2347-2352.  doi: 10.1080/00036840210148021.
    [29] D. Gillies, Causality, Probability, and Medicine, Routledge, 2018. doi: 10.4324/9781315735542.
    [30] C. Glymour, K. Zhang and P. Spirtes, Review of causal discovery methods based on graphical models, Frontiers in Genetics, 10 (2019), 524. doi: 10.3389/fgene.2019.00524.
    [31] C. N. Glymour, The Mind's Arrows: Bayes Nets and Graphical Causal Models in Psychology, MIT Press, 2001.
    [32] C. W. J. Granger, Investigating causal relations by econometric models and cross-spectral methods, Econometrica, 37 (1969), 424-438. 
    [33] H. P. Grice and A. R. White, Symposium: The causal theory of perception, Proceedings of the Aristotelian Society, Supplementary Volumes, 35 (1961), 121-168. 
    [34] R. Guo, L. Cheng, J. Li, P. R. Hahn and H. Liu, A survey of learning causality with data: Problems and methods, ACM Computing Surveys, 53 (2020), 37. doi: 10.1145/3397269.
    [35] I. Guyon, A. Elisseeff and C. Aliferis, Causal feature selection, Training, 32 (2007), 1-40.
    [36] I. Guyon, A. Statnikov and C. Aliferis, Time series analysis with the causality workbench, in NIPS Mini-Symposium on Causality in Time Series, (2011), 115–139.
    [37] A. Hauser and P. Bühlmann, Characterization and greedy learning of interventional markov equivalence classes of directed acyclic graphs, Journal of Machine Learning Research, 13 (2012), 2409-2464. 
    [38] M. Horný, Bayesian Networks, Technical report, 2014.
    [39] J. Huyssteen, Encyclopedia of Science and Religion, Gale Group, Inc, 2003.
    [40] M. T. Irfan and L. E. Ortiz, Causal strategic inference in a game-theoretic model of multiplayer networked microfinance markets, ACM Trans. Econ. Comput., 6 (2018), Art. 6, 58 pp. doi: 10.1145/3232861.
    [41] A. Janiak, Three concepts of causation in Newton, Studies in History and Philosophy of Science Part A, 44 (2013), 396-407. doi: 10.1016/j.shpsa.2012.10.009.
    [42] Z. Jin, J. Li, L. Liu, T. D. Le, B. Sun and R. Wang, Discovery of causal rules using partial association, Proceedings - IEEE International Conference on Data Mining, ICDM, (2012), 309–318. doi: 10.1109/ICDM.2012.36.
    [43] M. Kalisch and P. Buehlmann, Estimating high-dimensional directed acyclic graphs with the PC-algorithm, Journal of Machine Learning Research, 8 (2005), 613-636. 
    [44] M. Kalisch, M. Mächler and D. Colombo, Causal Inference with Graphical Models in R Package Pcalg, Technical Report 11, 2012.
    [45] M. Kamiński, M. Ding, W. A. Truccolo and S. L. Bressler, Evaluating causal relations in neural systems: Granger causality, directed transfer function and statistical assessment of significance, Biological Cybernetics, 85 (2001), 145-157.
    [46] K. Karimi, A brief introduction to temporality and causality, preprint, arXiv: 1007.2449.
    [47] A. Khorram, C. W. Ping and L. T. Hui, Causal Knowledge-Driven Approach For Stock Analysis, Technical report, 2011.
    [48] S. Kleinberg, Causal Inference: Prediction, explanation, and intervention Lecture 2: Regularities, counterfactuals and token causality, Cambridge University Press, Cambridge, 2013.
    [49] S. Kleinberg, Why: A Guide to Finding and Using Causes, O'Reilly, Sebastopol, CA, 2015.
    [50] S. Kleinberg and G. Hripcsak, A review of causal inference for biomedical informatics, Journal of Biomedical Informatics, 44 (2011), 1102-1112. doi: 10.1016/j.jbi.2011.07.001.
    [51] D. Kocacoban and J. Cussens, Online causal structure learning in the presence of latent variables, in 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA), Boca Raton, FL, USA, (2019), 392–395. doi: 10.1109/ICMLA.2019.00073.
    [52] E. Kodra, S. Chatterjee and A. R. Ganguly, Exploring Granger causality between global average observed time series of carbon dioxide and temperature, Theoretical and Applied Climatology, 104 (2011), 325-335.  doi: 10.1007/s00704-010-0342-3.
    [53] E. Kummerfeld, D. Danks and M. Cognition, Online Learning of Time-varying Causal Structures.
    [54] T. D. Le, T. Hoang, J. Li, L. Liu, H. Liu and S. Hu, A fast PC algorithm for high dimensional causal discovery with multi-core PCs, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 16 (2019), 1483-1495.
    [55] H. D. P. Lee et al., Timaeus and Critias, Penguin, 1971.
    [56] J. Li, S. Ma, T. Le, L. Liu and J. Liu, Causal decision trees, IEEE Transactions on Knowledge and Data Engineering, 29 (2017), 257-271.
    [57] J. Li, T. D. Le, L. Liu, J. Liu, Z. Jin and B. Sun, Mining causal association rules, in Proceedings - IEEE 13th International Conference on Data Mining Workshops, ICDMW, (2013), 114–123. doi: 10.1109/ICDMW.2013.88.
    [58] J. Li, L. Liu and T. D. Le, Practical Approaches to Causal Relationship Analysis, 2015.
    [59] M. H. Maathuis and D. Colombo, A generalized back-door criterion, The Annals of Statistics, 43 (2015), 1060-1088.  doi: 10.1214/14-AOS1295.
    [60] M. H. Maathuis, M. Kalisch and P. Bühlmann, Estimating high-dimensional intervention effects from observational data, Annals of Statistics, 37 (2009), 3133-3164.  doi: 10.1214/09-AOS685.
    [61] D. Malinsky and D. Danks, Causal discovery algorithms: A practical guide, Philosophy Compass, 13 (2017), e12470, 1–11. doi: 10.1111/phc3.12470.
    [62] N. Mantel and W. Haenszel, Statistical aspects of the analysis of data from retrospective studies of disease, Journal of the National Cancer Institute, 22 (1959), 719-748. 
    [63] D. Margaritis and S. Thrun, Bayesian Network Induction via Local neighborhoods, Adv. Neural Inf. Process. Syst., 505–511.
    [64] C. Meek, Graphical Models: Selecting Causal and Statistical Models, PhD thesis.
    [65] R. Moraffah, M. Karami, R. Guo, A. Raglin and H. Liu, Causal interpretability for machine learning - problems, methods and evaluation, SIGKDD Explor. Newsl., 22 (2020), 18-33.
    [66] R. E. Neapolitan et al., Learning Bayesian Networks, vol. 38, Pearson Prentice Hall Upper Saddle River, NJ, 2004.
    [67] A. R. Nogueira, J. Gama and C. A. Ferreira, Improving prediction with causal probabilistic variables, in Advances in Intelligent Data Analysis XVIII (eds. M. R. Berthold, A. Feelders and G. Krempl), Springer International Publishing, Cham, (2020), 379–390.
    [68] J. Pearl, On the interpretation of $do(x)$, Journal of Causal Inference, Causal, Casual, and Curious Section, 7.
    [69] J. Pearl, Bayesian networks: A model of self-activated memory for evidential reasoning, in Proceedings of the 7th Conference of the Cognitive Science Society, (1985), 329–334.
    [70] J. Pearl, M. Glymour and N. P. Jewell, Causal Inference in Statistics - A Primer, John Wiley & Sons, Ltd., Chichester, 2016.
    [71] J. M. Peña, Learning Gaussian graphical models of gene networks with false discovery rate control, in European Conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, Springer, (2008), 165–176.
    [72] J. Peters and P. Bühlmann, Structural intervention distance for evaluating causal graphs, Neural Computation, 27 (2015), 771-799.  doi: 10.1162/NECO_a_00708.
    [73] E. Pol, Causality in economics: A menu of approaches, Journal of Reviews on Global Economics, 2 (2013), 356-374. 
    [74] V. K. Raghu, A. Poon and P. V. Benos, Evaluation of causal structure learning methods on mixed data types, Proceedings of Machine Learning Research, 92 (2018), 48.
    [75] J. Ramsey, Improving accuracy and scalability of the pc algorithm by maximizing p-value, preprint, arXiv: 1610.00378.
    [76] J. Ramsey, M. Glymour, R. Sanchez-Romero and C. Glymour, A million variables and more: The Fast Greedy Equivalence Search algorithm for learning high-dimensional graphical causal models, with an application to functional magnetic resonance images, Int. J. Data Sci. Anal., 3 (2017), 121-129.  doi: 10.1007/s41060-016-0032-z.
    [77] J. D. Ramsey, Scaling up greedy causal search for continuous variables, (2015).
    [78] D. A. Rizzi and S. A. Pedersen, Causality in medicine: Towards a Theory and Terminology, (1992).
    [79] S. Rose and D. Rizopoulos, Machine learning for causal inference in Biostatistics, Biostatistics, 21 (2020), 336-338. doi: 10.1093/biostatistics/kxz045.
    [80] D. B. Rubin, Estimating causal effects of treatments in randomized and nonrandomized studies, Journal of Educational Psychology, 66 (1974), 688-701.  doi: 10.1037/h0037350.
    [81] D. E. Rumelhart, G. E. Hinton and R. J. Williams, Learning Internal Representations by Error Propagation, Technical report, California Univ San Diego La Jolla Inst for Cognitive Science, 1985.
    [82] F. Russo and J. Williamson, Interpreting causality in the health sciences, International Studies in the Philosophy of Science, 21 (2007), 157-170.  doi: 10.1080/02698590701498084.
    [83] C. Sammut and G. I. Webb, Encyclopedia of Machine Learning, Springer Science & Business Media, 2011. doi: 10.1007/978-0-387-30164-8.
    [84] R. Scheines, P. Spirtes, C. Glymour, C. Meek and T. Richardson, Tetrad 3: Tools for causal modeling–user's manual, CMU Philosophy.
    [85] M. Scutari, Learning Bayesian Networks with the Bnlearn R Package, Technical report, 2009.
    [86] A. Seth, Granger causality, 2007.
    [87] G. D. Smith and S. Ebrahim, Mendelian randomization: prospects, potentials, and limitations, International Journal of Epidemiology, 33 (2004), 30-42.  doi: 10.1093/ije/dyh132.
    [88] E. Sokolova, D. von Rhein, J. Naaijen, P. Groot, T. Claassen, J. Buitelaar and T. Heskes, Handling hybrid and missing data in constraint-based causal discovery to study the etiology of ADHD, International Journal of Data Science and Analytics, 3 (2017), 105-119.  doi: 10.1007/s41060-016-0034-x.
    [89] P. Spirtes and K. Zhang, Causal discovery and inference: Concepts and recent methodological advances, Applied Informatics, 3 (2016), 1-28.  doi: 10.1186/s40535-016-0018-x.
    [90] P. Spirtes, An anytime algorithm for causal inference, Proceedings of AISTATS, 213–231.
    [91] P. Spirtes, C. Glymour and R. Scheines, Causation, Prediction and Search, Lecture Notes in Statistics, Springer-Verlag, New York, 1993. doi: 10.1007/978-1-4612-2748-9.
    [92] R. Stalnaker, Game Theory and Decision Theory (Causal and Evidential), Classic Philosophical Arguments, Cambridge University Press, 2018.
    [93] E. W. Steyerberg, Clinical Prediction Models: A Practical Approach to Development, Validation, vol. 19, 2009.
    [94] B. Stroud, Hume and the idea of causal necessity, Philosophical Studies: An International Journal for Philosophy in the Analytic Tradition, 33 (1978), 39-59.  doi: 10.1007/BF00354280.
    [95] What is Climatology?, The National Drought Mitigation Center, Available from: https://drought.unl.edu/Education/DroughtIn-depth/WhatisClimatology.aspx.
    [96] M. Tsagris, Bayesian network learning with the pc algorithm: An improved and correct variation, Applied Artificial Intelligence, 33 (2019), 101-123.  doi: 10.1080/08839514.2018.1526760.
    [97] I. Tsamardinos, L. E. Brown and C. F. Aliferis, The max-min hill-climbing Bayesian network structure learning algorithm, Machine Learning, 65 (2006), 31-78.  doi: 10.1007/s10994-006-6889-7.
    [98] P. Weirich, Causal decision theory, in The Stanford Encyclopedia of Philosophy (ed. E. N. Zalta), winter 2020 edition, Metaphysics Research Lab, Stanford University, (2020).
    [99] N. Wiener, The theory of prediction, Modern Mathematics for Engineers, 1 (1956), 125-139. 
    [100] M. A. Wiering, Evolving causal neural networks, in Benelearn'02: Proceedings of the Twelfth Belgian-Dutch Conference on Machine Learning, (2002), 103–108.
    [101] A. D. Wyner, A definition of conditional mutual information for arbitrary ensembles, Information and Control, 38 (1978), 51-59.  doi: 10.1016/S0019-9958(78)90026-8.
    [102] H. Yamahara and H. Shimakawa, Monitoring of causal relationships on data stream using time segment characteristic, in IEEE International Symposium on Communications and Information Technology, ISCIT 2004., vol. 2, (2004), 779–782.
    [103] H. Yamahara and H. Shimakawa, Monitoring of causal relationships on data stream using time segment characteristic, in IEEE International Symposium on Communications and Information Technology, 2004. ISCIT 2004., vol. 2, 2004
    [104] K. Yu, J. Li and L. Liu, A Review on Algorithms for Constraint-based Causal Discovery, 2016.