Kenji Fukumizu
Department of Mathematical Analysis and Statistical Inference, The Institute of Statistical Mathematics, Japan

Measuring dependence and conditional dependence with kernels (slides)

Measuring (conditional) dependence and independence is an essential part of causal learning. In this talk, I will present kernel methods for measuring (conditional) dependence and independence. In the first half of this talk, some topics on the Hilbert-Schmidt Independence Criterion (HSIC), which is a kernel measure of independence, are discussed: a connection with a recent independence measure called distance covariance is presented, and a way of choosing a kernel is discussed for the efficiency of independence tests. In the latter half, I will introduce a normalized (conditional) covariance operator and discuss its properties, including a relation to the chi-square divergence and others.
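
As a rough illustration of the kind of statistic discussed in the first half, the sketch below computes the biased (V-statistic) empirical HSIC, tr(KHLH)/n^2, with Gaussian kernels. The function names, bandwidths, sample size, and toy data are assumptions chosen for illustration and are not taken from the talk.

```python
import numpy as np

def gaussian_gram(x, sigma):
    """Gram matrix of the Gaussian kernel exp(-||a - b||^2 / (2 sigma^2))."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def hsic_biased(x, y, sigma_x=1.0, sigma_y=1.0):
    """Biased empirical HSIC: trace(K H L H) / n^2, with H the centering matrix."""
    n = len(x)
    K = gaussian_gram(x, sigma_x)
    L = gaussian_gram(y, sigma_y)
    H = np.eye(n) - np.ones((n, n)) / n
    return float(np.trace(K @ H @ L @ H) / n ** 2)

# Toy data (illustrative assumption): y_dep depends on x nonlinearly and is
# nearly uncorrelated with it; y_ind is drawn independently of x.
rng = np.random.default_rng(0)
x = rng.normal(size=300)
y_dep = x ** 2 + 0.1 * rng.normal(size=300)
y_ind = rng.normal(size=300)

hsic_dep = hsic_biased(x, y_dep)
hsic_ind = hsic_biased(x, y_ind)
print(f"HSIC (dependent pair):   {hsic_dep:.4f}")
print(f"HSIC (independent pair): {hsic_ind:.4f}")
```

In an actual test the statistic would be compared against a null distribution (for example, obtained by permuting one sample), and the kernel bandwidth would be chosen with care, which is one of the topics of the talk.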

Zhi Geng
School of Mathematical Sciences, Peking University, China

Causal Effect Evaluation and Causal Network Learning (slides)

Statistical approaches for evaluating causal effects and for learning causal networks are discussed. A spurious association between two variables may appear due to the existence of an unobserved variable, a phenomenon known as the Yule-Simpson paradox. To evaluate the effect of a treatment variable on an endpoint variable, a surrogate variable is often used when it is difficult to observe the endpoint variable. We present a surrogate paradox: the treatment has a positive effect on the surrogate and the surrogate has a positive effect on the endpoint, but the treatment has a negative effect on the endpoint. We discuss the conditions for avoiding the Yule-Simpson paradox and the surrogate paradox. Causal relationships among variables can be represented by a causal network, which is depicted by a directed acyclic graph (DAG) or a Bayesian network. We present three approaches for learning the causal network from data: decomposing learning, active learning and local learning.
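
The Yule-Simpson paradox can be made concrete with a small numeric sketch. The counts below are illustrative assumptions, not data from the talk, and the stratifying variable is treated as observed here so that the reversal can be displayed. The sketch compares pooled success rates with stratum-wise rates and with rates standardized over the stratifying variable.

```python
from fractions import Fraction

# (successes, total) for treatments A and B within each stratum of a
# third variable. Illustrative counts only.
counts = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}
TREATMENTS = ("A", "B")

def stratum_rate(stratum, treatment):
    successes, total = counts[stratum][treatment]
    return Fraction(successes, total)

def pooled_rate(treatment):
    """Success rate after collapsing over the third variable."""
    successes = sum(counts[z][treatment][0] for z in counts)
    total = sum(counts[z][treatment][1] for z in counts)
    return Fraction(successes, total)

def standardized_rate(treatment):
    """sum_z P(success | treatment, z) * P(z), weighting by stratum size."""
    grand_total = sum(counts[z][t][1] for z in counts for t in TREATMENTS)
    result = Fraction(0)
    for z in counts:
        stratum_total = sum(counts[z][t][1] for t in TREATMENTS)
        result += stratum_rate(z, treatment) * Fraction(stratum_total, grand_total)
    return result

for z in counts:
    print(z, {t: float(stratum_rate(z, t)) for t in TREATMENTS})
print("pooled      ", {t: float(pooled_rate(t)) for t in TREATMENTS})
print("standardized", {t: float(standardized_rate(t)) for t in TREATMENTS})
```

With these counts, A has the higher success rate inside each stratum while B has the higher pooled rate. Standardizing over the third variable restores the stratum-wise ordering, which has a causal reading only under the assumption that this variable is the sole confounder.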

Isabelle Guyon

Causal discovery as a machine learning problem: the contribution of challenges

Since 2007, we have been organizing challenges in causal discovery. But it was not until recently that we started making significant improvements over the state-of-the-art using challenges as a means of crowdsourcing hard research problems. In two challenges we organized in 2014 and 2015, we formulated causal discovery as a simple classification problem: given observations of samples from a pair of variables, does A cause B or not? In contrast with model-based approaches, which try to recover causal relationships by making hypotheses about the data generating process (the causal mechanism), this setup favors treating causal discovery as a pattern recognition problem.

We let the participants solve this difficult inverse problem by providing them with lots of "training examples" (examples of pairs for which A causes B and other pairs). Many of them treated the problem as a learning problem: they extracted features of the joint distribution and trained a learning machine to "recognize" causal relationships.

In the first challenge (the "Cause-Effect Pair" challenge, run in two phases), thousands of real and artificial pairs of variables were provided. The variable pairs were isolated from their context and samples were drawn independently. In the second challenge (the Neural Connectomics Challenge), the problem was to reconstruct a network of neural cells using simulated time series of neural activity. Both challenges made a significant advance over baseline methods (from AUC~0.60 to AUC~0.83 in the first challenge and from AUC~0.88 to AUC~0.94 in the second one). The code of the winners has been made publicly available.

To demystify these amazing results, our presentation will analyze them and provide insights into how and why these methods work.

Acknowledgements: Both challenges were organized by ChaLearn, a non-profit organization with US tax-exempt status. They involved large teams of volunteer co-workers; see the complete list on our websites. The main contributors to the "cause-effect pair" challenge are Ben Hamner, Mikael Henaff, Mehreen Saeed, and Alexander Statnikov. The main contributors to the connectomics challenge are Demian Battaglia, Javier Orlandi, Jordi Soriano Fradera, Olav Stetter, Bisakha Ray, Mehreen Saeed, and Alexander Statnikov. The "cause-effect pair" challenge was supported by the EU Pascal2 network of excellence in its phase 1 and by Microsoft in its phase 2. The connectomics challenge was supported by the Marie Curie Action Program of the EU framework FP7 and by Microsoft. Other sponsors who donated prizes are listed on our website and are gratefully thanked.

Yan Liu
Computer Science Department, University of Southern California

Recent Advances of Granger Causality in Large-scale Time Series Data

In the era of data deluge, we are confronted with large-scale time series data, i.e., sequences of observations of variables of interest over a period of time. For example, petabytes of climate and meteorological data, such as temperature, solar radiation, and precipitation, are collected over the years, and exabytes of social media content are generated over time on the Internet. A major task for time series data analysis is to uncover the temporal causal relationships among the time series. For example, in climatology, we want to identify the factors that impact the climate patterns of certain regions. In social networks, we are interested in identifying the patterns of influence among users and how topics activate or suppress each other. Therefore, developing effective and scalable data mining algorithms to uncover temporal dependency structures between time series and reveal insights from data has become a key problem in machine learning and data mining. In this talk, we will discuss recent developments on Granger causality for large-scale time series data. In particular, we will focus on the practical challenges in large-scale applications, such as nonlinearity, hidden variables and missing data.
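
For readers unfamiliar with the basic idea, a minimal sketch of the classical linear, bivariate Granger test follows: x is said to Granger-cause y if adding lagged values of x to an autoregression of y reduces the residual sum of squares significantly. The simulated series, the lag order, and the function names are assumptions for illustration. The sketch does not address the large-scale, nonlinear, hidden-variable, or missing-data settings that the talk focuses on.

```python
import numpy as np

def lagged_design(series, lags):
    """Columns series[t-1], ..., series[t-lags] for t = lags, ..., T-1."""
    T = len(series)
    return np.column_stack([series[lags - k: T - k] for k in range(1, lags + 1)])

def residual_sum_of_squares(X, y):
    """RSS of an ordinary least squares fit of y on X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return float(resid @ resid)

def granger_f_stat(x, y, lags=2):
    """F statistic for the null hypothesis 'x does not Granger-cause y'."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    target = y[lags:]
    own_lags = lagged_design(y, lags)                      # restricted model
    all_lags = np.column_stack([own_lags, lagged_design(x, lags)])  # full model
    rss_restricted = residual_sum_of_squares(own_lags, target)
    rss_full = residual_sum_of_squares(all_lags, target)
    df_full = len(target) - (2 * lags + 1)
    return ((rss_restricted - rss_full) / lags) / (rss_full / df_full)

# Simulated data (illustrative assumption): x drives y with a one-step delay,
# while y has no influence on x.
rng = np.random.default_rng(0)
T = 500
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + rng.normal()
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()

f_x_to_y = granger_f_stat(x, y)
f_y_to_x = granger_f_stat(y, x)
print(f"F (x -> y): {f_x_to_y:.2f}")
print(f"F (y -> x): {f_y_to_x:.2f}")
```

A large F statistic in one direction only is the pattern expected when the influence is unidirectional. In practice the statistic is compared with an F distribution with (lags, df_full) degrees of freedom.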

Yi Liu
School of Computer and Information Technology, Beijing Jiaotong University, China

Emerging causal inference problems in molecular systems biology (slides)

In this talk, I introduce several emerging causal inference problems in molecular systems biology. Nowadays, the rapid development and wide application of Next Generation Sequencing (NGS) technology have made the analysis of large deep-sequencing data sets a routine task in systems biology research. While general data mining and machine learning methods are broadly applicable in this vein, I argue that causal inference will specifically play a vital role in the mining and elicitation of precise biological knowledge from those data, since causality is implicit yet of key importance in many forms of biological investigation. For example, we are all eager to know which somatic mutations and DNA copy number alterations (CNAs) are true causes of a certain cancer, and via which metabolic, genetic and epigenetic perturbations this malignancy can be (causally) reversed. While such a goal might be too grand to achieve now, we report our preliminary attempts at applying causal inference approaches to study biological questions: 1) inferring causal relationships between transcription factors, epigenetic modifications and gene expression level in human/mouse embryonic stem cells and CD4+ T cells from heterogeneous deep sequencing data; 2) reverse-engineering the yeast genetic regulatory network from deletion-mutant gene expression data; 3) discovering uncategorized subtypes of ovarian cancer that differ significantly in survival time and uncovering key molecular signatures that distinguish these subtypes. Finally, I will also briefly sketch other causal inference problems in biology that machine learning researchers might be interested in.

Po-Ling Loh
Dept. Statistics, University of California, Berkeley

High-dimensional learning of linear causal networks via inverse covariance estimation (paper)

We present a new framework for statistical estimation of directed acyclic graphs (DAGs) when data are generated from a linear, possibly non-Gaussian structural equation model. Our framework consists of two parts: (1) inferring the moralized graph from the support of the inverse covariance matrix; and (2) selecting the best-scoring graph amongst DAGs that are consistent with the moralized graph. We show that when the error variances are known or estimated with sufficient precision, the true DAG is the unique minimizer of the reweighted squared l_2-loss. Using a dynamic programming algorithm, the true DAG may be obtained in linear time when the moralized graph has bounded treewidth. We also provide conditions for statistical consistency of our algorithm in high-dimensional settings.
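
Part (1) of the framework can be checked at the population level on a toy model. For a linear structural equation model X = B^T X + e with diagonal error covariance Omega, the inverse covariance is (I - B) Omega^{-1} (I - B)^T. The sketch below compares its off-diagonal support with the moralized graph of a small DAG. The edge weights, error variances, and helper names are assumptions for illustration, and the agreement relies on the absence of exact cancellations among the weights. Estimation from finite samples and the DAG search in part (2) are not shown.

```python
import itertools
import numpy as np

# B[j, k] is the weight of the edge j -> k. Illustrative DAG with a
# v-structure 0 -> 2 <- 1, followed by 2 -> 3.
B = np.zeros((4, 4))
B[0, 2] = 0.8
B[1, 2] = -0.5
B[2, 3] = 1.2
omega = np.array([1.0, 2.0, 0.5, 1.0])  # error variances (diagonal of Omega)

def precision_matrix(B, omega):
    """Population inverse covariance (I - B) Omega^{-1} (I - B)^T."""
    I = np.eye(B.shape[0])
    return (I - B) @ np.diag(1.0 / omega) @ (I - B).T

def support_edges(theta, tol=1e-10):
    """Undirected edges {j, k} with a nonzero off-diagonal entry."""
    p = theta.shape[0]
    return {(j, k) for j in range(p) for k in range(j + 1, p)
            if abs(theta[j, k]) > tol}

def moral_graph_edges(B):
    """Parent-child edges plus edges between parents of a common child."""
    p = B.shape[0]
    edges = set()
    for child in range(p):
        parents = [j for j in range(p) if B[j, child] != 0.0]
        for parent in parents:
            edges.add(tuple(sorted((parent, child))))
        for a, b in itertools.combinations(parents, 2):
            edges.add(tuple(sorted((a, b))))
    return edges

theta = precision_matrix(B, omega)
print("support of inverse covariance:", sorted(support_edges(theta)))
print("moralized graph edges:        ", sorted(moral_graph_edges(B)))
```

The edge between nodes 0 and 1 appears in both sets even though neither node is a parent of the other: the two are "married" because they share the child 2.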

Karthika Mohan
Dept. Computer Science, University of California, Los Angeles

Efficient Algorithms for Bayesian Network Parameter Learning from Incomplete Data (paper)

We propose an efficient family of algorithms to learn the parameters of a Bayesian network from incomplete data. In contrast to textbook approaches such as EM and the gradient method, our approach is non-iterative, yields closed-form parameter estimates, and eliminates the need for inference in a Bayesian network. Our approach provides consistent parameter estimates for missing data problems that are MCAR, MAR, and in some cases, MNAR. Empirically, our approach is orders of magnitude faster than EM (as our approach requires no inference). Given sufficient data, we learn parameters that can be orders of magnitude more accurate.

Shohei Shimizu
Department of Reasoning for Intelligence, Osaka University, Japan

Estimation of causal direction in the presence of latent confounders and linear non-Gaussian structural equation models (slides)

Several existing methods have been shown to consistently estimate causal direction assuming linear or some form of nonlinear relationship and no latent confounders. However, the estimation results could be distorted if either assumption is violated. We develop an approach to determining the possible causal direction between two observed variables when latent confounding variables are present. We first propose a new linear non-Gaussian acyclic structural equation model with individual-specific effects that are sometimes the source of confounding. Thus, modeling individual-specific effects as latent variables allows latent confounding to be considered. We then propose an empirical Bayesian approach for estimating possible causal direction using the new model. We demonstrate the effectiveness of our method using artificial and real-world data.

Nevin L. Zhang
Department of Computer Science & Engineering, The Hong Kong University of Science & Technology, Hong Kong

Latent Tree Analysis of Unlabeled Data

Latent tree models (LTMs) are tree-structured probabilistic graphical models where the variables at leaf nodes are observed while those at internal nodes are latent. They represent complex relationships among observed variables and yet are computationally simple to work with. When used for data analysis, they divide observed variables into groups such that the correlations among variables in each group are properly modeled by a single latent variable. As such, LTMs are an effective tool for detecting co-occurrence patterns in binary data and correlation patterns in general data. An LTM typically contains multiple latent variables and each latent variable represents a soft partition of data. As such, LTMs are also a novel tool for clustering that produces multiple partitions of data. In this talk, I will give a high-level introduction to latent tree analysis and its applications.

Kun Zhang
Dept. Empirical Inference, Max-Planck Institute for Intelligent Systems, Germany

Learning causal knowledge and learning based on causal knowledge (slides)

Recently, causal discovery has benefited a great deal from statistics and machine learning; on the other hand, causal information has been shown to facilitate understanding and solving certain machine learning problems. The first part of this talk is concerned with how three types of "independence" -- namely, conditional independence, independent noise, and independent mechanism -- enable causal discovery, i.e., learning causal information from purely observational data. In the second part of the talk, I will consider two machine learning problems -- semi-supervised learning and domain adaptation -- from a causal point of view, and briefly discuss why and how the underlying causal information helps to understand them.

Acknowledgements: The speaker would like to thank coworkers Bernhard Schölkopf, Dominik Janzing, Joris Mooij, Jonas Peters, Jakob Zscheischler, and Eleni Sgouritsa.