


Talks


Kenji Fukumizu
Department of Mathematical Analysis and Statistical Inference, The Institute of Statistical Mathematics, Japan


Measuring dependence and conditional dependence with kernels (slides)
Measuring (conditional) dependence and independence is an essential part of causal learning. In this talk, I will present kernel methods for measuring (conditional) dependence and independence. In the first half of the talk, I discuss some topics on the Hilbert-Schmidt Independence Criterion (HSIC), a kernel measure of independence: a connection with a recent independence measure called distance covariance is presented, and a way of choosing a kernel for efficient independence tests is discussed. In the latter half, I will introduce a normalized (conditional) covariance operator and discuss its properties, including its relation to the chi-square divergence, among others.
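As a minimal illustration of HSIC (a sketch, not the speaker's implementation): the biased empirical estimate for a sample of size n is trace(KHLH)/n², where K and L are kernel Gram matrices on the two variables and H is the centering matrix. The RBF kernel, its bandwidth, and the simulated data below are illustrative choices:

```python
import numpy as np

def rbf_kernel(x, sigma=1.0):
    # Gram matrix of the Gaussian RBF kernel for a 1-D sample
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC: trace(K H L H) / n^2."""
    n = len(x)
    K, L = rbf_kernel(x, sigma), rbf_kernel(y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / n ** 2

rng = np.random.default_rng(0)
x = rng.normal(size=200)
h_dep = hsic(x, 2 * x + 0.1 * rng.normal(size=200))  # dependent pair
h_ind = hsic(x, rng.normal(size=200))                # independent pair
print(h_dep, h_ind)  # h_dep is markedly larger than h_ind
```

The statistic is near zero for independent samples and grows with dependence; a permutation test on it yields the independence tests whose kernel choice the talk discusses.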



Zhi Geng
School of Mathematical Sciences, Peking University, China


Causal Effect Evaluation and Causal Network Learning
(slides)
Statistical approaches for evaluating causal effects and for learning causal networks are discussed. A spurious association between two variables may appear due to the existence of an unobserved variable; this is called the Yule-Simpson paradox. To evaluate the effect of a treatment variable on an endpoint variable, a surrogate variable is often used when the endpoint variable is difficult to observe. We present the surrogate paradox: the treatment has a positive effect on the surrogate and the surrogate has a positive effect on the endpoint, yet the treatment has a negative effect on the endpoint. We discuss conditions for avoiding the Yule-Simpson paradox and the surrogate paradox. Causal relationships among variables can be represented by a causal network, depicted as a directed acyclic graph (DAG) or a Bayesian network. We present three approaches for learning a causal network from data: decomposing learning, active learning and local learning.
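The Yule-Simpson reversal is easy to reproduce numerically. The counts below are hypothetical (chosen only to exhibit the paradox): the treatment looks better within each stratum of a confounder, yet worse in the pooled table:

```python
# Hypothetical recovery counts: (recovered, total) per group
strata = {
    "male":   {"treat": (81, 87),   "control": (234, 270)},
    "female": {"treat": (192, 263), "control": (55, 80)},
}

def rate(recovered, total):
    return recovered / total

# Within each stratum, treatment beats control ...
diffs = {s: rate(*d["treat"]) - rate(*d["control"]) for s, d in strata.items()}

# ... but pooling the strata flips the sign of the effect
treat_rec = sum(d["treat"][0] for d in strata.values())
treat_tot = sum(d["treat"][1] for d in strata.values())
ctrl_rec = sum(d["control"][0] for d in strata.values())
ctrl_tot = sum(d["control"][1] for d in strata.values())
pooled_diff = treat_rec / treat_tot - ctrl_rec / ctrl_tot

print(diffs)        # both within-stratum differences are positive
print(pooled_diff)  # negative: treatment looks worse overall
```

The reversal arises because the confounder (here, the stratum) influences both treatment assignment and recovery, which is exactly the situation the talk's avoidance conditions address.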



Isabelle Guyon
ClopiNet


Causal discovery as a machine learning problem: the contribution of challenges
Since 2007, we have been organizing challenges in causal discovery. But it was not until recently that we started making significant improvements over the state-of-the-art by using challenges as a means of crowdsourcing hard research problems. In two challenges we organized in 2014 and 2015, we formulated causal discovery as a simple classification problem: given observations of samples from a pair of variables, does A cause B or not? In contrast with model-based approaches, which try to recover causal relationships by making hypotheses about the data-generating process (the causal mechanism), this setup favors treating causal discovery as a pattern recognition problem.
We let the participants solve this difficult inverse problem by providing them with lots of "training examples" (examples of pairs for which A causes B and other pairs). Many of them treated the problem as a learning problem: they extracted features of the joint distribution and trained a learning machine to "recognize" causal relationships.
In the first challenge (the "Cause-Effect Pair" challenge; phase 1: http://www.causality.inf.ethz.ch/causeeffect.php, phase 2: https://www.codalab.org/competitions/1381), thousands of real and artificial pairs of variables were provided. The variable pairs were isolated from their context and samples were drawn independently. In the second challenge (the Neural Connectomics Challenge, http://connectomics.chalearn.org/), the problem was to reconstruct a network of neural cells from simulated time series of neural activity. Both challenges achieved significant advances over baseline methods (from AUC~0.60 to AUC~0.83 in the first challenge and from AUC~0.88 to AUC~0.94 in the second). The code of the winners has been made publicly available.
To demystify these amazing results, our presentation will analyze them and provide insights into how and why these methods work.
Acknowledgements: Both challenges were organized by ChaLearn, a non-profit organization with US tax-exempt status. They involved large teams of volunteer coworkers; see the complete list on our websites. The main contributors of the "Cause-Effect Pair" challenge are Ben Hamner, Mikael Henaff, Mehreen Saeed, and Alexander Statnikov. The main contributors of the connectomics challenge are Demian Battaglia, Javier Orlandi, Jordi Soriano Fradera, Olav Stetter, Bisakha Ray, Mehreen Saeed, and Alexander Statnikov. The "Cause-Effect Pair" challenge was supported by the EU Pascal2 network of excellence; it was hosted by Kaggle.com in its phase 1, and supported by Microsoft and hosted by Codalab.org in its phase 2. The connectomics challenge was supported by the Marie Curie Action Program of the EU framework FP7 and by Microsoft, and was hosted by Kaggle.com. Other sponsors who donated prizes are listed on our website and are gratefully thanked.



Yan Liu
Computer Science Department, University of Southern California


Recent Advances in Granger Causality for Large-scale Time Series Data
In the era of data deluge, we are confronted with large-scale time series data, i.e., sequences of observations of variables of interest over a period of time. For example, petabytes of climate and meteorological data, such as temperature, solar radiation, and precipitation, are collected over the years, and exabytes of social media content are generated over time on the Internet. A major task in time series analysis is to uncover the temporal causal relationships among the time series. For example, in climatology, we want to identify the factors that impact the climate patterns of certain regions. In social networks, we are interested in identifying the patterns of influence among users and how topics activate or suppress each other. Therefore, developing effective and scalable data mining algorithms to uncover temporal dependency structures between time series and reveal insights from data has become a key problem in machine learning and data mining. In this talk, we will discuss recent developments on Granger causality for large-scale time series data. In particular, we will focus on the practical challenges in large-scale applications, such as nonlinearity, hidden variables and missing data.
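A bare-bones sketch of the classical linear Granger test (the baseline the talk builds on, not the speaker's large-scale methods): x "Granger-causes" y if adding lags of x to an autoregression of y reduces the residual sum of squares. The model order and simulated coefficients below are illustrative:

```python
import numpy as np

def granger_rss(y, x, p=2):
    """RSS of the restricted (y lags only) and full (y and x lags)
    autoregressions of order p; a large drop suggests Granger causality."""
    n = len(y)
    Y = y[p:]
    lag_y = np.column_stack([y[p - k:n - k] for k in range(1, p + 1)])
    lag_x = np.column_stack([x[p - k:n - k] for k in range(1, p + 1)])
    ones = np.ones((n - p, 1))

    def rss(X):
        beta = np.linalg.lstsq(X, Y, rcond=None)[0]
        return np.sum((Y - X @ beta) ** 2)

    return rss(np.hstack([ones, lag_y])), rss(np.hstack([ones, lag_y, lag_x]))

# Simulate y driven by its own lag plus a lag of x
rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()

rss_restricted, rss_full = granger_rss(y, x)
print(rss_restricted, rss_full)  # rss_full is far smaller
```

In practice the drop in RSS is assessed with an F-test; the talk's challenges (nonlinearity, hidden variables, missing data) are exactly the ways this simple linear setup breaks down at scale.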



Yi Liu
School of Computer and Information Technology, Beijing Jiaotong University, China


Emerging causal inference problems in molecular systems biology
(slides)
In this talk, I introduce several emerging causal inference problems in molecular systems biology. Nowadays, the fast development and wide application of Next Generation Sequencing (NGS) technology have made the analysis of big deep-sequencing data sets a routine task in systems biology research. While general data mining and machine learning methods are broadly applicable in this vein, I argue that causal inference will play a specifically vital role in mining and eliciting precise biological knowledge from those data, since causality is implicit yet of key importance in many forms of biological investigation. For example, we are all eager to know which somatic mutations and DNA copy number alterations (CNAs) are true causes of a certain cancer, and via which metabolic, genetic and epigenetic perturbations this malignancy can be (causally) reversed. While such a goal may be too grand to achieve now, we report our preliminary attempts at applying causal inference approaches to biological questions: 1) inferring causal relationships between transcription factors, epigenetic modifications and gene expression levels in human/mouse embryonic stem cells and CD4+ T cells from heterogeneous deep-sequencing data; 2) reverse-engineering the yeast genetic regulatory network from deletion-mutant gene expression data; 3) discovering uncategorized subtypes of ovarian cancer that differ significantly in survival time, and uncovering key molecular signatures that distinguish these subtypes. Finally, I will briefly sketch other causal inference problems in biology that may interest machine learning researchers.



Po-Ling Loh
Dept. Statistics, University of California, Berkeley


High-dimensional learning of linear causal networks via inverse covariance estimation
(paper)
We present a new framework for statistical estimation of directed acyclic graphs (DAGs) when data are generated from a linear, possibly non-Gaussian structural equation model. Our framework consists of two parts: (1) inferring the moralized graph from the support of the inverse covariance matrix; and (2) selecting the best-scoring graph amongst DAGs that are consistent with the moralized graph. We show that when the error variances are known or estimated with sufficient precision, the true DAG is the unique minimizer of the reweighted squared l2-loss. Using a dynamic programming algorithm, the true DAG may be obtained in linear time when the moralized graph has bounded treewidth. We also provide conditions for the statistical consistency of our algorithm in high-dimensional settings.
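Step (1) can be illustrated with a toy example: for a linear SEM, nonzero off-diagonal entries of the inverse covariance (precision) matrix correspond to edges of the moralized graph. The chain DAG, coefficients, sample size and threshold below are assumed for illustration (a chain has no colliders, so its moralized graph is just its skeleton):

```python
import numpy as np

# Linear SEM on a chain DAG 0 -> 1 -> 2 with unit-variance noise
rng = np.random.default_rng(2)
n = 20_000
e = rng.normal(size=(3, n))
x0 = e[0]
x1 = 0.9 * x0 + e[1]
x2 = 0.7 * x1 + e[2]
X = np.vstack([x0, x1, x2])

theta = np.linalg.inv(np.cov(X))   # inverse covariance (precision) matrix
support = np.abs(theta) > 0.05     # threshold near-zero entries away

# Off-diagonal support recovers the moralized graph: edges (0,1) and (1,2)
# but not (0,2), since x0 and x2 are conditionally independent given x1.
print(support)
```

In high dimensions the full matrix inverse would be replaced by a sparse precision estimator such as the graphical lasso; the thresholding here stands in for that.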



Karthika Mohan
Dept. Computer Science, University of California, Los Angeles


Efficient Algorithms for Bayesian Network Parameter Learning from Incomplete Data
(paper)
We propose an efficient family of algorithms to learn the parameters of a Bayesian network from incomplete data. In contrast to textbook approaches such as EM and the gradient method, our approach is non-iterative, yields closed-form parameter estimates, and eliminates the need for inference in a Bayesian network. Our approach provides consistent parameter estimates for missing data problems that are MCAR, MAR, and in some cases, MNAR. Empirically, our approach is orders of magnitude faster than EM (as it requires no inference). Given sufficient data, we learn parameters that can be orders of magnitude more accurate.
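To give the flavor of such closed-form estimation (a sketch under an MCAR assumption, not the authors' exact algorithm): each conditional probability table entry can be estimated directly from the records in which the relevant variables are observed, with no iteration and no inference. The network, parameters and missingness rate below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000

# Ground-truth network A -> B with P(a)=0.3, P(b|a)=0.8, P(b|~a)=0.2
a = rng.random(n) < 0.3
b = np.where(a, rng.random(n) < 0.8, rng.random(n) < 0.2)

# MCAR missingness: each cell is hidden independently with probability 0.2
miss_a = rng.random(n) < 0.2
miss_b = rng.random(n) < 0.2

# Closed-form "available case" estimates: for each parameter, use only
# the records in which the variables it involves are observed
p_a = np.mean(a[~miss_a])
both = ~miss_a & ~miss_b
p_b_given_a = np.mean(b[both & a])

print(p_a, p_b_given_a)  # close to the true 0.3 and 0.8
```

Under MCAR, deleting incomplete records does not bias these ratios, which is why the estimates are consistent without any EM-style iteration; handling MAR and MNAR requires the more careful record selection the paper develops.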



Shohei Shimizu
Department of Reasoning for Intelligence, Osaka University, Japan


Estimation of causal direction in the presence of latent confounders using linear non-Gaussian structural equation models
(slides)
Several existing methods have been shown to consistently estimate causal direction under the assumptions of a linear (or some form of nonlinear) relationship and no latent confounders. However, the estimation results can be distorted if either assumption is violated. We develop an approach to determining the possible causal direction between two observed variables when latent confounding variables are present. We first propose a new linear non-Gaussian acyclic structural equation model with individual-specific effects, which are sometimes the source of confounding. Modeling individual-specific effects as latent variables thus allows latent confounding to be taken into account. We then propose an empirical Bayesian approach for estimating the possible causal direction under the new model. We demonstrate the effectiveness of our method using artificial and real-world data.



Nevin L. Zhang
Department of Computer Science & Engineering, The Hong Kong University of Science & Technology, Hong Kong


Latent Tree Analysis of Unlabeled Data
Latent tree models (LTMs) are tree-structured probabilistic graphical models in which the variables at leaf nodes are observed while those at internal nodes are latent. They represent complex relationships among observed variables and yet are computationally simple to work with. When used for data analysis, they divide the observed variables into groups such that the correlations among the variables in each group are properly modeled by a single latent variable. As such, LTMs are an effective tool for detecting co-occurrence patterns in binary data and correlation patterns in general data. An LTM typically contains multiple latent variables, each of which represents a soft partition of the data; LTMs are therefore also a novel tool for clustering that produces multiple partitions of the data. In this talk, I will give a high-level introduction to latent tree analysis and its applications.



Kun Zhang
Dept. Empirical Inference, Max Planck Institute for Intelligent Systems, Germany


Learning causal knowledge and learning based on causal knowledge
(slides)
Causal discovery has recently benefited a great deal from statistics and machine learning; conversely, causal information has been demonstrated to facilitate understanding and solving certain machine learning problems. The first part of this talk is concerned with how three types of "independence" (namely, conditional independence, independent noise, and independent mechanism) enable causal discovery, i.e., learning causal information from purely observational data. In the second part of the talk, I will consider two machine learning problems, semi-supervised learning and domain adaptation, from a causal point of view, and briefly discuss why and how the underlying causal information helps in understanding them.
Acknowledgements: The speaker would like to thank coworkers Bernhard Schölkopf, Dominik Janzing, Joris Mooij, Jonas Peters, Jakob Zscheischler, and Eleni Sgouritsa.


