Talks

Joachim M. Buhmann
Dept. Computer Science, ETH Zürich

What is the information content of an algorithm?

Algorithms are exposed to randomness in the input or noise during the computation. How well can they preserve the information in the data w.r.t. the output space? Algorithms, especially in machine learning, are required to show robustness to input fluctuations or randomization during execution. This talk elaborates a new framework to measure the "informativeness" of algorithmic procedures and their "stability" against noise. An algorithm is considered to be a noisy channel characterized by a generalization capacity (GC). The generalization capacity objectively ranks different algorithms for the same data processing task based on the bit rate of their respective capacities. The problem of grouping data is used to demonstrate this validation principle for clustering algorithms, e.g. k-means, pairwise clustering, normalized cut, adaptive ratio cut and dominant set clustering. Our new validation approach selects the most informative clustering algorithm, which filters out the maximal number of stable, task-related bits relative to the underlying hypothesis class. The concept also enables us to measure how many bits are extracted by sorting algorithms when the input, and thereby the pairwise comparisons, are subject to fluctuations.

Patricia Cheng
Dept. Psychology, UCLA

Causal invariance in intuitive and scientific causal inference (slides)

Scientists' concern with objectivity has led to the dominance of associative statistics, which define the basic concept of independence (i.e., no interaction) on observations only. Our analysis shows that to infer causation, the relevant concept of independence is causal invariance: a causal mechanism remaining unchanged across contexts. To infer causes of a binary outcome (e.g., whether or not a tumor cell is malignant), the associative definition (for both frequentist and Bayesian statistics) results in a logical inconsistency, even for data from an ideal experiment. Moreover, removing the logical incoherence requires defining independence on strictly imaginary causal events. We encapsulated the distinction between the two rather abstract definitions in a simple scenario involving the effects of two treatments. The associative and causal-invariance definitions yield opposite recommendations regarding which treatment best achieves a desired outcome. The divergence does not diminish with increased sample size. We posed the scenario as a problem to preschool children in story form. The rationality of the correct answer is so compelling that even preschoolers have no trouble choosing the treatment entailed by the logically coherent causal definition, despite its greater computational demand. The preschoolers' choice cannot be explained by heuristic shortcuts or learning. Our theoretical and empirical findings together suggest that 1) coherence in causal generalization made a sufficiently large difference to survival that it shaped the causal-discovery process in humans, and 2) introducing a new causal statistics may result in more consistent and generalizable causal discoveries in medicine and other sciences.
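The divergence between the two definitions can be made concrete with a toy two-treatment scenario. The sketch below contrasts the associative contrast (ΔP) with Cheng's generative causal power, which assumes the candidate cause and background causes act as independent noisy-OR components; the specific probabilities are hypothetical, not taken from the studies described.

```python
# Hypothetical illustration: two treatments tested in contexts with different
# base rates of the outcome.

def delta_p(p_e_c, p_e_notc):
    # Associative definition: raw difference in outcome probability.
    return p_e_c - p_e_notc

def causal_power(p_e_c, p_e_notc):
    # Causal-invariance (noisy-OR) definition: strength of the treatment's
    # own mechanism, factored apart from the background causes.
    return (p_e_c - p_e_notc) / (1.0 - p_e_notc)

# Treatment A tested where the outcome already occurs 75% of the time;
# treatment B tested where it never occurs on its own.
a_dp, a_power = delta_p(1.0, 0.75), causal_power(1.0, 0.75)  # 0.25 vs 1.0
b_dp, b_power = delta_p(0.5, 0.0), causal_power(0.5, 0.0)    # 0.5 vs 0.5

assert a_dp < b_dp        # the associative definition recommends B
assert a_power > b_power  # the causal-invariance definition recommends A
```

Because both quantities are population probabilities, more data sharpens the estimates but never reconciles the two recommendations, matching the abstract's point that the divergence does not diminish with sample size.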

Joint work with Mimi Liljeholm and Catherine Sandhofer.

Frederick Eberhardt
Division of the Humanities and Social Sciences, Caltech

Path constraints for causal discovery (slides)

A linear Gaussian parameterization of a causal model has the advantage that one can characterize the causal effect of an individual pathway, and that the causal effect of one variable on another decomposes into the causal effects of the connecting pathways, which themselves decompose into the causal effects of each direct cause on such a pathway. This feature, characterized in terms of so-called 'trek rules', enables the use of efficient discovery algorithms for causal models with feedback and latent variables. These discovery procedures can be adapted to handle discrete models with a noisy-or parameterization by using a version of Patricia Cheng's Power-PC statistic. The identifiability results remain the same. Most recently, we have extended this path-based approach to causal discovery to a general non-parametric setting. Naturally, the identifiability results are now much weaker, since general causal relations, including interactions, are permitted. But the approach enables the inclusion of very general background knowledge and provides a constraint-based discovery procedure for an extremely general search space that includes cyclic models with latent variables.
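As a minimal illustration of the decomposition described above, the following sketch (with hypothetical edge coefficients) checks by simulation that in a linear Gaussian model the total effect of an exogenous variable equals the sum, over connecting pathways, of the products of edge coefficients along each pathway:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
b_xz, b_zy, b_xy = 0.8, 0.5, -0.3  # hypothetical edge coefficients

# Linear Gaussian SEM on the DAG x -> z -> y, x -> y. Since x is exogenous,
# cov(x, y) / var(x) equals the total causal effect of x on y.
x = rng.normal(size=n)
z = b_xz * x + rng.normal(size=n)
y = b_zy * z + b_xy * x + rng.normal(size=n)

total_effect = np.cov(x, y)[0, 1] / np.var(x)

# Path decomposition: one product of edge coefficients per connecting pathway.
path_sum = b_xz * b_zy + b_xy  # (x->z->y) + (x->y) = 0.4 - 0.3 = 0.1

assert abs(total_effect - path_sum) < 0.02
```

It is exactly this additivity over pathways that fails once general (e.g. interactive) causal relations are permitted, which is why the non-parametric extension mentioned above yields weaker identifiability results.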

This is joint work with Antti Hyttinen, Patrik Hoyer and Matti Järvisalo.

Malcolm R. Forster
Dept. Philosophy, University of Wisconsin-Madison

Evidence for probabilistic hypotheses: With applications to causal modeling (slides)

All probabilistic models present a well-known epistemic puzzle. They are not subject to the logical rule of falsification known as modus tollens (M implies observation O; O is false; therefore M is false). Probabilistic models have probabilistic consequences, and probabilities are not directly observable. Surprisingly, naïve ways of extending modus tollens to probabilistic contexts do not work (as Sober has pointed out). I present a different proposal couched in terms of the agreement of independent measurements. It applies to probabilistic modeling in general, and has interesting consequences for probabilistic causal modeling in particular: It explains why the probabilistic independence conditions implied by causal models are evidentially important, gives an evidential justification for a faithfulness condition, and even shows how data concerning just two variables X and Y can provide evidence for X -> Y and against Y -> X even though there are no obvious independencies implied by either model.

Alison Gopnik
Dept. Psychology, University of California at Berkeley

How children learn about causes: Search, sampling and simulated annealing (slides)

How do young children learn so much about the world so quickly and accurately? Many researchers have proposed that children implicitly formulate structured hypotheses about the world and then use evidence to test and revise those hypotheses. I'll describe extensive research from the past ten years showing that even two-year-olds formulate causal hypotheses and test and evaluate them against the data in a normatively accurate way. But this work raises several problems. Where do those hypotheses come from? What algorithms could children implicitly use to approximate ideal Bayesian inference? How can we reconcile children's striking inferential success with their apparent variability and irrationality? And are there developmental differences in the ways that children and adults learn?

I will suggest that all these questions can be illuminated by thinking about children's hypothesis generation as a sampling process - an idea with a long and successful history in computer science. Preschoolers may use sampling to generate hypotheses, and this may explain both their successful inferences and the variability in their behavior. In fact, in recent research we have discovered that children's causal learning has some of the signatures of sampling. In particular, the variability in children's responses reflects the probability of their hypotheses. Moreover, we have shown that children may use a particular type of sampling algorithm. We also found that preschoolers were actually more open-minded learners than older children and adults in some tasks. They seemed more likely to accept a wide range of novel hypotheses. This suggests that they may search at a "higher temperature" than adults do. From an evolutionary perspective the transition from childhood to adulthood may be nature's way of performing simulated annealing.
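The "higher temperature" idea can be sketched with a toy temperature-controlled sampler; everything below (the three-hypothesis posterior, the temperature values) is hypothetical, not the model from the research described:

```python
import math
import random

def sample_hypothesis(posteriors, temperature, rng):
    # Softmax over log-posteriors: temperature 1 reproduces the posterior
    # ("probability matching"); higher temperatures flatten the distribution,
    # so low-probability hypotheses are entertained more often.
    logits = [math.log(p) / temperature for p in posteriors]
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    return rng.choices(range(len(posteriors)), [w / total for w in weights])[0]

posteriors = [0.7, 0.2, 0.1]  # hypothetical posterior over three hypotheses
rng = random.Random(0)
child = [sample_hypothesis(posteriors, 3.0, rng) for _ in range(10_000)]
adult = [sample_hypothesis(posteriors, 0.5, rng) for _ in range(10_000)]

# The high-temperature "child" sticks to the modal hypothesis less often,
# i.e. behaves as the more open-minded learner.
assert child.count(0) < adult.count(0)
```

On this picture the variability of children's responses is not noise but a signature of the sampling process, and lowering the temperature with age is the annealing schedule.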

Chris Hitchcock
Dept. Philosophy, California Institute of Technology

Actual causation: Looking backward and looking forward (slides)

There has been considerable interest among philosophers and legal theorists in trying to understand the concept of 'actual causation'. This concept figures in causal attributions such as: 'a meteor strike in the Yucatan caused the extinction of the dinosaurs' and 'loose lending practices caused the dramatic increase in housing prices between 2002 and 2007'. I will situate this problem within the context of formal tools for causal modeling that have been developed over the past twenty years. In particular, I will suggest a connection between judgments of actual causation, and certain kinds of planning decisions. One question that will emerge concerns the extent to which judgments of actual causation concern objective features of the causal structure, as opposed to facts about norms, expectations, and the like.

Aapo Hyvärinen
Dept. Computer Science and Dept. Statistics & Mathematics, University of Helsinki

Estimation of structural equation models using non-Gaussianity (slides)

Structural equation models, or linear Bayesian networks, are a fundamental tool for causal analysis of passively observed data sets. However, the classical framework, based on Gaussian variables, suffers from unidentifiability, i.e. the parameters cannot be estimated without some prior information. We have been developing, with Shimizu, Hoyer, Zhang and others, a framework for estimating structural equation models in a non-Gaussian setting which makes such prior information unnecessary. In this talk, I will describe the basic framework and recent developments.
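A minimal sketch of why non-Gaussianity restores identifiability in the two-variable case (illustrative only; the actual LiNGAM estimators are more sophisticated): in the true causal direction the regression residual is fully independent of the regressor, while in the wrong direction it is merely uncorrelated, and with non-Gaussian disturbances the difference shows up in higher-order moments.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
# Ground truth x -> y with non-Gaussian (uniform) disturbances.
x = rng.uniform(-1, 1, n)
y = 0.9 * x + rng.uniform(-1, 1, n)

def residual(target, regressor):
    # Least-squares residual of regressing target on regressor.
    b = np.cov(regressor, target)[0, 1] / np.var(regressor)
    return target - b * regressor

def dependence(a, resid):
    # Cheap proxy for an independence test using third-order cross-moments,
    # which vanish only under full independence, not mere uncorrelatedness.
    a = (a - a.mean()) / a.std()
    resid = (resid - resid.mean()) / resid.std()
    return abs(np.mean(a**3 * resid)) + abs(np.mean(a * resid**3))

# The true direction leaves (nearly) independent residuals; the wrong one
# does not. In the Gaussian case both scores would be indistinguishable.
assert dependence(x, residual(y, x)) < dependence(y, residual(x, y))
```

Swapping the uniform disturbances for Gaussian ones makes the two scores statistically indistinguishable, which is exactly the classical unidentifiability the abstract refers to.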

Dominik Janzing
MPI for Intelligent Systems

What does 'strong causal influence' mean? (slides)

Even when the joint distribution and the causal structure among a set of variables is perfectly known, it is still unclear how to quantify the strength of a causal arrow. On the one hand, it seems natural to ask to what extent the statistical dependence between X and Y is due to the arrow X --> Y and to what extent it is due to a different causal path. On the other hand, the question implicitly suggests that dependences caused by different paths behave additively, which is certainly not true.
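The non-additivity point can be seen in a toy linear model (coefficients hypothetical): two individually strong paths from X to Y can cancel exactly, leaving no observed dependence between X and Y at all.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
x = rng.normal(size=n)
z = x + rng.normal(size=n)             # strong arrow X --> Z
y = z - x + 0.1 * rng.normal(size=n)   # strong arrows Z --> Y and X --> Y

# Every individual arrow carries a large effect, yet the contributions of
# the paths X->Y and X->Z->Y cancel: X and Y end up uncorrelated. So the
# dependences produced by different paths cannot simply be added up.
assert abs(np.corrcoef(x, y)[0, 1]) < 0.02
assert abs(np.corrcoef(x, z)[0, 1]) > 0.5
assert abs(np.corrcoef(z, y)[0, 1]) > 0.5
```

Any measure of the strength of the arrow X --> Y that only reads off the observed X-Y dependence would call it zero here, which motivates looking for postulates a better measure should satisfy.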

I will mention a few postulates that we considered reasonable for a measure of causal strength. I will explain why known measures don't satisfy the postulates and present a new one that does. However, I will explain why I think that there is not only one reasonable answer to this difficult question.

Reference: D. Janzing, D. Balduzzi, M. Grosse-Wentrup, B. Schölkopf: Quantifying causal influences. To appear in Annals of Statistics.

Stephen F. LeRoy
Dept. Economics, University of California, Santa Barbara

Causality in Economics (slides)

The first part of this lecture consists of a review of analysis of causality in economics. In the immediate postwar period economic theorists associated with the Cowles Commission (Yale University, University of Chicago) were in the forefront of causality analysis. Herbert Simon's proposal---that one variable causes another if the former is determined in a lower-order causal block---was the basis for much subsequent analysis. In more recent years leadership in causality analysis shifted from economics to the natural sciences and the other social sciences. Contributions by economists, such as Granger causality, mostly consisted of pronouncements that various correlations can in fact be associated with causation, an assertion that in fields other than economics is viewed as an error.

Economists have made little use of the diagrammatic analyses of causation that play a major role in other areas of research. This is at least partly due to the fact that diagrammatic analyses of causation are based on the assumed availability of a preexisting model that encodes causal orderings, modeled as equations in which the right-hand side variables cause the left-hand side variables. In contrast, economic models use the equality symbol with its usual mathematical meaning, which presumes symmetry. The latter portion of the presentation proposes a definition of causation that allows a transition between structural equations as formulated by economists and causal models as used in the other disciplines.

Thomas Richardson
Dept. Statistics, University of Washington

Unifying the counterfactual and graphical approaches to causality via single world intervention graphs (SWIGs) (slides)

Models based on potential outcomes, also known as counterfactuals, were introduced by Neyman (1923) and later popularized by Rubin (1974). Such models are used extensively within Biostatistics, Statistics, Political Science, Economics, and Epidemiology for reasoning about causation. Directed acyclic graphs (DAGs), introduced by Wright (1921), are another formalism used to represent causal systems. Graphs are also extensively used in Computer Science, Bioinformatics, Sociology and Epidemiology.

In this talk I will present a simple approach to unifying these two approaches via a new graph, termed the Single-World Intervention Graph (SWIG). The SWIG encodes the counterfactual independences associated with a specific hypothetical intervention on a set of treatment variables. The nodes on the SWIG are the corresponding counterfactual random variables. The SWIG is derived from a causal DAG via a simple node splitting transformation.

I will illustrate the graphical theory with a number of examples. In particular, I show that SWIGs avoid a number of pitfalls present in an alternative approach to unification, based on 'twin networks', that has been advocated by Pearl (2000).

A simple modification of the SWIG allows one for the first time to encode on a single graph (and thus distinguish) the two possible causal interpretations of missing arrows: an absence of a causal effect for each individual versus the absence of an average causal effect at the population level.
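A minimal sketch of the node-splitting transformation on a parent-list representation of a DAG (the representation, function name, and lowercase naming of the fixed half are my own illustrative conventions, not from the talk): the intervened node splits into a random half, which keeps the incoming edges, and a fixed half, which takes over the outgoing edges.

```python
def swig(parents, treatment):
    # parents: dict mapping each node to the list of its parents.
    # Intervening on `treatment` splits it into a random half (same name,
    # keeps its incoming edges) and a fixed half (lowercase name, inherits
    # the outgoing edges and has no parents of its own).
    fixed = treatment.lower()
    out = {}
    for node, pars in parents.items():
        if node == treatment:
            out[node] = list(pars)  # random half keeps incoming edges
        else:
            # downstream nodes now listen to the fixed half instead
            out[node] = [fixed if p == treatment else p for p in pars]
    out[fixed] = []  # the fixed half is set by the intervention
    return out

# Example DAG: H -> A -> Y and H -> Y (H a confounder, A the treatment).
dag = {'H': [], 'A': ['H'], 'Y': ['A', 'H']}
print(swig(dag, 'A'))
# {'H': [], 'A': ['H'], 'Y': ['a', 'H'], 'a': []}
```

Reading counterfactual independences off the transformed graph with ordinary d-separation is then what lets the single graph carry both the factual and the interventional worlds.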

This is joint work with James Robins (Harvard School of Public Health).

Bernhard Schölkopf
MPI for Intelligent Systems

Causal inference and anticausal learning (slides)

The talk discusses implications of underlying cause-effect structures for popular machine learning scenarios such as covariate shift and semi-supervised learning. It argues that causal knowledge may facilitate some approaches for a given problem, and rule out others.

Peter Spirtes
Dept. Philosophy, Carnegie Mellon University

Bayes net perspectives on causation and causal inference (slides)

In this talk, I will give an overview of how Bayesian Networks represent causal relations, the kinds of questions they can be used to answer from given causal relations, and how they can be used for the learning of causal relations from data. I will also describe the fundamental assumptions underlying each kind of inference, some limitations of the standard Bayes Network representation, and some extensions of Bayesian Networks to remove the limitations.

Wolfgang Spohn
Dept. Philosophy, University of Konstanz

Clearing up the murky issue of frame-relative versus absolute causation

I will discuss differences between the Carnegie Mellon group and me in interpreting the Bayesian net account. They consist in the fact that the CM people are trying to grasp absolute causation, whereas I capture only frame-relative causation with the Bayesian net apparatus. This entails a lot of mutual misunderstanding, which has not been explicitly treated in the literature and which I would like to clear up on this occasion.

Naoki Tanaka
Dept. Reasoning for Intelligence, Osaka University, Japan

Estimation of causal direction in the presence of latent confounders using a Bayesian LiNGAM mixture model (slides)

Recently, large amounts of observed data have been accumulated in various fields, and there is a growing need to estimate the generating processes of these data. One approach estimates the data generating processes of variables using a linear acyclic model based on the non-Gaussianity of external influences (LiNGAM). However, the estimation results can be biased if there are latent confounding variables. Several methods have been proposed to estimate LiNGAM with latent confounders, but they suffer from local optima and are computationally demanding. In this talk, we propose a computationally simpler alternative: we reduce LiNGAM with latent confounders to a mixture of LiNGAMs, a LiNGAM mixture model, by assuming that the latent confounders can be approximated by discrete variables. Since previous estimation methods for LiNGAM mixture models also have computational problems, we further propose to use a Bayesian approach.

Michael Waldmann
Dept. Psychology, University of Göttingen

Agents and causes: Reconciling competing theories of causal reasoning (slides)

Currently in both psychology and philosophy two important frameworks of causal reasoning compete. Whereas dependency theories (e.g., causal Bayes nets) focus on causally motivated statistical or counterfactual dependencies between events, dispositional theories model causation as arising from the interaction between causal participants endowed with intrinsic dispositions or forces (e.g., force dynamics). The main goal of the present project is to reconcile these two competing frameworks. In a series of experiments we have focused on one of the most fundamental assumptions underlying causal Bayes nets, the Markov constraint. According to this constraint, an inference between a cause and an effect should be invariant across conditions in which other effects of this cause are present or absent. Previous research has demonstrated that reasoners tend to violate this assumption systematically over a wide range of domains. We hypothesize that people are guided by abstract assumptions about the mechanisms underlying otherwise identical causal relations. In particular, we suspect that the distinction between agents and patients, which can be disentangled from the distinction between causes and effects, influences which causal variable people blame when an error occurs. We have developed and tested a causal Bayes net model which captures different error attributions using a hidden common preventive noise source that provides a rational explanation of the presence or absence of Markov violations.
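For concreteness, here is a toy common-cause model (all probabilities hypothetical) showing what the Markov constraint predicts, i.e. the invariance that human reasoners reportedly violate: conditional on the cause, one effect carries no information about the other.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
# Common-cause structure E1 <- C -> E2 with independent noise per effect.
c = rng.random(n) < 0.5
e1 = np.where(c, rng.random(n) < 0.8, rng.random(n) < 0.1)
e2 = np.where(c, rng.random(n) < 0.7, rng.random(n) < 0.2)

# Markov constraint: P(e1 | c, e2) = P(e1 | c) -- the inference from C to E1
# should be invariant to whether the sibling effect E2 is present or absent.
p_e1_given_c = e1[c].mean()
p_e1_given_c_e2 = e1[c & e2].mean()
p_e1_given_c_not_e2 = e1[c & ~e2].mean()

assert abs(p_e1_given_c_e2 - p_e1_given_c) < 0.01
assert abs(p_e1_given_c_not_e2 - p_e1_given_c) < 0.01
```

The empirical finding is that people's judgments of P(e1 | c, e2) and P(e1 | c, not e2) differ systematically; the hidden preventive noise source in the model described above reintroduces exactly the kind of unobserved variable that makes such differences rational.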

James Woodward
Dept. History and Philosophy of Science, University of Pittsburgh

Interactions between philosophical theories of causation and empirical research on causal judgment (slides)

This talk will explore some of the interactions, actual and potential, between, on the one hand, theoretical/philosophical/normative/computational theorizing (hereafter "theoretical" work) about causal learning and judgment and, on the other hand, experimental investigations of causal learning and judgment. My guiding idea is that work in each of these areas can inform the other. Theoretical work can help to suggest experiments (and also show that certain experiments are not worth doing or should not be interpreted in the way that they commonly are). More controversially, empirical work can sometimes help to motivate new theoretical ideas. I will provide illustrations from a variety of sources, including experimental work concerning causal learning from interventions and observations by young children, experimental results concerning judgments about double prevention relations and, time permitting, experiments concerning choice of a preferred level of abstraction in causal judgment.

Jiji Zhang
Dept. Philosophy, Lingnan University, Hong Kong

Weakening the causal faithfulness assumption (slides)

This talk will examine the exact role of the causal faithfulness assumption in the inference of causal structure from facts of conditional dependence/independence, and explore the possibility of weakening the assumption. The basic idea is that given the causal Markov assumption, the causal faithfulness assumption is partially testable, and in principle the testable parts of the faithfulness assumption need not be assumed. I will illustrate the idea by presenting a couple of generalizations of the basic SGS algorithm, under increasingly weaker assumptions. I will also discuss some connections of this work to the issue of uniform consistency in causal inference. (This is based on joint work with Peter Spirtes.)
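The classic way faithfulness can fail is path cancellation; a toy linear example (coefficients hypothetical) makes the failure concrete:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
# Diamond DAG a -> b -> d and a -> c -> d, with path effects that cancel:
# (1.0 * 1.0) + (1.0 * -1.0) = 0. Then a and d are independent in the
# distribution even though a is a cause of d -- a faithfulness violation,
# since no d-separation relation licenses that independence.
a = rng.normal(size=n)
b = a + rng.normal(size=n)
c = a + rng.normal(size=n)
d = b - c + rng.normal(size=n)

assert abs(np.corrcoef(a, d)[0, 1]) < 0.01  # a and d look independent
assert abs(np.corrcoef(b, d)[0, 1]) > 0.2   # yet d depends on its parents
```

An algorithm that assumes full faithfulness, such as SGS, would wrongly remove every connection between a and d here; the observable dependences among a, b, c and d are what make this kind of violation partially detectable, which is the lever the weakened assumptions exploit.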

Kun Zhang
MPI for Intelligent Systems

Causal discovery with functional causal models: Different types of "independence" (slides)

Recently a class of causal discovery methods based on functional causal models has been proposed, which, under certain conditions, is able to fully identify the causal structure. Generally speaking, those methods make use of additional properties of a causal system other than conditional independence relationships. In this talk I will talk about three types of "independence" in the functional causal models that help tell cause from effect. They are 1) statistical independence between the cause and noise, 2) independence between the distribution of the cause and the transformation from the cause to the effect, and 3) independence between the parameters in the generating process of the cause and those that generate the effect from the cause. I will illustrate their differences, and compare functional causal model based causal discovery approaches again constraint-based ones.