Thursday, May 5, 2016

What I've Been Reading

Thousands of years after Aristotle’s seminal work on causality, hundreds of years after Hume gave us two definitions of it, and decades after automated inference became a possibility through powerful new computers, causality is still an unsolved problem. Humans are prone to seeing causality where it does not exist and our algorithms aren’t foolproof. Even worse, once we find a cause it’s still hard to use this information to prevent or produce an outcome because of limits on what information we can collect and how we can understand it. After looking at all the cases where methods haven’t worked and researchers and policy makers have gotten causality really wrong, you might wonder why you should bother.
[…]
Rather than giving up on causality, what we need to give up on is the idea of having a black box that takes some data straight from its source and emits a stream of causes with no need for interpretation or human intervention. Causal inference is necessary and possible, but it is not perfect and, most importantly, it requires domain knowledge.
Why: A Guide to Finding and Using Causes by Samantha Kleinberg. A beautiful book, full of insights not only for ML/AI aficionados but also for anyone who wants to improve their knowledge of the world around them.

The main thing to realize is that there is no single method for all causal inference problems. None of the existing approaches can find causes without error in every case (leaving plenty of opportunities for research). Some make more general claims than others, but those claims depend on assumptions that may not hold in reality. Instead of knowing one method and applying it diligently to every problem you have, you need a toolbox. Most methods can be adapted to fit most cases, but this will not be the easiest or most efficient approach.

Given that there is no one perfect method, possibly the most important thing is to understand the limits of each. For instance, if your inferences are based on bivariate Granger causality, understand that you are finding a sort of direct correlation, and consider the multivariate approach. Bayesian networks may be a good choice when the causal structure (the connections between variables) is already known and you want to find its parameters (probability distributions) from some data. However, if time is important for the problem, dynamic Bayesian networks or methods that find the timing of causal relationships from the data may be more appropriate. Whether your data are continuous or discrete will narrow down your options, as many methods handle one or the other (but not both). If the data include a large number of variables, or you do not need the full structure, methods for calculating causal strength are more efficient than those that infer models. However, when using these, consider whether you will need to model interactions between causes to enable prediction. Thus what causes are used for is as important as the available data in determining which methods to use. And finally, recognize that all the choices made in collecting and preparing data affect what inferences can be made.
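To make the Granger point above concrete, here is a minimal numpy sketch of the bivariate idea: x "Granger-causes" y if adding lagged values of x to an autoregression on y's own lags improves the fit. The `granger_rss` helper and the simulated series are my own illustration, not from the book; a real analysis would use a proper F-test (e.g. statsmodels' `grangercausalitytests`) and, as the book warns, a multivariate model to rule out common drivers.

```python
import numpy as np

def granger_rss(y, x, lags=2):
    """Residual sum of squares for y regressed on its own lags,
    without (restricted) and with (unrestricted) lags of x."""
    n = len(y)
    # restricted design: intercept plus lags of y only
    Xr = np.column_stack([np.ones(n - lags)] +
                         [y[lags - k:n - k] for k in range(1, lags + 1)])
    # unrestricted design: also include lags of x
    Xu = np.column_stack([Xr] +
                         [x[lags - k:n - k] for k in range(1, lags + 1)])
    yt = y[lags:]
    rss = lambda X: np.sum((yt - X @ np.linalg.lstsq(X, yt, rcond=None)[0]) ** 2)
    return rss(Xr), rss(Xu)

# toy data where y really does depend on the previous value of x
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()

rss_restricted, rss_full = granger_rss(y, x)
# lags of x should sharply reduce the residual error for this series
print(rss_full < rss_restricted)
```

Note what this does and does not show: a large drop in residual error is evidence of predictive (Granger) causality between these two series only; a third variable driving both x and y would produce the same drop, which is exactly why the multivariate extension matters.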

