Galton had discovered that regression toward the mean was not the result of biological change, but rather was a simple consequence of the imperfect correlation between parents and offspring, and that lack of perfect correlation was a necessary requirement for Darwin, or else there would be no intergenerational variation and no natural selection.

The Seven Pillars of Statistical Wisdom by Stephen M. Stigler. A small book, but each page is packed with extremely relevant information. A brilliant read for anyone interested in machine learning.
The influence was not only in biology. The variance components idea became key to much quantitative and educational psychology. And Galton's idea of separating permanent and transient effects was at the heart of the model that economist Milton Friedman proposed in 1957 in his Theory of the Consumption Function, for which he won the 1976 Nobel Prize.
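The regression effect in the passages above can be reproduced with a small simulation. This is only an illustrative sketch, not anything taken from the book. It assumes each observation is the sum of a permanent component and an independent transient component, both normally distributed with equal spread. The function names, sample size, and the 10% cutoff are all hypothetical choices made for the example.

```python
import random
import statistics


def simulate(n=20000, sd_permanent=1.0, sd_transient=1.0, seed=0):
    """Draw two noisy measurements that share one permanent component.

    Assumption for illustration: observed = permanent + transient,
    with independent Gaussian components.
    """
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n):
        permanent = rng.gauss(0.0, sd_permanent)
        first.append(permanent + rng.gauss(0.0, sd_transient))
        second.append(permanent + rng.gauss(0.0, sd_transient))
    return first, second


def regression_slope(x, y):
    """Least-squares slope of y on x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


first, second = simulate()
slope = regression_slope(first, second)

# Select the cases that were most extreme on the first measurement
# and see where the same cases land on the second measurement.
cutoff = sorted(first)[int(0.9 * len(first))]
top_first = [a for a, b in zip(first, second) if a >= cutoff]
top_second = [b for a, b in zip(first, second) if a >= cutoff]
top_first_mean = statistics.fmean(top_first)
top_second_mean = statistics.fmean(top_second)

print(f"slope of second on first: {slope:.2f}")
print(f"top group, first measurement mean:  {top_first_mean:.2f}")
print(f"top group, second measurement mean: {top_second_mean:.2f}")
```

With equal permanent and transient variances the two measurements are imperfectly correlated, so the slope should come out well below 1 and the top group's second measurement should sit closer to the overall mean than its first. Nothing "biological" changes between the two draws; the pull toward the mean comes from the transient component alone.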
- Aggregation - It allows one to gain information by discarding information (mean).
- Information Measurement - Information on accuracy does not grow linearly with the amount of data; it is proportional to the square root of the number of observations.
- Likelihood - The use of probability to calibrate inference, be it a confidence interval or a Bayesian posterior probability (p-values, Bayesian inference, etc.).
- Intercomparison - Statistical comparisons do not need to be made with respect to external data but can often be made in terms interior to the data themselves (t-test, etc.).
- Regression - Regression introduced modern multivariate analysis and the tools needed for any theory of inference. Before this apparatus of conditional distributions was introduced, a truly general Bayes's theorem was not feasible. So this pillar is central to Bayesian as well as causal inference.
- Experimental Design - This involves great subtleties: the ability to structure models for the exploration of high dimensional data with the simultaneous consideration of multiple factors, and the creation through randomization of a basis for inference that relies only minimally upon modeling.
- Residual - The most common appearances in statistics are our model diagnostics (plotting residuals), but more important is the way we explore high-dimensional spaces by fitting and comparing nested models. Every test for significance of a regression coefficient is an example, as is every exploration of a time series.
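The square-root rule from the Information Measurement pillar can be checked numerically. The sketch below is a hypothetical illustration, not from the book: it assumes independent standard normal observations and estimates the spread of the sample mean for two sample sizes. The function name, trial count, and seeds are arbitrary choices.

```python
import random
import statistics


def sd_of_sample_mean(n, trials=2000, seed=0):
    """Estimate the standard deviation of the mean of n observations.

    Assumption for illustration: observations are independent draws
    from a standard normal distribution.
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(trials)
    ]
    return statistics.stdev(means)


sd_25 = sd_of_sample_mean(25, seed=1)
sd_100 = sd_of_sample_mean(100, seed=2)
ratio = sd_25 / sd_100

print(f"sd of mean, n=25:  {sd_25:.3f}")
print(f"sd of mean, n=100: {sd_100:.3f}")
print(f"ratio: {ratio:.2f}")
```

Quadrupling the number of observations should roughly halve the spread of the mean rather than cut it to a quarter, which is the sense in which accuracy grows with the square root of the sample size.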