Before doing any statistics, it’s essential to know what type of question you have, on a continuum from descriptive to “mechanistic”-explanatory:
- Leek, J.T. and Peng, R.D. (2015) What is the question? Science 347: 1314-1315.
Equally important is to realize that decisions on data collection (sample, variables and operationalizations) and modeling matter much more for the results than p values. Moreover, “a p value of 0.05 does not mean that there is a 95% chance that a given hypothesis is correct. Instead, it signifies that if the null hypothesis is true, and all other assumptions made are valid, there is a 5% chance of obtaining a result at least as extreme as the one observed. And a p value cannot indicate the importance of a finding” (Monya Baker, Nature 2016, p.151). See:
- Leek, J.T. and Peng, R.D. (2015) P values are just the tip of the iceberg. Nature 520: 612.
- Brown, A.W., Kaiser, K.A. and Allison, D.B. (2018) Issues with data and analysis: Errors, underlying themes, and potential solutions. Proceedings of the National Academy of Sciences 115: 2563-2570.
- Cumming, G. (2008) Replication and p intervals. Perspectives on Psychological Science 3: 286-300.
- Endogeneity [a lucid clip on Youtube].
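The reading of a p value quoted above can be checked with a small simulation. The sketch below is an illustration only, not taken from any of the cited papers. It assumes normally distributed data with a known standard deviation and uses a simple two-sided z-test; the function names, sample size and number of simulated studies are all hypothetical choices. It is written in Python with only the standard library, and the same few lines translate directly to R. Every simulated "study" is drawn from a world in which the null hypothesis is true, so the share of studies with p < 0.05 should come out close to 5%.

```python
import math
import random


def two_sided_p_value(sample, null_mean=0.0, sigma=1.0):
    """Two-sided z-test p value for the sample mean, assuming sigma is known."""
    n = len(sample)
    z = (sum(sample) / n - null_mean) / (sigma / math.sqrt(n))
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(n_studies=20000, n_obs=30, alpha=0.05, seed=1):
    """Share of simulated studies with p < alpha when the null is true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Data generated under the null: true mean 0, standard deviation 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        if two_sided_p_value(sample) < alpha:
            hits += 1
    return hits / n_studies


rate = false_positive_rate()
print(f"Share of null studies with p < 0.05: {rate:.3f}")
```

The simulation says nothing about the probability that a hypothesis is correct: the null is true in every run by construction, and "significant" results still turn up at roughly the rate alpha.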
Statistics at a very elementary level:
- Paul D. Allison. (1999) Multiple Regression. Thousand Oaks: Sage.
- Wonnacott, T.H. and Wonnacott, R.J. (1990) Introductory Statistics. New York: Wiley.
It is much better to first learn a little math; everything else then becomes far easier to understand:
- Fox, John (2009), A Mathematics Primer for Social Statistics. Thousand Oaks: Sage.
Quick introductions to, and overviews of, many statistical topics can be found in Cosma Shalizi’s notebooks. To see the forest for the trees:
- Kass, Robert E. (2011) Statistical Inference: The Big Picture. Statistical Science 26:1-9.
A non-technical treatment of the most important statistical insights and techniques:
- Stigler, S.M. (2016) The Seven Pillars of Statistical Wisdom. Harvard U.P.
If you use R, which you should anyway (see motivation under "general"):
- Snijders, Tom A.B. and Bosker, Roel J. (2012, 2nd ed) Multilevel Analysis. Los Angeles: Sage.
- Wooldridge, Jeffrey M. (2012, 5th ed) Introductory Econometrics. Mason: South-Western.
- Angrist, Joshua D. and Pischke, Jörn-Steffen (2009). Mostly Harmless Econometrics. Princeton: Princeton U.P.
- Gelman, Andrew and Hill, Jennifer (2007) Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge U.P. Features computer code in R.
- Stock, James H. and Watson, Mark W. (2017) Twenty years of time series econometrics in ten pictures. Journal of Economic Perspectives 31: 59-86. Has references to important studies and textbooks, with an emphasis on macro phenomena.
- Cameron, A. Colin and Pravin K. Trivedi (2005) Microeconometrics. Cambridge U.P.
- Shalizi, C. (2010) The bootstrap. American Scientist 98: 186-190. Also contains an excellent one-page overview of what statistics is about.
- John Fox and Sanford Weisberg (2018, 3rd ed.) An R Companion to Applied Regression. Thousand Oaks: Sage.
You then need to know how to import data into R. John Fox has course material on structural equation models on his website, and he wrote the sem package for them in R. There is also the lavaan package in R. Arguably the best introductory textbook on structural equation modeling: Bill Shipley (2000) Cause and Correlation in Biology. Cambridge U.P.
- Efron, B. (2013) Bayes Theorem in the 21st Century. Science 340: 1177-1178.
- Gelman and Hill (see above) provide a one chapter introduction with R code.
- History of the conflict between frequentists and Bayesians: Mathias W. Madsen (2015) The Kid, the Clerk, and the Gambler. PhD thesis, University of Amsterdam.
- Textbook for R users: Kruschke, J.K. (2014) Doing Bayesian Data Analysis. New York: Elsevier.
On the internet
Introductory (and not so introductory) talks on Youtube
The debate on causality has been going on for about 2500 years. The references below cover only a very small portion of the relevant literature, yet they touch upon some of the most salient issues that social scientists have to deal with. See also experimental research.
- Hubert M. Blalock (1961) Causal inferences in nonexperimental research. Chapel Hill: University of North Carolina Press. Arguably the classic text for modern survey researchers.
- Andrew Gelman (2011) Causality and statistical learning. American Journal of Sociology 117: 955-966. A review of three books by one of the top statisticians around (see his own book above).
- Mott Greene (2001) A tool, not a tyrant. Nature 410: 875. On mechanisms.
- Kenneth A. Bollen and Mark D. Noble (2011) Structural equation models and the quantification of behavior. Proceedings of the National Academy of Sciences 108: 15639-15646. A brief introduction to structural equation models by their father, an approach that is also used by Judea Pearl:
- Judea Pearl (2010) The foundations of causal inference. Sociological Methodology 40: 75-149.
- Paul W. Holland (1986) Statistics and causal inference. J. Am. Stat. Association 81: 945-960.
- D.R. Cox and Nanny Wermuth (2001) Some statistical aspects of causality. European Sociological Review 17: 65-74.
- Donald B. Rubin (1974) Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66: 688-701.
- Susan Athey and Guido Imbens (2016) Recursive partitioning for heterogeneous causal effects. PNAS 113: 7353-7360.
- Susan Athey and Guido Imbens (2017) The state of applied econometrics: causality and policy evaluation. Journal of Economic Perspectives 31: 3-32.
(Im)proper use of statistics
- Gigerenzer, G. (2004) Mindless statistics. The Journal of Socio-Economics 33: 587-606.
- Simonsohn, U., Nelson, L.D. and Simmons, J.P. (2014) P-curve: a key to the file-drawer. Journal of Experimental Psychology: General 143: 534-547.
- Watts, D.J. (2014) Common sense and sociological explanation. American Journal of Sociology 120: 313-351.
- Loken, E. and Gelman, A. (2017) Measurement error and the replication crisis: The assumption that measurement error always reduces effect sizes is false. Science 355: 584-585.
- Young, C. (2009) Model uncertainty in sociological research. American Sociological Review 74: 380-397.
- Freese, J. (2014) Defending the decimals: Why foolishly false precision might strengthen social science. Sociological Science 1: 532-541.
- Nuzzo, R. (2014). Statistical errors. Nature 506: 150-152.
- Goodman, S.N. (2016) Aligning statistical and scientific reasoning. Science 352: 1180.
- Ellen Hamaker and Oisin Ryan (2019) A squared standard error is not a measure of individual differences. PNAS 116: 6544-6545.
- Cosma Shalizi on abuse of factor analysis
- See also our special page on scientific misconduct.
Experts at the AISSR, by topic:
- Questionnaires and modeling: Gijs Schumacher, Matthijs Kamijn, Brian Burgoon, Wouter van der Brug, Joost Berkhout, Armen Hakhverdian, Tom van der Meer.
- Time series: Brian Burgoon, Gijs Schumacher, Wouter van der Brug, Theresa Kuhn, Armen Hakhverdian, Ursula Daxecker, Julia Bader, Lee Seymour, Imke Harbers.
- Event/hazard models: Brian Burgoon, Ursula Daxecker.
- Survey experiments: Brian Burgoon, Tom van der Meer, Armen Hakhverdian.
- Spatial econometrics: Imke Harbers, Ursula Daxecker, Brian Burgoon.