# Survey and Statistics

Before doing any statistics, it’s essential to know what type of question you have, on a continuum from descriptive to “mechanistic”-explanatory:

- Leek, J.T. and Peng, R.D. (2015) What is the question? *Science* 347: 1314-1315.

It is equally important to realize that decisions on data collection (sample, variables and operationalizations) and modeling matter *much* more for the results than *p*-levels. Moreover, “a *p* value of 0.05 does not mean that there is a 95% chance that a given hypothesis is correct. Instead, it signifies that if the null hypothesis is true, and all other assumptions made are valid, there is a 5% chance of obtaining a result at least as extreme as the one observed. And a *p* value cannot indicate the importance of a finding” (Monya Baker, *Nature* 2016, p.151). See:

- Leek, J.T. and Peng, R.D. (2015) *P* values are just the tip of the iceberg. *Nature* 520: 612.
- Brown, A.W., Kaiser, K.A. and Allison, D.B. (2018) Issues with data and analysis: Errors, underlying themes, and potential solutions. *Proceedings of the National Academy of Sciences* 115: 2563-2570.
- Cumming, G. (2008) Replication and p intervals. *Perspectives on Psychological Science* 3: 286-300.
- Endogeneity [a lucid clip on YouTube].
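
The reading of a *p* value quoted above can be checked with a small simulation. The sketch below is only an illustration under stated assumptions, not a method taken from any of the references: it uses Python's standard library, a two-sided z-test with a known standard deviation, and hypothetical function names (`two_sided_p_value`, `share_significant`). It repeatedly draws samples from a population in which the null hypothesis is true and counts how often *p* falls below 0.05.

```python
import math
import random


def two_sided_p_value(sample, null_mean=0.0, sd=1.0):
    """Two-sided z-test p value for the mean, assuming the standard deviation is known."""
    n = len(sample)
    z = (sum(sample) / n - null_mean) / (sd / math.sqrt(n))
    # P(|Z| >= |z|) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2))


def share_significant(n_studies=20000, n=30, alpha=0.05, seed=1):
    """Simulate studies in which the null hypothesis is true and return the share with p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Every simulated study samples from a population whose true mean is 0 (the null).
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if two_sided_p_value(sample) < alpha:
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    print(f"share of p < 0.05 under a true null: {share_significant():.3f}")
```

The share should come out close to 0.05. That is all the threshold guarantees: how often a true null yields a result at least this extreme. It says nothing about the probability that a hypothesis is correct, nor about the size or importance of an effect.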

Statistics at a very elementary level:

- Allison, Paul D. (1999) *Multiple Regression*. Thousand Oaks: Sage.
- Wonnacott, R.H. and Wonnacott, R.J. (1990) *Introductory Statistics*. New York: Wiley.

It is much better to first learn a little bit of math, after which everything else becomes far easier to understand:

- Fox, John (2009) *A Mathematics Primer for Social Statistics*. Thousand Oaks: Sage.

Quick introductions to, and overviews of, many statistical topics can be found in Cosma Shalizi’s notebooks. To see the forest for the trees:

- Kass, Robert E. (2011) Statistical Inference: The Big Picture. *Statistical Science* 26: 1-9.

A non-technical treatise on the most important statistical insights and techniques:

- Stigler, S.M. (2016) *The Seven Pillars of Statistical Wisdom*. Harvard U.P.

If you use R, which you should anyway (see motivation under "general"):

- Snijders, Tom A.B. and Bosker, Roel J. (2012, 2nd ed.) *Multilevel Analysis*. Los Angeles: Sage.
- Wooldridge, Jeffrey M. (2012, 5th ed.) *Introductory Econometrics*. Mason: South-Western.
- Angrist, Joshua D. and Pischke, Jörn-Steffen (2009) *Mostly Harmless Econometrics*. Princeton: Princeton U.P.
- Gelman, Andrew and Hill, Jennifer (2007) *Data Analysis Using Regression and Multilevel/Hierarchical Models*. Cambridge U.P. Features computer code in R.
- Stock, James H. and Watson, Mark W. (2017) Twenty years of time series econometrics in ten pictures. *Journal of Economic Perspectives* 31: 59-86. Has references to important studies and textbooks, with an emphasis on macro phenomena.
- Cameron, A. Colin and Trivedi, Pravin K. (2005) *Microeconometrics*. Cambridge U.P.

- Shalizi, C. (2010) The bootstrap. *American Scientist* 98: 186-190. Also contains an excellent one-page overview of what statistics is about.
- Fox, John and Weisberg, Sanford (2018, 3rd ed.) *An R Companion to Applied Regression*. Thousand Oaks: Sage.

You then need to know how to import data into R. John Fox has course material on *structural equation models* on his website, and he wrote the sem package for them in R. There is also the lavaan package in R. Arguably the best introductory textbook on *structural equation modeling* is Bill Shipley (2000) *Cause and Correlation in Biology*. Cambridge U.P.

- Efron, B. (2013) Bayes' Theorem in the 21st Century. *Science* 340: 1177-1178.
- Gelman and Hill (see above) provide a one-chapter introduction with R code.
- History of the conflict between *frequentists* and *Bayesians*: Mathias W. Madsen (2015) *The Kid, the Clerk, and the Gambler*. University of Amsterdam, PhD.
- Textbook for R users: Kruschke, J.K. (2014) *Doing Bayesian Data Analysis*. New York: Elsevier.

Introductory (and not so introductory) talks on Youtube

- An elementary online statistics textbook
- Choosing the correct statistical test
- Handbook of statistical methods
- Statistical terms
- Factor analysis [but see a comment on its abuse, below]
- Survey-related methods (University of Manchester)
- An econometric site for statistical matters
- For R users: econometric and panel methods
- Advanced data analysis from an elementary point of view. Cosma Shalizi's book, in which he also points out many common mistakes.

- Czaja, R. F. and Blair, J. E. (2005) *Designing surveys. A guide to decisions and procedures*. Thousand Oaks: Pine Forge.
- De Leeuw, E. D., Hox, J. J. and Dillman, D. A. (Eds.) (2008) *International Handbook of Survey Methodology*. New York: Lawrence Erlbaum Associates.
- Fowler, F. J. (1995) *Improving Survey Questions. Design and Evaluation*. Thousand Oaks: SAGE.
- Fowler, F. J. (2009) *Survey Research Methods*. Thousand Oaks: SAGE.
- Groves, R. M. et al. (2009) *Survey Methodology*. Hoboken: Wiley.
- Krosnick, J. A. and Fabrigar, L. R. (2013) *The handbook of questionnaire design*. New York: Oxford University Press.
- Schaeffer, N. C. and Presser, S. (2003) The Science of Asking Questions. *Annual Review of Sociology* 29: 65-88.
- Sudman, S., Bradburn, N. M. and Schwarz, N. (1996) *Thinking about answers. The application of cognitive processes to survey methodology*. San Francisco: Jossey-Bass.
- Tourangeau, R., Rips, L. J. and Rasinski, K. A. (2000) *The psychology of survey response*. Cambridge: Cambridge University Press.
- Kahneman, D. et al. (2004) A survey method for characterizing daily life experience: The day reconstruction method. *Science* 306: 1776-1780.
- Krueger, Alan B. and Stone, Arthur A. (2014) Progress in measuring subjective well-being. *Science* 346: 42-43.
- Zwane, A.P. et al. (2011) Being surveyed can change later behavior and related parameter estimates. *PNAS* 108: 1821-1826.
- Rogers, T., ten Brinke, L. and Carney, D.R. (2016) Unacquainted callers can predict which citizens will vote over and above citizens’ stated self-predictions. *PNAS* 113: 6449-6453.

- Spiegelhalter, D., Pearson, M. and Short, I. (2011) Visualizing uncertainty about the future. *Science* 333: 1393-1400.
- Use bar charts instead of pie charts: W.S. Cleveland and R. McGill (1984) Graphical perception. *J. Am. Stat. Assoc.* 79: 531-554.

- Jessica Gurevitch et al. (2018) Meta-analysis and the science of research synthesis. *Nature* 555: 175-182.
- Jop de Vrieze (2018) The metawars: meta-analyses were supposed to end scientific debates. Often, they only cause more controversy. *Science* 361: 1185-1188.

The debate on causality has been ongoing for about 2,500 years. The references below cover only a very small portion of the relevant literature, yet they touch upon some of the most salient issues that social scientists have to deal with. See also experimental research.

- Hubert M. Blalock (1961) *Causal inferences in nonexperimental research*. Chapel Hill: Univ. North Car. Press. Arguably the most classical text for modern survey researchers.
- Andrew Gelman (2011) Causality and statistical learning. *American Journal of Sociology* 117: 955-966. A review of three books by one of the top statisticians around (see his own book above).
- Mott Greene (2001) A tool, not a tyrant. *Nature* 410: 875. On mechanisms.
- Kenneth A. Bollen and Mark D. Noble (2011) Structural equation models and the quantification of behavior. *Proceedings of the National Academy of Sciences* 108: 15639-15646. A brief introduction to structural equation models by their father, an approach that is also used by Judea Pearl:
- Judea Pearl (2010) The foundations of causal inference. *Sociological Methodology* 40: 75-149.
- Paul W. Holland (1986) Statistics and causal inference. *J. Am. Stat. Association* 81: 945-960.
- D.R. Cox and Nanny Wermuth (2001) Some statistical aspects of causality. *European Sociological Review* 17: 65-74.
- Donald B. Rubin (1974) Estimating causal effects of treatments in randomized and nonrandomized studies. *Journal of Educational Psychology* 66: 688-701.
- Susan Athey and Guido Imbens (2016) Recursive partitioning for heterogeneous causal effects. *PNAS* 113: 7353-7360.
- Susan Athey and Guido Imbens (2017) The state of applied econometrics: causality and policy evaluation. *Journal of Economic Perspectives* 31: 3-32.

- Gigerenzer, G. (2004) Mindless statistics. *The Journal of Socio-Economics* 33: 587-606.
- Simonsohn, U., Nelson, L.D. and Simmons, J.P. (2014) *P*-curve: a key to the file-drawer. *Journal of Experimental Psychology* 143: 534-547.
- Watts, D.J. (2014) Common sense and sociological explanation. *American Journal of Sociology* 120: 313-351.
- Loken, E. and Gelman, A. (2017) Measurement error and the replication crisis: The assumption that measurement error always reduces effect sizes is false. *Science* 355: 584-585.
- Young, C. (2009) Model uncertainty in sociological research. *American Sociological Review* 74: 380-397.
- Freese, J. (2014) Defending the decimals: Why foolishly false precision might strengthen social science. *Sociological Science* 1: 532-541.
- Nuzzo, R. (2014) Statistical errors. *Nature* 506: 150-152.
- Goodman, S.N. (2016) Aligning statistical and scientific reasoning. *Science* 352: 1180.
- Ellen Hamaker and Oisin Ryan (2019) A squared standard error is not a measure of individual differences. *PNAS* 116: 6544-6545.
- Cosma Shalizi on abuse of factor analysis
- See also our special page on scientific misconduct.

Experts at the AISSR on *questionnaires* and *modeling* include Gijs Schumacher, Matthijs Kamijn, Brian Burgoon, Wouter van der Brug, Joost Berkhout, Armen Hakhverdian, Tom van der Meer. *Time series*: Brian Burgoon, Gijs Schumacher, Wouter van der Brug, Theresa Kuhn, Armen Hakhverdian, Ursula Daxecker, Julia Bader, Lee Seymour, Imke Harbers. *Event/hazard models*: Brian Burgoon, Ursula Daxecker. *Survey experiments*: Brian Burgoon, Tom van der Meer, Armen Hakhverdian. *Spatial econometrics*: Imke Harbers, Ursula Daxecker, Brian Burgoon.