Our Research

Big data and machine learning

The focus here is on data retrieval from large databases and the Web, as well as quantitative text analysis, not on doing Web surveys or using mobile phones or social badges; for those, see reality mining at MIT or Internet Methods at Manchester.

A brief introduction:

  • David Lazer et al. (2009) Computational social science. Science 323: 721-723.

A first place to look is the eHumanities group. The Vrije Universiteit also has a group specialized in Web research, with software tutorials online and a half-day course on their software. Big data can be collected and analyzed with a rapidly growing number of R packages, or with Python packages and scripts assembled by Javier Garcia-Bernardo (AISSR).
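As a rough illustration of what the simplest form of quantitative text analysis looks like in Python, the sketch below counts word frequencies across a handful of texts. It uses only the standard library; the sample sentences and the function names `tokenize` and `term_frequencies` are made up for this example and are not taken from any of the packages mentioned above.

```python
import re
from collections import Counter

# Hypothetical mini-corpus; in practice the texts would come from a database or the Web.
documents = [
    "Big data and machine learning are changing social science.",
    "Machine learning finds patterns in big data.",
]


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(texts):
    """Count how often each token occurs across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


frequencies = term_frequencies(documents)
# Show the three most frequent tokens with their counts.
print(frequencies.most_common(3))
```

Real projects would typically add stop-word removal, stemming and a much larger corpus, for which dedicated R or Python packages are the better choice.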

A recommended website: digital methods initiative. Another one, by the University of Manchester: e-science.

Overviews of big data and its pitfalls:

  • Scott A. Golder and Michael W. Macy (2014) Digital Footprints. Annual Review of Sociology 40: 6.1-6.24.
  • David Lazer et al. (2014) The parable of Google flu: traps in big data analysis. Science 343: 1203.
  • Zou, J. and Schiebinger, L. (2018) Design AI so that it’s fair. Nature 559: 324-326.
  • Bareinboim, E. and Pearl, J. (2016) Causal inference and the data-fusion problem. Proceedings of the National Academy of Sciences 113: 7345-7352.
  • Spiegelhalter, D.J. (2014) The future lies in uncertainty. Science 345: 264-265. 
  • The Journal of Economic Perspectives (Vol 28, 2014) had a special issue on big data.
  • On privacy, Science (30 Jan 2015) had a special issue.
  • Caliskan, A., Bryson, J., and Narayanan, A. (2017) Semantics derived automatically from language corpora contain human-like biases. Science 356: 183-186.
  • David Lazer and Jason Radford (2017) Data ex machina: Introduction to big data. Annual Review of Sociology 43: 19-39. 
  • Keith Hampton (2017) Studying the digital: directions and challenges for digital methods. Annual Review of Sociology 43: 167-188. 

A good textbook that is completely accessible on the Web:

Another good textbook of which at least the lecture sheets are on the Web:

Pattern finding in big data through machine learning:

  • Murphy, K. P. (2012) Machine Learning: A Probabilistic Perspective. MIT Press.
  • LeCun, Y., Bengio, Y. and Hinton, G. (2015) Deep learning. Nature 521: 436-444.
  • Zoubin Ghahramani (2015) Probabilistic machine learning and artificial intelligence. Nature 521: 452-459. This paper mentions a software tool that not only analyzes longitudinal data by comparing many different models, but also writes a paper about its findings.
  • Sendhil Mullainathan and Jann Spiess (2017) Machine learning: An applied econometric approach. Journal of Economic Perspectives 31: 87-106.

Along with many offline sources such as newspapers, journals and patents, ever more historical data are being digitized and put online, e.g. the republic of letters, digital scholarship; see also the journal digital humanities quarterly.

Interesting, and possibly relevant for your research: 

  • Kosinski, M., Stillwell, D. and Graepel, T. (2013). Private traits and attributes are predictable from digital records of human behavior. PNAS 110: 5802-5805.

Big data often lack (big) theory, which some people want to change:

  • Bentley, R.A., O’Brien, M.J. and Brock, W.A. (2014) Mapping collective behavior in the big-data era. Behavioral and Brain Sciences 37: 63-119.


Large computation jobs beyond the abilities of a PC can be done on the computers of the University of Amsterdam (SARA).

Experts on quantitative text analysis at the AISSR are Wouter van der Brug, Tom van der Meer, and Sarah de Lange; for big data, the expert is Javier Garcia-Bernardo. Since 2018 there has been a Computational Social Science group at UvA.