A list of publicly available open datasets

Photo by Artem Bali from Pexels

It is becoming increasingly clear that we as researchers need to analyze large-scale, publicly available open datasets. The replication crisis in psychology vividly illustrated that most studies are severely underpowered and that many results do not replicate (e.g., Munafò et al., 2017). In addition, and perhaps less well known, small samples also increase the probability of finding extreme results, which leads to a literature full of artificially inflated effect sizes (e.g., Button et al., 2013).
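To make the inflation argument concrete, here is a minimal simulation sketch in Python. It assumes a true standardized effect of d = 0.20, studies with 20 participants per group, and a literature in which only significant results in the expected direction get published. All of these numbers, and the function name `simulate_published_effects`, are illustrative assumptions, not values taken from Button et al. (2013). Because a study with n = 20 per group only reaches significance when its observed d lies well above the true value, the average "published" effect ends up several times larger than 0.20.

```python
import random
import statistics

N_STUDIES = 2000  # number of simulated studies (illustrative assumption)


def simulate_published_effects(true_d=0.2, n_per_group=20, n_studies=N_STUDIES, seed=1):
    """Simulate many small two-group studies and return the observed effect
    sizes (Cohen's d) of those that reach p < .05 in the expected direction."""
    rng = random.Random(seed)
    z_crit = 1.96  # two-sided alpha = .05, normal approximation to the t-test
    significant = []
    for _ in range(n_studies):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treatment = [rng.gauss(true_d, 1.0) for _ in range(n_per_group)]
        # Pooled SD for two groups of equal size
        pooled_sd = ((statistics.variance(control) + statistics.variance(treatment)) / 2) ** 0.5
        d = (statistics.mean(treatment) - statistics.mean(control)) / pooled_sd
        # Test statistic for equal group sizes: d * sqrt(n / 2)
        z = d * (n_per_group / 2) ** 0.5
        if z > z_crit:  # "publish" only significant results in the expected direction
            significant.append(d)
    return significant


published = simulate_published_effects()
print(f"true d = 0.20; significant: {len(published)} of {N_STUDIES} studies; "
      f"mean published d = {statistics.mean(published):.2f}")
```

Raising `n_per_group` to a few hundred shrinks the gap between the true and the average published effect, which is exactly why small samples are the problem here.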

Specifically, in my own field of media psychology, the median sample size is N = 327 for survey-based studies and N = 107 for experiments, which results in low statistical power to detect small or even medium-sized effects (Elson & Przybylski, 2017). However, most studies presumably address research questions involving small to medium-sized effects (e.g., Richard et al., 2003). In other words, the majority of studies in media psychology are, unfortunately, not informative.
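As a rough illustration of what these median sample sizes imply, the following Python sketch approximates statistical power with the normal distribution from the standard library. It assumes Cohen's conventional benchmarks (r = .10 as a small correlation; d = 0.20 and d = 0.50 as small and medium mean differences), a two-sided α = .05, and experiments split into two equal groups. The helper names are hypothetical, and these are not the calculations reported by Elson and Przybylski (2017).

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power_correlation(r, n, alpha=0.05):
    """Approximate power of a two-sided test of H0: rho = 0 via Fisher's z."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = math.atanh(r) * math.sqrt(n - 3)
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)


def power_two_groups(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample comparison (normal approximation)."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = d * math.sqrt(n_per_group / 2)
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)


# Assumption: the N = 107 experiments are split into two groups of 53
print(f"survey, N = 327, r = .10: power = {power_correlation(0.10, 327):.2f}")
print(f"experiment, N = 107, d = 0.20: power = {power_two_groups(0.20, 107 // 2):.2f}")
print(f"experiment, N = 107, d = 0.50: power = {power_two_groups(0.50, 107 // 2):.2f}")
```

The normal approximation slightly overstates power compared with an exact t-test, so the values for the experiments are, if anything, a little lower. Against the common target of 80% power, all three scenarios fall short.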

The situation is even more precarious for students who have to write their own empirical theses: It’s almost impossible to collect a sufficiently powered sample all by yourself. As a consequence, the results of most theses are anything but reliable. (Which is kind of ironic, if we consider that a thesis should document, above all, that the author knows how to correctly go about doing research.)
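To put "sufficiently powered" into numbers, this Python sketch inverts standard normal approximations (Fisher's z for a correlation, a z-test for two group means) to estimate the sample size needed for 80% power at a two-sided α = .05. The effect-size benchmarks (r = .10; d = 0.20 and 0.50) and the function names are assumptions for illustration.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def n_for_correlation(r, power=0.80, alpha=0.05):
    """Approximate total N for a two-sided test of a correlation (Fisher's z)."""
    z_a = Z.inv_cdf(1 - alpha / 2)
    z_b = Z.inv_cdf(power)
    return math.ceil(((z_a + z_b) / math.atanh(r)) ** 2 + 3)


def n_per_group(d, power=0.80, alpha=0.05):
    """Approximate n per group for a two-sided two-sample comparison (normal approximation)."""
    z_a = Z.inv_cdf(1 - alpha / 2)
    z_b = Z.inv_cdf(power)
    return math.ceil(2 * ((z_a + z_b) / d) ** 2)


print(f"small correlation (r = .10): N = {n_for_correlation(0.10)}")
print(f"small difference (d = 0.20): n = {n_per_group(0.20)} per group")
print(f"medium difference (d = 0.50): n = {n_per_group(0.50)} per group")
```

For a small correlation this comes to roughly 780 participants, far more than a single student can usually recruit; exact t-based calculations give slightly larger numbers for the group comparisons.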

Fortunately, there is an increasing number of publicly available open datasets, which both researchers and students can (should?) use (more frequently!). Below you will find a list of general open data repositories and specific open datasets, mostly for the social sciences, that I have found so far. I will update the list continuously. If you know of other repositories or datasets, feel free to leave a comment and I will include them.

Also, please check out this overview of freely available datasets in psychology, curated by Cameron Brick.

Repositories of open datasets

Individual large-scale open datasets

Search engines

Further readings

  • Klein, O., Hardwicke, T. E., Aust, F., Breuer, J., Danielsson, H., Hofelich Mohr, A., . . . Frank, M. C. (2018). A practical guide for transparency in psychological science. Collabra: Psychology, 4(1), 20. https://doi.org/10.1525/collabra.158

Literature

  • Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., & Munafò, M. R. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5), 365–376. https://doi.org/10.1038/nrn3475
  • Elson, M., & Przybylski, A. K. (2017). The science of technology and human behavior: Standards, old and new. Journal of Media Psychology, 29(1), 1–7. https://doi.org/10.1027/1864-1105/a000212
  • Munafò, M. R., Nosek, B. A., Bishop, D. V. M., Button, K. S., Chambers, C. D., Percie du Sert, N., . . . Ioannidis, J. P. A. (2017). A manifesto for reproducible science. Nature Human Behaviour, 1(1), Article 0021. https://doi.org/10.1038/s41562-016-0021
  • Richard, F. D., Bond, C. F., & Stokes-Zoota, J. J. (2003). One hundred years of social psychology quantitatively described. Review of General Psychology, 7(4), 331–363. https://doi.org/10.1037/1089-2680.7.4.331

2 Comments

  1. Great blog post 🙂
    Just a few additions that you might find interesting:
    1. We provide an overview of repositories that hold psych data in our (open access) paper on Transparency in Psychological Science published in Collabra: https://www.collabra.org/articles/10.1525/collabra.158/
    2. Technically, GESIS has two different repositories: Datorium (which you list here) and the DBK (see https://dbk.gesis.org/dbksearch/index.asp?db=e). However, AFAIK, you should be able to find one through the search function of the other.
    3. Many European countries have their own social science archives. Most of them are members of CESSDA which now also provides a combined data catalog: https://datacatalogue.cessda.eu/
    4. Many repositories (incl. the ones offered by GESIS, e.g.) are also indexed in Google Dataset Search: https://toolbox.google.com/datasetsearch

    1. Wow, that’s awesome, thanks very much Johannes!!! For example, I hadn’t yet heard of the Google Dataset Search … looks interesting! I’ll update the post asap.
