It is becoming increasingly clear that we as researchers need to analyze large-scale, publicly available open datasets. The replication crisis in psychology vividly illustrated that most studies are strongly underpowered and that results are not replicable (e.g., Munafo et al., 2017). Besides, and maybe less well-known, small samples also increase the probability of finding extreme results, which leads to a literature full of effect sizes that are artificially inflated (e.g., Button et al., 2013).

Specifically, in my own field of media psychology, the median sample size is 327 for survey-based studies and 107 for experiments, which results in low statistical power to detect small or even medium-sized effects (Elson & Przybylski, 2017). However, most studies presumably address research questions with small to medium-sized effects (e.g., Richard et al., 2003). In other words, the majority of studies in media psychology are, unfortunately, not informative.

The situation is even more precarious for students who have to write their own empirical theses: It’s almost impossible to collect a sufficiently powered sample all by yourself. As a consequence, the results of most theses are anything but reliable. (Which is kind of ironic, if we consider that a thesis should document, above all, that the author knows how to correctly go about doing research.)

Fortunately, there is an increasing number of publicly available open datasets, which both researchers and students can (should?) use (more frequently!). In what follows, you can find a list of general open data repositories and specific open datasets, mostly for the social sciences, that I have been able to find so far. I will update the list continuously. If you know of other repositories or datasets, feel free to leave a comment and I will include them.

Repositories of open datasets

Individual large-scale open datasets

Search engines

  1. Great blog post 🙂
    Just a few additions that you might find interesting:
    1. We provide an overview of repositories that hold psych data in our (open access) paper on Transparency in Psychological Science published in Collabra: https://www.collabra.org/articles/10.1525/collabra.158/
    2. Technically, GESIS has two different repositories: Datorium (which you list here) and the DBK (see https://dbk.gesis.org/dbksearch/index.asp?db=e). However, AFAIK, you should be able to find one through the search function of the other.
    3. Many European countries have their own social science archives. Most of them are members of CESSDA which now also provides a combined data catalog: https://datacatalogue.cessda.eu/
    4. Many repositories (incl. the ones offered by GESIS, e.g.) are also indexed in Google Dataset Search: https://toolbox.google.com/datasetsearch

    1. Wow, that’s awesome, thanks very much Johannes!!! For example, hadn’t yet heard of the Google Dataset Search … looks interesting! I’ll update the post asap.

