Reddit Corpus Keyword Search (ReCKS)

A tool for collecting and analysing language data from the social media platform Reddit

Authors

  • Jenia Yudytska Universität Hamburg
  • Jannis Androutsopoulos Universität Hamburg

DOI:

https://doi.org/10.21248/idsopen.16.2026.79

Keywords:

ReCKS, Reddit, user comments, RegEx, natively digital language corpora, microdiachronic analysis of digitally written language

Abstract

ReCKS (“Reddit Corpus Keyword Search”) is a web application for the linguistic study of Reddit comments. The current underlying dataset comes from the largest German-language subreddit, r/de, and comprises user comments from 2006 to 2023, totalling approximately 41 million tokens. ReCKS accepts both simple fixed-keyword searches and complex queries using regular expressions (RegEx). Results are presented as an exportable online table and a diagram that visualises the normalised frequency of the search term per year. This paper first explains the technical architecture of the application. It then briefly describes various usage scenarios and discusses in detail how the tool can be used for microdiachronic analyses. This is illustrated with an analysis of the use of Genderzeichen (‘gender signs’, i.e., spelling variants that index gender-inclusivity, such as Student:in or Student*in) by r/de users over the last 15 years.
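
The kind of query the abstract describes, a RegEx search whose hits are reported as a normalised frequency per year, can be illustrated with a minimal Python sketch. This is not the ReCKS implementation: the function name `normalised_frequency`, the whitespace tokenisation, the per-million normalisation base, and the example pattern for Genderzeichen are all assumptions made for illustration.

```python
import re
from collections import Counter

# Illustrative pattern (an assumption, not taken from ReCKS): a word followed
# by ":" or "*" and the suffix "in" or "innen", e.g. Student:in, Student*innen.
GENDER_SIGN = r"\b\w+[:*]in(?:nen)?\b"

def normalised_frequency(comments, pattern, per=1_000_000):
    """Return {year: hits per `per` tokens} for a regex over (year, text) pairs.

    Tokens are counted by whitespace splitting, which is a simplification;
    a real corpus tool may tokenise differently.
    """
    regex = re.compile(pattern)
    hits = Counter()
    tokens = Counter()
    for year, text in comments:
        tokens[year] += len(text.split())
        hits[year] += sum(1 for _ in regex.finditer(text))
    # Skip years without tokens to avoid division by zero.
    return {
        year: hits[year] * per / tokens[year]
        for year in sorted(tokens)
        if tokens[year] > 0
    }

# Tiny made-up sample standing in for corpus comments.
sample = [
    (2015, "Die Studenten sind da"),
    (2015, "Hallo Welt"),
    (2022, "Jede Student:in und jede Lehrer*in kommt"),
    (2022, "Die Student:innen lernen viel"),
]

if __name__ == "__main__":
    for year, freq in normalised_frequency(sample, GENDER_SIGN).items():
        print(year, freq)
```

Normalising by the number of tokens per year makes years comparable even when the volume of comments differs strongly between them, which is what a microdiachronic comparison needs.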

Published

2026-03-09

How to Cite

Yudytska, J., & Androutsopoulos, J. (2026). Reddit Corpus Keyword Search (ReCKS): Ein Tool für die Gewinnung und Auswertung von Sprachdaten aus der Social Media-Plattform Reddit. Online-Only Publications of the Leibniz Institute for the German Language, 16. https://doi.org/10.21248/idsopen.16.2026.79