Reddit Corpus Keyword Search (ReCKS)

A tool for collecting and analysing language data from the social media platform Reddit

Authors

  • Jenia Yudytska, Universität Hamburg
  • Jannis Androutsopoulos, Universität Hamburg

DOI:

https://doi.org/10.21248/idsopen.16.2026.79

Keywords:

ReCKS, Reddit, user comments, RegEx, natively digital language corpora, microdiachronic analysis of digitally written language

Abstract

ReCKS (“Reddit Corpus Keyword Search”) is a web application for the linguistic study of Reddit comments. The underlying dataset currently comes from the largest German-language subreddit, r/de, and comprises user comments from 2006 to 2023, totalling approximately 41 million tokens. ReCKS accepts both simple fixed keyword searches and complex queries written as regular expressions (RegEx). Results are returned as an exportable online table and a diagram that visualises the normalised frequency of the search term per year. This paper first explains the technical architecture of the application. It then briefly describes various usage scenarios and discusses in detail how the tool can be used for microdiachronic analyses. This is illustrated with an analysis of the use of Genderzeichen (‘gender signs’, i.e., spelling variants that index gender inclusivity, such as Student:in or Student*in) by r/de users over the last 15 years.

Published

2026-03-09

Suggested citation

Yudytska, J., & Androutsopoulos, J. (2026). Reddit Corpus Keyword Search (ReCKS): Ein Tool für die Gewinnung und Auswertung von Sprachdaten aus der Social Media-Plattform Reddit. Online-Only Publikationen Des Leibniz-Instituts für Deutsche Sprache, 16. https://doi.org/10.21248/idsopen.16.2026.79