Vol. 7 (2024): Linguistisches Impact-Assessment: Maschinelle Prognose mit Realitätsabgleich im Projekt TextTransfer (Norman Fiedler, Christoph Köller, Jutta Bopp, Felix Schneider)
Empirical approaches are increasingly finding their way into the methodology of research in the humanities. Linguistics visibly relies on research data and language models to generate a digital image of natural languages. On this basis, semantic patterns in texts can be recognized automatically, along user-specific search queries, via distant reading. Since such models, for example in search engines, web-based translators or conversational tools, can reproduce linguistic information in meaningful contexts, the implications of so-called artificial intelligence have become a topic of discourse across society as a whole. Many linguists are therefore eager to open up their findings to new fields of application beyond their immediate disciplinary environment and to contribute to a well-founded debate. This ambition contrasts with the observation that research results from all disciplines are indeed archived but, for lack of targeted interpretability of large and complex data sets, are frequently not drawn into this broad discourse; a demonstrable impact remains missing. At this interface, the TextTransfer project, funded by the German Federal Ministry of Education and Research, is developing an approach that uses a language model to infer, via distant reading, the type and probability of a social, economic or political impact of text-bound research knowledge. To this end, TextTransfer is building a machine-learning procedure based on empirical experiential knowledge. An essential component of this experiential learning, however, is the verifiability of the learning results. This article presents a first approach within the project to training a language model in a supervised machine-learning procedure with robust training data in order to achieve the highest possible precision in impact assessment.
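To make the supervised setup concrete, the following is a minimal sketch of impact classification as supervised text classification. All texts, labels and model choices here are invented for illustration; the project itself trains a language model on its own curated training data, which this baseline does not reproduce.

```python
# Hypothetical sketch: classifying project texts by (invented) impact type
# with a simple supervised baseline. Not the TextTransfer project's actual model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data: report snippets with illustrative impact labels.
texts = [
    "The prototype was licensed to an industrial partner for production use.",
    "Findings informed a revision of national data-protection guidelines.",
    "A spin-off company commercializes the developed sensor platform.",
    "Results were cited in a parliamentary hearing on technology regulation.",
]
labels = ["economic", "political", "economic", "political"]

# TF-IDF features plus logistic regression stand in for the language model;
# the supervised principle (learn from labeled examples, then predict) is the same.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

prediction = model.predict(["The patented process was sold to a manufacturing company."])[0]
print(prediction)  # one of the trained impact labels
```

Precision of such a classifier, the quantity the article is concerned with, would then be measured against held-out, verified examples rather than the training texts themselves.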