Standardising language data through the conversion pipeline TEIWorLD

Jennifer Ecker

doi:10.21248/idsopen.15.2026.54

Authors

Jennifer Ecker, IDS Mannheim

DOI:

https://doi.org/10.21248/idsopen.15.2026.54

Keywords:

conversion, formats, pipeline, use case, TEIWorLD

Abstract

Converting data into a standard format is a crucial step in many research workflows. Standardisation enables data exchange, reuse, and analysis, all of which are essential for advancing knowledge in various fields. In this publication, we describe the conversion pipeline TEIWorLD (TEI Workflow for Language Data), which transforms written and spoken language data into standardised formats: I5/TEI P5 XML for written data and ISO/TEI Transcriptions of Spoken Language for spoken data. The pipeline leverages existing tools to convert specific formats into these standards; written data passes through an additional transformation step into the archival I5 (short for IDS TEI P5) format used at the Leibniz Institute for the German Language (IDS). We also present two use cases that demonstrate how TEIWorLD can be applied in language data management to standardise a corpus consisting of more than one format, enabling researchers to analyse and share their data efficiently.

Published

2026-03-02

Suggested citation

Ecker, J. (2026). Standardising language data through the conversion pipeline TEIWorLD. Online-Only Publikationen Des Leibniz-Instituts für Deutsche Sprache, 15. https://doi.org/10.21248/idsopen.15.2026.54

Issue

Vol. 15 (2026): Standardising language data through the conversion pipeline TEIWorLD (Jennifer Ecker)

Section

Orange Literatur
