Crowdsourcing Swiss Dialect Transcriptions for Assessing Factors in Writing Variations
Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review
In this paper, we systematically analyze writing variations of Swiss German in two existing corpora with Standard German glosses: a corpus of 10,000 short text messages and a corpus of transcribed oral history recordings (90,000 tokens). We show that neither resource is sufficient for assessing factors in users' writing variations, and we describe a data collection project that involves a citizen science community to address this problem. Laypeople will independently and redundantly transcribe 1,200 short samples (15-20 seconds each) of Swiss German audio material according to their own best practice.
Original language | English |
---|---|
Title of host publication | Proceedings of the 13th Conference on Natural Language Processing (KONVENS) : Bochum, Germany, September 19–21, 2016 |
Editors | Stefanie Dipper, Friedrich Neubarth, Heike Zinsmeister |
Number of pages | 6 |
Place of Publication | Bochum |
Publisher | Ruhr-Universität Bochum |
Publication date | 01.09.2016 |
Pages | 62-67 |
Publication status | Published - 01.09.2016 |
Externally published | Yes |
Event | 13th Conference on Natural Language Processing (KONVENS) - Linguistics Department / Ruhr-Universität Bochum, Bochum, Germany. Duration: 19.09.2016 → 21.09.2016. https://www.linguistics.rub.de/konvens16/ |
- Informatics