Crowdsourcing Swiss Dialect Transcriptions for Assessing Factors in Writing Variations

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

  • Simon Clematide
  • Karina Frick
  • Noëmi Aeppli
  • Jean-Philippe Goldmann
In this paper, we systematically analyze writing variations of Swiss German in two existing corpora with standard German glosses, a corpus of 10,000 short text messages and a corpus of transcribed oral history recordings (90,000 tokens). We show that neither resource is sufficient for assessing factors in writing variations of users and describe a data collection project involving a citizen science community for solving this problem. Laymen will independently and redundantly transcribe 1,200 short samples (15-20 seconds) of audio material in Swiss German according to their own best practice.
Original language: English
Title of host publication: Proceedings of the 13th Conference on Natural Language Processing (KONVENS): Bochum, Germany, September 19–21, 2016
Editors: Stefanie Dipper, Friedrich Neubarth, Heike Zinsmeister
Number of pages: 6
Place of Publication: Bochum
Publisher: Ruhr-Universität Bochum
Publication date: 01.09.2016
Pages: 62-67
Publication status: Published - 01.09.2016
Externally published: Yes
Event: 13th Conference on Natural Language Processing (KONVENS) - Linguistics Department / Ruhr-Universität Bochum, Bochum, Germany
Duration: 19.09.2016 - 21.09.2016
https://www.linguistics.rub.de/konvens16/
