International Conference on Computational Linguistics
Detecting de minimis Code-Switching in Historical German Books

Abstract

Code-switching has long interested linguists, with computational work in particular focusing on speech and social media data (Sitaram et al., 2019). This paper contrasts these informal instances of code-switching with its appearance in more formal registers, by examining the mixture of languages in the Deutsches Textarchiv (DTA), a corpus of 1406 primarily German books from the 17th to 19th centuries. We automatically annotate and manually inspect spans of six embedded languages (Latin, French, English, Italian, Spanish, and Greek) in the corpus. We quantitatively analyze the differences between code-switching patterns in these books and those in more typically studied speech and social media corpora. Furthermore, we address the practical task of predicting code-switching from features of the matrix language alone in the DTA corpus. Such classifiers can help reduce errors when optical character recognition or speech transcription is applied to a large corpus with rare embedded languages.
