Conference: Workshop on language technology for cultural heritage, social sciences, and humanities

Using Comparable Collections of Historical Texts for Building a Diachronic Dictionary for Spelling Normalization



Abstract

In this paper, we argue that comparable collections of historical written resources can help overcome typical challenges posed by heritage texts, enhancing spelling normalization, POS-tagging and subsequent diachronic linguistic analyses. Thus, we present a comparable corpus of historical German recipes and show how such a comparable text collection, together with the application of innovative MT-inspired strategies, allows us (i) to address the word form normalization problem and (ii) to automatically generate a diachronic dictionary of spelling variants. Such a diachronic dictionary can be used both for spelling normalization and for extracting new "translation" (word formation/change) rules for diachronic spelling variants. Moreover, our approach can be applied to virtually any diachronic collection of texts, regardless of the time span they represent. A first evaluation shows that our approach compares well with state-of-the-art approaches.
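As a rough illustration of how a diachronic dictionary of spelling variants could be used for normalization, here is a minimal Python sketch. It is not the authors' implementation: the dictionary entries, `VARIANT_DICT`, `normalize_token` and `normalize_text` are hypothetical names and examples chosen for illustration, and the variant pairs are not taken from the paper's recipe corpus.

```python
# Hypothetical diachronic dictionary: historical spelling -> modern spelling.
# Illustrative entries only; NOT taken from the paper's corpus.
VARIANT_DICT = {
    "vnd": "und",
    "saltz": "salz",
    "thun": "tun",
}


def normalize_token(token, variants=VARIANT_DICT):
    """Return the modern form of a known historical variant, else the token unchanged."""
    modern = variants.get(token.lower())
    if modern is None:
        return token
    # Keep an initial capital if the historical token had one.
    return modern.capitalize() if token[0].isupper() else modern


def normalize_text(text, variants=VARIANT_DICT):
    """Normalize a whitespace-tokenized text token by token."""
    return " ".join(normalize_token(tok, variants) for tok in text.split())


print(normalize_text("Saltz vnd pfeffer"))
```

Tokens missing from the dictionary pass through unchanged here. In a fuller pipeline, such tokens are presumably where rewrite rules derived from the dictionary would apply, but that step is left out of this sketch.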
