Conference: Workshop on the use of computational methods in the study of endangered languages

Seeing more than whitespace — Tokenisation and disambiguation in a North Sami grammar checker

Abstract

Communities of lesser-resourced languages such as North Sami benefit from language tools such as spell checkers and grammar checkers, which help improve literacy. Accurate error feedback depends on well-tokenised input, but traditional tokenisation as shallow preprocessing is inadequate for the challenges of real-world language usage. We present an alternative in which tokenisation remains ambiguous until linguistic context information is available. This lets us accurately detect sentence boundaries, multiwords and compound errors. We describe a North Sami grammar checker with such a tokenisation system and show the results of its evaluation.
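
The idea of keeping tokenisation ambiguous until context is available can be illustrated with a toy sketch. This is not the authors' system: it uses English rather than North Sami, and the abbreviation list, the function names (`readings`, `choose`, `tokenise`) and the single context rule are all hypothetical, chosen only to show the shape of the approach. Each whitespace-delimited chunk ending in a period keeps two readings (period attached, period split off), and a reading is chosen only once the following chunk is known.

```python
from typing import List, Optional

# Hypothetical abbreviation list, for illustration only.
ABBREVIATIONS = {"dr.", "nr.", "e.g."}


def readings(chunk: str) -> List[List[str]]:
    """Return every candidate tokenisation of one whitespace-delimited chunk."""
    candidates = [[chunk]]  # reading 1: the chunk is a single token
    if len(chunk) > 1 and chunk.endswith("."):
        # reading 2: the final period is a separate sentence-boundary token
        candidates.append([chunk[:-1], "."])
    return candidates


def choose(candidates: List[List[str]], next_chunk: Optional[str]) -> List[str]:
    """Pick one reading, using the following chunk as context."""
    if len(candidates) == 1:
        return candidates[0]
    whole, split = candidates
    is_abbreviation = whole[0].lower() in ABBREVIATIONS
    # Assumed rule: keep the period attached only for a known abbreviation
    # that is followed by more text; otherwise treat it as a boundary.
    if is_abbreviation and next_chunk is not None:
        return whole
    return split


def tokenise(text: str) -> List[str]:
    """Tokenise by generating ambiguous readings, then resolving them in context."""
    chunks = text.split()
    tokens: List[str] = []
    for i, chunk in enumerate(chunks):
        next_chunk = chunks[i + 1] if i + 1 < len(chunks) else None
        tokens.extend(choose(readings(chunk), next_chunk))
    return tokens


print(tokenise("Dr. Smith came."))
```

A real grammar checker would draw its readings from a morphological analyser and resolve them with linguistic rules rather than a fixed word list; the sketch only shows why deferring the decision helps, since the same chunk "nr." is split or kept whole depending on what follows it.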
