Conference: Language and Technology Conference
Towards Better Text Processing Tools for the Ainu Language

Abstract

In this paper we present our research devoted to the development of Natural Language Processing technologies for the Ainu language, a critically endangered language isolate spoken by the Ainu people, the native inhabitants of northern parts of the Japanese archipelago. In particular, we focused on improving the existing tools for transcription normalization, word segmentation (tokenization) and part-of-speech tagging. In the experiments we applied two Ainu language dictionaries from different domains (literary and colloquial) and created a new data set by combining them. The experiments confirmed the positive effect of these modifications on the overall performance of the tools, especially with objective samples unrelated to the training data. We also discuss further improvements obtained by applying corpus-driven language models to the problem of word segmentation and using a state-of-the-art tool for training part-of-speech taggers.
