Conference: Workshop on NLP for Similar Languages, Varieties and Dialects

Part of Speech Tagging in Luyia: A Bantu Macrolanguage

Abstract

Luyia is a macrolanguage in central Kenya. The Luyia languages, like other Bantu languages, have a complex morphological system. This system can be leveraged to aid in part of speech tagging. Bag-of-characters taggers trained on a source Luyia language can be applied directly to another Luyia language with some degree of success. In addition, mixing data from the target language with data from the source language does produce more accurate predictive models compared to models trained on just the target language data when the training set size is small. However, for both of these tagging tasks, models involving the more distantly related language, Tiriki, are better at predicting part of speech tags for Wanga data. The models incorporating Bukusu data are not as successful despite the closer relationship between Bukusu and Wanga. Overlapping vocabulary between the Wanga and Tiriki corpora as well as a bias towards open class words help Tiriki outperform Bukusu.
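To make the bag-of-characters idea concrete, here is a minimal sketch of cross-language tagging with character-count features. It is an illustration under stated assumptions, not the paper's tagger: the abstract does not specify the classifier, so a simple nearest-profile rule with cosine similarity stands in for it, and every token below is an invented placeholder rather than real Bukusu, Tiriki, or Wanga data. The function names (`bag_of_characters`, `train`, `predict`) are hypothetical.

```python
from collections import Counter
from math import sqrt


def bag_of_characters(word):
    """Map a word to unordered character counts (character order is discarded)."""
    return Counter(word.lower())


def cosine(a, b):
    """Cosine similarity between two character-count bags."""
    dot = sum(a[ch] * b[ch] for ch in a if ch in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def train(tagged_words):
    """Build one character profile per tag by summing the bags of its words."""
    profiles = {}
    for word, tag in tagged_words:
        profiles.setdefault(tag, Counter()).update(bag_of_characters(word))
    return profiles


def predict(profiles, word):
    """Return the tag whose profile is most similar to the word's bag.

    Tags are visited in sorted order so ties resolve deterministically.
    """
    bag = bag_of_characters(word)
    return max(sorted(profiles), key=lambda tag: cosine(bag, profiles[tag]))


# Invented placeholder tokens (assumption: NOT real Luyia data).
SOURCE = [("kalo", "NOUN"), ("kali", "NOUN"), ("tesu", "VERB"), ("tesi", "VERB")]
TARGET_SMALL = [("kalu", "NOUN")]

# Setting 1: train on the source language only, apply directly to target words.
source_model = train(SOURCE)

# Setting 2: mix a small amount of target data into the source data.
mixed_model = train(SOURCE + TARGET_SMALL)

source_prediction = predict(source_model, "kala")
mixed_prediction = predict(mixed_model, "kala")
```

Because the features ignore character order, words that share affix material across related languages end up with similar bags, which is one plausible reason direct transfer can work at all. A real experiment would swap in a trained classifier and annotated corpora.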
