Conference: International Conference on Computational Linguistics

Part-of-Speech Tagging on an Endangered Language: a Parallel Griko-Italian Resource



Abstract

Most work on part-of-speech (POS) tagging is focused on high-resource languages, or examines low-resource and active learning settings through simulated studies. We evaluate POS tagging techniques on an actual endangered language, Griko. We present a resource that contains 114 narratives in Griko, along with sentence-level translations in Italian, and provides gold annotations for the test set. Based on a previously collected small corpus, we investigate several traditional methods, as well as methods that take advantage of monolingual data or project cross-lingual POS tags. We show that the combination of a semi-supervised method with cross-lingual transfer is more appropriate for this extremely challenging setting, with the best tagger achieving an accuracy of 72.9%. With an applied active learning scheme, which we use to collect sentence-level annotations over the test set, we achieve improvements of more than 21 percentage points.
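The abstract mentions projecting cross-lingual POS tags but does not say how the projection is carried out. As a rough illustration only, the sketch below shows one generic form of annotation projection: each target-language token takes the majority tag of the source-language tokens aligned to it. The function name `project_tags`, the toy tags, and the alignment pairs are all hypothetical and are not taken from the paper or its resource.

```python
from collections import Counter


def project_tags(source_tags, alignments, target_length):
    """Project POS tags from a tagged source sentence onto a target sentence.

    source_tags   -- POS tags of the source (e.g. Italian) tokens
    alignments    -- (source_index, target_index) word-alignment pairs
    target_length -- number of tokens in the target (e.g. Griko) sentence
    Returns one tag per target token, or None where nothing is aligned.
    """
    # One vote counter per target token.
    votes = [Counter() for _ in range(target_length)]
    for src_idx, tgt_idx in alignments:
        votes[tgt_idx][source_tags[src_idx]] += 1
    # Majority vote per target token; unaligned tokens stay untagged.
    return [v.most_common(1)[0][0] if v else None for v in votes]


# Toy input: tags and alignments are invented for illustration only.
italian_tags = ["DET", "NOUN", "VERB"]
alignments = [(0, 0), (1, 1), (2, 2), (2, 3)]
projected = project_tags(italian_tags, alignments, target_length=5)
print(projected)  # ['DET', 'NOUN', 'VERB', 'VERB', None]
```

Tokens left as `None` have no aligned source word; in a setup like the one the abstract describes, such gaps would presumably have to be filled by another component, for example a tagger trained on the small annotated corpus.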
