Conference: Workshop on the use of computational methods in the study of endangered languages

Instant annotations in ELAN corpora of spoken and written Komi, an endangered language of the Barents Sea region



Abstract

The paper describes work in progress by the Izhva Komi language documentation project, which records new spoken language data, digitizes available recordings, and annotates these multimedia data in order to provide a comprehensive language corpus as a database for future research on and for this endangered and under-described Uralic speech community. While working with a spoken variety and in the framework of documentary linguistics, we apply language technology methods and tools that have so far been applied only to normalized written languages. Specifically, we describe a script providing interactivity between ELAN, a graphical user interface tool for annotating and presenting multimodal corpora, and different morphosyntactic analysis modules implemented as Finite State Transducers and Constraint Grammar for rule-based morphosyntactic tagging and disambiguation. Our aim is to challenge current manual approaches in the annotation of language documentation corpora.
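The kind of interaction the abstract describes can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: it reads word-level annotations from a simplified ELAN-style XML document, sends each token through an analysis step, and writes the selected reading to a new dependent tier. The tier names (`word`, `morph`), the sample tokens, the toy lexicon, and the functions `analyse`, `disambiguate`, and `annotate` are all hypothetical. In a real setup the two stand-in functions would presumably call external Finite State Transducer and Constraint Grammar tools instead.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical ELAN-style document. A real .eaf file also carries
# a header, time slots, and linguistic type definitions, omitted here.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="word" LINGUISTIC_TYPE_REF="word">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>tokenA</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>tokenB</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a3" TIME_SLOT_REF1="ts3" TIME_SLOT_REF2="ts4">
        <ANNOTATION_VALUE>tokenC</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""

# Toy lexicon standing in for a morphological analyser.
# The tokens and tags are placeholders, not real Komi data.
TOY_LEXICON = {
    "tokenA": ["lemmaA+N+Sg+Nom"],
    "tokenB": ["lemmaB+V+Prs+Sg3", "lemmaB+N+Sg+Nom"],  # ambiguous token
}


def analyse(token):
    """Return all candidate readings; unknown tokens get a '+?' reading."""
    return TOY_LEXICON.get(token, [token + "+?"])


def disambiguate(readings):
    """Stand-in for rule-based disambiguation: keep the first reading."""
    return readings[0]


def annotate(root, src_tier="word", dst_tier="morph"):
    """Add a tier of analyses that refer back to the source annotations."""
    src = root.find(f"./TIER[@TIER_ID='{src_tier}']")
    if src is None:
        raise ValueError(f"tier {src_tier!r} not found")
    dst = ET.SubElement(root, "TIER", {
        "TIER_ID": dst_tier,
        "LINGUISTIC_TYPE_REF": dst_tier,
        "PARENT_REF": src_tier,
    })
    for n, ann in enumerate(src.iter("ALIGNABLE_ANNOTATION"), start=1):
        token = (ann.findtext("ANNOTATION_VALUE") or "").strip()
        wrapper = ET.SubElement(dst, "ANNOTATION")
        ref = ET.SubElement(wrapper, "REF_ANNOTATION", {
            "ANNOTATION_ID": f"m{n}",
            "ANNOTATION_REF": ann.get("ANNOTATION_ID"),
        })
        ET.SubElement(ref, "ANNOTATION_VALUE").text = disambiguate(analyse(token))
    return root


if __name__ == "__main__":
    annotated = annotate(ET.fromstring(SAMPLE_EAF))
    print(ET.tostring(annotated, encoding="unicode"))
```

Keeping analysis and disambiguation as separate functions mirrors the two-stage design named in the abstract, so either stage could be swapped for a call to an external tool without touching the tier-handling code.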
