Journal: Procedia Computer Science

Building Sense Tagged Corpus Using Wikipedia for Supervised Word Sense Disambiguation


Abstract

Building sense-tagged data is a major challenge for supervised techniques, which have achieved promising results in word sense disambiguation. Building such data manually is laborious and time-consuming because linguistic experts must label each ambiguous word in its collected contexts. This paper therefore proposes a knowledge-based method for building an Arabic sense-tagged corpus from Wikipedia. The method starts by mapping Arabic WordNet to Wikipedia in order to select the Wikipedia article that corresponds to each WordNet sense. In this mapping step, a cross-lingual method is used to measure the similarity between the features of a Wikipedia article and those of a WordNet sense separately. The incoming links of Wikipedia articles are then exploited to extract instances for each sense of each ambiguous word in WordNet. To handle Wikipedia articles that have too few instances, a multiword-based technique is proposed to increase the number of instances for each concept. Experimental results show that the cross-lingual method outperforms a monolingual method based on Arabic features only. The resulting sense-tagged corpus covers 50 ambiguous words, yielding 148 senses with 30,961 instances.
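The incoming-link step can be illustrated with a minimal sketch. It assumes that the sense-to-article mapping has already been produced and that sentences are available as raw wikitext containing `[[Target|anchor]]` links. The function name `extract_instances`, the sense identifiers, and the English toy sentences are hypothetical and serve only as an illustration; the abstract does not describe the authors' actual implementation or data format.

```python
import re
from collections import defaultdict

# Matches [[Target]] and [[Target|anchor text]] wiki links.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def extract_instances(sentences, sense_to_article):
    """Collect sense-tagged instances from sentences that link to a mapped article.

    sense_to_article: hypothetical mapping {sense id: Wikipedia article title},
    assumed to come from an earlier WordNet-to-Wikipedia mapping step.
    Returns {sense id: [(anchor text, sentence with links flattened), ...]}.
    """
    article_to_sense = {article: sense for sense, article in sense_to_article.items()}
    instances = defaultdict(list)
    for sentence in sentences:
        for match in WIKILINK.finditer(sentence):
            target = match.group(1).strip()
            sense = article_to_sense.get(target)
            if sense is None:
                continue  # the link points to an article with no mapped sense
            anchor = (match.group(2) or target).strip()
            # Replace every link in the sentence with its visible text.
            plain = WIKILINK.sub(
                lambda k: (k.group(2) or k.group(1)).strip(), sentence
            )
            instances[sense].append((anchor, plain))
    return dict(instances)


# Toy data (English, for readability; the corpus in the abstract is Arabic).
sentences = [
    "She sat on the [[Bank (geography)|bank]] of the river.",
    "The [[Bank|bank]] raised its interest rates.",
    "He walked to the [[River]].",
]
sense_to_article = {"bank%river": "Bank (geography)", "bank%finance": "Bank"}
corpus = extract_instances(sentences, sense_to_article)
```

Each sentence that links to a mapped article becomes one labeled instance of the corresponding sense, while links to unmapped articles are ignored. The multiword-based expansion mentioned in the abstract is not covered by this sketch.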
