Unsupervised and supervised exploitation of semantic domains in lexical disambiguation


Abstract

Domains are common areas of human discussion, such as economics, politics, law, science, etc., which are at the basis of lexical coherence. This paper explores the dual role of domains in word sense disambiguation (WSD). On the one hand, domain information provides generalized features at the paradigmatic level that are useful to discriminate among word senses. On the other hand, domain distinctions constitute a useful level of coarse-grained sense distinctions, which lends itself to more accurate disambiguation with lower amounts of knowledge. In this paper we extend and ground the modeling of domains and the exploitation of WordNet Domains, an extension of WordNet in which each synset is labeled with domain information. We propose a novel unsupervised probabilistic method for the critical step of estimating domain relevance for contexts, and suggest utilizing it within unsupervised domain-driven disambiguation for word senses, as well as within a traditional supervised approach. The paper presents empirical assessments of the potential utilization of domains in WSD in a wide range of comparative settings, supervised and unsupervised. Following the dual role of domains, we report experiments that evaluate both the extent to which domain information provides effective features for WSD and the accuracy obtained by WSD at domain-level sense granularity. Furthermore, we demonstrate the potential for either avoiding or minimizing manual annotation thanks to the generalized level of information provided by domains.

Bibliographic details

  • Source
    Computer Speech and Language | 2004, Issue 3 | pp. 275-299 | 25 pages
  • Author affiliations

    ITC-irst, Istituto per la Ricerca Scientifica e Tecnologica, I-38050, Trento, Italy;

    ITC-irst, Istituto per la Ricerca Scientifica e Tecnologica, I-38050, Trento, Italy;

    Department of Computer Science, Bar Ilan University, Ramat Gan, Israel;

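  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA;
  • Original format: PDF
  • Language of text: English (eng)
  • Chinese Library Classification: Computing technology, computer technology;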