
Distributional Similarity for Chinese: Exploiting Characters and Radicals


Abstract

Distributional similarity has attracted considerable attention in natural language processing as an automatic means of countering the ubiquitous problem of sparse data. Chinese is a logographic language: words consist of characters, and each character is composed of one or more radicals. The meanings of characters are usually closely related to the meanings of the words that contain them. Likewise, radicals often make a predictable contribution to the meaning of a character: characters that share components tend to have similar or related meanings. In this paper, we use these properties of Chinese to improve Chinese word similarity computation. Given a content word, we first extract similar words from a large corpus and rank them by a similarity score. This ranking is then adjusted according to the characters and components shared between each similar word and the target word. Experiments on two gold-standard datasets show that the adjusted ranking is closer to human judgments than the original ranking. In addition to the quantitative evaluation, we examine the reasons behind errors, drawing on linguistic phenomena for our explanations.
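
The two-step procedure described in the abstract (rank neighbours by a distributional score, then adjust the ranking by shared characters and components) can be illustrated with a minimal sketch. The abstract does not give the adjustment formula, so the additive bonus, the weights `char_weight` and `comp_weight`, the toy `COMPONENTS` table, and the input scores below are all assumptions made for illustration, not the authors' method.

```python
from typing import Dict, List, Set, Tuple

# Toy character-to-component table (illustrative only; a real system would
# load a full character decomposition dictionary).
COMPONENTS: Dict[str, Set[str]] = {
    "江": {"氵", "工"},
    "河": {"氵", "可"},
    "湖": {"氵", "胡"},
    "泊": {"氵", "白"},
    "山": {"山"},
}


def components_of(word: str) -> Set[str]:
    """Union of the known components of every character in `word`."""
    comps: Set[str] = set()
    for ch in word:
        comps |= COMPONENTS.get(ch, set())  # unknown characters add nothing
    return comps


def adjusted_score(
    target: str,
    candidate: str,
    score: float,
    char_weight: float = 0.2,  # hypothetical bonus per shared character
    comp_weight: float = 0.1,  # hypothetical bonus per shared component
) -> float:
    """Add a bonus to the distributional score for shared characters/components."""
    shared_chars = set(target) & set(candidate)
    shared_comps = components_of(target) & components_of(candidate)
    return score + char_weight * len(shared_chars) + comp_weight * len(shared_comps)


def rerank(target: str, neighbours: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Re-sort (word, distributional score) pairs by the adjusted score, best first."""
    rescored = [(w, adjusted_score(target, w, s)) for w, s in neighbours]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up distributional scores for neighbours of the target word 江河.
    neighbours = [("山脉", 0.50), ("湖泊", 0.45), ("河流", 0.40)]
    for word, score in rerank("江河", neighbours):
        print(word, round(score, 2))
```

With these toy numbers, 河流 shares the character 河 and two components with the target 江河, so it moves from third place to first, while 湖泊 overtakes 山脉 on the strength of the shared water radical 氵. A simple additive bonus is only one possible design; a multiplicative or rank-based adjustment would fit the abstract's description equally well.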

Bibliographic details

  • Source
    Mathematical Problems in Engineering | 2012, Issue 8 | 347257.1-347257.11 | 11 pages
  • Author affiliations

    School of Computer Science, Leshan Normal University, 614004 Leshan, China;

    Department of Informatics, Sussex University, Brighton BN1 9QJ, UK;

    Institute of Computational Linguistics, Peking University, 100871 Beijing, China;

    Department of Theoretical and Applied Linguistics, University of Cambridge, Cambridge CB3 9DB, UK;

  • Original format: PDF
  • Language: English
