Building a Collocation Net

Abstract

This paper presents an approach to building, from a large raw corpus, a novel two-level collocation net that enables the calculation of the collocation relationship between any two words. The first level consists of atomic classes (each atomic class consists of one word and a feature bigram), which are clustered into the second-level class set. Each class at both levels is represented by its collocation candidate distribution over the possible collocation relation types, extracted from a linguistic analysis of the raw training corpus. In this way, all the information extracted from the linguistic analysis is kept in the collocation net. Our approach applies to both frequently and less frequently occurring words by providing a clustering mechanism and resolving the data sparseness problem through the collocation net. Experimentation shows that the collocation net is efficient and effective in solving the data sparseness problem and in determining the collocation relationship between any two words.
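The abstract does not specify the clustering algorithm or exactly how less frequent words draw on the second level, so the following is only a minimal Python sketch of the general idea, under stated assumptions. Atomic classes are keyed by a (word, feature) pair and hold counts over relation types; a greedy cosine-similarity clustering stands in for the paper's (unspecified) clustering step; and an atomic class with too little evidence backs off to the pooled distribution of its second-level class. `RELATIONS`, the feature strings, `threshold`, and `min_count` are hypothetical placeholders, not values from the paper.

```python
import math
from collections import Counter

# Hypothetical inventory of collocation relation types (not taken from the paper).
RELATIONS = ("verb-object", "subject-verb", "adj-noun", "none")


def normalize(counts):
    """Turn a Counter over relation types into a probability vector."""
    total = sum(counts.values())
    return [counts[r] / total if total else 0.0 for r in RELATIONS]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster(atomic, threshold=0.9):
    """Greedy single-pass clustering of atomic classes (a stand-in technique).

    Classes are visited from most to least frequent; each joins the most similar
    existing second-level class if the cosine similarity reaches `threshold`,
    otherwise it starts a new one. Returns (assignment, pooled counts per class).
    """
    pooled = []      # one Counter of summed relation counts per second-level class
    assignment = {}  # atomic key -> second-level class id
    for key in sorted(atomic, key=lambda k: -sum(atomic[k].values())):
        vec = normalize(atomic[key])
        best, best_sim = None, threshold
        for cid, counts in enumerate(pooled):
            sim = cosine(vec, normalize(counts))
            if sim >= best_sim:
                best, best_sim = cid, sim
        if best is None:
            pooled.append(Counter())
            best = len(pooled) - 1
        pooled[best].update(atomic[key])
        assignment[key] = best
    return assignment, pooled


def distribution(key, atomic, assignment, pooled, min_count=5):
    """Own distribution if the atomic class has enough evidence, else back off
    to the pooled distribution of its second-level class."""
    counts = atomic[key]
    if sum(counts.values()) >= min_count:
        return normalize(counts)
    return normalize(pooled[assignment[key]])


def most_likely_relation(dist):
    """Relation type with the highest probability in a distribution vector."""
    return RELATIONS[max(range(len(RELATIONS)), key=lambda i: dist[i])]


# Toy atomic classes keyed by (word, hypothetical feature string).
atomic = {
    ("eat", "obj:food"): Counter({"verb-object": 40, "subject-verb": 5}),
    ("red", "mod:noun"): Counter({"adj-noun": 30, "none": 3}),
    ("devour", "obj:food"): Counter({"verb-object": 2}),
}
assignment, pooled = cluster(atomic)
dist = distribution(("devour", "obj:food"), atomic, assignment, pooled)
print(most_likely_relation(dist))  # prints "verb-object"
```

Because `devour` has only two observations, it falls below `min_count` and inherits the pooled distribution of the second-level class it shares with `eat`. This is a simplified illustration of how a second level can help with data sparseness for less frequent words, not a reproduction of the authors' method.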