Journal: Language Resources and Evaluation

Word sense learning based on feature selection and MDL principle



Abstract

In this paper, we propose a word sense learning algorithm that performs unsupervised feature selection and cluster number identification. Feature selection for word sense learning is built on an entropy-based filter and formalized as a constrained optimization problem, the output of which is a set of important features. Cluster number identification is built on a Gaussian mixture model with an MDL-based criterion, and the optimal model order is inferred by minimizing the criterion. To evaluate the closeness between the learned sense clusters and the ground-truth classes, we introduce a weighted F-measure that models the effort needed to reconstruct the classes from the clusters. Experiments show that the algorithm can retrieve important features, automatically produce a rough estimate of the number of classes, and outperform other algorithms in terms of the weighted F-measure. In addition, we explore applying the algorithm to the specific task of adding new words to a Chinese thesaurus.
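
The abstract does not state the exact MDL criterion, so the following is only a minimal sketch of the general idea of choosing the number of mixture components by minimizing a description-length score. It assumes the common two-part form, negative log-likelihood plus (number of free parameters / 2) * log(N), and uses one-dimensional toy data rather than word-context features. All function names (`fit_gmm_1d`, `mdl_score`, `select_cluster_number`) and the synthetic data are hypothetical and are not taken from the paper.

```python
import math
import random


def log_likelihood(xs, weights, means, variances):
    """Total log-likelihood of 1-D data under a Gaussian mixture."""
    total = 0.0
    for x in xs:
        p = sum(w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances))
        total += math.log(p + 1e-300)  # guard against log(0)
    return total


def fit_gmm_1d(xs, k, iters=100, min_var=1e-3):
    """Fit a k-component 1-D Gaussian mixture with plain EM.

    Returns the log-likelihood of the fitted parameters.
    """
    n = len(xs)
    srt = sorted(xs)
    # Deterministic start: means at evenly spaced quantiles, shared variance.
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    mu = sum(xs) / n
    variances = [max(sum((x - mu) ** 2 for x in xs) / n, min_var)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            s = sum(dens) + 1e-300
            resp.append([d / s for d in dens])
        # M-step: re-estimate weights, means, and variances (with a floor).
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, min_var)
    return log_likelihood(xs, weights, means, variances)


def mdl_score(xs, k):
    """Assumed two-part MDL score: -log L + (free parameters / 2) * log N."""
    n_params = 3 * k - 1  # k means + k variances + (k - 1) free weights
    return -fit_gmm_1d(xs, k) + 0.5 * n_params * math.log(len(xs))


def select_cluster_number(xs, max_k=4):
    """Return the k in 1..max_k with the smallest MDL score, plus all scores."""
    scores = {k: mdl_score(xs, k) for k in range(1, max_k + 1)}
    return min(scores, key=scores.get), scores


# Toy data (an assumption for illustration): two well-separated groups.
rng = random.Random(0)
data = ([rng.gauss(0.0, 1.0) for _ in range(150)]
        + [rng.gauss(10.0, 1.0) for _ in range(150)])
best_k, scores = select_cluster_number(data)
print(best_k, scores)
```

The penalty term grows with the number of components, so an extra component is kept only when it improves the likelihood enough to pay for its additional parameters. This is the trade-off that a criterion of this kind relies on when inferring the model order.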
