Venue: International Conference on Computational Linguistics and Intelligent Text Processing

Evaluation of Internal Validity Measures in Short-Text Corpora

Abstract

Short-text clustering is one of the most difficult tasks in natural language processing because document terms occur with low frequency. We are interested in analysing this kind of corpus in order to develop novel techniques that may improve the results obtained by classical clustering algorithms. In this paper we present an evaluation of different internal clustering validity measures in order to determine their possible correlation with the F-Measure, a well-known external measure used to assess the performance of clustering algorithms. We used several short-text corpora in the experiments. The correlation obtained with a particular set of internal validity measures lets us conclude that some of them may be used to improve the performance of text clustering algorithms.
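
The kind of comparison the abstract describes can be illustrated with a small sketch. It is not the authors' experimental setup: the abstract does not name the internal measures, the correlation coefficient, or the corpora, so the silhouette coefficient, Pearson correlation, and the toy one-dimensional data below are all assumptions chosen for illustration. The sketch scores a few candidate clusterings with an internal measure (no gold labels needed) and with the F-Measure (gold labels needed), then correlates the two score lists.

```python
from collections import Counter
from math import sqrt


def f_measure(labels_true, labels_pred):
    """Clustering F-Measure: best F1 per gold class, weighted by class size."""
    n = len(labels_true)
    class_sizes = Counter(labels_true)
    cluster_sizes = Counter(labels_pred)
    overlap = Counter(zip(labels_true, labels_pred))
    total = 0.0
    for cls, n_cls in class_sizes.items():
        best = 0.0
        for clu, n_clu in cluster_sizes.items():
            n_both = overlap[(cls, clu)]
            if n_both == 0:
                continue
            precision = n_both / n_clu
            recall = n_both / n_cls
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_cls / n) * best
    return total


def silhouette(points, labels_pred):
    """Mean silhouette for 1-D points, using absolute difference as distance."""
    clusters = {}
    for p, c in zip(points, labels_pred):
        clusters.setdefault(c, []).append(p)
    scores = []
    for p, c in zip(points, labels_pred):
        own = clusters[c]
        if len(own) == 1 or len(clusters) < 2:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # Mean distance to the other members of the point's own cluster.
        a = sum(abs(p - q) for q in own) / (len(own) - 1)
        # Mean distance to the nearest other cluster.
        b = min(sum(abs(p - q) for q in other) / len(other)
                for k, other in clusters.items() if k != c)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


# Toy data (hypothetical): two well-separated groups on a line.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
gold = [0, 0, 0, 1, 1, 1]

# Candidate clusterings of decreasing quality (hypothetical).
candidates = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1],
]

internal_scores = [silhouette(points, c) for c in candidates]
external_scores = [f_measure(gold, c) for c in candidates]
correlation = pearson(internal_scores, external_scores)

print("silhouette:", internal_scores)
print("F-Measure: ", external_scores)
print("Pearson r: ", correlation)
```

If an internal measure correlates strongly with the F-Measure across many clusterings, it could plausibly stand in for the F-Measure when gold labels are unavailable, for example to choose among candidate clusterings. With real short-text corpora the points would be document vectors and the distance would typically be a text similarity measure rather than an absolute difference.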