Conference: International Conference on Cyber and IT Service Management

The application of Centroid linkage hierarchical method and Hill climbing method in comments clustering online discussion forum


Abstract

Several problems arise when trying to improve the effectiveness of communication in online discussions. The study investigates the similarity and repetition of comments, in terms of the questions they ask or the meaning of their text, which can trigger miscommunication among participants in a forum discussion. Moreover, some comments appear to be ignored or left unanswered by other participants, so the forum cannot be used effectively for knowledge acquisition and sharing. This paper studies the application of the Centroid Linkage Hierarchical Method (CLHM) algorithm and the Hill Climbing method to compute similarity values between participants' comments and to cluster the comments on that basis. The analysis follows the text mining process, including text processing, text transformation, attribute selection, and pattern discovery. To test the validity and accuracy of both methods, confusion matrices were calculated using both Euclidean distance and cosine similarity. The results show that comment groups of various sizes, namely 17 comments from Bersosial.com, 27 from Indowebster.com, and 51 from Teknojurnal.com, produced well-separated clusters. The tests also showed that changing the threshold and altitude did not affect the clustering process. The F-measure values calculated from the confusion matrices show that cosine similarity gave better results than Euclidean distance, with values of 0.89 for Teknojurnal.com, 0.71 for Indowebster.com, and 0.57 for Bersosial.com. This shows that the CLHM algorithm and the Hill Climbing method are effective approaches and have been successfully applied to clustering comments in online discussions.
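The abstract describes the pipeline only at a high level. As a rough illustration, the sketch below covers text processing, a term-frequency text transformation, and centroid-linkage agglomerative clustering under either cosine or Euclidean distance. The stop-word list, plain term-frequency weighting, distance threshold, and sample comments are hypothetical choices, not the paper's settings; attribute selection and the Hill Climbing step are omitted because the abstract does not say how they are applied.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "how",
              "do", "i", "my", "for", "which", "it", "not"}

def preprocess(text):
    """Text processing: lowercase, split into alphanumeric tokens, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def to_vectors(docs):
    """Text transformation: term-frequency vectors over a shared sorted vocabulary."""
    token_lists = [preprocess(d) for d in docs]
    vocab = sorted({t for toks in token_lists for t in toks})
    vectors = []
    for toks in token_lists:
        counts = Counter(toks)
        vectors.append([float(counts[t]) for t in vocab])
    return vectors, vocab

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_distance(u, v):
    """1 - cosine similarity; treats an all-zero vector as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)

def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def centroid_linkage(vectors, threshold, dist=cosine_distance):
    """Agglomerative clustering with centroid linkage.

    Repeatedly merges the two clusters whose centroids are closest and stops
    once the closest pair is farther apart than `threshold`.
    Returns a list of clusters, each a list of document indices.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        cents = [centroid([vectors[i] for i in c]) for c in clusters]
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist(cents[a], cents[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid
    return clusters

# Hypothetical sample comments.
comments = [
    "How do I reset my router password?",
    "Router password reset not working",
    "Best laptop for programming",
    "Which laptop is best for programming?",
]
vectors, vocab = to_vectors(comments)
clusters = centroid_linkage(vectors, threshold=0.5)
print(clusters)  # [[0, 1], [2, 3]]
```

A hand-written loop is used because SciPy's `scipy.cluster.hierarchy.linkage` documents `method='centroid'` as correctly defined only for Euclidean distances, while the abstract compares cosine similarity as well.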
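The abstract reports F-measures derived from a confusion matrix but does not state how that matrix is built for a clustering. One common construction counts pairs of comments: a pair is a true positive if both comments share a reference group and a predicted cluster. The sketch below assumes that pair-counting construction, which may differ from the paper's; the labels in the usage example are hypothetical.

```python
from itertools import combinations

def pair_confusion(labels_true, labels_pred):
    """Count TP/FP/FN/TN over all unordered pairs of items."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_pred and same_true:
            tp += 1
        elif same_pred:
            fp += 1  # clustered together, but different reference groups
        elif same_true:
            fn += 1  # same reference group, but split across clusters
        else:
            tn += 1
    return tp, fp, fn, tn

def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall (F1)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical reference groups and predicted cluster ids for four comments.
tp, fp, fn, tn = pair_confusion(["net", "net", "pc", "pc"], [0, 0, 1, 0])
print(tp, fp, fn, tn, f_measure(tp, fp, fn))  # 1 2 1 2 0.4
```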
