Features are central to every sentiment analysis and opinion mining task. For unsupervised text clustering in particular, the quality of the text features directly affects the clustering results. This paper studies three kinds of semantic features (noun features, noun-phrase features, and semantic-role features), their effect on topic clustering of text, and the compatibility between the different features. On that basis it proposes a method for eliminating redundant features, which effectively removes them and improves clustering accuracy. It also proposes a clustering method based on semantic role labeling that directly locates effective word features for topic clustering. The experimental results indicate that this method is direct and effective, and that it offers a new approach to feature selection.
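The abstract does not spell out the redundancy criterion, so the following is only a minimal sketch of what eliminating redundant features between nouns and noun phrases might look like. It assumes, purely for illustration, that a noun feature is redundant when it is a word of some noun-phrase feature and never occurs in a document without that phrase. The function name `drop_redundant_nouns`, the rule itself, and the toy documents are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def drop_redundant_nouns(docs):
    """Remove single-noun features that a noun-phrase feature already covers.

    `docs` is a list of feature lists, one per document. A feature that
    contains a space is treated as a noun phrase, anything else as a noun.
    The redundancy rule below is an illustrative assumption, not the
    criterion used in the paper.
    """
    # Map each feature to the set of document indices in which it occurs.
    postings = defaultdict(set)
    for i, feats in enumerate(docs):
        for f in feats:
            postings[f].add(i)

    phrases = [f for f in postings if " " in f]
    redundant = set()
    for noun in (f for f in postings if " " not in f):
        for phrase in phrases:
            # The noun is redundant if it is a word of the phrase and never
            # appears in a document where the phrase is absent.
            if noun in phrase.split() and postings[noun] <= postings[phrase]:
                redundant.add(noun)
                break

    return [[f for f in feats if f not in redundant] for feats in docs]


# Toy documents (hypothetical data, for illustration only).
docs = [
    ["battery", "battery life", "screen"],
    ["battery", "battery life"],
    ["screen", "screen size", "price"],
]
pruned = drop_redundant_nouns(docs)
```

In this toy input, `battery` is dropped because it only ever appears together with `battery life`, while `screen` is kept because it also occurs in a document that lacks `screen size`. The pruned feature lists could then be vectorized and passed to any standard clustering algorithm.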