Sub-topic clustering is a crucial step in multi-document summarization. The traditional k-means clustering method is poorly suited to topic clustering because the number of clusters k must be specified in advance. This paper describes a new method for sub-topic clustering based on semi-supervised learning. The method first partitions the set of sentences into disjoint subsets, each containing sentences that cover exactly one topic, and labels the sentences that score highly for that topic. It then uses constrained k-means to determine the number of topics, and finally obtains the sub-topic sets by k-means clustering. The algorithm thus determines the number of clusters k dynamically, and the experimental results indicate that clustering accuracy is improved.
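The abstract does not give the details of the constrained k-means step, so the following is only a minimal sketch under stated assumptions, not the paper's implementation. It assumes that sentences are already represented as numeric feature vectors, that the high-scoring sentences carry an integer topic label (with `-1` meaning unlabeled), and that k is simply taken as the number of distinct labeled topics. The function name `constrained_kmeans` and the toy data are hypothetical.

```python
import numpy as np


def constrained_kmeans(X, seed_labels, n_iter=50):
    """Seeded k-means sketch: labeled points keep their cluster throughout.

    X           : (n, d) array of sentence feature vectors (assumed given).
    seed_labels : length-n integer array; -1 marks an unlabeled sentence.
    Returns (k, assignments, centroids).
    """
    X = np.asarray(X, dtype=float)
    seed_labels = np.asarray(seed_labels)

    # Assumption: k is the number of distinct topics among the labeled seeds.
    topics = np.unique(seed_labels[seed_labels >= 0])
    k = len(topics)
    if k == 0:
        raise ValueError("at least one labeled sentence is required")

    # Map arbitrary topic ids to cluster indices 0..k-1.
    fixed = np.full(len(X), -1)
    for idx, topic in enumerate(topics):
        fixed[seed_labels == topic] = idx

    # Initialize each centroid as the mean of that topic's seed sentences.
    centroids = np.stack([X[fixed == j].mean(axis=0) for j in range(k)])
    assign = fixed.copy()

    for _ in range(n_iter):
        # Distance from every sentence to every centroid, shape (n, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_assign = dists.argmin(axis=1)
        # Constraint: seed sentences never leave their labeled cluster.
        new_assign[fixed >= 0] = fixed[fixed >= 0]
        if np.array_equal(new_assign, assign):
            break
        assign = new_assign
        # Every cluster contains its seeds, so no cluster is ever empty.
        centroids = np.stack([X[assign == j].mean(axis=0) for j in range(k)])

    return k, assign, centroids


# Toy stand-ins for sentence vectors: two well-separated groups, one seed each.
X_demo = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
          [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
seeds_demo = [7, -1, -1, 3, -1, -1]
k_demo, assign_demo, _ = constrained_kmeans(X_demo, seeds_demo)
```

In this sketch the seeds both fix k and anchor the centroids, which folds the abstract's last two steps into one loop; a closer reading of the method might instead pass the resulting k and centroids to an ordinary k-means run.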