Journal: Neural Computing & Applications

Research on topic discovery technology for Web news

Abstract

With the development of information technology, Web news has become the main channel of information dissemination. Web news topic discovery helps users find valuable information quickly, and research on it continues to advance. Traditional topic discovery is based on the vector space model, which suffers from high dimensionality and data sparsity. Latent semantic analysis, by contrast, maps the high-dimensional, sparse word space to a k-dimensional semantic space and uses the semantic correlation between words to raise the similarity of news items on the same topic. This paper studies Web news topic discovery. First, the set of Web news texts is vectorized, and the weight of each feature in the texts is calculated with an improved TF-IDF. The original text vector set is then analysed by latent semantic analysis, which exploits the semantic relations between texts and words, and the news topics are extracted by a clustering approach. Sub-topics are extracted from word co-occurrence: the sub-topic vector is built from the co-occurring words. The experimental results show that the proposed method can effectively capture the current hot topics of Web news and their related sub-topics. The work is relevant to information retrieval and data mining.
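The pipeline described in the abstract (TF-IDF weighting, latent semantic analysis, clustering, then word co-occurrence for sub-topics) can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' method: the abstract does not specify the improved TF-IDF or the clustering algorithm, so plain smoothed TF-IDF and a simple single-pass threshold clustering stand in for them here. The toy documents, the function names, the threshold of 0.5 and k=2 are all hypothetical, and the power-iteration decomposition is only suitable for tiny inputs.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_matrix(docs):
    """Plain TF-IDF with smoothed IDF (a stand-in, not the paper's improved variant)."""
    vocab = sorted({w for d in docs for w in d})
    df = Counter(w for d in docs for w in set(d))
    n = len(docs)
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1.0 for w in vocab}
    rows = []
    for d in docs:
        tf = Counter(d)
        rows.append([tf[w] / len(d) * idf[w] for w in vocab])
    return rows


def lsa_doc_coords(rows, k, iters=500):
    """Rank-k document coordinates (U_k * Sigma_k), obtained by power iteration
    with deflation on the document Gram matrix A A^T. Tiny inputs only."""
    n = len(rows)
    gram = [[sum(a * b for a, b in zip(rows[i], rows[j])) for j in range(n)]
            for i in range(n)]
    coords = [[] for _ in range(n)]
    for _ in range(k):
        v = [float(i + 1) for i in range(n)]  # fixed, non-symmetric start vector
        for _ in range(iters):
            w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        # Rayleigh quotient: eigenvalue of A A^T, i.e. a squared singular value.
        lam = sum(v[i] * gram[i][j] * v[j] for i in range(n) for j in range(n))
        for i in range(n):
            coords[i].append(v[i] * math.sqrt(max(lam, 0.0)))
            for j in range(n):
                gram[i][j] -= lam * v[i] * v[j]  # deflate the found component
    return coords


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


def leader_clusters(coords, threshold=0.5):
    """Single-pass clustering: join the first cluster whose seed is similar enough."""
    clusters = []
    for i, c in enumerate(coords):
        for members in clusters:
            if cosine(coords[members[0]], c) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def cooccurring_pairs(docs, members):
    """Count word pairs that appear together in the documents of one cluster."""
    pairs = Counter()
    for i in members:
        pairs.update(combinations(sorted(set(docs[i])), 2))
    return pairs


# Hypothetical, already-tokenized toy headlines: three on politics, two on sport.
docs = [s.split() for s in [
    "election vote candidate poll",
    "election vote candidate campaign",
    "election poll campaign ballot",
    "match goal team league",
    "match goal coach striker",
]]

coords = lsa_doc_coords(tfidf_matrix(docs), k=2)
clusters = leader_clusters(coords)
for members in clusters:
    print(members, cooccurring_pairs(docs, members).most_common(3))
```

In this sketch the most frequent co-occurring pairs within a cluster play the role of the sub-topic descriptors. A real system would more likely compute the decomposition with a truncated SVD from a numerical library and use a clustering algorithm suited to the corpus size.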
