Ranking and tagging bursty features in text streams with context language models

Abstract

Detecting and exploiting bursty patterns to analyze text streams has been one of the fundamental approaches in many temporal text mining applications. So far, most existing studies have focused on detecting bursty features based purely on changes in term frequency. Few have taken the semantic contexts of bursty features into consideration; as a result, the detected bursty features are not always interesting and can be hard to interpret. In this article, we propose to model the contexts of bursty features using a language modeling approach. We propose two methods for estimating the context language models, one based on sentence-level context and one based on document-level context. We then propose a novel topic diversity-based metric that uses the context models to find newsworthy bursty features. We also propose to use the context models to automatically assign meaningful tags to bursty features. Using a large corpus of news articles, we quantitatively show that the proposed context language models can effectively help rank bursty features by newsworthiness and assign meaningful tags to annotate them. We also use two example text mining applications to qualitatively demonstrate the usefulness of bursty feature ranking and tagging.
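
The abstract only outlines the approach, so the following is a minimal, hypothetical Python sketch of the general idea, not the authors' implementation. It assumes: documents are already tokenized into sentences of lowercase terms; bursty features have already been detected (no burst detection is done here); a context model is a unigram model over terms co-occurring with the feature in the same sentence or the same document, with Dirichlet smoothing against the corpus background model (mu = 100 is an arbitrary choice); Shannon entropy of the context model stands in for the paper's topic diversity-based metric, whose definition the abstract does not give, and ranking lower entropy first is likewise an assumption; tags are the context terms contributing most to KL(context || background). All function names are illustrative.

```python
import math
from collections import Counter

# Assumed input format: a corpus is a list of documents, each document is a list
# of sentences, and each sentence is a list of already-tokenized, lowercased terms.
Corpus = list[list[list[str]]]


def background_lm(corpus: Corpus) -> dict[str, float]:
    """Maximum-likelihood unigram model over the whole corpus."""
    counts = Counter(t for doc in corpus for sent in doc for t in sent)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def context_counts(feature: str, corpus: Corpus, level: str = "sentence") -> Counter:
    """Count terms that co-occur with `feature` in the same sentence or document.

    The feature itself is excluded so it does not dominate its own context.
    """
    counts: Counter = Counter()
    for doc in corpus:
        if level == "sentence":
            units = [sent for sent in doc if feature in sent]
        elif level == "document":
            # The whole document is one context unit if it mentions the feature.
            units = [[t for sent in doc for t in sent]] if any(feature in s for s in doc) else []
        else:
            raise ValueError(f"unknown level: {level!r}")
        for unit in units:
            counts.update(t for t in unit if t != feature)
    return counts


def context_lm(feature: str, corpus: Corpus, background: dict[str, float],
               level: str = "sentence", mu: float = 100.0) -> dict[str, float]:
    """Dirichlet-smoothed context model p(w | context of feature)."""
    counts = context_counts(feature, corpus, level)
    total = sum(counts.values())
    return {w: (counts[w] + mu * pb) / (total + mu) for w, pb in background.items()}


def entropy(lm: dict[str, float]) -> float:
    """Shannon entropy (in nats) of a distribution."""
    return -sum(p * math.log(p) for p in lm.values() if p > 0)


def rank_features(features: list[str], corpus: Corpus, background: dict[str, float],
                  level: str = "sentence", mu: float = 100.0) -> list[str]:
    """Order features by ascending context entropy (a hypothetical diversity proxy)."""
    return sorted(features, key=lambda f: entropy(context_lm(f, corpus, background, level, mu)))


def tag_feature(feature: str, lm: dict[str, float], background: dict[str, float],
                k: int = 3) -> list[str]:
    """Top-k context terms by their contribution to KL(context || background)."""
    scores = {w: p * math.log(p / background[w])
              for w, p in lm.items() if w != feature and p > 0}
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


if __name__ == "__main__":
    corpus = [
        [["earthquake", "hits", "city"], ["rescue", "teams", "arrive"]],
        [["earthquake", "damage", "city"], ["market", "falls"]],
        [["market", "rises"], ["stocks", "rally"]],
    ]
    bg = background_lm(corpus)
    lm = context_lm("earthquake", corpus, bg, level="sentence")
    print(tag_feature("earthquake", lm, bg))  # → ['city', 'damage', 'hits']
    print(rank_features(["earthquake", "market"], corpus, bg))
```

In this toy corpus, the sentence-level context of "earthquake" contains only "hits", "city", and "damage", while level="document" also pulls in "market" and "falls" from an unrelated sentence of the same article. This illustrates one plausible trade-off between the two context granularities: document-level context gathers more evidence but also more noise.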

Bibliographic record

  • Source
    Frontiers of computer science in China | 2017, No. 5 | pp. 852-862 | 11 pages
  • Author affiliations

    School of Information, Renmin University of China, Beijing 100872, China, Beijing Key Laboratory of Big Data Management and Analysis Methods, Renmin University of China, Beijing 100872, China;

    Beijing Key Laboratory on Integration and Analysis of Large-scale Stream Data, Beijing 100144, China;

    School of Information, Renmin University of China, Beijing 100872, China, Beijing Key Laboratory of Big Data Management and Analysis Methods, Renmin University of China, Beijing 100872, China;

    School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China;

  • Original format: PDF
  • Text language: English
  • Keywords

    bursty features; bursty features ranking; bursty feature tagging; context modeling;