Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence

Keep It Simple with Time: A Reexamination of Probabilistic Topic Detection Models

Abstract

Topic detection (TD) is a fundamental research problem in the Topic Detection and Tracking (TDT) community with practical implications: TD helps analysts separate the wheat from the chaff among thousands of incoming news streams. In this paper, we propose a simple and effective topic detection model called the temporal Discriminative Probabilistic Model (DPM), which is shown to be theoretically equivalent to the classic vector space model with feature selection and temporally discriminative weights. We compare DPM to its various probabilistic cousins, ranging from mixture models such as von Mises-Fisher (vMF) to mixed membership models such as Latent Dirichlet Allocation (LDA). Benchmark results on the TDT3 data set show that sophisticated models, such as vMF and LDA, do not necessarily lead to better results. In the case of LDA, notably worse performance was obtained under variational inference, which is likely due to the large number of LDA model parameters involved in document-level topic detection. In contrast, a relatively simple time-aware probabilistic model such as DPM suffices for both offline and online topic detection tasks, making DPM a theoretically elegant and effective model for practical topic detection.