Conference: International conference on world wide web

Fast Topic Discovery From Web Search Streams


Abstract

Web search involves voluminous data streams that record millions of users' interactions with the search engine. Recently, latent topics in web search data have been found to be critical for a wide range of search engine applications, such as search personalization and search history warehousing. However, existing methods usually discover latent topics from web search data in an offline, retrospective fashion. Hence, they are increasingly ineffective in the face of ever-growing web search data that accumulate as online streams. In this paper, we propose a novel probabilistic topic model, the Web Search Stream Model (WSSM), which is carefully calibrated to handle two salient features of web search data: the data arrive as streams, and they arrive in massive volume. We further propose an efficient parameter inference method, Stream Parameter Inference (SPI), to train WSSM on massive web search streams. Based on a large-scale search engine query log, we conduct extensive experiments to verify the effectiveness and efficiency of WSSM and SPI. We observe that WSSM together with SPI discovers latent topics from web search streams faster than state-of-the-art methods while retaining comparable topic modeling accuracy.
