Conference: International Conference on Very Large Data Bases
MLJ: Language-Independent Real-Time Search of Tweets Reported by Media Outlets and Journalists

Abstract

In this demonstration, we introduce MLJ, a first Web-based system that enables users to search the latest tweets posted by media outlets and journalists on any topic, across languages. Handling multilingual tweets in real time involves several technical challenges: the language barrier, the sparsity of words, and the real-time data stream. To overcome the language barrier and the sparsity of words, MLJ harnesses CL-ESA, a Wikipedia-based, language-independent method that generates a vector of Wikipedia pages (entities) from an input text. To handle the tweet stream continuously, we propose one-pass DP-means, an online clustering method based on DP-means. Given a new tweet as input, MLJ generates a vector using CL-ESA and assigns it to one of the clusters using one-pass DP-means. Because a search query is also interpreted as a vector, users can instantly search for clusters containing the latest tweets related to the query without being aware of language differences. As of March 2014, MLJ supports nine languages (English, Japanese, Korean, Spanish, Portuguese, German, French, Italian, and Arabic), covering 24 countries.
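
The abstract does not spell out how one-pass DP-means works, so the following is only a minimal sketch of the general idea under stated assumptions: each tweet is already a sparse concept vector (here a plain `dict` from hypothetical Wikipedia page titles to weights, standing in for CL-ESA output), similarity is measured by cosine distance, and a new cluster is opened whenever the nearest cluster is farther away than a threshold `lam`. The class name `OnePassDPMeans`, the choice of cosine distance, the running-sum centroid, and the `search` helper are all illustrative choices, not details taken from the paper.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat an empty vector as unrelated to everything
    return 1.0 - dot / (norm_a * norm_b)


class OnePassDPMeans:
    """Illustrative single-pass clustering with a DP-means style threshold."""

    def __init__(self, lam):
        self.lam = lam   # assumed distance threshold for opening a new cluster
        self.sums = []   # per-cluster sum of member vectors
        self.counts = [] # per-cluster number of members

    def assign(self, vec):
        """Assign one vector to a cluster, seeing each vector only once."""
        best, best_dist = None, float("inf")
        for i, total in enumerate(self.sums):
            # Cosine distance ignores scale, so the running sum can
            # stand in for the mean of the cluster.
            dist = cosine_distance(vec, total)
            if dist < best_dist:
                best, best_dist = i, dist
        if best is None or best_dist > self.lam:
            # Nothing is close enough: open a new cluster for this vector.
            self.sums.append(dict(vec))
            self.counts.append(1)
            return len(self.sums) - 1
        # Otherwise fold the vector into the nearest cluster.
        for key, weight in vec.items():
            self.sums[best][key] = self.sums[best].get(key, 0.0) + weight
        self.counts[best] += 1
        return best

    def search(self, query_vec, top_k=3):
        """Rank cluster ids by closeness to a query vector."""
        order = sorted(
            range(len(self.sums)),
            key=lambda i: cosine_distance(query_vec, self.sums[i]),
        )
        return order[:top_k]


model = OnePassDPMeans(lam=0.5)
first = model.assign({"Earthquake": 1.0, "Japan": 0.5})
second = model.assign({"Earthquake": 0.8, "Japan": 0.7})
third = model.assign({"FIFA_World_Cup": 1.0})
hits = model.search({"Japan": 1.0}, top_k=1)
```

In this toy run the two earthquake vectors share a cluster, the unrelated vector opens a second one, and a query vector is matched against clusters in the same way a tweet would be. Because tweets and queries live in the same concept space, the language of the original text plays no role at this stage.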
