Venue: International Conference on Service Systems and Service Management

Online Detection of Domain-Specific New Words in Text Streams

Abstract

With the rapid growth of the Internet, many domain-specific new words appear in media text streams such as forums, Sina Weibo, and WeChat. These new words are typically important terms in their domains and are significant for NLP tasks. Most existing models are either time-consuming or cannot handle out-of-vocabulary (OOV) words in streaming, online settings. In this paper, we propose an unsupervised method, D-TopWords with Gaussian LDA, for effective online detection of domain-specific new words. Unlike traditional new-word detection models, our method is a joint statistical model based on a finite word dictionary and requires no handcrafted features. By further introducing Gaussian LDA into the model, we address the problem of OOV words in new text streams. Experimental results show that our method successfully extracts domain-specific new words and outperforms some state-of-the-art methods on the online detection task.
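
As a rough illustration of why a Gaussian topic model can cope with OOV words, the sketch below scores a word embedding against topics represented as Gaussians in embedding space. Any word with an embedding can be scored this way, whether or not it was in the training vocabulary. This is not the authors' implementation: the diagonal covariance, the two-dimensional embeddings, the topic names, and the function names are all assumptions made for the example.

```python
import math


def diag_gaussian_logpdf(x, mean, var):
    """Log-density of x under a Gaussian with diagonal covariance.

    Diagonal covariance is a simplifying assumption for this sketch.
    """
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def most_likely_topic(embedding, topics):
    """Return the topic whose Gaussian assigns the embedding the highest density."""
    return max(
        topics,
        key=lambda name: diag_gaussian_logpdf(embedding, *topics[name]),
    )


# Hypothetical topics, each given as (mean, per-dimension variance).
topics = {
    "finance": ([1.0, 0.0], [0.5, 0.5]),
    "sports": ([-1.0, 1.0], [0.5, 0.5]),
}

# Hypothetical embedding of a word that was never seen during training.
oov_embedding = [0.9, 0.1]
best = most_likely_topic(oov_embedding, topics)
print(best)
```

Because the score depends only on the embedding vector, a new word needs no entry in a fixed vocabulary table, only an embedding.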