Venue: EACL Workshop on Semantic Analysis in Social Media 2012

Unsupervised Part-of-Speech Tagging in Noisy and Esoteric Domains with a Syntactic-Semantic Bayesian HMM

Abstract

Unsupervised part-of-speech (POS) tagging has recently been shown to greatly benefit from Bayesian approaches where HMM parameters are integrated out, leading to significant increases in tagging accuracy. These improvements in unsupervised methods are especially important in specialized social media domains such as Twitter, where little training data is available. Here, we take the Bayesian approach one step further by integrating semantic information from an LDA-like topic model with an HMM. Specifically, we present Part-of-Speech LDA (POSLDA), a syntactically and semantically consistent generative probabilistic model. This model discovers POS-specific topics from an unlabelled corpus. We show that this model consistently achieves improvements in unsupervised POS tagging and language modeling over the Bayesian HMM approach with varying amounts of side information in the noisy and esoteric domain of Twitter.
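
The abstract does not spell out the generative process, so the following is only a toy sketch of one way an HMM over word classes could be combined with an LDA-like topic mixture. The split into "content" and "function" classes, the uniform initial class, the Dirichlet hyperparameters, and every function name here are assumptions made for illustration, not details taken from the paper.

```python
import random


def sample_categorical(probs, rng):
    """Draw an index from a discrete distribution given as a list of probabilities."""
    r = rng.random()
    cumulative = 0.0
    for index, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return index
    return len(probs) - 1  # guard against floating-point round-off


def sample_dirichlet(alpha, size, rng):
    """Draw from a symmetric Dirichlet by normalising independent Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(length, n_classes, n_content, n_topics, vocab_size, seed=0):
    """Generate (class, topic, word) triples for one toy document.

    ASSUMPTION: classes 0..n_content-1 are "content" classes whose word
    distribution depends on both class and topic; the remaining classes are
    "function" classes whose word distribution depends on the class only.
    """
    rng = random.Random(seed)
    # HMM part: one transition distribution per class.
    transitions = [sample_dirichlet(1.0, n_classes, rng) for _ in range(n_classes)]
    # Topic part: one topic mixture for this document.
    theta = sample_dirichlet(0.5, n_topics, rng)
    # Emissions: per (class, topic) for content classes, per class otherwise.
    content_emit = [[sample_dirichlet(0.5, vocab_size, rng) for _ in range(n_topics)]
                    for _ in range(n_content)]
    function_emit = [sample_dirichlet(0.5, vocab_size, rng)
                     for _ in range(n_classes - n_content)]

    tokens = []
    current = rng.randrange(n_classes)  # ASSUMPTION: uniform initial class
    for _ in range(length):
        if current < n_content:
            # Content class: pick a topic from the document mixture first.
            topic = sample_categorical(theta, rng)
            word = sample_categorical(content_emit[current][topic], rng)
        else:
            # Function class: the word depends on the class alone.
            topic = None
            word = sample_categorical(function_emit[current - n_content], rng)
        tokens.append((current, topic, word))
        current = sample_categorical(transitions[current], rng)
    return tokens
```

Calling, for example, `generate_document(20, 5, 2, 3, 50, seed=1)` yields 20 triples in which only the two assumed content classes carry a topic index. Unsupervised tagging would run this in reverse, inferring the hidden classes and topics from observed words, which this sketch does not attempt.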
