
Drop-out Conditional Random Fields for Twitter with Huge Mined Gazetteer

Abstract

In named entity recognition, especially on massive data such as Twitter, a large supply of high-quality gazetteers can alleviate the scarcity of training data. Large gazetteers with high coverage can be collected from knowledge graphs and phrase embeddings. However, large gazetteers cause a side effect called "feature under-training", in which the gazetteer features overwhelm the context features. To resolve this problem, we propose dropout conditional random fields, which decrease the influence of gazetteer features that carry a high weight. Our experiments on named entity recognition with Twitter data yield an F1 score of 69.38%, about 4% better than the strong baseline presented in Smith and Osborne (2006).
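
The abstract does not spell out how the dropout is applied, so the following is only a minimal sketch of one plausible reading: during training, gazetteer features are randomly removed from each token's feature set so that the model is pushed to rely on context features as well. The function name, the "gaz:" prefix convention, and the drop probability are all assumptions made for illustration; none of them come from the paper.

```python
import random


def drop_gazetteer_features(features, drop_prob, rng, prefix="gaz:"):
    """Return a copy of one token's features with gazetteer features randomly removed.

    Hypothetical helper, not the authors' implementation. Features whose
    names start with `prefix` are treated as gazetteer features and are each
    dropped independently with probability `drop_prob`. All other (context)
    features are always kept.
    """
    kept = {}
    for name, value in features.items():
        if name.startswith(prefix) and rng.random() < drop_prob:
            continue  # drop this gazetteer feature for this training pass
        kept[name] = value
    return kept


# Illustrative token features: two context features and one gazetteer hit.
token_features = {"word:paris": 1.0, "prev:in": 1.0, "gaz:LOCATION": 1.0}

rng = random.Random(0)  # seeded so repeated runs behave the same way
train_view = drop_gazetteer_features(token_features, drop_prob=0.5, rng=rng)
```

In this sketch the dropout would be applied only when building training examples; at prediction time the full feature set, gazetteer hits included, would be passed to the model unchanged.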
