Source: ACM Journal of Data and Information Quality

A Probabilistically Integrated System for Crowd-Assisted Text Labeling and Extraction


Abstract

The amount of text data has grown exponentially in recent years, giving rise to automatic information extraction methods that store text annotations in a database. The output of current state-of-the-art structured prediction methods, however, is likely to contain errors, so it is important to be able to manage the overall uncertainty of the database. At the same time, the advent of crowdsourcing has enabled humans to aid machine algorithms at scale. In this article, we introduce pi-CASTLE, a system that optimizes and integrates human and machine computing as applied to a complex structured prediction problem involving Conditional Random Fields (CRFs). We propose strategies grounded in information theory to select a token subset, formulate questions for the crowd to label, and integrate these labelings back into the database using a method of constrained inference. On both a text segmentation task over academic citations and a named entity recognition task over tweets, we show an order of magnitude improvement in accuracy gain over baseline methods.
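To make the idea of information-theoretic token selection concrete, the following is a minimal sketch and not the pi-CASTLE implementation. It assumes that a CRF has already produced a marginal label distribution for each token, and it ranks tokens by Shannon entropy so that the most uncertain ones would be sent to the crowd first. The function names `token_entropy` and `select_tokens`, the label set, and the probabilities are all hypothetical; the abstract does not state which selection criterion the system actually uses.

```python
import math

def token_entropy(marginal):
    """Shannon entropy (in bits) of one token's label distribution."""
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)

def select_tokens(marginals, k):
    """Return the indices of the k tokens with the highest marginal entropy."""
    ranked = sorted(
        range(len(marginals)),
        key=lambda i: token_entropy(marginals[i]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical per-token marginals for a four-token citation fragment
# (illustrative values only, not taken from the article).
marginals = [
    {"AUTHOR": 0.98, "TITLE": 0.01, "VENUE": 0.01},
    {"AUTHOR": 0.50, "TITLE": 0.50, "VENUE": 0.00},
    {"AUTHOR": 0.05, "TITLE": 0.90, "VENUE": 0.05},
    {"AUTHOR": 0.34, "TITLE": 0.33, "VENUE": 0.33},
]

# Indices of the two most uncertain tokens, most uncertain first.
print(select_tokens(marginals, 2))
```

Under these assumed values, the near-uniform fourth token and the evenly split second token rank highest, while the confidently labeled first token would never be shown to the crowd.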
