Published in: Conference on Empirical Methods in Natural Language Processing

Estimation of Discourse Segmentation Labels from Crowd Data

Abstract

For annotation tasks involving independent judgments, probabilistic models have been used to infer ground truth labels from data where a crowd of many annotators labels the same items. Such models have been shown to produce results superior to taking the majority vote, but have not been applied to sequential data. We present two methods to infer ground truth labels from sequential annotations where we assume judgments are not independent, based on the observation that an annotator's segments all tend to be several utterances long. The data consists of crowd labels for annotation of discourse segment boundaries. The new methods extend Hidden Markov Models to relax the independence assumption. The two methods are distinct, so positive labels proposed by both are taken to be ground truth. In addition, results of the models are checked using metrics that test whether an annotator's accuracy relative to a given model remains consistent across different conversations.
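The following minimal Python sketch illustrates two ideas mentioned in the abstract: the majority-vote baseline and the rule that only positive labels proposed by both methods are kept. It does not reproduce the paper's HMM-based models. The function names, the binary encoding (1 = segment boundary after an utterance), and the toy data are assumptions made for illustration only.

```python
from typing import List


def majority_vote(crowd: List[List[int]]) -> List[int]:
    """Baseline: label an utterance 1 if more than half the annotators did.

    crowd[a][i] is annotator a's binary label for utterance i
    (assumed encoding: 1 = segment boundary, 0 = no boundary).
    """
    n_annotators = len(crowd)
    n_items = len(crowd[0])
    return [
        1 if 2 * sum(row[i] for row in crowd) > n_annotators else 0
        for i in range(n_items)
    ]


def intersect_positives(labels_a: List[int], labels_b: List[int]) -> List[int]:
    """Keep a positive label only where both methods propose one."""
    return [1 if a == 1 and b == 1 else 0 for a, b in zip(labels_a, labels_b)]


# Toy data (hypothetical): three annotators, five utterances.
crowd = [
    [0, 1, 0, 0, 1],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 0, 1],
]
baseline = majority_vote(crowd)

# Hypothetical outputs of two different inference methods.
method_1 = [0, 1, 0, 1, 1]
method_2 = [0, 1, 1, 0, 1]
consensus = intersect_positives(method_1, method_2)
```

Taking the intersection trades recall for precision: a boundary is kept only when two distinct methods agree on it.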
