Conference: Annual Meeting of the Association for Computational Linguistics

TWEETQA: A Social Media Focused Question Answering Dataset

Abstract

As social media grows in popularity as a channel where news and real-time events are reported, automated question answering systems become critical to the effectiveness of many applications that rely on real-time knowledge. While previous datasets have concentrated on question answering (QA) for formal text such as news and Wikipedia, we present the first large-scale dataset for QA over social media data. To ensure that the tweets we collect are useful, we gather only tweets used by journalists to write news articles. We then ask human annotators to write questions and answers about these tweets. Unlike other QA datasets such as SQuAD, in which the answers are extractive, we allow the answers to be abstractive. We show that two recently proposed neural models that perform well on formal texts are limited in their performance when applied to our dataset. In addition, even the fine-tuned BERT model still lags behind human performance by a large margin. Our results thus point to the need for improved QA systems targeting social media text.
