Conference: Mexican International Conference on Artificial Intelligence

A New Corpus of the Russian Social Network News Feed Paraphrases: Corpus Construction and Linguistic Feature Analysis

Abstract

In this paper we present a new Russian paraphrase corpus derived from a social network news feed and conduct a preliminary analysis of it. Most media agencies post their news reports on their social network pages, and the headlines of these posts are often the same as those of the corresponding news articles on the agencies' official websites. Sometimes, however, the two headlines differ, and in such cases the headline from the social network can be considered a compression or a paraphrase of the original headline. In other words, such news feeds are a rich source of textual entailment and, as shown in this paper, of various linguistic phenomena, e.g., irony, presupposition, and attention-attracting markers. We collect such pairs of headlines and construct the Russian social network news feed paraphrase corpus from them. We test a paraphrase detection model trained on another existing Russian paraphrase corpus, ParaPhraser.ru, which was collected from official news headlines only, against the constructed dataset, and explore the linguistic and pragmatic features of the new corpus.