Proceedings of the ACM special interest group for information technology education conference

Analysis and Extraction of Sentence-Level Paraphrase Sub-Corpus in CS Education

Abstract

Since the advent of the Internet, plagiarism has become a widespread problem in student submissions. Paraphrasing is one of several types of plagiarism that students use to mask the original source. In this work, we construct a sub-corpus of paraphrased sentences by extracting all lightly and heavily revised sentences from the Corpus of Plagiarized Short Answers, using modified criteria for sentences. We then apply document similarity measures to this sub-corpus and derive some interesting features of it. Our findings suggest that this sub-corpus is better suited for testing paraphrase detection techniques because it provides sentence-level paraphrasing samples instead of the file-level classification provided in the original corpus. Additional sentence samples may also be added to this sub-corpus to achieve variety and scale.
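
The abstract does not name the similarity measures that were applied, so the following is only a minimal sketch of what scoring a sentence pair could look like. It assumes two common measures, word-level Jaccard overlap and cosine similarity over raw term counts; the function names and the sample sentence pair are hypothetical and are not taken from the paper or from the Corpus of Plagiarized Short Answers.

```python
import math
import re
from collections import Counter


def tokenize(sentence: str) -> list[str]:
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def jaccard(a: str, b: str) -> float:
    """Fraction of distinct words shared by the two sentences."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 1.0  # two empty sentences are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def cosine(a: str, b: str) -> float:
    """Cosine similarity between raw term-count vectors."""
    count_a, count_b = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(count_a[w] * count_b[w] for w in count_a.keys() & count_b.keys())
    norm_a = math.sqrt(sum(c * c for c in count_a.values()))
    norm_b = math.sqrt(sum(c * c for c in count_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence has no direction to compare
    return dot / (norm_a * norm_b)


# Hypothetical sentence pair, invented for illustration (not from the corpus).
source = "Inheritance allows a class to reuse the behaviour of another class."
revised = "With inheritance, one class can reuse behaviour defined in another class."

if __name__ == "__main__":
    print(f"Jaccard: {jaccard(source, revised):.3f}")
    print(f"Cosine:  {cosine(source, revised):.3f}")
```

Scoring at the sentence level rather than per file is what makes such measures usable on a sentence-level sub-corpus: each source and revised pair gets its own score, which could then be compared across the lightly and heavily revised categories.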