Journal: Journal of the American Society for Information Science

Methods for Identifying Versioned and Plagiarized Documents

Abstract

The widespread use of on-line publishing of text promotes storage of multiple versions of documents and mirroring of documents in multiple locations, and greatly simplifies the task of plagiarizing the work of others. We evaluate two families of methods for searching a collection to find documents that are coderivative, that is, are versions or plagiarisms of each other. The first, the ranking family, uses information retrieval techniques; extending this family, we propose the identity measure, which is specifically designed for identification of coderivative documents. The second, the fingerprinting family, uses hashing to generate a compact document description, which can then be compared to the fingerprints of the documents in the collection. We introduce a new method for evaluating the effectiveness of these techniques, and demonstrate it in practice. Using experiments on two collections, we demonstrate that the identity measure and the best fingerprinting technique are both able to accurately identify coderivative documents. However, for fingerprinting, parameters must be carefully chosen, and even so the identity measure is clearly superior.
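The fingerprinting idea described in the abstract (hash pieces of a document into a compact description, then compare descriptions) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the chunks are word trigrams, the hash is a truncated MD5, chunks are selected by a modulus test, and fingerprints are compared by set overlap. The names `fingerprint`, `resemblance`, `n`, and `modulus` are hypothetical.

```python
import hashlib
import re


def fingerprint(text, n=3, modulus=4):
    """Return a set of selected hashes of word n-grams.

    Assumed scheme, for illustration only: every run of n consecutive
    words is hashed, and a hash is kept when it is divisible by
    `modulus`. With modulus=1 every chunk is kept.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    selected = set()
    for i in range(len(words) - n + 1):
        chunk = " ".join(words[i:i + n])
        digest = hashlib.md5(chunk.encode("utf-8")).digest()
        value = int.from_bytes(digest[:8], "big")  # keep 64 bits of the hash
        if value % modulus == 0:
            selected.add(value)
    return selected


def resemblance(fp_a, fp_b):
    """Overlap of two fingerprints: shared hashes divided by all hashes."""
    if not fp_a or not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)


if __name__ == "__main__":
    original = "the quick brown fox jumps over the lazy dog"
    revised = "the quick brown fox leaps over the lazy dog"
    unrelated = "information retrieval systems rank documents by similarity"

    fp_original = fingerprint(original, modulus=1)
    print(resemblance(fp_original, fingerprint(revised, modulus=1)))
    print(resemblance(fp_original, fingerprint(unrelated, modulus=1)))
```

The sketch exposes `n` and `modulus` as parameters because the abstract notes that fingerprinting parameters must be chosen carefully: a larger modulus gives a smaller fingerprint, but a short document may then have no selected chunks at all.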