Fast Plagiarism Detection System

Abstract

We have developed a new fast algorithm for plagiarism detection. Our method is based on indexing the code database with a suffix array, which allows rapid retrieval of blocks of code that are similar to the query file. This idea makes rapid pairwise file comparison possible. Evaluation shows that this algorithm's quality is not worse than the quality of existing widely used methods, while its speed performance is much higher. For the all-against-all problem our method achieves O(γn) (with suffix arrays) or O(n) (with suffix trees) average time for the comparison phase. Traditional methods, such as JPlag, need at least O((n/N)²N²) = O(n²) average time for the same task. In addition, computing the similarity matrix takes O(N²) additional time, and this cannot be improved, as it is also the size of the output.
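The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it describes: index the token streams of all known files with a suffix array, then look up the longest indexed block that matches each position of a query file. The function names (`build_index`, `longest_match`, `similar_files`), the whitespace-style token lists, the per-file sentinel token, the `min_len` threshold, and the coverage-based score are all assumptions made for illustration. The suffix array is built by naive sorting, which is far slower than the construction a real system would use.

```python
def build_index(files):
    """Index token lists (dict: file name -> list of str tokens).

    Returns (tokens, owners, sa): the concatenated token stream, the file
    that owns each position, and a suffix array over the stream.
    """
    tokens, owners = [], []
    for name, toks in files.items():
        tokens.extend(toks)
        owners.extend([name] * len(toks))
        # Hypothetical sentinel: unique per file, so no match spans two files.
        tokens.append("\x00END:" + name)
        owners.append(None)
    # Naive construction for clarity: sort start positions by their suffix.
    sa = sorted(range(len(tokens)), key=lambda i: tokens[i:])
    return tokens, owners, sa


def longest_match(tokens, sa, query, start):
    """Return (length, position) of the longest indexed block equal to a
    prefix of query[start:]. Length 0 means no token matched."""
    pattern = query[start:]
    lo, hi = 0, len(sa)
    while lo < hi:  # binary search for the insertion point of the pattern
        mid = (lo + hi) // 2
        if tokens[sa[mid]:] < pattern:
            lo = mid + 1
        else:
            hi = mid
    best = (0, -1)
    # The suffix sharing the longest prefix is adjacent to the insertion point.
    for k in (lo - 1, lo):
        if 0 <= k < len(sa):
            pos, n = sa[k], 0
            while (n < len(pattern) and pos + n < len(tokens)
                   and tokens[pos + n] == pattern[n]):
                n += 1
            if n > best[0]:
                best = (n, pos)
    return best


def similar_files(tokens, owners, sa, query, min_len=3):
    """Greedily cover the query with matched blocks of at least min_len
    tokens and return, per indexed file, the fraction of the query covered."""
    matched = {}
    j = 0
    while j < len(query):
        length, pos = longest_match(tokens, sa, query, j)
        if length >= min_len:
            name = owners[pos]
            matched[name] = matched.get(name, 0) + length
            j += length
        else:
            j += 1
    return {name: count / len(query) for name, count in matched.items()}
```

As a usage example, indexing `{"a.c": "int x = 0 ; x ++ ;".split(), "b.c": "float y = 1 ; return y ;".split()}` and querying with `"int x = 0 ; return y ;".split()` attributes the first five query tokens to `a.c` and the last three to `b.c`. Each lookup costs one binary search over the index instead of a scan of every stored file, which is the property the abstract relies on for fast retrieval.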