Conference: Program Comprehension, 2009. ICPC '09

Syntax tree fingerprinting for source code similarity detection

Abstract

Numerous approaches based on metrics, token-sequence pattern matching, abstract syntax tree (AST) analysis, or program dependency graph (PDG) analysis have already been proposed to highlight similarities in source code. In this paper we present a simple and scalable architecture based on AST fingerprinting. Building on a study of several hashing strategies that reduce false-positive collisions, we propose a framework that efficiently indexes AST representations in a database, quickly detects exact (w.r.t. source code abstraction) clone clusters, and easily retrieves their corresponding ASTs. Our aim is to allow further processing of neighboring exact matches in order to identify larger approximate matches, dealing with the common modification patterns seen in intra-project copy-pastes and in plagiarism cases.
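As a rough illustration of the fingerprinting idea, the sketch below hashes every subtree of a Python AST bottom-up and groups identical fingerprints into exact clone clusters. It is not the paper's framework: the helper names `fingerprint_subtrees` and `clone_clusters`, the `min_size` threshold, the use of SHA-1 from `hashlib`, and the abstraction chosen (keep node types, drop identifier names and literal values) are all assumptions made for illustration. An in-memory dictionary stands in for the database index.

```python
import ast
import hashlib
from collections import defaultdict


def fingerprint_subtrees(source: str, min_size: int = 5):
    """Hypothetical helper: map each subtree fingerprint to the
    (start_line, end_line) ranges where that subtree occurs."""
    index = defaultdict(list)  # stands in for the database index

    def visit(node):
        # Post-order traversal: fingerprint the children first.
        child_digests = []
        size = 1
        for child in ast.iter_child_nodes(node):
            digest, child_size = visit(child)
            child_digests.append(digest)
            size += child_size
        # Only the node type and the child digests are hashed, so string
        # attributes such as Name.id, FunctionDef.name, arg.arg, and
        # Constant.value are abstracted away (an assumed normalization).
        canonical = type(node).__name__ + "(" + ",".join(child_digests) + ")"
        digest = hashlib.sha1(canonical.encode()).hexdigest()
        # Skip tiny subtrees and nodes without positions (Module, Load, ...).
        if size >= min_size and hasattr(node, "lineno"):
            index[digest].append((node.lineno, node.end_lineno))
        return digest, size

    visit(ast.parse(source))
    return index


def clone_clusters(index):
    """An exact clone cluster is any fingerprint seen at two or more places."""
    return {h: locs for h, locs in index.items() if len(locs) > 1}


if __name__ == "__main__":
    code = (
        "\ndef area(w, h):\n    return w * h\n"
        "\ndef surface(a, b):\n    return a * b\n"
        "\ndef total(x, y):\n    return x + y\n"
    )
    for digest, locs in clone_clusters(fingerprint_subtrees(code)).items():
        print(digest[:8], locs)
```

Running the demo groups `area` and `surface` (lines 2-3 and 5-6) into one cluster despite their different names, while `total` stays out because `+` and `*` hash differently. Because every subtree is indexed, the `return` statements and `*` expressions nested inside that clone form clusters of their own; a fuller tool would keep only maximal matches and then merge neighboring exact matches into approximate ones, which is the further processing the abstract aims at.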