
Confusion Detection in Code Reviews



Abstract

Code reviews are an important mechanism for assuring the quality of source code changes. Reviewers can either add general comments pertaining to the entire change or pinpoint concerns or shortcomings about a specific part of the change using inline comments. Recent studies show that reviewers often do not understand the change being reviewed and its context. Our ultimate goal is to identify the factors that confuse code reviewers and to understand how confusion impacts the efficiency and effectiveness of code review(er)s. As a first step towards this goal, we focus on identifying confusion in developers' comments. Based on an existing theoretical framework categorizing expressions of confusion, we manually classify 800 comments from code reviews of the Android project. We observe that humans can identify confusion reasonably well: raters achieve moderate agreement (Fleiss' kappa of 0.59 for general comments and 0.49 for inline ones). Then, for each kind of comment, we build a series of automatic classifiers that, depending on the goals of further analysis, can be trained to achieve high precision (0.875 for general comments and 0.615 for inline ones), high recall (0.944 for general comments and 0.988 for inline ones), or substantial precision and recall (0.696 and 0.542 for general comments, and 0.434 and 0.583 for inline ones, respectively). These results motivate further research on the impact of confusion on the code review process. Moreover, other researchers can employ the proposed classifiers to analyze confusion in other contexts where software development-related discussions occur, such as mailing lists.
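The abstract reports inter-rater agreement as Fleiss' kappa. As an illustrative sketch (not from the paper itself; the toy rating counts below are invented), this is how that statistic is computed from a table of per-item category counts:

```python
# Illustrative sketch (not the paper's data): Fleiss' kappa for inter-rater
# agreement on a confused / not-confused labeling task.

def fleiss_kappa(counts):
    """counts[i][j] = number of raters assigning item i to category j.
    Assumes every item is rated by the same number of raters."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    # Per-item agreement: fraction of agreeing rater pairs for each item.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items
    # Expected chance agreement from the marginal category proportions.
    total = n_items * n_raters
    p_j = [sum(row[j] for row in counts) / total for j in range(len(counts[0]))]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)

# Four comments, three raters, categories: [confused, not confused].
ratings = [[3, 0], [0, 3], [2, 1], [1, 2]]
print(round(fleiss_kappa(ratings), 4))  # → 0.3333
```

Values around 0.4-0.6, like the paper's 0.59 and 0.49, are conventionally read as moderate agreement.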
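The abstract also notes that the same classifier family can be trained toward high precision or high recall. A minimal sketch of one common way to realize that trade-off (assumed here for illustration; the paper does not specify this mechanism, and the scores and labels below are invented toy data) is to move the decision threshold on the classifier's confidence score:

```python
# Illustrative sketch: tuning a score-based confusion detector toward
# precision or recall by shifting its decision threshold.

def precision_recall(scores, labels, threshold):
    """Treat score >= threshold as 'confusion detected' (positive class)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]  # model confidence of confusion
labels = [1, 1, 0, 1, 0, 0]              # ground-truth confusion labels

# A strict threshold flags few comments, favoring precision;
# a lenient threshold flags many, favoring recall.
print(precision_recall(scores, labels, 0.85))
print(precision_recall(scores, labels, 0.25))
```

Which operating point to prefer depends on the downstream analysis: high precision when false alarms are costly, high recall when missing a confused comment is costly.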


