Conference: International Conference on Web Research

Automatic Duplicate Bug Report Detection using Information Retrieval-based versus Machine Learning-based Approaches



Abstract

Many software repositories exist today, especially on the web, and automating them poses many challenges. Duplicate bug report detection (DBRD) has been a significant problem for software triage systems such as Bugzilla, an essential online software repository, since 2004. There are two main approaches to automatic DBRD: information retrieval (IR)-based and machine learning (ML)-based. Many related works use one or the other, but it is not clear which is more useful and performs better. This study introduces a methodology for comparing the validation performance of the two approaches under a specific condition. The Android dataset is used for evaluation, and about 2 million pairs of bug reports are analyzed for 59 duplicate bug reports. The results show that the ML-based approach has better validation performance, by roughly 40%. In addition, the ML-based approach can be evaluated with more reliable criteria such as accuracy, precision, and recall, whereas the IR-based approach offers only mean average precision (MAP) or rank-based metrics.
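
The contrast between the two families of evaluation criteria named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the toy labels, and the convention of normalizing average precision by the number of duplicates found in the ranked list are all assumptions made for illustration. The first function scores pairwise duplicate/non-duplicate predictions, as an ML-based classifier would produce; the other two score ranked candidate lists, as an IR-based system would produce.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, and recall for binary pair labels (1 = duplicate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def average_precision(ranked_relevance: Sequence[int]) -> float:
    """Average precision for one query report.

    ranked_relevance[i] is 1 if the candidate at rank i + 1 is a true duplicate.
    Assumption: every true duplicate appears somewhere in the ranked list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries: Sequence[Sequence[int]]) -> float:
    """Mean of the per-query average precision values."""
    if not queries:
        return 0.0
    return sum(average_precision(q) for q in queries) / len(queries)


if __name__ == "__main__":
    # Toy pairwise labels (hypothetical, not taken from the Android dataset).
    print(classification_metrics([1, 0, 1, 0], [1, 0, 0, 1]))
    # Toy ranked lists for two query reports (hypothetical).
    print(mean_average_precision([[1, 0, 0], [0, 1, 0, 1]]))
```

The classification metrics depend on a decision for every pair, while MAP depends only on where the true duplicates fall in each ranking, which is why the two approaches are not directly comparable without a shared methodology.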
