Conference: IEEE International Conference on Software Maintenance and Evolution

Bug or Not? Bug Report Classification Using N-Gram IDF

Abstract

Previous studies have found that a significant number of bug reports are misclassified between bugs and non-bugs, and that manually classifying bug reports is a time-consuming task. To address this problem, we propose a bug report classification model based on N-gram IDF, a theoretical extension of Inverse Document Frequency (IDF) for handling words and phrases of any length. N-gram IDF enables us to extract key terms of any length from texts; these key terms can be used as features to classify bug reports. We build classification models with logistic regression and random forest using features from N-gram IDF and from topic modeling, which is widely used in various software engineering tasks. On a publicly available dataset, our results show that our N-gram IDF-based models outperform the topic-based models in all of the evaluated cases. Our models show promising results and have the potential to be extended to other software engineering tasks.
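
To make the idea of scoring phrases of any length more concrete, the following is a minimal sketch of ordinary IDF, log(N / df), applied to word n-grams. It is not the N-gram IDF weighting the abstract refers to, which is a separate theoretical extension, and it does not reproduce the authors' pipeline or classifiers. The function names, the `max_n` cutoff, and the four toy report summaries are all assumptions made up for illustration.

```python
import math
from collections import Counter


def word_ngrams(text, max_n=3):
    """Return the set of word n-grams (1 <= n <= max_n) found in a text."""
    tokens = text.lower().split()
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def ngram_document_idf(docs, max_n=3):
    """Plain IDF, log(N / df), for every word n-gram that occurs in docs.

    Assumption: this is standard IDF over n-grams, used here only as a
    stand-in; it is NOT the N-gram IDF scheme named in the abstract.
    """
    df = Counter()
    for doc in docs:
        # Each document contributes at most 1 to an n-gram's document frequency.
        df.update(word_ngrams(doc, max_n))
    n_docs = len(docs)
    return {gram: math.log(n_docs / count) for gram, count in df.items()}


# Toy, made-up report summaries (not taken from the paper's dataset).
reports = [
    "null pointer exception when saving file",
    "null pointer exception on startup",
    "add dark mode option",
    "please add export option",
]
idf = ngram_document_idf(reports)
```

In a sketch like this, each report could then be turned into a feature vector indexed by selected n-grams and passed to a classifier such as logistic regression or random forest, which are the two model families the abstract names.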
