Journal: International Journal of Open Source Software & Processes

Automatically Labelled Software Topic Model



Abstract

Public software repositories (SR) maintain a massive amount of valuable data, offering opportunities to support software engineering (SE) tasks. Researchers have applied information retrieval techniques to mining software repositories, and topic models are one such technique. However, topic models provide neither an interpretation nor labels for the extracted topics, so manual analysis is required to identify them. Some approaches have been proposed to label topics automatically using tags in SR, but they do not account for spam tags and have difficulty scaling to a large tag space. This article introduces a novel approach, the automatically labelled software topic model (AL-STM), which labels topics based on the tags observed in SR. It mitigates the shortcomings of both manual and automatic topic labelling in SE. AL-STM is implemented using 22K GitHub projects and evaluated on an SE task (tag recommendation) against currently used techniques. The empirical results suggest that AL-STM is more robust in terms of mean average precision (MAP) and normalized discounted cumulative gain (nDCG), and more scalable to a large tag space.
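For readers unfamiliar with the two ranking metrics named in the abstract, the following is a minimal sketch of how MAP and nDCG are commonly computed for a tag recommendation task. It is not the authors' evaluation code: the abstract does not describe the exact protocol, so binary relevance (a recommended tag either is or is not one of the project's true tags) is assumed, each ranked list is assumed to contain no duplicate tags, and the function names and example tags are hypothetical.

```python
import math
from typing import Sequence, Set


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked tag list against the true tags.

    Assumes binary relevance and no duplicate tags in `ranked`.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, tag in enumerate(ranked, start=1):
        if tag in relevant:
            hits += 1
            # Precision at this rank, counted only where a true tag appears.
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(
    rankings: Sequence[Sequence[str]], relevants: Sequence[Set[str]]
) -> float:
    """MAP: the mean of average precision over all evaluated projects."""
    pairs = list(zip(rankings, relevants))
    if not pairs:
        return 0.0
    return sum(average_precision(r, t) for r, t in pairs) / len(pairs)


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """nDCG@k with binary gains: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, tag in enumerate(ranked[:k], start=1)
        if tag in relevant
    )
    # The ideal ranking places all true tags first, up to k positions.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical example: four recommended tags, two of which are correct.
    recommended = ["python", "web", "flask", "cli"]
    true_tags = {"python", "flask"}
    print(average_precision(recommended, true_tags))
    print(ndcg_at_k(recommended, true_tags, k=4))
```

Both metrics reward placing correct tags near the top of the recommendation list. MAP averages precision at each position where a correct tag occurs, while nDCG discounts each hit logarithmically by its rank and normalizes by the best achievable ordering.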
