IEEE/ACM International Conference on Mining Software Repositories

Boa Meets Python: A Boa Dataset of Data Science Software in Python Language

Abstract

The popularity of the Python programming language has surged in recent years due to its increasing use in Data Science. The availability of Python repositories on GitHub presents an opportunity for mining software repositories research, e.g., suggesting best practices for developing Data Science applications, identifying bug patterns, and recommending code enhancements. To enable this research, we have created a new dataset of 1,558 mature GitHub projects that develop Python software for Data Science tasks. By analyzing project metadata and code, we selected projects that use a diverse set of machine learning libraries and are managed by a variety of users and organizations. The dataset is publicly available through the Boa infrastructure, both as a collection of raw projects and in a processed form that can be used for large-scale analysis with the Boa language. We also present two initial applications that demonstrate the dataset's potential for the community.
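The abstract says projects were selected partly by analyzing their code for machine learning library usage, but it does not say how that analysis was done. The sketch below is a hypothetical illustration of that kind of check, not the authors' pipeline (their analysis runs on Boa). It uses Python's standard `ast` module to list which libraries from an assumed set a single source file imports. The set `ML_LIBRARIES` and the function name `imported_ml_libraries` are illustrative assumptions.

```python
import ast

# Hypothetical set of ML library top-level package names; the abstract
# does not list which libraries the authors actually looked for.
ML_LIBRARIES = {"sklearn", "tensorflow", "torch", "keras", "xgboost"}


def imported_ml_libraries(source: str) -> set[str]:
    """Return the libraries from ML_LIBRARIES imported by a Python source string."""
    found: set[str] = set()
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # e.g. `import torch.nn as nn` -> top-level package "torch"
            for alias in node.names:
                top = alias.name.split(".")[0]
                if top in ML_LIBRARIES:
                    found.add(top)
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports such as `from . import utils`
            if node.module and node.level == 0:
                top = node.module.split(".")[0]
                if top in ML_LIBRARIES:
                    found.add(top)
    return found


if __name__ == "__main__":
    sample = (
        "import numpy as np\n"
        "from sklearn.linear_model import LogisticRegression\n"
        "import torch.nn as nn\n"
    )
    print(sorted(imported_ml_libraries(sample)))  # ['sklearn', 'torch']
```

Working on the parsed syntax tree instead of matching text means commented-out imports and string literals are ignored. A per-project check would apply this to every `.py` file and take the union of the results.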