Conference: IEEE International Congress on Big Data

Large-Scale Heterogeneous Program Retrieval through Frequent Pattern Discovery and Feature Correlation Analysis

Abstract

In the era of big data, information retrieval is becoming more challenging: data volumes are growing quickly, and it is difficult to find the right information in large, heterogeneous datasets. The problem is especially hard in software engineering, where the right program must be retrieved from projects that are written in different languages and are not well developed. Prior work addressed this problem by extracting words from programs, an approach that cannot fully exploit the information in source code. In this paper, we propose a novel program retrieval method that extracts frequent patterns from programs and analyzes their correlations with the accompanying text. Experimental results on large-scale, heterogeneous datasets validate the effectiveness of the proposed approach. The inferred semantics of programs can significantly improve the accuracy of code artifact retrieval.
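The abstract does not say how frequent patterns are mined or how their correlation with text is measured, so the following is only a minimal sketch under stated assumptions. Contiguous token n-grams stand in for "frequent patterns", and pointwise mutual information (PMI) stands in for the correlation analysis. All function names and the sample data are hypothetical and are not taken from the paper.

```python
from collections import Counter
import math
import re


def tokenize(code):
    """Split source text into identifier, number, and single-symbol tokens."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def ngrams(tokens, n):
    """Return the set of contiguous n-token patterns found in one program."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def frequent_patterns(programs, n=3, min_support=2):
    """Keep n-grams that occur in at least min_support programs (assumed criterion)."""
    support = Counter()
    for code in programs:
        support.update(ngrams(tokenize(code), n))
    return {p: c for p, c in support.items() if c >= min_support}


def pattern_word_pmi(programs, descriptions, patterns, n=3):
    """PMI between a pattern appearing in a program and a word appearing in its text."""
    total = len(programs)
    wanted = set(patterns)
    pat_count, word_count, joint = Counter(), Counter(), Counter()
    for code, text in zip(programs, descriptions):
        pats = ngrams(tokenize(code), n) & wanted
        words = set(text.lower().split())
        pat_count.update(pats)
        word_count.update(words)
        joint.update((p, w) for p in pats for w in words)
    return {
        (p, w): math.log(c * total / (pat_count[p] * word_count[w]))
        for (p, w), c in joint.items()
    }


# Hypothetical toy corpus: two loop-based programs and one unrelated program.
programs = [
    "for (int i = 0; i < n; i++) sum += a[i];",
    "for (int j = 0; j < m; j++) total += b[j];",
    "print('hello')",
]
descriptions = ["sum array loop", "loop over array", "greeting message"]

patterns = frequent_patterns(programs)
pmi = pattern_word_pmi(programs, descriptions, patterns)
```

In this toy corpus the pattern `("for", "(", "int")` appears in both loop programs and co-occurs with the word "loop", so their PMI is positive. A retrieval system could use such scores to match a text query to programs that share no words with it, though whether the paper does so this way is not stated in the abstract.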