
Large-Scale Heterogeneous Program Retrieval through Frequent Pattern Discovery and Feature Correlation Analysis


Abstract

In the era of big data, information retrieval is becoming more challenging: data volumes are growing quickly, and it is difficult to find the right information in huge, heterogeneous datasets. The problem is especially hard in software engineering, where the right program must be retrieved from projects that are written in different languages and are often poorly developed. Prior work addressed this problem by extracting words from programs, an approach that cannot fully exploit the information in source code. In this paper, we propose a novel program retrieval method that extracts frequent patterns from programs and analyzes their correlations with the accompanying text. Experimental results on large-scale, heterogeneous datasets validate the effectiveness of the proposed approach. The inferred semantics of programs can significantly improve the accuracy of code artifact retrieval.
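The abstract does not say which mining algorithm or correlation measure the method uses, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that a "pattern" is a contiguous token bigram, that a pattern is "frequent" when it occurs in at least `min_support` programs, and that correlation with the accompanying text is measured by pointwise mutual information (PMI). The function names, the toy programs, and the toy descriptions are all hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n=2):
    """Set of contiguous token n-grams in one program (assumed pattern form)."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def frequent_patterns(programs, n=2, min_support=2):
    """Patterns that occur in at least min_support programs."""
    support = Counter()
    for tokens in programs:
        support.update(ngrams(tokens, n))  # each pattern counted once per program
    return {g for g, count in support.items() if count >= min_support}


def pattern_word_pmi(programs, texts, patterns, n=2):
    """PMI between each frequent pattern and each co-occurring description word."""
    total = len(programs)
    pattern_df, word_df, joint = Counter(), Counter(), Counter()
    for tokens, text in zip(programs, texts):
        grams = ngrams(tokens, n) & patterns
        words = set(text.lower().split())
        pattern_df.update(grams)
        word_df.update(words)
        joint.update((g, w) for g in grams for w in words)
    return {
        (g, w): math.log(count * total / (pattern_df[g] * word_df[w]))
        for (g, w), count in joint.items()
    }


def rank(query, programs, patterns, pmi, n=2):
    """Order program indices by summed positive PMI with the query words."""
    query_words = set(query.lower().split())
    scored = []
    for index, tokens in enumerate(programs):
        grams = ngrams(tokens, n) & patterns
        score = sum(max(pmi.get((g, w), 0.0), 0.0)
                    for g in grams for w in query_words)
        scored.append((-score, index))  # ties fall back to the original order
    return [index for _, index in sorted(scored)]


# Hypothetical toy corpus: token streams plus their accompanying text.
programs = [
    "open file read line close".split(),
    "open file read line print".split(),
    "for i swap a b sort".split(),
    "for i swap a b compare".split(),
]
texts = [
    "read file lines",
    "print file contents",
    "sort array",
    "bubble sort numbers",
]

patterns = frequent_patterns(programs)
pmi = pattern_word_pmi(programs, texts, patterns)
print(rank("sort", programs, patterns, pmi))
```

In this sketch the fourth program never contains the token `sort`, yet it is ranked alongside the third for the query "sort" because it shares frequent patterns that co-occur with that word in the descriptions. That is the kind of benefit the abstract attributes to inferred semantics over plain word extraction.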


Bibliographic record

Source: IEEE International Congress on Big Data, 2014, pp. 780-781 (2 pages)
Authors: Liu Bo; Wu Liang; Dong Qiuxiang; Zhou Yuanchun
Original format: PDF
Keywords: Big data; Computational modeling; Correlation; Data mining; Feature extraction; Java; Semantics; Information retrieval


