A HYBRID METHOD OF FEATURE EXTRACTION AND NAIVE BAYES CLASSIFICATION FOR SPLITTING IDENTIFIERS

NAHLA ALANEE; MASRAH AZRIFAH AZMI MURAD

Abstract

Integrating natural language processing techniques into software systems has attracted the attention of many researchers. One form of such integration is analyzing the morphology of source code in order to extract meaningful information. Feature location is the process of identifying specific portions of the source code. Among the most important pieces of information in source code are its identifiers (e.g. Student). Unlike the words in traditional text processing, identifiers in source code are often multi-word, such as Employee-Name. The words in such identifiers are not separated by white space; instead, they are joined using special characters (e.g. Employee_ID), CamelCase (e.g. EmployeeName), or abbreviations (e.g. EmpNm). This makes extracting the constituent words more challenging. Several approaches have been proposed to address the problem of splitting multi-word identifiers. However, there is still room for improvement in terms of accuracy. Such improvement can be sought by using more robust features that are able to analyze the morphology of identifiers. Therefore, this study proposes a hybrid method of feature extraction and a Naïve Bayes classifier to split multi-word identifiers within source code. The dataset used in this study is an annotated benchmark containing a large amount of Java code. Multiple experiments were conducted to evaluate the proposed features both independently and in combination. The results show that the combination of all features performed best, achieving an F-measure of 64.7%. This finding suggests that the proposed features are useful for discriminating multi-word identifiers.
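The two explicit conventions named in the abstract, special-character separators and CamelCase, can be illustrated with a small rule-based splitter. This is only a minimal sketch of a baseline, not the authors' hybrid method: the function name `split_identifier` and the regular expressions are assumptions made for illustration, and the abstract does not describe the features or the classifier setup actually used.

```python
import re
from typing import List


def split_identifier(identifier: str) -> List[str]:
    """Split an identifier on separators and CamelCase boundaries.

    Hypothetical baseline for illustration; it does not expand
    abbreviations and cannot split same-case concatenations.
    """
    words: List[str] = []
    # Step 1: split on explicit separators such as "_" and "-".
    for chunk in re.split(r"[_\-\s]+", identifier):
        # Step 2: within each chunk, pick out acronym runs (e.g. "ID"),
        # capitalized or lowercase words (e.g. "Employee"), and digit runs.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk))
    return words


if __name__ == "__main__":
    for name in ["Employee_ID", "EmployeeName", "Employee-Name", "EmpNm"]:
        print(name, "->", split_identifier(name))
```

Rules like these handle only the cases with a visible boundary. An abbreviated identifier such as EmpNm is cut into `Emp` and `Nm` but not mapped back to "Employee" and "Name", and an identifier written in a single case offers no boundary at all, which is where learned features and a classifier become relevant.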

Bibliographic details

Source: Journal of Theoretical and Applied Information Technology, 2017, No. 7
Authors: NAHLA ALANEE; MASRAH AZRIFAH AZMI MURAD
Original format: PDF
Classification (Chinese Library Classification): Computing technology, computer technology
