Text Feature Gene Extraction Method for Imbalanced Big Datasets

Sun Jingtao; Zhang Qiuyu

Abstract

On imbalanced big datasets, traditional feature processing methods favor large classes and neglect small ones, which degrades classification performance. This paper therefore proposes a text feature gene extraction method. First, to account for the effect of an imbalanced class distribution on feature selection, a feature selection method based on a CHI statistical matrix combined with information entropy is used to strengthen the features of small classes. Second, building on the high-order correlations of multidimensional statistical data, a text feature gene extraction method based on independent component analysis (ICA) is designed to enhance the generalization ability of feature terms. Finally, the two methods are combined into a new text feature gene extraction method for imbalanced big datasets. Experimental results show that the proposed method achieves good early maturity and feature dimensionality reduction, and outperforms common feature selection algorithms in classifying small classes.
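
The record does not give the authors' formulas, so the following is only a minimal Python sketch of the general idea under stated assumptions. It builds a term-by-class CHI matrix from the standard 2×2 contingency-table χ² statistic, then multiplies each term's best CHI score by a concentration factor 1 − H(t)/log K, where H(t) is the entropy of the term's class-size-normalized document frequencies P(t|c) = A/N_c. That normalization is an assumed way of keeping small classes from being outweighed by raw counts from large ones; the function names `chi_matrix` and `entropy_weighted_chi` and the weighting itself are hypothetical, not the paper's method.

```python
import math
from collections import Counter


def chi_matrix(docs, labels):
    """Term-by-class CHI matrix using the 2x2 contingency-table chi-square statistic.

    docs: list of token lists; labels: one class label per document.
    Returns (matrix, doc_freq, class_sizes) where matrix[term][cls] is chi^2(term, cls).
    """
    n = len(docs)
    class_sizes = Counter(labels)
    doc_freq = {}  # term -> Counter(class -> number of documents in that class containing term)
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            doc_freq.setdefault(term, Counter())[label] += 1

    matrix = {}
    for term, per_class in doc_freq.items():
        term_total = sum(per_class.values())
        row = {}
        for cls, n_cls in class_sizes.items():
            a = per_class[cls]      # in class, contains term
            b = term_total - a      # outside class, contains term
            c = n_cls - a           # in class, lacks term
            d = n - n_cls - b       # outside class, lacks term
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            row[cls] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
        matrix[term] = row
    return matrix, doc_freq, class_sizes


def entropy_weighted_chi(docs, labels):
    """Hypothetical score: max-class CHI times (1 - normalized class entropy of the term)."""
    matrix, doc_freq, class_sizes = chi_matrix(docs, labels)
    k = len(class_sizes)
    scores = {}
    for term, row in matrix.items():
        # Assumption: use P(t|c) = A / N_c so a small class's rate counts as much as a large one's.
        rates = [doc_freq[term][cls] / class_sizes[cls] for cls in class_sizes]
        total = sum(rates)  # > 0 because every term occurs in at least one document
        entropy = -sum(r / total * math.log(r / total) for r in rates if r > 0)
        concentration = 1.0 - entropy / math.log(k) if k > 1 else 1.0
        scores[term] = max(row.values()) * concentration
    return scores


if __name__ == "__main__":
    docs = [["common", "big"]] * 8 + [["common", "rare"]] * 2
    labels = ["large"] * 8 + ["small"] * 2
    for term, score in sorted(entropy_weighted_chi(docs, labels).items(), key=lambda kv: -kv[1]):
        print(term, round(score, 3))
```

On this toy corpus of eight `large` documents and two `small` documents, a term that appears in every document scores 0, while a term confined to the small class scores as highly as one confined to the large class.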

Bibliographic Details

Source
Journal of University of Electronic Science and Technology of China (电子科技大学学报) | 2018, No. 1 | pp. 125-131 | 7 pages
Authors
Sun Jingtao; Zhang Qiuyu
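Affiliations

School of Computer Science, Xi'an University of Posts and Telecommunications, Xi'an 710121

School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050
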
Original format: PDF
Language: Chinese
CLC classification: TN393.098
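Keywords
CHI statistical selection method; imbalanced big datasets; independent component analysis; information entropy; text feature gene extraction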
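
The keywords also list independent component analysis (ICA), which the abstract names as the tool for exploiting high-order correlations. The record does not say how ICA is applied to the text features, so the sketch below only illustrates ICA itself on a toy problem: two synthetic independent sources are mixed linearly, whitened, and then separated by searching for the rotation that maximizes the total absolute excess kurtosis of the outputs. This exhaustive two-dimensional rotation search is a swapped-in, deliberately simple contrast-based ICA, not the authors' algorithm; `whiten_2d`, `ica_2d` and its `steps` parameter are hypothetical names. On a real term-document matrix one would more likely use an established implementation such as scikit-learn's `sklearn.decomposition.FastICA`.

```python
import math
import random


def _mean(xs):
    return sum(xs) / len(xs)


def whiten_2d(x1, x2):
    """Center two observed mixtures and whiten them (unit variance, zero covariance)."""
    m1, m2 = _mean(x1), _mean(x2)
    x1 = [v - m1 for v in x1]
    x2 = [v - m2 for v in x2]
    n = len(x1)
    a = sum(v * v for v in x1) / n
    c = sum(v * v for v in x2) / n
    b = sum(u * v for u, v in zip(x1, x2)) / n
    # Rotate onto the eigenvectors of the 2x2 covariance [[a, b], [b, c]] (PCA) ...
    phi = 0.5 * math.atan2(2 * b, a - c)
    cp, sp = math.cos(phi), math.sin(phi)
    p1 = [cp * u + sp * v for u, v in zip(x1, x2)]
    p2 = [-sp * u + cp * v for u, v in zip(x1, x2)]
    # ... then scale each principal component to unit variance (assumes non-singular mixing).
    s1 = math.sqrt(_mean([v * v for v in p1]))
    s2 = math.sqrt(_mean([v * v for v in p2]))
    return [v / s1 for v in p1], [v / s2 for v in p2]


def excess_kurtosis(ys):
    """E[y^4] - 3, valid for zero-mean, unit-variance data such as whitened signals."""
    return _mean([y ** 4 for y in ys]) - 3.0


def ica_2d(x1, x2, steps=180):
    """Separate two linear mixtures by maximizing |kurtosis| over rotations of whitened data."""
    z1, z2 = whiten_2d(x1, x2)
    best_score, best = -1.0, (z1, z2)
    for k in range(steps):
        # Rotations beyond pi/2 only permute or sign-flip the outputs, so [0, pi/2) suffices.
        theta = (math.pi / 2) * k / steps
        ct, st = math.cos(theta), math.sin(theta)
        y1 = [ct * u + st * v for u, v in zip(z1, z2)]
        y2 = [-st * u + ct * v for u, v in zip(z1, z2)]
        score = abs(excess_kurtosis(y1)) + abs(excess_kurtosis(y2))
        if score > best_score:
            best_score, best = score, (y1, y2)
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    s1 = [rng.uniform(-1.0, 1.0) for _ in range(2000)]   # sub-Gaussian source
    s2 = [rng.choice([-1.0, 1.0]) for _ in range(2000)]  # binary source
    x1 = [0.6 * a + 0.4 * b for a, b in zip(s1, s2)]     # observed mixture 1
    x2 = [0.3 * a + 0.7 * b for a, b in zip(s1, s2)]     # observed mixture 2
    y1, y2 = ica_2d(x1, x2)
    print("kurtosis of recovered components:", excess_kurtosis(y1), excess_kurtosis(y2))
```

Mixing independent sources pulls their kurtosis toward the Gaussian value, so the rotation whose outputs are least Gaussian is the one that undoes the mixing; as with any ICA, the sources are recovered only up to order, sign and scale.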

Similar Literature

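1. Analysis of Text Feature Extraction Methods from the Perspective of Deep Learning [J]. Nie Wei, Liu Xiaoyu, Kang Shiying. Management & Technology of SME (中小企业管理与科技). 2020, No. 25
2. An Efficient Method for Extracting Streptomyces ahygroscopicus Genomic DNA Directly Usable for PCR Amplification [J]. Wu Hongyan, Chen Fei, Huan Minghui. Biotechnology Bulletin (生物技术通报). 2008, No. 4
3. Research on the Extraction Method and Application of Product Form Genes under Style Constraints [J]. Han Weirong, Yu Suihuai, Qi Bin. Modern Manufacturing Engineering (现代制造工程). 2012, No. 5
4. Micro-Extraction Method for Genomic DNA from Living and Ethanol-Fixed Marine Microalgae [J]. Song Lun, Zhou Zunchun, Wang Nianbin. Chinese Journal of Applied and Environmental Biology (应用与环境生物学报). 2006, No. 5
5. Comparison of Honeybee Genomic DNA Concentrations under Different Extraction Methods [J]. Ji Ting, Chen Jing, Pan Rui. Apiculture of China (中国蜂业). 2005, No. 12
6. Research and Development of Web Text Feature Extraction Methods [C]. Pang Jing'an. 19th National Symposium on Computer Information Management (第十九届全国计算机信息管理学术研讨会). 2005
7. Research on Spark-Based Text Feature Extraction Methods [A]. Xu Guanhua. 2018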