An Acceleration Algorithm for k-Nearest-Neighbor Classification Based on Stratified Sampling

Song Yunsheng (宋云胜); Liang Jiye (梁吉业)

Abstract

k-nearest-neighbor (kNN) classification, one of the most typical algorithms in data mining, is widely applied because of its good generalization performance and solid theoretical foundation. However, at test time kNN must compute the distance between the instance to be classified and every training instance, so it becomes very time-consuming on large-scale data. To address this problem, we propose a kNN acceleration algorithm based on stratified sampling (SS-kNN). SS-kNN first partitions the space containing the training instances into several regions that hold equal numbers of instances, then samples instances from each region, and finally determines which region the test instance falls into and searches for its k nearest neighbors among the instances sampled from that region and its adjacent regions. Compared with the original kNN algorithm and a kNN variant based on random sampling, SS-kNN attains similar classification accuracy while speeding up execution by roughly 399 and 16 times, respectively.
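The three steps in the abstract (partition, sample, search locally) can be illustrated with a minimal sketch. The abstract does not say how the space is partitioned, how instances are sampled within a region, or what counts as an adjacent region, so everything below is an assumption made for illustration: regions are intervals along a single feature (`axis`), cut so that each holds about the same number of training instances; sampling inside a region is uniform without replacement; adjacent regions are the neighboring intervals; and the label is chosen by majority vote. The names `build_strata`, `ss_knn_predict`, `sample_frac` and `axis` are hypothetical and do not come from the paper.

```python
import bisect
import math
import random
from collections import Counter


def build_strata(X, y, n_strata, sample_frac, axis=0, seed=0):
    """Split training data into (near-)equal-count strata and sample each one.

    Assumption: strata are intervals along the single feature `axis`.
    """
    rng = random.Random(seed)
    order = sorted(range(len(X)), key=lambda i: X[i][axis])
    size = math.ceil(len(order) / n_strata)  # the last stratum may be smaller
    strata, bounds = [], []
    for start in range(0, len(order), size):
        block = order[start:start + size]
        keep = min(len(block), max(1, round(sample_frac * len(block))))
        picked = rng.sample(block, keep)  # uniform, without replacement
        strata.append([(X[i], y[i]) for i in picked])
        bounds.append(X[block[-1]][axis])  # largest `axis` value in this stratum
    return strata, bounds


def ss_knn_predict(x, strata, bounds, k, axis=0):
    """Classify x using only samples from its stratum and the adjacent strata."""
    # Locate the first stratum whose upper bound is >= x[axis];
    # values beyond the last bound fall into the last stratum.
    idx = min(bisect.bisect_left(bounds, x[axis]), len(strata) - 1)
    candidates = []
    for j in (idx - 1, idx, idx + 1):
        if 0 <= j < len(strata):
            candidates.extend(strata[j])
    # Brute-force kNN, restricted to the candidate samples.
    nearest = sorted(candidates, key=lambda p: math.dist(x, p[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

In this sketch each query scans only the sampled instances of at most three strata rather than the whole training set, which is where the reduction in distance computations comes from.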

Bibliographic information

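Source
《数据采集与处理》 (Journal of Data Acquisition and Processing) | 2017, No. 6 | pp. 1153-1162 | 10 pages
Authors
Song Yunsheng (宋云胜); Liang Jiye (梁吉业)
Affiliations

School of Computer and Information Technology, Shanxi University, Taiyuan 030006;

Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006

Original format: PDF
Language of text: Chinese
Chinese Library Classification: automated reasoning, machine learning
Keywords
stratified sampling; data partitioning; nearest neighbor; classification accuracy; running time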

