International Seminar on Research of Information Technology and Intelligent Systems

Removing Noise, Reducing dimension, and Weighting Distance to Enhance k-Nearest Neighbors for Diabetes Classification



Abstract

Various machine learning methods have been applied in the medical field to classify diseases such as diabetes. The k-nearest neighbors (KNN) algorithm is one of the best-known approaches for predicting diabetes. Many researchers have found that combining KNN with one or more other algorithms can provide better results. In this paper, a combination of three procedures, removing noise, reducing dimensionality, and weighting distances, is proposed to improve a standard voting-based KNN for classifying the Pima Indians Diabetes Dataset (PIDD) into two classes. First, the noise in the training set is removed using k-means clustering (KMC) to make the voter data in both classes more competent. Second, the dimensionality is reduced to decrease the intra-class data distances while increasing the inter-class ones. Two dimensionality-reduction methods, principal component analysis (PCA) and an autoencoder (AE), are applied to investigate the linearity of the dataset. Since the dataset is imbalanced, a proportional weight is incorporated into the distance formula to make the voting fair. A 5-fold cross-validation-based evaluation shows that each proposed procedure works very well in enhancing KNN. KMC alone increases the accuracy of KNN from 81.6% to 86.7%. Combining KMC and PCA improves the KNN accuracy to 90.9%. A combination of KMC and AE enhances KNN to an accuracy of 97.8%. Combining the three proposed procedures of KMC, PCA, and weighted KNN (WKNN) increases the accuracy to 94.5%. Finally, the combination of KMC, AE, and WKNN reaches the highest accuracy of 98.3%. The fact that AE produces higher accuracies than PCA indicates that the features in the dataset are highly non-linear.
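The abstract describes weighting the KNN vote in proportion to class sizes to offset the imbalance in PIDD. The exact weighting formula is not given in the abstract; the sketch below is a minimal illustration assuming each neighbor's vote is scaled by the inverse frequency of its class, so minority-class neighbors are not drowned out by the majority. All names here (`weighted_knn_predict`, the toy data) are illustrative, not taken from the paper.

```python
import math
from collections import Counter

def weighted_knn_predict(train_X, train_y, x, k=3):
    """Classify x with k-nearest neighbors whose votes are scaled by
    inverse class frequency (an assumed form of the paper's
    'proportional weight' for imbalanced data)."""
    # Euclidean distance from x to every training point
    dists = sorted(
        (math.dist(p, x), label) for p, label in zip(train_X, train_y)
    )
    # Inverse-frequency weight per class: rare classes vote louder
    freq = Counter(train_y)
    n = len(train_y)
    votes = Counter()
    for _, label in dists[:k]:
        votes[label] += n / freq[label]
    return votes.most_common(1)[0][0]

# Toy imbalanced set: four class-0 points, two class-1 points
X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (6, 6)]
y = [0, 0, 0, 0, 1, 1]
print(weighted_knn_predict(X, y, (5.5, 5.5), k=3))  # -> 1
```

With plain majority voting a minority-class query near the boundary can still lose once a single majority neighbor enters the top k; scaling each vote by n/freq[label] restores the balance without changing the distance computation itself.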
