Comparing nearest-neighbour search strategies in the SMOTE algorithm

Jeffery Kam; Scott Dick



Abstract

Nearest-neighbour searches are a key element of the new SMOTE algorithm for resampling highly skewed datasets for machine learning. However, at present no work has been done to determine the most efficient and scalable neighbour search to use in the SMOTE algorithm. This question will be a key consideration if SMOTE is to be deployed in extremely large, highly skewed datasets.
This paper reports on empirical investigations of two fast neighbour search strategies, namely K-D trees and box-assisted neighbour searches. The times required to oversample three well-known datasets with SMOTE using these two strategies are compared to each other as well as to the time required for a brute-force sequential search strategy. In general, an order-of-magnitude improvement in execution time is found from sequential search to K-D trees, and a further order-of-magnitude improvement is found from K-D trees to box-assisted search, indicating that the SMOTE algorithm scales best with increasing dataset size when box-assisted search is used. However, it is also observed that SMOTE scales best with an increasing degree of oversampling when K-D trees are used as the neighbour search strategy.
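The abstract does not reproduce the authors' code, so the following is only a minimal sketch of how SMOTE can be written with a pluggable neighbour search. It assumes Euclidean distance, an integer oversampling multiple, and a uniform grid as the box-assisted structure. The names `brute_force_knn`, `BoxGrid`, `smote`, and the `cell` size parameter are hypothetical and chosen for illustration. A K-D tree variant is omitted for brevity.

```python
import math
import random
from collections import defaultdict
from itertools import product


def brute_force_knn(points, query_idx, k):
    """Sequential scan: indices of the k points closest to points[query_idx]."""
    q = points[query_idx]
    others = [i for i in range(len(points)) if i != query_idx]
    others.sort(key=lambda i: math.dist(points[i], q))
    return others[:k]


class BoxGrid:
    """Box-assisted search (illustrative): bucket points into cubic cells of
    side `cell`, then scan cells outward from the query's cell, ring by ring."""

    def __init__(self, points, cell):
        self.points = points
        self.cell = cell
        self.boxes = defaultdict(list)
        for i, p in enumerate(points):
            self.boxes[self._key(p)].append(i)

    def _key(self, p):
        return tuple(math.floor(x / self.cell) for x in p)

    def knn(self, query_idx, k):
        q = self.points[query_idx]
        home = self._key(q)
        k = min(k, len(self.points) - 1)
        if k <= 0:
            return []
        found = []
        ring = 0
        while True:
            # Visit only the cells at Chebyshev distance `ring` from the home cell.
            for off in product(range(-ring, ring + 1), repeat=len(home)):
                if max(abs(o) for o in off) != ring:
                    continue
                key = tuple(h + o for h, o in zip(home, off))
                for i in self.boxes.get(key, ()):
                    if i != query_idx:
                        found.append((math.dist(self.points[i], q), i))
            found.sort()
            # Every point in a cell not yet visited is more than ring * cell away,
            # so the current k nearest are final once they fit inside that radius.
            if len(found) >= k and found[k - 1][0] <= ring * self.cell:
                return [i for _, i in found[:k]]
            ring += 1


def smote(minority, per_sample, k, knn, rng):
    """For each minority point, create `per_sample` synthetic points by
    interpolating toward a randomly chosen one of its k nearest neighbours."""
    synthetic = []
    for i, base in enumerate(minority):
        neighbours = knn(i, k)
        for _ in range(per_sample):
            other = minority[rng.choice(neighbours)]
            gap = rng.random()  # position along the segment base -> other
            synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic
```

Under these assumptions, `smote(points, 2, 5, BoxGrid(points, 0.1).knn, random.Random(0))` oversamples by 200% using the grid, and passing `lambda i, k: brute_force_knn(points, i, k)` instead swaps in the sequential baseline. Timing the two calls on the same data is one simple way to reproduce the kind of comparison the abstract describes.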


Bibliographic details

Source: Canadian journal of electrical and computer engineering | 2006, Issue 4 | pp. 203-210 | 8 pages
Authors: Jeffery Kam; Scott Dick
Affiliation: Department of Electrical and Computer Engineering, University of Alberta, 2nd Floor, ECERF Building, Edmonton, Alberta T6G 2V4
Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Chinese Library Classification: Electrical engineering
Keywords: classification; data mining; machine learning; nearest-neighbour; oversampling; performance testing; resampling; skewness

