Genetic Programming for Evolving Similarity Functions for Clustering: Representations and Analysis

Lensen Andrew; Xue Bing; Zhang Mengjie

Abstract

Clustering is a difficult and widely studied data mining task, with many varieties of clustering algorithms proposed in the literature. Nearly all algorithms use a similarity measure such as a distance metric (e.g., Euclidean distance) to decide which instances to assign to the same cluster. These similarity measures are generally predefined and cannot be easily tailored to the properties of a particular dataset, which leads to limitations in the quality and the interpretability of the clusters produced. In this article, we propose a new approach to automatically evolving similarity functions for a given clustering algorithm by using genetic programming. We introduce a new genetic programming-based method which automatically selects a small subset of features (feature selection) and then combines them using a variety of functions (feature construction) to produce dynamic and flexible similarity functions that are specifically designed for a given dataset. We demonstrate how the evolved similarity functions can be used to perform clustering using a graph-based representation. The results of a variety of experiments across a range of large, high-dimensional datasets show that the proposed approach can achieve higher and more consistent performance than the benchmark methods. We further extend the proposed approach to automatically produce multiple complementary similarity functions by using a multi-tree approach, which gives further performance improvements. We also analyse the interpretability and structure of the automatically evolved similarity functions to provide insight into how and why they are superior to standard distance metrics.
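The abstract describes each evolved similarity function as a GP tree that selects a small subset of features and combines them. It says clustering is then performed using a graph-based representation. The sketch below illustrates that general idea under explicit assumptions and is not the authors' implementation. The following are all hypothetical choices made for illustration: the node types `FeatureDiff` and `Op`, reading the tree output as a distance (lower means more similar), linking each instance to its k nearest neighbours and taking connected components as clusters, and unioning the edges from every tree in the multi-tree case. The GP search itself (population, variation operators, fitness) is omitted.

```python
from dataclasses import dataclass
from operator import add
from typing import Callable, List, Sequence, Set, Tuple, Union

# Hypothetical node types for a similarity tree (illustration only).
# A terminal reads ONE feature from both instances, so only the features that
# appear somewhere in the tree are ever used (implicit feature selection).


@dataclass
class FeatureDiff:
    index: int  # which feature this terminal selects

    def eval(self, a: Sequence[float], b: Sequence[float]) -> float:
        return abs(a[self.index] - b[self.index])


@dataclass
class Op:
    # A function node combines its children's values (feature construction).
    fn: Callable[[float, float], float]
    left: "Node"
    right: "Node"

    def eval(self, a: Sequence[float], b: Sequence[float]) -> float:
        return self.fn(self.left.eval(a, b), self.right.eval(a, b))


Node = Union[FeatureDiff, Op]


def selected_features(node: Node) -> Set[int]:
    """Return the feature indices the tree actually reads."""
    if isinstance(node, FeatureDiff):
        return {node.index}
    return selected_features(node.left) | selected_features(node.right)


def knn_edges(X: Sequence[Sequence[float]], tree: Node, k: int) -> List[Tuple[int, int]]:
    """Link each instance to its k lowest-scoring neighbours under `tree`.

    Assumption: the tree output is treated as a distance (lower = more similar).
    """
    edges = []
    for i in range(len(X)):
        scored = sorted((tree.eval(X[i], X[j]), j) for j in range(len(X)) if j != i)
        edges.extend((i, j) for _, j in scored[:k])
    return edges


def components(n: int, edges: List[Tuple[int, int]]) -> List[int]:
    """Label connected components with a small union-find."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    labels: dict = {}
    # Number clusters 0, 1, 2, ... in order of first appearance.
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


def cluster(X: Sequence[Sequence[float]], trees: Sequence[Node], k: int = 1) -> List[int]:
    """Cluster with one or more trees; multi-tree = union of all trees' edges (an assumption)."""
    edges: List[Tuple[int, int]] = []
    for tree in trees:
        edges.extend(knn_edges(X, tree, k))
    return components(len(X), edges)


if __name__ == "__main__":
    X = [[0.0, 5.0, 1.0], [0.1, -3.0, 1.1], [5.0, 2.0, 9.0], [5.2, 7.0, 9.1]]
    # Hand-written example tree: |a0 - b0| + |a2 - b2|, ignoring feature 1.
    tree = Op(add, FeatureDiff(0), FeatureDiff(2))
    print(selected_features(tree))  # {0, 2}
    print(cluster(X, [tree]))       # [0, 0, 1, 1]
```

In this toy example the hand-written tree reads only features 0 and 2, so feature 1 never affects the neighbour graph. The multi-tree sketch unions edges, so adding a tree can only merge clusters and never split them. Here, adding a second tree built on feature 1 alone joins all four instances into one cluster. How the paper actually combines multiple trees is not stated in the abstract.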

Bibliographic Information

Source: Evolutionary Computation, 2020, No. 4, pp. 531-561 (31 pages)
Authors: Lensen Andrew; Xue Bing; Zhang Mengjie
Affiliation (all three authors): Victoria University of Wellington, Evolutionary Computation Research Group, Wellington 6140, New Zealand
Indexing information
Original format: PDF
Language: English
Keywords: Cluster analysis; automatic clustering; genetic programming; similarity function; feature selection; feature construction

Similar Literature

1. Evolving meaning: Using genetic programming to learn similarity perspectives for mining biomedical data [J]. Sousa Rita, Silva Sara, Pesquita Catia. European Journal of Clinical Investigation, 2019, Issue S1.

2. Genetic Programming With a New Representation to Automatically Learn Features and Evolve Ensembles for Image Classification [J]. Bi Ying, Xue Bing, Zhang Mengjie. IEEE Transactions on Cybernetics, 2021, Issue 4.

3. Automatically Evolving Texture Image Descriptors Using the Multitree Representation in Genetic Programming Using Few Instances [J]. Al-Sahaf Harith, Al-Sahaf Ausama, Xue Bing. Evolutionary Computation, 2021, Issue 3.

4. Genetic Programming for Evolving Similarity Functions Tailored to Clustering Algorithms [C]. Hayden Andersen, Andrew Lensen, Bing Xue. IEEE Congress on Evolutionary Computation, 2021.

5. Functional analysis of genes using the Gene Ontology: Gene similarity, clustering, and classification [D]. Nagar, Anurag. 2008.

6. Genomewide Analysis of Aryl Hydrocarbon Receptor Binding Targets Reveals an Extensive Array of Gene Clusters that Control Morphogenetic and Developmental Programs [O]. Maureen A. Sartor, Michael Schnekenburger, Jennifer L. Marlowe. 2009.

7. Genetic Programming for Evolving Similarity Functions for Clustering: Representations and Analysis [O]. Andrew Lensen, Bing Xue, Mengjie Zhang. 2020.

8. A New Measure of Biotic Similarity Between Samples and Its Applications with a Cluster Analysis Program [R]. Carlos F. A. Pinkham. 1974.
