Bulk construction of dynamic clustered metric trees

Lior Aronovich; Israel Spiegler


Abstract

Repositories of complex data types, such as images, audio, video and free text, are becoming increasingly common in various fields. A general approach to searching such data is similarity search, in which the search is for similar objects and similarity is modeled by a metric distance function. An important class of access methods for similarity search in metric data is that of dynamic clustered metric trees, in which the index is structured as a paged, balanced tree and the space is partitioned hierarchically into compact regions. While access methods of this class support dynamic insertion, typically of single objects, the problem of efficiently inserting a given data set into the index in bulk is largely open. In this article we address this problem and propose novel algorithms for its two cases: the index is initially empty (bulk loading), or the index is initially non-empty (bulk insertion). The proposed bulk loading algorithm builds the index bottom-up, layer by layer, using a new sampling-based clustering method that improves clustering results by improving the quality of the selected sample sets. The proposed bulk insertion algorithm employs the bulk loading algorithm to load the given data into a new index structure, and then merges the new and existing structures into a unified, high-quality index, using a novel decomposition method to reduce overlaps between the structures. Both algorithms yield significantly improved construction and search performance and are applicable to all dynamic clustered metric trees. Results from an extensive experimental study show that the proposed algorithms outperform alternative methods, reducing construction costs by up to 47% in CPU cost and 99% in I/O cost, and search costs by up to 48% in CPU cost and 30% in I/O cost.
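The bottom-up, layer-by-layer idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: pivots are drawn here by plain uniform random sampling, whereas the paper proposes a method that improves the quality of the sample sets, and the bulk insertion (merge) step is not shown. The names `Node`, `sample_clusters`, `bulk_load`, `range_search`, and the `capacity` parameter are hypothetical, chosen only for illustration, and the Manhattan metric stands in for any metric distance function.

```python
import math
import random
from dataclasses import dataclass, field


def manhattan(a, b):
    """A metric distance function (assumed here only for illustration)."""
    return sum(abs(x - y) for x, y in zip(a, b))


@dataclass
class Node:
    center: tuple                                  # routing object (a pivot)
    radius: float = 0                              # covering radius of the subtree
    children: list = field(default_factory=list)   # empty for a data object


def sample_clusters(entries, dist, capacity, rng):
    """Sample pivots uniformly, then give each entry to its nearest non-full pivot."""
    k = math.ceil(len(entries) / capacity)
    pivots = rng.sample(entries, k)
    groups = [[] for _ in pivots]
    for e in entries:
        by_distance = sorted(range(k), key=lambda i: dist(e.center, pivots[i].center))
        for i in by_distance:
            if len(groups[i]) < capacity:   # k * capacity >= n, so room always exists
                groups[i].append(e)
                break
    return [(p, g) for p, g in zip(pivots, groups) if g]


def bulk_load(objects, dist, capacity=4, seed=0):
    """Build a balanced tree bottom-up: cluster one layer, then cluster its parents."""
    if not objects or capacity < 2:
        raise ValueError("need at least one object and capacity >= 2")
    rng = random.Random(seed)
    level = [Node(center=o) for o in objects]
    while True:
        parents = []
        for pivot, group in sample_clusters(level, dist, capacity, rng):
            # Triangle inequality: this radius covers every object below the node.
            radius = max(dist(pivot.center, c.center) + c.radius for c in group)
            parents.append(Node(pivot.center, radius, group))
        if len(parents) == 1:
            return parents[0]
        level = parents


def range_search(node, query, r, dist):
    """Return all objects within distance r of query, pruning by covering radius."""
    if not node.children:
        return [node.center] if dist(query, node.center) <= r else []
    if dist(query, node.center) > r + node.radius:
        return []                           # the whole region lies outside the query ball
    hits = []
    for child in node.children:
        hits.extend(range_search(child, query, r, dist))
    return hits


points = [(0, 0), (1, 2), (9, 9), (8, 7), (2, 1), (10, 8)]
root = bulk_load(points, manhattan, capacity=2)
hits = range_search(root, (1, 1), 2, manhattan)
# sorted(hits) → [(0, 0), (1, 2), (2, 1)]
```

Because each layer wraps every entry of the layer below, all data objects end up at the same depth, which is the balance property the abstract attributes to this class of indexes. The tree shape depends on the random sample, but the search result does not.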


Bibliographic details

Source: Knowledge and Information Systems, 2010, Issue 2, pp. 211-244 (34 pages)
Authors: Lior Aronovich; Israel Spiegler
Original format: PDF
Language: English
Keywords: Metric access methods; Bulk loading; Bulk insertion; Indexing methods; Metric spaces; Similarity search

