Conference: Knowledge-Based Systems for Safety Critical Applications

Similarity search in sets and categorical data using the signature tree

Abstract

Data mining applications analyze large collections of set data and high-dimensional categorical data. Search on these data types is not restricted to the classic problems of mining association rules and classification; similarity search is also a frequently applied operation. Access methods for multidimensional numerical data are inappropriate for this problem, and specialized indexes are needed. We propose a method that represents set data as bitmaps (signatures) and organizes them into a hierarchical index, suitable for similarity search and other related query types. In contrast to a previous technique, the signature tree is dynamic and does not rely on hardwired constants. Experiments with synthetic and real datasets show that it is robust to different data characteristics, scalable with the database size, and efficient for various queries.
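
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of set signatures, not the authors' method. It assumes one bit per item, Hamming distance as the dissimilarity measure, and an index node summarized by the bitwise OR of the signatures beneath it. The names `signature`, `hamming`, `cover`, and `min_dist` are hypothetical.

```python
from typing import Iterable


def signature(items: Iterable[int]) -> int:
    """Encode a set of 0-based item ids as a bitmap, one bit per item (assumed encoding)."""
    sig = 0
    for i in items:
        sig |= 1 << i
    return sig


def hamming(a: int, b: int) -> int:
    """Number of items present in exactly one of the two sets."""
    return bin(a ^ b).count("1")


def cover(sigs: Iterable[int]) -> int:
    """Bitwise OR of a group of signatures; a hypothetical summary for an index node."""
    out = 0
    for s in sigs:
        out |= s
    return out


def min_dist(query: int, node_sig: int) -> int:
    """Lower bound on the Hamming distance from the query to any signature under the node.

    Every signature t under the node satisfies t & ~node_sig == 0, so each query
    bit missing from node_sig is also missing from t and counts toward the distance.
    """
    return bin(query & ~node_sig).count("1")


a = signature([0, 2, 5])
b = signature([2, 5, 7])
node = cover([a, b])
q = signature([1, 2])
```

Under these assumptions, a search could skip any node whose `min_dist` already exceeds the distance of the best match found so far, because no signature beneath that node can be closer to the query.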