《软件学报》 (Journal of Software)

Similarity Search over Data Streams Based on Adaptive Segmental Estimation

Abstract

Similarity search is widely used in domains such as real-time stock quotes, network security, and sensor networks. Because the data produced in these domains is unbounded, continuous, fast, and real-time, online similarity search algorithms suited to data streams are needed. This paper first proposes two lower-bound functions for DTW (dynamic time warping) that require no index structure: LB_seg_WF_(global) for DTW under global warping constraints, and LB_seg_WF for DTW without global warping constraints. Both are segmented DTW techniques and can handle online similarity matching between sequences and queries of unequal length in data streams. Next, several tighter lower bounds are proposed to improve how closely LB_seg_WF_(global) and LB_seg_WF approximate DTW. Finally, because LB_seg_WF_(global) or LB_seg_WF may be ineffective over a continuous stretch of a stream, the paper proposes the lower bound LB_WF_(global) (under global warping constraints) and the lower bound LB_WF and upper bound UB_WF (without global warping constraints). These estimate DTW quickly through incremental computation and thereby greatly reduce redundant computation. Theoretical analysis and statistical experiments confirm the validity of the proposed methods.
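The general pattern the abstract relies on, pruning candidates with a cheap lower bound before running full DTW, can be illustrated with a minimal sketch. The abstract does not define LB_seg_WF, LB_WF, or UB_WF, so the code below does not implement them. It substitutes a simple first/last-point bound as a stand-in, and the names `dtw`, `lb_first_last`, and `search` are hypothetical. It assumes non-empty numeric sequences, an absolute-difference point cost, and a Sakoe-Chiba band as the global warping constraint.

```python
import math


def dtw(x, y, window=None):
    """DTW distance with absolute-difference cost.

    window is an optional Sakoe-Chiba band half-width (an assumed form of
    global warping constraint); None means unconstrained.
    """
    n, m = len(x), len(y)
    # The band must be at least |n - m| wide or no warping path exists.
    w = max(n, m) if window is None else max(window, abs(n - m))
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        cur = [math.inf] * (m + 1)
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = abs(x[i - 1] - y[j - 1])
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]


def lb_first_last(x, y):
    """Stand-in lower bound, NOT the paper's LB_seg_WF.

    Every warping path contains the first and the last cell, so the sum of
    their costs never exceeds the DTW distance.
    """
    first = abs(x[0] - y[0])
    if len(x) == 1 and len(y) == 1:
        return first  # first and last cell coincide
    return first + abs(x[-1] - y[-1])


def search(stream, query, threshold, window=None):
    """Report (start, distance) for stream windows within threshold of query."""
    matches = []
    length = len(query)
    for start in range(len(stream) - length + 1):
        candidate = stream[start:start + length]
        # Cheap test first: if the lower bound already exceeds the
        # threshold, the exact DTW cannot be within it.
        if lb_first_last(candidate, query) > threshold:
            continue
        distance = dtw(candidate, query, window)
        if distance <= threshold:
            matches.append((start, distance))
    return matches
```

For example, `search([0, 1, 2, 3, 9, 9], [1, 2, 3], 0.5)` computes full DTW only for the window starting at index 1; the other three windows are discarded by the bound alone. A tighter bound prunes more candidates, which is the motivation the abstract gives for its improved bounds.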
