
Adaptive similarity searching in sequence databases



Abstract

A computer system and method for performing similarity searches that are phase- and scale-insensitive and that can be carried out at a semantic level. Each sequence in a database is preferably segmented at multiple projections and/or resolution levels. The sequences may represent objects having multidimensional features, such as temporal and/or spatio-temporal data. Preferably, the segmenting logic starts with the finest resolution, and each sequence is parsed into a number of disjoint segments, each of which has uniform features. The uniform features could be, for example, a constant slope, or a waveform representable by a single function. The segments may then be resampled into fixed-length vectors with appropriate normalization. A label may also be assigned to each segment via conventional clustering/classification methods. These steps are iterated at successive projections and/or resolution levels until each sequence in the database has been independently segmented and clustered. Thus, the labels are preferably extracted in a pseudo-hierarchical manner, in which the label of the lowest-resolution representation of the sequence is extracted first. The representations of each time series at various resolutions and/or projections capture different characteristics of the same time series (or 2D/3D object). Recall that each segment represents a region having uniform features. Segmentation at each individual resolution and/or projection thus enables different characteristics to be recognized or emphasized within segments having uniform features.
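The segmentation and resampling steps described in the abstract can be illustrated with a small sketch. It is not the patented implementation: the greedy chord-tolerance test used to decide "constant slope", linear interpolation for resampling, z-normalization as the "appropriate normalization", and pairwise averaging to produce each coarser resolution are all assumptions, and every function name here (`segment_constant_slope`, `resample`, `z_normalize`, `coarsen`, `multi_resolution_features`) is hypothetical.

```python
from typing import List, Tuple

# Hypothetical sketch. The chord test, linear resampling, z-normalization and
# pairwise-average coarsening are assumptions, not the patent's own methods.
Segment = Tuple[int, int]  # half-open index range [start, end) into the sequence


def _fits_line(seq: List[float], start: int, end: int, tol: float) -> bool:
    """True if seq[start:end] stays within tol of the chord joining its endpoints."""
    if end - start <= 2:
        return True
    x0, y0, x1, y1 = start, seq[start], end - 1, seq[end - 1]
    slope = (y1 - y0) / (x1 - x0)
    return all(abs(seq[i] - (y0 + slope * (i - x0))) <= tol for i in range(start, end))


def segment_constant_slope(seq: List[float], tol: float) -> List[Segment]:
    """Greedily parse seq into disjoint, approximately constant-slope segments."""
    segments: List[Segment] = []
    start, n = 0, len(seq)
    while start < n:
        end = min(start + 2, n)  # every segment starts with up to two points
        while end < n and _fits_line(seq, start, end + 1, tol):
            end += 1  # extend while the next point still lies near the chord
        segments.append((start, end))
        start = end
    return segments


def resample(values: List[float], m: int) -> List[float]:
    """Linearly interpolate values onto m evenly spaced points (requires m >= 2)."""
    if len(values) == 1:
        return [values[0]] * m
    out = []
    for k in range(m):
        pos = k * (len(values) - 1) / (m - 1)
        i = min(int(pos), len(values) - 2)
        frac = pos - i
        out.append(values[i] * (1 - frac) + values[i + 1] * frac)
    return out


def z_normalize(v: List[float]) -> List[float]:
    """Remove offset and scale; a constant vector maps to all zeros."""
    mean = sum(v) / len(v)
    std = (sum((x - mean) ** 2 for x in v) / len(v)) ** 0.5
    return [0.0] * len(v) if std == 0 else [(x - mean) / std for x in v]


def coarsen(seq: List[float]) -> List[float]:
    """Halve the resolution by averaging non-overlapping pairs of samples."""
    return [(seq[i] + seq[i + 1]) / 2 for i in range(0, len(seq) - 1, 2)]


def multi_resolution_features(
    seq: List[float], levels: int, tol: float, m: int
) -> List[List[List[float]]]:
    """Fixed-length, normalized segment vectors per resolution level, finest first."""
    features = []
    for _ in range(levels):
        segs = segment_constant_slope(seq, tol)
        features.append([z_normalize(resample(seq[a:b], m)) for a, b in segs])
        seq = coarsen(seq)
    return features
```

For example, `segment_constant_slope([0, 1, 2, 3, 2, 1, 0], 0.01)` splits the sequence into a rising part `(0, 4)` and a falling part `(4, 7)`. Because each resampled segment is z-normalized, a steep ramp and a shallow ramp map to the same vector, which is one way to obtain the scale insensitivity the abstract describes.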

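The abstract leaves label assignment to "conventional clustering/classification methods" and does not say how labels are compared during a search. The sketch below is one possible reading under stated assumptions: plain k-means with deterministic farthest-point seeding, run jointly over the segment vectors of every sequence in the database; one letter per cluster, so each sequence becomes a label string; and Levenshtein edit distance between label strings as a stand-in for semantic-level matching. These technique choices and all names (`kmeans_labels`, `label_strings`, `edit_distance`) are hypothetical, not taken from the patent.

```python
from typing import List, Sequence

# Hypothetical sketch. k-means labeling and edit-distance matching are assumed
# stand-ins for the unspecified clustering and label-comparison steps.
Vector = Sequence[float]


def _dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(vectors: List[Vector], k: int, iters: int = 20) -> List[int]:
    """Plain k-means; returns a cluster id in 0..k-1 for every vector.

    Seeds are chosen deterministically (first vector, then farthest point),
    so repeated runs give the same labels. Requires a non-empty input.
    """
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(_dist2(v, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: _dist2(v, centroids[j])) for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def label_strings(per_sequence: List[List[Vector]], k: int) -> List[str]:
    """Cluster all segment vectors jointly, then spell each sequence as one
    letter per segment ('a' = cluster 0, 'b' = cluster 1, ...)."""
    assert 1 <= k <= 26, "one lowercase letter per cluster"
    flat = [v for segs in per_sequence for v in segs]
    labels = kmeans_labels(flat, k)
    out, pos = [], 0
    for segs in per_sequence:
        out.append("".join(chr(ord("a") + lab) for lab in labels[pos:pos + len(segs)]))
        pos += len(segs)
    return out


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance between two label strings (single-row DP)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]
```

For instance, with `up = [-1.0, 0.0, 1.0]` and `down = [1.0, 0.0, -1.0]`, the call `label_strings([[up, down, up], [up, down]], k=2)` returns `["aba", "ab"]`, and the two sequences are one edit apart. Matching on short label strings rather than raw samples is what would let such a search ignore exact phase and amplitude.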

Bibliographic data

Publication number: US5940825A

Publication date: 1999-08-17

Applicant/Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Application number: US19960726889
Inventors: PHILIP SHI-LUNG YU; CHUNG-SHENG LI; VITTORIO CASTELLI

Filing date: 1996-10-04
Classification: G06F17/30
Country: US
