Journal: Computational Methods in Science and Technology

A New Method for Symbolic Sequences Analysis. An Application to Long Sequences

Abstract

A method is proposed for decomposing a symbolic sequence into a set of consecutive, distinct, non-overlapping strings of various lengths. Representing the sequence as a set of words allows one to use set-theoretic notions. The main result is a new definition of the similarity between any two sequences over a given alphabet. No prior sequence alignment is necessary. Two applications of the set of words are described. In the first, the similarity measure is applied to prepare centroids for the K-means algorithm, which results in a high-performance grouping method for long DNA sequences. The other application concerns the statistical analysis of word attributes. It is shown that the similarity, complexity, and correlation function of word attributes across sequences of digits of the fractional parts of some irrational numbers support the suggestion that the sequences are instances of a random sequence of decimal digits.

Supplementary material: data (irrational numbers); data (clustering).
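The abstract does not spell out the decomposition rule or the similarity formula, so the following is only an illustrative sketch under stated assumptions. It uses an LZ78-style parse as one plausible way to cut a sequence into consecutive, distinct, non-overlapping words, and the Jaccard index as one plausible set-theoretic similarity. The names `decompose` and `jaccard_similarity` are hypothetical and are not taken from the paper.

```python
from __future__ import annotations


def decompose(sequence: str) -> list[str]:
    """Cut a sequence into consecutive, non-overlapping, distinct words.

    Assumption: an LZ78-style rule is used, where each word is the shortest
    prefix of the remaining text that has not appeared as a word before.
    The paper's own rule may differ.
    """
    seen: set[str] = set()
    words: list[str] = []
    current = ""
    for symbol in sequence:
        current += symbol
        if current not in seen:
            seen.add(current)
            words.append(current)
            current = ""
    # Any leftover tail repeats an earlier word, so it is dropped here
    # to keep all words distinct.
    return words


def jaccard_similarity(a: str, b: str) -> float:
    """Similarity of two sequences as the Jaccard index of their word sets.

    Assumption: Jaccard stands in for the paper's similarity definition,
    which the abstract does not give.
    """
    words_a, words_b = set(decompose(a)), set(decompose(b))
    union = words_a | words_b
    if not union:
        return 1.0  # two empty sequences are treated as identical
    return len(words_a & words_b) / len(union)


if __name__ == "__main__":
    print(decompose("AACGACGT"))
    print(jaccard_similarity("AACGACGT", "ACGTACGT"))
```

Because the comparison works on sets of words rather than on aligned positions, the two sequences may have different lengths, which matches the abstract's remark that no prior alignment is needed.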
