Image/Time Series Mining Algorithms: Applications to Developmental Biology, Document Processing and Data Streams.

Abstract

Interdisciplinary research in computer science requires the development of computational techniques for practical application in different domains. This usually requires careful integration of different areas of technical expertise. This dissertation presents image and time series analysis algorithms, with practical interdisciplinary applications to developmental biology, historical manuscript processing, and data stream processing. Inspired by the NSF IGERT program, this dissertation presents algorithms for analysis of growth dynamics at the shoot apex of Arabidopsis thaliana. A robust understanding of the causal relationship between gene expression, cell behaviors, and organ growth requires the development of computational techniques for quantitative analysis of real-time, live-cell meristem growth data. This in turn requires the development and application of image analysis tools and novel time series alignment algorithms. Image analysis is necessary for the computation of growth features, but it yields time series of unsynchronized growth data, which require a robust alignment method. Towards this end, we present two time series alignment algorithms.

This dissertation further considers image mining in historical document processing. We present a symbol clustering algorithm based on the Minimum Description Length (MDL) principle. This algorithm is one of the first practical applications of MDL to real-world, real-valued data such as images. Moreover, we introduce a novel premise that a clustering algorithm should have the freedom to ignore some data. Extensive empirical results show that the MDL-based algorithm outperforms the popular K-means clustering algorithm, given the same input data, the same distance measure, and the correct value of K for K-means. The new algorithm could have significant impact, as clustering is a critical subroutine in almost all historical document processing systems.

Finally, we present an algorithm for detecting rare and approximately repeating sequences in unbounded real-valued data streams, given limited space. This algorithm employs a novel integration of the SAX time series representation with a Bloom filter to develop a robust cache maintenance policy that allows us to overcome known challenges to a previously unsolved frequent pattern mining problem. Our contribution lies in the fact that we solve this problem for real-valued data, whereas only the discrete-valued case has been considered in the literature.
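To make the SAX and Bloom filter pairing concrete, the following is a minimal Python sketch. It is not the dissertation's algorithm: it omits the cache maintenance policy entirely, and the window length, the 4-symbol alphabet, the non-overlapping windows, and the names `sax_word`, `BloomFilter`, and `find_repeats` are all assumptions made for illustration. It only shows how a real-valued window might be discretized into a SAX word and then checked for approximate repetition using constant space.

```python
import hashlib
import math

# Gaussian breakpoints for a 4-symbol SAX alphabet (assumed alphabet size).
BREAKPOINTS = [-0.67, 0.0, 0.67]
ALPHABET = "abcd"


def sax_word(window, segments=4):
    """Z-normalize a window, reduce it with PAA, and map each segment to a symbol."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    # A (near-)constant window carries no shape information; map it to zeros.
    normed = [0.0] * n if std < 1e-8 else [(x - mean) / std for x in window]
    seg_len = n // segments  # assumes len(window) is divisible by segments
    word = []
    for s in range(segments):
        seg = normed[s * seg_len:(s + 1) * seg_len]
        avg = sum(seg) / seg_len
        # The symbol index is the number of breakpoints lying below the segment mean.
        word.append(ALPHABET[sum(1 for b in BREAKPOINTS if avg > b)])
    return "".join(word)


class BloomFilter:
    """Fixed-size approximate set: false positives are possible, false negatives are not."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, item):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))


def find_repeats(stream, window=8, segments=4):
    """Yield (start_index, word) for windows whose SAX word was probably seen before."""
    seen = BloomFilter()
    buf = []
    for i, x in enumerate(stream):
        buf.append(x)
        if len(buf) == window:  # non-overlapping windows (a simplification)
            word = sax_word(buf, segments)
            if word in seen:
                yield i - window + 1, word
            else:
                seen.add(word)
            buf = []
```

Because SAX z-normalizes each window, a rising ramp and a scaled copy of it map to the same word, which is what lets an exact-membership structure such as a Bloom filter flag approximately repeating real-valued shapes.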

Bibliographic record

  • Author: Tataw, Oben Moses
  • Author affiliation: University of California, Riverside
  • Degree-granting institution: University of California, Riverside
  • Subject: Information Technology; Computer Science; Health Sciences Human Development
  • Degree: Ph.D.
  • Year: 2013
  • Pages: 122 p.
  • Total pages: 122
  • Original format: PDF
  • Language: English