3rd International Conference on Bioinformatics and Computational Biology 2011

Mining Temporal Patterns from iTRAQ Mass Spectrometry (LC-MS/MS) Data
Fahad Saeed, Trairak Pisitkun, Mark A Knepper, Jason D Hoffert


Abstract

Large-scale proteomic analysis is emerging as a powerful technique in biology and relies heavily on data acquired by state-of-the-art mass spectrometers. As in any other field of systems biology, computational tools are required to deal with this ocean of data. iTRAQ (isobaric Tags for Relative and Absolute Quantification) is a technique that allows simultaneous quantification of proteins from multiple samples. Although iTRAQ data give useful insights to the biologist, the multiplexed design makes it harder to analyze the data and draw biological conclusions. One such problem is finding proteins that behave in a similar way (i.e. change in abundance) across time points, since temporal variations in proteomics data reveal important biological information. Distance-based measures such as Euclidean distance or the Pearson coefficient, and clustering techniques such as k-means, are not able to take into account the temporal information of the series. In this paper, we present a linear-time algorithm for clustering similar patterns among iTRAQ time-course data irrespective of their absolute values. The algorithm, referred to as Temporal Pattern Mining (TPM), maps the data from a Cartesian plane to a discrete binary plane. After the mapping, a dynamic programming technique allows mining of similar data elements that are temporally closer to each other. The proposed algorithm clusters iTRAQ data that are temporally closer to each other with more than 99% accuracy. Experimental results for different problem sizes are analyzed in terms of cluster quality, execution time, and scalability for large data sets. An example from our proteomics data is provided at the end to demonstrate the performance of the algorithm and its ability to cluster temporal series irrespective of their distance from each other.
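
The idea of grouping time courses by shape rather than by absolute value can be illustrated with a minimal sketch. The abstract does not specify how the Cartesian-to-binary mapping is defined, so the encoding below (1 when abundance rises between consecutive time points, 0 otherwise) is an assumption made purely for illustration, as are the names `to_binary_pattern`, `group_by_pattern`, and the sample values. The dynamic programming step of TPM is not reproduced here; series are simply grouped by identical binary patterns with a hash map, which takes time linear in the total number of data points.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def to_binary_pattern(series: Sequence[float]) -> Tuple[int, ...]:
    """Encode each step between consecutive time points as a bit.

    Assumed encoding (not taken from the paper): 1 if the value rises,
    0 if it falls or stays the same.
    """
    return tuple(1 if later > earlier else 0
                 for earlier, later in zip(series, series[1:]))


def group_by_pattern(
    data: Dict[str, Sequence[float]]
) -> Dict[Tuple[int, ...], List[str]]:
    """Group series names by identical binary pattern in one pass."""
    groups: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
    for name, series in data.items():
        groups[to_binary_pattern(series)].append(name)
    return dict(groups)


# Hypothetical abundance ratios at four time points.
ratios = {
    "protein_A": [1.0, 1.4, 1.2, 1.9],
    "protein_B": [10.0, 12.5, 11.0, 15.0],  # same shape as A, larger values
    "protein_C": [2.0, 1.5, 1.8, 1.1],      # opposite shape
}

clusters = group_by_pattern(ratios)
for pattern, names in clusters.items():
    print(pattern, names)
```

In this toy input, `protein_A` and `protein_B` fall into the same group even though their absolute values differ by an order of magnitude, while `protein_C` is separated because its direction of change differs at every step.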
