Mining gene microarray expression profiles.

Abstract

Microarray technologies provide a tool for studying large-scale gene expression relationships. One of the fundamental principles of biological organization is modularity: genes can be grouped into modules according to their expression profiles. Clustering algorithms are generally used to group gene expression profiles and then extract useful patterns. In the first part of this dissertation, we propose a new hierarchical clustering algorithm, the dynamically growing self-organizing tree (DGSOT) algorithm, which overcomes drawbacks of traditional hierarchical clustering algorithms. DGSOT combines horizontal growth and vertical growth to construct a multifurcating hierarchical tree from top to bottom to cluster the data. In addition, we propose a new cluster validation criterion, called Cluster Separation, for finding the proper number of clusters at each hierarchical level, and a K-level up distribution (KLD) mechanism, which increases the scope of data distribution during hierarchy construction, to improve clustering accuracy. The clustering result of DGSOT can easily be displayed as a dendrogram for visualization. On yeast cell cycle microarray expression profiles, we found that the hierarchical structure of the DGSOT clustering results is more reasonable than that of the Self-Organizing Tree Algorithm (SOTA) results, and that the enrichment of biological functions within the clusters is considerably higher.

However, clustering algorithms require a threshold to be predefined manually to obtain high-quality clusters. In the second part of this dissertation, we propose a new algorithm based on random matrix theory, called random matrix modeling (RMM), to automatically reveal gene coexpression modules from microarray expression profiles. The similarity threshold used by RMM is derived from the inherent characteristics of the input dataset.
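The abstract does not spell out how RMM turns random matrix theory into a threshold. One approach used in random-matrix analyses of correlation networks, assumed here rather than taken from the dissertation, is to raise the correlation cut-off step by step and watch the nearest-neighbour spacing distribution of the thresholded matrix's eigenvalues move from the GOE form (Wigner surmise) toward the Poisson form expected of uncorrelated levels. The sketch below illustrates that idea only. The function names, the crude local-window unfolding, the Kolmogorov-Smirnov comparison (in place of whatever goodness-of-fit test the dissertation uses), the removal of isolated genes, and the "first Poisson-like cut-off" stopping rule are all hypothetical choices. It uses only the Python standard library, so eigenvalues come from a small cyclic Jacobi routine.

```python
import math
import random


def pearson_matrix(profiles):
    """Pairwise Pearson correlations between expression profiles (one row per gene)."""
    z = []
    for row in profiles:
        mean = sum(row) / len(row)
        norm = math.sqrt(sum((x - mean) ** 2 for x in row))
        z.append([(x - mean) / norm for x in row])
    return [[sum(a * b for a, b in zip(zi, zj)) for zj in z] for zi in z]


def jacobi_eigenvalues(a, max_sweeps=100, tol=1e-12):
    """Ascending eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J: rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A: rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def unfolded_spacings(eigs, window=5):
    """Nearest-neighbour spacings divided by the mean spacing in a local window
    (a crude unfolding; fitting a smooth spectral staircase is more usual)."""
    raw = [b - a for a, b in zip(eigs, eigs[1:])]
    out = []
    for i in range(len(raw)):
        lo, hi = max(0, i - window), min(len(raw), i + window + 1)
        local = sum(raw[lo:hi]) / (hi - lo)
        if local > 0:
            out.append(raw[i] / local)
    return out


def ks_distance(samples, cdf):
    """Kolmogorov-Smirnov statistic between an empirical sample and a reference CDF."""
    xs = sorted(samples)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(xs))


def wigner_cdf(s):
    """CDF of the GOE Wigner surmise P(s) = (pi*s/2) * exp(-pi*s^2/4)."""
    return 1.0 - math.exp(-math.pi * s * s / 4.0)


def poisson_cdf(s):
    """CDF of the Poisson (uncorrelated-level) spacing law P(s) = exp(-s)."""
    return 1.0 - math.exp(-s)


def rmt_threshold(corr, candidates, min_genes=10):
    """Hypothetical rule: return the first cut-off whose spacing distribution is
    closer to Poisson than to GOE, or None if no candidate qualifies."""
    n = len(corr)
    for t in sorted(candidates):
        # Drop genes left without any |r| >= t edge; they only add degenerate levels.
        keep = [i for i in range(n) if any(abs(corr[i][j]) >= t for j in range(n) if j != i)]
        if len(keep) < min_genes:
            return None
        m = [[corr[i][j] if i == j or abs(corr[i][j]) >= t else 0.0 for j in keep]
             for i in keep]
        s = unfolded_spacings(jacobi_eigenvalues(m))
        if s and ks_distance(s, poisson_cdf) < ks_distance(s, wigner_cdf):
            return t
    return None


# Usage on random (hypothetical) data: 20 genes x 40 samples.
random.seed(0)
profiles = [[random.gauss(0.0, 1.0) for _ in range(40)] for _ in range(20)]
print(rmt_threshold(pearson_matrix(profiles), [0.2, 0.3, 0.4, 0.5]))
```

On real data the candidate grid, the unfolding, and the fit test all matter; a numerical library's symmetric eigensolver would replace the Jacobi routine for genome-scale matrices.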
We evaluated RMM on an in silico modular network model and demonstrated it on yeast cell cycle microarray expression profiles. Statistical analyses show that the modules obtained are of biological origin and stable under noise. Furthermore, the structural properties of the modules were shown to follow the common properties of typical biological systems. To the best of our knowledge, RMM is the first algorithm to provide an objective mathematical criterion for choosing the best threshold to reveal gene coexpression modules, and it proved to be a robust, sensitive, and valid method for revealing gene coexpression modules from microarray profiles.
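The first part of the abstract describes DGSOT only at a high level: the tree grows from the top down, horizontal growth decides how many children a node receives (the abstract pairs this with the Cluster Separation criterion for choosing the number of clusters at each level), and vertical growth adds the next level. The following Python sketch illustrates that growth pattern under stated assumptions. Plain k-means stands in for DGSOT's own self-organizing node learning, the `separation` score is a hypothetical substitute and not the dissertation's Cluster Separation definition, the `min_score`, `max_children`, and `min_size` parameters are invented for illustration, and the K-level up distribution (KLD) step is omitted entirely.

```python
import math


def dist(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a list of profiles."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, k, iters=50):
    """Plain k-means with farthest-point seeding: a stand-in for DGSOT's node learning."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda i: dist(p, centers[i]))].append(p)
        new = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return [g for g in groups if g]


def separation(groups):
    """Hypothetical score (NOT the dissertation's Cluster Separation formula): closest pair
    of centroids divided by the largest mean distance of points to their own centroid."""
    cents = [centroid(g) for g in groups]
    spread = max(sum(dist(p, c) for p in g) / len(g) for g, c in zip(groups, cents))
    gap = min(dist(a, b) for i, a in enumerate(cents) for b in cents[i + 1:])
    return gap / spread if spread > 0 else math.inf


def grow(points, min_score=2.0, max_children=4, min_size=2):
    """Top-down growth: horizontal growth tries 2..max_children children for this node and
    keeps the best-separated split; vertical growth then recurses into each child."""
    best, best_score = None, min_score
    for k in range(2, min(max_children, len(points)) + 1):  # horizontal growth
        groups = kmeans(points, k)
        if len(groups) < 2 or any(len(g) < min_size for g in groups):
            continue
        score = separation(groups)
        if score > best_score:
            best, best_score = groups, score
    if best is None:
        return {"points": points}  # leaf: no acceptable split of this node
    # Vertical growth: each selected child becomes the root of a deeper subtree.
    return {"children": [grow(g, min_score, max_children, min_size) for g in best]}


profiles = [[0.0], [0.1], [0.2], [10.0], [10.1], [10.2], [20.0], [20.1], [20.2]]
tree = grow(profiles)
print(len(tree["children"]))
```

On these three well-separated one-dimensional groups the root splits into three children, and each three-point group stays a leaf because `min_size=2` rules out singleton children. The nested dictionaries map directly onto a dendrogram.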

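Bibliographic record

  • Author: Luo, Feng
  • Author affiliation: The University of Texas at Dallas
  • Degree-granting institution: The University of Texas at Dallas
  • Subjects: Computer Science; Biology, Molecular
  • Degree: Ph.D.
  • Year: 2004
  • Pages: 124
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification: Automation technology, computer technology; Molecular genetics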