Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics

Ensembling of Gene Clusters Utilizing Deep Learning and Protein-Protein Interaction Information



Abstract

Cluster ensemble techniques combine the outputs of multiple clustering algorithms to obtain a single consensus partitioning. This paper reports the development of a cluster-ensemble technique for gene clustering that combines multiobjective optimization with deep-learning models and uses additional protein-protein interaction information to generate the consensus partitioning. The proposed ensemble framework works in four phases: (i) filtering out irrelevant genes from the microarray dataset, so that only statistically significant genes are considered for further analysis; (ii) generation of diverse base partitionings, for which a multiobjective optimization-based clustering technique is proposed that simultaneously optimizes three different cluster quality measures and generates a set of partitioning solutions on the Pareto-optimal front; (iii) construction of a weighted incidence matrix from the mentha scores of the different clustering solutions, which are calculated by accessing mentha, a highly enriched protein-protein interaction archive; and (iv) generation of a consensus partitioning from the obtained incidence matrix by two approaches. The first approach is based on a traditional machine learning method; the second exploits a graph partitioning algorithm and two deep neural models to generate the final clustering. To validate the efficacy of the proposed ensemble framework, it is applied to five gene expression datasets. We present a comparative analysis of the proposed technique against different clustering algorithms in terms of the biological homogeneity index (BHI) and the biological stability index (BSI).
The traditional approach attains average improvements of 3 and 2 percent over the best non-dominated solution with respect to BHI and BSI, respectively, whereas the deep learning models show average improvements of 6.8 and 1.5 percent over the proposed traditional approach with respect to BHI and BSI, respectively. Welch's t-test is then performed to show that the results obtained by the proposed methods are statistically significant. Availability of data and materials: https://github.com/sduttap16/DeepEnsm.
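
To make phases (iii) and (iv) more concrete, the following is a minimal sketch of one common way to build a weighted incidence matrix and derive a consensus from it. It is not the authors' implementation. The function names, the toy weights (standing in for per-partitioning mentha scores), and the threshold-based connected-components step (a simple stand-in for the graph partitioning and deep models mentioned in the abstract) are all assumptions made for illustration.

```python
from itertools import combinations


def weighted_incidence(partitions, weights):
    """Rows are genes; columns are the clusters of every base partitioning.

    An entry holds the partitioning's weight (assumed here to play the role
    of a mentha score) if the gene belongs to that cluster, otherwise 0.
    """
    n_genes = len(partitions[0])
    rows = [[] for _ in range(n_genes)]
    for labels, weight in zip(partitions, weights):
        for cluster_id in sorted(set(labels)):
            for gene, label in enumerate(labels):
                rows[gene].append(weight if label == cluster_id else 0.0)
    return rows


def co_association(incidence, total_weight):
    """Weighted share of partitionings that put each gene pair together."""
    n_genes = len(incidence)
    sim = [[0.0] * n_genes for _ in range(n_genes)]
    for i in range(n_genes):
        for j in range(n_genes):
            shared = sum(min(a, b) for a, b in zip(incidence[i], incidence[j]))
            sim[i][j] = shared / total_weight
    return sim


def consensus(sim, threshold=0.5):
    """Stand-in consensus step: connected components of the graph that links
    gene pairs whose similarity exceeds the (hypothetical) threshold."""
    n_genes = len(sim)
    parent = list(range(n_genes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i, j in combinations(range(n_genes), 2):
        if sim[i][j] > threshold:
            parent[find(j)] = find(i)

    relabel = {}
    return [relabel.setdefault(find(g), len(relabel)) for g in range(n_genes)]


# Toy example: three base partitionings of five genes, with made-up weights.
base_partitions = [[0, 0, 1, 1, 2], [0, 0, 0, 1, 1], [1, 1, 0, 0, 0]]
toy_weights = [0.5, 0.25, 0.25]

H = weighted_incidence(base_partitions, toy_weights)
S = co_association(H, sum(toy_weights))
print(consensus(S))
```

In this sketch a partitioning with a higher weight contributes more to the pairwise similarity, so gene pairs grouped together by better-supported partitionings are more likely to share a consensus cluster. Any clustering or graph partitioning method that accepts a similarity matrix could replace the thresholding step.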
