Journal of Bioinformatics and Computational Biology

Sparse Markov chain-based semi-supervised multi-instance multi-label method for protein function prediction



Abstract

Automated assignment of protein function has received considerable attention in recent years for genome-wide study. With the rapid accumulation of genome sequencing data produced by high-throughput experimental techniques, manually predicting the functional properties of proteins has become increasingly cumbersome. Such large genomics data sets can only be annotated computationally. However, automated assignment of functions to unknown proteins is challenging due to its inherent difficulty and complexity. Previous studies have shown that the multi-instance multi-label (MIML) framework is effective for problems involving complicated objects with multiple semantic meanings. In protein function prediction, each protein may be associated with several distinct structural units and multiple functional properties, where each unit is described by an instance and each functional property is treated as a class label. It is therefore convenient and natural to tackle the protein function prediction problem with the MIML framework. In this paper, we propose a sparse Markov chain-based semi-supervised MIML method, called Sparse-Markov. A sparse transductive probability graph is constructed to encode the affinity information of the data, based on an ensemble of Hausdorff distance metrics. Our goal is to exploit the affinity between protein objects in this graph to seek a sparse steady-state probability of the Markov chain model for protein function prediction, such that two proteins are given similar functional labels if they are close to each other in terms of the ensemble Hausdorff distance in the graph. Experimental results on seven real-world organism data sets covering three biological domains show that the proposed Sparse-Markov method achieves better performance than four state-of-the-art MIML learning algorithms.
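The abstract does not spell out how the graph or the steady state is computed, so the following is only a minimal sketch of the general idea, not the authors' Sparse-Markov algorithm. It assumes that each protein is a bag (a NumPy array with one row per instance), that the ensemble is a plain average of the maximal, minimal and average Hausdorff distances, that sparsity comes from keeping the k nearest neighbours of each bag, and that the steady state is approximated by a random walk with restart. All function names, the exponential affinity, and the parameters `k`, `alpha` and `iters` are hypothetical choices made for illustration.

```python
import numpy as np

def hausdorff_variants(A, B):
    """Return the (maximal, minimal, average) Hausdorff distances between bags A and B."""
    # Pairwise Euclidean distances between the instances of A and of B.
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    a_to_b = D.min(axis=1)  # distance from each instance of A to its nearest in B
    b_to_a = D.min(axis=0)  # distance from each instance of B to its nearest in A
    d_max = max(a_to_b.max(), b_to_a.max())
    d_min = D.min()
    d_avg = (a_to_b.sum() + b_to_a.sum()) / (len(A) + len(B))
    return d_max, d_min, d_avg

def ensemble_distance(bags):
    """Symmetric bag-to-bag distance matrix (assumption: unweighted mean of the variants)."""
    n = len(bags)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = np.mean(hausdorff_variants(bags[i], bags[j]))
    return dist

def sparse_transition_matrix(dist, k):
    """Row-stochastic matrix that keeps only each bag's k nearest neighbours."""
    n = dist.shape[0]
    scale = dist[dist > 0].mean()        # assumed bandwidth for the affinity
    affinity = np.exp(-dist / scale)
    P = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(dist[i])
        nearest = order[order != i][:k]  # exclude the bag itself
        P[i, nearest] = affinity[i, nearest]
    return P / P.sum(axis=1, keepdims=True)

def propagate_labels(P, Y, labeled, alpha=0.85, iters=200):
    """Score every bag for every label with a random walk that restarts at labeled bags.

    Y is an (n, L) 0/1 label matrix; `labeled` is a boolean mask over the n bags.
    """
    n, num_labels = Y.shape
    scores = np.zeros((n, num_labels))
    for c in range(num_labels):
        restart = np.where(labeled, Y[:, c], 0.0).astype(float)
        if restart.sum() == 0:
            continue                     # no labeled example for this class
        restart /= restart.sum()
        x = restart.copy()
        for _ in range(iters):
            x = alpha * (P.T @ x) + (1 - alpha) * restart
        scores[:, c] = x
    return scores
```

In this sketch, an unlabeled protein receives a high score for a function when the walk started from proteins annotated with that function frequently visits it, which mirrors the abstract's statement that proteins close in ensemble Hausdorff distance should receive similar labels.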
