Journal: BMC Systems Biology

Protein functional properties prediction in sparsely-label PPI networks through regularized non-negative matrix factorization

Abstract

Predicting the functional properties of proteins in protein-protein interaction (PPI) networks is a challenging problem with important implications for computational biology. Collective classification (CC), which uses both attribute features and relational information to jointly classify related proteins in a PPI network, has been shown to be a powerful computational method for this problem setting. Enabling CC usually increases accuracy when the PPI network is fully labeled and a large amount of labeled data is available. However, such labels can be difficult to obtain in many real-world PPI networks, which usually contain only a limited number of labeled proteins and a large number of unlabeled ones. In this case, most of the unlabeled proteins may not be connected to labeled ones, so supervision knowledge cannot be obtained effectively from local network connections. As a consequence, learning a CC model in a sparsely-labeled PPI network can lead to poor performance. We investigate a latent graph approach that builds an integrated latent graph by exploiting various latent linkages and judiciously integrating them so as to link proteins with similar functions and separate proteins with different functions. We develop a regularized non-negative matrix factorization (RNMF) algorithm for CC that predicts protein functional properties using the various data sources available in this problem setting, including attribute features, the latent graph, and unlabeled data. In RNMF, a label matrix factorization term and a network regularization term are incorporated into the non-negative matrix factorization (NMF) objective function to seek a factorization that respects both the network structure and the label information for classification.
Experimental results on KDD Cup tasks, which involve predicting the localization and functions of proteins encoded by yeast genes, demonstrate the effectiveness of the proposed RNMF method for predicting protein properties. In the comparison, the new method performs better than the other CC algorithms considered, especially when labeled proteins are scarce.
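
The abstract does not give the RNMF objective or its update rules, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a common graph-regularized NMF form, ||X - V U^T||^2 + alpha ||M (Y - V B)||^2 + beta tr(V^T L V), where M masks out unlabeled proteins and L = D - W is the Laplacian of the (latent) graph, and it uses standard multiplicative updates. The function names, the parameters alpha, beta and k, and the toy data generator are all hypothetical.

```python
import numpy as np

def rnmf_sketch(X, W, Y, labeled, k=5, alpha=1.0, beta=0.1,
                n_iter=200, seed=0, eps=1e-9):
    """Illustrative label- and graph-regularized NMF (assumed form).

    X: (n, d) non-negative attribute matrix, one row per protein.
    W: (n, n) symmetric non-negative adjacency of the latent graph.
    Y: (n, c) one-hot labels; rows of unlabeled proteins are ignored.
    labeled: (n,) boolean mask of proteins whose labels are known.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    c = Y.shape[1]
    m = labeled.astype(float)[:, None]      # diagonal label mask as a column
    deg = W.sum(axis=1)[:, None]            # node degrees (diagonal of D)
    Ym = m * Y                              # labels with unlabeled rows zeroed
    V = rng.random((n, k)) + eps            # latent protein representation
    U = rng.random((d, k)) + eps            # attribute basis
    B = rng.random((k, c)) + eps            # latent-to-label mapping

    def objective():
        recon = np.linalg.norm(X - V @ U.T) ** 2
        label = alpha * np.linalg.norm(m * (Y - V @ B)) ** 2
        smooth = beta * np.sum(V * (deg * V - W @ V))   # tr(V^T L V)
        return recon + label + smooth

    history = [objective()]
    for _ in range(n_iter):
        # Multiplicative updates keep every factor non-negative.
        U *= (X.T @ V) / (U @ (V.T @ V) + eps)
        B *= (V.T @ Ym) / ((V.T @ (m * V)) @ B + eps)
        num = X @ U + alpha * (Ym @ B.T) + beta * (W @ V)
        den = (V @ (U.T @ U) + alpha * ((m * (V @ B)) @ B.T)
               + beta * (deg * V) + eps)
        V *= num / den
        history.append(objective())

    # Predict the class with the highest reconstructed label score.
    return (V @ B).argmax(axis=1), history

def make_toy_network(n_per_class=20, d=8, seed=0):
    """Hypothetical two-class toy data: class-dependent attributes and edges."""
    rng = np.random.default_rng(seed)
    n = 2 * n_per_class
    classes = np.repeat([0, 1], n_per_class)
    centers = rng.random((2, d))
    X = np.abs(centers[classes] + 0.1 * rng.standard_normal((n, d)))
    same = classes[:, None] == classes[None, :]
    P = np.where(same, 0.3, 0.02)           # same-class links are more likely
    upper = np.triu((rng.random((n, n)) < P).astype(float), 1)
    W = upper + upper.T                     # symmetric, zero diagonal
    Y = np.eye(2)[classes]
    labeled = np.zeros(n, dtype=bool)
    labeled[[0, 1, n_per_class, n_per_class + 1]] = True   # sparse labels
    return X, W, Y, labeled, classes
```

A possible use is `pred, history = rnmf_sketch(*make_toy_network()[:4], k=4)`: only four of the forty toy proteins carry labels, and the graph term is what lets label information reach the rest. The weights alpha and beta trade off label fit against smoothness over the graph and would need tuning on real data.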
