Journal: Genomics, Proteomics & Bioinformatics (English edition)

GTB-PPI: Predict Protein–protein Interactions Based on L1-regularized Logistic Regression and Gradient Tree Boosting

         

Abstract

Protein–protein interactions (PPIs) are of great importance for understanding genetic mechanisms, delineating disease pathogenesis, and guiding drug design. With the growth of PPI data and the development of machine learning technologies, the prediction and identification of PPIs have become a research hotspot in proteomics. In this study, we propose a new prediction pipeline for PPIs based on gradient tree boosting (GTB). First, the initial feature vector is extracted by fusing pseudo amino acid composition (PseAAC), pseudo position-specific scoring matrix (PsePSSM), reduced sequence and index-vectors (RSIV), and autocorrelation descriptor (AD). Second, to remove redundancy and noise, we employ L1-regularized logistic regression (L1-RLR) to select an optimal feature subset. Finally, the GTB-PPI model is constructed. Five-fold cross-validation showed that GTB-PPI achieved accuracies of 95.15% and 90.47% on the Saccharomyces cerevisiae and Helicobacter pylori datasets, respectively. In addition, GTB-PPI could be applied to predict the independent test datasets for Caenorhabditis elegans, Escherichia coli, Homo sapiens, and Mus musculus, the one-core PPI network for CD9, and the crossover PPI network for the Wnt-related signaling pathways. The results show that GTB-PPI can significantly improve the accuracy of PPI prediction. The code and datasets of GTB-PPI can be downloaded from https://github.com/QUST-AIBBDRC/GTB-PPI/.
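
The select-then-boost structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn is available, replaces the fused PseAAC, PsePSSM, RSIV, and AD descriptors with a synthetic feature matrix, and uses illustrative hyperparameters (`C=0.5`, `n_estimators=100`) that are not taken from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: a synthetic stand-in for the fused descriptor matrix.
# Rows are protein pairs, columns are features, y marks interacting pairs.
X, y = make_classification(
    n_samples=400, n_features=60, n_informative=10,
    n_redundant=10, random_state=0,
)

# Step 1: L1-regularized logistic regression drives many coefficients to
# zero; SelectFromModel keeps features whose |coefficient| exceeds 1e-5.
l1_selector = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.5, random_state=0),
    threshold=1e-5,
)

# Step 2: gradient tree boosting classifies using the selected subset.
# Wrapping both steps in one pipeline refits the selector inside each
# fold, so held-out pairs never influence feature selection.
pipeline = make_pipeline(
    StandardScaler(),
    l1_selector,
    GradientBoostingClassifier(n_estimators=100, random_state=0),
)

# Five-fold cross-validation, matching the evaluation protocol named above.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy")

# Fit once on all data to see how many features the L1 step retains.
pipeline.fit(X, y)
n_selected = int(pipeline.named_steps["selectfrommodel"].get_support().sum())

print(f"mean 5-fold accuracy: {scores.mean():.4f}")
print(f"features kept by L1 selection: {n_selected} of {X.shape[1]}")
```

The accuracy printed here reflects only the synthetic data and says nothing about the 95.15% and 90.47% figures reported for the real datasets.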
