IEEE/ACM Transactions on Computational Biology and Bioinformatics

Formator: Predicting Lysine Formylation Sites Based on the Most Distant Undersampling and Safe-Level Synthetic Minority Oversampling


Abstract

Lysine formylation is a reversible type of protein post-translational modification and has been found to be involved in a myriad of biological processes, including modulation of chromatin conformation and gene expression in histones and other nuclear proteins. Accurate identification of lysine formylation sites is essential for elucidating the underlying molecular mechanisms of formylation. Traditional experimental methods are time-consuming and expensive. As such, it is desirable and necessary to develop computational methods for accurate prediction of formylation sites. In this study, we propose a novel predictor, termed Formator, for identifying lysine formylation sites from sequence information. Formator is developed using the ensemble learning (EL) strategy based on four individual support vector machine classifiers combined via a voting system. Moreover, the most distant undersampling and Safe-Level-SMOTE oversampling techniques were integrated to deal with the data imbalance problem of the training dataset. Four effective feature extraction methods, namely bi-profile Bayes (BPB), k-nearest neighbor (KNN), amino acid physicochemical properties (AAindex), and composition and transition (CTD), were employed to encode the surrounding sequence features of potential formylation sites. Extensive empirical studies show that Formator achieved accuracies of 87.24 and 74.96 percent on the jackknife test and the independent test, respectively. Performance comparison results on the independent test indicate that Formator outperforms the existing prediction tool, LFPred, suggesting that it has great potential to serve as a useful tool in identifying novel lysine formylation sites and facilitating hypothesis-driven experimental efforts.
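The abstract says that four support vector machine classifiers, one per feature encoding, are combined via a voting system, but it does not give the voting rule. The following is a minimal sketch of one plausible scheme, assuming each classifier emits a binary label per candidate lysine and that a site is called positive when at least `min_votes` classifiers agree. The function name `majority_vote`, the `min_votes` threshold, and the example labels are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def majority_vote(predictions: Dict[str, List[int]], min_votes: int = 3) -> List[int]:
    """Combine per-classifier binary labels (1 = predicted formylation site).

    A site is labelled 1 when at least `min_votes` classifiers predict 1.
    With four voters, min_votes=3 treats a 2-2 tie as negative; this
    tie-breaking rule is an assumption, not the paper's stated method.
    """
    columns = list(predictions.values())
    if not columns:
        raise ValueError("need at least one classifier")
    n_sites = len(columns[0])
    if any(len(col) != n_sites for col in columns):
        raise ValueError("all classifiers must label the same sites")
    # zip(*columns) yields, for each site, the tuple of votes it received.
    return [1 if sum(votes) >= min_votes else 0 for votes in zip(*columns)]


# Hypothetical outputs of four SVMs (one per encoding) for five candidate sites.
svm_outputs = {
    "BPB":     [1, 0, 1, 1, 0],
    "KNN":     [1, 0, 0, 1, 0],
    "AAindex": [1, 1, 0, 1, 0],
    "CTD":     [0, 0, 1, 1, 1],
}

ensemble = majority_vote(svm_outputs)
print(ensemble)  # [1, 0, 0, 1, 0]
```

Lowering `min_votes` to 2 would also call the third site positive, trading specificity for sensitivity. Which threshold is appropriate depends on the voting rule actually used, which the abstract leaves unspecified.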
