Journal: Genomics

iDHS-DSAMS: Identifying DNase I hypersensitive sites based on the dinucleotide property matrix and ensemble bagged tree



Abstract

DNase I hypersensitive sites (DHSs) are associated with DNA regulatory elements, so understanding them is of great significance for biomedical research. However, traditional experiments are poorly suited to identifying such sites in the large number of new DNA sequences produced by sequencing. Several machine learning methods have been proposed to identify DHSs, but most ignore the spatial autocorrelation of the DNA sequence. In this paper, we propose a predictor called iDHS-DSAMS that identifies DHSs on benchmark datasets. We develop a feature extraction method called dinucleotide-based spatial autocorrelation (DSA). We then use Min-Redundancy-Max-Relevance (mRMR) to remove irrelevant and redundant features, which yields a 100-dimensional feature vector. Finally, we use an ensemble bagged tree as the classifier, trained on datasets oversampled with SMOTE. Five-fold cross-validation tests on two benchmark datasets indicate that the proposed method outperforms existing methods on each of accuracy (Acc), Matthews correlation coefficient (MCC), sensitivity (Sn) and specificity (Sp).
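The abstract does not give the DSA formula. The sketch below is for illustration only and assumes a Moran-style spatial autocorrelation. Each of the L-1 overlapping dinucleotides in a length-L sequence is mapped to a numeric property value P_i. For each lag d, the feature is the lag-d covariance of that series divided by its variance. The property table `TOY_PROPERTY` holds made-up placeholder values, not real physicochemical data. All function names are hypothetical. Since iDHS-DSAMS works from a dinucleotide property matrix, one would presumably compute such a vector per property and concatenate them before mRMR selection.

```python
from statistics import mean

# HYPOTHETICAL placeholder values for the 16 dinucleotides (AA=0.0, AC=1.0, ..., TT=15.0).
# A real DSA feature would use physicochemical values from a dinucleotide property matrix.
DINUCLEOTIDES = [x + y for x in "ACGT" for y in "ACGT"]
TOY_PROPERTY = {dn: float(i) for i, dn in enumerate(DINUCLEOTIDES)}


def dinucleotide_values(seq, prop):
    """Map a DNA sequence to the property values of its L-1 overlapping dinucleotides.

    Raises KeyError for dinucleotides containing symbols other than A, C, G, T.
    """
    seq = seq.upper()
    return [prop[seq[i:i + 2]] for i in range(len(seq) - 1)]


def moran_autocorrelation(values, lag):
    """Moran-style autocorrelation of a numeric series at the given lag."""
    n = len(values)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < number of dinucleotides")
    m = mean(values)
    variance = sum((v - m) ** 2 for v in values) / n
    if variance == 0:
        return 0.0  # constant series: autocorrelation undefined, report 0 by convention
    cov = sum((values[i] - m) * (values[i + lag] - m) for i in range(n - lag)) / (n - lag)
    return cov / variance


def dsa_features(seq, prop, max_lag):
    """Hypothetical DSA-style feature vector [I(1), ..., I(max_lag)] for one property."""
    vals = dinucleotide_values(seq, prop)
    return [moran_autocorrelation(vals, d) for d in range(1, max_lag + 1)]
```

For example, `dsa_features("ACGTACGTAC", TOY_PROPERTY, max_lag=4)` returns four values. Because the sequence repeats with period 4, the lag-4 value is positive. The maximum lag bounds the feature count per property and must stay below the number of dinucleotides.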
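The abstract reports accuracy (Acc), MCC, sensitivity (Sn) and specificity (Sp) under five-fold cross validation. The minimal Python sketch below shows the conventional definitions computed from confusion-matrix counts, treating DHS as the positive class. It also includes a simple shuffled (non-stratified) five-fold index split. The function names are hypothetical, and this is not the authors' evaluation code.

```python
import random
from math import sqrt


def binary_metrics(tp, tn, fp, fn):
    """Acc, MCC, Sn and Sp from confusion-matrix counts (positive class = DHS)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    acc = (tp + tn) / total
    sn = tp / (tp + fn) if tp + fn else 0.0  # sensitivity: TP / (TP + FN)
    sp = tn / (tn + fp) if tn + fp else 0.0  # specificity: TN / (TN + FP)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention when undefined
    return {"Acc": acc, "MCC": mcc, "Sn": sn, "Sp": sp}


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds (not stratified)."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]
```

In a typical five-fold setup, each fold serves once as the test set while the classifier is fit on the remaining four. Per-fold counts are then summed, or the metrics averaged. A common precaution is to apply SMOTE only inside the training folds, so that synthetic samples never leak into the test fold.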
