
iRSpot-SF: Prediction of recombination hotspots by incorporating sequence based features into Chou's Pseudo components



Abstract

Recombination hotspots are unevenly distributed across a genome; they are regions that show higher rates of meiotic recombination. Computational methods for recombination hotspot prediction often use sophisticated features derived from physico-chemical or structural properties of nucleotides. In this paper, we propose iRSpot-SF, which uses sequence-based features that are computationally cheap to generate. Four feature groups are used in our method: k-mer composition, gapped k-mer composition, TF-IDF of k-mers, and reverse complement k-mer composition. We used recursive feature elimination to select the top 17 features for hotspot prediction. Our analysis shows that gapped k-mer composition and reverse complement k-mer composition features outperform the other groups. We used an SVM with an RBF kernel as the classification algorithm and tested it on standard benchmark datasets. Compared with other methods, iRSpot-SF produces significantly better results in terms of accuracy, Matthews correlation coefficient (MCC) and sensitivity, at 84.58%, 0.6941 and 84.57%, respectively. The method is available as a Python-based tool, and the datasets and source code are available at: https://github.com/abdlmaruf/iRSpot-SF. A web application based on iRSpot-SF is freely available at: http://irspot.pythonanywhere.com/server.html.
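The abstract names four sequence-based feature groups and three evaluation metrics without defining them. The sketch below shows one plausible way to compute them in plain Python; it is not taken from the iRSpot-SF repository. Assumptions: sequences are uppercase A/C/G/T strings long enough for the chosen k or gap, frequencies are normalized by the number of windows, "gapped k-mer" is read as a base pair (X, Y) separated by a fixed number of skipped positions, and TF-IDF uses a smoothed IDF. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# Assumption: sequences are uppercase strings over A, C, G, T.
BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def all_kmers(k):
    """All 4**k DNA k-mers, in lexicographic order (the feature columns)."""
    return ["".join(p) for p in product(BASES, repeat=k)]


def kmer_composition(seq, k):
    """Frequency of each k-mer over the len(seq) - k + 1 overlapping windows."""
    n = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(n))
    return [counts[km] / n for km in all_kmers(k)]


def gapped_pair_composition(seq, gap):
    """Assumed gapped feature: frequency of base pairs (X, Y) with `gap`
    skipped positions between them, e.g. gap=1 reads X_Y."""
    n = len(seq) - gap - 1
    counts = Counter(seq[i] + seq[i + gap + 1] for i in range(n))
    return [counts[p] / n for p in all_kmers(2)]


def reverse_complement(kmer):
    return kmer.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """Represent a k-mer and its reverse complement by the smaller string."""
    return min(kmer, reverse_complement(kmer))


def canonical_kmers(k):
    """One column per {k-mer, reverse complement} class (10 for k=2)."""
    return sorted({canonical(km) for km in all_kmers(k)})


def revcomp_kmer_composition(seq, k):
    """k-mer composition with each k-mer merged with its reverse complement."""
    n = len(seq) - k + 1
    counts = Counter(canonical(seq[i:i + k]) for i in range(n))
    return [counts[km] / n for km in canonical_kmers(k)]


def kmer_tfidf(seqs, k):
    """TF-IDF over a set of sequences; TF is the k-mer frequency and IDF is
    the smoothed variant log((1 + N) / (1 + df)) + 1 (an assumption)."""
    tfs = [kmer_composition(s, k) for s in seqs]
    n_docs = len(seqs)
    n_feats = 4 ** k
    df = [sum(1 for tf in tfs if tf[j] > 0) for j in range(n_feats)]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    return [[tf[j] * idf[j] for j in range(n_feats)] for tf in tfs]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when a marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(tp, fn):
    """Fraction of true hotspots predicted as hotspots."""
    return tp / (tp + fn)


if __name__ == "__main__":
    s = "ACGTACGGTCA"
    features = (kmer_composition(s, 2) + gapped_pair_composition(s, 1)
                + revcomp_kmer_composition(s, 2))
    print(len(features))  # 16 + 16 + 10 = 42
```

Concatenated vectors like these would form the per-sequence input to feature selection (the paper reports recursive feature elimination down to 17 features) and an RBF-kernel SVM; neither step is shown here, and the k and gap values above are illustrative, not the paper's settings. Merging each k-mer with its reverse complement reflects that a hotspot can be read from either DNA strand.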

Bibliographic details

  • Source
    Genomics | 2019, Issue 4 | 7 pages
  • Author affiliations

    United International University, Department of Computer Science & Engineering, Madani Avenue, Dhaka 1212, Bangladesh

  • Original format: PDF
  • Text language: English
  • Chinese Library Classification: Medical genetics

