Source: U.S. National Institutes of Health (NIH) literature collection; journal: Genes

InpactorDB: A Classified Lineage-Level Plant LTR Retrotransposon Reference Library for Free-Alignment Methods Based on Machine Learning


Abstract

Long terminal repeat (LTR) retrotransposons are mobile elements that make up the major fraction of most plant genomes. Identifying and annotating these elements with bioinformatics approaches is a major challenge in the era of massive plant genome sequencing. Besides contributing to genome size variation, LTR retrotransposons are associated with the function and structure of different chromosomal regions and can alter the function of coding regions, among other effects. Several sequence databases of plant LTR retrotransposons are available, some with public access, such as PGSB and RepetDB, and some with restricted access, such as Repbase. Although these databases are useful for identifying LTR retrotransposons (LTR-RTs) in new genomes by similarity, their elements are not fully classified to the lineage (also called family) level. Here, we present InpactorDB, a semi-curated dataset of 130,439 elements from 195 plant genomes (belonging to 108 plant species), classified to the lineage level. This dataset was used to train two deep neural networks (one fully connected and one convolutional) for the rapid classification of these elements. For lineage-level classification, we obtain up to 98% performance as measured by F1-score, precision and recall.
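To make "alignment-free classification" and the reported metrics concrete, the sketch below shows two pieces under stated assumptions. First, it turns a DNA sequence into a k-mer frequency vector, which is one common alignment-free representation that a fully connected or convolutional network could take as input. The abstract does not say which features InpactorDB's networks use, so the choice of k-mers and the default k=3 are assumptions. Second, it computes precision, recall and F1 per lineage and macro-averages them. The abstract does not state the averaging scheme, so macro averaging is also an assumption. The function names kmer_frequencies and macro_scores and the toy labels are hypothetical and used only for illustration.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq: str, k: int = 3) -> list[float]:
    """Alignment-free representation: relative frequency of every DNA k-mer.

    Assumption: k-mer counting is used here only as an illustrative feature set.
    Windows containing characters other than A, C, G, T (e.g. N) are skipped.
    The vector has 4**k entries in lexicographic order (AAA, AAC, ...).
    """
    alphabet = "ACGT"
    seq = seq.upper()
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set(alphabet)
    )
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


def macro_scores(y_true: list[str], y_pred: list[str]) -> tuple[float, float, float]:
    """Macro-averaged precision, recall and F1 over all lineage labels.

    Macro F1 here is the mean of per-lineage F1 scores (an assumed convention).
    """
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for lab in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == lab and p == lab for t, p in pairs)
        fp = sum(t != lab and p == lab for t, p in pairs)
        fn = sum(t == lab and p != lab for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


if __name__ == "__main__":
    # Hypothetical toy example: four elements with made-up lineage predictions.
    y_true = ["Tekay", "Tekay", "Angela", "Athila"]
    y_pred = ["Tekay", "Angela", "Angela", "Athila"]
    print(kmer_frequencies("ACGTACGTNN", k=2))
    print(macro_scores(y_true, y_pred))
```

In this toy case, one Tekay element is mislabelled as Angela. That lowers Tekay's recall and Angela's precision, and macro averaging gives each lineage equal weight regardless of how many elements it has.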
