Journal: Source Code for Biology and Medicine

A software pipeline for processing and identification of fungal ITS sequences


Abstract

Background

Fungi from environmental samples are typically identified to species level through DNA sequencing of the nuclear ribosomal internal transcribed spacer (ITS) region for use in BLAST-based similarity searches in the International Nucleotide Sequence Databases. These searches are time-consuming and regularly require a significant amount of manual intervention and complementary analyses. We here present software – in the form of an identification pipeline for large sets of fungal ITS sequences – developed to automate the BLAST process and several additional analysis steps. The performance of the pipeline was evaluated on a dataset of 350 ITS sequences from fungi growing as epiphytes on building material.

Results

The pipeline was written in Perl and uses a local installation of NCBI-BLAST for the similarity searches of the query sequences. The variable subregion ITS2 of the ITS region is extracted from the sequences and used for additional searches of higher sensitivity. Multiple alignments of each query sequence and its closest matches are computed, and query sequences sharing at least 50% of their best matches are clustered to facilitate the evaluation of hypothetically conspecific groups. The pipeline proved to speed up the processing, as well as enhance the resolution, of the evaluation dataset considerably, and the fungi were found to belong chiefly to the Ascomycota, with Penicillium and Aspergillus as the two most common genera. The ITS2 was found to indicate a different taxonomic affiliation than did the complete ITS region for 10% of the query sequences, though this figure is likely to vary with the taxonomic scope of the query sequences.

Conclusion

The present software readily assigns large sets of fungal query sequences to their respective best matches in the international sequence databases and places them in a larger biological context. The output is highly structured to be easy to process, although it still needs to be inspected and possibly corrected for the impact of the incomplete and sometimes erroneously annotated fungal entries in these databases. The open source pipeline is available for UNIX-type platforms, and updated releases of the target database are made available biweekly. The pipeline is easily modified to operate on other molecular regions and organism groups.
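
The clustering step described in the abstract (grouping query sequences that share at least 50% of their best matches) can be illustrated with a minimal sketch. This is not the pipeline's code: the original is written in Perl, and the abstract does not say how the overlap is measured or how groups are merged. The sketch below assumes the overlap is the number of shared hits divided by the size of the smaller hit list, and that queries are merged transitively (single linkage). The names `shared_fraction`, `cluster_queries`, and the accession-like labels are hypothetical.

```python
from itertools import combinations


def shared_fraction(hits_a, hits_b):
    """Fraction of the smaller hit set that also occurs in the other set.

    Assumption: the abstract does not define the denominator; the smaller
    set is used here purely for illustration.
    """
    if not hits_a or not hits_b:
        return 0.0
    return len(hits_a & hits_b) / min(len(hits_a), len(hits_b))


def cluster_queries(best_matches, threshold=0.5):
    """Group queries whose best-match lists overlap by at least `threshold`.

    `best_matches` maps a query ID to an iterable of database hit IDs.
    Merging is transitive (single linkage), implemented with union-find.
    """
    parent = {query: query for query in best_matches}

    def find(query):
        # Walk up to the cluster root, halving the path as we go.
        while parent[query] != query:
            parent[query] = parent[parent[query]]
            query = parent[query]
        return query

    for q1, q2 in combinations(best_matches, 2):
        overlap = shared_fraction(set(best_matches[q1]), set(best_matches[q2]))
        if overlap >= threshold:
            parent[find(q1)] = find(q2)

    clusters = {}
    for query in best_matches:
        clusters.setdefault(find(query), []).append(query)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    # Hypothetical best-match lists for four query sequences.
    example = {
        "q1": ["hitA", "hitB", "hitC", "hitD"],
        "q2": ["hitB", "hitC", "hitE", "hitF"],
        "q3": ["hitX", "hitY"],
        "q4": ["hitA", "hitC", "hitD", "hitZ"],
    }
    print(cluster_queries(example))
    # [['q1', 'q2', 'q4'], ['q3']]
```

With single linkage, q2 and q4 end up in the same group through q1 even though they share only one hit directly. A stricter rule (for example, requiring every pair in a group to meet the threshold) would give smaller groups; which behaviour the pipeline uses is not stated in the abstract.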
