Journal: Investigative Genetics

Phred-Phrap package to analyses tools: a pipeline to facilitate population genetics re-sequencing studies

Abstract

Background: Targeted re-sequencing is one of the most powerful and widely used strategies for population genetics studies because it allows an unbiased screening for variation and is suitable for a wide variety of organisms. Studies that require re-sequencing data include evolutionary inferences, epidemiological studies designed to capture rare polymorphisms responsible for complex traits, and screenings for mutations in families and small populations with high incidences of specific genetic diseases. Despite the advent of next-generation sequencing technologies, Sanger sequencing is still the most popular approach in population genetics studies, both because automatic sequencers based on capillary electrophoresis are widely available and because it is still less prone to sequencing errors, which is critical in such studies. Two popular software applications for re-sequencing studies are Phred-Phrap-Consed-Polyphred, which performs base calling, alignment, graphical editing and genotype calling, and DNAsp, which performs a set of population genetics analyses. These independent tools are the start and end points of basic analyses. Between them lies a set of basic but error-prone tasks that must be performed on re-sequencing data.

Results: To assist with these intermediate tasks, we developed a pipeline that facilitates the data handling typical of re-sequencing studies. Our pipeline: (1) consolidates different outputs produced by distinct Phred-Phrap-Consed contigs sharing a reference sequence; (2) checks for genotyping inconsistencies; (3) reformats genotyping data produced by Polyphred into a matrix of genotypes with individuals as rows and segregating sites as columns; (4) prepares input files for haplotype inference using the popular software PHASE; and (5) handles PHASE output files, which contain only polymorphic sites, to reconstruct the inferred haplotypes including both polymorphic and monomorphic sites, as required by population genetics software for re-sequencing data such as DNAsp.

Conclusion: We tested the pipeline in re-sequencing studies of haploid and diploid data in humans, plants, animals and microorganisms. It substantially decreased the time required for sequencing analyses and made the process more controlled, eliminating several classes of error that may occur when handling datasets. The pipeline is also useful for investigators using other tools for sequencing and population genetics analyses.
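As a rough illustration of steps (3) and (5), the Python sketch below builds a genotype matrix (individuals as rows, segregating sites as columns) and rebuilds a full-length haplotype from a reference sequence plus the alleles at polymorphic sites. It is not the authors' implementation. The input layout, the function names `genotype_matrix` and `rebuild_haplotype`, and the `"??"` missing-data code are all assumptions made for this example, and it handles only single-base substitutions, not indels.

```python
def genotype_matrix(calls):
    """Arrange genotype calls as individuals (rows) x segregating sites (columns).

    `calls` is an iterable of (individual, position, genotype) tuples.
    This layout is hypothetical, not the actual Polyphred output format.
    """
    positions = sorted({pos for _, pos, _ in calls})
    individuals = sorted({ind for ind, _, _ in calls})
    lookup = {(ind, pos): gt for ind, pos, gt in calls}
    # "??" marks a site with no call for that individual (assumed convention).
    rows = {
        ind: [lookup.get((ind, pos), "??") for pos in positions]
        for ind in individuals
    }
    return positions, rows


def rebuild_haplotype(reference, positions, alleles):
    """Insert the alleles of one inferred haplotype into the reference.

    `positions` are 1-based coordinates of the polymorphic sites and
    `alleles` holds one base per position, in the same order.
    Monomorphic sites are copied from the reference unchanged.
    """
    if len(positions) != len(alleles):
        raise ValueError("need exactly one allele per polymorphic position")
    seq = list(reference)
    for pos, allele in zip(positions, alleles):
        seq[pos - 1] = allele
    return "".join(seq)


if __name__ == "__main__":
    calls = [("ind1", 3, "AG"), ("ind1", 7, "CC"), ("ind2", 3, "AA")]
    positions, rows = genotype_matrix(calls)
    print(positions, rows)
    print(rebuild_haplotype("ACGTACGT", [3, 7], "AT"))
```

Keeping the site positions alongside the matrix is what makes the second step possible: the haplotype inference output lists alleles only, so the coordinates are needed to place each allele back into the reference.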
