BMC Bioinformatics

Rebooting the human mitochondrial phylogeny: an automated and scalable methodology with expert knowledge



Abstract

Background: Mitochondrial DNA is an ideal source of information for evolutionary and phylogenetic studies because of its extraordinary properties and abundance. Such studies yield many insights, including but not limited to screening genetic variation to identify potentially deleterious mutations. However, these advances require efficient solutions to very difficult computational problems, a need that is hampered by the very abundance of data that gives the analysis its strength.

Results: We develop a systematic, automated methodology to overcome these difficulties, building from readily available public sequence databases to high-quality alignments and phylogenetic trees. At each stage of an autonomous workflow, outputs are carefully evaluated and outlier detection rules are defined to integrate expert knowledge and automated curation, thereby avoiding the manual bottleneck found in past approaches to the problem. Using these techniques, we have performed exhaustive updates to the human mitochondrial phylogeny, illustrating the power and computational scalability of our approach, and we have conducted some initial analyses of the resulting phylogenies.

Conclusions: The problem at hand demands careful definition of inputs and adequate algorithmic treatment for its solutions to be realistic and useful. It is possible to define formal rules that address the former requirement by refining inputs directly and through their combination as outputs; the latter also help to ascertain the performance of the chosen algorithms. Rules can exploit known or inferred properties of datasets to simplify inputs through partitioning, thereby cutting computational costs and making it feasible to work on rapidly growing, otherwise intractable datasets. Although expert guidance may be necessary to assist the learning process, low-risk results can be fully automated and have proved convenient and valuable.
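To make the idea of rule-based outlier detection more concrete, the following is a minimal sketch of how input sequences might be screened before alignment. It is not the authors' method: the function name `curate`, the two rules (deviation from the median length and fraction of ambiguous bases), and both thresholds are assumptions chosen purely for illustration.

```python
from statistics import median


def curate(seqs, max_len_dev=0.01, max_ambiguous=0.005):
    """Split sequences into accepted ones and flagged ones with reasons.

    Hypothetical rules, for illustration only:
      - "length": relative deviation from the median length exceeds max_len_dev
      - "ambiguity": fraction of non-ACGT symbols exceeds max_ambiguous
    """
    if not seqs:
        return {}, {}

    # Median length of the dataset; guard against division by zero.
    med = max(median(len(s) for s in seqs.values()), 1)

    accepted, flagged = {}, {}
    for name, seq in seqs.items():
        reasons = []

        # Rule 1: sequences much shorter or longer than the bulk are suspect.
        if abs(len(seq) - med) / med > max_len_dev:
            reasons.append("length")

        # Rule 2: too many ambiguous symbols (anything other than A, C, G, T).
        ambiguous = sum(base not in "ACGT" for base in seq.upper())
        if ambiguous / max(len(seq), 1) > max_ambiguous:
            reasons.append("ambiguity")

        if reasons:
            flagged[name] = reasons  # candidates for expert review
        else:
            accepted[name] = seq  # low-risk, passes automatically

    return accepted, flagged


if __name__ == "__main__":
    demo = {
        "a": "ACGT" * 100,
        "b": "ACGT" * 100,
        "c": "ACGT" * 100,
        "d": "ACGT" * 50,            # half the typical length
        "e": "ACGT" * 99 + "NNNN",   # 1% ambiguous bases
    }
    ok, review = curate(demo)
    print(sorted(ok), review)
```

In a sketch like this, flagged entries would go to expert review while accepted entries proceed automatically, mirroring the split between expert guidance and fully automated low-risk results described in the abstract.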
