BMC Medical Genomics

BioBin: a bioinformatics tool for automating the binning of rare variants using publicly available biological knowledge

Abstract

Background: As the cost of genome sequencing has fallen, interest has grown in rare variants and in methods for detecting their association with disease. We developed BioBin, a flexible, biologically informed collapsing method that automates the binning of low-frequency variants for association testing. We also built the Library of Knowledge Integration (LOKI), a repository of data assembled from public databases. It includes dbSNP and Entrez Gene information from the National Center for Biotechnology Information (NCBI); pathway information from Gene Ontology (GO), the Protein families database (Pfam), the Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome, NetPath (signal transduction pathways), the Open Regulatory Annotation Database (ORegAnno), the Biological General Repository for Interaction Datasets (BioGRID), the Pharmacogenomics Knowledge Base (PharmGKB), and the Molecular INTeraction database (MINT); and evolutionarily conserved regions (ECRs) from the UCSC Genome Browser. BioBin's novelty lies in comprehensive, knowledge-guided, multi-level binning. For example, bin boundaries can be defined by the genomic locations of functional regions, evolutionarily conserved regions, genes, and/or pathways.

Methods: We tested BioBin on simulated data and on 1000 Genomes Project low-coverage data, using simulated causative variants and a pairwise comparison of rare variant (MAF …

Results: Our simulation studies indicate that the type I error rate is controlled. However, power falls quickly at small sample sizes when variants have modest effect sizes. BioBin detected simulated variants in genes with fewer than 20 loci, but its sensitivity was much lower in large bins. We also highlighted the extent of population stratification between two 1000 Genomes Project populations, CEU and YRI. Lastly, we applied BioBin to real biological data from dbGaP and identified an interesting candidate gene for further study.

Conclusions: BioBin is a practical and flexible tool for analyzing sequence data. It has the potential to uncover novel associations between low-frequency variants and complex disease.
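The abstract describes knowledge-guided, multi-level binning only at a high level. The Python sketch below illustrates that general idea under explicit assumptions. The gene coordinates and pathway memberships are hypothetical hand-written tables that stand in for what a knowledge source such as LOKI would supply. Genotypes are coded as per-individual minor-allele counts (0/1/2). The MAF cutoff is an assumed parameter, not the paper's value. The per-bin comparison uses a simple two-sided permutation test on mean burden. That test is a swapped-in illustration and not necessarily the test BioBin applies. The names `gene_bins`, `pathway_bins`, `burden`, and `permutation_p` are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical annotation tables. In BioBin these boundaries would come from
# LOKI; the names and coordinates here are invented for illustration.
GENE_REGIONS = {
    "GENE_A": ("1", 1_000, 2_000),   # (chromosome, start, end), inclusive
    "GENE_B": ("1", 5_000, 6_500),
    "GENE_C": ("2", 300, 900),
}
PATHWAYS = {"PATHWAY_X": ["GENE_A", "GENE_B"], "PATHWAY_Y": ["GENE_C"]}


def minor_allele_freq(genotypes):
    """Frequency of the coded allele; genotypes are assumed minor-allele counts (0/1/2)."""
    return sum(genotypes) / (2 * len(genotypes))


def gene_bins(variants, maf_threshold=0.05):
    """First level: collapse rare variants into the gene regions that contain them.

    variants maps (chrom, pos) -> per-individual minor-allele counts.
    maf_threshold is an assumed cutoff, not a value taken from the paper.
    """
    bins = defaultdict(list)
    for (chrom, pos), genotypes in variants.items():
        if minor_allele_freq(genotypes) >= maf_threshold:
            continue  # common variants are not collapsed
        for gene, (g_chrom, start, end) in GENE_REGIONS.items():
            if chrom == g_chrom and start <= pos <= end:
                bins[gene].append((chrom, pos))
    return dict(bins)


def pathway_bins(gene_to_variants):
    """Second level: merge gene bins into pathway bins, counting each variant once."""
    bins = {}
    for pathway, genes in PATHWAYS.items():
        members = sorted({v for g in genes for v in gene_to_variants.get(g, [])})
        if members:
            bins[pathway] = members
    return bins


def burden(variants, keys):
    """Per-individual total of minor alleles over the variants in one bin."""
    n = len(next(iter(variants.values())))
    return [sum(variants[k][i] for k in keys) for i in range(n)]


def permutation_p(burdens, is_case, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the case-control difference in mean burden."""
    def mean_diff(labels):
        cases = [b for b, c in zip(burdens, labels) if c]
        controls = [b for b, c in zip(burdens, labels) if not c]
        return sum(cases) / len(cases) - sum(controls) / len(controls)

    observed = abs(mean_diff(is_case))
    rng = random.Random(seed)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any genotype-phenotype link
        if abs(mean_diff(labels)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Made-up toy data: 8 individuals, the first 4 are cases.
    is_case = [True] * 4 + [False] * 4
    variants = {
        ("1", 1_500): [1, 1, 0, 1, 0, 0, 0, 0],  # rare, inside GENE_A
        ("1", 5_200): [0, 1, 1, 0, 0, 0, 0, 0],  # rare, inside GENE_B
        ("1", 1_800): [1, 0, 1, 1, 1, 0, 1, 1],  # too common, excluded
        ("2", 500): [0, 0, 0, 0, 1, 0, 0, 0],    # rare, inside GENE_C
    }
    # A loose cutoff is used only because the toy sample is tiny.
    genes = gene_bins(variants, maf_threshold=0.2)
    for name, keys in {**genes, **pathway_bins(genes)}.items():
        b = burden(variants, keys)
        print(name, keys, b, permutation_p(b, is_case))
```

In this sketch, gene-level bins are formed first and then merged into pathway bins, loosely mirroring the multi-level binning the abstract describes. A real analysis would load region definitions from a knowledge source. It would also use a test suited to its design, for example one that accounts for covariates or population structure.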