BMC Genomics

Disease gene identification by random walk on multigraphs merging heterogeneous genomic and phenotype data

Abstract

Background: High-throughput experiments have produced many genomic datasets and hundreds of candidate disease genes. To find the real disease genes among a set of candidates, computational methods have been proposed that work on various types of genomic data sources. Because any single source of genomic data is prone to bias, incompleteness, and noise, integrating different genomic data sources is needed for reliable disease gene identification.

Results: In contrast to the commonly adopted data integration approach, which combines separate lists of candidate genes derived from each single data source, we merge various genomic networks into a multigraph, which can hold multiple edges between a pair of nodes. This approach provides a data platform with strong noise tolerance for prioritizing disease genes. A new random walk is then developed to work on multigraphs, using a modified step to calculate the transition matrix. Our method is further enhanced to deal with heterogeneous data types by allowing cross-walk between phenotype and gene networks. On benchmark datasets, our method is more accurate than state-of-the-art methods in disease gene identification. We also conducted a case study to identify disease genes for Insulin-Dependent Diabetes Mellitus. Some of the newly identified disease genes are supported by recently published literature.

Conclusions: The proposed RWRM (Random Walk with Restart on Multigraphs) model and CHN (Complex Heterogeneous Network) model are effective in data integration for candidate gene prioritization.
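The abstract does not spell out the modified transition step, so the following is only a minimal sketch of a random walk with restart on a multigraph under stated assumptions: each genomic network contributes one edge per gene pair, parallel edges are counted as a multiplicity, and the walker leaves a node with probability proportional to that multiplicity. The function names, the restart probability of 0.7, and the toy gene labels are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def merge_into_multigraph(networks):
    """Count parallel edges between each node pair across all networks.

    Assumption: every network is a list of undirected (u, v) edges.
    """
    multiplicity = defaultdict(lambda: defaultdict(int))
    for edges in networks:
        for u, v in edges:
            multiplicity[u][v] += 1
            multiplicity[v][u] += 1
    return multiplicity


def random_walk_with_restart(multiplicity, seeds, restart=0.7,
                             tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until it settles.

    Assumption: the transition probability from u to v is the number of
    parallel u-v edges divided by the total number of edges at u.
    Seeds must be nodes of the multigraph.
    """
    nodes = sorted(multiplicity)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            degree = sum(multiplicity[u].values())
            for v, m in multiplicity[u].items():
                nxt[v] += (1.0 - restart) * p[u] * m / degree
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Toy example: two hypothetical networks over genes A-D.
network_one = [("A", "B"), ("B", "C"), ("C", "D")]
network_two = [("A", "B"), ("A", "D")]
graph = merge_into_multigraph([network_one, network_two])
scores = random_walk_with_restart(graph, seeds={"A"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph the pair A-B is supported by both networks while A-D is supported by only one, so starting from the seed A the walk scores B above D. That is the intuition behind merging networks rather than merging ranked lists: an edge confirmed by several sources pulls more probability mass than an edge seen once.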
