Journal: BMC Bioinformatics

A semi-supervised deep learning approach for predicting the functional effects of genomic non-coding variations


Abstract

Understanding the functional effects of non-coding variants is important because they are often associated with altered gene expression and disease development. Over the past few years, many computational tools have been developed to predict their functional impact. However, the scarcity of data remains an intrinsic difficulty, so the algorithms still need improvement. In this work, we propose a novel method that employs a semi-supervised deep-learning model with pseudo labels and learns from both experimentally annotated and unannotated data. We prepared known functional non-coding variants with histone marks, DNA accessibility, and sequence context in the GM12878, HepG2, and K562 cell lines. Applied to this dataset, our method performed better than existing tools. Our results also indicated that the semi-supervised model with pseudo labels achieves higher predictive performance than the supervised model without pseudo labels. Interestingly, a model trained on data from one cell line is unlikely to perform well in other cell lines, which implies that the effects of non-coding variants are cell-type specific. Remarkably, we found that DNA accessibility contributes significantly to the functional consequence of variants, which suggests that an open chromatin conformation is important before non-coding variants can interact with gene regulation. The semi-supervised deep-learning model coupled with pseudo labeling is advantageous when datasets are limited, which is not unusual in biology. Our study provides an effective approach for finding non-coding mutations potentially associated with various biological phenomena, including human diseases.
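
The pseudo-labeling idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a small logistic-regression classifier stands in for the deep network, the two features are synthetic stand-ins for signals such as DNA accessibility or a histone mark, and the function names, confidence threshold, and pseudo-label weight are all hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, x):
    # weights[0] is the bias; the rest pair up with the features in x.
    z = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return sigmoid(z)


def train(X, y, sample_weights, epochs=200, lr=0.5):
    # Weighted logistic regression fitted by batch gradient descent.
    w = [0.0] * (len(X[0]) + 1)
    total = sum(sample_weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, target, sw in zip(X, y, sample_weights):
            err = (predict_proba(w, x) - target) * sw
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / total for wj, gj in zip(w, grad)]
    return w


def pseudo_label_fit(X_lab, y_lab, X_unlab, rounds=3,
                     threshold=0.9, pseudo_weight=0.5):
    # Step 1: fit on the annotated examples only.
    w = train(X_lab, y_lab, [1.0] * len(X_lab))
    for _ in range(rounds):
        # Step 2: assign pseudo labels to confident unannotated examples.
        X_pseudo, y_pseudo = [], []
        for x in X_unlab:
            p = predict_proba(w, x)
            if p >= threshold or p <= 1.0 - threshold:
                X_pseudo.append(x)
                y_pseudo.append(1 if p >= 0.5 else 0)
        # Step 3: refit on annotated plus pseudo-labeled data,
        # down-weighting the pseudo labels (an assumed design choice).
        weights = [1.0] * len(X_lab) + [pseudo_weight] * len(X_pseudo)
        w = train(X_lab + X_pseudo, y_lab + y_pseudo, weights)
    return w


def make_toy_data(n, rng):
    # Synthetic two-feature data; labels alternate so classes are balanced.
    X, y = [], []
    for i in range(n):
        label = i % 2
        center = 1.0 if label else -1.0
        X.append([rng.gauss(center, 0.7), rng.gauss(center, 0.7)])
        y.append(label)
    return X, y


rng = random.Random(0)
X_lab, y_lab = make_toy_data(10, rng)      # scarce annotated variants
X_unlab, _ = make_toy_data(200, rng)       # plentiful unannotated variants
X_test, y_test = make_toy_data(200, rng)

w = pseudo_label_fit(X_lab, y_lab, X_unlab)
accuracy = sum(
    (predict_proba(w, x) >= 0.5) == bool(t) for x, t in zip(X_test, y_test)
) / len(y_test)
print(f"test accuracy: {accuracy:.3f}")
```

The confidence threshold limits how many wrong pseudo labels feed back into training, and the reduced weight keeps the experimentally annotated examples dominant. Both values would need tuning on real data.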
