Journal: BMC Medical Genomics

Mut2Vec: distributed representation of cancerous mutations


Abstract

Embedding techniques that convert high-dimensional sparse data into low-dimensional distributed representations have been gaining popularity in various fields of research. In deep learning models, embeddings are commonly used and have proven more effective than naive binary representations. However, no attempt has yet been made to embed highly sparse mutation profiles into dense distributed representations. Because a binary representation does not capture biological context, its use is limited in many applications, such as discovering novel driver mutations. In addition, training distributed representations of mutations is challenging because relatively little biological data is available compared with the large text corpora used in text mining.

We introduce Mut2Vec, a novel computational pipeline for creating distributed representations of cancerous mutations. Mut2Vec is trained on cancer profiles using Skip-Gram, since cancer can be characterized by a series of co-occurring mutations. We also augmented the pipeline with existing information from the biomedical literature and protein-protein interaction networks to compensate for the limited data.

To evaluate our models, we conducted two experiments: a) visualizing driver and passenger mutations, and b) identifying novel driver mutations using a clustering method. Our visualization showed a clear distinction between passenger mutations and driver mutations. We also found driver mutation candidates and confirmed through a literature survey that these were true driver mutations. The pre-trained mutation vectors and the candidate driver mutations are publicly available at http://infos.korea.ac.kr/mut2vec.

Mut2Vec generates distributed representations of mutations, and we experimentally validated the efficacy of the generated representations. Mut2Vec can be used in various deep learning applications such as cancer classification and drug sensitivity prediction.
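The Skip-Gram idea in the abstract can be illustrated with a minimal sketch. It assumes that each tumor sample is treated as an unordered "sentence" of mutated gene names, so every other mutation in the same sample serves as context. The toy profiles, the function names (`skipgram_pairs`, `train`), and all hyperparameters are hypothetical and chosen only for illustration. The sketch uses plain negative sampling and leaves out the literature and protein-protein interaction augmentation. It is not the authors' implementation.

```python
import math
import random
from itertools import permutations

# Toy mutation profiles (made-up data): one list of mutated genes per sample.
profiles = [
    ["TP53", "KRAS", "SMAD4"],
    ["TP53", "KRAS"],
    ["BRAF", "NRAS"],
    ["TP53", "PIK3CA"],
]

def skipgram_pairs(profiles):
    """Return (target, context) pairs. Mutations in a sample are unordered,
    so every other mutation in the same sample counts as context."""
    pairs = []
    for sample in profiles:
        pairs.extend(permutations(sorted(set(sample)), 2))
    return pairs

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train(profiles, dim=8, epochs=50, lr=0.05, negatives=2, seed=0):
    """Skip-Gram with negative sampling in pure Python (illustrative only)."""
    rng = random.Random(seed)
    vocab = sorted({gene for sample in profiles for gene in sample})
    w_in = {g: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for g in vocab}
    w_out = {g: [0.0] * dim for g in vocab}
    pairs = skipgram_pairs(profiles)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for target, context in pairs:
            # One observed (positive) pair plus uniformly drawn negatives.
            # A negative may occasionally equal the true context; a fuller
            # implementation would resample in that case.
            samples = [(context, 1.0)]
            samples += [(rng.choice(vocab), 0.0) for _ in range(negatives)]
            v = w_in[target]
            grad_v = [0.0] * dim
            for gene, label in samples:
                u = w_out[gene]
                score = sigmoid(sum(a * b for a, b in zip(v, u)))
                g = lr * (label - score)
                for i in range(dim):
                    grad_v[i] += g * u[i]  # accumulate using the old u
                    u[i] += g * v[i]
            for i in range(dim):
                v[i] += grad_v[i]
    return w_in

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

vectors = train(profiles)
print(cosine(vectors["TP53"], vectors["KRAS"]))
```

With real data, the resulting dense vectors would replace the sparse binary mutation profile as input to a downstream model. On a toy corpus this small the similarities carry no biological meaning.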
