Journal: BMC Medical Genomics

Discovery and disentanglement of aligned residue associations from aligned pattern clusters to reveal subgroup characteristics


Abstract

A protein family has similar and diverse functions that are locally conserved. An aligned pattern cluster (APC) can reflect the conserved functionality. Discovering aligned residue associations (ARAs) in APCs can reveal subtle inner working characteristics of conserved regions of protein families. However, ARAs corresponding to different functionalities/subgroups/classes could be entangled because of subtle multiple entwined factors. To discover and disentangle patterns from mixed-mode datasets, such as APCs in which the residues are replaced by lists of their fundamental biochemical properties, this paper presents a novel method, Extended Aligned Residual Association Discovery and Disentanglement (E-ARADD). E-ARADD discretizes the numerical dataset to transform the mixed-mode dataset into an event-value dataset, constructs an ARA Frequency Matrix, and then converts it into an adjusted Statistical Residual (SR) Vector Space (SRV) that captures statistical deviation from randomness. Applying Principal Component (PC) Decomposition to the SRV yields PCs ranked by their variance. Finally, the disentangled ARAs are discovered when the projections on a PC are re-projected onto a vector space with the same basis vectors as the SRV. Experiments on synthetic, cytochrome c and class A scavenger data have shown that E-ARADD can a) disentangle the entwined ARAs in APCs (with residues or biochemical properties), and b) reveal subtle AR clusters relating to classes, subtle subgroups or specific functionalities. E-ARADD can discover and disentangle ARs and ARAs entangled in functionality and location of protein families to reveal functional subgroups and subgroup characteristics of biological conserved regions. Experimental results on synthetic data provide proof-of-concept validation of the successful disentanglement, which reveals class-associated ARAs with or without class labels as input. Experiments on cytochrome c data demonstrated the efficacy of E-ARADD in handling both types of residue data.
Our methodology can not only discover and disentangle ARs and ARAs in specific statistical/functional spaces (PCs and RSRVs), but also identify their locations in the functional domains of the protein family. The success of E-ARADD shows its great potential for proteomic research, drug discovery, and precision and personalized genetic medicine.
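The pipeline described in the abstract (frequency matrix, adjusted statistical residuals, PC decomposition, re-projection) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `adjusted_residuals` and `disentangle`, the toy sequences, and the use of Haberman-style adjusted residuals are all assumptions made for illustration, and the discretization step for numerical properties is omitted.

```python
import numpy as np

def adjusted_residuals(seqs):
    """Build a co-occurrence frequency matrix over (column, residue) pairs
    and convert it to adjusted standardized residuals (assumed formula)."""
    items = sorted({(c, r) for s in seqs for c, r in enumerate(s)})
    index = {item: k for k, item in enumerate(items)}
    n = len(seqs)
    # Indicator matrix: rows are sequences, columns are (column, residue) pairs.
    ind = np.zeros((n, len(items)))
    for row, s in enumerate(seqs):
        for c, r in enumerate(s):
            ind[row, index[(c, r)]] = 1.0
    freq = ind.T @ ind                      # observed co-occurrence counts
    count = np.diag(freq)                   # occurrences of each pair
    expected = np.outer(count, count) / n   # counts expected under independence
    var = expected * np.outer(1 - count / n, 1 - count / n)
    with np.errstate(divide="ignore", invalid="ignore"):
        sr = np.where(var > 0, (freq - expected) / np.sqrt(var), 0.0)
    # Simplification: same-column pairs and the diagonal are kept as-is.
    return items, sr

def disentangle(sr, k):
    """Project the SR vectors on the k-th principal component (0-based) and
    re-project onto the original basis (a hypothetical stand-in for an RSRV)."""
    centered = sr - sr.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    order = np.argsort(eigvals)[::-1]       # rank PCs by variance
    pc = eigvecs[:, order[k]]
    proj = centered @ pc                    # one projection per (column, residue)
    rsrv = np.outer(proj, pc)               # projections back in the SRV basis
    return proj, rsrv

# Toy aligned block: columns 0 and 1 carry a two-class signal, column 2 is noise.
seqs = ["ACX", "ACY", "ACX", "ACY", "GTX", "GTY", "GTX", "GTY"]
items, sr = adjusted_residuals(seqs)
proj, rsrv = disentangle(sr, 0)
for item, p in zip(items, proj):
    print(item, round(float(p), 3))
```

In this toy example the residues that co-occur within one class receive projections of the same sign on the first PC, while the noise column projects near zero. The sign of a PC is arbitrary, so only relative signs are meaningful.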
