Journal of Molecular Biology

DISTINCTIVE SEQUENCE FEATURES IN PROTEIN CODING, GENIC NON-CODING, AND INTERGENIC HUMAN DNA


Abstract

We have studied the behavior of a number of sequence statistics, mostly indicative of protein coding function, in a large set of human clone sequences randomly selected in the course of genome mapping (randomly selected clone sequences), and compared it with their behavior in known sequences containing genes (which we term genic sequences). As expected, given the higher coding density of the genic sequences, the sequence statistics studied behave in a substantially different manner in the randomly selected clone sequences (mostly intergenic DNA) and in the genic sequences. Strong differences in the behavior of a number of such statistics are also observed, however, when the randomly selected clone sequences are compared with only the non-coding fraction of the genic sequences, suggesting that intergenic and genic non-coding DNA constitute two different classes of non-coding DNA. By studying the behavior of the sequence statistics in simulated DNA of different C + G content, we have observed that a number of them are strongly dependent on C + G content. Thus, most differences between intergenic and genic non-coding DNA can be explained by differences in C + G content. A + T-rich intergenic DNA appears to be at the compositional equilibrium expected under random mutation, while the C + G-richer genic non-coding DNA is far from this equilibrium. The results obtained in simulated DNA indicate, on the other hand, that a very large fraction of the variation in the coding statistics that underlie gene identification algorithms is due simply to C + G content, and is not directly related to protein coding function. It thus appears that the performance of gene-finding algorithms could be improved by carefully distinguishing the effects of protein coding function from those of mere base compositional variation on such coding statistics. © 1995 Academic Press Limited [References: 26]
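The dependence of a coding-related statistic on C + G content in simulated DNA can be illustrated with a minimal sketch. It is not the authors' procedure: the statistic chosen here (the fraction of stop codons in a single reading frame), the independent-base model with equal C and G frequencies and equal A and T frequencies, and all function names are assumptions made for illustration. Because the three stop codons TAA, TAG and TGA are A + T-rich, this fraction falls as C + G content rises, even though the simulated sequence has no coding function at all.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}

def simulate_dna(length, gc_content, rng):
    """Draw independent bases with P(G) = P(C) = gc/2 and P(A) = P(T) = (1 - gc)/2.

    This independent-base model is an assumption of the sketch, not the
    simulation scheme used in the paper.
    """
    at = (1.0 - gc_content) / 2.0
    gc = gc_content / 2.0
    return "".join(rng.choices("ACGT", weights=[at, gc, gc, at], k=length))

def stop_codon_fraction(seq):
    """Fraction of codons in reading frame 0 that are stop codons."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    return sum(codon in STOP_CODONS for codon in codons) / len(codons)

def expected_stop_fraction(gc_content):
    """Analytic expectation under the same independent-base model."""
    a = t = (1.0 - gc_content) / 2.0
    g = gc_content / 2.0
    # P(TAA) + P(TAG) + P(TGA)
    return t * a * a + t * a * g + t * g * a

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    for gc_content in (0.3, 0.4, 0.5, 0.6, 0.7):
        seq = simulate_dna(300_000, gc_content, rng)
        print(gc_content,
              round(stop_codon_fraction(seq), 4),
              round(expected_stop_fraction(gc_content), 4))
```

Under this model the expected stop-codon fraction is 3/64 at 50% C + G and roughly four times higher at 30% C + G than at 70%, so open reading frames in random C + G-rich sequence tend to be longer purely because of base composition.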
