IEEE International Conference on Computational Advances in Bio and Medical Sciences

Modeling genetic heterogeneity in Hepatitis C Virus hyper-variable region 1 infers demographic characteristics of infected hosts


Abstract

Hepatitis C Virus (HCV) is the most common etiological cause of non-A/non-B blood-borne viral hepatitis and the leading cause of liver transplantation. The population of HCV-infected individuals in the US is estimated to be over 3 million. There are 7 major HCV genotypes with worldwide distribution, which are further grouped into numerous sub-genotypes. HCV genotype 1a is the most common genotype in the US, followed by genotype 1b. Both genotypes are responsible for the most difficult-to-treat infections. Several host- and viral-related factors have been identified as risk factors for the development of chronic HCV (HCH) infection, liver disease progression and therapy outcome. We previously reported that certain host demographic characteristics were associated with the genetic properties of HCV strains in a group of chronically infected patients undergoing combined interferon and ribavirin therapy. In this study we expanded the analysis to a larger dataset to further explore the association between the hosts' ethnic background and the genetic properties of the HCV hyper-variable region 1 (HVR1). The HCV data contained sequences of intra-host HVR1 variants of HCV1a and HCV1b (n=936 and n=630, respectively) obtained from a national survey and five independent state-wide outbreak investigations. The association between properties of HVR1 strains and hosts' ethnicity was examined using viral features derived from nucleotide (nt) and amino-acid (aa) sequence information. Nucleotide sequences of 87 nt at genome positions 1491–1577 and amino acid sequences of 29 aa at polyprotein positions 384–412 (GenBank reference sequence AF01175) were linked to host ethnicity labels, Caucasian (CA) or Afro-American (AA).
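The window arithmetic above is internally consistent: positions 1491–1577 span 87 nt, which is exactly 29 codons, matching polyprotein positions 384–412. The sketch below shows one way per-site nt and aa features could be derived from an HVR1 variant. It is only an illustration, not the authors' pipeline: it assumes the reading frame starts at the first nucleotide of the window, and the names `translate`, `site_features` and the toy `variant` sequence are hypothetical.

```python
# Standard genetic code, codons enumerated in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

NT_START, NT_END = 1491, 1577  # genome positions quoted in the abstract
AA_START, AA_END = 384, 412    # polyprotein positions quoted in the abstract


def translate(nt_seq):
    """Translate an in-frame nucleotide string; unknown codons become 'X'."""
    nt_seq = nt_seq.upper()
    usable = len(nt_seq) - len(nt_seq) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(nt_seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


def site_features(seq, first_position, prefix):
    """Map each residue to a named per-site categorical feature."""
    return {f"{prefix}{first_position + i}": ch for i, ch in enumerate(seq)}


# Hypothetical 87-nt variant (29 repeats of one codon), for illustration only.
variant = "GAA" * 29
nt_feats = site_features(variant, NT_START, "nt")
aa_feats = site_features(translate(variant), AA_START, "aa")
```

Each variant then becomes a row of 87 nt-site or 29 aa-site categorical values that a feature selector or classifier can consume.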
To identify viral nt- or aa-based features associated with host ethnicity, we applied a correlation feature selection (CFS) method to find feature subsets that have a high correlation with the variable of interest and a low correlation among the features themselves. In HCV1a data, the best HVR1 nt-based feature subset (merit=0.26) and aa-based subset (merit=0.20) consisted of 9 nt sites and 6 aa sites, respectively. In HCV1b data, the best nt-based feature subset (merit=0.35) and aa-based subset (merit=0.25) consisted of 13 nt and 8 aa sites, respectively. These findings indicate that the ethnicity variable is associated with the genetic heterogeneity of certain sets of genomic and polyprotein sites. They also indicate the absence of a strong correlation between variation at any single site and the ethnicity variable. Therefore, to account for interactions and/or dependencies among the features in the selected subsets, which are associated with host ethnicity as a group, we modeled genetic relationships to ethnicity using Bayesian network classifiers (BNCs). BNC models were initially constructed as naïve Bayesian networks and were then left to learn dependencies among the features. BNC performance was measured using the F-measure and classification accuracy during training (10-fold cross-validation, 10xCV) and during testing on out-of-sample data (validation). BNC evaluations were also carried out using 5 datasets generated by random sampling from the HCV data, in which sequences were randomly assigned to ethnicity classes. High accuracy (10xCV / validation) was observed for the HCV1a BNCs based on the 9 nt features (91.1% / 91.7%) and the 6 aa features (83.3% / 82.7%). Accuracy of BNCs on randomly labeled data was significantly lower (9nt-BNCRand=60.9% and 6aa-BNCRand=47.3%, avg. accuracy).
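For readers unfamiliar with the "merit" values quoted above, the following sketch computes the usual CFS subset merit, k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation. The abstract does not state which correlation measure was used; symmetrical uncertainty, a common choice for categorical features, is assumed here, and all function names and toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(values):
    """Shannon entropy (in bits) of a sequence of categorical values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), ranging from 0 to 1."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    hxy = entropy(list(zip(x, y)))
    return 2.0 * (hx + hy - hxy) / (hx + hy)


def cfs_merit(columns, labels):
    """CFS merit of a non-empty feature subset, given as a list of columns."""
    k = len(columns)
    r_cf = sum(symmetrical_uncertainty(c, labels) for c in columns) / k
    pairs = list(combinations(columns, 2))
    r_ff = (
        sum(symmetrical_uncertainty(a, b) for a, b in pairs) / len(pairs)
        if pairs
        else 0.0
    )
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


# Toy data (assumed): one informative site and one uninformative site.
labels = ["CA", "CA", "AA", "AA"]
site_a = ["A", "A", "G", "G"]  # tracks the label perfectly
site_b = ["C", "T", "C", "T"]  # independent of the label

merit_a = cfs_merit([site_a], labels)
merit_ab = cfs_merit([site_a, site_b], labels)
```

On this toy data, adding the uninformative site lowers the merit, which is the behavior a subset search exploits when it prefers small sets of informative, mutually non-redundant sites.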
Similar performance was observed for BNCs constructed from HCV1b data, where classification accuracy was further improved by integrating the learned 13 nt and 8 aa BNCs into a single combined 21-feature BNC construct (96.3% / 90.2%). Average accuracy of the BNCRand was 48.6%. In conclusion, findings in this study suggest that HVR1 sequence variants are st
