BMC Medical Genomics

Identification of lung cancer gene markers through kernel maximum mean discrepancy and information entropy


Abstract

The early diagnosis of lung cancer has long been a critical problem in clinical practice, and identifying differentially expressed genes as disease markers is a promising solution. However, most existing gene differential expression analysis (DEA) methods have two main drawbacks. First, they are based on fixed statistical hypotheses and are not always effective. Second, they cannot identify a definite expression level boundary when there is no obvious expression level gap between the control and experiment groups. This paper proposes a novel approach to identify marker genes and gene expression level boundaries for lung cancer. By calculating a kernel maximum mean discrepancy, our method can evaluate the expression differences between normal, normal adjacent to tumor (NAT), and tumor samples. For the potential marker genes, the expression level boundaries among the different groups are defined with the information entropy method. Compared with two conventional methods, the t-test and fold change, the genes with the highest average rank selected by our method achieve better performance under all metrics in 10-fold cross-validation. GO and KEGG enrichment analyses are then conducted to explore the biological functions of the top 100 ranked genes. Finally, we choose the 10 genes with the highest average rank as lung cancer markers, and their expression boundaries are calculated and reported. The proposed approach is effective for identifying gene markers for lung cancer diagnosis. It is not only more accurate than conventional DEA methods but also provides a reliable way to identify gene expression level boundaries.
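The two ingredients named in the abstract can be illustrated with a minimal sketch. The abstract does not state which kernel, bandwidth, or MMD estimator the authors use, nor the exact entropy criterion, so everything below is an assumption: a Gaussian (RBF) kernel with a fixed bandwidth `sigma`, the biased estimate of squared MMD, and a boundary chosen as the cut point that minimizes the size-weighted class entropy of the two sides. The function names `mmd2_biased` and `entropy_boundary` are hypothetical and are not taken from the paper.

```python
import numpy as np


def rbf_kernel(x, y, sigma):
    """Gaussian kernel matrix between 1-D expression vectors x and y."""
    d2 = (x[:, None] - y[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))


def mmd2_biased(x, y, sigma=1.0):
    """Biased estimate of the squared MMD between samples x and y.

    Assumption: RBF kernel with fixed bandwidth sigma; the paper's
    actual kernel and estimator may differ.
    """
    return (rbf_kernel(x, x, sigma).mean()
            + rbf_kernel(y, y, sigma).mean()
            - 2.0 * rbf_kernel(x, y, sigma).mean())


def entropy(labels):
    """Shannon entropy (in bits) of the class labels in an array."""
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def entropy_boundary(values, labels):
    """Return (threshold, weighted_entropy) for the best single cut.

    Assumption: the boundary is the midpoint between adjacent sorted
    expression values that minimizes the size-weighted entropy of the
    two resulting groups. Returns (None, inf) if all values are equal.
    """
    order = np.argsort(values)
    v, l = values[order], labels[order]
    best_t, best_h = None, np.inf
    for i in range(1, v.size):
        if v[i] == v[i - 1]:
            continue  # no cut point can separate identical values
        h = (i * entropy(l[:i]) + (v.size - i) * entropy(l[i:])) / v.size
        if h < best_h:
            best_t, best_h = (v[i - 1] + v[i]) / 2.0, h
    return best_t, best_h
```

For one gene, `mmd2_biased(normal, tumor)` would give a score by which genes could be ranked (larger means the two expression distributions differ more), and `entropy_boundary(np.concatenate([normal, tumor]), labels)` would give a candidate expression level boundary between the two groups. With three groups (normal, NAT, tumor), the same functions could be applied pairwise, though how the paper combines the groups is not specified in the abstract.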
