Journal: Journal of Biomedical Informatics

A new cluster-based oversampling method for improving survival prediction of hepatocellular carcinoma patients



Abstract

Liver cancer is the sixth most frequently diagnosed cancer, and Hepatocellular Carcinoma (HCC) accounts for more than 90% of primary liver cancers. Clinicians assess each patient's treatment on the basis of evidence-based medicine, which may not always apply to a specific patient given the biological variability among individuals. Over the years, several studies on HCC have developed strategies to assist clinicians in decision making, using computational methods (e.g. machine learning techniques) to extract knowledge from clinical data. These studies have limitations that have not yet been addressed: some do not focus entirely on HCC patients, others have strict application boundaries, and none considers either the heterogeneity between patients or the presence of missing data, a common problem in healthcare contexts. In this work, a real, complex HCC database composed of heterogeneous clinical features is studied. We propose a new cluster-based oversampling approach that is robust to small and imbalanced datasets and accounts for the heterogeneity of HCC patients. Preprocessing consists of data imputation using a distance metric suited to both heterogeneous and missing data (the Heterogeneous Euclidean-Overlap Metric, HEOM), followed by K-means clustering to assess the underlying patient groups in the dataset. The final approach aims to reduce the impact of small underlying patient profiles on survival prediction: it combines K-means clustering with the SMOTE algorithm to build a representative dataset, which is then used as the training set for different machine learning procedures (logistic regression and neural networks). The results are evaluated in terms of survival prediction and compared, using the Friedman rank test, against baseline approaches that do not consider clustering and/or oversampling. The proposed methodology coupled with neural networks outperformed all others, suggesting an improvement over the classical approaches currently used in HCC prediction models. (C) 2015 Elsevier Inc. All rights reserved.
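The abstract does not spell out how K-means and SMOTE are combined, so the following is only a minimal sketch of one plausible reading: cluster the patients, then oversample the minority class inside each cluster by SMOTE-style interpolation. Everything here is an assumption made for illustration: the function names (`kmeans`, `smote_like`, `cluster_oversample`), the parameters, the toy data, and the use of plain Euclidean distance on complete numeric data rather than HEOM-based imputation on heterogeneous features.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (Euclidean); returns one cluster label per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center unchanged
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def smote_like(X_cls, n_new, k_neighbors, rng):
    """Create n_new points by interpolating between same-class neighbours."""
    n = len(X_cls)
    if n < 2 or n_new <= 0:  # nothing to interpolate between
        return np.empty((0, X_cls.shape[1]))
    k = min(k_neighbors, n - 1)
    dists = np.linalg.norm(X_cls[:, None, :] - X_cls[None, :, :], axis=2)
    # Column 0 of the sorted indices is normally the sample itself, so skip it.
    neighbours = np.argsort(dists, axis=1)[:, 1:k + 1]
    base = rng.integers(0, n, size=n_new)
    nb = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_cls[base] + gap * (X_cls[nb] - X_cls[base])

def cluster_oversample(X, y, n_clusters=2, k_neighbors=5, seed=0):
    """Balance the classes inside each cluster; original rows come first."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    labels = kmeans(X, n_clusters, seed=seed)
    X_parts, y_parts = [X], [y]
    for c in np.unique(labels):
        in_c = labels == c
        classes, counts = np.unique(y[in_c], return_counts=True)
        for cls, cnt in zip(classes, counts):
            # A class with a single sample in this cluster is left as-is.
            synth = smote_like(X[in_c & (y == cls)], counts.max() - cnt,
                               k_neighbors, rng)
            X_parts.append(synth)
            y_parts.append(np.full(len(synth), cls))
    return np.vstack(X_parts), np.concatenate(y_parts)

# Toy data (made up): two patient groups with opposite class imbalance.
demo_rng = np.random.default_rng(1)
X_demo = np.vstack([demo_rng.normal(0, 1, (30, 2)), demo_rng.normal(5, 1, (30, 2))])
y_demo = np.array([0] * 25 + [1] * 5 + [0] * 6 + [1] * 24)
X_res, y_res = cluster_oversample(X_demo, y_demo, n_clusters=2)
print(X_demo.shape, X_res.shape, np.bincount(y_res))
```

Oversampling per cluster rather than globally is meant to keep synthetic patients close to an existing patient profile, which is the motivation the abstract gives for the method. In practice the resampled set would be built from training folds only, so that synthetic samples never leak into the evaluation data.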
