Journal: Sao Paulo Medical Journal

Comparison of machine-learning algorithms to build a predictive model for detecting undiagnosed diabetes - ELSA-Brasil: accuracy study

Abstract

CONTEXT AND OBJECTIVE: Type 2 diabetes is a chronic disease associated with a wide range of serious health complications that have a major impact on overall health. The aims here were to develop and validate predictive models for detecting undiagnosed diabetes using data from the Longitudinal Study of Adult Health (ELSA-Brasil) and to compare the performance of different machine-learning algorithms in this task.

DESIGN AND SETTING: Comparison of machine-learning algorithms to develop predictive models using data from ELSA-Brasil.

METHODS: After selecting a subset of 27 candidate variables from the literature, models were built and validated in four sequential steps: (i) parameter tuning with tenfold cross-validation, repeated three times; (ii) automatic variable selection using forward selection, a wrapper strategy with four different machine-learning algorithms and tenfold cross-validation (repeated three times), to evaluate each subset of variables; (iii) error estimation of model parameters with tenfold cross-validation, repeated ten times; and (iv) generalization testing on an independent dataset. The models were created with the following machine-learning algorithms: logistic regression, artificial neural network, naïve Bayes, K-nearest neighbor and random forest.

RESULTS: The best models were created using artificial neural networks and logistic regression. These achieved mean areas under the curve of, respectively, 75.24% and 74.98% in the error estimation step and 74.17% and 74.41% in the generalization testing step.

CONCLUSION: Most of the predictive models produced similar results and demonstrated the feasibility of identifying the individuals with the highest probability of having undiagnosed diabetes using easily obtained clinical data.
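
To make step (ii) of the methods more concrete, the following is a minimal sketch of wrapper-style forward selection scored by repeated, stratified k-fold cross-validated AUC. It is not the authors' implementation: every function name is hypothetical, and the model is a toy class-centroid scorer standing in for the five algorithms named in the abstract. The sketch assumes binary labels coded 0/1 and at least k cases of each class.

```python
import random
from statistics import mean


def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def repeated_stratified_kfold(y, k=10, repeats=3, seed=0):
    """Yield (train_idx, test_idx) for `repeats` reshuffled stratified k-fold splits."""
    rng = random.Random(seed)
    for _ in range(repeats):
        pos = [i for i, t in enumerate(y) if t == 1]
        neg = [i for i, t in enumerate(y) if t == 0]
        rng.shuffle(pos)
        rng.shuffle(neg)
        for f in range(k):
            # Deal each class round-robin so every fold keeps the class ratio.
            test = pos[f::k] + neg[f::k]
            held = set(test)
            train = [i for i in range(len(y)) if i not in held]
            yield train, test


def fit_centroid_scorer(X, y, features):
    """Toy stand-in model: weight each feature by the difference of class means."""
    def class_mean(label, j):
        return mean(row[j] for row, t in zip(X, y) if t == label)

    w = {j: class_mean(1, j) - class_mean(0, j) for j in features}
    return lambda row: sum(w[j] * row[j] for j in features)


def cv_auc(X, y, features, k=10, repeats=3, seed=0):
    """Mean test-fold AUC over repeated k-fold cross-validation for one feature subset."""
    fold_aucs = []
    for train, test in repeated_stratified_kfold(y, k, repeats, seed):
        score = fit_centroid_scorer([X[i] for i in train], [y[i] for i in train], features)
        fold_aucs.append(auc([y[i] for i in test], [score(X[i]) for i in test]))
    return mean(fold_aucs)


def forward_selection(X, y, n_features, **cv_args):
    """Greedily add the feature that most improves cross-validated AUC."""
    selected, best = [], 0.5  # 0.5 is the AUC of a random ranking
    while len(selected) < n_features:
        candidates = [j for j in range(n_features) if j not in selected]
        scored = [(cv_auc(X, y, selected + [j], **cv_args), j) for j in candidates]
        top_auc, top_j = max(scored)
        if top_auc <= best:
            break  # no remaining candidate improves the estimate
        selected.append(top_j)
        best = top_auc
    return selected, best
```

Calling `forward_selection(X, y, n_features=27, k=10, repeats=3)` on a list of feature rows `X` and labels `y` would mirror the tenfold, three-times-repeated setting described for step (ii); raising `repeats` to 10 in `cv_auc` corresponds to the error-estimation setting in step (iii). Because the same seed is reused for every candidate subset, all candidates are compared on identical folds.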
