BMC Medical Informatics and Decision Making

Explaining multivariate molecular diagnostic tests via Shapley values



Abstract

Machine learning (ML) can be an effective tool for extracting information from attribute-rich molecular datasets for the generation of molecular diagnostic tests. However, the way in which the resulting scores or classifications are produced from the input data may not be transparent. Algorithmic explainability or interpretability has become a focus of ML research. Shapley values, first introduced in game theory, can provide explanations of the result generated from a specific set of input data by a complex ML algorithm.

For a multivariate molecular diagnostic test in clinical use (the VeriStrat® test), we calculate and discuss the interpretation of exact Shapley values. We also employ some standard approximation techniques for Shapley value computation (local interpretable model-agnostic explanation (LIME)- and Shapley Additive Explanations (SHAP)-based methods) and compare the results with the exact Shapley values.

Exact Shapley values calculated for data collected from a cohort of 256 patients showed that the relative importance of attributes for test classification varied by sample. While all eight features used in the VeriStrat® test contributed equally to classification for some samples, other samples showed more complex patterns of attribute importance. Exact Shapley values and Shapley-based interaction metrics were able to provide interpretable classification explanations at the sample or patient level, while patient subgroups could be defined by comparing Shapley value profiles between patients. LIME and SHAP approximation approaches, even those seeking to account for correlations between attributes, produced results that were quantitatively and, in some cases, qualitatively different from the exact Shapley values.

Shapley values can be used to determine the relative importance of input attributes to the result generated by a multivariate molecular diagnostic test for an individual sample or patient. Patient subgroups defined by Shapley value profiles may motivate translational research. However, correlations inherent in molecular data, together with the typically small ML training sets available for molecular diagnostic test development, may cause some approximation methods to produce Shapley value estimates that differ both qualitatively and quantitatively from the exact values. Hence, caution is advised when using approximate methods to evaluate Shapley explanations of the results of molecular diagnostic tests.
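Because the test uses only eight attributes, exact Shapley values are tractable: each feature's value is a weighted sum over the 2⁷ = 128 coalitions of the remaining features. The sketch below shows such a brute-force computation. The VeriStrat® classifier itself is proprietary, so `f`, `x`, and `background` are hypothetical stand-ins for any score function, sample, and reference cohort, and the interventional value function (features outside the coalition replaced by background values) is one common choice, not necessarily the one used in the paper.

```python
import itertools
import math

import numpy as np

def exact_shapley(f, x, background):
    """Exact Shapley values for one sample x under an interventional
    value function: features outside the coalition are replaced by
    background values and the model output is averaged over them.

    f          : callable mapping a 2-D array of samples to 1-D scores
    x          : 1-D array, the sample to explain
    background : 2-D array of reference samples
    """
    n = x.shape[0]

    def value(subset):
        data = background.copy()
        cols = list(subset)
        data[:, cols] = x[cols]          # coalition features take the sample's values
        return f(data).mean()            # average over the background cohort

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):            # coalition sizes 0 .. n-1
            # Shapley weight |S|! (n - |S| - 1)! / n!
            w = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                phi[i] += w * (value(subset + (i,)) - value(subset))
    return phi

# Toy usage with a linear score, where the exact answer is known in closed
# form: phi[i] should equal w_lin[i] * (x[i] - background[:, i].mean()).
rng = np.random.default_rng(0)
background = rng.normal(size=(50, 8))    # hypothetical reference cohort, 8 attributes
w_lin = rng.normal(size=8)
f = lambda X: X @ w_lin
x = rng.normal(size=8)
phi = exact_shapley(f, x, background)
```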
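The exact values give a reference against which approximation methods can be checked. A minimal sketch using the open-source shap package follows (an assumption; the abstract does not name a specific implementation). Kernel SHAP is essentially LIME fitted with the Shapley kernel, and because it mixes sample and background values feature by feature it implicitly treats attributes as independent — one plausible source of the quantitative and qualitative discrepancies the study reports for correlated molecular data.

```python
import numpy as np
import shap  # assumes the open-source `shap` package is installed

# Reuses f, x, background, and exact_shapley from the sketch above.
explainer = shap.KernelExplainer(f, background)   # Kernel SHAP: LIME with the Shapley kernel
approx = explainer.shap_values(x, nsamples=256)   # sampled coalitions, not full enumeration

exact = exact_shapley(f, x, background)
print(np.abs(approx - exact).max())  # gap tends to grow when attributes are correlated
```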
