Privacy-preserving logistic regression training

Abstract

Logistic regression is a popular technique used in machine learning to construct classification models. Since the construction of such models is based on computing with large datasets, it is an appealing idea to outsource this computation to a cloud service. The privacy-sensitive nature of the input data requires appropriate privacy-preserving measures before outsourcing it. Homomorphic encryption enables one to compute on encrypted data directly, without decryption, and can be used to mitigate the privacy concerns raised by using a cloud service.

In this paper, we propose an algorithm (and its implementation) to train a logistic regression model on a homomorphically encrypted dataset. The core of our algorithm consists of a new iterative method that can be seen as a simplified form of the fixed Hessian method, but with a much lower multiplicative complexity.

We test the new method on two interesting real-life applications: the first application is in medicine and constructs a model to predict the probability that a patient has cancer, given genomic data as input; the second application is in finance, where the model predicts the probability that a credit card transaction is fraudulent. The method produces accurate results for both applications, comparable to running standard algorithms on plaintext data.

This article introduces a new, simple iterative algorithm to train a logistic regression model that is tailored to be applied to a homomorphically encrypted dataset. This algorithm can be used as a privacy-preserving technique to build a binary classification model and can be applied to a wide range of problems that can be modelled with logistic regression. Our implementation results show that our method can handle the large datasets used in logistic regression training.
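For orientation, the classical fixed Hessian (lower-bound) method that the abstract refers to can be sketched on plaintext data as follows. This is a minimal illustration and not the authors' algorithm: it is the standard, unsimplified method, it involves no encryption, and all names (`fixed_hessian_fit`, `solve`, `log_likelihood`, `make_toy_data`) and the toy data are assumptions made for this example. The idea is that the Hessian of the log-likelihood is bounded by $-\tfrac{1}{4}X^{T}X$, so the update $\beta \leftarrow \beta + 4\,(X^{T}X)^{-1}X^{T}(y - \sigma(X\beta))$ reuses one data-dependent matrix in every iteration instead of recomputing the Hessian.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes `a` is square and nonsingular (true for X^T X with
    linearly independent columns).
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def log_likelihood(X, y, beta):
    """Log-likelihood of labels y in {0, 1} under the logistic model."""
    total = 0.0
    for row, label in zip(X, y):
        z = sum(b * v for b, v in zip(beta, row))
        s = z if label == 1 else -z  # each term is log(sigmoid(s))
        if s >= 0:
            total += -math.log1p(math.exp(-s))
        else:
            total += s - math.log1p(math.exp(s))
    return total


def fixed_hessian_fit(X, y, iterations=25):
    """Classical fixed Hessian iteration on plaintext data."""
    d = len(X[0])
    # Curvature bound (1/4) X^T X: computed once, reused every iteration.
    bound = [[0.25 * sum(row[j] * row[k] for row in X) for k in range(d)]
             for j in range(d)]
    beta = [0.0] * d
    for _ in range(iterations):
        grad = [0.0] * d  # gradient of the log-likelihood: X^T (y - p)
        for row, label in zip(X, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, row)))
            for j in range(d):
                grad[j] += (label - p) * row[j]
        step = solve(bound, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def make_toy_data(n, seed=0):
    """Hypothetical synthetic dataset: intercept column plus two features."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x1, x2 = rng.random(), rng.random()
        p = sigmoid(-1.0 + 3.0 * x1 - 2.0 * x2)
        X.append([1.0, x1, x2])
        y.append(1 if rng.random() < p else 0)
    return X, y
```

Calling `fixed_hessian_fit(*make_toy_data(200))` returns a coefficient list whose log-likelihood is at least that of the all-zero starting point, since the lower-bound update never decreases the log-likelihood. Because the curvature matrix is fixed, each iteration needs only a fresh gradient; the abstract describes the proposed method as a further simplification of this scheme with lower multiplicative complexity.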