Venue: IEEE Conference on Computer Communications Workshops

InPrivate Digging: Enabling Tree-based Distributed Data Mining with Differential Privacy

Abstract

Data mining has heralded a major breakthrough in data analysis, serving as a “super cruncher” that discovers hidden information and valuable knowledge in big data systems. In many applications, collecting big data involves multiple parties who want to pool their private data sets and jointly train machine-learning models that yield more accurate predictions. However, data owners may be unwilling to disclose their data because of privacy concerns, which makes it imperative to provide privacy guarantees for collaborative data mining over distributed data sets. In this paper, we focus on tree-based data mining. To begin with, we design novel privacy-preserving schemes for the two most common tasks, regression and binary classification, in which individual data owners perform training locally in a differentially private manner. Then, for the first time, we design and implement a privacy-preserving system for gradient boosting decision trees (GBDT), in which regression trees trained by multiple data owners can be securely aggregated into an ensemble. We conduct extensive experiments to evaluate the performance of our system on multiple real-world data sets. The results demonstrate that our system provides strong privacy protection for individual data owners while maintaining the prediction accuracy of the original trained model.
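
The abstract does not spell out how local training is made differentially private. As a rough illustration only, and not the paper's actual scheme, the sketch below shows one standard building block: releasing a regression-tree leaf value through the Laplace mechanism. The function names `laplace_noise` and `dp_leaf_mean`, the clipping bounds `lower` and `upper`, and the choice to treat the number of samples in the leaf as public are all assumptions made for this example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponentials with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_leaf_mean(labels, epsilon, lower, upper, rng=None):
    """Hypothetical helper: epsilon-DP mean of the labels in one leaf.

    Assumptions (not taken from the paper):
      * labels are clipped to [lower, upper];
      * the leaf size is treated as public, so replacing one record
        changes the mean by at most (upper - lower) / n.
    """
    if not labels:
        raise ValueError("leaf must contain at least one label")
    if epsilon <= 0 or upper <= lower:
        raise ValueError("need epsilon > 0 and upper > lower")
    rng = rng or random.Random()

    # Clipping bounds each record's influence on the mean.
    clipped = [min(max(y, lower), upper) for y in labels]
    n = len(clipped)
    sensitivity = (upper - lower) / n

    # Laplace mechanism: noise scale = sensitivity / epsilon.
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    leaf_labels = [1.2, 0.7, 3.1, 2.4]
    print(dp_leaf_mean(leaf_labels, epsilon=1.0, lower=0.0, upper=4.0, rng=rng))
```

A smaller `epsilon` gives stronger privacy but noisier leaf values, which is the privacy/accuracy trade-off the abstract refers to. A full system would also have to split the privacy budget across the trees and the split-selection steps, which this sketch leaves out.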
