Statistics and Computing

Information preserving regression-based tools for statistical disclosure control

Abstract

This paper presents a unified framework for regression-based statistical disclosure control for microdata. A basic method, known as information preserving statistical obfuscation (IPSO), produces synthetic data that preserve variances, covariances and fitted values. The data are then generated conditionally according to the multivariate normal distribution. Generalizations of the IPSO method have been described in the literature; these methods aim to generate data that are more similar to the original data. This paper describes these methods in a concise and interpretable form that maps closely onto an efficient implementation. An essential part of the framework is the decomposition of the residual data into orthogonal scores and corresponding loadings. Either QR decomposition (Gram-Schmidt orthogonalization) or singular value decomposition (principal components) may be used. Within this framework, new and generalized methods are presented. In particular, a method is described by which the correlations with the original principal component scores can be controlled exactly. It is shown that a suggested method of random orthogonal matrix masking can be implemented without generating an orthogonal matrix. Generalized methodology for hierarchical categories is presented in the context of microaggregation, so that some information can be preserved at the lowest level and more information at higher levels. The methodology is also applicable to tabular data. One possibility is to replace the contents of primary and secondary suppressed cells with generated values. The paper proposes replacing suppressed cell frequencies with decimal numbers and argues that this can be a useful method.
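The basic IPSO idea summarized above keeps the fitted values. It then replaces the residuals with conditionally generated normal data that carry the same cross-product matrix, using a QR decomposition to obtain scores and loadings. The Python/NumPy sketch below shows one way this can be done. It is a minimal illustration under our own assumptions, not the paper's implementation. The function name `ipso` is hypothetical. We assume that `x` contains an intercept column and that `n - p >= q`. The use of `numpy.linalg.lstsq` and `numpy.linalg.qr` is our choice for this example.

```python
import numpy as np


def ipso(y, x, rng=None):
    """Sketch of basic IPSO via QR decomposition (illustrative, not the paper's code).

    y : (n, q) array of confidential variables.
    x : (n, p) array of non-confidential regressors; assumed to include a
        column of ones so that means are preserved.
    Returns synthetic data with the same fitted values and the same residual
    cross-product matrix as y. Assumes n - p >= q.
    """
    rng = np.random.default_rng(rng)
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)

    # Least-squares fit of y on x; the fitted values are kept unchanged.
    beta, *_ = np.linalg.lstsq(x, y, rcond=None)
    fitted = x @ beta
    resid = y - fitted

    # Loadings: R from the QR decomposition of the residuals, so that
    # resid.T @ resid == r.T @ r.
    r = np.linalg.qr(resid, mode="reduced")[1]

    # Draw standard normal data and remove its projection onto x.
    z = rng.standard_normal(y.shape)
    z_beta, *_ = np.linalg.lstsq(x, z, rcond=None)
    z_resid = z - x @ z_beta

    # Scores: orthonormal columns spanning the simulated residuals,
    # hence orthogonal to every column of x.
    q = np.linalg.qr(z_resid, mode="reduced")[0]

    # New residuals q @ r have cross-product r.T @ r, the same as resid.
    return fitted + q @ r


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 50
    x = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = np.column_stack([2 + 3 * x[:, 1] + rng.normal(size=n),
                         rng.normal(size=n)])
    y_star = ipso(y, x, rng=2)
    # Same regression coefficients (x.T @ y) and same covariance matrix.
    print(np.allclose(x.T @ y_star, x.T @ y),
          np.allclose(np.cov(y_star, rowvar=False), np.cov(y, rowvar=False)))
    # expected: True True
```

The new residuals are orthogonal to `x`, so `x.T @ y` is unchanged, and with it the regression coefficients. Because `x` includes a column of ones, preserving `x.T @ y` also preserves the column means. Preserving the residual cross-product then preserves the full covariance matrix of `y`. An SVD of the residuals (principal component scores and loadings) could replace the first QR step in the same way.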