Published in: Statistics and Computing

Information preserving regression-based tools for statistical disclosure control

Abstract

This paper presents a unified framework for regression-based statistical disclosure control for microdata. A basic method, known as information preserving statistical obfuscation (IPSO), produces synthetic data that preserve variances, covariances and fitted values. The data are then generated conditionally according to the multivariate normal distribution. Generalizations of the IPSO method are described in the literature, and these methods aim to generate data more similar to the original data. This paper describes these methods in a concise and interpretable way, in a form close to an efficient implementation. Decomposing the residual data into orthogonal scores and corresponding loadings is an essential part of the framework. Both QR decomposition (Gram-Schmidt orthogonalization) and singular value decomposition (principal components) may be used. Within this framework, new and generalized methods are presented. In particular, a method is described by means of which the correlations to the original principal component scores can be controlled exactly. It is shown that a suggested method of random orthogonal matrix masking can be implemented without generating an orthogonal matrix. Generalized methodology for hierarchical categories is presented within the context of microaggregation. Some information can then be preserved at the lowest level and more information at higher levels. The presented methodology is also applicable to tabular data. One possibility is to replace the content of primary and secondary suppressed cells with generated values. The paper proposes replacing suppressed cell frequencies with decimal numbers and argues that this can be a useful method.
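
The basic IPSO idea summarized above (keep the fitted values, replace the residuals with simulated ones that have the same cross-product) can be sketched in Python with NumPy. This is only an illustration under stated assumptions, not the paper's implementation: the function name `ipso_sketch`, its arguments, and the choice of QR for both the scores and the loadings are hypothetical, and the sketch assumes more rows than regressors plus responses and full-rank residuals.

```python
import numpy as np

def ipso_sketch(X, Y, rng):
    """Hypothetical IPSO-style generator (illustrative, not the paper's code).

    X : (n, k) regressors that are kept unchanged.
    Y : (n, p) sensitive variables to be replaced by synthetic values.
    rng : a numpy.random.Generator.
    """
    n = X.shape[0]
    X1 = np.column_stack([np.ones(n), X])   # add an intercept column
    Qx, _ = np.linalg.qr(X1)                # orthonormal basis of col(X1)

    fitted = Qx @ (Qx.T @ Y)                # least-squares fitted values
    resid = Y - fitted                      # original residuals

    # Loadings: R factor of the residuals, so resid.T @ resid == R.T @ R.
    _, R = np.linalg.qr(resid)

    # Simulate normal noise and remove its component in col(X1).
    Z = rng.standard_normal(Y.shape)
    Z -= Qx @ (Qx.T @ Z)

    # Scores: orthonormal columns that are orthogonal to X1.
    S, _ = np.linalg.qr(Z)

    # New residuals S @ R have the same cross-product as the originals.
    return fitted + S @ R
```

Calling `ipso_sketch(X, Y, np.random.default_rng(0))` on numeric arrays returns a matrix of the same shape as `Y`. Because the simulated scores are orthonormal and orthogonal to the regressors, regressing the synthetic values on `X` gives the same coefficients as the original data, and the sample covariance matrix of the synthetic values matches that of `Y` up to floating-point error.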