Journal: Computer and information science

The Design of Pre-Processing Multidimensional Data Based on Component Analysis

Abstract

The growing number of new databases for multidimensional data, together with techniques that support efficient query processing, creates opportunities for more extensive research. Pre-processing is required because of missing attribute values, noisy data, errors, inconsistencies, outliers, and differences in coding. Several types of pre-processing based on component analysis are carried out for data cleaning, integration, and transformation, as well as for dimensionality reduction. Component analysis can be performed with statistical methods, with the aim of separating the various data sources into statistically independent patterns. This paper aims to improve the quality of pre-processed data based on component analysis. RapidMiner is used for data pre-processing with the FastICA algorithm. Kernel K-means is used to cluster the pre-processed data, and Expectation Maximization (EM) is used for modelling. The model was tested on the Wisconsin breast cancer, lung cancer, and prostate cancer datasets. The results show that the performance of the cluster vector value is higher and the processing time is shorter.
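
The clustering step named in the abstract can be illustrated with a minimal sketch of kernel K-means. This is not the authors' RapidMiner configuration: the abstract does not state which kernel or parameters were used, so the RBF kernel, the `gamma` value, the round-robin initial assignment, and the function names `rbf_kernel` and `kernel_kmeans` are all assumptions made for illustration. The input is taken to be data that has already been pre-processed (for example by FastICA).

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors (assumed kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def kernel_kmeans(points, k, gamma=1.0, max_iter=100):
    """Cluster `points` with kernel k-means; returns one label per point."""
    n = len(points)
    # Precompute the kernel (Gram) matrix once.
    K = [[rbf_kernel(p, q, gamma) for q in points] for p in points]
    # Deterministic initial assignment: round-robin by index (arbitrary choice).
    labels = [i % k for i in range(n)]
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Mean pairwise kernel value inside each cluster (0.0 for empty ones).
        within = [
            sum(K[a][b] for a in m for b in m) / (len(m) ** 2) if m else 0.0
            for m in members
        ]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], float("inf")
            for c, m in enumerate(members):
                if not m:
                    continue  # an empty cluster cannot attract points
                cross = sum(K[i][j] for j in m) / len(m)
                # Squared feature-space distance from point i to centroid c.
                d = K[i][i] - 2.0 * cross + within[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels


# Toy data: two well-separated groups of three 2-D points each.
toy = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
print(kernel_kmeans(toy, k=2))  # prints [0, 0, 0, 1, 1, 1]
```

The distance to a cluster centroid is computed from kernel values alone, so the centroid never has to be formed explicitly in feature space. The sketch recomputes cluster sums on every pass, which costs on the order of n² kernel lookups per iteration and is only practical for small datasets.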