Conference: Second International Conference on Data Mining
Data scale reduction via instances summarization using the Rough Set Theory

Abstract

A major obstacle to applying data mining algorithms to real-life data is that these algorithms cannot handle very large datasets, such as those stored in industrial databases. Developing new algorithms that require less memory and processing time would certainly help solve this problem. Here, however, we take a different approach: reducing the size of the input data. We present our new system, CFSumm, which is dedicated to data summarization as a preprocessing step before a data mining tool is used. The basic idea of the method is to summarize several sufficiently similar instances with a single weighted pseudo-instance that can replace them in further processing. We explain how the α-Rough Set Theory framework allows great flexibility in the summarization process. We also present experimental results obtained on data of real-life size, which demonstrate the quality of the resulting summaries and the high scalability of our method.
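
The idea of replacing several similar instances with one weighted pseudo-instance can be illustrated with a minimal Python sketch. This is not the CFSumm algorithm, whose details the abstract does not give. It assumes categorical attributes, a greedy single-pass grouping, and a hypothetical threshold `alpha` interpreted as the minimum fraction of matching attributes. The names `PseudoInstance`, `similarity`, and `summarize` are invented for illustration only.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class PseudoInstance:
    """A representative instance plus the number of originals it replaces."""
    values: Tuple
    weight: int = 1


def similarity(a: Sequence, b: Sequence) -> float:
    """Fraction of attributes on which two instances agree (assumed measure)."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def summarize(instances: Sequence[Sequence], alpha: float) -> List[PseudoInstance]:
    """Greedy one-pass summarization.

    Each instance is absorbed by the first pseudo-instance whose
    representative is at least `alpha`-similar to it; otherwise it
    starts a new pseudo-instance with weight 1.
    """
    summary: List[PseudoInstance] = []
    for inst in instances:
        for pseudo in summary:
            if similarity(inst, pseudo.values) >= alpha:
                pseudo.weight += 1  # the pseudo-instance now stands for one more original
                break
        else:
            summary.append(PseudoInstance(tuple(inst)))
    return summary


if __name__ == "__main__":
    data = [
        ("red", "small", "round"),
        ("red", "small", "oval"),
        ("blue", "large", "square"),
        ("red", "small", "round"),
        ("blue", "large", "round"),
    ]
    for pseudo in summarize(data, alpha=0.6):
        print(pseudo.values, pseudo.weight)
```

In this sketch the weights always sum to the number of original instances, so a downstream tool that supports instance weights could in principle process the summary in place of the full data. Raising `alpha` toward 1 merges fewer instances and keeps the summary closer to the original data; lowering it compresses more aggressively.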