Journal: 《西北工业大学学报》 (Journal of Northwestern Polytechnical University)

Random Forest Method and Its Application for Streaming Big Data

Abstract

Stream computing is an important form of big data computing, yet big data analysis in a streaming setting remains an open problem with few research results and little practical experience. Random forest is one of the most widely used classification algorithms. In practice it must cope not only with a large number of features but also with data patterns that change over time. In streaming scenarios the data are real-time, volatile, and unordered, so the accuracy of a random forest that cannot renew and adapt itself degrades gradually. To address this problem, the paper analyzes the characteristics of the random forest algorithm and proposes pruning the forest according to the accuracy of its individual decision trees. To adapt to changes in the data, it uses the concept of margin to generate, validate, and add new decision trees. The result is a random forest that updates itself continuously as the data change and so meets the requirements of streaming big data environments. The feasibility of the improved method is verified on real data, which show that it achieves higher classification accuracy in a real streaming big data scenario. The paper closes with a discussion of how the random forest method could be studied and improved further.
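
The prune-then-replenish loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract gives no thresholds or acceptance rule, so `min_acc`, `min_margin`, `max_tries`, and the one-split `train_stump` learner are all hypothetical stand-ins, and the margin is taken in the usual random forest sense (vote share of the true class minus the largest vote share of any other class).

```python
import random
from collections import Counter

def train_stump(X, y, rng):
    """Hypothetical stand-in for a tree learner: one split on a random feature."""
    idx = [rng.randrange(len(X)) for _ in X]        # bootstrap sample of the window
    f = rng.randrange(len(X[0]))                    # random feature choice
    thr = sum(X[i][f] for i in idx) / len(idx)      # split at the sample mean
    default = Counter(y[i] for i in idx).most_common(1)[0][0]
    left = Counter(y[i] for i in idx if X[i][f] <= thr)
    right = Counter(y[i] for i in idx if X[i][f] > thr)
    l = left.most_common(1)[0][0] if left else default
    r = right.most_common(1)[0][0] if right else default
    return lambda x: l if x[f] <= thr else r

def accuracy(tree, X, y):
    """Fraction of window samples a single tree classifies correctly."""
    return sum(tree(x) == t for x, t in zip(X, y)) / len(X)

def margin(trees, x, label):
    """Vote share of the true class minus the best vote share of any other class."""
    votes = Counter(t(x) for t in trees)
    true_share = votes.get(label, 0) / len(trees)
    other = max((c for k, c in votes.items() if k != label), default=0)
    return true_share - other / len(trees)

def mean_margin(trees, X, y):
    return sum(margin(trees, x, t) for x, t in zip(X, y)) / len(X)

def update_forest(trees, X, y, size, min_acc, min_margin, rng, max_tries=200):
    """Prune trees that are weak on the latest window, then refill the forest."""
    # Pruning step: drop trees whose accuracy on the recent window is too low.
    kept = [t for t in trees if accuracy(t, X, y) >= min_acc]
    tries = 0
    while len(kept) < size and tries < max_tries:
        tries += 1
        cand = train_stump(X, y, rng)               # generate a new tree
        if accuracy(cand, X, y) < min_acc:          # validate: accuracy (assumed rule)
            continue
        if mean_margin(kept + [cand], X, y) < min_margin:  # validate: margin (assumed rule)
            continue
        kept.append(cand)                           # supplement the forest
    return kept

def predict(trees, x):
    """Majority vote over the current forest."""
    return Counter(t(x) for t in trees).most_common(1)[0][0]
```

Calling `update_forest` once per incoming window, with the previous forest as input, gives a forest that tracks the stream: trees fitted to an outdated pattern fail the accuracy check and are replaced by trees trained on the newest data.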
