Journal: SIGKDD Explorations

Scaling Big Data Mining Infrastructure: The Twitter Experience


Abstract

The analytics platform at Twitter has experienced tremendous growth over the past few years in terms of size, complexity, number of users, and variety of use cases. In this paper, we discuss the evolution of our infrastructure and the development of capabilities for data mining on "big data". One important lesson is that successful big data mining in practice is about much more than what most academics would consider data mining: life "in the trenches" is occupied by much preparatory work that precedes the application of data mining algorithms, and by the substantial effort that follows to turn preliminary models into robust solutions. In this context, we discuss two topics. First, schemas play an important role in helping data scientists understand petabyte-scale data stores, but they are insufficient to provide an overall "big picture" of the data available to generate insights. Second, we observe that a major challenge in building data analytics platforms stems from the heterogeneity of the various components that must be integrated into production workflows; we refer to this as "plumbing". This paper has two goals. For practitioners, we hope to share our experiences to flatten bumps in the road for those who come after us. For academic researchers, we hope to provide a broader context for data mining in production environments, pointing out opportunities for future work.
