Journal: Industrial Engineering and Management

A Big Data Knowledge Computing Platform for Intelligence Studies - Wen Yi, Chinese Academy of Sciences, China



Abstract

Intelligence studies uses modern information technology and soft science research methods to form valuable information products by collecting, selecting, evaluating and synthesizing information resources. With the advent of the big data era, data-driven information analysis, the core work of the field, faces enormous opportunities and challenges. How to make good use of big data to solve big data problems, how to optimize and improve traditional intelligence studies methods and tools, and how to pursue innovation and research based on big data are key issues that current intelligence studies work needs to address.

Through an analysis of intelligence studies methods and common tools against the background of big data, we sort out the processes and requirements of intelligence studies work in a big data environment, and design and implement a universal knowledge computing platform for intelligence studies, which enables intelligence analysts to use a wide range of big data analysis algorithms without writing programs (http://www.zhiyun.ac.cn). Our platform is built upon the open source big data systems Hadoop and Spark. All data are stored in the distributed file system HDFS and the data management system Hive. All computational resources are managed with YARN, and each submitted task is scheduled with the workflow scheduler system Oozie.

The core of the platform consists of three modules: data management, data calculation and data visualization.

The data management module is used to store and manage the data relevant to intelligence studies, and consists of four parts: metadata management, data connection, data integration and data management. The platform supports the import and management of multi-source heterogeneous data, including papers and patents from ISI, PubMed, etc., and also supports data import through the APIs of MySQL, Hive and other database systems.
The platform provides more than 20 kinds of data cleaning and updating rules, such as search and replace, regular cleaning, null filling, etc., and also lets users customize and edit the cleaning rules.

The data calculation module is used to store and manage the big data analysis algorithms and intelligence analysis processes. It provides a user-friendly GUI in which users create customized intelligence analysis processes; a packaged process can be submitted to the platform for calculation, and the results of each step can be obtained. In the system, a task is formulated as a directed acyclic graph (DAG) in which the source data flows into the root nodes. Each node operates on the data, generates new data, and sends the generated data to its descendant nodes for further operations. Finally, the results flow out from the leaf nodes.

The data visualization module is used to visualize the results of intelligence analysis and calculation, and includes more than ten kinds of charts, such as line charts, histograms, radar charts and word clouds.

Practice has shown that the platform can meet the requirements of intelligence studies in various fields in the era of big data, and that it promotes the application of data mining and knowledge discovery in the field of intelligence studies.
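The DAG task model in the abstract (source data enters the root nodes, each node transforms its input and passes the result to its descendants, and results leave from the leaf nodes) can be illustrated with a small Python sketch. This is not the platform's implementation, which the abstract says runs on Hadoop, Spark and Oozie. The function names `search_replace`, `regex_clean`, `fill_null` and `run_dag` are hypothetical, the reading of "regular cleaning" as regular-expression cleaning is an assumption, and the sample records are invented for illustration.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical cleaning rules, loosely modelled on the rule types named in
# the abstract. Each factory returns a function mapping a list of records
# to a new list of records.
def search_replace(old, new):
    return lambda rows: [r.replace(old, new) if r is not None else None for r in rows]

def regex_clean(pattern, repl=""):
    # Assumption: "regular cleaning" means regular-expression cleaning.
    compiled = re.compile(pattern)
    return lambda rows: [compiled.sub(repl, r) if r is not None else None for r in rows]

def fill_null(default):
    return lambda rows: [default if r is None or r == "" else r for r in rows]

def run_dag(nodes, edges, source):
    """Run a task expressed as a DAG.

    nodes:  dict mapping node name -> function(list) -> list
    edges:  list of (parent, child) pairs
    source: input records, fed to every root node
    Returns a dict mapping each leaf node name to its output.
    """
    parents = {name: [] for name in nodes}
    for parent, child in edges:
        parents[child].append(parent)
    has_child = {parent for parent, _ in edges}

    results = {}
    # static_order() yields each node after all of its parents and raises
    # graphlib.CycleError if the graph is not acyclic.
    for name in TopologicalSorter(parents).static_order():
        if parents[name]:
            # Non-root node: concatenate the outputs of all parents.
            data = [row for p in parents[name] for row in results[p]]
        else:
            # Root node: the source data flows in here.
            data = list(source)
        results[name] = nodes[name](data)
    return {name: results[name] for name in nodes if name not in has_child}

# Invented sample task: strip markup, fill nulls, then normalize a term.
nodes = {
    "strip_tags": regex_clean(r"<[^>]+>"),
    "fill": fill_null("unknown"),
    "normalize": search_replace("Univ.", "University"),
}
edges = [("strip_tags", "fill"), ("fill", "normalize")]
source = ["<b>Peking Univ.</b>", None, "Tsinghua Univ."]

result = run_dag(nodes, edges, source)
print(result)
```

Here the three rules form a simple chain, so the only leaf is `normalize`. Branching tasks work the same way: a node with several parents receives the concatenation of their outputs, which is one possible merge policy among many and is chosen here only to keep the sketch short.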
