GDELT (Global Database of Events, Language, and Tone) is a real-time, open database of global human society for open research. It monitors the world's broadcast, print, and web news from 1979 to the present and extracts events from these reports, creating a free, open platform for computing on the entire world. First, we designed and implemented a data collector that gathers GDELT metadata in real time and stores it in the Hadoop Distributed File System (HDFS). Then, we proposed a hash-based method for quickly joining GDELT's three main tables (Event, Mentions, and GKG) in Spark. The result is a joined GDELT dataset from which the detailed information of each event can be fully mined. Finally, taking the joined dataset for South Korea as an example, we performed spatiotemporal visual analyses: an event spatiotemporal heat map showing the regional and temporal distribution of event intensity, the distribution of media attention, and a dot map of event-extraction confidence. This work offers researchers in information and intelligence science, and related practitioners, a new perspective and solution for further research.
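The abstract names a hash-based join of the Event, Mentions, and GKG tables but does not give its details. The following is a minimal sketch of a generic hash join under stated assumptions. It is not the authors' implementation. It assumes GDELT 2.0 column names (`GLOBALEVENTID` in Event and Mentions, `MentionIdentifier` in Mentions, `DocumentIdentifier` in GKG). It uses plain Python dictionaries in place of Spark DataFrames. The helper names `build_hash_index` and `correlate` are hypothetical, and the choice of inner-join semantics is also an assumption.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def build_hash_index(rows: Iterable[Row], key: str) -> Dict[Any, List[Row]]:
    """Build phase: group rows by the value of `key` in a hash table."""
    index: Dict[Any, List[Row]] = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def correlate(events: List[Row], mentions: List[Row], gkg: List[Row]) -> List[Row]:
    """Probe phase: link each event to its mentions, and each mention to its GKG record.

    Assumed keys (GDELT 2.0 naming): Event.GLOBALEVENTID = Mentions.GLOBALEVENTID,
    Mentions.MentionIdentifier = GKG.DocumentIdentifier. This is an inner join, so
    events without a matching mention and GKG record are dropped.
    """
    mentions_by_event = build_hash_index(mentions, "GLOBALEVENTID")
    gkg_by_document = build_hash_index(gkg, "DocumentIdentifier")

    joined: List[Row] = []
    for event in events:
        # .get() does not insert missing keys into the defaultdict
        for mention in mentions_by_event.get(event["GLOBALEVENTID"], []):
            for document in gkg_by_document.get(mention["MentionIdentifier"], []):
                # Merge the three records into one joined row
                joined.append({**event, **mention, **document})
    return joined


if __name__ == "__main__":
    # Hypothetical toy rows; "KS" is the FIPS country code for South Korea.
    events = [
        {"GLOBALEVENTID": 1, "ActionGeo_CountryCode": "KS"},
        {"GLOBALEVENTID": 2, "ActionGeo_CountryCode": "KS"},
    ]
    mentions = [
        {"GLOBALEVENTID": 1, "MentionIdentifier": "http://example.com/a"},
        {"GLOBALEVENTID": 1, "MentionIdentifier": "http://example.com/b"},
    ]
    gkg = [{"DocumentIdentifier": "http://example.com/a", "V2Tone": "1.5"}]
    for row in correlate(events, mentions, gkg):
        print(row)
```

Building hash tables on the join keys lets each event find its mentions, and each mention its GKG record, in expected constant time. This avoids comparing every pair of rows. In Spark, a comparable effect would come from a broadcast or shuffled hash join on the same keys. The paper's actual partitioning and join strategy are not described in the abstract.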