To address the massive volume of fragmented information on micro-blog platforms, this paper studies a centralization-based hotspot discovery mechanism for micro-blogs that takes into account the characteristics of micro-blog content: short text, a wide variety of sources, and diverse means of dissemination. Using the structured metadata recorded through the platforms' open APIs, a metadata model for micro-blog content is designed, and hotspot discovery is treated as a value-adding production process that turns the raw corpus into clusters of hot-topic content. For the initial processing stage of this production process, corpus processing methods centered on data pre-processing techniques are designed; for the deep processing stage, centralization methods based on short-text clustering and on dissemination paths and user behavior are proposed. Together these form a complete production and processing model. Finally, a series of experiments verifies the theoretical results.