Journal: Journal of Intelligent Information Systems

A distributed architecture for efficient parallelization and computation of knowledge-based temporal abstractions


Abstract

Today, data storage capabilities as well as computational power are rapidly increasing. On the one hand, this improvement makes it possible to generate and store a great amount of temporal (time-oriented) data for future query, analysis, and discovery of new knowledge. On the other hand, systems and experts are encountering new problems in processing this increased amount of data. The rapid growth in stored time-oriented data necessitates the development of new methods for handling, processing, and interpreting large amounts of temporal data. One approach is to use an automatic summarization process based on predefined knowledge, such as the Knowledge-Based Temporal-Abstraction (KBTA) method. This method enables one to summarize and reduce the amount of raw data by creating higher-level interpretations based on predefined domain knowledge. Unfortunately, the task of temporal abstraction is inherently computationally expensive, especially when an enormous volume of multivariate data has to be handled and when complex patterns need to be considered. In this research, we address the scalability problem of a temporal-abstraction task that involves processing very large amounts of raw data. We propose a new computational framework, the Distributed KBTA (DKBTA), which efficiently distributes the abstraction process among several parallel computational nodes in order to achieve an acceptable computation time. The DKBTA framework distributes the temporal-abstraction process along one or more computational axes, such as subject groups, concept types, or abstraction types. Each axis enables parallelization of one or more of the temporal-abstraction tasks into which the main temporal-abstraction task is decomposed. We have implemented the DKBTA framework and have evaluated it in a preliminary fashion in the medical and the information security domains, with encouraging results.
In our small-scale evaluation, only distribution along the subjects axis, and sometimes along the concept-type axis, seemed to consistently enhance performance, and only for computations involving individual subjects rather than functions of sets of subjects; this observation, however, might depend on the number of processing units. Additionally, since the communication between the processing units was based on TCP, we could not observe any speedup even when using two processing units on the same machine. In further evaluations, we plan to use a shared-memory architecture to exchange data between processing units.
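To make the idea of distributing along the subjects axis concrete, the following is a minimal Python sketch and not the authors' DKBTA implementation. It assumes a simplified state abstraction (each raw value is mapped to a symbolic state, and consecutive samples with the same state are merged into an interval). All names (`THRESHOLDS`, `classify`, `abstract_subject`, `partition_by_subject`, `distributed_abstraction`) and the cut-off values are hypothetical, and a thread pool stands in for the parallel computational nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby

# Hypothetical domain knowledge: value cut-offs for a single concept.
# Anything at or above the last cut-off is labeled "HIGH".
THRESHOLDS = [(70.0, "LOW"), (110.0, "NORMAL")]


def classify(value):
    """Map a raw value to a symbolic state using THRESHOLDS."""
    for upper, label in THRESHOLDS:
        if value < upper:
            return label
    return "HIGH"


def abstract_subject(samples):
    """Turn one subject's (time, value) samples into (start, end, state) intervals."""
    intervals = []
    for state, run in groupby(sorted(samples), key=lambda s: classify(s[1])):
        run = list(run)
        intervals.append((run[0][0], run[-1][0], state))
    return intervals


def abstract_group(group):
    """Work unit for one node: abstract every subject in its group."""
    return {subject: abstract_subject(samples) for subject, samples in group.items()}


def partition_by_subject(records, n_groups):
    """Split (subject, time, value) records into n_groups disjoint subject groups."""
    by_subject = defaultdict(list)
    for subject, time, value in records:
        by_subject[subject].append((time, value))
    groups = [dict() for _ in range(n_groups)]
    for i, subject in enumerate(sorted(by_subject)):
        groups[i % n_groups][subject] = by_subject[subject]
    return groups


def distributed_abstraction(records, n_workers=2, executor_cls=ThreadPoolExecutor):
    """Abstract each subject group on its own worker and merge the partial results."""
    merged = {}
    with executor_cls(max_workers=n_workers) as pool:
        for partial in pool.map(abstract_group, partition_by_subject(records, n_workers)):
            merged.update(partial)
    return merged


records = [
    ("p1", 0, 65.0), ("p1", 1, 68.0), ("p1", 2, 95.0),
    ("p2", 0, 120.0), ("p2", 1, 130.0),
]
result = distributed_abstraction(records, n_workers=2)
print(result)
```

Because each subject's intervals depend only on that subject's own samples, the groups can be processed independently and the partial results merged without coordination. A function of a set of subjects would need data from several groups at once, which this partitioning does not provide; that is in line with the observation above that such computations did not benefit from the subjects axis.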
