Published in: Journal of Data and Information Science
Multi-Aspect Incremental Tensor Decomposition Based on Distributed In-Memory Big Data Systems

Abstract

Purpose: We propose InParTen2, a multi-aspect parallel factor analysis (PARAFAC) decomposition algorithm for three-dimensional tensors, built on the Apache Spark framework. The proposed method reduces re-decomposition cost and can handle large tensors.

Design/methodology/approach: Because adding new tensors can increase the size of a given tensor along all axes, the proposed method decomposes incoming tensors using existing decomposition results, without generating sub-tensors. Additionally, InParTen2 avoids computing Khatri–Rao products and minimizes shuffling on the Apache Spark platform.

Findings: The performance of InParTen2 is evaluated by comparing its execution time and accuracy with those of existing distributed tensor decomposition methods on various datasets. The results confirm that InParTen2 can process large tensors and reduce the re-calculation cost of tensor decomposition. Consequently, the proposed method is faster than existing tensor decomposition algorithms and can significantly reduce re-decomposition cost.

Research limitations: There are several Hadoop-based distributed tensor decomposition algorithms as well as MATLAB-based decomposition methods. However, the former require longer iteration times, so their execution time cannot be meaningfully compared with that of Spark-based algorithms, whereas the latter run on a single machine, which limits their ability to handle large data.

Practical implications: When tensors are added to a given tensor, the proposed algorithm reduces re-decomposition cost by decomposing the new tensors based on existing decomposition results, without re-decomposing the entire tensor.

Originality/value: The proposed method can handle large tensors and is fast within the limited-memory framework of Apache Spark. Moreover, InParTen2 can handle static as well as incremental tensor decomposition.
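
The abstract does not say how InParTen2 avoids the Khatri–Rao product, so the following is only a generic illustration of the idea, not the authors' Spark implementation. It is a minimal single-machine sketch in plain Python, assuming a sparse three-way tensor stored as a dictionary of nonzero entries. The function names (`khatri_rao`, `mttkrp_explicit`, `mttkrp_sparse`) and the toy data are hypothetical. Both functions compute the mode-1 matricized-tensor-times-Khatri–Rao-product used in PARAFAC updates. The first materializes the (K·J)×R Khatri–Rao matrix, while the second accumulates the same result directly from the nonzeros.

```python
def khatri_rao(C, B):
    """Column-wise Kronecker product of C (K x R) and B (J x R).

    Row k*J + j of the result is the element-wise product C[k] * B[j].
    """
    R = len(B[0])
    return [[C[k][r] * B[j][r] for r in range(R)]
            for k in range(len(C)) for j in range(len(B))]


def mttkrp_explicit(entries, shape, B, C):
    """Mode-1 MTTKRP that first builds the full Khatri-Rao matrix."""
    I, J, K = shape
    R = len(B[0])
    KR = khatri_rao(C, B)  # (K*J) x R, grows with the tensor dimensions
    M = [[0.0] * R for _ in range(I)]
    for (i, j, k), x in entries.items():
        row = KR[k * J + j]  # matches column j + k*J of the mode-1 unfolding
        for r in range(R):
            M[i][r] += x * row[r]
    return M


def mttkrp_sparse(entries, shape, B, C):
    """Mode-1 MTTKRP accumulated per nonzero, with no Khatri-Rao matrix."""
    I = shape[0]
    R = len(B[0])
    M = [[0.0] * R for _ in range(I)]
    for (i, j, k), x in entries.items():
        for r in range(R):
            M[i][r] += x * B[j][r] * C[k][r]
    return M


# Toy 2 x 3 x 2 tensor with four nonzeros and rank-2 factor matrices.
shape = (2, 3, 2)
entries = {(0, 0, 0): 1.0, (0, 2, 1): 2.0, (1, 1, 0): 3.0, (1, 2, 1): 4.0}
B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # J x R
C = [[1.0, 0.5], [2.0, 1.0]]              # K x R

print(mttkrp_sparse(entries, shape, B, C))  # [[21.0, 13.0], [49.0, 30.0]]
```

In a distributed setting, the per-nonzero form is attractive because its cost scales with the number of nonzeros rather than with J·K. How InParTen2 partitions this work on Spark is not described in the abstract.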
