Multi-Aspect Incremental Tensor Decomposition Based on Distributed In-Memory Big Data Systems

Hye-Kyung Yang; Hwan-Seung Yong


Abstract

Purpose: We propose InParTen2, a multi-aspect parallel factor analysis (PARAFAC) decomposition algorithm for three-dimensional tensors, built on the Apache Spark framework. The proposed method reduces re-decomposition cost and can handle large tensors.

Design/methodology/approach: Because adding new data can enlarge a given tensor along all axes, the proposed method decomposes incoming tensors using the existing decomposition results, without generating sub-tensors. Additionally, InParTen2 avoids computing Khatri–Rao products and minimizes shuffling on the Apache Spark platform.

Findings: The performance of InParTen2 is evaluated by comparing its execution time and accuracy with those of existing distributed tensor decomposition methods on various datasets. The results confirm that InParTen2 can process large tensors, runs faster than the existing tensor decomposition algorithms it was compared against, and significantly reduces re-decomposition cost.

Research limitations: Several Hadoop-based distributed tensor decomposition algorithms and MATLAB-based decomposition methods exist. However, the former require longer iteration times, so their execution time cannot be compared with that of Spark-based algorithms, whereas the latter run on a single machine, which limits their ability to handle large data.

Practical implications: When new tensors are added to a given tensor, the proposed algorithm reduces re-decomposition cost by decomposing them based on the existing decomposition results instead of re-decomposing the entire tensor.

Originality/value: The proposed method can handle large tensors and is fast within the limited-memory framework of Apache Spark. Moreover, InParTen2 can handle static as well as incremental tensor decomposition.
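The abstract says that InParTen2 avoids computing Khatri–Rao products but does not say how. The following is only a minimal single-machine sketch of the general idea, not the authors' Spark implementation: in PARAFAC, the matricized-tensor-times-Khatri–Rao product can be accumulated directly from the nonzero entries, so the large (J*K) x R Khatri–Rao matrix never has to be formed. All function and variable names here are hypothetical, and NumPy stands in for a distributed engine.

```python
import numpy as np

def khatri_rao(C, B):
    """Explicit column-wise Kronecker product; row (k * J + j) equals C[k] * B[j]."""
    K, R = C.shape
    J, _ = B.shape
    return (C[:, None, :] * B[None, :, :]).reshape(K * J, R)

def mttkrp_mode1_dense(X, B, C):
    """Reference version: unfold X along mode 1, then multiply by the explicit product."""
    I, J, K = X.shape
    # Column index of the unfolding is k * J + j, matching khatri_rao(C, B).
    X1 = X.transpose(0, 2, 1).reshape(I, K * J)
    return X1 @ khatri_rao(C, B)

def mttkrp_mode1_sparse(coords, vals, B, C, I):
    """Same result from the nonzeros only, without forming the Khatri-Rao matrix."""
    i, j, k = coords
    M = np.zeros((I, B.shape[1]))
    # Each nonzero x[i, j, k] contributes x * (B[j] * C[k]) to row i of the result.
    np.add.at(M, i, vals[:, None] * B[j] * C[k])
    return M

# Small random sparse tensor (assumed sizes, for illustration only).
rng = np.random.default_rng(0)
I, J, K, R = 4, 5, 6, 3
X = rng.random((I, J, K)) * (rng.random((I, J, K)) < 0.2)
B, C = rng.random((J, R)), rng.random((K, R))

coords = np.nonzero(X)
vals = X[coords]
dense = mttkrp_mode1_dense(X, B, C)
sparse = mttkrp_mode1_sparse(coords, vals, B, C, I)
print(np.allclose(dense, sparse))
```

The per-nonzero form touches only the rows of B and C that a nonzero actually references, which is the kind of access pattern that maps onto key-based operations in a distributed setting. How InParTen2 organizes this on Spark is not described in the abstract.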

Bibliographic details

Source: Journal of Data and Information Science, 2020, Issue 2, 20 pages
Authors: Hye-Kyung Yang; Hwan-Seung Yong
Original format: PDF
Keywords: PARAFAC; Tensor decomposition; Incremental tensor decomposition; Apache Spark; Big data
