
NEURAL NETWORK SYSTEM, NEURAL NETWORK TRAINING METHOD, AND NEURAL NETWORK TRAINING PROGRAM


Abstract

The present invention improves the throughput of data-parallel distributed training. The neural network system comprises a memory and a plurality of processors that access the memory. In each of a plurality of training iterations, each of the processors executes a calculation of the neural network on the basis of input training data and the parameters of the neural network, calculates the output of the neural network, and calculates, from the difference between the calculated output and the teacher data of the training data, a gradient with respect to the parameters or an update amount based on that gradient. (1) In a first case, where the accumulation of the gradients or update amounts is not smaller than a threshold, the processors execute a first update process: each processor transmits its accumulation of calculated gradients or update amounts to the other processors so that the accumulations are integrated, receives the integrated accumulation, and updates the parameters with the integrated accumulation. (2) In a second case, where the accumulation of the gradients or update amounts is smaller than the threshold, the processors execute a second update process: instead of integrating the accumulations by transmission, each processor updates its parameters with its own calculated gradients or update amounts.
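To make the two-branch update concrete, below is a minimal, self-contained Python sketch of one possible reading of the scheme: each processor accumulates its gradients, and the costly transmit-and-integrate step (modelled here by a simple `allreduce` over simulated workers) runs only once the accumulation reaches a threshold; below the threshold, each processor updates its parameters with the gradient it just calculated. All names (`Worker`, `allreduce`, `THRESHOLD`, `LR`) and the toy linear-regression loss are illustrative assumptions, not taken from the patent.

```python
import numpy as np

# Hypothetical sketch of the threshold-gated update described in the abstract.
# This is one interpretation, not the patented implementation.

THRESHOLD = 0.5  # assumed threshold on the norm of the accumulated gradient
LR = 0.1         # assumed learning rate

class Worker:
    """One of the plurality of processors, holding its own parameter copy."""
    def __init__(self, params):
        self.params = params.copy()
        self.accum = np.zeros_like(params)  # locally accumulated gradient

    def gradient(self, x, y):
        # Gradient of a squared-error loss for a toy linear model y ~ params.x
        return (self.params @ x - y) * x

def allreduce(workers):
    """Stand-in for the inter-processor transmission and integration step."""
    return sum(w.accum for w in workers) / len(workers)

def training_iteration(workers, batches):
    grads = []
    for w, (x, y) in zip(workers, batches):
        g = w.gradient(x, y)
        w.accum += g
        grads.append(g)

    if all(np.linalg.norm(w.accum) >= THRESHOLD for w in workers):
        # Case (1): the accumulation has reached the threshold -> transmit,
        # integrate, and update every worker with the integrated accumulation.
        integrated = allreduce(workers)
        for w in workers:
            w.params -= LR * integrated
            w.accum[:] = 0.0
    else:
        # Case (2): the accumulation is below the threshold -> skip the
        # communication; each processor updates with its own fresh gradient.
        for w, g in zip(workers, grads):
            w.params -= LR * g

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_w = np.array([1.0, -2.0, 0.5])
    workers = [Worker(np.zeros(3)) for _ in range(4)]
    for step in range(20):
        batches = []
        for _ in workers:
            x = rng.normal(size=3)
            batches.append((x, true_w @ x))
        training_iteration(workers, batches)
    print("worker 0 params:", np.round(workers[0].params, 3))
```

In a real deployment, the `allreduce` stand-in would be an actual inter-processor collective (for example an MPI or NCCL all-reduce), and the threshold test would need to be agreed on by all processors so that they enter the collective together.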

Bibliographic Data

  • Publication number: WO2021140643A1
  • Patent type:
  • Publication date: 2021-07-15
  • Original document format: PDF
  • Applicant/Assignee: FUJITSU LIMITED
  • Application number: WO2020JP00644
  • Inventor: DANJO TAKUMI
  • Filing date: 2020-01-10
  • IPC classification: G06N3/08
  • Country: JP
  • Date added to database: 2022-08-24 19:58:23

