
A Deep Neural Network Language Model with a Hybrid LSTM and CNN Architecture (Gated CLSTM)


Abstract

The language model is foundational work in natural language processing: it is the bridge by which computers recognize and understand natural language, and a frontier topic in artificial intelligence, with wide applications in speech recognition, machine translation, information retrieval, and knowledge graphs. Language models have evolved from statistical models through neural network models to deep neural network models, and with the broad adoption of deep learning, large-scale datasets, complex models, and high training costs have become characteristic of language modeling. This paper improves on existing language models through human-like model input, convolutional neural network (CNN) encoding, and a fusion-gate mechanism combined with long short-term memory (LSTM) units, and proposes Gated CLSTM, a deep neural network language model with a hybrid LSTM and CNN architecture. Gated CLSTM was implemented in the deep learning framework TensorFlow, and the experiments also adopted classical optimization techniques such as negative sampling and a recurrent projection layer. Performance was measured on a general-purpose corpus of nearly one billion English words (the One Billion Word Benchmark); a single-layer and a three-layer model were trained to observe the effect of network depth on performance. On a single machine with four GPUs, the single-layer model reduced perplexity to 42.1 after four days of training, and the three-layer model reduced perplexity to 33.1 after six days. Compared with several typical baseline models, and considering hardware, time complexity, and perplexity together, Gated CLSTM achieves a clear improvement.
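The abstract names four components (human-like input, CNN encoding, a fusion gate, and LSTM units with a recurrent projection layer) but does not publish the exact architecture. The sketch below shows one plausible way to wire these components together in TensorFlow, the framework the paper uses. All layer sizes, the character-level reading of "human-like input", and the gate formulation are our assumptions, not the paper's specification.

```python
import tensorflow as tf

# All sizes below are illustrative assumptions; the paper does not
# publish its exact configuration.
CHAR_VOCAB = 256     # character vocabulary (byte-level, assumed)
MAX_WORD_LEN = 16    # max characters per word (assumed)
CHAR_EMB = 16        # character embedding size (assumed)
CNN_FILTERS = 512    # CNN word-encoder filters (assumed)
LSTM_UNITS = 1024    # LSTM hidden size (assumed)
PROJ_UNITS = 512     # recurrent-projection size (assumed)
WORD_VOCAB = 50000   # truncated output vocabulary, to keep the sketch small


def build_gated_clstm(num_layers: int = 1) -> tf.keras.Model:
    """One plausible Gated CLSTM wiring: char-CNN word encoder ->
    fusion gate -> stacked LSTMs with projection -> word softmax."""
    # Each word in the sequence arrives as a fixed-length row of
    # character IDs, so the model "reads" words the way humans do.
    chars = tf.keras.Input(shape=(None, MAX_WORD_LEN), dtype="int32")

    # 1) Character embeddings: (batch, time, word_len) -> (..., CHAR_EMB).
    emb = tf.keras.layers.Embedding(CHAR_VOCAB, CHAR_EMB)(chars)

    # 2) CNN word encoder: convolve over characters, then max-pool,
    #    yielding one fixed-size vector per word position.
    conv = tf.keras.layers.TimeDistributed(
        tf.keras.layers.Conv1D(CNN_FILTERS, 5, padding="same",
                               activation="tanh"))(emb)
    word_vec = tf.keras.layers.TimeDistributed(
        tf.keras.layers.GlobalMaxPooling1D())(conv)

    # 3) Fusion gate (our reading of the paper's gate mechanism): a
    #    sigmoid gate scales the CNN features element-wise before the
    #    recurrent layers.
    gate = tf.keras.layers.Dense(CNN_FILTERS, activation="sigmoid")(word_vec)
    gated = tf.keras.layers.Multiply()([gate, word_vec])

    # 4) Stacked LSTMs; the recurrent projection layer is approximated
    #    here by a Dense projection after each LSTM.
    h = gated
    for _ in range(num_layers):
        h = tf.keras.layers.LSTM(LSTM_UNITS, return_sequences=True)(h)
        h = tf.keras.layers.Dense(PROJ_UNITS)(h)

    # 5) Next-word logits (replaced by negative sampling during
    #    training; see the following sketch).
    logits = tf.keras.layers.Dense(WORD_VOCAB)(h)
    return tf.keras.Model(chars, logits)


model = build_gated_clstm(num_layers=3)  # the paper's deeper variant
model.summary()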
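The paper reports training with negative sampling and evaluating by perplexity. The sketch below illustrates the standard pattern: a sampled-softmax loss for training (via tf.nn.sampled_softmax_loss, a sampled variant of the full softmax in the same family as negative sampling) and a full softmax for computing perplexity at evaluation time. The vocabulary size and number of negative samples are hypothetical.

```python
import tensorflow as tf

VOCAB = 50000       # truncated vocabulary (assumption, for the sketch)
PROJ = 512          # projection/output dimension (assumption)
NUM_SAMPLED = 8192  # negative samples per step (assumption)

# Output embedding shared by the sampled training loss and the full
# softmax used at evaluation time.
softmax_w = tf.Variable(tf.random.normal([VOCAB, PROJ], stddev=0.05))
softmax_b = tf.Variable(tf.zeros([VOCAB]))


def sampled_training_loss(hidden, targets):
    """Sampled-softmax loss: score the true next word against
    NUM_SAMPLED randomly drawn negatives instead of normalizing over
    the full vocabulary, which makes large-vocabulary training cheap."""
    return tf.reduce_mean(tf.nn.sampled_softmax_loss(
        weights=softmax_w, biases=softmax_b,
        labels=tf.reshape(targets, [-1, 1]),
        inputs=hidden, num_sampled=NUM_SAMPLED, num_classes=VOCAB))


def perplexity(hidden, targets):
    """Evaluation uses the full softmax: perplexity = exp(mean NLL),
    the metric behind the reported 42.1 and 33.1 figures."""
    logits = tf.matmul(hidden, softmax_w, transpose_b=True) + softmax_b
    nll = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=targets, logits=logits)
    return tf.exp(tf.reduce_mean(nll))


# Toy usage with random data standing in for LSTM outputs.
h = tf.random.normal([32, PROJ])
y = tf.random.uniform([32], maxval=VOCAB, dtype=tf.int64)
print(float(sampled_training_loss(h, y)), float(perplexity(h, y)))
```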
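The reported results come from a single machine with four GPUs. The paper does not describe its parallelization code; as a hedged illustration only, single-machine data parallelism in current TensorFlow is typically expressed with tf.distribute.MirroredStrategy, which may differ from the authors' original setup.

```python
import tensorflow as tf

# Single-machine multi-GPU data parallelism. MirroredStrategy is a
# modern TensorFlow idiom and a stand-in here, not the paper's code.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # A tiny stand-in model; in practice this would be the Gated CLSTM
    # built in the earlier sketch.
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(50000, 512),
        tf.keras.layers.LSTM(1024, return_sequences=True),
        tf.keras.layers.Dense(50000),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(1e-3),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(
            from_logits=True))
# A subsequent model.fit(...) shards each batch across the GPUs.
```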
