
A Deep Neural Network Language Model with a Hybrid LSTM and CNN Architecture (Gated CLSTM)


Abstract

The language model is foundational work in natural language processing: it is the bridge by which computers recognize and understand natural language, and a frontier topic in artificial intelligence, with wide applications in speech recognition, machine translation, information retrieval, and knowledge graphs. Language models have evolved from statistical models through neural network models to deep neural network models, and with the broad adoption of deep learning, large-scale datasets, complex models, and high training costs have become characteristic of language modeling. This paper improves on existing language models through human-like model input, convolutional neural network (CNN) encoding, and a fusion-gate mechanism combined with long short-term memory (LSTM) units, and proposes Gated CLSTM, a deep neural network language model with a hybrid LSTM and CNN architecture. Gated CLSTM was implemented in the deep learning framework TensorFlow, and the experiments also adopted classical optimization techniques such as negative sampling and a recurrent projection layer. Performance was measured on a general-purpose corpus of nearly one billion English words (the One Billion Word Benchmark); a single-layer and a three-layer model were trained to observe the effect of network depth on performance. On a single machine with four GPUs, the single-layer model reduced perplexity to 42.1 after four days of training, and the three-layer model reduced perplexity to 33.1 after six days. Compared with several typical baseline models, and considering hardware, time complexity, and perplexity together, Gated CLSTM achieves a clear improvement.
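The abstract names four components (human-like input, CNN encoding, a fusion gate, and LSTM units with a recurrent projection layer) but does not publish the exact architecture. The sketch below shows one plausible way to wire these components together in TensorFlow, the framework the paper uses. All layer sizes, the character-level reading of "human-like input", and the gate formulation are our assumptions, not the paper's specification.

```python
import tensorflow as tf

# All sizes below are illustrative assumptions; the paper does not
# publish its exact configuration.
CHAR_VOCAB = 256     # character vocabulary (byte-level, assumed)
MAX_WORD_LEN = 16    # max characters per word (assumed)
CHAR_EMB = 16        # character embedding size (assumed)
CNN_FILTERS = 512    # CNN word-encoder filters (assumed)
LSTM_UNITS = 1024    # LSTM hidden size (assumed)
PROJ_UNITS = 512     # recurrent-projection size (assumed)
WORD_VOCAB = 50000   # truncated output vocabulary, to keep the sketch small


def build_gated_clstm(num_layers: int = 1) -> tf.keras.Model:
    """One plausible Gated CLSTM wiring: char-CNN word encoder ->
    fusion gate -> stacked LSTMs with projection -> word softmax."""
    # Each word in the sequence arrives as a fixed-length row of
    # character IDs, so the model "reads" words the way humans do.
    chars = tf.keras.Input(shape=(None, MAX_WORD_LEN), dtype="int32")

    # 1) Character embeddings: (batch, time, word_len) -> (..., CHAR_EMB).
    emb = tf.keras.layers.Embedding(CHAR_VOCAB, CHAR_EMB)(chars)

    # 2) CNN word encoder: convolve over characters, then max-pool,
    #    yielding one fixed-size vector per word position.
    conv = tf.keras.layers.TimeDistributed(
        tf.keras.layers.Conv1D(CNN_FILTERS, 5, padding="same",
                               activation="tanh"))(emb)
    word_vec = tf.keras.layers.TimeDistributed(
        tf.keras.layers.GlobalMaxPooling1D())(conv)

    # 3) Fusion gate (our reading of the paper's gate mechanism): a
    #    sigmoid gate scales the CNN features element-wise before the
    #    recurrent layers.
    gate = tf.keras.layers.Dense(CNN_FILTERS, activation="sigmoid")(word_vec)
    gated = tf.keras.layers.Multiply()([gate, word_vec])

    # 4) Stacked LSTMs; the recurrent projection layer is approximated
    #    here by a Dense projection after each LSTM.
    h = gated
    for _ in range(num_layers):
        h = tf.keras.layers.LSTM(LSTM_UNITS, return_sequences=True)(h)
        h = tf.keras.layers.Dense(PROJ_UNITS)(h)

    # 5) Next-word logits (replaced by negative sampling during
    #    training; see the following sketch).
    logits = tf.keras.layers.Dense(WORD_VOCAB)(h)
    return tf.keras.Model(chars, logits)


model = build_gated_clstm(num_layers=3)  # the paper's deeper variant
model.summary()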
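The paper reports training with negative sampling and evaluating by perplexity. The sketch below illustrates the standard pattern: a sampled-softmax loss for training (via tf.nn.sampled_softmax_loss, a sampled variant of the full softmax in the same family as negative sampling) and a full softmax for computing perplexity at evaluation time. The vocabulary size and number of negative samples are hypothetical.

```python
import tensorflow as tf

VOCAB = 50000       # truncated vocabulary (assumption, for the sketch)
PROJ = 512          # projection/output dimension (assumption)
NUM_SAMPLED = 8192  # negative samples per step (assumption)

# Output embedding shared by the sampled training loss and the full
# softmax used at evaluation time.
softmax_w = tf.Variable(tf.random.normal([VOCAB, PROJ], stddev=0.05))
softmax_b = tf.Variable(tf.zeros([VOCAB]))


def sampled_training_loss(hidden, targets):
    """Sampled-softmax loss: score the true next word against
    NUM_SAMPLED randomly drawn negatives instead of normalizing over
    the full vocabulary, which makes large-vocabulary training cheap."""
    return tf.reduce_mean(tf.nn.sampled_softmax_loss(
        weights=softmax_w, biases=softmax_b,
        labels=tf.reshape(targets, [-1, 1]),
        inputs=hidden, num_sampled=NUM_SAMPLED, num_classes=VOCAB))


def perplexity(hidden, targets):
    """Evaluation uses the full softmax: perplexity = exp(mean NLL),
    the metric behind the reported 42.1 and 33.1 figures."""
    logits = tf.matmul(hidden, softmax_w, transpose_b=True) + softmax_b
    nll = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=targets, logits=logits)
    return tf.exp(tf.reduce_mean(nll))


# Toy usage with random data standing in for LSTM outputs.
h = tf.random.normal([32, PROJ])
y = tf.random.uniform([32], maxval=VOCAB, dtype=tf.int64)
print(float(sampled_training_loss(h, y)), float(perplexity(h, y)))
```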
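The reported results come from a single machine with four GPUs. The paper does not describe its parallelization code; as a hedged illustration only, single-machine data parallelism in current TensorFlow is typically expressed with tf.distribute.MirroredStrategy, which may differ from the authors' original setup.

```python
import tensorflow as tf

# Single-machine multi-GPU data parallelism. MirroredStrategy is a
# modern TensorFlow idiom and a stand-in here, not the paper's code.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # A tiny stand-in model; in practice this would be the Gated CLSTM
    # built in the earlier sketch.
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(50000, 512),
        tf.keras.layers.LSTM(1024, return_sequences=True),
        tf.keras.layers.Dense(50000),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(1e-3),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(
            from_logits=True))
# A subsequent model.fit(...) shards each batch across the GPUs.
```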
