Integration of Automatic Sentence Segmentation and Lexical Analysis of Ancient Chinese based on BiLSTM-CRF Model

Abstract

The basic tasks of ancient Chinese information processing include automatic sentence segmentation, word segmentation, part-of-speech tagging, and named entity recognition. Because many ancient books are unpunctuated, tasks such as lexical analysis must build on sentence segmentation. However, processing the text step by step tends to propagate errors from one level to the next. This paper designs and implements an integrated annotation system for sentence segmentation and lexical analysis. A BiLSTM-CRF neural network model is used to evaluate generalization ability and the performance of sentence segmentation and lexical analysis at different label levels on four cross-era test sets. The results show that the integrated method improves the F1-scores of sentence segmentation, word segmentation, and part-of-speech tagging for ancient Chinese. Across the test sets, the F1-score of sentence segmentation reached 78.95%, an average increase of 3.5%; the F1-score of word segmentation reached 85.73%, an average increase of 0.18%; and the F1-score of part-of-speech tagging reached 72.65%, an average increase of 0.35%.
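To make the idea of integrated annotation concrete, the sketch below folds word boundaries, part-of-speech tags, and sentence breaks into a single label per character, which is the kind of sequence a tagger such as BiLSTM-CRF could be trained to predict. The abstract does not give the paper's actual label format, so the `position-POS-flag` scheme, the function names `encode` and `decode`, and the POS tags in the sample are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical joint label format (an assumption, not the paper's scheme):
#   "<position>-<POS>-<flag>"
#   position: B/M/E/S = begin, middle, end, single-character word
#   POS:      tag of the word the character belongs to (no hyphens allowed)
#   flag:     "L" if a sentence break follows this character, otherwise "O"

Word = Tuple[str, str]  # (surface form, POS tag)


def encode(sentences: List[List[Word]]) -> Tuple[str, List[str]]:
    """Flatten segmented, tagged sentences into raw text plus one label per character."""
    chars: List[str] = []
    labels: List[str] = []
    for sentence in sentences:
        for w_idx, (word, pos) in enumerate(sentence):
            is_last_word = w_idx == len(sentence) - 1
            for c_idx, ch in enumerate(word):
                is_last_char = c_idx == len(word) - 1
                if len(word) == 1:
                    position = "S"
                elif c_idx == 0:
                    position = "B"
                elif is_last_char:
                    position = "E"
                else:
                    position = "M"
                flag = "L" if (is_last_word and is_last_char) else "O"
                chars.append(ch)
                labels.append(f"{position}-{pos}-{flag}")
    return "".join(chars), labels


def decode(text: str, labels: List[str]) -> List[List[Word]]:
    """Rebuild sentences, words, and POS tags from per-character joint labels."""
    sentences: List[List[Word]] = []
    current: List[Word] = []
    buffer = ""
    for ch, label in zip(text, labels):
        position, pos, flag = label.split("-")
        buffer += ch
        if position in ("E", "S"):  # a word ends on this character
            current.append((buffer, pos))
            buffer = ""
        if flag == "L":  # a sentence ends on this character
            sentences.append(current)
            current = []
    if current:  # keep a trailing sentence that has no closing break
        sentences.append(current)
    return sentences


# Illustrative sample; the POS tags are placeholders, not gold annotations.
sample = [
    [("孔子", "nr"), ("曰", "v")],
    [("學", "v"), ("而", "c"), ("時", "d"), ("習", "v"), ("之", "r")],
]
text, labels = encode(sample)
restored = decode(text, labels)
```

Because one label carries all three levels of annotation, a single decoding pass recovers sentence breaks, word boundaries, and POS tags together, rather than passing the output of one step to the next.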

Bibliographic details

Source
Workshop on Language Technologies for Historical and Ancient Languages | 2020 | pp. 52-58 | 7 pages
Authors
CHENG Ning; LI Bin; XIAO Liming; XU Changwei; GE Sijia; HAO Xingyue; FENG Minxuan
Original format: PDF
Keywords
sentence segmentation of ancient Chinese; word segmentation; part-of-speech tagging; BiLSTM-CRF; ancient Chinese information processing
