Venue: Workshop on Language Technologies for Historical and Ancient Languages

Integration of Automatic Sentence Segmentation and Lexical Analysis of Ancient Chinese based on BiLSTM-CRF Model



Abstract

The basic tasks of ancient Chinese information processing include automatic sentence segmentation, word segmentation, part-of-speech tagging, and named entity recognition. Because many ancient books are unpunctuated, tasks such as lexical analysis must build on sentence segmentation. However, step-by-step processing tends to propagate errors from one level to the next. This paper designs and implements an integrated annotation system for sentence segmentation and lexical analysis. A BiLSTM-CRF neural network model is used to evaluate generalization ability and the effect of sentence segmentation and lexical analysis at different label levels on four cross-age test sets. The results show that the integrated method improves the F1-score of sentence segmentation, word segmentation, and part-of-speech tagging for ancient Chinese. Based on the experimental results of each test set, the F1-score of sentence segmentation reached 78.95%, with an average increase of 3.5%; the F1-score of word segmentation reached 85.73%, with an average increase of 0.18%; and the F1-score of part-of-speech tagging reached 72.65%, with an average increase of 0.35%.
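To make the idea of integrated annotation concrete, the sketch below shows one way a single per-character label could carry word-boundary, part-of-speech, and sentence-boundary information at once, so that one sequence labeler (such as a BiLSTM-CRF) predicts all three together. The tag layout `<B|I|E|S>-<POS>-<L|O>`, the function names `encode_joint` and `decode_joint`, and the POS tags in the example are assumptions made for illustration; the abstract does not specify the label scheme actually used in the paper.

```python
from typing import List, Tuple

Word = Tuple[str, str]          # (surface form, POS tag)
Sentence = List[Word]


def encode_joint(sentences: List[Sentence]) -> Tuple[List[str], List[str]]:
    """Flatten segmented, POS-tagged sentences into characters and joint tags.

    Hypothetical tag layout: "<B|I|E|S>-<POS>-<L|O>", where B/I/E/S marks the
    character's position in its word and L marks the last character of a
    sentence. POS tags must not contain "-".
    """
    chars: List[str] = []
    tags: List[str] = []
    for sent in sentences:
        for w_idx, (word, pos) in enumerate(sent):
            for c_idx, ch in enumerate(word):
                if len(word) == 1:
                    boundary = "S"
                elif c_idx == 0:
                    boundary = "B"
                elif c_idx == len(word) - 1:
                    boundary = "E"
                else:
                    boundary = "I"
                ends_sentence = (w_idx == len(sent) - 1
                                 and c_idx == len(word) - 1)
                chars.append(ch)
                tags.append(f"{boundary}-{pos}-{'L' if ends_sentence else 'O'}")
    return chars, tags


def decode_joint(chars: List[str], tags: List[str]) -> List[Sentence]:
    """Recover sentences, words, and POS tags from one joint tag sequence."""
    sentences: List[Sentence] = []
    current: Sentence = []
    word = ""
    for ch, tag in zip(chars, tags):
        boundary, pos, flag = tag.split("-")
        word += ch
        # Close the word at a word end, or at a sentence end even if the
        # predicted word boundary disagrees.
        if boundary in ("E", "S") or flag == "L":
            current.append((word, pos))
            word = ""
        if flag == "L":
            sentences.append(current)
            current = []
    if current:                 # trailing words without a sentence-end tag
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    # Illustrative data; the POS tags here are made up for the example.
    data = [[("君子", "n"), ("不", "d"), ("器", "v")],
            [("學", "v"), ("而", "c")]]
    chars, tags = encode_joint(data)
    for ch, tag in zip(chars, tags):
        print(ch, tag)
```

With labels of this kind, an unpunctuated character stream goes in and a single decoding pass yields sentences, words, and POS tags, which avoids feeding the output of one stage into the next as a pipeline would.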
