Conference: Asian Conference on Intelligent Information and Database Systems

Improving Efficiency of Sentence Boundary Detection by Feature Selection


Abstract

The goal of sentence boundary detection (SBD) is to predict the presence or absence of a sentence boundary in an unstructured word sequence that contains no punctuation. In this paper, we propose a feature selection approach to obtain more effective features for the SBD classifier. Specifically, each observed word is assessed for its correlation with the sentence boundary, based on pointwise mutual information, before being used as a classifier feature. Using a linear-chain CRF model to predict sentence boundaries in a text sequence, experiments on a part of the English Gigaword 2nd Edition corpus show that the proposed method reduces the number of model parameters by up to 44.87% while maintaining an F1-score comparable to that of the original model.
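The abstract does not spell out the selection rule, so the following is only a minimal sketch of how word features might be filtered by pointwise mutual information, PMI(w, b) = log2( p(w, b) / (p(w) p(b)) ), where b is the event that a sentence boundary follows the word. The label scheme ("B" for a word followed by a boundary, "O" otherwise), the function names `pmi_scores` and `select_features`, and the threshold are all assumptions made for illustration, not details taken from the paper.

```python
import math
from collections import Counter


def pmi_scores(tokens, labels, boundary_label="B"):
    """Estimate PMI between each word and the boundary label from raw counts.

    tokens and labels are parallel lists; a label equal to boundary_label
    means a sentence boundary follows that token (an assumed convention).
    """
    n = len(tokens)
    word_counts = Counter(tokens)
    boundary_count = sum(1 for y in labels if y == boundary_label)
    joint_counts = Counter(
        w for w, y in zip(tokens, labels) if y == boundary_label
    )

    scores = {}
    for word, c_w in word_counts.items():
        c_wb = joint_counts.get(word, 0)
        if c_wb == 0:
            # The word never precedes a boundary: PMI is minus infinity.
            scores[word] = float("-inf")
        else:
            # p(w, b) / (p(w) * p(b)) simplifies to c_wb * n / (c_w * c_b).
            scores[word] = math.log2((c_wb * n) / (c_w * boundary_count))
    return scores


def select_features(scores, threshold):
    """Keep only words whose PMI with the boundary exceeds the threshold."""
    return {word for word, s in scores.items() if s > threshold}


# Toy data: three unpunctuated "sentences" of three words each.
tokens = ["see", "you", "soon", "call", "me", "soon", "you", "know", "me"]
labels = ["O", "O", "B", "O", "O", "B", "O", "O", "B"]

scores = pmi_scores(tokens, labels)
kept = select_features(scores, threshold=1.0)  # hypothetical threshold
```

In a linear-chain CRF, each retained word typically contributes its own weights, so dropping low-PMI words from the feature templates is one plausible way such a filter could shrink the parameter count; the exact mechanism used in the paper is not stated here.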
