Conference: IEEE China Summit International Conference on Signal and Information Processing

Sentence boundary detection in Chinese broadcast news using conditional random fields and prosodic features

Abstract

This paper studies the use of conditional random fields (CRFs) and prosodic features for sentence boundary detection in Chinese broadcast news. Previous approaches mostly use first-order CRFs and ignore important contextual and sequential information. We explore high-order CRF models to make fuller use of this information, and we show the effectiveness of CRFs for sentence boundary detection by comparing them with various competitive models. In previous approaches, the prosodic feature set is usually designed to be as exhaustive as possible. As a result, features may be highly correlated, and some may not be effective. We therefore use a correlation-based feature selection method to select a subset of the most useful features. Finally, the use of prosodic features such as pitch in Chinese sentence segmentation deserves further investigation, because the tonal nature of Chinese may complicate the expression of pitch features. We study the effectiveness of the prosodic features and rank their importance by analyzing feature usage.
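
The abstract does not say which correlation-based selection method is used. As a rough illustration only, the sketch below assumes a merit score of the commonly used CFS form, k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean absolute feature-label correlation and r_ff is the mean absolute feature-feature correlation, combined with greedy forward search. The feature names (`pause_duration`, `pitch_reset`, `pause_duration_noisy`), the toy values, and the function names are all hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(features, labels, subset):
    """Merit of a feature subset: k*r_cf / sqrt(k + k*(k-1)*r_ff)."""
    k = len(subset)
    # Mean absolute correlation between each chosen feature and the labels.
    r_cf = sum(abs(pearson(features[f], labels)) for f in subset) / k
    # Mean absolute correlation between each pair of chosen features.
    pairs = list(combinations(subset, 2))
    r_ff = 0.0
    if pairs:
        r_ff = sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_cfs(features, labels):
    """Greedy forward search: add the best feature while the merit strictly improves."""
    selected, best = [], 0.0
    remaining = list(features)
    while remaining:
        merit, name = max((cfs_merit(features, labels, selected + [f]), f) for f in remaining)
        if merit <= best:
            break
        selected.append(name)
        remaining.remove(name)
        best = merit
    return selected, best


# Toy data (hypothetical): 1 marks a sentence boundary after the word, 0 marks none.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "pause_duration": [1, 1, 1, 3, 3, 3, 2, 2],
    "pitch_reset": [0, 0, 1, 0, 0, 1, 1, 1],
    # Strongly correlated with pause_duration, so it adds little new information.
    "pause_duration_noisy": [1, 1, 1, 3, 3, 3, 1, 3],
}

selected, merit = greedy_cfs(features, labels)
print(selected, round(merit, 4))
```

On this toy data the search keeps `pause_duration` and `pitch_reset` and stops before adding `pause_duration_noisy`: the redundant feature raises the average feature-feature correlation more than it raises the average feature-label correlation, so the merit drops. This is the behavior the abstract motivates, although the paper's actual method and feature set may differ.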