Effective semi-supervised learning strategies for automatic sentence segmentation

Dalva Dogan; Guz Umit; Gurkan Hakan

Abstract

The primary objective of the sentence segmentation process is to determine the sentence boundaries in a stream of words output by automatic speech recognizers. Statistical methods developed for sentence segmentation require a significant amount of labeled data, which is time-consuming, labor-intensive and expensive to produce. In this work, we propose new multi-view semi-supervised learning strategies for the sentence boundary classification problem using lexical, prosodic, and morphological information. The aim is to find effective semi-supervised machine learning strategies when only small sets of sentence-boundary-labeled data are available. We primarily investigate two semi-supervised learning approaches: self-training and co-training. For co-training, we also use different example selection strategies, namely agreement, disagreement and self-combined. Furthermore, we propose three-view and committee-based algorithms that incorporate the agreement, disagreement and self-combined strategies using three disjoint feature sets. We present comparative results of the different learning strategies on the sentence segmentation task. The experimental results show that the proposed multi-view learning strategies can substantially improve sentence segmentation performance, since the data sets can be represented by three redundantly sufficient and disjoint feature sets. When only a small set of manually labeled data is available, the proposed strategies improve the average baseline F-measure from 67.66% to 75.15% for Turkish and from 64.84% to 66.32% for English spoken language. (c) 2017 Elsevier B.V. All rights reserved.
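The abstract names co-training with an agreement-based example selection strategy, and a three-view extension, without giving algorithmic detail. As a rough illustration only, the Python sketch below shows one common form of agreement-based multi-view self-labeling: a classifier is trained on each disjoint feature view, and an unlabeled example joins the labeled set only when all views predict the same label with high confidence. This is not the authors' implementation. The nearest-centroid classifier (a hypothetical stand-in for the boosting classifiers suggested by the keywords), the confidence threshold, the number of rounds, the function names, and the toy feature values are all assumptions; the disagreement and self-combined strategies and the committee-based algorithm are not modeled.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Example = Tuple[Vector, ...]  # one feature vector per view


class CentroidClassifier:
    """Toy binary nearest-centroid classifier (hypothetical stand-in for boosting)."""

    def fit(self, xs: List[Vector], ys: List[int]) -> "CentroidClassifier":
        self.centroids = {}
        for label in (0, 1):
            members = [x for x, y in zip(xs, ys) if y == label]
            # Assumes each class has at least one labeled example.
            self.centroids[label] = tuple(sum(col) / len(members) for col in zip(*members))
        return self

    def predict(self, x: Vector) -> Tuple[int, float]:
        """Return (label, confidence) from a softmax over negative distances."""
        d0 = math.dist(x, self.centroids[0])
        d1 = math.dist(x, self.centroids[1])
        label = 0 if d0 <= d1 else 1
        confidence = 1.0 / (1.0 + math.exp(-abs(d0 - d1)))  # always >= 0.5
        return label, confidence


def train_views(labeled: List[Tuple[Example, int]]) -> List[CentroidClassifier]:
    """Fit one classifier per view on the current labeled set."""
    ys = [y for _, y in labeled]
    n_views = len(labeled[0][0])
    return [CentroidClassifier().fit([ex[v] for ex, _ in labeled], ys)
            for v in range(n_views)]


def co_train_agreement(labeled, unlabeled, rounds=5, threshold=0.8):
    """Grow the labeled set with examples on which all view classifiers agree.

    An unlabeled example is moved into the labeled set only when every view
    predicts the same label with confidence >= threshold (assumed rule).
    Returns (final classifiers, augmented labeled set, still-unlabeled pool).
    """
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        clfs = train_views(labeled)
        remaining = []
        for ex in pool:
            preds = [clf.predict(view) for clf, view in zip(clfs, ex)]
            labels = {lab for lab, _ in preds}
            if len(labels) == 1 and min(conf for _, conf in preds) >= threshold:
                labeled.append((ex, labels.pop()))
            else:
                remaining.append(ex)
        if len(remaining) == len(pool):  # nothing added this round: stop early
            break
        pool = remaining
    return train_views(labeled), labeled, pool


if __name__ == "__main__":
    # Hypothetical toy data: two one-dimensional views per candidate boundary
    # (think "lexical score" and "prosodic score"); label 1 = sentence boundary.
    seed = [(((0.0,), (0.1,)), 0), (((1.0,), (0.9,)), 1)]
    unlabeled = [((0.05,), (0.0,)), ((0.95,), (1.0,)), ((0.5,), (0.2,))]
    clfs, grown, left = co_train_agreement(seed, unlabeled, threshold=0.6)
    print(len(grown), len(left))  # 4 1
```

With these toy values, the two examples on which both views agree confidently are absorbed in the first round, while the third example, equidistant from both class centroids in the first view, stays unlabeled. Because the agreement check loops over however many views an example carries, the same function also accepts three-view examples, which loosely mirrors the three-view idea in the abstract.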

Bibliographic details

Source: Pattern Recognition Letters, 2018, Issue 1, pp. 76-86 (11 pages)
Authors: Dalva Dogan; Guz Umit; Gurkan Hakan
Affiliation (all three authors): FMV Işık University, Faculty of Engineering, Department of Electrical and Electronics Engineering, Istanbul, Turkey
Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Keywords: Machine learning; Multi-view semi-supervised learning; Co-training; Sentence segmentation; Boosting

Similar articles

1. Automatic segmentation of optic disc in retinal fundus images using semi-supervised deep learning [J]. Shaleen Bengani, Angel Arul Jothi J., Vadivel S. Multimedia Tools and Applications, 2021, No. 3.

2. Semi-supervised learning for automatic segmentation of the knee from MRI with convolutional neural networks [J]. Computer Methods and Programs in Biomedicine: An International Journal Devoted to the Development, Implementation and Exchange of Computing Methodology and Software Systems in Biomedical Research and Medical Practice, 2020.

3. Automatic Knee Cartilage and Menisci Segmentation from 3D-DESS MRI Using Deep Semi-Supervised Learning [J]. Panfilov E., Tiulpin A., Juntunen M. Osteoarthritis and Cartilage, 2019, Suppl. 1.

4. Extension of Conventional Co-Training Learning Strategies to Three-View and Committee-Based Learning Strategies for Effective Automatic Sentence Segmentation [C]. Dogan Dalva, Umit Guz, Hakan Gurkan. 2018 IEEE Spoken Language Technology Workshop, 2018.

5. Automatic Design of Prosodic Features for Sentence Segmentation [D]. Fung, James G., 2011.

6. Combining active learning and semi-supervised learning techniques to extract protein interaction sentences [O]. Min Song, Hwanjo Yu, Wook-Shin Han, 2011.

7. Semi-supervised Thai Sentence Segmentation Using Local and Distant Word Representations [O]. Chanatip Saetia, Tawunrat Chalothorn, Ekapol Chuangsuwanich, 2020.
