9th International Conference on Language Resources and Evaluation

Bridging the gap between speech technology and natural language processing: an evaluation toolbox for term discovery systems

Abstract

The unsupervised discovery of linguistic terms, from either continuous phoneme transcriptions or raw speech, has attracted increasing interest in recent years from both a theoretical and a practical standpoint. Yet there exists no commonly accepted evaluation method for systems performing term discovery. Here, we propose such an evaluation toolbox, drawing ideas from both speech technology and natural language processing. We first transform the speech-based output into a symbolic representation and compute five types of evaluation metrics on this representation: the quality of acoustic matching, the quality of the clusters found, and the quality of the alignment with real words (type, token, and boundary scores). We tested our approach on two term discovery systems taking speech as input, and one using symbolic input. The latter was run using both the gold transcription and a transcription obtained from an automatic speech recognizer, in order to simulate the case when only imperfect symbolic information is available. The results obtained are analysed using the proposed evaluation metrics, and the implications of these metrics are discussed.
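The type, token, and boundary scores named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so the version below assumes the usual precision, recall, and F-score over sets. The `(start, end, label)` span format, the helper names `prf` and `boundaries`, and the toy data are all hypothetical and are not taken from the toolbox itself.

```python
from typing import Iterable, Tuple


def prf(found: Iterable, gold: Iterable) -> Tuple[float, float, float]:
    """Precision, recall and F-score of a set of found items against gold items."""
    found_set, gold_set = set(found), set(gold)
    hits = len(found_set & gold_set)
    precision = hits / len(found_set) if found_set else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score


def boundaries(tokens):
    """Set of start and end positions of (start, end, label) spans."""
    return {pos for start, end, _ in tokens for pos in (start, end)}


# Hypothetical toy data: spans over a symbolic (phoneme-like) string.
gold_tokens = [(0, 3, "the"), (3, 6, "dog"), (6, 10, "barks")]
found_tokens = [(0, 3, "the"), (3, 10, "dogbarks")]

# Token score: a found span counts only if it matches a gold span exactly.
token_scores = prf(found_tokens, gold_tokens)
# Type score: compare the sets of distinct labels (the discovered lexicon).
type_scores = prf({t[2] for t in found_tokens}, {t[2] for t in gold_tokens})
# Boundary score: compare the sets of span edges, ignoring labels.
boundary_scores = prf(boundaries(found_tokens), boundaries(gold_tokens))

print("token   ", token_scores)
print("type    ", type_scores)
print("boundary", boundary_scores)
```

In this toy case the under-segmented span "dogbarks" lowers the token and type scores while boundary precision stays perfect, which suggests why the three alignment scores are reported separately.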