Conference: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference

WEB Page Collection Using Automatic Document Segmentation for Spoken Document Retrieval


Abstract

In spoken document retrieval, the main factor affecting retrieval performance is speech recognition errors. Improving speech recognition technology can reduce these errors. However, if a query contains out-of-vocabulary words, the spoken documents related to the query cannot be retrieved. This paper describes spoken document retrieval using document expansion based on Web pages whose content is similar to the spoken documents being retrieved. Most spoken documents cover several topics. Therefore, each spoken document is automatically divided into segments by topic, and Web pages similar to the spoken document are then collected using a query derived from each segment. Document expansion using the Web improved spoken document retrieval performance from 0.364 to 0.401 on the interpolated 11-point average precision metric.
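The abstract reports its results on the interpolated 11-point average precision metric. As a minimal sketch of how that standard metric is usually computed for a single query, the Python function below may help; the function name `interpolated_11pt_ap` and its input format (a rank-ordered list of relevance flags plus the total number of relevant documents) are assumptions for illustration, not the authors' evaluation code.

```python
def interpolated_11pt_ap(ranked_relevance, total_relevant):
    """Interpolated 11-point average precision for one query.

    ranked_relevance: list of bools in rank order (True = relevant document).
    total_relevant:   number of relevant documents that exist for the query.
    Both the name and the input format are hypothetical, chosen for this sketch.
    """
    if total_relevant == 0:
        return 0.0

    # Record (recall, precision) at each rank where a relevant document appears.
    points = []
    hits = 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            points.append((hits / total_relevant, hits / rank))

    # At each recall level 0.0, 0.1, ..., 1.0 take the maximum precision
    # observed at any recall greater than or equal to that level.
    total = 0.0
    for i in range(11):
        level = i / 10
        candidates = [p for r, p in points if r >= level - 1e-12]
        total += max(candidates, default=0.0)
    return total / 11


if __name__ == "__main__":
    # Toy ranking: relevant documents at ranks 1 and 3, two relevant in total.
    print(interpolated_11pt_ap([True, False, True, False], 2))
```

A system-level score such as those quoted in the abstract would typically be the mean of this per-query value over all test queries; whether the paper averages in exactly this way is not stated in the abstract.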
