Journal: JMIR Medical Informatics

Benchmarking Clinical Speech Recognition and Information Extraction: New Data, Methods, and Evaluations



Abstract

Background: Over a tenth of preventable adverse events in health care are caused by failures in information flow. These failures are tangible in clinical handover: even when the verbal handover is good, from two-thirds to all of the information is lost after 3-5 shifts if notes are taken by hand or not taken at all. Speech recognition and information extraction provide a way to fill out a handover form for clinical proofing and sign-off.

Objective: The objective of the study was to provide a recorded spoken handover, annotated verbatim transcriptions, and evaluations to support research in spoken and written natural language processing for filling out a clinical handover form. This dataset is based on synthetic patient profiles, which avoids ethical and legal restrictions while remaining useful for research in speech-to-text conversion and information extraction grounded in realistic clinical scenarios. We also introduce a Web app to demonstrate the system design and workflow.

Methods: We experiment with Dragon Medical 11.0 for speech recognition and CRF++ for information extraction. To compute features for information extraction, we also apply CoreNLP, MetaMap, and Ontoserver. Our evaluation uses cross-validation techniques to measure processing correctness.

Results: The data provided were a simulation of nursing handover, recorded using a mobile device and built from simulated patient records and handover scripts spoken by an Australian registered nurse. Speech recognition correctly recognized 5276 of 7277 words in our 100 test documents. We considered 50 mutually exclusive categories in information extraction and achieved an F1 (ie, the harmonic mean of Precision and Recall) of 0.86 in the category for irrelevant text and a macro-averaged F1 of 0.70 over the remaining 35 nonempty categories of the form in our 101 test documents.

Conclusions: The significance of this study hinges on opening our data, together with the related performance benchmarks and some processing software, to the research and development community for studying clinical documentation and language processing. The data are used in the CLEFeHealth 2015 evaluation laboratory for a shared task on speech recognition.
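
As a reading aid for the reported figures, the following minimal Python sketch shows how a per-category F1 (the harmonic mean of Precision and Recall) and a macro-averaged F1 can be computed, and what proportion 5276 of 7277 words corresponds to. The function names, the category names, and the true-positive, false-positive, and false-negative counts are hypothetical and chosen only for illustration; they are not taken from the study, and the authors' own evaluation scripts may differ.

```python
from typing import Dict, Tuple


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Unweighted mean of per-category F1, so every category counts equally."""
    scores = [f1_score(tp, fp, fn) for tp, fp, fn in counts.values()]
    return sum(scores) / len(scores)


# Proportion of correctly recognized words, using the counts in the abstract.
word_rate = 5276 / 7277

# Hypothetical (tp, fp, fn) counts for two made-up form categories.
toy_counts = {
    "CategoryA": (8, 2, 2),  # precision 0.8, recall 0.8
    "CategoryB": (1, 1, 1),  # precision 0.5, recall 0.5
}

if __name__ == "__main__":
    print(f"Correctly recognized words: {word_rate:.1%}")
    for name, (tp, fp, fn) in toy_counts.items():
        print(f"{name}: F1 = {f1_score(tp, fp, fn):.2f}")
    print(f"Macro-averaged F1 = {macro_f1(toy_counts):.2f}")
```

Because the macro average weights each category equally regardless of how many words it covers, rare categories influence the score as much as frequent ones, which is one reason a macro-averaged figure can sit well below the score of a single large category such as irrelevant text.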
