Journal: BMC Medical Informatics and Decision Making

Automated extraction of Biomarker information from pathology reports

Abstract

Pathology reports are written in free-text form, which precludes efficient data gathering. We aimed to overcome this limitation and design an automated system for extracting biomarker profiles from accumulated pathology reports. We designed a new data model for representing biomarker knowledge. The automated system parses immunohistochemistry reports based on a “slide paragraph” unit, defined as a set of immunohistochemistry findings obtained for the same tissue slide. Pathology reports are parsed using a context-free grammar for immunohistochemistry and a tree-like structure for surgical pathology. The performance of the approach was validated on manually annotated pathology reports of 100 randomly selected patients managed at Seoul National University Hospital. High F-scores were obtained for parsing biomarker names and corresponding test results (0.999 and 0.998, respectively) from the immunohistochemistry reports, compared to relatively poor performance for parsing surgical pathology findings. However, applying the proposed approach to our single-center dataset revealed information on 221 unique biomarkers, a richer result than biomarker profiles derived from the published literature. Owing to the data representation model, the proposed approach can associate biomarker profiles extracted from an immunohistochemistry report with the corresponding pathology findings listed in one or more surgical pathology reports. Term variations are resolved by normalization to the corresponding preferred terms, determined by expanded dictionary look-up and text similarity-based search. Our proposed approach for biomarker data extraction addresses key limitations regarding data representation and can handle reports prepared in the clinical setting, which often contain incomplete sentences, typographical errors, and inconsistent formatting.
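
The term normalization step described in the abstract (dictionary look-up followed by a text similarity-based search) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say which similarity measure or dictionary was used, so the `SYNONYMS` table, the `normalize_biomarker` function, and the `cutoff` value are all hypothetical, and the standard-library `difflib` ratio stands in for whatever measure the system actually applies.

```python
from difflib import get_close_matches

# Hypothetical synonym dictionary (assumption, not from the paper):
# lowercase variant -> preferred term.
SYNONYMS = {
    "er": "Estrogen receptor",
    "estrogen receptor": "Estrogen receptor",
    "pr": "Progesterone receptor",
    "progesterone receptor": "Progesterone receptor",
    "c-erbb2": "HER2",
    "her2": "HER2",
    "ki-67": "Ki-67",
}


def normalize_biomarker(raw, cutoff=0.8):
    """Map a raw biomarker string to a preferred term, or None if no match."""
    # Lowercase and collapse whitespace before looking anything up.
    key = " ".join(raw.lower().split())

    # Step 1: exact dictionary look-up.
    if key in SYNONYMS:
        return SYNONYMS[key]

    # Step 2: similarity-based fallback for typos and spelling variants.
    # difflib's ratio is an assumed stand-in for the paper's measure.
    matches = get_close_matches(key, list(SYNONYMS), n=1, cutoff=cutoff)
    return SYNONYMS[matches[0]] if matches else None


print(normalize_biomarker("ER"))
print(normalize_biomarker("Estrogen  recepter"))
print(normalize_biomarker("unknown marker"))
```

Under these assumptions, an exact variant such as "ER" resolves through the dictionary, while a misspelling such as "Estrogen recepter" resolves through the similarity fallback. Very short abbreviations are prone to false matches under any similarity threshold, so a real system would likely need a stricter rule for them.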