Journal: BMC Medical Informatics and Decision Making

An ontology-guided semantic data integration framework to support integrative data analysis of cancer survival



Abstract

Cancer is the second leading cause of death in the United States, exceeded only by heart disease. Extant cancer survival analyses have primarily focused on individual-level factors because of the limited data available from any single source. There is a need to integrate data from different sources so that as many risk factors as possible can be studied simultaneously. We therefore proposed an ontology-based approach to integrating heterogeneous datasets that addresses key data integration challenges. Following best practices in ontology engineering, we created the Ontology for Cancer Research Variables (OCRV) by adapting existing semantic resources such as the National Cancer Institute (NCI) Thesaurus. Using the global-as-view data integration approach, we created mapping axioms to link the data elements in different sources to OCRV. Building on the Ontop platform, we implemented a data integration pipeline that uses semantic queries to query, extract, and transform data in relational databases into a pooled dataset, according to the needs of the downstream multi-level Integrative Data Analysis (IDA). Based on our use cases in the cancer survival IDA, we created tailored ontological structures in OCRV to facilitate the data integration tasks. Specifically, we created a flexible framework that addresses key integration challenges by: (1) using a shared, controlled vocabulary to make data understandable to both humans and computers, (2) explicitly modeling semantic relationships so that the data can be computed and reasoned over, (3) linking patients to contextual and environmental factors through geographic variables, and (4) documenting the data manipulation and integration processes clearly in the ontologies. An ontology-based data integration approach not only standardizes the definitions of data variables through a common, controlled vocabulary, but also makes the semantic relationships among variables from different sources explicit and clear to all users of the same datasets. Such an approach resolves ambiguity in the variable selection, extraction, and integration processes and thus improves the reproducibility of the IDA.
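
The global-as-view idea and the geographic linkage described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' pipeline and does not use Ontop or OCRV: it only assumes two hypothetical source tables (`registry_a`, `registry_b`) whose columns are mapped onto shared variable names (`patient_id`, `age_at_diagnosis`, `county_code`), plus a hypothetical county-level table (`county_context`). All table names, column names, and values are invented toy data.

```python
import sqlite3

# Hypothetical global-as-view mappings: each shared (global) variable is
# defined as a query over a source table. The variable names are
# illustrative stand-ins, not real OCRV terms.
GAV_MAPPINGS = {
    "registry_a": (
        "SELECT pid AS patient_id, dx_age AS age_at_diagnosis, "
        "county AS county_code FROM registry_a"
    ),
    "registry_b": (
        "SELECT subject_id AS patient_id, age AS age_at_diagnosis, "
        "cnty AS county_code FROM registry_b"
    ),
}


def build_demo_db():
    """Create an in-memory database with invented toy data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE registry_a (pid TEXT, dx_age INTEGER, county TEXT);
        CREATE TABLE registry_b (subject_id TEXT, age INTEGER, cnty TEXT);
        CREATE TABLE county_context (county_code TEXT, poverty_rate REAL);
        INSERT INTO registry_a VALUES ('A1', 61, 'C01'), ('A2', 47, 'C02');
        INSERT INTO registry_b VALUES ('B1', 70, 'C01');
        INSERT INTO county_context VALUES ('C01', 0.2), ('C02', 0.1);
        """
    )
    return conn


def pooled_dataset(conn):
    """Pool all mapped sources, then attach county-level context by code."""
    union_sql = " UNION ALL ".join(GAV_MAPPINGS.values())
    query = f"""
        SELECT p.patient_id, p.age_at_diagnosis, c.poverty_rate
        FROM ({union_sql}) AS p
        LEFT JOIN county_context AS c ON c.county_code = p.county_code
        ORDER BY p.patient_id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in pooled_dataset(build_demo_db()):
        print(row)
```

In this sketch the mappings play the role that mapping axioms play in the described framework: queries are written once against the shared variable names, and adding a source means adding one mapping entry rather than changing the downstream query.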
