Journal: BMC Medical Genomics

Developing a healthcare dataset information resource (DIR) based on Semantic Web



Abstract

The right dataset is essential to obtaining the right insights in data science; it is therefore important for data scientists to understand which relevant datasets are available, as well as the content, structure, and existing analyses of those datasets. While a number of efforts are underway to integrate the large number and variety of datasets, an information resource that focuses on the specific needs of the datasets' target users has been lacking for years. To address this gap, we have developed a Dataset Information Resource (DIR) using a user-oriented approach that gathers relevant dataset knowledge for specific user types. In the present version, we specifically address the challenges that entry-level data scientists face in learning to identify, understand, and analyze major healthcare datasets. We emphasize that the DIR does not contain actual data from the datasets; it aims to provide comprehensive knowledge about the datasets and their analyses. The DIR leverages Semantic Web technologies and the W3C Dataset Description Profile as the standard for knowledge integration and representation. To extract tailored knowledge for target users, we have developed methods for manual extraction from dataset documentation as well as semi-automatic extraction from related publications using natural language processing (NLP)-based approaches. A semantic query component is available for knowledge retrieval, and a parameterized question-answering function is provided to make searching easier. The DIR prototype is composed of four major components: dataset metadata and related knowledge, search modules, question answering for frequently asked questions, and blogs. The current implementation includes information on 12 commonly used large and complex healthcare datasets. An initial usage evaluation with health informatics novices indicates that the DIR is helpful and beginner-friendly.
We have developed a novel user-oriented DIR that provides dataset knowledge specialized for target user groups. Knowledge about datasets is effectively represented in the Semantic Web. At this initial stage, the DIR already provides sophisticated and relevant knowledge about 12 datasets to help entry-level health informaticians learn healthcare data analysis using suitable datasets. Further development of both content and functionality is underway.
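
To make the idea of a semantic query component with parameterized question answering more concrete, the following is a minimal sketch and not the DIR's actual implementation. It assumes dataset descriptions are stored as subject-predicate-object triples and that a frequently asked question is a fixed query template with one slot filled in by the user. The dataset identifiers, titles, and keywords are hypothetical placeholders; only the predicate names (`rdf:type`, `dct:title`, `dcat:keyword`) are taken from vocabularies commonly used for dataset descriptions. A real system would more likely use an RDF store and SPARQL; the plain-Python pattern matcher here stands in for that.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical dataset descriptions. The "ex:" identifiers, titles, and
# keywords are placeholders, not datasets described in the abstract.
TRIPLES: List[Triple] = [
    ("ex:datasetA", "rdf:type", "dcat:Dataset"),
    ("ex:datasetA", "dct:title", "Example Dataset A"),
    ("ex:datasetA", "dcat:keyword", "inpatient"),
    ("ex:datasetB", "rdf:type", "dcat:Dataset"),
    ("ex:datasetB", "dct:title", "Example Dataset B"),
    ("ex:datasetB", "dcat:keyword", "survey"),
    ("ex:datasetB", "dcat:keyword", "inpatient"),
]


def match(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples that fit the pattern; None acts as a wildcard."""
    for subj, pred, obj in triples:
        if (s is None or subj == s) and (p is None or pred == p) and (o is None or obj == o):
            yield (subj, pred, obj)


def datasets_with_keyword(triples: List[Triple], keyword: str) -> List[str]:
    """Parameterized question: 'Which datasets are tagged with <keyword>?'

    The keyword is the only user-supplied parameter; the query shape is fixed.
    Returns the matching dataset titles in alphabetical order.
    """
    titles: List[str] = []
    for subj, _, _ in match(triples, p="dcat:keyword", o=keyword):
        for _, _, title in match(triples, s=subj, p="dct:title"):
            titles.append(title)
    return sorted(titles)


if __name__ == "__main__":
    print(datasets_with_keyword(TRIPLES, "inpatient"))
```

Keeping the query shape fixed and exposing only the slot value is one way a beginner-facing interface could offer semantic search without requiring users to write queries themselves.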
