Journal of Computer-Aided Molecular Design

From data point timelines to a well curated data set, data mining of experimental data and chemical structure data from scientific articles, problems and possible solutions


Abstract

The scientific literature is an important source of experimental and chemical structure data. Very often these data have been harvested into data collections of varying size, leaving data quality and curation issues on the shoulders of users. The current research presents a systematic and reproducible workflow for collecting series of data points from the scientific literature and assembling a database that is suitable for high-quality modelling and decision support. The quality assurance aspect of the workflow is concerned with the curation of both chemical structures and associated toxicity values at (1) the level of single data points and (2) the level of collections of data points. The assembly of the database employs a novel "timeline" approach. The workflow is implemented as a software solution, and its applicability is demonstrated on the example of the Tetrahymena pyriformis acute aquatic toxicity endpoint. A literature collection of 86 primary publications for T. pyriformis was found to contain 2,072 chemical compounds and 2,498 unique toxicity values, of which 2,440 are numerical and 58 are textual. Each chemical compound was assigned a preferred toxicity value. Examples of the most common chemical and toxicological data curation scenarios are discussed.
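The abstract does not spell out how the "timeline" is built or how the preferred value is chosen, so the following Python sketch is only one possible reading, not the authors' implementation. It assumes a timeline is the list of data points reported for one compound, ordered by publication year. The `DataPoint` fields, the placeholder compounds and values, and the rule "take the most recent numerical value, otherwise the most recent textual one" are all hypothetical and chosen purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPoint:
    compound: str  # hypothetical compound identifier
    year: int      # publication year of the source article
    source: str    # hypothetical citation key
    value: str     # toxicity value exactly as reported in the article


def is_numerical(value: str) -> bool:
    """Return True if the reported value parses as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def build_timelines(points):
    """Group data points by compound and order each group by publication year."""
    grouped = defaultdict(list)
    for point in points:
        grouped[point.compound].append(point)
    return {c: sorted(ps, key=lambda p: p.year) for c, ps in grouped.items()}


def preferred_value(timeline):
    """Assumed rule (not from the paper): latest numerical value, else latest textual value."""
    numerical = [p for p in timeline if is_numerical(p.value)]
    chosen = (numerical or timeline)[-1]
    return chosen.value


# Placeholder records; these are not data from the publication.
points = [
    DataPoint("compound_A", 2001, "ref_b", "2.0"),
    DataPoint("compound_A", 1995, "ref_a", "1.0"),
    DataPoint("compound_B", 1998, "ref_c", "not determined"),
]

timelines = build_timelines(points)
preferred = {c: preferred_value(t) for c, t in timelines.items()}
```

Keeping the reported value as a string lets numerical and textual values share one timeline, which mirrors the abstract's split of the collected values into numerical and textual ones. A real curation rule would likely also weigh the source and experimental conditions rather than publication year alone.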
