Journal: Database

Improving links between literature and biological data with text mining: a case study with GEO, PDB and MEDLINE

Abstract

High-throughput experiments and bioinformatics techniques are generating a rapidly growing volume of data that biologists and researchers who need to access, analyze and process existing data find increasingly difficult to keep track of. Much of the available data are being deposited in specialized databases, such as the Gene Expression Omnibus (GEO) for microarrays or the Protein Data Bank (PDB) for protein structures and coordinates. Data sets are also being described by their authors in publications archived in literature databases such as MEDLINE and PubMed Central. Currently, the curation of links between biological databases and the literature relies mainly on manual labour, which makes it a time-consuming and daunting task. Herein, we analysed the current state of link curation between GEO, PDB and MEDLINE. We found that link curation is heterogeneous, depending on the sources and databases involved, and that overlap between sources is low: <50% for PDB and GEO. Furthermore, we showed that text-mining tools can automatically provide valuable evidence to help curators broaden the scope of articles and database entries that they review. As a result, we made recommendations to improve the coverage of curated links, as well as the consistency of information available from different databases, while maintaining high-quality curation.

Database URLs: http://www.ncbi.nlm.nih.gov/PubMed, http://www.ncbi.nlm.nih.gov/geo/, http://www.rcsb.org/pdb/
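To make concrete the kind of evidence a text-mining tool might surface for curators, the following is a minimal sketch that scans a passage of article text for GEO and PDB accession mentions. It is not the method used in the study. The function name `find_accessions`, the regular expressions and the sample sentence are all assumptions made for illustration: GEO accessions are assumed to be a GSE, GSM, GDS or GPL prefix followed by digits, and PDB identifiers are assumed to be four characters (a digit followed by three alphanumerics) that appear right after a "PDB" cue word.

```python
import re

# Assumed pattern: GEO accessions are a known prefix followed by digits.
GEO_RE = re.compile(r"\b(?:GSE|GSM|GDS|GPL)\d+\b")

# Assumed pattern: a 4-character PDB ID (digit + 3 alphanumerics) that
# follows a "PDB" cue, optionally with "ID", "code" or "entry" in between.
# Requiring the cue reduces false positives from ordinary 4-character tokens.
PDB_RE = re.compile(
    r"\bPDB(?:\s+(?:ID|code|entry))?[:\s]+([0-9][A-Za-z0-9]{3})\b",
    re.IGNORECASE,
)


def find_accessions(text):
    """Return candidate GEO and PDB accessions mentioned in `text`.

    Hypothetical helper: the results are candidates for a curator to
    review, not confirmed links.
    """
    geo = sorted(set(GEO_RE.findall(text)))
    pdb = sorted({match.upper() for match in PDB_RE.findall(text)})
    return {"GEO": geo, "PDB": pdb}


if __name__ == "__main__":
    # Invented sample sentence, not taken from any real article.
    sample = (
        "Microarray data were deposited in GEO under accession GSE12345 "
        "(platform GPL570). Coordinates were deposited in the Protein "
        "Data Bank (PDB ID: 1abc)."
    )
    print(find_accessions(sample))
```

A pattern-based pass like this only proposes candidate links; each candidate would still need to be checked against the database record before being curated.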