Rule-based deduplication of article records from bibliographic databases

Abstract

We recently designed and deployed a metasearch engine, Metta, that sends queries and retrieves search results from five leading biomedical databases: PubMed, EMBASE, CINAHL, PsycINFO and the Cochrane Central Register of Controlled Trials. Because many articles are indexed in more than one of these databases, it is desirable to deduplicate the retrieved article records. This is not a trivial problem because data fields contain a lot of missing and erroneous entries, and because certain types of information are recorded differently (and inconsistently) in the different databases. The present report describes our rule-based method for deduplicating article records across databases and includes an open-source script module that can be deployed freely. Metta was designed to satisfy the particular needs of people who are writing systematic reviews in evidence-based medicine. These users want the highest possible recall in retrieval, so it is important to err on the side of not deduplicating any records that refer to distinct articles, and it is important to perform deduplication online in real time. Our deduplication module is designed with these constraints in mind. Articles that share the same publication year are compared sequentially on parameters including PubMed ID number, digital object identifier, journal name, article title and author list, using text approximation techniques. In a review of Metta searches carried out by public users, we found that the deduplication module was more effective at identifying duplicates than EndNote without making any erroneous assignments.
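The sequential comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' published module: the `Record` fields, the similarity thresholds (0.8 and 0.95), the use of `difflib.SequenceMatcher` as the "text approximation" step, and the decision to treat mismatched identifiers as decisive are all assumptions chosen for illustration. The sketch follows the stated design goal of erring on the side of keeping records when evidence is missing or ambiguous.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


@dataclass
class Record:
    """Hypothetical article record; field names are illustrative only."""
    year: Optional[int]
    title: str
    journal: str = ""
    authors: Tuple[str, ...] = ()
    pmid: Optional[str] = None
    doi: Optional[str] = None


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation/whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def similar(a: str, b: str, threshold: float) -> bool:
    """Approximate string match; empty fields never count as a match."""
    a, b = normalize(a), normalize(b)
    if not a or not b:
        return False
    return SequenceMatcher(None, a, b).ratio() >= threshold


def is_duplicate(a: Record, b: Record) -> bool:
    """Compare two records field by field, in a fixed order."""
    # Only records sharing a known publication year are compared.
    if a.year is None or a.year != b.year:
        return False
    # Assumption: when both records carry a PubMed ID, it decides the outcome.
    if a.pmid and b.pmid:
        return a.pmid == b.pmid
    # Assumption: likewise for DOIs (compared case-insensitively).
    if a.doi and b.doi:
        return a.doi.lower() == b.doi.lower()
    # Otherwise require journal, title, and author list to all agree.
    # Thresholds are illustrative, not taken from the paper.
    return (
        similar(a.journal, b.journal, 0.8)
        and similar(a.title, b.title, 0.95)
        and similar("; ".join(a.authors), "; ".join(b.authors), 0.8)
    )


def deduplicate(records: List[Record]) -> List[Record]:
    """Keep the first record of each duplicate group, preserving order."""
    kept: List[Record] = []
    for rec in records:
        if not any(is_duplicate(rec, k) for k in kept):
            kept.append(rec)
    return kept
```

Because every fallback rule must pass before two records are merged, a record with a missing journal or author field is retained rather than removed, which favors recall over aggressive deduplication. The pairwise loop in `deduplicate` is quadratic in the number of records; grouping records by year first would be a natural refinement for larger result sets.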

Bibliographic details

Journal: Database: The Journal of Biological Databases and Curation
Authors: Yu Jiang; Can Lin; Weiyi Meng; Clement Yu; Aaron M. Cohen; Neil R. Smalheiser
Year (volume): 2014 (2014)
Article number: bat086
Total pages: 7
Original format: PDF
Chinese Library Classification: Biology

