Journal of Cheminformatics

Putting hands to rest: efficient deep CNN-RNN architecture for chemical named entity recognition with no hand-crafted rules

Abstract

Chemical named entity recognition (NER) is an active field of research in biomedical natural language processing. To facilitate the development of new and superior chemical NER systems, BioCreative released the CHEMDNER corpus, an extensive dataset of diverse manually annotated chemical entities. Most of the systems trained on the corpus rely on complicated hand-crafted rules or curated databases for data preprocessing, feature extraction and output post-processing, though modern machine learning algorithms, such as deep neural networks, can automatically design the rules with little to no human intervention. Here we explored this approach by experimenting with various deep learning architectures for targeted tokenisation and named entity recognition. Our final model, based on a combination of convolutional and stateful recurrent neural networks with attention-like loops and hybrid word- and character-level embeddings, reaches near human-level performance on the testing dataset with no manually asserted rules. To make our model easily accessible for standalone use and integration into third-party software, we’ve developed a Python package with a minimalistic user interface.