Doublet method for very fast autocoding

Jules J Berman

Abstract

Background: Autocoding (or automatic concept indexing) occurs when a software program extracts terms contained within text and maps them to a standard list of concepts contained in a nomenclature. The purpose of autocoding is to provide a way of organizing large documents by the concepts represented in the text. Because textual data accumulates rapidly in biomedical institutions, the computational methods used to autocode text must be very fast. The purpose of this paper is to describe the doublet method, a new algorithm for very fast autocoding.

Methods: An autocoder was written that transforms plain text into intercalated word doublets (e.g. "The ciliary body produces aqueous humor" becomes "The ciliary, ciliary body, body produces, produces aqueous, aqueous humor"). Each doublet is checked against an index of doublets extracted from a standard nomenclature. Matching doublets are assigned a numeric code specific for each doublet found in the nomenclature. Text doublets that do not match the index of doublets extracted from the nomenclature are not part of valid nomenclature terms. Runs of matching doublets from text are concatenated and matched against nomenclature terms (also represented as runs of doublets).

Results: The doublet autocoder was compared for speed and performance against a previously published phrase autocoder. Both autocoders are Perl scripts, and both autocoders used an identical text (a 170+ megabyte collection of abstracts collected through a PubMed search) and the same nomenclature (neocl.xml, containing over 102,271 unique names of neoplasms). In side-by-side comparison on the same computer, the doublet method autocoder was 8.4 times faster than the phrase autocoder (211 seconds versus 1,776 seconds). The doublet method codes 0.8 megabytes of text per second on a desktop computer with a 1.6 GHz processor. In addition, the doublet autocoder successfully matched terms that were missed by the phrase autocoder, while the phrase autocoder found no terms that were missed by the doublet autocoder.

Conclusions: The doublet method of autocoding is a novel algorithm for rapid text autocoding. The method will work with any nomenclature and will parse any ASCII plain text. An implementation of the algorithm in Perl is provided with this article. The algorithm, the Perl implementation, the neoplasm nomenclature, and Perl itself are all open source materials.
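
As an illustration of the Methods description in the abstract above, a minimal Python sketch of the doublet idea might look like the following. It is not the Perl implementation distributed with the article. The function names, the two-entry nomenclature, the codes X1 and X2, the regex tokenizer, and the greedy longest-first lookup inside each run are all assumptions made for this example; the abstract only states that runs of matching doublets are matched against nomenclature terms. The sketch also ignores single-word terms, which contain no doublets.

```python
import re


def doublets(words):
    """Return the overlapping word pairs (doublets) of a word list."""
    return [f"{a} {b}" for a, b in zip(words, words[1:])]


def build_doublet_index(nomenclature):
    """Collect every doublet that occurs in any nomenclature term."""
    index = set()
    for term in nomenclature:
        index.update(doublets(term.split()))
    return index


def terms_in_run(run, nomenclature):
    """Look up terms inside one run of words, longest candidate first.

    Assumption: this greedy strategy is illustrative only.
    """
    hits = []
    i = 0
    while i < len(run) - 1:
        for j in range(len(run), i + 1, -1):
            candidate = " ".join(run[i:j])
            if candidate in nomenclature:
                hits.append((candidate, nomenclature[candidate]))
                i = j  # continue after the matched term
                break
        else:
            i += 1  # no term starts at this word
    return hits


def autocode(text, nomenclature):
    """Return (term, code) pairs found in text via runs of known doublets."""
    index = build_doublet_index(nomenclature)
    words = re.findall(r"[a-z0-9]+", text.lower())  # simplified tokenizer
    hits, run = [], []
    for a, b in zip(words, words[1:]):
        if f"{a} {b}" in index:
            if not run:
                run = [a]
            run.append(b)  # extend the current run of matching doublets
        else:
            if run:
                hits.extend(terms_in_run(run, nomenclature))
            run = []  # a non-matching doublet ends the run
    if run:
        hits.extend(terms_in_run(run, nomenclature))
    return hits


# Hypothetical nomenclature: lowercase term -> made-up code.
NOMENCLATURE = {
    "renal cell carcinoma": "X1",
    "basal cell carcinoma": "X2",
}

print(autocode(
    "Biopsy showed renal cell carcinoma; no basal cell carcinoma was seen.",
    NOMENCLATURE,
))
# → [('renal cell carcinoma', 'X1'), ('basal cell carcinoma', 'X2')]
```

Because most doublets in ordinary text are absent from the index, most of the text is rejected with one set lookup per word pair, and the costlier term lookup runs only on the short runs that survive. That is one plausible reading of why the approach is fast, not a claim about the published implementation.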

Bibliographic record

Source: BMC Medical Informatics and Decision Making, 2004, Issue 1
Author: Jules J Berman
Original format: PDF
Chinese Library Classification: Medicine and health