An Unsupervised Model of Orthographic Variation for Historical Document Transcription

Abstract

Historical documents frequently exhibit extensive orthographic variation, including archaic spellings and obsolete shorthand. OCR tools typically seek to produce so-called diplomatic transcriptions that preserve these variants, but many end tasks require transcriptions with normalized orthography. In this paper, we present a novel joint transcription model that learns, unsupervised, a probabilistic mapping between modern orthography and that used in the document. Our system thus produces dual diplomatic and normalized transcriptions simultaneously, and achieves a 35% relative error reduction over a state-of-the-art OCR model on diplomatic transcription, and a 46% reduction on normalized transcription.
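
The relative error reductions quoted in the abstract can be read as the fraction of the baseline error rate that the new system eliminates. The sketch below illustrates that arithmetic only; the function name and the two error rates are hypothetical values chosen for illustration and are not figures reported in the paper.

```python
def relative_error_reduction(baseline_error: float, system_error: float) -> float:
    """Return the fraction of the baseline error rate removed by the new system."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - system_error) / baseline_error


# Hypothetical error rates (assumptions, not taken from the paper).
baseline = 0.20  # error rate of the baseline OCR model
system = 0.13    # error rate of the improved model

reduction = relative_error_reduction(baseline, system)
print(f"Relative error reduction: {reduction:.0%}")
```

Under these assumed rates, an absolute drop of 7 points corresponds to a relative reduction of roughly 35%, which is why relative and absolute improvements should not be confused when comparing systems.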

Bibliographic details

Source
Conference on the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016, pp. 467-472 (6 pages)
Authors
Dan Garrette; Hannah Alpert-Abrams
Original format
PDF
