OCR Error Correction Using Statistical Machine Translation

HAITHEM AFLI; LOÏC BARRAULT; HOLGER SCHWENK

Abstract

In this paper, we explore the use of a statistical machine translation (SMT) system for optical character recognition (OCR) error correction. We investigate word-level and character-level models that translate OCR system output into correct French text. Our experiments show that both character-based and word-based machine translation correction significantly improve the quality of text produced through digitization. We test the approach on historical data provided by the National Library of France. It achieves a relative Word Error Rate reduction of 60% at the word level and 54% at the character level.
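
The abstract reports its results as relative Word Error Rate (WER) reductions. As a rough illustration of how such a figure can be computed, the sketch below uses the standard definition of WER (word-level edit distance divided by reference length) and of relative reduction. The function names and the example sentences are hypothetical and invented for illustration; they are not taken from the paper or its data, and the paper's exact scoring setup may differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def relative_reduction(baseline, corrected):
    """Fraction of the baseline error rate removed by correction."""
    if baseline == 0:
        raise ValueError("baseline error rate must be non-zero")
    return (baseline - corrected) / baseline


# Invented example sentences (assumption: not from the paper's corpus).
reference = "la bibliothèque nationale de france"
ocr_output = "la bibliotheque nalionale de francc"   # raw OCR, 3 wrong words
corrected = "la bibliothèque nationale de francc"    # after correction, 1 wrong word

wer_before = word_error_rate(reference, ocr_output)
wer_after = word_error_rate(reference, corrected)
print(f"WER before correction: {wer_before:.1%}")
print(f"WER after correction:  {wer_after:.1%}")
print(f"Relative WER reduction: {relative_reduction(wer_before, wer_after):.1%}")
```

Under this definition, a relative reduction of 60% means the corrected text retains 40% of the error rate of the raw OCR output; it does not mean the WER itself is 40%.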

Bibliographic details

Source
International Journal of Computational Linguistics and Applications | 2016, No. 1 | pp. 175-191 | 17 pages
Authors
HAITHEM AFLI; LOÏC BARRAULT; HOLGER SCHWENK
Affiliations
Université du Maine, France (all three authors)
Original format: PDF
Language: English
Keywords
OCR error correction; SMT; post-processing
