Journal: OCLC Systems and Services

IMPACT: working together to address the challenges involving mass digitization of historical printed text


Abstract

Purpose - The purpose of this paper is to address the most urgent challenges that libraries face in the mass digitization of historical printed text: the unsatisfactory results of converting scanned images to full-featured electronic text by means of automated optical character recognition (OCR); the historical language barrier around 1850, caused by the inadequacy of most existing lexica of historical language for OCR or post-correction; and a lack of institutional knowledge and expertise in libraries, museums and archives.

Design/methodology/approach - In the EC-funded project IMPACT (Improving Access to Text), seven libraries, six research institutes and two private sector companies across Europe work together to address these challenges by developing OCR software and technologies that significantly exceed the accuracy of current state-of-the-art software. The IMPACT solutions focus on the entire recognition process after the document leaves the scanner: image processing, OCR processing (including the use of dictionaries), OCR correction and document formatting. IMPACT will also build capacity in mass digitization by sharing best practice and expertise with the cultural heritage communities in Europe.

Findings - Technical results will include toolkits for image enhancement and segmentation, an adaptive OCR engine and several prototypes of experimental OCR engines, computational lexica, and several post-correction modules, including a web-based collaborative correction system and a parser for structural metadata. Strategic tools include several decision support tools, guidelines, a web site with a demonstrator platform, a training programme and, ultimately, a sustainable Centre of Competence for mass digitization in Europe.

Originality/value - The IMPACT solutions will, for the first time, make it possible to transform large amounts of digitized historical text into electronic text with a minimum of manual intervention and significantly improved accessibility for the user.
