Conference: Archiving conference

'PlaIR' : A System to Provide Full Access to Digitized Newspaper Archives


Abstract

This paper presents a platform dedicated to the analysis and online consultation of historical newspaper archives. The platform is designed to make the user experience as intuitive as possible by relying on mature open source tools, and all of its features are implemented with the Spring framework. To that end, we built a viewer for tiled high-resolution images that requires no browser plug-in and is based on the open source solution IIPImage. The platform also supports full-text search through the Java search library Apache Lucene and displays the results as newspaper articles. In addition, we added collaborative features that let users correct the content automatically generated by our document processing workflow and accessed through the browsing platform. The system stores all user corrections using Hibernate with MySQL. The aim is to continuously improve both content quality and search accuracy by exploiting users' ability to recognize significant errors, and thereby to enhance the digital objects representing the newspaper issues. The proposed system generates metadata describing both the physical layout and the logical structure of newspaper documents. Our article segmentation analyses a newspaper issue and recognizes articles even when they span more than one page or are laid out in a complex structure. The workflow can also take the results of optical character recognition (OCR) engines as input in order to provide textual indexing of the segmented articles. With this system, we aim to create a true and representative digital object that uses standard formats (i.e. METS/ALTO) and contains the logical description of the content, making it easier for users to read and understand.
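As a rough illustration of the indexing idea described in the abstract, the sketch below reads OCR text from ALTO pages, reassembles an article whose text blocks sit on two different pages, and builds a small inverted index over the articles. It is not the PlaIR implementation: the sample ALTO fragments, the block IDs, the `ARTICLES` mapping (standing in for the logical structure the platform would store), and the plain-Python index (standing in for Apache Lucene) are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# ALTO v2 namespace; other ALTO versions use a different namespace URI.
ALTO_NS = {"alto": "http://www.loc.gov/standards/alto/ns-v2#"}

# Invented sample data: two pages of one hypothetical newspaper issue.
PAGE_1 = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page ID="P1"><PrintSpace>
    <TextBlock ID="P1_TB1"><TextLine>
      <String CONTENT="Flood"/><SP/><String CONTENT="hits"/><SP/><String CONTENT="harbour"/>
    </TextLine></TextBlock>
    <TextBlock ID="P1_TB2"><TextLine>
      <String CONTENT="Market"/><SP/><String CONTENT="prices"/>
    </TextLine></TextBlock>
  </PrintSpace></Page></Layout>
</alto>"""

PAGE_2 = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page ID="P2"><PrintSpace>
    <TextBlock ID="P2_TB1"><TextLine>
      <String CONTENT="Quays"/><SP/><String CONTENT="remain"/><SP/><String CONTENT="closed"/>
    </TextLine></TextBlock>
  </PrintSpace></Page></Layout>
</alto>"""

# Hypothetical logical structure: article ID -> ordered text block IDs.
# Article A1 straddles pages 1 and 2.
ARTICLES = {"A1": ["P1_TB1", "P2_TB1"], "A2": ["P1_TB2"]}


def block_texts(alto_xml):
    """Map each TextBlock ID on a page to its words joined by spaces."""
    root = ET.fromstring(alto_xml)
    blocks = {}
    for block in root.iterfind(".//alto:TextBlock", ALTO_NS):
        words = [s.get("CONTENT", "")
                 for s in block.iterfind(".//alto:String", ALTO_NS)]
        blocks[block.get("ID")] = " ".join(words)
    return blocks


def article_text(block_ids, pages):
    """Concatenate the text of an article's blocks, in reading order."""
    all_blocks = {}
    for page_xml in pages:
        all_blocks.update(block_texts(page_xml))
    return " ".join(all_blocks[block_id] for block_id in block_ids)


def build_index(article_texts):
    """Build a toy inverted index: lowercase token -> set of article IDs."""
    index = defaultdict(set)
    for article_id, text in article_texts.items():
        for token in text.lower().split():
            index[token].add(article_id)
    return index


texts = {aid: article_text(ids, [PAGE_1, PAGE_2])
         for aid, ids in ARTICLES.items()}
index = build_index(texts)
print(texts["A1"])
print(sorted(index["closed"]))
```

Indexing at the article level rather than the page level is what allows a query to return a whole article even when the matching words appear on its continuation page.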
