'PlaIR': A System to Provide Full Access to Digitized Newspaper Archives

Abstract

This paper presents a platform dedicated to the analysis and online consultation of historical newspaper archives. The platform is designed to provide as intuitive a user experience as possible by relying on mature open-source tools, and all of its features are implemented with the Spring framework. To this end, we built a system that displays tiled high-resolution images without requiring a browser plug-in, based on an open-source solution called IIPImage. The platform also supports full-text search through the Java search library Apache Lucene and presents the results as newspaper articles. In addition, we implemented collaborative features that let users correct the content that our document processing workflow generates automatically and that is accessed through the browsing platform. The system stores all user corrections using Hibernate with a MySQL database. The aim is to improve both content quality and search accuracy continuously, by exploiting users' ability to recognize significant errors, and thereby to enhance the digital objects that represent the newspaper issues. The system is designed to generate metadata describing not only the physical layout but also the logical structure of newspaper documents. Our article segmentation analyses a newspaper issue and recognizes its articles, even when they span more than one page or are laid out in a complex structure. The workflow can also take the output of optical character recognition (OCR) engines as input in order to index the text of the segmented articles. With this system, we aim to create a faithful, representative digital object in standard formats (METS/ALTO) that contains the logical description of the content, making it easier for users to read and understand.
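
The abstract describes indexing OCR text per segmented article rather than per page, so that a query can match an article whose text continues on another page. The following is a minimal sketch of that idea only, not PlaIR's code. It assumes ALTO v2 input, uses a hypothetical `articles` mapping in place of the segmentation output (which the abstract says is recorded as logical-structure metadata alongside ALTO), and substitutes a toy in-memory inverted index for Apache Lucene. The function names, block IDs, and sample pages are all illustrative.

```python
import string
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumption: ALTO version 2 namespace; other ALTO versions use a different URI.
NS = {"alto": "http://www.loc.gov/standards/alto/ns-v2#"}


def words_by_block(alto_xml: str) -> dict[str, list[str]]:
    """Map each ALTO TextBlock ID to the OCR words (String/@CONTENT) it contains."""
    root = ET.fromstring(alto_xml)
    blocks = {}
    for block in root.iterfind(".//alto:TextBlock", NS):
        words = [s.get("CONTENT", "") for s in block.iterfind(".//alto:String", NS)]
        blocks[block.get("ID")] = words
    return blocks


def index_articles(blocks, articles):
    """Build an inverted index: normalized term -> set of article IDs.

    `articles` is a hypothetical article -> [TextBlock ID, ...] mapping standing in
    for the segmentation output; an article may list blocks from several pages.
    """
    index = defaultdict(set)
    for article_id, block_ids in articles.items():
        for block_id in block_ids:
            for word in blocks.get(block_id, []):
                term = word.lower().strip(string.punctuation)
                if term:
                    index[term].add(article_id)
    return index


def search(index, query):
    """Return the IDs of articles containing every query term (boolean AND)."""
    terms = [t.lower().strip(string.punctuation) for t in query.split()]
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Two toy single-page ALTO files; block IDs are prefixed by page to keep them unique.
PAGE_1 = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#"><Layout><Page ID="P1"><PrintSpace>
<TextBlock ID="P1_TB1"><TextLine><String CONTENT="Harbour"/><SP/><String CONTENT="strike"/></TextLine></TextBlock>
<TextBlock ID="P1_TB2"><TextLine><String CONTENT="Market"/><SP/><String CONTENT="prices"/></TextLine></TextBlock>
</PrintSpace></Page></Layout></alto>"""
PAGE_2 = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#"><Layout><Page ID="P2"><PrintSpace>
<TextBlock ID="P2_TB1"><TextLine><String CONTENT="strike"/><SP/><String CONTENT="continues."/></TextLine></TextBlock>
</PrintSpace></Page></Layout></alto>"""

blocks = {**words_by_block(PAGE_1), **words_by_block(PAGE_2)}
# Article A1 starts on page 1 and continues on page 2.
articles = {"A1": ["P1_TB1", "P2_TB1"], "A2": ["P1_TB2"]}
index = index_articles(blocks, articles)
print(sorted(search(index, "strike continues")))  # ['A1']
```

Because the index is keyed by article rather than by page, the query `strike continues` matches A1 even though its two words come from blocks on different pages. A production system would delegate tokenization, ranking, and persistence to a search library such as Lucene.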

Bibliographic record

Source
Archiving Conference | 2012 | pp. 48-53 | 6 pages
Authors
Thomas Palfray; Stephane Nicolas; Thierry Paquet; Pierrick Tranouez
Original format: PDF
CLC classification: Information processing (information handling)

Similar literature

1. Google Discontinues Initiative to Digitize Newspapers' Archives [J]. Greg Landgraf. American Libraries: Official Bulletin of the American Library Association, 2011, no. 7a8.

2. 'The Digitization of Newspaper Archives: Opportunities and Challenges for Historians' [J]. Adrian Bingham. Twentieth Century British History, 2010, no. 2.

3. A Newspaper/Periodical Digitization Project in Mongolia: Creating a Digital Archive of Rare Mongolian Publications [J]. Krystyna K. Matusiak, Myagmar Munkhmandakh. Serials Librarian, 2009, no. 1a2.

4. 'PlaIR': A System to Provide Full Access to Digitized Newspaper Archives [C]. Thomas Palfray, Stephane Nicolas, Thierry Paquet. Archiving Conference, 2012.

5. Indexing multimedia collections and user access: An analysis of the indexing systems in place at the BBC Archive and the British Film Institute National Archive [D]. Shaun Baber. 2012.

6. The development of a virtual database to provide on-line access to a large archive of clinical data [O]. L. E. Stevens, S. M. Huff, P. J. Haug. 1992.

7. Digitizing Ideas: Accessing Art from Libraries and Archives in a Digital Environment [O]. Jasna Jakšić. 2013.
