Relating Articles Textually and Visually

Abstract

Historical documents have undergone large-scale digitization in recent years, placing massive image collections online. Optical character recognition (OCR) often performs poorly on such material, which makes searching within these resources problematic and textual analysis of the documents difficult. We present two approaches to overcome this obstacle, one textual and one visual. We show that, for tasks like finding newspaper articles related by topic, poor-quality OCR text suffices. An ordinary vector-space model is used to represent articles. Additional improvements are obtained by adding words with similar distributional representations. As an alternative to OCR-based methods, one can perform image-based search using word spotting. Synthetic images are generated for every word in a lexicon, and word spotting is used to compile vectors of their occurrences. Retrieval is by means of a standard nearest-neighbor search. The results of this visual approach are comparable to those obtained using noisy OCR. We report on experiments applying both methods, separately and together, to historical Hebrew newspapers, with their added problem of rich morphology.
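
The textual approach can be illustrated with a minimal sketch. The abstract specifies only an "ordinary vector-space model" with nearest-neighbor retrieval, so the TF-IDF weighting, the cosine similarity measure, the toy English articles, and the `similar` lookup table (standing in for distributionally similar words) are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each tokenized document to a sparse TF-IDF vector (term -> weight)."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed inverse document frequency (one common variant, assumed here).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def expand(doc, similar):
    """Append words listed as similar to each token (hypothetical lookup table)."""
    return doc + [s for t in doc for s in similar.get(t, [])]

def nearest(query_idx, vectors):
    """Index of the article most similar to the query article (brute force)."""
    scores = [(cosine(vectors[query_idx], v), i)
              for i, v in enumerate(vectors) if i != query_idx]
    return max(scores)[1]

# Toy articles; "councll" imitates an OCR error for "council".
articles = [
    "the city councll voted on the new water tax".split(),
    "water tax vote divides the city council".split(),
    "the football team won the cup final".split(),
]
# Hypothetical stand-in for neighbors in a distributional word representation.
similar = {"councll": ["council"], "voted": ["vote"]}

vectors = tfidf_vectors([expand(a, similar) for a in articles])
print(nearest(0, vectors))  # index of the article judged most related to article 0
```

Even without the expansion step, the two tax articles share enough correctly recognized words to be matched, which is the intuition behind the claim that poor-quality OCR suffices for finding topically related articles; the added words simply recover some of the overlap lost to recognition errors and inflection.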

Bibliographic record

Source: IAPR International Conference on Document Analysis and Recognition | 2017 | 732p | 7 pages
Authors: Nachum Dershowitz; Daniel Labenski; Adi Silberpfennig; Lior Wolf; Yaron Tsur
Original format: PDF
Chinese Library Classification: TP391.41-53
Keywords: Optical character recognition software; Task analysis; Visualization; Noise measurement; Engines; Tools; Event detection

