Information Retrieval From Historical Newspaper Collections in Highly Inflectional Languages: A Query Expansion Approach

Anni Jaervelin; Heikki Keskustalo; Eero Sormunen; Miamaria Saastamoinen; Kimmo Kettunen

Abstract

The aim of the study was to test whether query expansion by approximate string matching methods is beneficial in retrieval from historical newspaper collections in a language rich in compounds and inflectional forms (Finnish). First, approximate string matching methods were used to generate lists of the index words most similar to contemporary query terms in a digitized newspaper collection from the 1800s. The top index word variants were categorized to estimate appropriate query expansion ranges for the retrieval test. Second, the effectiveness of approximate string matching methods, automatically generated inflectional forms, and their combinations was measured in a Cranfield-style test. Finally, a detailed topic-level analysis of the test results was conducted. In the index of the historical newspaper collection, the occurrences of a word were typically spread across many linguistic and historical variants as well as optical character recognition (OCR) errors. All query expansion methods improved on the baseline results. Extensive expansion, with around 30 variants for each query word, was required to achieve the highest performance improvement. Query expansion based on approximate string matching was superior to using the inflectional forms of the query words, showing that coverage of the different types of variation is more important than precision in handling one type of variation.
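The expansion step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the specific approximate string matching method, so the character-bigram Dice coefficient used here is an assumption, as are the function names and the tiny index vocabulary, which is invented for illustration (one inflected form and two OCR-style misspellings of the Finnish word "sanomalehti"). The default of 30 variants per query word follows the expansion range reported in the abstract.

```python
from collections import Counter


def char_ngrams(word: str, n: int = 2) -> Counter:
    """Return the multiset of character n-grams of a padded, lowercased word."""
    pad = "_" * (n - 1)
    padded = f"{pad}{word.lower()}{pad}"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def dice_similarity(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient over character n-gram multisets (1.0 means identical)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    overlap = sum((grams_a & grams_b).values())
    total = sum(grams_a.values()) + sum(grams_b.values())
    return 2 * overlap / total if total else 0.0


def expand_query_term(term: str, index_words: list[str], k: int = 30) -> list[str]:
    """Return the k index words most similar to the query term."""
    ranked = sorted(index_words, key=lambda w: dice_similarity(term, w), reverse=True)
    return ranked[:k]


# Hypothetical index vocabulary (not taken from the study's collection).
index_words = [
    "sanomalehti",   # exact match
    "sanomalehden",  # inflected form
    "fanomalehti",   # OCR-style error
    "sanomalehtj",   # OCR-style error
    "kirjasto",      # unrelated word
    "hevonen",       # unrelated word
]

variants = expand_query_term("sanomalehti", index_words, k=3)
print(variants)
```

In a retrieval setting, the returned variants would be added to the query alongside the original term, so that documents containing inflected, historical, or OCR-damaged spellings can still be matched.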


Bibliographic details

Source
Journal of the American Society for Information Science and Technology | 2016, No. 12 | pp. 2928-2946 | 19 pages
Authors
Anni Jaervelin; Heikki Keskustalo; Eero Sormunen; Miamaria Saastamoinen; Kimmo Kettunen
Author affiliations

School of Information Sciences, University of Tampere, Tampere FIN-33014, Finland;

School of Information Sciences, University of Tampere, Tampere FIN-33014, Finland;

School of Information Sciences, University of Tampere, Tampere FIN-33014, Finland;

School of Information Sciences, University of Tampere, Tampere FIN-33014, Finland;

Centre for Preservation and Digitisation, National Library of Finland, Saimaankatu 6, Mikkeli FI-50100, Finland;

Original format: PDF
Language of text: English

