Finding relevant PDF medical journal articles by the content of their figures

Abstract

Literature review is a time-consuming burden because relevant articles are hard to find. Yet literature review is essential, because it lets researchers find solutions to their questions and problems in work that others have already performed and published. It is difficult to wade through documents quickly and judge their quality by looking only at the title, the abstract, or even the full text. The human visual system, by contrast, lets us glance at images, infer the main subject of an article, and decide whether we want to read more. In some fields, biology for example, figures showing photographs of experimental results let a researcher in the literature review phase quickly judge the quality of the work from its results. This work describes a literature review system that uses content-based image retrieval (CBIR) techniques to find relevant documents from the content of their figures, refining the results through relevance feedback instead of relying on keyword-search guesswork. The long-term goal is to use it as a subsystem of a content-based document retrieval system in which the figures and their captions are combined with the document's body text. The paper describes how documents are processed to extract the available raster graphics, as well as the text with its layout and formatting information intact. It then describes how this layout information is used to match each figure to its caption. Figure-caption matching is complete; caption-based search is implemented but not yet fully integrated into the system. Two novel modified tf-idf measures, under consideration as a way to infer text importance from bold/italic text, font size, and document structure rather than from term frequency alone, are detailed mathematically and explained intuitively. CBIR queries composed of multiple images are issued as separate single-image queries, and their results are then merged.
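
The abstract says figures are matched to captions using extracted layout information but does not give the rules. The following is a minimal sketch of one plausible heuristic, not the paper's method: it assumes each extracted figure and text block carries a page number and a bounding box (y growing downward), treats a text block as a caption candidate if it starts with "Fig."/"Figure" plus a number, prefers captions just below a figure on the same page, and assigns pairs greedily by smallest vertical gap. The names `Box`, `TextBlock`, `CAPTION_RE`, `match_captions`, and the `above_penalty` parameter are all hypothetical.

```python
import re
from dataclasses import dataclass

@dataclass
class Box:
    """Axis-aligned bounding box on a PDF page; y grows downward (assumption)."""
    page: int
    x0: float
    y0: float  # top edge
    x1: float
    y1: float  # bottom edge

@dataclass
class TextBlock:
    box: Box
    text: str

# Hypothetical caption pattern: "Figure 3", "Fig. 3", "FIG 3", ...
CAPTION_RE = re.compile(r"^\s*(?:fig\.?|figure)\s*\d+", re.IGNORECASE)

def horizontal_overlap(a: Box, b: Box) -> float:
    return max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0))

def match_captions(figures: list[Box], blocks: list[TextBlock],
                   above_penalty: float = 2.0) -> dict[int, str]:
    """Map each figure index to the text of its most likely caption."""
    captions = [b for b in blocks if CAPTION_RE.match(b.text)]
    pairs = []  # (vertical gap, figure index, caption index)
    for fi, fig in enumerate(figures):
        for ci, cap in enumerate(captions):
            c = cap.box
            # Only consider captions on the same page that share some column span.
            if c.page != fig.page or horizontal_overlap(fig, c) <= 0:
                continue
            if c.y0 >= fig.y1:        # caption sits below the figure (preferred)
                gap = c.y0 - fig.y1
            elif c.y1 <= fig.y0:      # caption sits above: allowed but penalized
                gap = (fig.y0 - c.y1) * above_penalty
            else:                     # overlaps the figure vertically: ignore
                continue
            pairs.append((gap, fi, ci))
    # Greedy one-to-one assignment, closest pairs first.
    matched: dict[int, str] = {}
    used_caps: set[int] = set()
    for gap, fi, ci in sorted(pairs):
        if fi in matched or ci in used_caps:
            continue
        matched[fi] = captions[ci].text
        used_caps.add(ci)
    return matched
```

Greedy assignment keeps each caption attached to at most one figure; a real system would also need to handle multi-panel figures and captions that span columns.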
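
The abstract mentions two modified tf-idf measures that weight terms by bold/italic styling, font size, and document structure, but their formulas are not given here. The sketch below shows only the general idea under stated assumptions and is not either of the paper's measures: each term occurrence contributes a weight instead of a count of 1, so the weighted term frequency is the sum of occurrence weights, which is then multiplied by the standard idf, log(N/df). The `Token` fields, the `SECTION_WEIGHT` table, and the bold (1.5), italic (1.2), and font-size multipliers are all hypothetical values chosen for illustration.

```python
import math
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Token:
    term: str
    bold: bool = False
    italic: bool = False
    font_size: float = 10.0
    section: str = "body"  # e.g. "title", "heading", "caption", "body"

# Hypothetical structural weights; not taken from the paper.
SECTION_WEIGHT = {"title": 3.0, "heading": 2.0, "caption": 1.5, "body": 1.0}

def occurrence_weight(tok: Token, base_size: float) -> float:
    """Weight of one occurrence, replacing the plain count of 1."""
    w = SECTION_WEIGHT.get(tok.section, 1.0)
    if tok.bold:
        w *= 1.5
    if tok.italic:
        w *= 1.2
    # Text larger than body size counts proportionally more (never below 1x).
    w *= max(1.0, tok.font_size / base_size)
    return w

def weighted_tf(doc: list[Token], base_size: float = 10.0) -> dict[str, float]:
    tf: dict[str, float] = defaultdict(float)
    for tok in doc:
        tf[tok.term.lower()] += occurrence_weight(tok, base_size)
    return dict(tf)

def weighted_tfidf(corpus: list[list[Token]],
                   base_size: float = 10.0) -> list[dict[str, float]]:
    """Formatting-weighted tf multiplied by the standard idf = log(N / df)."""
    tfs = [weighted_tf(doc, base_size) for doc in corpus]
    n = len(corpus)
    df: dict[str, int] = defaultdict(int)
    for tf in tfs:
        for term in tf:
            df[term] += 1
    return [{t: w * math.log(n / df[t]) for t, w in tf.items()} for tf in tfs]
```

With these weights, a bold term in a heading counts as three plain body occurrences, so it can outrank a more frequent but unemphasized term; the idf factor is left unchanged.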

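The abstract states that a multi-image CBIR query is split into single-image queries whose results are merged, without naming the merge rule. One common way to merge ranked lists is reciprocal rank fusion (RRF), used here as a stand-in: each document scores the sum of 1/(k + rank) over the lists it appears in, with k = 60 as a commonly used default. The `search` callable and the `multi_image_query` helper are hypothetical placeholders for a single-image CBIR backend that returns document ids in rank order.

```python
from collections import defaultdict
from typing import Callable, Sequence

def reciprocal_rank_fusion(ranked_lists: Sequence[Sequence[str]],
                           k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document ids into one ranking (RRF)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1/(k + rank); k damps the dominance of top ranks.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

def multi_image_query(query_images: Sequence[object],
                      search: Callable[[object], Sequence[str]],
                      k: int = 60) -> list[tuple[str, float]]:
    """Issue one CBIR query per image, then merge the per-image rankings."""
    per_image = [search(image) for image in query_images]
    return reciprocal_rank_fusion(per_image, k)

# Usage with a toy stand-in for the CBIR backend:
index = {"blot.png": ["d1", "d2", "d3"], "stain.png": ["d2", "d3", "d4"]}
fused = multi_image_query(["blot.png", "stain.png"], index.__getitem__)
print([doc for doc, _ in fused])  # ['d2', 'd3', 'd1', 'd4']
```

RRF uses only ranks, so it does not require the per-image similarity scores to be on a comparable scale; if the backend's scores are comparable, a score-based fusion such as summing normalized similarities would be an alternative.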