IEEE Transactions on Image Processing

Unambiguous Scene Text Segmentation With Referring Expression Comprehension



Abstract

Text instances provide valuable information for the understanding and interpretation of natural scenes. The rich, precise, high-level semantics embodied in text can help us understand the world around us and empower a wide range of real-world applications. While most recent visual phrase grounding approaches focus on general objects, this paper explores extracting designated text and predicting an unambiguous scene text segmentation mask, i.e., scene text segmentation from natural language descriptions (referring expressions) such as "orange text on a little boy in black swinging a bat". Solving this novel problem enables accurate segmentation of scene text instances from complex backgrounds. In our proposed framework, a unified deep network jointly models visual and linguistic information by encoding both region-level and pixel-level visual features of natural scene images into spatial feature maps, and then decodes them into a saliency response map of text instances. To conduct quantitative evaluations, we establish a new scene text referring expression segmentation dataset: COCO-CharRef. Experimental results demonstrate the effectiveness of the proposed framework on the text instance segmentation task. By combining image-based visual features with language-based textual descriptions, our framework outperforms baselines derived from state-of-the-art text localization and natural language object retrieval methods on the COCO-CharRef dataset.


