IEEE Transactions on Image Processing

Unambiguous Scene Text Segmentation With Referring Expression Comprehension



Abstract

Text instances provide valuable information for the understanding and interpretation of natural scenes. The rich, precise, high-level semantics embodied in text can help us understand the world around us and empower a wide range of real-world applications. While most recent visual phrase grounding approaches focus on general objects, this paper explores extracting designated text and predicting an unambiguous scene text segmentation mask, i.e., scene text segmentation from natural language descriptions (referring expressions) such as "orange text on a little boy in black swinging a bat". Solving this novel problem enables accurate segmentation of scene text instances from complex backgrounds. In our proposed framework, a unified deep network jointly models visual and linguistic information by encoding both region-level and pixel-level visual features of natural scene images into spatial feature maps, and then decodes them into a saliency response map of text instances. To conduct quantitative evaluations, we establish a new scene text referring expression segmentation dataset: COCO-CharRef. Experimental results demonstrate the effectiveness of the proposed framework on the text instance segmentation task. By combining image-based visual features with language-based textual descriptions, our framework outperforms baselines derived from state-of-the-art text localization and natural language object retrieval methods on the COCO-CharRef dataset.


