Journal of Vision

Scene Understanding for the Visually Impaired Using Visual Sonification by Visual Feature Analysis and Auditory Signatures

Abstract

The World Health Organization estimates that approximately 2.6% of the human population is visually impaired, with 0.6% being totally blind. In this research we propose to use visual sonification as a means to assist the visually impaired. Visual sonification is the process of transforming visual data into sounds, a process that would non-invasively allow blind persons to distinguish different objects in their surroundings using their sense of hearing. The approach, while non-invasive, creates a number of research challenges. Foremost, the ear is a much lower-bandwidth interface than the optic nerve or a cortical interface (roughly 150k bps vs. 10M bps). Rather than converting visual inputs into a list of object labels (e.g., "car", "phone") as traditional visual aid systems do, we envision a paradigm in which visual abstractions are directly transformed into auditory signatures. These signatures provide a rich characterization of objects in the surroundings and can be efficiently transmitted to the user. This process leverages users' capacity to learn and adapt to the auditory signatures over time. In this study we propose to obtain visual abstractions using a popular representation in computer vision called bag-of-visual-words (BoW). In a BoW representation, an object category is modeled as a histogram of epitomic features (or visual words) that appear in the image and are created during an a priori offline learning phase. The histogram is then directly converted into an audio signature using a suitable modulation scheme. Our experiments demonstrate that, after a short training procedure, humans can successfully discriminate audio signatures associated with different visual categories (e.g., cars, phones) or object properties (front view, side view, far). Critically, our study shows that there is a tradeoff between representation complexity (the number of visual words used to form the histogram) and human classification accuracy.
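
As a concrete illustration of the two-stage pipeline the abstract describes, the following Python/NumPy sketch quantizes local feature descriptors into a BoW histogram and then modulates the histogram into a tone, with each bin driving the amplitude of one harmonic partial. The abstract specifies only "a suitable modulation scheme", so the harmonic layout, base frequency, codebook size, and all parameter values here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Quantize local descriptors (n x d) against a codebook of visual
    words (k x d); return a normalized k-bin BoW histogram."""
    # Assign each descriptor to its nearest visual word (Euclidean distance).
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=codebook.shape[0]).astype(float)
    return hist / hist.sum()

def sonify_histogram(hist, duration=1.0, sample_rate=44100, base_freq=220.0):
    """Turn the histogram into audio: bin k sets the amplitude of the
    (k+1)-th harmonic of base_freq. This is one assumed modulation
    scheme; the paper leaves the scheme unspecified."""
    t = np.arange(int(duration * sample_rate)) / sample_rate
    signal = sum(w * np.sin(2 * np.pi * base_freq * (k + 1) * t)
                 for k, w in enumerate(hist))
    return signal / np.max(np.abs(signal))  # normalize to [-1, 1]

# Toy usage with random data standing in for SIFT-like 128-D features.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 128))        # 8 visual words (hypothetical size)
descriptors = rng.normal(size=(500, 128))   # local features from one image
audio = sonify_histogram(bow_histogram(descriptors, codebook))
```

In a real system the descriptors would come from a local feature extractor and the codebook from an offline clustering step (e.g., k-means), as is standard for BoW; the complexity-accuracy tradeoff the study reports corresponds to varying the number of visual words k.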