Journal of Vision

Scene Understanding for the Visually Impaired Using Visual Sonification by Visual Feature Analysis and Auditory Signatures

Abstract

The World Health Organization estimates that approximately 2.6% of the human population is visually impaired, with 0.6% being totally blind. In this research we propose to use visual sonification as a means to assist the visually impaired. Visual sonification is the process of transforming visual data into sounds, a process that would non-invasively allow blind persons to distinguish different objects in their surroundings using their sense of hearing. The approach, while non-invasive, creates a number of research challenges. Foremost, the ear is a much lower-bandwidth interface than the optic nerve or a cortical interface (roughly 150k bps vs. 10M bps). Rather than converting visual inputs into a list of object labels (e.g., "car", "phone") as traditional visual aid systems do, we envision a paradigm in which visual abstractions are directly transformed into auditory signatures. These signatures provide a rich characterization of objects in the surroundings and can be efficiently transmitted to the user. This process leverages users' capacity to learn and adapt to the auditory signatures over time. In this study we propose to obtain visual abstractions using a popular representation in computer vision called bag-of-visual-words (BoW). In a BoW representation, an object category is modeled as a histogram of epitomic features (or visual words) that appear in the image and are created during an a priori offline learning phase. The histogram is then directly converted into an audio signature using a suitable modulation scheme. Our experiments demonstrate that, after a short training procedure, humans can successfully discriminate audio signatures associated with different visual categories (e.g., cars, phones) or object properties (front view, side view, far). Critically, our study shows that there is a tradeoff between representation complexity (the number of visual words used to form the histogram) and human classification accuracy.
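
As a concrete illustration of the two-stage pipeline the abstract describes, the following Python/NumPy sketch quantizes local feature descriptors into a BoW histogram and then modulates the histogram into a tone, with each bin driving the amplitude of one harmonic partial. The abstract specifies only "a suitable modulation scheme", so the harmonic layout, base frequency, codebook size, and all parameter values here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Quantize local descriptors (n x d) against a codebook of visual
    words (k x d); return a normalized k-bin BoW histogram."""
    # Assign each descriptor to its nearest visual word (Euclidean distance).
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=codebook.shape[0]).astype(float)
    return hist / hist.sum()

def sonify_histogram(hist, duration=1.0, sample_rate=44100, base_freq=220.0):
    """Turn the histogram into audio: bin k sets the amplitude of the
    (k+1)-th harmonic of base_freq. This is one assumed modulation
    scheme; the paper leaves the scheme unspecified."""
    t = np.arange(int(duration * sample_rate)) / sample_rate
    signal = sum(w * np.sin(2 * np.pi * base_freq * (k + 1) * t)
                 for k, w in enumerate(hist))
    return signal / np.max(np.abs(signal))  # normalize to [-1, 1]

# Toy usage with random data standing in for SIFT-like 128-D features.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 128))        # 8 visual words (hypothetical size)
descriptors = rng.normal(size=(500, 128))   # local features from one image
audio = sonify_histogram(bow_histogram(descriptors, codebook))
```

In a real system the descriptors would come from a local feature extractor and the codebook from an offline clustering step (e.g., k-means), as is standard for BoW; the complexity-accuracy tradeoff the study reports corresponds to varying the number of visual words k.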