
A Testbed for Learning by Demonstration from Natural Language and RGB-Depth Video



Abstract

We are developing a testbed for learning by demonstration that combines spoken language and sensor data in a natural real-world environment. Microsoft Kinect RGB-Depth cameras allow us to infer high-level visual features, such as the relative positions of objects in space, with greater precision and less training than traditional systems require. Speech is recognized and parsed with a "deep" parsing system, so that language features are available at the word, syntactic, and semantic levels. We collected an initial data set of 10 episodes from 7 individuals demonstrating how to "make tea", and created a "gold standard" hand annotation of the actions performed in each. Finally, we are constructing "baseline" HMM-based activity recognition models using the visual and language features, so that we can evaluate the performance of our future work on deeper and more structured models.
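To make the baseline concrete, the sketch below shows one plausible way to train such an HMM over per-frame feature vectors that concatenate visual features (e.g., relative object positions from the Kinect) with language features. The use of Python's hmmlearn library, the feature dimensionality, the placeholder data, and the number of hidden states are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of an HMM-based activity recognition baseline, assuming
# per-frame feature vectors that concatenate visual features (e.g. relative
# object positions) with language features (e.g. indicators of recognized
# words). Feature dimensions and the hmmlearn library are assumptions.
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(0)

# Stand-ins for two demonstration episodes: each row is one time step,
# columns are visual + language features (random placeholders here).
episode_a = rng.normal(size=(120, 8))
episode_b = rng.normal(size=(150, 8))

X = np.vstack([episode_a, episode_b])
lengths = [len(episode_a), len(episode_b)]

# A Gaussian-emission HMM whose hidden states are intended to align with
# annotated actions (e.g. "fill kettle", "pour water", "steep tea").
model = hmm.GaussianHMM(n_components=5, covariance_type="diag", n_iter=100)
model.fit(X, lengths)

# Decode the most likely action-state sequence for an episode.
states = model.predict(episode_a)
print(states[:20])
```

In practice the decoded state sequence would be compared against the "gold standard" hand annotations to score the baseline before moving to deeper, more structured models.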
