
Dynamics-based human pose estimation using monocular vision.


Abstract

Human pose estimation using monocular vision is a challenging problem for both the computer vision and robotics communities. Past work has focused on developing efficient inference algorithms and probabilistic models that employ articulated-multibody-dynamics-based priors generated from captured kinematic/dynamic measurements. However, such algorithms struggle to generalize beyond the learned dataset, with tracking performance depending significantly on the underlying articulated-multibody system-parameter estimates, which can be difficult to obtain from unstructured, uncalibrated video sequences.

In this work, we propose a model-based generative approach for estimating the human pose solely from uncalibrated monocular video in unconstrained environments, without any prior learning on motion-capture or image-annotation data. We propose a novel Product of Heading Experts (PoHE) based generalized heading estimation framework that probabilistically merges heading outputs (probabilistic or non-probabilistic) from a time-varying number of estimators. Our current implementation employs a motion-cue-based human heading estimation framework to bootstrap a synergistically integrated probabilistic-deterministic sequential optimization framework that robustly estimates human pose. Novel pixel-distance-based performance measures are developed to penalize false human detections and ensure identity-maintained human tracking. We test our framework with varied inputs (silhouettes and bounding boxes) to evaluate, compare, and benchmark it against ground-truth data (collected using our human annotation tool) for 52 video vignettes in the publicly available Defense Advanced Research Projects Agency (DARPA) Mind's Eye Year I dataset (ARL-RT1).
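The product-of-experts merging of heading estimates can be sketched under a simplifying assumption not spelled out in the abstract: that each expert reports a heading as a von Mises density (mean angle, concentration). The product of von Mises densities is again von Mises, with parameters given by the resultant vector sum, so fusing a time-varying number of experts reduces to a vector addition. The function name and representation here are illustrative, not taken from the thesis.

```python
import math

def fuse_headings(experts):
    """Product-of-experts fusion of heading estimates.

    `experts` is a list of (mean_heading_rad, concentration) pairs, each
    modelling a von Mises density over the heading angle. The product of
    von Mises densities is von Mises with parameters given by the
    resultant of the concentration-weighted unit vectors.
    """
    cx = sum(k * math.cos(mu) for mu, k in experts)
    cy = sum(k * math.sin(mu) for mu, k in experts)
    fused_mu = math.atan2(cy, cx)      # fused heading (radians)
    fused_kappa = math.hypot(cx, cy)   # fused concentration
    return fused_mu, fused_kappa
```

Two agreeing experts reinforce each other (the fused concentration grows), while experts pointing in opposite directions largely cancel, which is the behavior one wants when merging heterogeneous heading estimators.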
Results show robust pose estimates on this challenging dataset of highly diverse activities.

Building upon this framework, we further propose a technique for estimating the lower-limb dynamics of a human solely from behavior captured with an uncalibrated monocular video camera. We leverage the proposed pose estimation framework to (i) deduce the correct sequence of temporally coherent, gap-filled pose estimates, (ii) estimate physical parameters using a dynamics model that incorporates anthropometric constraints, and (iii) filter the optimized gap-filled pose estimates using an Unscented Kalman Filter (UKF) with the estimated dynamically equivalent human dynamics model. We test this extended framework on videos from the publicly available DARPA Mind's Eye Year 1 corpus (ARL-RT1). The combined estimation and filtering framework not only yields more accurate, physically plausible pose estimates, but also provides pose estimates for frames where the original pose estimation framework failed to provide one.
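The UKF step in (iii) can be sketched generically. In this minimal sketch, `f` stands in for the estimated dynamically equivalent human dynamics model and `h` for the mapping from state to observed pose; both, along with all function names, are placeholders rather than the thesis's actual models.

```python
import numpy as np

def sigma_points(mean, cov, alpha=1.0, beta=2.0, kappa=0.0):
    """Merwe scaled sigma points and mean/covariance weights."""
    n = mean.size
    lam = alpha**2 * (n + kappa) - n
    S = np.linalg.cholesky((n + lam) * cov)   # columns span the spread
    pts = np.vstack([mean, mean + S.T, mean - S.T])
    wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = wm[0] + 1.0 - alpha**2 + beta
    return pts, wm, wc

def ukf_step(mean, cov, f, h, z, Q, R):
    """One UKF predict/update cycle.

    f: process model (state -> state), h: measurement model
    (state -> observation), z: observation, Q/R: noise covariances.
    """
    # Predict: propagate sigma points through the dynamics model.
    pts, wm, wc = sigma_points(mean, cov)
    fp = np.array([f(p) for p in pts])
    m_pred = wm @ fp
    P_pred = Q + sum(w * np.outer(d, d) for w, d in zip(wc, fp - m_pred))
    # Update: propagate through the measurement model.
    pts2, wm2, wc2 = sigma_points(m_pred, P_pred)
    hp = np.array([h(p) for p in pts2])
    z_pred = wm2 @ hp
    Pzz = R + sum(w * np.outer(d, d) for w, d in zip(wc2, hp - z_pred))
    Pxz = sum(w * np.outer(dx, dz)
              for w, dx, dz in zip(wc2, pts2 - m_pred, hp - z_pred))
    K = Pxz @ np.linalg.inv(Pzz)              # Kalman gain
    mean_new = m_pred + K @ (z - z_pred)
    cov_new = P_pred - K @ Pzz @ K.T
    return mean_new, cov_new
```

Run per frame, this smooths the gap-filled pose sequence: frames where the pose estimator returned nothing can still receive a prediction from the dynamics model alone, which is how filtering supplies estimates the original framework failed to provide.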

Bibliographic details

  • Author: Agarwal, Priyanshu
  • Affiliation: State University of New York at Buffalo
  • Degree grantor: State University of New York at Buffalo
  • Subject: Computer Engineering; Mechanical Engineering
  • Degree: M.S.
  • Year: 2012
  • Pages: 144 p.
  • Total pages: 144
  • Format: PDF
  • Language: English (eng)
