Journal: International Journal of Environmental Research and Public Health

Clustering Multivariate Time Series Using Hidden Markov Models

Abstract

In this paper we describe an algorithm for clustering multivariate time series whose variables take both categorical and continuous values. Time series of this type are frequent in health care, where they represent the health trajectories of individuals. The problem is challenging because categorical variables make it difficult to define a meaningful distance between trajectories. We propose an approach based on Hidden Markov Models (HMMs): we first map each trajectory into an HMM, then define a suitable distance between HMMs, and finally cluster the HMMs with a method based on a distance matrix. We test our approach on a simulated but realistic data set of 1,255 trajectories of individuals aged 45 and over, on a synthetic validation set with known clustering structure, and on a smaller set of 268 trajectories extracted from the longitudinal Health and Retirement Survey. The proposed method can be implemented quite simply using standard packages in R and Matlab. It may therefore be a good candidate for clustering multivariate time series with categorical variables using tools that do not require advanced statistical knowledge and are thus accessible to a wide range of researchers.
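The three steps in the abstract (map each trajectory to an HMM, compute distances between HMMs, cluster from the distance matrix) can be sketched in a few dozen lines. What follows is a minimal, self-contained Python sketch under assumptions that are not taken from the paper: each time step is encoded as a single categorical symbol (continuous variables would have to be discretized first, which the paper's method does not require); each HMM is fitted by Baum-Welch with additive smoothing; the distance is a symmetrized, length-normalized log-likelihood distance in the spirit of Juang and Rabiner, evaluated on the observed trajectories, which may differ from the distance the authors use; and the clustering step is a simple k-medoids, chosen only as one example of a method based on a distance matrix. The function names `fit_hmm`, `hmm_distance_matrix`, `k_medoids` and `cluster_trajectories`, and all default parameters, are hypothetical.

```python
# Hypothetical sketch: each trajectory is a list of categorical symbols in range(n_symbols).
import math
import random


def _normalize(row):
    total = sum(row)
    return [x / total for x in row]


def _forward(seq, pi, A, B):
    """Scaled forward pass: scaled alphas, scale factors, log P(seq | model)."""
    n = len(pi)
    alphas, scales = [], []
    for t, o in enumerate(seq):
        if t == 0:
            a = [pi[i] * B[i][o] for i in range(n)]
        else:
            a = [sum(alphas[-1][j] * A[j][i] for j in range(n)) * B[i][o] for i in range(n)]
        c = sum(a)
        alphas.append([x / c for x in a])
        scales.append(c)
    return alphas, scales, sum(math.log(c) for c in scales)


def _backward(seq, A, B, scales):
    """Scaled backward pass reusing the forward scale factors."""
    n, T = len(A), len(seq)
    betas = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        o, nxt = seq[t + 1], betas[t + 1]
        betas[t] = [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n)) / scales[t + 1]
                    for i in range(n)]
    return betas


def fit_hmm(seq, n_states, n_symbols, n_iter=50, eps=1e-3, seed=0):
    """Baum-Welch on one trajectory of symbols in range(n_symbols); eps = additive smoothing."""
    rng = random.Random(seed)

    def rand_row(k):
        return _normalize([0.5 + rng.random() for _ in range(k)])

    pi, T = rand_row(n_states), len(seq)
    A = [rand_row(n_states) for _ in range(n_states)]
    B = [rand_row(n_symbols) for _ in range(n_states)]
    for _ in range(n_iter):
        alphas, scales, _loglik = _forward(seq, pi, A, B)
        betas = _backward(seq, A, B, scales)
        # gamma[t][i]: posterior probability of being in state i at time t
        gamma = [[alphas[t][i] * betas[t][i] for i in range(n_states)] for t in range(T)]
        # xi[i][j]: expected number of i -> j transitions (computed with the current A, B)
        xi = [[sum(alphas[t][i] * A[i][j] * B[j][seq[t + 1]] * betas[t + 1][j] / scales[t + 1]
                   for t in range(T - 1)) for j in range(n_states)] for i in range(n_states)]
        pi = _normalize([g + eps for g in gamma[0]])
        A = [_normalize([x + eps for x in row]) for row in xi]
        B = [_normalize([eps + sum(gamma[t][i] for t in range(T) if seq[t] == k)
                         for k in range(n_symbols)]) for i in range(n_states)]
    return pi, A, B


def hmm_distance_matrix(seqs, models):
    """Assumed distance: symmetrised, length-normalised log-likelihood distance between HMMs."""
    L = [[_forward(s, *m)[2] for m in models] for s in seqs]  # L[i][j] = log P(seq_i | model_j)
    n = len(seqs)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = 0.5 * ((L[i][i] - L[i][j]) / len(seqs[i]) + (L[j][j] - L[j][i]) / len(seqs[j]))
            D[i][j] = D[j][i] = max(d, 0.0)  # clip: Baum-Welch only reaches local optima
    return D


def k_medoids(D, k, max_iter=100):
    """Voronoi-iteration k-medoids on a precomputed distance matrix; returns cluster labels."""
    n = len(D)
    # deterministic init: most central item, then repeatedly the item farthest from the medoids
    medoids = [min(range(n), key=lambda i: sum(D[i]))]
    while len(medoids) < k:
        rest = [i for i in range(n) if i not in medoids]
        medoids.append(max(rest, key=lambda i: min(D[i][m] for m in medoids)))
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: D[i][medoids[c]]) for i in range(n)]
        new = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c] or [medoids[c]]
            new.append(min(members, key=lambda i: sum(D[i][j] for j in members)))
        if new == medoids:
            break
        medoids = new
    return labels


def cluster_trajectories(seqs, n_states, n_symbols, k):
    """Trajectory -> HMM -> distance matrix -> k-medoids; returns (labels, D)."""
    models = [fit_hmm(s, n_states, n_symbols) for s in seqs]
    D = hmm_distance_matrix(seqs, models)
    return k_medoids(D, k), D
```

For example, `cluster_trajectories(seqs, n_states=2, n_symbols=3, k=2)` returns one label per trajectory together with the distance matrix. The distance is divided by trajectory length so that long and short trajectories are comparable, and negative values are clipped to zero because Baum-Welch only finds a local optimum, so a trajectory's own model is not guaranteed to score it highest. In practice, the standard R or Matlab packages mentioned in the abstract would replace the hand-written Baum-Welch and clustering code.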