Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
The Depth Problem: Identifying the Most Representative Units in a Data Group

Abstract

This paper presents a solution to the problem of identifying the units in a group or cluster that have the greatest degree of centrality and best characterize that group. The problem arises frequently when classifying data such as tumor types, gene expression profiles, or general biomedical data. It is particularly important in the common situation where many units do not properly belong to any cluster. Furthermore, in gene expression data classification, reliably identifying the most central units in a cluster makes it possible to recognize the most important samples in a particular pathological process. We propose a new depth function that identifies central units. Because our approach is based on a measure of distance or dissimilarity between any pair of units, it can be applied to any kind of multivariate data (continuous, binary, or multiattribute). This makes it valuable in many biomedical applications, which usually involve noncontinuous data from clinical, pathological, or biological sources. We validate the approach on artificial examples and apply it to empirical data. The results show that the approach performs well.
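
The abstract does not define the proposed depth function, so the following is only a minimal sketch of the general idea it describes: ranking the units of a group by centrality using nothing but pairwise dissimilarities. The score used here, `1 / (1 + mean distance to the other units)`, is an illustrative assumption and not the paper's depth function; the names `distance_depth`, `most_central`, and `hamming`, as well as the toy binary profiles, are likewise hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def hamming(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of positions at which two equal-length binary profiles differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_depth(units: Sequence[T], dist: Callable[[T, T], float]) -> List[float]:
    """Illustrative depth score (an assumption, not the paper's function).

    Each unit gets 1 / (1 + mean distance to the other units in the group),
    so units close to the rest of the group score near 1 and outlying
    units score lower.
    """
    n = len(units)
    if n < 2:
        return [1.0] * n
    depths = []
    for i, u in enumerate(units):
        total = sum(dist(u, v) for j, v in enumerate(units) if j != i)
        depths.append(1.0 / (1.0 + total / (n - 1)))
    return depths


def most_central(units: Sequence[T], dist: Callable[[T, T], float], k: int = 1) -> List[int]:
    """Indices of the k units with the highest illustrative depth."""
    depths = distance_depth(units, dist)
    return sorted(range(len(units)), key=lambda i: depths[i], reverse=True)[:k]


# Toy group of binary profiles; the last one is deliberately atypical.
group = [
    (1, 1, 0, 0),
    (1, 1, 0, 1),
    (1, 1, 1, 0),
    (0, 1, 0, 0),
    (0, 0, 1, 1),
]

depths = distance_depth(group, hamming)
central = most_central(group, hamming, k=1)
```

Because the score depends only on the `dist` callable, the same sketch works for continuous, binary, or mixed-attribute data by swapping in a suitable dissimilarity (for example a Euclidean or Gower-style distance), which mirrors the generality the abstract claims for distance-based approaches.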