The Depth Problem: Identifying the Most Representative Units in a Data Group

Irigoien Itziar; Mestres Francesc; Arenas Concepción


Abstract

This paper presents a solution to the problem of how to identify the units in groups or clusters that have the greatest degree of centrality and best characterize each group. This problem frequently arises in the classification of data such as tumor types, gene expression profiles or general biomedical data. It is particularly important in the common situation where many units do not properly belong to any cluster. Furthermore, in gene expression data classification, good identification of the most central units in a cluster enables recognition of the most important samples in a particular pathological process. We propose a new depth function that allows us to identify central units. As our approach is based on a measure of distance or dissimilarity between any pair of units, it can be applied to any kind of multivariate data (continuous, binary or multiattribute data). Therefore, it is very valuable in many biomedical applications, which usually involve noncontinuous data, such as clinical, pathological, or biological data sources. We validate the approach using artificial examples and apply it to empirical data. The results show the good performance of our statistical approach.
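
The abstract describes the method only in words. As a rough illustration of a depth built purely from pairwise dissimilarities, the sketch below ranks units by centrality using two quantities listed among this record's keywords, geometric variability and the proximity function, in their usual sample forms. The depth formula used here, 1 / (1 + proximity / variability), the function names, and the toy binary profiles are all assumptions made for illustration; the abstract does not state the authors' actual definition.

```python
from typing import Callable, List, Sequence, Tuple

Unit = Sequence[int]
SqDissimilarity = Callable[[Unit, Unit], float]


def mismatches(a: Unit, b: Unit) -> float:
    """Squared dissimilarity for binary profiles: number of differing positions."""
    return float(sum(x != y for x, y in zip(a, b)))


def geometric_variability(units: List[Unit], sq_diss: SqDissimilarity) -> float:
    """Sample geometric variability: half the mean squared dissimilarity over all pairs."""
    n = len(units)
    total = sum(sq_diss(a, b) for a in units for b in units)
    return total / (2 * n * n)


def proximity(x: Unit, units: List[Unit], sq_diss: SqDissimilarity, v: float) -> float:
    """Sample proximity of x to the group: mean squared dissimilarity minus variability."""
    mean_sq = sum(sq_diss(x, u) for u in units) / len(units)
    return mean_sq - v


def rank_by_depth(units: List[Unit], sq_diss: SqDissimilarity) -> List[Tuple[float, int]]:
    """Return (depth, index) pairs sorted from most central to least central.

    ASSUMPTION: depth = 1 / (1 + proximity / variability). This form is an
    illustrative choice, not taken from the abstract.
    """
    v = geometric_variability(units, sq_diss)
    if v == 0:
        raise ValueError("all units are identical; depth is undefined")
    scores = [
        (1.0 / (1.0 + proximity(u, units, sq_diss, v) / v), i)
        for i, u in enumerate(units)
    ]
    return sorted(scores, reverse=True)


# Hypothetical binary profiles: four similar units and one atypical unit (index 4).
profiles = [
    (1, 1, 0, 0),
    (1, 1, 0, 1),
    (1, 0, 0, 0),
    (1, 1, 1, 0),
    (0, 0, 1, 1),
]

ranking = rank_by_depth(profiles, mismatches)
for depth_value, index in ranking:
    print(index, round(depth_value, 3))
```

Because the sketch touches the data only through `sq_diss`, replacing `mismatches` with another squared dissimilarity would adapt it to continuous or mixed-type data, which is the property the abstract emphasizes.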


Bibliographic details

Source
IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2013, no. 1, pp. 161-172 (12 pages)
Authors
Irigoien Itziar; Mestres Francesc; Arenas Concepción
Author affiliation
University of the Basque Country, Donostia
Original format: PDF
Language: English
Keywords
Cluster analysis; central unit; data depth; depth function; gene expression data; geometric variability; kernel; proximity function

