Proximities in Statistics: Similarity and Distance

Abstract

We review similarity and distance measures used in statistics for clustering and classification. Our motivation is that most measures fail to make adequate use of a non-uniform distribution defined on the data or sample space. Such measures are mappings O × O → R_+, where O is either a finite set of objects or a vector space such as R^p, and R_+ is the set of non-negative real numbers. In most cases these mappings fulfil conditions such as symmetry and reflexivity. Further characteristics, such as transitivity or, in the case of distance measures, the triangle inequality, are also of concern. We start with the list of proximity measures compiled by Hartigan in 1967. It is good practice to pay special attention to the scale types of the variables involved, i.e. nominal (often binary), ordinal and metric (interval and ratio) scales. We are interested in the algebraic structure of proximities as suggested by Hartigan (1967) and Cormack (1971), in information-theoretic measures as discussed by Jardine and Sibson (1971), and in the probabilistic W-distance measure proposed by Skarabis (1970). The last measure combines distances between objects or vectors with their corresponding probabilities to improve overall discriminatory power. The idea is that rare events, i.e. sets of values with a very low probability of being observed, related to a pair of objects may be a strong indication that the pair is highly similar.
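
The conditions named in the abstract (non-negativity, reflexivity, symmetry, the triangle inequality) can be checked by brute force on a finite set O. The following is a minimal sketch, not taken from the paper: the function names are hypothetical, and "reflexivity" is read here as d(x, x) = 0 for a distance, which is an assumption about the intended definition.

```python
from itertools import product
import math


def euclidean(x, y):
    """Euclidean distance between two points of R^p given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def squared_euclidean(x, y):
    """Squared Euclidean distance: a dissimilarity that is not a metric."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def check_distance_axioms(objects, d, tol=1e-12):
    """Check the usual distance conditions for d on a finite set of objects.

    Assumption: reflexivity is interpreted as d(x, x) = 0.
    Returns a dict mapping each condition name to True or False.
    """
    objects = list(objects)
    pairs = list(product(objects, repeat=2))
    triples = product(objects, repeat=3)
    return {
        # d maps into R_+, the non-negative reals
        "non_negative": all(d(x, y) >= -tol for x, y in pairs),
        # d(x, x) = 0 for every object
        "reflexive": all(abs(d(x, x)) <= tol for x in objects),
        # d(x, y) = d(y, x)
        "symmetric": all(abs(d(x, y) - d(y, x)) <= tol for x, y in pairs),
        # d(x, z) <= d(x, y) + d(y, z)
        "triangle": all(d(x, z) <= d(x, y) + d(y, z) + tol for x, y, z in triples),
    }


if __name__ == "__main__":
    points = [(0.0,), (1.0,), (2.0,)]
    print(check_distance_axioms(points, euclidean))
    print(check_distance_axioms(points, squared_euclidean))
```

On the three collinear points used here, the Euclidean distance passes every check, while the squared Euclidean distance is symmetric and reflexive but violates the triangle inequality (4 > 1 + 1), which illustrates why that condition is listed separately for distance measures.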

Bibliographic record

  • Source
    Preferences and Similarities | 2006 | pp. 161-177 | 17 pages
  • Conference location: Udine (IT)
  • Author

    Hans-J. Lenz

  • Author affiliation

    International Centre for Mechanical Sciences (CISM);

    International School for the Synthesis of Expert Knowledge;

  • Conference organizer
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: TH1
  • Keywords
