Cardinality rules

Abstract

Data quality is always a problem; no matter how consistent people try to be, they will always enter the same data in different ways: Mr, Mr., MR, Mister, mister and so on. Sometimes the answer is simple: as the designer of the database, you control the data entry in some way. Perhaps you provide an interface with a combo box that contains the allowable entries, and the users pick the one they need. This is fine for data with a low cardinality.

Cardinality is a very useful word to add to your database dictionary if it isn't already there. It describes the relationship between the number of entries in a field and how much those entries vary. For example, suppose you have a table containing data for 1,000 customers. If there is a Date of Birth field, the table could contain 1,000 different dates. Or perhaps several customers share the same birthday, so there might be 992 different dates in 1,000 entries. Either way, the DOB field has a high cardinality. The Gender field, however, will have only two different entries in those 1,000 rows ('Male' and 'Female'), each appearing about 500 times, so it has a low cardinality.
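The DOB and Gender comparison above can be checked with a few lines of Python. This is only a minimal sketch: the `cardinality` helper, the generated sample columns, and the idea of reporting distinct values as a fraction of rows are illustrative assumptions, not something the article prescribes.

```python
from datetime import date, timedelta


def cardinality(values):
    """Return (distinct_count, row_count, distinct_ratio) for one column.

    Hypothetical helper for illustration: the ratio is simply the number of
    distinct values divided by the number of rows.
    """
    values = list(values)
    rows = len(values)
    distinct = len(set(values))
    ratio = distinct / rows if rows else 0.0
    return distinct, rows, ratio


# Made-up sample data matching the figures in the text: 1,000 customers.
# Gender column: two values, each appearing 500 times.
gender = ["Male", "Female"] * 500

# DOB column: 992 different dates, 8 of which are shared by two customers.
first_day = date(1950, 1, 1)  # arbitrary starting date (assumption)
unique_dates = [first_day + timedelta(days=i) for i in range(992)]
dob = unique_dates + unique_dates[:8]

for name, column in (("Gender", gender), ("DOB", dob)):
    distinct, rows, ratio = cardinality(column)
    print(f"{name}: {distinct} distinct values in {rows} rows (ratio {ratio:.3f})")
```

A ratio close to 1 (DOB) indicates high cardinality, where a pick list would be impractical; a ratio close to 0 (Gender) indicates low cardinality, the case where a combo box of allowable entries works well.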
