Data quality is always a problem: no matter how consistent people try to be, they will enter the same data in different ways (Mr, Mr., MR, Mister, mister and so on). Sometimes the answer is simple: as the designer of the database, you control data entry in some way, perhaps by providing an interface with a combo box that contains the allowable entries, from which users pick the one they need. This works well for data with a low cardinality. Cardinality is a very useful word to add to your database dictionary if it isn't already there. It describes the relationship between the number of entries in a field and the number of distinct values found among those entries. For example, suppose you have a table containing data for 1,000 customers. If there is a Date of Birth field, the table could contain 1,000 different dates. Or perhaps several customers share the same date of birth, so there might be 992 different dates in 1,000 entries. Either way, the DOB field has a high cardinality. The Gender field, however, will have only two different entries in those 1,000 rows ('Male' and 'Female'), each appearing about 500 times, so it has a low cardinality.
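To make the idea of cardinality concrete, here is a minimal Python sketch that counts the distinct values in a column and relates that count to the number of rows. The helper names `cardinality` and `distinct_ratio`, and the generated sample data, are illustrative assumptions rather than anything from a particular database product; in a real database the same count would typically come from a query over the table.

```python
from datetime import date, timedelta


def cardinality(values):
    """Return the number of distinct values in a column."""
    return len(set(values))


def distinct_ratio(values):
    """Return distinct values divided by total rows (0.0 for an empty column)."""
    values = list(values)
    return cardinality(values) / len(values) if values else 0.0


# Hypothetical sample data for 1,000 customers (not real records).
start = date(1970, 1, 1)
# 992 different dates; the first 8 dates are repeated to reach 1,000 rows.
dob = [start + timedelta(days=i % 992) for i in range(1000)]
# Two values, each appearing 500 times.
gender = ["Male" if i % 2 == 0 else "Female" for i in range(1000)]

print(cardinality(dob), distinct_ratio(dob))        # 992 0.992
print(cardinality(gender), distinct_ratio(gender))  # 2 0.002
```

A ratio close to 1 suggests a high-cardinality field such as DOB, where a combo box would be impractical; a ratio close to 0 suggests a low-cardinality field such as Gender, where a fixed list of allowable entries is a reasonable way to control data entry.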