Data mining is the search for novel, actionable information within data. The number of records being analyzed is only one factor, and perhaps a small one, in determining the complexity of a given data mining technique. Most of the complexity in data mining arises from the distribution of values contained in the data, not from the number of records. In this paper we use straightforward histogram-based visualizations to gain insight into how a well-studied data mining technique, the naive-Bayes classifier, performs under various discretization schemes for both continuous and discrete values. The resulting visualization system gives users a tool that describes the underlying model of the data used by the classifier. Exploratory visualizations of the distributions of the training data can be selected based on expert domain knowledge and then combined to apply to the test data.