The Partition platform recursively partitions data according to a relationship between the predictors and the response values, creating a decision tree. Variations of partitioning go by many names and brand names: decision trees, CART™, CHAID™, C4.5, C5, and others. The technique is often considered a data mining method for the following reasons:
• it is useful for exploring relationships without requiring a good prior model
• it handles large problems easily
• the results are interpretable
A classic application of partitioning is to create a diagnostic heuristic for a disease. Given symptoms and outcomes for a number of subjects, partitioning can be used to generate a hierarchy of questions to help diagnose new patients.
Predictors can be either continuous or categorical (nominal or ordinal). If a predictor is continuous, then the splits are created by a cutting value. The sample is divided into values below and above this cutting value. If a predictor is categorical, then the sample is divided into two groups of levels.
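The two cases can be pictured with a minimal sketch (this is an illustration, not the platform's implementation; the helper names below are hypothetical). For a continuous predictor, the candidate cutting values fall between consecutive sorted values; for a categorical predictor, the candidates are the ways of dividing the levels into two groups:

```python
from itertools import combinations

def continuous_split_points(values):
    """Candidate cutting values for a continuous predictor:
    midpoints between consecutive sorted unique values."""
    uniq = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(uniq, uniq[1:])]

def categorical_splits(levels):
    """Candidate splits for a categorical predictor: every way of
    dividing the levels into two non-empty groups."""
    levels = list(levels)
    splits = []
    # Fix the first level in the left group to avoid mirror-image duplicates.
    rest = levels[1:]
    for r in range(len(rest) + 1):
        for combo in combinations(rest, r):
            left = [levels[0], *combo]
            right = [lv for lv in levels if lv not in left]
            if right:
                splits.append((left, right))
    return splits

print(continuous_split_points([1.2, 3.5, 3.5, 7.0]))   # [2.35, 5.25]
print(categorical_splits(["A", "B", "C"]))
# [(['A'], ['B', 'C']), (['A', 'B'], ['C']), (['A', 'C'], ['B'])]
```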
The response can also be either continuous or categorical (nominal or ordinal). If the response is continuous, then the platform fits the means of the response values, and the split is chosen to minimize the sum of squared errors. If the response is categorical, then the fitted values are the estimated probabilities for the levels of the response, and the split is chosen to minimize the residual log-likelihood chi-square.
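As a rough illustration of these two criteria (a simplified sketch under stated assumptions, not the platform's exact computation), a candidate split can be scored by the total sum of squared errors around the child-node means for a continuous response, or by the residual deviance computed from the fitted probabilities for a categorical response; the split that minimizes the score is preferred:

```python
import math

def sse(values):
    """Sum of squared errors around the group mean (continuous response)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((y - m) ** 2 for y in values)

def deviance(labels):
    """Residual deviance (-2 * log-likelihood), using the observed
    proportions in the group as the fitted probabilities (categorical response)."""
    n = len(labels)
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return -2 * sum(c * math.log(c / n) for c in counts.values())

def continuous_criterion(left_y, right_y):
    """Split score for a continuous response: total SSE after the split."""
    return sse(left_y) + sse(right_y)

def categorical_criterion(left_labels, right_labels):
    """Split score for a categorical response: residual deviance after the split."""
    return deviance(left_labels) + deviance(right_labels)

# Example: scoring one candidate split for each kind of response
print(continuous_criterion([1.0, 1.2, 0.9], [3.8, 4.1, 4.0]))
print(categorical_criterion(["yes", "yes", "no"], ["no", "no", "no"]))
```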
For more information about split criteria, see Statistical Details for the Partition Platform.
For more information about recursive partitioning, see Hawkins and Kass (1982) and Kass (1980).