The K Nearest Neighbors report contains a separate report for each response variable. Each response variable report contains information about the fitted model for that response, including a Model Selection report and summary information for each of the K models that were fit. The report shows tables for the training set and, if you used validation, for the validation and test sets.
The Model Selection report displays a solution path plot across values of K, based on the Misclassification Rate for a categorical response or the RASE for a continuous response. By default, the slider is placed at the value of K that corresponds to the best-performing model. Drag the slider to change the value of K shown in the report.
The statistics reported depend on the modeling type of the response. Each row in the summary tables corresponds to a model that uses the K nearest neighbors, where K ranges from 1 to the value that you specified for Number of Neighbors, K in the launch window.
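To illustrate why one launch produces a whole family of models, the sketch below computes predictions for every K from 1 to a maximum in a single pass: the training rows are sorted by distance once per query row, and each K simply uses a longer prefix of that ordering. This is a hypothetical, minimal sketch and not JMP's implementation. The function name `knn_path` is made up, and it assumes Euclidean distance on the raw predictor values, the mean of the neighbors' responses for a continuous response, and a majority vote for a categorical response with vote ties broken in favor of the level found first among the nearest neighbors.

```python
import math
from collections import Counter


def knn_path(train_X, train_y, query_X, k_max, categorical=False):
    """Return {K: [prediction for each query row]} for K = 1..k_max.

    Hypothetical sketch. Assumptions (not confirmed for JMP): Euclidean
    distance on raw predictors, neighbor mean for a continuous response,
    majority vote for a categorical response, and vote ties going to the
    level seen first among the nearest neighbors.
    """
    if not 1 <= k_max <= len(train_X):
        raise ValueError("k_max must be between 1 and the number of training rows")
    path = {k: [] for k in range(1, k_max + 1)}
    for q in query_X:
        # Sort training rows by distance once; every K reuses this order.
        order = sorted(range(len(train_X)), key=lambda i: math.dist(q, train_X[i]))
        neighbors = [train_y[i] for i in order[:k_max]]
        for k in range(1, k_max + 1):
            nearest = neighbors[:k]
            if categorical:
                # most_common keeps first-encountered order among equal counts,
                # so a tied vote goes to the level closest to the query row.
                path[k].append(Counter(nearest).most_common(1)[0][0])
            else:
                path[k].append(sum(nearest) / k)
    return path
```

In this sketch, if the query rows are the training rows themselves, each row counts as its own nearest neighbor; how JMP treats that case is not stated in this section.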
By default, in addition to the Model Selection graph, the K Nearest Neighbors report for a continuous response contains a summary table.
An asterisk marks the model for the value of K that has the smallest RASE. The report for a continuous response contains the following columns:
K
Number of nearest neighbors used in the model. K ranges from 1 to the Number of Neighbors, K that you specified in the launch window.
Count
Number of observations.
RSquare
The RSquare value for the model.
RASE
Root average squared prediction error for the model, that is, the square root of SSE divided by Count. The model with the smallest RASE is marked with an asterisk. If there are tied RASE values, the model with the smallest K is marked with the asterisk.
SSE
Sum of squared errors for the model.
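The columns above can be related to one another with a short calculation. The sketch below computes Count, RSquare, RASE, and SSE from actual and predicted values, then picks the model to mark with the asterisk: the smallest RASE, with ties going to the smallest K. The function names are hypothetical, and RSquare is assumed here to be the usual 1 − SSE/SST with SST taken around the mean of the actual values in the same set; the section itself does not give the formula, and the sketch assumes the actual values are not all equal.

```python
import math


def continuous_summary(actual, predicted):
    """Hypothetical sketch of one summary-table row for a continuous response."""
    n = len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    mean = sum(actual) / n
    # Assumption: RSquare = 1 - SSE / SST, with SST around this set's mean.
    sst = sum((a - mean) ** 2 for a in actual)
    return {
        "Count": n,
        "RSquare": 1 - sse / sst,
        "RASE": math.sqrt(sse / n),  # root average squared prediction error
        "SSE": sse,
    }


def best_k(summaries, key="RASE"):
    """Return the K to mark: smallest value of `key`, ties broken by smallest K."""
    return min(summaries, key=lambda k: (summaries[k][key], k))
```

Comparing the tuple `(RASE, K)` makes the tie rule explicit: two models with equal RASE are ordered by K, so the simpler model with fewer neighbors wins.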
By default, in addition to the Model Selection graph, the K Nearest Neighbors report for a categorical response contains a summary table, a confusion matrix, and a mosaic plot.
An asterisk marks the model for the value of K that has the smallest misclassification rate. The report for a categorical response contains the following columns:
K
Number of nearest neighbors used in the model. K ranges from 1 to the Number of Neighbors, K that you specified in the launch window.
Count
Number of observations.
RSquare
The Entropy RSquare for the model.
Misclassification Rate
Proportion of observations misclassified by the model. This is calculated as Misclassifications divided by Count. The model with the smallest misclassification rate is marked with an asterisk. If there are tied misclassification rates, the model with the smallest K is marked with the asterisk.
Misclassifications
Number of observations that are incorrectly predicted by the model.
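For a categorical response, the misclassification columns follow directly from comparing actual and predicted levels. The hypothetical sketch below computes Count, Misclassifications, and Misclassification Rate (Misclassifications divided by Count), and selects the asterisked model as the smallest rate with ties going to the smallest K. It deliberately omits the Entropy RSquare column, which depends on the model's predicted probabilities rather than on the predicted levels alone.

```python
def categorical_summary(actual, predicted):
    """Hypothetical sketch of one summary-table row for a categorical response."""
    n = len(actual)
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return {
        "Count": n,
        "Misclassifications": wrong,
        "Misclassification Rate": wrong / n,
    }


def best_k(summaries, key="Misclassification Rate"):
    """Return the K to mark: smallest rate, ties broken by smallest K."""
    return min(summaries, key=lambda k: (summaries[k][key], k))
```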
By default, a Confusion Matrix Report is shown for the model with the smallest Misclassification Rate. If there are ties for the smallest misclassification rate, a report is shown for the model with the smallest K. The Confusion Matrix Report contains confusion matrices and confusion rates matrices. A confusion matrix is a two-way classification of actual and predicted responses. A confusion rates matrix is the confusion matrix with each count divided by its row total. If you use validation, confusion matrices and confusion rates matrices also appear for the validation and test sets. Use the Confusion Matrix Report and the misclassification rates to evaluate your model.
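The relationship between the two matrices can be shown in a few lines. The hypothetical sketch below tallies a confusion matrix as nested dictionaries and derives the confusion rates matrix by dividing each count by its row total. It assumes rows are actual levels and columns are predicted levels, and it uses the sorted set of observed levels unless you pass `levels` explicitly; an empty row yields rates of 0.0 rather than a division by zero.

```python
def confusion_matrices(actual, predicted, levels=None):
    """Return (counts, rates) as {actual_level: {predicted_level: value}}.

    Hypothetical sketch; assumes rows are actual levels and columns are
    predicted levels.
    """
    if levels is None:
        levels = sorted(set(actual) | set(predicted))
    counts = {a: {p: 0 for p in levels} for a in levels}
    for a, p in zip(actual, predicted):
        counts[a][p] += 1
    rates = {}
    for a in levels:
        total = sum(counts[a].values())  # row total for this actual level
        rates[a] = {p: (counts[a][p] / total if total else 0.0) for p in levels}
    return counts, rates
```

Each row of `rates` sums to 1 whenever that actual level occurs, so off-diagonal entries read as the share of that level's observations predicted as some other level.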
Tip: If you change the position of the slider in the solution path plot, an additional Confusion Matrix Report is displayed for the chosen value of K. Use the additional report to compare an alternative model to the default best model.
By default, a mosaic plot is shown for the model with the smallest Misclassification Rate. If there are ties for the smallest misclassification rate, a mosaic plot is shown for the model with the smallest K. A mosaic plot is a stacked bar chart in which each segment is proportional to its group’s frequency count. For more information about mosaic plots, see Mosaic Plot in Basic Analysis. If you use validation, mosaic plots are also shown for the validation and test sets.
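To make the proportionality concrete, the hypothetical sketch below computes mosaic geometry on a unit square: each bar's width is proportional to the number of observations at one actual level, and each stacked segment's height is the share of those observations predicted at a given level. The function name is made up, and the choice of actual levels on the horizontal axis is an assumption for illustration; consult the report itself for its axis layout.

```python
def mosaic_rectangles(actual, predicted):
    """Return (actual_level, predicted_level, x0, width, y0, height) tuples.

    Hypothetical sketch on a unit square. Assumption: bars correspond to
    actual levels and stacked segments to predicted levels.
    """
    levels_a = sorted(set(actual))
    levels_p = sorted(set(predicted))
    n = len(actual)
    rects = []
    x0 = 0.0
    for a in levels_a:
        row = [p for act, p in zip(actual, predicted) if act == a]
        width = len(row) / n  # bar width proportional to the level's count
        y0 = 0.0
        for p in levels_p:
            height = row.count(p) / len(row)  # segment share within the bar
            rects.append((a, p, x0, width, y0, height))
            y0 += height
        x0 += width
    return rects
```

Because every segment's area is width × height = (count in cell) / n, the area of each rectangle is proportional to that cell's frequency count, which is what makes the plot a visual counterpart of the confusion matrix.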
Tip: If you change the position of the slider in the solution path plot, the mosaic plot updates to display the results for the chosen value of K.