RMSE

The RMSE tab contains the following elements:

• Plot P(disease) versus rank of observations of selected methods button

Select models of interest in the oneway plot by dragging a rectangle over them with the mouse, then click this button to generate the plot.

The plot shows observations with a specific value of the dependent variable. The vertical axis gives the model's predicted probability of that value. Each observation has multiple points, one from each cross validation iteration. The horizontal axis gives the observations ranked by their predicted probability of that value, averaged over cross validation iterations.

Without a model, the predicted probability of the dependent variable value equals the proportion of observations with that value in the entire data table. A dashed line is drawn at this height. The further an observation is to the left, the better it was predicted across cross validation iterations. If most of the points are above the dashed line, the model is predicting better than chance for this value.
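The ranking and the dashed reference line described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the software's own code: the function names and data are hypothetical, and the sketch assumes that rank 1 (the leftmost position) goes to the observation with the highest mean predicted probability.

```python
from statistics import mean

def rank_by_mean_probability(probs_by_obs):
    """Order observation ids from best to worst predicted.

    probs_by_obs maps an observation id to its predicted probabilities
    of the value of interest, one per cross validation iteration.
    """
    means = {obs: mean(probs) for obs, probs in probs_by_obs.items()}
    # Assumption: the highest mean probability gets rank 1 (leftmost).
    return sorted(means, key=means.get, reverse=True)

def no_model_probability(labels, value):
    """Proportion of observations with the value (dashed line height)."""
    return sum(1 for y in labels if y == value) / len(labels)

# Hypothetical data: eight observations, three of which have the value 1.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
# Predicted probabilities for the three observations with value 1,
# from two cross validation iterations each.
probs_by_obs = {"a": [0.9, 0.8], "b": [0.4, 0.6], "c": [0.7, 0.75]}

ranked = rank_by_mean_probability(probs_by_obs)
baseline = no_model_probability(labels, 1)
points = [p for probs in probs_by_obs.values() for p in probs]
share_above = sum(p > baseline for p in points) / len(points)
print(ranked, baseline, share_above)
```

A share of points above the baseline that is well over one half corresponds to the "most of the points are above the dashed line" reading.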

• Oneway Analysis of Performance Criterion by MODEL plot

This plot shows the performance criterion (RMSE, AUC, Accuracy, or Harrell's C) for each model for each cross validation iteration in the form of a oneway analysis of variance. Note that smaller values of RMSE indicate better performance, whereas larger values of AUC, Accuracy, and Harrell's C indicate better performance. Without any model, the predicted probability for a response category equals the proportion of the category in the training data set, and the prediction of a continuous response equals the average in the training data set. The solid black horizontal reference line is the median among cross validation iterations of the performance criterion estimated without any model. The dashed horizontal lines above and below the solid one are the levels of the whiskers in a box plot for these no-model estimates. A model whose cross validated criterion is near or on the wrong side of this baseline range is unreliable, and is likely of little worth for predicting new observations.
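For a continuous response, the no-model reference lines can be approximated as follows. This is a minimal sketch, not the software's implementation: the function names and fold data are hypothetical, and the whiskers are assumed to follow Tukey's convention (the most extreme values within 1.5 times the interquartile range of the quartiles), which may differ from the quartile and whisker rules the software actually uses.

```python
import math
from statistics import mean, median, quantiles

def no_model_rmse(train_y, test_y):
    """RMSE when every held-out value is predicted by the training mean."""
    baseline = mean(train_y)
    return math.sqrt(mean((y - baseline) ** 2 for y in test_y))

def whisker_levels(values):
    """Assumed Tukey whiskers: extreme values within 1.5 * IQR of the box."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    reach = 1.5 * (q3 - q1)
    lower = min(v for v in values if v >= q1 - reach)
    upper = max(v for v in values if v <= q3 + reach)
    return lower, upper

# Hypothetical (training, hold-out) response values for each iteration.
folds = [
    ([1.0, 2.0, 3.0], [2.0, 4.0]),
    ([2.0, 4.0, 3.0], [1.0, 2.0]),
    ([1.0, 4.0, 2.0], [3.0, 2.0]),
    ([3.0, 2.0, 2.0], [1.0, 4.0]),
]
baseline_rmse = [no_model_rmse(train, test) for train, test in folds]

solid_line = median(baseline_rmse)                   # solid reference line
dashed_low, dashed_high = whisker_levels(baseline_rmse)  # dashed lines
print(solid_line, dashed_low, dashed_high)
```

A model whose cross validated RMSE values sit at or above `dashed_low` would fall near or on the wrong side of the baseline range in this sketch.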

The box plots in the oneway display show how each model performed across cross validation iterations. The center horizontal line of each box plot is a measure of overall performance, and the spread of the red box plot shows the variability of the predictions. Models with smaller spread are more consistent. Hsu's multiple comparisons with the best (MCB) comparison circles are displayed on the right. Use these as rough guides for determining which models are significantly different from the best one. Note that the independence assumptions behind MCB are violated in this case, so the comparisons are only approximate.

The tables below the plot provide detailed statistics for the models. The Means table is useful for obtaining exact mean values of the performance criterion. Right-click the Mean column title and select Sort by Column to sort the methods from best to worst.
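The same best-to-worst ordering can be reproduced outside the software from exported per-iteration values. The sketch below is illustrative only: the function name, model names, and scores are hypothetical. The sort direction depends on the criterion, ascending for RMSE and descending for AUC, Accuracy, or Harrell's C.

```python
from statistics import mean

def sort_models_by_mean(scores_by_model, smaller_is_better=True):
    """Return (model, mean criterion) pairs ordered from best to worst."""
    means = {model: mean(scores) for model, scores in scores_by_model.items()}
    return sorted(
        means.items(),
        key=lambda item: item[1],
        reverse=not smaller_is_better,  # descending when larger is better
    )

# Hypothetical RMSE values from two cross validation iterations per model.
rmse_by_model = {
    "forest": [0.8, 0.9],
    "lasso": [1.0, 1.2],
    "knn": [0.7, 0.75],
}

for model, value in sort_models_by_mean(rmse_by_model):
    print(f"{model}: {value:.3f}")
```

For AUC, Accuracy, or Harrell's C, pass `smaller_is_better=False` so that the largest mean comes first.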