The Test Set Model Comparison process enables you to compare the relative abilities of different predictive models to make consistent, valid predictions. It does this by computing performance metrics on one or more test sets for each of the selected models and then displaying the results, side by side, in a pair of graphs.
It is assumed that you are familiar with the Predictive Modeling processes, have settled upon one or more of them to compare, and have saved specific settings (see Saving and Loading Settings) for each of the models to be compared. A saved setting can be edited either in the dialogs for that process or in the Test Set Model Comparison process itself. If you are not familiar with the individual processes that you want to use, consult the specific chapters for those processes for more information.
At least two SAS data sets are needed to run the Test Set Model Comparison. The first is the training data set. This is the primary data set you are modeling and it is specified as the Input Data Set for each of the models to be compared.
In addition to your primary data set, you must specify one or more test data sets. These are the data sets you are using to evaluate the effectiveness of each of the predictive models for making predictions on your data. Test data sets must be saved in one folder and are specified on the Test Sets tab of this process.
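The idea of scoring every model against every test set can be sketched in a few lines of Python. This is only an illustration of the general technique, not the computation that the process performs: the names `accuracy` and `compare_models` are hypothetical, models are assumed to be plain functions that map a feature vector to a class label, and classification accuracy is assumed as the performance metric.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple[Sequence[float], int]          # (features, true class label)
Model = Callable[[Sequence[float]], int]   # maps features to a predicted label


def accuracy(model: Model, rows: List[Row]) -> float:
    """Fraction of rows whose predicted label matches the true label."""
    if not rows:
        raise ValueError("test set is empty")
    hits = sum(1 for features, label in rows if model(features) == label)
    return hits / len(rows)


def compare_models(
    models: Dict[str, Model],
    test_sets: Dict[str, List[Row]],
) -> Dict[str, Dict[str, float]]:
    """Return {model name: {test set name: accuracy}} for side-by-side comparison."""
    return {
        model_name: {
            set_name: accuracy(model, rows) for set_name, rows in test_sets.items()
        }
        for model_name, model in models.items()
    }


if __name__ == "__main__":
    # Two toy models and one toy test set, all invented for this sketch.
    models = {
        "threshold_at_zero": lambda x: int(x[0] > 0),
        "always_one": lambda x: 1,
    }
    test_sets = {"test_a": [([1.0], 1), ([-1.0], 0), ([2.0], 1), ([-0.5], 0)]}
    for name, scores in compare_models(models, test_sets).items():
        print(name, scores)
```

Each model is scored on the same held-out rows, so the resulting numbers can be compared directly across models and across test sets.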
Settings for running the Nicardipine data set described in Nicardipine through each of the predictive processes (Discriminant Analysis, Distance Scoring, General Linear Model Selection, K Nearest Neighbors, Logistic Regression, Partial Least Squares, Partition Trees, and Radial Basis Machine) are included with JMP Clinical. These settings are located in the default Settings folder within the JMP Clinical directory (typically C:\Program Files\SASHome\JMPGenomics\15\Genomics\Settings). Each of these individual predictive models and its settings was described previously in this manual. The default settings for each predictive model were modified, as described below, for use in this example.
To generate the training and test data sets used in this example, the samplegmdata_numgeno.sas7bdat data set was divided into two subsets. The first subset, which contained the records for individuals 1 through 400, was saved as the samplegmdata_numgeno_train.sas7bdat training data set. Data for individuals 401 through 611 were saved in a new samplegmdata_numgeno_test.sas7bdat test data set.
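The split described above can be sketched as follows. This is a minimal illustration in plain Python rather than the procedure used to build the sample data sets: the function name `split_by_individual` and the `individual` key are hypothetical, and each record is assumed to carry an integer individual number.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def split_by_individual(
    records: List[Record],
    id_key: str = "individual",       # hypothetical name of the ID field
    last_training_id: int = 400,      # individuals 1..400 go to training
) -> Tuple[List[Record], List[Record]]:
    """Split records into (training, test) by individual number."""
    train = [r for r in records if int(r[id_key]) <= last_training_id]
    test = [r for r in records if int(r[id_key]) > last_training_id]
    return train, test


if __name__ == "__main__":
    # Stand-in for the 611 individuals in the sample data set.
    records = [{"individual": i} for i in range(1, 612)]
    train, test = split_by_individual(records)
    print(len(train), len(test))  # prints "400 211"
```

Splitting on the individual number, rather than at random, reproduces the same two subsets every time the split is run.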
Important: Both the model comparison and respective main method setting files for any sample settings that you run must be placed in your user WorkflowResults folder¹ before you run them. If you ever clear this folder, you should replenish it with the setting files from the Settings folder².
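If you prefer to script the replenishing step, a copy along the following lines might serve. It is a sketch under stated assumptions: the function name `replenish_settings` is hypothetical, both folder locations are passed in by the caller because they vary by installation, and every regular file in the Settings folder is copied because the setting file extension is not assumed here.

```python
import shutil
from pathlib import Path
from typing import List


def replenish_settings(settings_dir: Path, workflow_dir: Path) -> List[Path]:
    """Copy every regular file in settings_dir into workflow_dir.

    Returns the paths of the copies. Files that already exist in
    workflow_dir under the same name are overwritten.
    """
    workflow_dir.mkdir(parents=True, exist_ok=True)
    copied: List[Path] = []
    for source in sorted(settings_dir.iterdir()):
        if source.is_file():  # skip subfolders
            target = workflow_dir / source.name
            shutil.copy2(source, target)  # copy2 also preserves timestamps
            copied.append(target)
    return copied
```

Call it with the path to your Settings folder and the path to your user WorkflowResults folder. Because same-named files are overwritten, any edits made to the copies in WorkflowResults would be lost, so back those up first if you want to keep them.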