The Test Set Model Comparison process enables you to compare the relative abilities of different predictive models to make consistent, valid predictions. It does this by computing performance metrics on one or more test sets for each of the selected models and then displaying the results, side by side, in a pair of graphs.
It is assumed that you are familiar with the Predictive Modeling processes, have settled upon one or more of them to compare, and have saved specific settings (see Saving and Loading Settings) for each of the models to be compared.
A saved setting can be edited either in the dialogs for that process or in the Test Set Model Comparison process itself. If you are not familiar with the individual processes that you want to use, consult the specific chapters for those processes for more information.
At least two SAS data sets are needed to run the Test Set Model Comparison process. The first is the training data set. This is the primary data set that you are modeling, and it is specified as the Input Data Set for each of the models to be compared.
In addition to your primary data set, you must specify one or more test data sets. These are the data sets that you use to evaluate how well each of the predictive models makes predictions on your data. All test data sets must be saved in a single folder and are specified on the Test Sets tab of this process.
Settings for running the Nicardipine data set (described in Nicardipine) through each of the predictive processes (Discriminant Analysis, Distance Scoring, General Linear Model Selection, K Nearest Neighbors, Logistic Regression, Partial Least Squares, Partition Trees, and Radial Basis Machine) are included with JMP Clinical. These settings are located in the default Settings folder within the JMP Clinical directory (typically C:\Program Files\SASHome\JMP\10\Life Sciences\Settings). Each of these predictive models and its settings was described previously in this manual. The default settings for each predictive model were modified, as described below, for use in this example.
To generate the training and test data sets used in this example, the adsl_dii.sas7bdat data set, which is included with JMP Clinical and contains observations on 906 patients, was divided into two equal-sized subsets, each containing the data on 453 patients. The first subset, which contained the records for patients 1 through 453, was saved as the adsl_dii_training_set.sas7bdat data set. Data for patients 454 through 906 were saved in a new adsl_dii_test_set.sas7bdat data set. Both data sets were saved in a new TSMC folder placed in the Sample Data\Nicardipine folder.
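The positional split described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the procedure used to build the sample data sets: the function name split_by_position is hypothetical, and a simple list of patient numbers stands in for the actual records in adsl_dii.sas7bdat.

```python
def split_by_position(records, n_training):
    """Return (training, test): the first n_training records and the remainder."""
    if not 0 < n_training < len(records):
        raise ValueError("n_training must leave at least one record in each subset")
    return records[:n_training], records[n_training:]


# Hypothetical stand-in for the 906 patient records, numbered 1 through 906.
patients = list(range(1, 907))

training, test = split_by_position(patients, 453)

print(len(training), len(test))        # 453 453
print(training[0], training[-1])       # 1 453
print(test[0], test[-1])               # 454 906
```

Splitting by position keeps the example reproducible. It also means that the two subsets are only comparable if the order of the records is unrelated to the outcome being modeled.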
Important: For any sample settings that you run, both the model comparison setting file and the setting files for the respective main methods must be placed in your user WorkflowResults folder [1] before you run them. If you ever clear this folder, you should replenish it with the setting files from the Settings folder [2].
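If you prefer to script the replenishing step, it could look something like the following Python sketch. The function name replenish_settings is hypothetical, and both folder locations are passed in as arguments because they depend on your installation and user profile. The sketch also assumes that every file in the Settings folder should be copied.

```python
import shutil
from pathlib import Path


def replenish_settings(settings_dir, workflow_dir):
    """Copy every file in settings_dir into workflow_dir and return the copied names.

    Files of the same name that already exist in workflow_dir are overwritten.
    Subfolders of settings_dir are skipped.
    """
    source = Path(settings_dir)
    target = Path(workflow_dir)
    target.mkdir(parents=True, exist_ok=True)  # create the target folder if it is missing

    copied = []
    for item in sorted(source.iterdir()):
        if item.is_file():
            shutil.copy2(item, target / item.name)
            copied.append(item.name)
    return copied
```

Call the function with the path to your Settings folder and the path to your user WorkflowResults folder. The returned list shows which setting files were copied.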