Process Description

Learning Curve Model Comparison

How do you determine whether you have enough observations to predict an outcome adequately? One way to begin addressing this problem is simply to fit different models on data sets of different sizes, one at a time, and compare the results. However, very wide data sets are easy to overfit, so some type of validation is recommended. In addition, given the huge number of possible modeling variations, comparing more than a few models by hand can quickly become overwhelming.

The Learning Curve Model Comparison process constructs and compares learning curves for the predictive model settings that you select. A learning curve plots predictive performance against sample size, using cross validation to evaluate a model at each size, and thereby reveals the influence of sample size on the accuracy and variability of the model.
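The idea behind a learning curve can be illustrated outside of JMP Genomics with a short sketch. The Python code below is not the code that the process runs. It is a minimal illustration, assuming synthetic two-class data and a simple nearest-centroid classifier as a stand-in for a real predictive model. All function names (make_data, fit_centroids, predict, cv_accuracy, learning_curve) are hypothetical. For each sample size, the sketch draws a random subset, runs k-fold cross validation on it, and records the mean held-out accuracy and its spread across folds.

```python
import random
import statistics


def make_data(n, n_features, rng):
    """Synthetic two-class data (an assumption for illustration only).
    Class 1 is shifted by +1.0 in the first feature; the rest is noise."""
    rows = []
    for i in range(n):
        label = i % 2
        x = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        x[0] += 1.0 * label
        rows.append((x, label))
    rng.shuffle(rows)
    return rows


def fit_centroids(train):
    """Nearest-centroid classifier: store the per-class mean of each feature."""
    centroids = {}
    for label in {y for _, y in train}:
        members = [x for x, y in train if y == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared distance)."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sqdist(centroids[label]))


def cv_accuracy(data, k, rng):
    """Return the held-out accuracy of each fold in k-fold cross validation."""
    data = data[:]
    rng.shuffle(data)
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        centroids = fit_centroids(train)
        correct = sum(predict(centroids, x) == y for x, y in test)
        scores.append(correct / len(test))
    return scores


def learning_curve(data, sizes, k=5, seed=0):
    """For each sample size, cross-validate on a random subset of that size
    and return (size, mean accuracy, standard deviation across folds)."""
    rng = random.Random(seed)
    curve = []
    for n in sizes:
        subset = rng.sample(data, n)
        scores = cv_accuracy(subset, k, rng)
        curve.append((n, statistics.mean(scores), statistics.stdev(scores)))
    return curve


rng = random.Random(42)
data = make_data(400, 20, rng)
curve = learning_curve(data, sizes=[20, 50, 100, 200, 400])
for n, mean_acc, sd_acc in curve:
    print(f"n={n:4d}  mean accuracy={mean_acc:.3f}  sd across folds={sd_acc:.3f}")
```

Plotting the mean accuracy against the sample size, with the standard deviation as an error bar, gives one learning curve. Repeating the procedure for each saved model setting and overlaying the curves is, in spirit, the comparison that this process automates.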

What do I need?

To run the Learning Curve Model Comparison process, your Input Data Set must be in the wide format. You must configure the appropriate data import engine and each of the predictive modeling processes to be used in the comparison, and save all of their settings in a single settings folder. Finally, you must create an output folder to hold all of the resulting data sets, analyses, graphics, and other output.

It is assumed that you are familiar with the Introduction to Predictive Modeling processes, have settled upon one or more of them to compare, and have saved specific settings (see Saving and Loading Settings) for each of the models to be compared.

Important: The Mode parameter (found on the Analysis tab) for each process must be set to Automate to allow processing with SAS code rather than using the interactive JMP mode.

A saved setting can be edited either in the dialog for that process or in the Learning Curve Model Comparison process itself. If you are not familiar with the individual processes that you want to use, consult the specific chapters for those processes for more information.

For detailed information about the files and data sets used or created by JMP Genomics software, see Files and Data Sets.

Output/Results

The output generated by this process is summarized in a Tabbed report. Refer to the Learning Curve Model Comparison output documentation for detailed descriptions and guides to interpreting your results.