Publication date: 07/24/2024

Example of Variable Importance for One Response

This example uses the Assess Variable Importance option in the Prediction Profiler to assess which variables are important in predicting a response modeled by a neural network. This option is helpful for models, like neural networks, that do not accommodate formal hypothesis tests.

Note that your results might differ from, but should resemble, those shown here. This example has two sources of random variability. First, the neural network is fit using k-fold cross-validation, which partitions the data into training and validation sets at random. Second, Monte Carlo sampling is used to calculate the factor importance indices.
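To illustrate why a random seed makes the partition reproducible, the following is a minimal Python sketch of random k-fold assignment. It is not JMP's implementation: the function name kfold_assignments, the row count of 20, and the use of Python's random module are assumptions for illustration only, and seeding Python with 123 does not reproduce the partition that JMP generates from the same seed.

```python
import random


def kfold_assignments(n_rows, n_folds=5, seed=None):
    """Randomly assign each row to one of n_folds folds of near-equal size.

    Hypothetical helper for illustration; not JMP's algorithm.
    """
    rng = random.Random(seed)      # a fixed seed fixes the shuffle order
    indices = list(range(n_rows))
    rng.shuffle(indices)           # random order of the rows
    folds = [0] * n_rows
    for position, row in enumerate(indices):
        folds[row] = position % n_folds   # deal rows out to folds in turn
    return folds


# Two runs with the same seed give the same partition.
same_a = kfold_assignments(20, n_folds=5, seed=123)
same_b = kfold_assignments(20, n_folds=5, seed=123)
print(same_a == same_b)

# Without a seed, the partition generally changes from run to run.
unseeded = kfold_assignments(20, n_folds=5)
print(unseeded)
```

Each fold serves once as the validation set while the remaining folds are used for training, so a different partition can lead to a slightly different fitted model.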

1. Select Help > Sample Data Folder and open Candy Bars.jmp.

2. Select Analyze > Predictive Modeling > Neural.

3. Select Calories from the Select Columns list and click Y, Response.

4. Select all other continuous columns from the Select Columns list and click X, Factor.

5. Click OK.

6. In the Neural Model Launch panel, select KFold from the list under Validation Method.

When you select KFold, the Number of Folds defaults to 5.

7. (Optional) Enter 123 next to Random Seed.

Note: Results vary due to the random nature of choosing a validation set in the Neural Network model. Entering the seed above enables you to reproduce the results shown in this example.

8. Click Go.

9. Click the red triangle next to Model NTanH(3) and select Profiler.

The Prediction Profiler is displayed at the very bottom of the report. Note the order of the factors for later comparison.

Because the factors are correlated, choose Dependent Resampled Inputs as the sampling method for assessing variable importance. This method takes the correlation among the factors into account.

10. Click the Prediction Profiler red triangle and select Assess Variable Importance > Dependent Resampled Inputs.

Figure 3.27 Dependent Resampled Inputs Report

The Variable Importance: Dependent Resampled Inputs report appears. Check that the Prediction Profiler cells have been reordered by the magnitude of the Total Effect indices in the report. In Figure 3.27, the Total Effect importance indices identify Total fat g and Carbohydrate g as the factors that have the most impact on the predicted response (0.437 and 0.28, respectively). Oz/pkg and Protein g also have positive Total Effects, though smaller in magnitude (0.185 and 0.112, respectively).

You might want to compare the importance indices obtained when the factors are assumed to be correlated with those obtained when the factors are assumed to be independent.

11. Click the Prediction Profiler red triangle and select Assess Variable Importance > Independent Resampled Inputs.

Figure 3.28 Independent Resampled Inputs Report

The resampled inputs option makes sense in this example, because the distributions involved are not uniform. The Variable Importance: Independent Resampled Inputs report shows that Total fat g and Carbohydrate g are still identified as having the most impact on the predicted value (0.514 and 0.379, respectively). Protein g and Oz/pkg also have positive Total Effects, though considerably smaller in magnitude (0.05 and 0.03, respectively). Compared to Figure 3.27, the Total Effect importance indices for Total fat g and Carbohydrate g are higher, whereas the indices for Oz/pkg and Protein g are drastically lower. This highlights how Variable Importance measures can vary with the resampling method used.
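As a rough illustration of how a Total Effect index can be estimated by Monte Carlo sampling when inputs are resampled independently, the following Python sketch uses a standard variance-based (Jansen-style) estimator. This is a sketch under stated assumptions, not JMP's algorithm: the function total_effects_independent, the toy data, and the toy model are all hypothetical, and the dependent resampling method is not reproduced here.

```python
import random
import statistics


def total_effects_independent(predict, data, n_samples=5000, seed=None):
    """Estimate total effect indices with independently resampled inputs.

    predict: function that maps a list of factor values to a prediction.
    data: observed rows, each a list of factor values.
    Each factor is resampled from its own observed values, which ignores
    any correlation among the factors. Hypothetical helper, not JMP's code.
    """
    rng = random.Random(seed)
    n_factors = len(data[0])
    columns = [[row[j] for row in data] for j in range(n_factors)]

    def draw():
        # Draw every factor independently from its observed values.
        return [rng.choice(col) for col in columns]

    base = [draw() for _ in range(n_samples)]
    fresh = [draw() for _ in range(n_samples)]
    y_base = [predict(x) for x in base]
    total_var = statistics.pvariance(y_base)

    effects = []
    for j in range(n_factors):
        squared_change = 0.0
        for x, x_new, y in zip(base, fresh, y_base):
            swapped = list(x)
            swapped[j] = x_new[j]   # change only factor j
            squared_change += (y - predict(swapped)) ** 2
        # Average squared change from varying factor j alone, scaled by
        # the total variance of the prediction.
        effects.append(squared_change / (2 * n_samples * total_var))
    return effects


# Toy data and a toy linear model standing in for a fitted predictor.
toy_rng = random.Random(0)
toy_data = [[toy_rng.gauss(0, 1), toy_rng.gauss(0, 1)] for _ in range(200)]


def toy_model(x):
    return 3.0 * x[0] + 1.0 * x[1]


effects = total_effects_independent(toy_model, toy_data, seed=123)
print(effects)   # the first factor should dominate
```

Because this sketch draws each factor separately, it can generate combinations of factor values that never occur together in correlated data. That is one reason indices computed under an independence assumption can differ from those computed with a method that respects the correlation.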
