Publication date: 07/08/2024

Statistical Details for Extrapolation Control Metrics

The Extrapolation Control option in the Prediction Profiler has two metrics that are used to determine whether a point is an extrapolation. The type of metric used depends on the type of model fit.

Leverage

In models that are fit in the Standard Least Squares personality of the Fit Model platform, the leverage at the factor settings is used as the default extrapolation metric.

The leverage of the ith observation, $h_{ii}$, is the ith diagonal entry of the matrix $X(X'X)^{-1}X'$, sometimes called the hat matrix. The leverage for a new prediction point is calculated as $h_{pred} = x_{pred}'(X'X)^{-1}x_{pred}$. Either of the following two criteria can be used to determine whether a prediction with leverage $h_{pred}$ is an extrapolation:

$h_{pred} > K \times \max(h_{ii})$, where K is a customizable multiplier

$h_{pred} > L \times p/n$, where L is a customizable multiplier, p is the number of variables, n is the number of observations, and p/n is the average leverage

You can use the Set Threshold Criterion option to specify which criterion is used and the value of the multiplier. The default values of the multipliers are K = 1 and L = 3.
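The two leverage criteria can be illustrated with a short NumPy sketch. This is not JMP's implementation; the function name `leverage_check` and the example data are hypothetical, and p is taken here to be the number of columns of the design matrix X (including the intercept), which is an assumption about how the variables are counted.

```python
import numpy as np

def leverage_check(X, x_pred, K=1.0, L=3.0):
    """Return h_pred and the outcome of both threshold criteria."""
    X = np.asarray(X, dtype=float)
    x_pred = np.asarray(x_pred, dtype=float)
    n, p = X.shape                      # assumption: p = number of columns of X
    XtX_inv = np.linalg.inv(X.T @ X)    # (X'X)^{-1}
    # Diagonal of the hat matrix X (X'X)^{-1} X', without forming the n x n matrix.
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    h_pred = float(x_pred @ XtX_inv @ x_pred)
    return {
        "h_pred": h_pred,
        "max_criterion": bool(h_pred > K * h.max()),  # h_pred > K * max(h_ii)
        "avg_criterion": bool(h_pred > L * p / n),    # h_pred > L * p / n
    }

# Hypothetical data: an intercept plus one factor observed at 0, 1, 2, 3, 4.
X = np.column_stack([np.ones(5), np.arange(5.0)])
inside = leverage_check(X, [1.0, 2.0])    # factor set to the center of the data
outside = leverage_check(X, [1.0, 10.0])  # factor set far beyond the observed range
```

With the default multipliers, the point at the center of the data is not flagged by either criterion, while the point at 10 is flagged by both.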

Note: Extrapolation control in profilers that are run from the Graph menu using a saved least squares model does not implement the leverage methodology. Instead, the Regularized Hotelling’s T² methodology is used.

Regularized Hotelling’s T²

In models other than least squares models, the Regularized Hotelling’s T² value is used as the default extrapolation metric. The T² values for the training data and the T² values for the prediction points are calculated as follows:

$$T_i^2 = (x_i - \bar{x})' \hat{\Sigma}^{-1} (x_i - \bar{x})$$

$$T_{pred}^2 = (x_{pred} - \bar{x})' \hat{\Sigma}^{-1} (x_{pred} - \bar{x})$$

where $\bar{x}$ is the vector of predictor means in the training data and $\hat{\Sigma}$ is the Schafer and Strimmer regularized covariance matrix estimator estimated on the training data. The target matrix used for the Schafer and Strimmer estimator is a diagonal covariance matrix. See Schafer and Strimmer (2005). In platforms that train models using observations with missing values, the covariance matrix is estimated with pairwise deletion.

Note: Categorical variables are converted to indicator variables for these calculations.
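As a rough illustration of a regularized Hotelling's T² calculation, the following NumPy sketch shrinks the sample covariance toward its diagonal and then evaluates the quadratic form for training and prediction points. The shrinkage intensity follows one published form of the Schafer and Strimmer estimator for a diagonal target with unequal variances; whether JMP uses exactly this form is an assumption. The function names and data are hypothetical, and the sketch handles neither missing values (pairwise deletion) nor categorical indicator coding.

```python
import numpy as np

def shrinkage_covariance(X):
    """Shrink the sample covariance toward its diagonal (assumed estimator form)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    W = Xc[:, :, None] * Xc[:, None, :]        # per-observation products, shape (n, p, p)
    W_bar = W.mean(axis=0)
    S = n / (n - 1) * W_bar                    # unbiased sample covariance
    var_S = n / (n - 1) ** 3 * ((W - W_bar) ** 2).sum(axis=0)  # estimated Var(s_ij)
    off = ~np.eye(p, dtype=bool)               # mask of off-diagonal entries
    denom = (S[off] ** 2).sum()
    lam = 0.0 if denom == 0 else float(np.clip(var_S[off].sum() / denom, 0.0, 1.0))
    S_shrunk = (1.0 - lam) * S                 # shrink the covariances toward zero
    S_shrunk[~off] = np.diag(S)                # variances are left unshrunk
    return S_shrunk, lam

def hotelling_t2(points, mean, cov):
    """(x - mean)' cov^{-1} (x - mean) for each row of points."""
    D = np.atleast_2d(np.asarray(points, dtype=float)) - mean
    return np.einsum("ij,ij->i", D, np.linalg.solve(cov, D.T).T)

rng = np.random.default_rng(0)
train = rng.normal(size=(50, 3))               # hypothetical training predictors
cov, lam = shrinkage_covariance(train)
mean = train.mean(axis=0)
t2_train = hotelling_t2(train, mean, cov)      # one T² value per training row
t2_pred = hotelling_t2([8.0, 8.0, 8.0], mean, cov)[0]  # a point far from the data
```

A prediction point far from the training data, such as the one above, yields a T² value much larger than any of the training values.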

The calculation of the threshold depends on the number of nonmissing T² values computed on the training data.

If there are ten or more nonmissing T² values, the threshold is set as follows:

Equation shown here

where

K is a customizable multiplier and is set to 3 by default

Equation shown here is the standard deviation of the T2 values.

If there are fewer than ten nonmissing T² values, the threshold is set using an F distribution quantile equivalent to a Kσ limit.

Equation shown here

where

q = Φ(K)

Φ(·) is the standard normal cumulative distribution function

K is a customizable multiplier and is set to 3 by default

Equation shown here

p is the number of parameters

n is the number of nonmissing T² values
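To make the relationship q = Φ(K) concrete, the following sketch uses Python's standard library to convert the multiplier K into the probability level at which the F quantile is evaluated. The function name `quantile_level` is hypothetical.

```python
from statistics import NormalDist

def quantile_level(K=3.0):
    """Return q = Phi(K), where Phi is the standard normal CDF."""
    return NormalDist().cdf(K)

q_default = quantile_level()     # default K = 3 gives q of roughly 0.99865
q_two_sigma = quantile_level(2)  # a smaller K gives a lower quantile level
```

With the default K = 3, the threshold therefore corresponds to roughly the 99.865th percentile, which matches the upper tail probability of a 3σ limit under a normal distribution.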

K Nearest Neighbors

If you select K Nearest Neighbors as the Extrapolation Type Option, k nearest neighbor distances are used to calculate both the extrapolation metric and the threshold. The following notation is used for this method.

X = the matrix of standardized predictors

$x_i$ = the ith point in the data

n = number of observations

p = number of predictors

k = number of near neighbors

$d(x_i, x_j)$ = the Euclidean distance between two points, $x_i$ and $x_j$

$x_i^{(k)}$ = the kth nearest neighbor of the ith point, $x_i$

For the factor settings defined by x, the extrapolation metric is $d(x, x^{(1)})$. This is the distance between the point defined by the factor settings and its first nearest neighbor in the data. The threshold is set using the following equation:

Equation shown here

where

Equation shown here is the mean of the pairwise distances between all points and their k neighbors

Equation shown here is the standard deviation of the pairwise distances between all points and their k neighbors.
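The ingredients of the K Nearest Neighbors method can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than JMP's implementation: the function name is hypothetical, predictors are standardized with the sample standard deviation, and the mean and standard deviation are taken over all n × k distances between each training point and its k nearest neighbors. The sketch returns these pieces separately and makes no assumption about how they are combined into the threshold.

```python
import numpy as np

def knn_extrapolation_parts(X_raw, x_new, k=3):
    """Return (metric, mean k-NN distance, sd of k-NN distances)."""
    X_raw = np.asarray(X_raw, dtype=float)
    n = X_raw.shape[0]
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    mu = X_raw.mean(axis=0)
    sd = X_raw.std(axis=0, ddof=1)            # assumption: sample standard deviation
    X = (X_raw - mu) / sd                     # standardized training predictors
    x = (np.asarray(x_new, dtype=float) - mu) / sd
    # Extrapolation metric: Euclidean distance from x to its first nearest neighbor.
    metric = float(np.sqrt(((X - x) ** 2).sum(axis=1)).min())
    # Distances between every pair of training points.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(D, np.inf)               # a point is not its own neighbor
    knn = np.sort(D, axis=1)[:, :k]           # each point's k nearest-neighbor distances
    return metric, float(knn.mean()), float(knn.std(ddof=1))

rng = np.random.default_rng(1)
X_raw = rng.normal(size=(30, 2))              # hypothetical training predictors
near = knn_extrapolation_parts(X_raw, X_raw[0], k=3)      # an observed point
far = knn_extrapolation_parts(X_raw, [10.0, 10.0], k=3)   # far from all data
```

For factor settings that coincide with an observed point, the metric is zero. For the distant point, the metric is many times larger than the typical nearest-neighbor distance in the training data.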
