Standardization Method

Select a method for standardization using PROC STDIZE.

Variables with large variances tend to affect the distance measure more than variables with small variances, so it is recommended that you standardize the variables before computing the distance measure.
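As a rough illustration of this effect outside SAS, the following Python sketch measures how much of the squared Euclidean distance between two observations comes from a single variable, before and after STD-style standardization. The income and age values, and the function names, are hypothetical and chosen only for illustration. The sample standard deviation (divisor n - 1) is assumed.

```python
import statistics

# Hypothetical data: income in dollars (large variance), age in years (small variance).
income = [30000.0, 52000.0, 41000.0, 78000.0]
age = [25.0, 47.0, 33.0, 58.0]

def std_standardize(values):
    """STD-style standardization: (value - mean) / sample standard deviation."""
    location = statistics.mean(values)
    scale = statistics.stdev(values)  # assumption: divisor n - 1
    return [(v - location) / scale for v in values]

def first_variable_share(points, i, j):
    """Fraction of the squared distance between rows i and j due to the first variable."""
    first = (points[i][0] - points[j][0]) ** 2
    total = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
    return first / total

raw = list(zip(income, age))
scaled = list(zip(std_standardize(income), std_standardize(age)))

raw_share = first_variable_share(raw, 0, 1)
scaled_share = first_variable_share(scaled, 0, 1)
print(f"income share of squared distance: raw={raw_share:.6f}, standardized={scaled_share:.6f}")
```

On the raw values, income accounts for almost all of the distance. After standardization, both variables contribute on a comparable scale.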

The following table summarizes standardization methods:

| Method | Description |
| --- | --- |
| STD | Standardizes values to the mean with a scale equivalent to the standard deviation. |
| MEAN | Standardizes values to the mean with a scale equivalent to one (1). |
| MEDIAN | Standardizes values to the median with a scale equivalent to one (1). |
| SUM | Standardizes values to zero (0) with a scale equivalent to the sum of the values. |
| EUCLEN | Standardizes values to zero (0) with a scale equivalent to the Euclidean length. |
| USTD | Standardizes values to zero (0) with a scale equivalent to the standard deviation about the origin. |
| RANGE | Standardizes values to the minimum value with a scale equivalent to the range of the values. |
| MIDRANGE | Standardizes values to the midrange value with a scale equivalent to the range of the values divided by two (2). |
| MAXABS | Standardizes values to zero (0) with a scale equivalent to the maximum absolute value. |
| IQR | Standardizes values to the median with a scale equivalent to the interquartile range. |
| MAD | Standardizes values to the median with a scale equivalent to the median absolute deviation from the median. |
| ABW(4.685) | Standardizes values to the biweight 1-step M-estimate with a scale equivalent to the biweight A-estimate. Note: 4.685 is the default numeric tuning constant used in this method. |
| AHUBER(1.345) | Standardizes values to the Huber 1-step M-estimate with a scale equivalent to the Huber A-estimate. Note: 1.345 is the default numeric tuning constant used in this method. |
| AWAVE(1) | Standardizes values to the Wave 1-step M-estimate with a scale equivalent to the Wave A-estimate. Note: 1 is the default numeric tuning constant used in this method. |
| AGK(0.1) | Standardizes values to the mean with a scale equivalent to the AGK estimate. Note: 0.1 is the default numeric constant giving the proportion of pairs to be included in the estimation of the within-cluster variances. |
| SPACING(0.1) | Standardizes values to the mid-minimum spacing with a scale equivalent to the minimum spacing. Note: 0.1 is the default numeric constant giving the proportion of data to be contained in the spacing. |
| L(1) | Standardizes values to the L(1) value with a scale equivalent to the L(1) value. Note: 1 is the default numeric constant specifying the power to which differences are to be raised in computing an L(p) or Minkowski metric. |
| DATASET | Reads the location and scale measures from a SAS data set. The data set must contain either (a) a _TYPE_ variable, where the observation that contains the location measure corresponds to the value _TYPE_='LOCATION' and the observation that contains the scale measure corresponds to the value _TYPE_='SCALE', or (b) location and scale variables specified by the LOCATION and SCALE statements. |
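Each method in the table pairs a location with a scale, and the standardized value can be read as (value - location) / scale. The Python sketch below, which is not SAS code, works through that arithmetic for the methods whose location and scale have an unambiguous textbook formula. The names METHODS and standardize and the sample values are hypothetical. The STD entry assumes the sample standard deviation (divisor n - 1).

```python
import math
import statistics

# Each entry maps a method name to a function returning (location, scale).
METHODS = {
    # Assumption: sample standard deviation (divisor n - 1).
    "STD": lambda v: (statistics.mean(v), statistics.stdev(v)),
    "MEAN": lambda v: (statistics.mean(v), 1.0),
    "MEDIAN": lambda v: (statistics.median(v), 1.0),
    "SUM": lambda v: (0.0, sum(v)),
    "EUCLEN": lambda v: (0.0, math.sqrt(sum(x * x for x in v))),
    "RANGE": lambda v: (min(v), max(v) - min(v)),
    "MIDRANGE": lambda v: ((min(v) + max(v)) / 2, (max(v) - min(v)) / 2),
    "MAXABS": lambda v: (0.0, max(abs(x) for x in v)),
}

def standardize(values, method):
    """Apply (value - location) / scale using the named method."""
    location, scale = METHODS[method](values)
    if scale == 0:
        raise ValueError(f"scale is zero for method {method}")
    return [(x - location) / scale for x in values]

values = [2.0, 4.0, 4.0, 6.0, 9.0]  # hypothetical sample
for name in METHODS:
    print(name, [round(x, 3) for x in standardize(values, name)])
```

With RANGE the results fall between 0 and 1, and with MIDRANGE they fall between -1 and 1, which is often the practical reason for choosing one of those two methods.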

Refer to the SAS Documentation on Standardization Methods for more information.
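For the DATASET method, it can help to picture the input layout. The following Python sketch is only a stand-in for a SAS data set: it models the _TYPE_ form as two rows, one holding the location measures and one holding the scale measures, and applies them to a single observation. The variable names height and weight, the numbers, and the helper functions are hypothetical.

```python
# Hypothetical stand-in for a SAS data set with a _TYPE_ variable:
# one row holds the location measures, the other holds the scale measures.
stats_rows = [
    {"_TYPE_": "LOCATION", "height": 170.0, "weight": 65.0},
    {"_TYPE_": "SCALE", "height": 10.0, "weight": 12.5},
]

def measures_by_type(rows):
    """Index the rows by their _TYPE_ value, dropping the _TYPE_ key itself."""
    return {
        row["_TYPE_"]: {k: v for k, v in row.items() if k != "_TYPE_"}
        for row in rows
    }

def standardize_row(row, rows):
    """Standardize one observation with the supplied location and scale measures."""
    measures = measures_by_type(rows)
    location, scale = measures["LOCATION"], measures["SCALE"]
    return {name: (value - location[name]) / scale[name] for name, value in row.items()}

result = standardize_row({"height": 180.0, "weight": 52.5}, stats_rows)
print(result)
```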

To Specify a Standardization Method:

Select the desired method from the Standardization Method drop-down list.