PROC FASTCLUS Options

Enter SAS PROC FASTCLUS options in this field to override default parameters for the clustering.

You can specify any PROC FASTCLUS option using the following syntax:

Option=x|y

where:

• Option is the PROC FASTCLUS option,

• = separates the option from its value(s) and is used only when the option takes a value,

• x is the first numeric or character value(s) or condition(s) that modifies the option,

• y represents one or more subsequent numeric or character value(s) or condition(s) that modifies the option, and

• the | character is used to delimit individual conditions.

Examples of commonly used PROC FASTCLUS options are listed in the following table:

Option

Definition

BINS=n

This option specifies the number of bins used in the bin-sort algorithm for computing medians for LEAST=1. By default, PROC FASTCLUS uses from 10 to 100 bins, depending on the amount of memory available. Larger values use more memory and make each iteration somewhat slower, but they can reduce the number of iterations. Smaller values have the opposite effect. The minimum value of n is 5.

CONVERGE=c

This option specifies the convergence criterion. Any nonnegative value is permitted. The default value is 0.0001 for all values of p if LEAST=p is explicitly specified. Otherwise, the default value is 0.02. Iterations stop when the maximum relative change in the cluster seeds is less than or equal to the convergence criterion and additional conditions on the homotopy parameter, if any, are satisfied. (See the HP= option.) The relative change in a cluster seed is the distance between the old seed and the new seed divided by a scaling factor. If you do not specify the LEAST= option, the scaling factor is the minimum distance between the initial seeds. If you specify the LEAST= option, the scaling factor is an L1 scale estimate and is recomputed on each iteration. Specify the CONVERGE= option only if you specify a MAXITER= value greater than 1.
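As a rough illustration of the stopping rule when LEAST= is not specified, the following Python sketch computes the maximum relative change in the seeds and compares it with the default criterion of 0.02. It is not PROC FASTCLUS code: the function name max_relative_change and the sample seeds are hypothetical, and Euclidean distance is assumed.

```python
import math

def max_relative_change(old_seeds, new_seeds, initial_seeds):
    """Largest seed movement divided by the scaling factor (illustrative only)."""
    # Scaling factor without LEAST=: the minimum distance between the initial seeds.
    scale = min(
        math.dist(a, b)
        for i, a in enumerate(initial_seeds)
        for b in initial_seeds[i + 1:]
    )
    # Relative change per seed: distance between old and new seed, divided by scale.
    return max(math.dist(o, n) for o, n in zip(old_seeds, new_seeds)) / scale

# Hypothetical seeds for two clusters in two dimensions.
initial = [(0.0, 0.0), (10.0, 0.0)]
old = [(1.0, 0.0), (9.0, 0.0)]
new = [(1.1, 0.0), (9.0, 0.05)]

change = max_relative_change(old, new, initial)   # about 0.01
converged = change <= 0.02                        # default CONVERGE= without LEAST=
```

Here the largest seed movement is 0.1 and the initial seeds are 10 apart, so the relative change is about 0.01 and iteration would stop under the default criterion.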

DELETE=n

This option deletes cluster seeds to which n or fewer observations are assigned. Deletion occurs after processing for the DRIFT option is completed and after each iteration specified by the MAXITER= option. Cluster seeds are not deleted after the final assignment of observations to clusters, so in rare cases a final cluster might not have more than n members. The DELETE= option is ineffective if you specify MAXITER=0 and do not specify the DRIFT option. By default, no cluster seeds are deleted.
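The "n or fewer" rule can be pictured with a short Python sketch. This is only an illustration of the rule, not the procedure's implementation; the function name delete_small_seeds and the sample counts are hypothetical.

```python
def delete_small_seeds(seeds, counts, n):
    """Keep only the seeds whose clusters have more than n observations."""
    return [seed for seed, count in zip(seeds, counts) if count > n]

# Hypothetical seeds and the number of observations currently assigned to each.
seeds = [(0.0, 0.0), (5.0, 5.0), (9.0, 1.0)]
counts = [12, 2, 3]

# With n=2, the seed with exactly 2 observations is deleted; the one with 3 is kept.
kept = delete_small_seeds(seeds, counts, n=2)
```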

DRIFT

This option adjusts the cluster seeds during the initial assignment of observations. After initial seed selection, each observation is assigned to the cluster with the nearest seed. After an observation is processed, the seed of the cluster to which it is assigned is recalculated as the mean of the observations currently assigned to the cluster. Thus, the cluster seeds drift about rather than remaining fixed for the duration of the pass.
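The drifting behavior can be sketched in Python as a single pass in which a seed is recomputed each time an observation joins its cluster. This is a simplified illustration, not PROC FASTCLUS code: the name drift_pass is hypothetical, Euclidean distance is assumed, and the sketch assumes that the initial seed itself is not counted as a cluster member.

```python
import math

def drift_pass(observations, seeds):
    """Assign each observation to its nearest seed, updating that seed as it goes."""
    seeds = [list(s) for s in seeds]
    members = [[] for _ in seeds]   # observations assigned so far, per cluster
    for obs in observations:
        # Nearest seed at this moment; seeds may already have moved.
        k = min(range(len(seeds)), key=lambda i: math.dist(obs, seeds[i]))
        members[k].append(obs)
        # Recompute the seed as the mean of the observations now in the cluster.
        seeds[k] = [sum(col) / len(members[k]) for col in zip(*members[k])]
    return seeds, members

# Hypothetical data: two observations near the first seed, one near the second.
observations = [(1.0, 1.0), (3.0, 1.0), (10.0, 10.0)]
final_seeds, members = drift_pass(observations, [(0.0, 0.0), (9.0, 9.0)])
# final_seeds -> [[2.0, 1.0], [10.0, 10.0]]
```

Because each seed moves as soon as an observation is assigned to it, the order of the observations can affect the result of the pass.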

HC=c

This option specifies the criterion for updating the homotopy parameter. The homotopy parameter is updated when the maximum relative change in the cluster seeds is less than or equal to c. The default is the minimum of 0.01 and 100 times the value of the CONVERGE= option.

HP=p1

This option specifies p1 as the initial value of the homotopy parameter. The default is 0.05 if the modified Ekblom-Newton method is used. Otherwise, it is 0.25.

IMPUTE

This option requests imputation of missing values after the final assignment of observations to clusters. If an observation that is assigned (or would have been assigned) to a cluster has a missing value for variables used in the cluster analysis, the missing value is replaced by the corresponding value in the cluster seed to which the observation is assigned (or would have been assigned). If the observation cannot be assigned to a cluster, missing value replacement depends on whether the NOMISS option is specified. If NOMISS is not specified, missing values are replaced by the mean of all observations in the DATA= data set having a value for that variable. If NOMISS is specified, missing values are replaced by the mean of only observations used in the analysis. (A weighted mean is used if a variable is specified in the WEIGHT statement.) For information about cluster assignment, see the section "OUT= Data Set" in the PROC FASTCLUS documentation. If you specify the IMPUTE option, the imputed values are not used in computing cluster statistics.

IRLS

This option causes PROC FASTCLUS to use an iteratively reweighted least squares method instead of the modified Ekblom-Newton method. If you specify the IRLS option, you must also specify LEAST=p, where 1 < p < 2. Use the IRLS option only if you encounter convergence problems with the default method.

LEAST=p|MAX

This option optimizes an Lp criterion, where 1 ≤ p ≤ ∞. Specify LEAST=MAX to indicate infinity.

MAXITER=n

This option specifies the maximum number of iterations for recomputing cluster seeds. In each iteration, each observation is assigned to the nearest seed, and the seeds are recomputed as the means of the clusters.
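A minimal Python sketch of the iteration described above follows. It is illustrative only and is not the procedure's implementation: the name iterate_seeds is hypothetical, Euclidean distance is assumed, the convergence test is omitted, and keeping the old seed for an empty cluster is an assumption made for the sketch.

```python
import math

def iterate_seeds(observations, seeds, maxiter):
    """Run up to maxiter rounds of assign-then-recompute (no convergence check)."""
    seeds = [tuple(s) for s in seeds]
    for _ in range(maxiter):
        clusters = [[] for _ in seeds]
        # Assign every observation to its nearest seed; seeds stay fixed during the pass.
        for obs in observations:
            k = min(range(len(seeds)), key=lambda i: math.dist(obs, seeds[i]))
            clusters[k].append(obs)
        # Recompute each seed as the mean of its cluster.
        # Assumption for this sketch: an empty cluster keeps its old seed.
        seeds = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else seed
            for seed, members in zip(seeds, clusters)
        ]
    return seeds

# Hypothetical data with two well-separated groups.
observations = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
start = [(0.0, 0.0), (10.0, 0.0)]

after_one = iterate_seeds(observations, start, maxiter=1)   # [(1.0, 0.0), (11.0, 0.0)]
unchanged = iterate_seeds(observations, start, maxiter=0)   # seeds are not recomputed
```

Unlike the DRIFT option, the seeds here are recomputed only after all observations have been assigned.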

NOMISS

This option excludes observations with missing values from the analysis. However, if you also specify the IMPUTE option, observations with missing values are included in the final cluster assignments.

REPLACE=FULL|PART|NONE|RANDOM

This option specifies how seed replacement is performed.

• Specify FULL to request default seed replacement.

• Specify PART to request seed replacement only when the distance between the observation and the closest seed is greater than the minimum distance between seeds.

• Specify NONE to suppress seed replacement.

• Specify RANDOM to select a simple pseudo-random sample of complete observations as initial cluster seeds.

SEED=SAS-data-set

This option specifies an input data set from which initial cluster seeds are to be selected. If you do not specify the SEED= option, initial seeds are selected from the DATA= data set. The SEED= data set must contain the same variables that are used in the data analysis.

STRICT=s

This option prevents an observation from being assigned to a cluster if its distance to the nearest cluster seed exceeds the value of the STRICT= option. If you specify the STRICT= option without a numeric value, you must also specify the RADIUS= option, and its value is used instead. In the OUT= data set, observations that are not assigned due to the STRICT= option are given a negative cluster number, the absolute value of which indicates the cluster with the nearest seed.
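The sign convention for unassigned observations can be sketched in Python as follows. This is an illustration of the rule only: the name assign_strict is hypothetical, Euclidean distance is assumed, and clusters are assumed to be numbered from 1 in the order of the seeds.

```python
import math

def assign_strict(obs, seeds, strict):
    """Return the cluster number, negated if the nearest seed is farther than strict."""
    nearest = min(range(len(seeds)), key=lambda i: math.dist(obs, seeds[i]))
    cluster = nearest + 1                      # assumed 1-based cluster numbers
    if math.dist(obs, seeds[nearest]) > strict:
        return -cluster                        # not assigned; absolute value names the nearest cluster
    return cluster

# Hypothetical seeds and a STRICT= value of 2.0.
seeds = [(0.0, 0.0), (10.0, 0.0)]
near = assign_strict((1.0, 0.0), seeds, strict=2.0)   # 1: within 2.0 of seed 1
far = assign_strict((6.0, 0.0), seeds, strict=2.0)    # -2: nearest seed is 2, but 4.0 away
```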

VARDEF=DF|N|WDF|WEIGHT|WGT

This option specifies the divisor to be used in the calculation of variances and covariances.

To Specify One or More PROC FASTCLUS Options:

Type specific PROC FASTCLUS options in the PROC FASTCLUS Options field.

Separate individual options with a space.

For Additional Information

Refer to the PROC FASTCLUS documentation for more information.