Publication date: 07/08/2024

Robust Fit Outliers Report

The Robust Fit Outliers report in the Explore Outliers platform includes a set of controls and results organized on multiple tabs.

Robust Fit Outliers Initial Options

The Robust Fit Outliers controls specify the method used for calculating the robust estimates and the multiplier K. Given a robust estimate of the center and spread, outliers are defined as those values that are more than K times the robust spread away from the robust center.

Figure 21.7 Robust Fit Outliers Controls 

Huber

Uses Huber M-Estimation to estimate center and spread. This option is the default. See Huber and Ronchetti (2009).
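
As a rough illustration of the idea behind Huber M-estimation, the following Python sketch computes a Huber-type estimate of the center by iterative reweighting. It is a simplified sketch, not the platform's algorithm: the function name `huber_center`, the tuning constant 1.345, and the use of a fixed normalized MAD as the spread are all assumptions made for this example. The report estimates the spread with the Huber method as well; this sketch does not.

```python
from statistics import median


def huber_center(values, c=1.345, tol=1e-9, max_iter=100):
    """Return (center, spread) from a simplified Huber-type estimate.

    Assumptions (not taken from the documentation): the spread is held
    fixed at the normalized MAD, and c = 1.345 is the tuning constant.
    """
    center = median(values)
    # Normalized median absolute deviation, used here as a fixed spread.
    spread = median([abs(v - center) for v in values]) / 0.6745
    if spread == 0:
        return center, spread
    for _ in range(max_iter):
        weights = []
        for v in values:
            r = abs(v - center) / spread
            # Full weight near the center, reduced weight for distant values.
            weights.append(1.0 if r <= c else c / r)
        new_center = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(new_center - center) < tol:
            center = new_center
            break
        center = new_center
    return center, spread


sample = [1, 2, 3, 4, 100]
center, spread = huber_center(sample)
plain_mean = sum(sample) / len(sample)
```

In this example the extreme value 100 pulls the ordinary mean far from the bulk of the data, while the down-weighted estimate stays close to the median.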

Cauchy

Assumes a Cauchy distribution to calculate estimates for the center and spread. Cauchy estimates have a high breakdown point and are typically more robust than Huber estimates. However, if your data are separated into clusters, the Cauchy distribution tends to consider only the half of the data that is clustered more closely, ignoring the rest.

Quartile

Uses the median as the measure of center and the interquartile range (IQR) divided by 1.34898 as the measure of spread. Dividing the IQR by 1.34898 scales the spread so that it corresponds to one standard deviation when the data are normally distributed.
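
The Quartile calculation can be sketched in Python as follows. This is an illustration only: the function name `quartile_center_spread` is hypothetical, and the quantile interpolation rule (`method="inclusive"`) is an assumption, so quartiles computed by the platform might differ slightly on small samples. The constant is derived here from the normal distribution to show where 1.34898 comes from.

```python
from statistics import NormalDist, median, quantiles

# IQR of a standard normal distribution: the distance between its
# 25th and 75th percentiles, approximately 1.34898.
NORMAL_IQR = 2 * NormalDist().inv_cdf(0.75)


def quartile_center_spread(values):
    """Return (center, spread) as the median and IQR / 1.34898."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), (q3 - q1) / NORMAL_IQR


center, spread = quartile_center_spread([9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 25.0])
```

Because both the median and the IQR ignore the tails of the data, the single value 25.0 has no effect on either estimate in this example.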

K Sigma

The multiplier that determines outliers as values that are more than K times the spread away from the center. Larger values of K flag fewer values as outliers, providing a more conservative set than smaller values. The default is 4.
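
The effect of K can be illustrated with a short Python sketch. The function name `flag_outliers`, the example data, and the center and spread values are hypothetical, and the strict "greater than" comparison at the boundary is an assumption.

```python
def flag_outliers(values, center, spread, k=4.0):
    """Return the indices of values more than k * spread from center."""
    return [i for i, v in enumerate(values) if abs(v - center) > k * spread]


data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 11.0, 25.0]
center, spread = 10.0, 0.15  # assumed robust estimates, for illustration only

flagged_k4 = flag_outliers(data, center, spread, k=4)    # cutoff: 0.6 from center
flagged_k10 = flag_outliers(data, center, spread, k=10)  # cutoff: 1.5 from center
```

With K = 4, both 11.0 and 25.0 are flagged. With K = 10, only 25.0 is flagged, which shows how a larger K yields a more conservative set of outliers.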

Rescan

Rescans the data after outlier actions have been taken.

Tip: Press Ctrl and click Rescan to rescan across all open outlier methods.

Close

Closes the Robust Fit Outliers panel.

Tip: Press Ctrl and click Close to close all outlier reports.

Outliers by Column

The Outliers by Column tab in the Robust Fit Outliers report contains a table with a row for each column selected in the launch window. The columns of the table depend on the technique that is used to estimate the center and spread of the data: Huber, Cauchy, or Quartile. For each technique, there is a column of the estimated center, the estimated spread, and the number of outliers based on the center and spread.

The Outliers by Column tab contains the following options that can be applied when one or more rows are selected in the outliers table:

Show only columns with outliers

Removes columns without outliers from the table in the Outliers by Column tab.

Identify Outliers in Table

Applies actions to the original data table for selected rows in the outlier summary table.

Select Rows

Selects the rows containing outliers.

Exclude Rows

Applies the exclude row state. Click Rescan to update the Robust Fit Outliers report.

Note: The Exclude Rows option is not supported within the Local Data Filter or with the Auto Recalc option turned on.

Color Cells

Colors the cells containing outliers. Low-valued outliers are colored blue and high-valued outliers are colored red.

Color Rows

Colors the rows containing outliers.

Clear Outliers in Table

Applies actions to the original data table for selected rows in the outlier summary table.

Add to Missing Value Codes

Adds the selected outliers to the missing value codes column property. Use this option to identify known missing value or error codes within the data. Click Rescan to update the Robust Fit Outliers report.

Note: Add to Missing Value Codes is not available with Robust Fit Outliers if a By variable is specified in the launch window.

Change to Missing

Changes the outlier value to a missing value. Click Rescan to update the Robust Fit Outliers report.

Formula Columns

Creates a new formula column for each selected column that sets outliers to missing. The new columns are prefixed or suffixed by a user-specified name to distinguish them from the original columns. By default, the suffix is set to “Culled”.

Formula Script

Creates a script that is added to the data table. When the script is run, it creates a new formula column for each selected column that sets outliers to missing. The new columns are prefixed or suffixed by a user-specified name to distinguish them from the original columns. By default, the suffix is set to “Culled”.
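
The result of the Formula Columns and Formula Script options can be pictured with the following Python sketch. It is only an analogy and does not use JMP scripting: the function name `add_culled_columns`, the dictionary-based table, the use of `None` for a missing value, and the space between the column name and the suffix are all assumptions made for this example.

```python
def add_culled_columns(table, outlier_rows, suffix="Culled"):
    """Return a copy of table with one extra column per entry in outlier_rows.

    table:        dict mapping column name -> list of values
    outlier_rows: dict mapping column name -> set of outlier row indices
    Each new column repeats the original values, with outliers set to None.
    """
    result = dict(table)
    for name, rows in outlier_rows.items():
        result[f"{name} {suffix}"] = [
            None if i in rows else v for i, v in enumerate(table[name])
        ]
    return result


table = {"Weight": [10.0, 10.1, 25.0]}
culled = add_culled_columns(table, {"Weight": {2}})
```

The original column is left unchanged, and the new column holds the same values with the outlier replaced by a missing value.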

Outliers by Cell

The Outliers by Cell tab in the Robust Fit Outliers report contains a table of the individual outliers identified using the settings specified in the controls. The table shows the column name, row number, outlier distance, and actual value of each outlier. The outlier distance is a measure of how extreme an outlier is and is calculated using the following equation:

Outlier Distance = |x - c| / s

where

x = the actual value of the outlier

c = the center of the column that contains the outlier, measured by the specified outlier method (Huber, Cauchy, or Quartile)

s = the spread of the column that contains the outlier, measured by the specified outlier method (Huber, Cauchy, or Quartile)

A larger outlier distance indicates a more extreme outlier.
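
As a small worked example, the following Python sketch computes an outlier distance from x, c, and s as defined above. It assumes that the distance is the absolute deviation from the center divided by the spread. The function name `outlier_distance` and the numbers are hypothetical.

```python
def outlier_distance(x, center, spread):
    """Distance of x from center, in units of spread (assumed absolute)."""
    return abs(x - center) / spread


# Assumed center 10.0 and spread 0.5 for the column containing these values.
high = outlier_distance(25.0, center=10.0, spread=0.5)
low = outlier_distance(8.0, center=10.0, spread=0.5)
```

Here the value 25.0 lies 30 spreads from the center and the value 8.0 lies 4 spreads from the center, so 25.0 is the more extreme of the two.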

The Outliers by Cell tab contains the following options that can be applied when one or more rows are selected in the outliers table:

Identify Outliers in Table

Applies actions to the original data table for selected rows in the outlier summary table.

Select Row and Column

Selects the rows and columns that correspond to the selected outliers.

Color Cells

Colors the cells containing outliers. Low-valued outliers are colored blue and high-valued outliers are colored red.

Clear Outliers in Table

Applies actions to the original data table for selected rows in the outlier summary table.

Add to Missing Value Codes

Adds the selected outliers to the missing value codes column property. Use this option to identify known missing value or error codes within the data. Missing value and error codes are often integers and are sometimes a series of nines. Click Rescan to update the Robust Fit Outliers report.

Note: Add to Missing Value Codes is not available with Robust Fit Outliers if a By variable is specified in the launch window.

Change to Missing

Changes the outlier value to a missing value in the data table. Use caution when changing values to missing. Change values to missing only if the data are known to be invalid or inaccurate. Click Rescan to update the Robust Fit Outliers report.

Note: If the selected outlier has been added to the missing value codes, the outlier is not changed to a missing value.
