Robust estimates of parameters are less sensitive to outliers than non-robust estimates. Robust Fit Outliers provides several types of robust estimates of the center and spread of your data to determine thresholds for identifying outliers.
Figure 21.7 Robust Fit Outliers Window
Given a robust estimate of the center and spread, outliers are defined as those values that are more than K times the robust spread from the robust center. The Robust Fit Outliers window provides several options for calculating the robust estimates and the multiplier K, as well as tools to manage the outliers that are found.
Huber
Uses Huber M-Estimation to estimate center and spread. This option is the default. See Huber and Ronchetti (2009).
Cauchy
Assumes a Cauchy distribution to calculate estimates for the center and spread. Cauchy estimates have a high breakdown point and are typically more robust than Huber estimates. However, if your data are separated into clusters, the Cauchy fit tends to consider only the half of the data that forms the closer clusters and to ignore the rest.
Quartile
Uses the interquartile range (IQR) to estimate the spread. The estimate for the center is the median. The estimate for spread is the IQR divided by 1.34898. Dividing the IQR by this factor makes the spread correspond to one standard deviation if the data are normally distributed.
K
The multiplier that sets the outlier thresholds at K times the spread away from the center. Larger values of K flag fewer, more extreme values as outliers than smaller values do. The default is 4.
Show only columns with outliers
Limits the list of columns in the report to those that contain outliers.
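The Quartile and K options described above can be illustrated outside of JMP. The following Python sketch is not JMP code and is not claimed to match JMP's implementation; it only restates the arithmetic given above: the center is the median, the spread is the IQR divided by 1.34898, and a value is flagged when it lies more than K spreads from the center. The function names and the sample data are hypothetical, and the quartile interpolation rule (the `inclusive` method of Python's `statistics.quantiles`) is an assumption that might differ from the rule JMP uses.

```python
import statistics

# IQR of a standard normal distribution; dividing the IQR by this factor
# makes the spread comparable to one standard deviation for normal data.
IQR_TO_SIGMA = 1.34898


def quartile_estimates(values):
    """Return (center, spread) using the median and the scaled IQR."""
    # Assumption: "inclusive" interpolation; JMP's quantile rule may differ.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    center = statistics.median(values)
    spread = (q3 - q1) / IQR_TO_SIGMA
    return center, spread


def find_outliers(values, k=4.0):
    """Return the values that lie more than k spreads from the center."""
    center, spread = quartile_estimates(values)
    low, high = center - k * spread, center + k * spread
    return [v for v in values if v < low or v > high]


# Hypothetical sample data with one mild and one extreme value.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.3, 9.7, 10.0, 10.7, 25.0]

center, spread = quartile_estimates(data)
outliers_k2 = find_outliers(data, k=2)  # smaller K flags more values
outliers_k4 = find_outliers(data, k=4)  # default K flags only the extreme value
```

With this sample, K = 2 flags both 10.7 and 25.0, while the default K = 4 flags only 25.0, which shows how a larger K yields a smaller set of outliers.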
Once the report is displayed using your specifications, there are many ways to explore these extreme values. To select the outliers for a column, select that column's row in the Robust Estimates and Outliers report.
Tip: If no columns are selected in the report and you click one of the following buttons, a JMP Alert appears that enables you to select all of the columns.
Select Rows
Selects the rows containing outliers for the selected columns in the data table.
Exclude Rows
Sets the Exclude Row state for outliers in the selected columns in the data table. Click Rescan to update the Robust Estimates and Outliers report.
Color Cells
Colors the cells of the selected outliers in the data table.
Color Rows
Colors the rows containing outliers for the selected columns in the data table.
Add to Missing Value Codes
Adds the selected outliers to the missing value codes column property for the selected columns. Use this option to identify known missing value or error codes within the data. Click Rescan to update the Robust Estimates and Outliers report.
Note: Add to Missing Value Codes is not available with Robust Fit Outliers if a By variable is specified in the launch window.
Change to Missing
Changes the outlier value to a missing value in the data table. Click Rescan to update the Robust Estimates and Outliers report.
Formula Columns
Creates a new formula column for each column that is specified in the launch window. Each new column contains the original column’s value if the value is within the outlier limits and is set to missing otherwise. The new columns are prefixed or suffixed by a user-specified name to distinguish them from the original columns. By default, the suffix is set to “Culled”.
Formula Script
Creates a script that is added to the data table. When the script is run, it creates a new formula column for each column that is specified in the launch window. Each new column contains the original column’s value if the value is within the outlier limits and is set to missing otherwise. The new columns are prefixed or suffixed by a user-specified name to distinguish them from the original columns. By default, the suffix is set to “Culled”.
Rescan
Rescans the data after outlier actions have been taken.
Note: Press Ctrl and click Rescan to rescan across all command groups.
Close
Closes the Robust Fit Outliers panel.
Note: Press Ctrl and click Close to close all command windows.
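The rule behind the Formula Columns and Formula Script options can be sketched in a few lines. The following Python example is an illustration only, not the formula that JMP generates: each new column keeps a value that is within the outlier limits and sets it to missing (`None` here) otherwise. The table layout, the function names, the limits, and the way the suffix is joined to the column name with a space are all assumptions. Whether a value that lies exactly on a limit is kept is also an assumption; this sketch keeps it.

```python
def cull(values, low, high):
    """Keep values inside [low, high]; replace all others with None."""
    return [v if v is not None and low <= v <= high else None for v in values]


def add_culled_columns(table, limits, suffix="Culled"):
    """Return a copy of the table with one culled column per limited column.

    table  : dict mapping column name to a list of values (hypothetical layout)
    limits : dict mapping column name to its (low, high) outlier limits
    """
    result = dict(table)
    for name, (low, high) in limits.items():
        # Assumption: the suffix is appended after a space.
        result[f"{name} {suffix}"] = cull(table[name], low, high)
    return result


# Hypothetical table and limits; only "Weight" has limits specified.
table = {
    "Weight": [9.8, 10.1, 25.0, 10.0],
    "Height": [1.0, 1.1, 0.9, 1.0],
}
limits = {"Weight": (9.0, 11.0)}

culled = add_culled_columns(table, limits)
```

The original columns are left unchanged, so the culled copies can be compared against them or dropped later.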