The Quantile Range Outliers report in the Explore Outliers platform includes a set of controls and results organized on multiple tabs.
The Quantile Range Outliers controls enable you to specify how outliers are to be identified. An outlier is considered any value more than Q times the interquantile range from the lower and upper quantiles. You can adjust the value of Q and the size of the interquantile range.
Figure 21.6 Quantile Range Outliers Initial Options
Tail Quantile
The probability for the lower quantile that is used to calculate the interquantile range. The probability of the upper quantile is considered 1 - Tail Quantile. For example, a Tail Quantile value of 0.1 means that the interquantile range is between the 0.1 and 0.9 quantiles of the data. The default value is 0.1.
Q
The multiplier that determines the outlier threshold. Values that fall beyond Q times the interquantile range past the Tail Quantile or 1 - Tail Quantile values are identified as outliers. Large values of Q provide a more conservative set of outliers than small values. The default is 3.
Restrict search to integers
Restricts outlier values to only integer values. This setting limits the search for outliers in order to find industry-specific missing value codes and error codes.
Rescan
Rescans the data after outlier actions have been taken.
Tip: Press Ctrl and click Rescan to rescan across all open outlier methods.
Close
Closes the Quantile Range Outliers report.
Tip: Press Ctrl and click Close to close all outlier reports.
The Outliers by Column tab in the Quantile Range Outliers report contains a table of columns that have outliers found by the settings specified in the controls. The table shows values for the upper and lower quantiles along with their low and high thresholds. Values outside of these threshold limits are considered outliers. The number of outliers in each column is indicated. The values of each outlier are listed in the last column of the table. Outliers that occur more than once in a column are listed with their count in parentheses.
The Outliers by Column tab contains the following options that can be applied when one or more rows are selected in the outliers table:
Show only columns with outliers
Removes columns without outliers from the table in the Outliers by Column tab.
Identify Outliers in Table
Applies actions to the original data table for selected rows in the outlier summary table.
Select Rows
Selects the rows containing outliers.
Exclude Rows
Applies the exclude row state. Click Rescan to update the Quantile Range Outliers report.
Note: The Exclude Rows option is not supported within the Local Data Filter or with the Auto Recalc option turned on.
Color Cells
Colors the cells containing outliers. Low valued outliers are colored blue and high valued outliers are colored red.
Color Rows
Colors the rows containing outliers.
Clear Outliers in Table
Applies actions to the original data table for selected rows in the outlier summary table.
Add to Missing Value Codes
Adds outliers to the missing value codes column property. Use this option to identify known missing value or error codes within the data. Missing value and error codes are often integers and are sometimes a series of nines. Click Rescan to update the Quantile Range Outliers report.
Note: Add to Missing Value Codes is not available with Quantile Range Outliers if a By variable is specified in the launch window.
Change to Missing
Changes the outlier value to a missing value. Use caution when changing values to missing. Change values to missing only if the data are known to be invalid or inaccurate. Click Rescan to update the Quantile Range Outliers report.
Note: If the selected outlier has been added to the missing value codes, the outlier is not changed to a missing value.
Formula Columns
Creates a new formula column for each column to set outliers to missing. The new columns are prefixed or suffixed by a user specified name to distinguish them from the original columns. By default, the suffix is set to “Culled”.
Formula Script
Creates a script that is added to the data table. When the script is run, it creates a new formula column for each column to set outliers to missing. The new columns are prefixed or suffixed by a user specified name to distinguish them from the original columns. By default, the suffix is set to “Culled”.
The Outliers by Cell tab in the Quantile Range Outliers report contains a table of individual outliers found by the settings specified by the controls. The table shows the column name, row number, outlier distance and the actual value of the individual outliers. The outlier distance is a measure of how extreme an outlier is and is calculated using the following equation:
Outlier Distance =
where
x = the actual value of the outlier
m = median of the column the outlier is in
IQR = the interquantile range using the specified Tail Quantile
A larger outlier distance indicates a more extreme outlier.
The Outliers by Cell tab contains the following options that can be applied when one or more rows are selected in the outliers table:
Identify Outliers in Table
Applies actions to the original data table for selected rows in the outlier summary table.
Select Row and Column
Selects the rows and columns that correspond to the selected outliers.
Color Cells
Colors the cells of the outliers in the data table. Low valued outliers are colored blue and high valued outliers are colored red.
Clear Outliers in Table
Applies actions to the original data table for selected rows in the outlier summary table.
Add to Missing Value Codes
Adds the selected outliers to the missing value codes column property. Use this option to identify known missing value or error codes within the data. Missing value and error codes are often integers and are sometimes a series of nines. Click Rescan to update the Quantile Range Outliers report.
Note: Add to Missing Value Codes is not available with Quantile Range Outliers if a By variable is specified in the launch window.
Change to Missing
Changes the outlier value to a missing value. Use caution when changing values to missing. Change values to missing only if the data are known to be invalid or inaccurate. Click Rescan to update the Quantile Range Outliers report.
Note: If the selected outlier has been added to the missing value codes, the outlier is not changed to a missing value.
The Outliers by Row tab contains a table of rows that contain outliers found by the settings specified in the controls. The table shows the row number and the number of outliers that were found in that row.
The Outliers by Row tab contains the following options that can be applied when one or more rows are selected in the outliers table:
Identify Outliers in Table
Applies actions to the original data table for selected rows in the outlier summary table.
Select Rows
Selects the rows with outliers.
Exclude Rows
Applies the exclude row state. Click Rescan to update the Quantile Range Outliers report.
Note: The Exclude Rows option is not supported within the Local Data Filter or with the Auto Recalc option turned on.
Color Rows
Colors the rows containing outliers.
The Nines tab within the Quantile Range Outliers report shows a table of columns that contain probable missing value codes. These missing value codes are a series of nines (usually 9999) and are the highest number that is all nines and also higher than the upper quantile. If the count is high, it is likely that these outliers are actually missing value codes. If the count is very low, you should explore further to determine whether the value is an outlier or a missing value code. The table also includes the upper quantile value.
The Nines tab is shown only when probable missing value codes are identified.
The Nines tab contains the following options that can be selected when one or more rows are selected in the table:
Add Highest Nines to Missing Value Codes
Adds the selected outlier values to the missing value codes column property. You must click Rescan to update the Quantile Range Outliers report.
Note: Add Highest Nines to Missing Value Codes is not available with Quantile Range Outliers if a By variable is specified in the launch window.
Change Highest Nines to Missing
Replaces the selected outlier values with missing values in the data table.
Note: The first time you use choose an action (such as Change to Missing or Exclude Rows) to change your data, the alert window warns you to use the Save As command to save your data table as a new file to preserve a copy of your original data. When this window appears, click OK. If you decide to save your new data file, you will automatically be prompted to save the file with a new name.