For the latest version of JMP Help, visit JMP.com/help.


Publication date: 11/29/2021

Quantile Range Outliers

The Quantile Range Outliers method of outlier detection uses the quantile distribution of the values in a column to locate extreme values. Quantiles are useful for detecting outliers because there is no distributional assumption associated with them. Data are simply sorted from smallest to largest. For example, the 20th quantile is the value at which 20% of values are smaller. Extreme values are found using a multiplier of the interquantile range, the distance between two specified quantiles. For more information about how quantiles are computed, see Statistical Details for Quantiles in Basic Analysis.

The Quantile Range Outliers utility is also useful for identifying missing value codes stored within the data. As noted earlier, in some industries, missing values are entered as nines (such as 999 or 9999). This utility finds any nines greater than the upper quartile as suspected missing value codes. The utility then enables you to add those missing value codes as a column property in the data table.

Quantile Range Outliers Options

The Quantile Range Outliers panel enables you to specify how outliers are to be identified and how you want to manage them.

Figure 21.6 Quantile Range Outliers Window 

Quantile Range Outliers Window

An outlier is considered any value more than Q times the interquantile range from the lower and upper quantiles. You can adjust the value of Q and the size of the interquantile range.

Tail Quantile

The probability for the lower quantile that is used to calculate the interquantile range. The probability of the upper quantile is considered 1 - Tail Quantile. For example, a Tail Quantile value of 0.1 means that the interquantile range is between the 0.1 and 0.9 quantiles of the data. The default value is 0.1.

Q

The multiplier that determines the outlier threshold. Values that fall beyond Q times the interquantile range past the Tail Quantile or 1 - Tail Quantile values are identified as outliers. Large values of Q provide a more conservative set of outliers than small values. The default is 3.

Restrict search to integers

Restricts outlier values to only integer values. This setting limits the search for outliers in order to find industry-specific missing value codes and error codes.

Show only columns with outliers

Limits the list of columns in the report to those that contain outliers.

After the report is displayed using your specifications, there are many ways to act on these extreme values. You can select the outliers in a column by selecting the specified column in the Quantile Range Outliers report.

Tip: If no columns are selected in the report and you click one of the following buttons, a JMP Alert appears that enables you to select all of the columns.

Select Rows

Selects the rows of outliers in the selected columns in the data table.

Exclude Rows

Turns on the exclude row state for outliers in the selected columns in the Quantile Range Outliers Report. Click Rescan to update the Quantile Range Outliers report.

Color Cells

Colors the cells of the selected outliers in the data table.

Color Rows

Colors the rows containing outliers for the selected columns in the data table

Add to Missing Value Codes

Adds the selected outliers to the missing value codes column property. Use this option to identify known missing value or error codes within the data. Missing value and error codes are often integers and are sometimes a series of nines. Click Rescan to update the Quantile Range Outliers report.

Note: Add to Missing Value Codes is not available with Quantile Range Outliers if a By variable is specified in the launch window.

Change to Missing

Changes the outlier value to a missing value in the data table. Use caution when changing values to missing. Change values to missing only if the data are known to be invalid or inaccurate. Click Rescan to update the Quantile Range Outliers report.

Note: If the selected outlier has been added to the missing value codes, the outlier is not changed to a missing value.

Formula Columns

Creates a new formula column for each column that is specified in the launch window. Each new column contains the original column’s value if the value is within the outlier limits and is set to missing otherwise. The new columns are prefixed or suffixed by a user specified name to distinguish them from the original columns. By default, the suffix is set to “Culled”.

Formula Script

Creates a script that is added to the data table. When the script is run, it creates a new formula column for each column that is specified in the launch window. Each new column contains the original column’s value if the value is within the outlier limits and is set to missing otherwise. The new columns are prefixed or suffixed by a user specified name to distinguish them from the original columns. By default, the suffix is set to “Culled”.

Rescan

Rescans the data after outlier actions have been taken.

Note: Press Ctrl and click Rescan to rescan across all command groups.

Close

Closes the Quantile Range Outliers panel.

Note: Press Ctrl and click Close to close all command windows.

Quantile Range Outliers Report

The Quantile Range Outliers report lists all columns with the outliers found using the specified options. The report shows values for the upper and lower quantiles along with their low and high thresholds. Values outside of these threshold limits are considered outliers. The number of outliers in each column is indicated. The values of each outlier are listed in the last column of the report. Outliers that occur more than once in a column are listed with their count in parentheses. To remove columns without outliers from the report, select Show only columns with outliers.

There are several things to look for when reading this report.

Error codes. For some continuous data, suspiciously high integer values are likely to be error codes. For example, if your upper and lower quantile values are all less than 0.5, outliers such as 1049 or -777 are likely to be error codes.

Zeros. Sometimes zeros can indicate missing values. If the majority of your data is reasonably large and you notice zeros as outliers, they are likely to be due to missing data.

Nines Report

The Nines report within the Quantile Range Outliers window shows a list of columns that contain probable missing value codes. These missing value codes are a series of nines (usually 9999) and are the highest number that is all nines and also higher than the upper quantile. If the count is high, it is likely that these outliers are actually missing value codes. If the count is very low, you should explore further to determine whether the value is an outlier or a missing value code. The Nines Report includes the upper quantile value.

This report is displayed only when probable missing value codes are identified.

Add Highest Nines to Missing Value Codes

Adds the selected outlier values to the missing value codes column property. You must click Rescan to update the Quantile Range Outliers report.

Note: Add Highest Nines to Missing Value Codes is not available with Quantile Range Outliers if a By variable is specified in the launch window.

Change Highest Nines to Missing

Replaces the selected outlier values with missing values in the data table.

Note: The first time you use choose an action (such as Change to Missing or Exclude Rows) to change your data, the alert window warns you to use the Save As command to save your data table as a new file to preserve a copy of your original data. When this window appears, click OK. If you decide to save your new data file, you will automatically be prompted to save the file with a new name.

Want more information? Have questions? Get answers in the JMP User Community (community.jmp.com).