The Explore Outliers tool provides four options for identifying, exploring, and managing outliers. Exploring and understanding outliers in your data is an important part of analysis. Outliers can result from mistakes in data collection or reporting, measurement system failures, error codes or missing value codes included in the data set, or simply unusual values. The presence of outliers can distort estimates and bias results toward those outliers.
Outliers also inflate the sample variance. Sometimes retaining outliers is necessary, however. Removing extreme values that are genuine observations can cause the sample variance to be underestimated and bias results in the opposite direction.
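The effect on the mean and sample variance can be illustrated with a small Python sketch. The measurements are hypothetical, and 999 stands in for an error code that was left in the data; this is an illustration only, not part of the Explore Outliers tool.

```python
import statistics

# Hypothetical measurements; 999 is an assumed error code left in the data.
clean = [9.8, 10.1, 10.0, 9.9, 10.2]
with_code = clean + [999]

mean_clean = statistics.mean(clean)
mean_with_code = statistics.mean(with_code)

# statistics.variance computes the sample variance (n - 1 denominator).
var_clean = statistics.variance(clean)
var_with_code = statistics.variance(with_code)

print("mean without / with error code:", mean_clean, mean_with_code)
print("variance without / with error code:", var_clean, var_with_code)
```

A single error code pulls the mean far from the bulk of the data and increases the sample variance by several orders of magnitude.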
Whether you remove or retain outliers, it is a good practice to locate them. There are many ways to visually inspect for outliers. For example, box plots, histograms, and scatter plots can easily display these extreme values. See Visualize Your Data in Discovering JMP.
The Explore Outliers tool provides the following options:
Univariate
There are two options for exploring outliers in your univariate data.
Quantile Range Outliers
Uses the quantiles of each column to identify extreme values as outliers. This tool is useful for discovering missing value codes or error codes within the data. It is the recommended method to begin exploring outliers in your data. See Quantile Range Outliers.
Robust Fit Outliers
Finds robust estimates of the center and spread of each column and identifies outliers as those data points that are far from those values. See Robust Fit Outliers.
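The two univariate ideas can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the tool's implementation: the tail quantiles (10% and 90%), the multipliers `q=3.0` and `k=4.0`, and the use of the median and median absolute deviation (MAD) as the robust center and spread are all choices made for this example, as are the function names and the sample data.

```python
import statistics

def quantile_range_outliers(values, q=3.0):
    """Flag values more than q interquantile ranges beyond the 10%/90% quantiles."""
    # With n=10, statistics.quantiles returns the 10%, 20%, ..., 90% cut points.
    cuts = statistics.quantiles(values, n=10, method="inclusive")
    lower, upper = cuts[0], cuts[-1]
    spread = upper - lower
    low_limit = lower - q * spread
    high_limit = upper + q * spread
    return [v for v in values if v < low_limit or v > high_limit]

def mad_outliers(values, k=4.0):
    """Flag values more than k robust standard deviations from the median."""
    center = statistics.median(values)
    mad = statistics.median([abs(v - center) for v in values])
    # 1.4826 rescales the MAD to estimate the standard deviation for normal data.
    scale = 1.4826 * mad
    if scale == 0:
        return []
    return [v for v in values if abs(v - center) > k * scale]

# Hypothetical ages; 999 is an assumed missing value code.
ages = [23, 25, 31, 28, 35, 41, 29, 33, 27, 30, 26, 999]

print(quantile_range_outliers(ages))
print(mad_outliers(ages))
```

Both functions flag only the value 999 in this sample. Neither the quantiles nor the median and MAD are strongly affected by a single extreme value, which is why such estimates are suited to locating outliers.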
Multivariate
There are two options for exploring outliers in your multivariate data.
Robust PCA Outliers
Decomposes data into a low-rank matrix and residuals and uses the residuals to detect outliers. See Robust PCA Outliers.
K Nearest Neighbor Outliers
Identifies outliers as values that are far from their k nearest neighbors. See K Nearest Neighbor Outliers.
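The nearest neighbor idea can be sketched in Python as follows. This is a minimal illustration, assuming Euclidean distance and a brute-force search; the function name `knn_distances`, the choice `k=2`, and the sample points are hypothetical and do not describe how the tool computes its results.

```python
import math

def knn_distances(points, k=2):
    """Return, for each point, the distance to its k-th nearest neighbor."""
    result = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, other)
                       for j, other in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result

# Hypothetical two-column data: four clustered rows and one distant row.
rows = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2), (8.0, 8.0)]

scores = knn_distances(rows, k=2)
for row, score in zip(rows, scores):
    print(row, round(score, 2))
```

The last row has a much larger distance to its second nearest neighbor than the clustered rows, which marks it as a candidate outlier. Considering more than one neighbor helps when a few outliers lie close to each other, because their first nearest neighbor distance alone would look small.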