Exploring and understanding outliers in your data is an important part of analysis. Outliers can result from mistakes in data collection or reporting, measurement system failures, or error or missing value codes included in the data set. Outliers can distort estimates, so any analysis that includes them can be biased toward them. Outliers also inflate the sample variance. However, retaining outliers is sometimes necessary: removing legitimate extreme values can cause the sample variance to be underestimated and bias the results in the opposite direction.
Whether you remove or retain outliers, you must first locate them. There are many ways to inspect your data visually for outliers. For example, box plots, histograms, and scatter plots can sometimes make these extreme values easy to see. See Visualize Your Data in Discovering JMP.
The Explore Outliers tool provides four different options to identify, explore, and manage outliers in your univariate or multivariate data.
Quantile Range Outliers
Uses the quantile distribution of each column to identify outliers as extreme values. This tool is useful for discovering missing value or error codes within the data. This is the recommended method to begin exploring outliers in your data. See Quantile Range Outliers.
Robust Fit Outliers
Finds robust estimates of the center and spread of each column and identifies outliers as values that lie far from the estimated center, relative to the estimated spread. See Robust Fit Outliers.
Multivariate Robust Outliers
Uses the Multivariate platform with the Robust option to identify outliers based on their Mahalanobis distance from the estimated robust center. See Multivariate Robust Outliers.
Multivariate k-Nearest Neighbor Outliers
Identifies outliers as values that are far from their k nearest neighbors. See Multivariate k-Nearest Neighbor Outliers.
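To make the two univariate ideas concrete, here is a minimal Python sketch. It is not JMP's implementation, and the function names are hypothetical. The quantile range check assumes thresholds of the form "lower tail quantile minus Q times the interquantile range" and "upper tail quantile plus Q times that range"; the tail probability of 0.1 and multiplier of 3 are illustrative choices for this sketch. The robust fit check substitutes a simple, well-known robust estimator, the median and the scaled median absolute deviation (MAD), with an illustrative cutoff of 4 robust standard deviations. JMP may use different estimators and defaults.

```python
import statistics
from typing import List, Sequence


def _quantile(sorted_vals: Sequence[float], p: float) -> float:
    """Linearly interpolated quantile of an already-sorted sequence."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def quantile_range_outliers(values: Sequence[float], tail: float = 0.1,
                            q: float = 3.0) -> List[float]:
    """Flag values lying more than q interquantile ranges beyond the tail quantiles.

    The tail and q defaults are illustrative assumptions, not JMP settings.
    """
    s = sorted(values)
    low_q, high_q = _quantile(s, tail), _quantile(s, 1 - tail)
    spread = high_q - low_q  # interquantile range between the two tails
    low, high = low_q - q * spread, high_q + q * spread
    return [v for v in values if v < low or v > high]


def robust_fit_outliers(values: Sequence[float], k: float = 4.0) -> List[float]:
    """Flag values more than k robust standard deviations from the median.

    Uses the median and 1.4826 * MAD as a stand-in robust estimator
    (assumption; not necessarily the estimator JMP uses).
    """
    center = statistics.median(values)
    mad = statistics.median([abs(v - center) for v in values])
    scale = 1.4826 * mad  # rescales MAD to match the SD for normal data
    if scale == 0:
        # Degenerate spread: any value that differs from the median stands out.
        return [v for v in values if v != center]
    return [v for v in values if abs(v - center) > k * scale]


# Hypothetical readings in which 999 plays the role of an error code.
readings = [10, 11, 12, 11, 10, 12, 11, 13, 10, 999]
print(quantile_range_outliers(readings))  # [999]
print(robust_fit_outliers(readings))      # [999]
```

Both checks flag the code value 999 here. The quantile range check is the simpler one for spotting such codes, because it relies only on order statistics and needs no model of the data's distribution.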
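The k-nearest neighbor idea can be sketched in a few lines of Python. In this sketch, each row's outlier score is its distance to its kth nearest neighbor, so rows in sparse regions of the data receive large scores. It assumes Euclidean distance on numeric columns that are already on comparable scales and leaves the choice of cutoff to the analyst. The function name is hypothetical, and this is not JMP's implementation.

```python
import math
from typing import List, Sequence


def knn_distance_scores(points: Sequence[Sequence[float]], k: int) -> List[float]:
    """Return, for each row, the Euclidean distance to its kth nearest neighbor.

    Brute force: every pair of rows is compared, so cost grows quadratically
    with the number of rows. Columns are assumed to be on comparable scales.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        # Distances from row i to every other row, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Four clustered rows and one isolated row.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = knn_distance_scores(points, k=2)
print(max(range(len(points)), key=scores.__getitem__))  # 4, the row (10, 10)
```

Using the distance to the kth neighbor rather than the nearest one means that a small group of rows clustered far from the rest is not hidden simply because its members lie close to one another.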