Use the Explore Outliers utility to identify outliers that you can then examine in the Distribution platform. The Probe.jmp sample data table contains 387 characteristics (the Responses column group) measured on 5800 semiconductor wafers. Together, the Lot ID and Wafer Number columns uniquely identify each wafer. In this example, you identify outliers within a selected group of columns in the data table.
1. Select Help > Sample Data Library and open the Probe.jmp sample data table.
2. Select Analyze > Screening > Explore Outliers.
3. Click the triangle next to Responses(387/0) to show all of the columns in the group.
4. Select columns VDP_M1 through VDP_SICR and click Y, Columns. There should be 14 columns selected.
Figure 21.2 Explore Outliers Launch Window
5. Click OK.
6. Click Quantile Range Outliers.
The Quantile Range Outliers report shows each column and lists the number and identity of the outliers found.
7. In the Quantile Range Outliers report, select Show only columns with outliers. This limits the list of columns to only those that contain outliers.
Notice that several columns contain outlier values of 9999. Many industries use a string of nines as a missing value code.
8. In the Nines report, select each column.
9. Click Add Highest Nines to Missing Value Codes.
A JMP Alert indicates that you should use the Save As command to preserve your original data.
10. Click OK.
11. In the Quantile Range Outliers report, click Rescan.
12. Select Restrict search to integers.
In continuous data, integer values are often error codes or other coded data values. Notice that no additional error codes are included in this set of columns.
13. Deselect Restrict search to integers.
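The outlier rule behind steps 6 through 13 can be sketched outside JMP. This is a minimal Python illustration, not JMP's implementation. It assumes the defaults of a tail quantile of 0.1 and Q = 3 (check the values shown in your report), and it flags values below the lower tail quantile minus Q times the interquantile range, or above the upper tail quantile plus Q times that range. The linear-interpolation quantile is an assumed definition that may differ from JMP's. The helper names `quantile`, `outlier_limits`, `find_outliers`, and `nines_to_missing` are hypothetical, and the data values are made up.

```python
import math


def quantile(sorted_vals, p):
    """Quantile by linear interpolation between order statistics (assumed definition)."""
    pos = p * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def outlier_limits(values, tail=0.1, q=3.0):
    """Lower/upper limits: tail quantiles minus/plus q times the interquantile range."""
    data = sorted(v for v in values if v is not None)  # None stands for a missing value
    lower, upper = quantile(data, tail), quantile(data, 1 - tail)
    iqr = upper - lower
    return lower - q * iqr, upper + q * iqr


def find_outliers(values, tail=0.1, q=3.0, integers_only=False):
    """Row indices outside the limits; integers_only keeps integer-valued outliers only."""
    low, high = outlier_limits(values, tail, q)
    rows = []
    for i, v in enumerate(values):
        if v is None:
            continue  # missing values are never reported as outliers
        if integers_only and not float(v).is_integer():
            continue  # skip non-integer values, like Restrict search to integers
        if v < low or v > high:
            rows.append(i)
    return rows


def nines_to_missing(values, code=9999):
    """Treat a nines code as missing, similar in spirit to a Missing Value Code."""
    return [None if v == code else v for v in values]


# Made-up measurements: 9999 is a nines code, 5.5 is a real-looking outlier.
col = [1.02, 0.98, 1.01, 0.99, 1.00, 1.03, 0.97, 9999, 1.01, 5.5]
print(find_outliers(col))                        # [7]: only the 9999 code
clean = nines_to_missing(col)
print(find_outliers(clean))                      # [9]: 5.5 now stands out
print(find_outliers(clean, integers_only=True))  # []: no integer codes remain
```

In this sketch, the 9999 code inflates the upper tail quantile and masks the 5.5 value until the code is treated as missing, which illustrates why rescanning after adding nines to the missing value codes can reveal additional outliers.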
Next, examine the remaining outliers in the Distribution platform.
1. Select all of the remaining columns in the Quantile Range Outliers report.
2. Click Select Rows.
3. Select Analyze > Distribution.
4. Assign the selected columns to the Y, Columns role. Because you selected these column names in the Quantile Range Outliers report, they are already selected in the Distribution launch window.
5. Click OK.
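When several columns are selected in the report, a row can be flagged in more than one of them. The following Python sketch assumes that Select Rows selects every row flagged in at least one selected column, so each row is selected once. The function `rows_to_select` and the row numbers in `report` are hypothetical; only the column names come from this example.

```python
def rows_to_select(outliers_by_column, selected_columns):
    """Sorted union of outlier rows across the selected report columns."""
    rows = set()
    for name in selected_columns:
        rows.update(outliers_by_column.get(name, []))  # unknown columns add nothing
    return sorted(rows)


# Hypothetical report contents: column name -> outlier row numbers (made up).
report = {
    "VDP_M1": [12, 40],
    "VDP_PEMIT": [40, 77],
    "VDP_SICR": [5],
}
print(rows_to_select(report, ["VDP_M1", "VDP_PEMIT"]))  # [12, 40, 77]
```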
Figure 21.3 Distribution of Columns with Outliers Selected
In columns VDP_M1 and VDP_PEMIT, notice that some of the selected outliers are fairly close to the majority of the data. In the remaining columns, the selected outliers are distant from the majority of the data. Investigate these distant data points and exclude them from your analyses.
1. In the Quantile Range Outliers report, hold Ctrl and click columns VDP_M1 and VDP_PEMIT to deselect them.
2. With the remaining columns selected in the report, click Exclude Rows.
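3. Change Q to 20. Because Q multiplies the interquantile range, a larger Q widens the outlier limits so that only values far from the majority of the data are flagged.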
4. Click Rescan.
5. Select columns VDP_M1 and VDP_PEMIT in the report.
6. Click Select Rows.
7. Examine the Distributions report again. Notice that the selected outliers are now far enough from the majority of the data that you can select and exclude them from your analyses.
8. In the Quantile Range Outliers report, click Exclude Rows.
9. In the Distributions report, click the Distributions red triangle and select Redo > Redo Analysis.
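The effect of raising Q and then excluding rows can be sketched in Python. This is an illustration under stated assumptions, not JMP's implementation: it assumes a tail quantile of 0.1 and computes the 10th and 90th percentiles with `statistics.quantiles(..., method="inclusive")`, which interpolates linearly and may not match JMP's quantile definition. The function name `flag_rows` is hypothetical, and the column values are made up so that 12.0 sits just outside the bulk of the data while 60.0 is far from it.

```python
import statistics


def flag_rows(values, q):
    """Indices of values beyond the 10th/90th percentiles minus/plus q times their range."""
    cuts = statistics.quantiles(values, n=10, method="inclusive")
    lower, upper = cuts[0], cuts[-1]  # 10th and 90th percentiles
    spread = upper - lower
    low, high = lower - q * spread, upper + q * spread
    return [i for i, v in enumerate(values) if v < low or v > high]


# Made-up column: a tight bulk near 10, one near outlier, one distant outlier.
column = [9.8, 9.9, 9.9, 9.9] + [10.0] * 8 + [10.1] * 3 + [10.2] * 3 + [12.0, 60.0]

print(flag_rows(column, q=3))   # [18, 19]: both 12.0 and 60.0 are flagged
print(flag_rows(column, q=20))  # [19]: only the distant 60.0 remains

# Mimic Exclude Rows followed by Redo Analysis: summarize only the kept rows.
excluded = set(flag_rows(column, q=20))
kept = [v for i, v in enumerate(column) if i not in excluded]
print(statistics.fmean(column), statistics.fmean(kept))  # mean falls from about 12.6 to about 10.1
```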
Figure 21.4 Distributions of Columns with Outliers Excluded
With the outliers excluded, the distribution displays are now more informative.