Publication date: 07/08/2024

Example of Explore Outliers

Use Explore Outliers to identify outliers in a selected group of columns in a data table. You can then examine the identified outliers using the Distribution platform.

1. Select Help > Sample Data Folder and open the Probe.jmp sample data table.

2. Select Analyze > Screening > Explore Outliers.

3. Click the triangle next to Responses(387/0) to show all of the columns in the group.

4. Select columns VDP_M1 through VDP_SICR and click Y, Columns. There should be 14 columns selected.

Figure 21.2 Explore Outliers Launch Window 


5. Click OK.

6. Click Quantile Range Outliers.

The Quantile Range Outliers report uses tabs to organize the results. The Outliers by Column tab of the Quantile Range Outliers report shows each column and lists the number and identity of the outliers found.

7. In the Outliers by Column tab of the Quantile Range Outliers report, select Show only columns with outliers. This limits the list of columns to only those that contain outliers.

Several columns contain outlier values of 9999. Many industries use a run of nines as a missing value code, so these values are likely codes rather than measurements.

8. Click the Nines tab.

9. In the Nines report, select each column.

10. Click Add Highest Nines to Missing Value Codes.

A JMP Alert indicates that you should use the Save As command to preserve your original data.

11. Click OK.

12. In the Quantile Range Outliers report, click Rescan.

13. Select Restrict search to integers.

In continuous data, integer values are often error codes or other coded data values. Notice that no additional error codes appear in this set of columns.

14. Deselect Restrict search to integers.
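The steps above can be sketched in code to make the logic easier to follow. The Python below is a minimal illustration, not JMP's implementation. It assumes that a value is flagged when it lies more than Q times the distance between the two tail quantiles beyond either tail quantile, with an assumed tail quantile of 0.1 and Q of 3. The function names, the linear-interpolation quantile, and the sample column are all hypothetical.

```python
import math

def quantile(sorted_vals, p):
    """Linear-interpolation quantile of an already sorted list (0 <= p <= 1)."""
    pos = p * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def quantile_range_outliers(values, tail=0.1, q=3.0,
                            missing_codes=(), integers_only=False):
    """Flag values more than q * (upper - lower) beyond the tail quantiles.

    tail and q are assumed settings; values listed in missing_codes are
    dropped before the quantiles are computed.
    """
    data = sorted(v for v in values if v not in missing_codes)
    lower = quantile(data, tail)
    upper = quantile(data, 1.0 - tail)
    reach = q * (upper - lower)
    flagged = [v for v in data if v < lower - reach or v > upper + reach]
    if integers_only:
        # Keep only whole-number outliers, which are candidate error codes.
        flagged = [v for v in flagged if float(v).is_integer()]
    return flagged

def highest_nines(values):
    """Return the largest positive value written only with the digit 9, or None."""
    candidates = [v for v in values
                  if v > 0 and float(v).is_integer()
                  and set(str(int(v))) == {"9"}]
    return max(candidates) if candidates else None

# Hypothetical column: 20 ordinary readings, one distant reading, one 9999 code.
column = [round(0.90 + 0.01 * i, 2) for i in range(20)] + [5.3, 9999]

code = highest_nines(column)
print(code)
print(quantile_range_outliers(column))
print(quantile_range_outliers(column, missing_codes=(code,)))
print(quantile_range_outliers(column, missing_codes=(code,), integers_only=True))
```

In this sketch, treating 9999 as a missing value code removes it from the flagged values, and restricting the search to integers then flags nothing, which mirrors finding no additional error codes.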

Examine the Data

Return to the Outliers by Column tab of the report.

1. Select all of the remaining columns.

2. Click Select Rows.

3. Select Analyze > Distribution.

4. Click Y, Columns to assign the selected columns to that role. Because you selected these column names in the Quantile Range Outliers report, they are already selected in the Distribution launch window.

5. Click OK.

Figure 21.3 Distribution of Columns with Outliers Selected 


In columns VDP_M1 and VDP_PEMIT, notice that some of the selected outliers are somewhat close to the majority of the data. For the rest of the columns, the selected outliers appear distant from the majority of the data. Now that you have investigated the data points, you can decide for which columns to exclude the selected outliers from your analyses.

Refine Excluded Outliers

Return to the Outliers by Column tab of the report.

1. Hold Ctrl and click columns VDP_M1 and VDP_PEMIT to deselect them.

2. With the remaining columns selected in the report, click Exclude Rows.

3. Change Q to 20.

4. Click Rescan.

5. Select columns VDP_M1 and VDP_PEMIT in the report.

6. Click Select Rows.
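To see why raising Q narrows the set of flagged values, consider the following Python sketch. It is an illustration under stated assumptions, not JMP's implementation: it assumes Q multiplies the distance between the tail quantiles (with an assumed tail quantile of 0.1) to set how far beyond those quantiles a value must lie to be flagged. The function names and the readings are hypothetical.

```python
import math

def tail_quantiles(values, tail=0.1):
    """Return the (lower, upper) tail quantiles using linear interpolation."""
    data = sorted(values)

    def at(p):
        pos = p * (len(data) - 1)
        lo, hi = math.floor(pos), math.ceil(pos)
        return data[lo] + (pos - lo) * (data[hi] - data[lo])

    return at(tail), at(1.0 - tail)

def flagged(values, q, tail=0.1):
    """Return values more than q * (upper - lower) beyond the tail quantiles."""
    lower, upper = tail_quantiles(values, tail)
    reach = q * (upper - lower)
    return [v for v in values if v < lower - reach or v > upper + reach]

# Hypothetical readings: 21 ordinary values, one moderately distant value,
# and one very distant value.
readings = [round(10.0 + 0.1 * i, 1) for i in range(21)] + [18.0, 60.0]

print(flagged(readings, q=3))   # flags both distant values
print(flagged(readings, q=20))  # flags only the most distant value
```

With the larger Q, the value that sits somewhat close to the bulk of the data is no longer flagged, so only the clearly separated value remains selected.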

Reexamine the Data

1. Examine the Distributions report again. Notice that the selected outliers are now far enough from the majority of the data that you can select and exclude them from your analyses.

2. In the Quantile Range Outliers report, click Exclude Rows.

3. In the Distributions report, click the Distributions red triangle and select Redo > Redo Analysis.

Figure 21.4 Distributions of Columns with Outliers Excluded 


With the outliers excluded, the distribution displays are more informative.
