JMP can compare two open data tables and report the differences between data, scripts, table variables, column names, column properties, and column attributes. The numbers of columns and rows in each data table are shown at the top of the Compare Data Tables window. In this example, the sample data tables Popcorn Trials.jmp and Popcorn.jmp are compared.
Figure 4.22 Basic Information about Data Tables
Columns with the same name are automatically matched. Lines are drawn between each matched column. You can also manually link two columns.
1. Select Help > Sample Data Folder and open Popcorn.jmp and Popcorn Trials.jmp.
2. Show Popcorn Trials.jmp and select Tables > Compare Data Tables.
3. From the With list, select Popcorn.jmp.
Popcorn Trials.jmp should automatically be selected from the Compare list.
4. In the Match Columns pane, select yield and yield1 and then click Link.
Figure 4.23 Manually Linked Columns
Columns that have the same name are automatically linked.
Note: If you click the Compare Data icon, the linked columns will not be compared. The values for that pair of columns will show up in the results, but they won’t be compared.
5. Click Compare.
The results are shown in the Data report.
Figure 4.24 Comparing Columns
The first eight rows are the same in both data tables, so they are not shown. The remaining rows (colored blue) are only in the second table, Popcorn.jmp.
6. In the Data report, deselect Hide rows with no differences.
Figure 4.25 Showing Rows with No Differences
The first eight rows, which are the same in both data tables, are now shown.
7. Deselect Hide columns with no differences.
Figure 4.26 Showing Columns with No Differences
Columns that are in both data tables and that match are shown. You might select this option to give more context to the matched data.
Note: If the data in a cell isn’t completely shown, select the text and then view the data in the Cell Data box.
Flexible by Row
Searches for common rows to align. Consider this option for smaller data tables if you think that the data tables are nearly the same.
By Row
Compares rows one by one. Consider this option if you already know that the rows should line up row by row, because the comparison runs much faster. Also consider this option if the default Flexible by Row is selected and the comparison is taking too long.
Use ID Columns
Uses the selected ID column to compare rows. The rows of the data table are uniquely identified by the values of the ID column. Consider this option if the data tables are large, sorted differently, or have missing rows. You can select more than one column.
Ignore missing
Ignores missing data.
Allow Relative Error
Enables you to specify the relative error rate for numeric data. The numeric values are considered equal if they are within the relative error rate that you specify. The smaller the relative error rate, the more precise the comparison.
Ignore case
Disregards case when comparing text.
Ignore whitespace
Disregards whitespace when comparing text.
Show fuzzy differences
Shows differences in numeric and string data that are approximately the same. Works with the value in the Relative Error field to remove insignificant differences.
Hide columns with no differences
Shows or hides all matched columns.
Hide rows with no differences
Shows or hides rows that contain matching data.
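The effect of the Use ID Columns, Ignore missing, Allow Relative Error, Ignore case, and Ignore whitespace options can be illustrated outside JMP with a minimal Python sketch. It is not JMP's implementation. The function names values_equal and compare_by_id, the relative error formula (the difference divided by the larger magnitude), the use of None for missing values, and the sample rows are all assumptions made for illustration.

```python
import re


def values_equal(a, b, rel_err=0.0, ignore_case=False,
                 ignore_whitespace=False, ignore_missing=False):
    """Return True if two cell values count as equal under the options.

    Hypothetical helper: missing values are modeled as None, and the
    relative error is |a - b| / max(|a|, |b|), which is an assumption.
    """
    if a is None or b is None:
        # Two missing values match; one missing value matches only if ignored.
        return (a is None and b is None) or ignore_missing
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        if a == b:
            return True
        scale = max(abs(a), abs(b))  # nonzero here because a != b
        return abs(a - b) / scale <= rel_err
    a, b = str(a), str(b)
    if ignore_case:
        a, b = a.casefold(), b.casefold()
    if ignore_whitespace:
        a, b = re.sub(r"\s+", "", a), re.sub(r"\s+", "", b)
    return a == b


def compare_by_id(rows1, rows2, id_cols, **options):
    """Align rows on ID columns, then compare same-named columns.

    rows1 and rows2 are lists of dicts, one dict per row. Returns a tuple:
    (IDs only in table 1, IDs only in table 2, {ID: [differing columns]}).
    """
    def key(row):
        return tuple(row[c] for c in id_cols)

    t1 = {key(r): r for r in rows1}
    t2 = {key(r): r for r in rows2}
    only1 = [k for k in t1 if k not in t2]
    only2 = [k for k in t2 if k not in t1]
    diffs = {}
    for k in t1:
        if k in t2:
            shared = [c for c in t1[k] if c in t2[k]]
            changed = [c for c in shared
                       if not values_equal(t1[k][c], t2[k][c], **options)]
            if changed:
                diffs[k] = changed
    return only1, only2, diffs


# Made-up rows: table2 is sorted differently and has one extra row.
table1 = [
    {"id": 1, "name": "Alpha", "value": 8.2},
    {"id": 2, "name": "Beta", "value": 9.9},
]
table2 = [
    {"id": 2, "name": "beta ", "value": 9.9001},
    {"id": 1, "name": "Alpha", "value": 8.2},
    {"id": 3, "name": "Gamma", "value": 11.0},
]

strict = compare_by_id(table1, table2, ["id"])
loose = compare_by_id(table1, table2, ["id"], rel_err=0.001,
                      ignore_case=True, ignore_whitespace=True)
print(strict)  # ([], [(3,)], {(2,): ['name', 'value']})
print(loose)   # ([], [(3,)], {})
```

Because the rows are keyed by the ID value, the different sort order does not matter, and the extra row is reported as being only in the second table. With the looser options, the small numeric difference and the case and whitespace differences are no longer reported.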
Click the red triangle and select Compare Table Properties to see differences in table scripts and variables. Figure 4.27 shows that the table variables and scripts are different. To see the full variable or script, select the line and view it in the Selected Metadata box.
In this example, the notes differ between the two data tables, and the reference variable and scripts are only in Popcorn.jmp. The Notes variable is selected so that you can see its contents in both data tables in the Selected Metadata box. The red shading indicates that the text is only in Popcorn.jmp. The blue shading indicates that the text is only in Popcorn Trials.jmp.
Figure 4.27 Different Table Variables
Deselect Show Diff to see the name of each data table and the complete contents of each Notes variable instead of only showing the differences.
Shortest run is the smallest number of consecutive characters that are required to be the same (between the two files) before you can declare the characters as a common subsegment. A common subsegment has no background color because it is present in both files. The shortest run is set to 3 to prevent subsegments that are too short from showing up as common, which is typically unhelpful. For example, a shortest run of 1 means that any single character that is in both files could match. This leads to many very short segments of common text and differences, which is usually not a good reading experience.
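The effect of the shortest run setting can be sketched in Python with the standard difflib module. This is only an illustration under stated assumptions: the function name common_segments is hypothetical, difflib's matcher is not the algorithm that JMP uses, and filtering out short matches after the fact might not give exactly the same segments as JMP does.

```python
from difflib import SequenceMatcher


def common_segments(text1, text2, shortest_run=3):
    """Return the pieces of text1 that also appear, in order, in text2.

    Matching runs shorter than shortest_run are dropped, so they would be
    shown as differences instead of as common text.
    """
    matcher = SequenceMatcher(None, text1, text2, autojunk=False)
    return [text1[m.a:m.a + m.size]
            for m in matcher.get_matching_blocks()
            if m.size >= max(1, shortest_run)]


print(common_segments("popcorn yield", "popcorn trial"))
# ['popcorn ']
print(common_segments("popcorn yield", "popcorn trial", shortest_run=1))
# ['popcorn ', 'i', 'l']
```

With a shortest run of 3, only the shared prefix counts as common text. With a shortest run of 1, the isolated letters i and l in "yield" and "trial" also match, which breaks the differing words into small fragments.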
Click the red triangle and select Compare Column Attributes and Properties to see differences in column notes, value colors, and the like.
Figure 4.28 shows that column notes are different in Popcorn.jmp and Popcorn Trials.jmp. The yield1/yield column is selected so that you can see the complete note and differences between the two notes in the Selected Metadata box.
Figure 4.28 Comparing Column Attributes and Properties