You can use associative arrays to perform several common data table tasks quickly and efficiently, such as finding unique values, sorting values, and comparing columns.
Note: The strategy described here is not useful for columns with floating point numbers. Use Summarize instead. See Store Summary Statistics in Global Variables in the Data Tables section.
A key can exist only once in an associative array, so putting a column’s values into an associative array automatically yields the column’s unique values. For example, the Big Class.jmp sample data table contains 40 rows. To see how many unique values are in the height column, run this script:
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
unique heights = Associative Array( dt:height );
N Items( unique heights );
17
There are only 17 unique values for height. You can use those unique values by getting the keys:
unique heights << Get Keys;
{51, 52, 55, 56, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70}
Note: This is possible because you can use any JMP data type as keys in an associative array, not only strings.
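Because the keys here are numbers, you can also test whether a particular value occurs in the column without scanning the rows. The following is a minimal sketch, assuming the <<Contains message is used for the key lookup; the values 60 and 57 are chosen only for illustration:

```jsl
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
unique heights = Associative Array( dt:height );
// 60 appears in the key list above; 57 does not
Show( unique heights << Contains( 60 ) );
Show( unique heights << Contains( 57 ) );
```

Based on the key list shown above, the first check should report 1 (found) and the second 0 (not found).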
Using an associative array to discover unique values in a column is efficient and fast. The following script takes some time to create a data table with 100,000 rows. Finding the 39 unique values takes very little time.
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
nms = dt:name << Get Values;
// Build a 100,000-row table by sampling the original names at random
dtbig = New Table( "Really Big Class",
	New Column( "name",
		Character,
		Set Values( nms[J( 100000, 1, Random Integer( N Items( nms ) ) )] )
	)
);
Wait( 0 );
// Time only the step that finds the unique names
t1 = Tick Seconds();
Write(
	"\!N# names from Really Big Class = ",
	N Items( Associative Array( dtbig:name ) ),
	", elapsed time=",
	Tick Seconds() - t1
);
# names from Really Big Class = 39, elapsed time=0.116666666639503
Because keys are ordered lexicographically, putting the values into an associative array also sorts them. For example, the <<Get Keys message returns the keys (the unique values of the name column) in ascending order:
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
unique names = Associative Array( dt:name );
unique names << Get Keys;
{"ALFRED", "ALICE", "AMY", "BARBARA", "CAROL", "CHRIS", "CLAY", "DANNY", "DAVID", "EDWARD", "ELIZABETH", "FREDERICK", "HENRY", "JACLYN", "JAMES", "JANE", "JEFFREY", "JOE", "JOHN", "JUDY", "KATIE", "KIRK", "LAWRENCE", "LESLIE", "LEWIS", "LILLIE", "LINDA", "LOUISE", "MARION", "MARK", "MARTHA", "MARY", "MICHAEL", "PATTY", "PHILLIP", "ROBERT", "SUSAN", "TIM", "WILLIAM"}
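If you want to process the sorted values one at a time rather than retrieve the whole list, you can step through the keys. The following is a minimal sketch, assuming the <<First and <<Next messages are used to traverse the keys in their stored order; the variable name key is illustrative:

```jsl
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
unique names = Associative Array( dt:name );
// Start at the first key, then advance once per item
key = unique names << First;
For( i = 1, i <= N Items( unique names ), i++,
	Write( "\!N", key );
	key = unique names << Next( key );
);
```

The loop writes each name on its own line in the log, in the same ascending order that <<Get Keys returns.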
Associative arrays also make it fast to determine which values in one column are not in another column, or which values are in both columns. For example, given two data tables with information about countries, which countries are in both data tables?
Place the columns of each data table that contain country names into associative arrays:
dt1 = Open( "$SAMPLE_DATA/BirthDeathYear.jmp" );
dt2 = Open( "$SAMPLE_DATA/World Demographics.jmp" );
aa1 = Associative Array( dt1:Country );
aa2 = Associative Array( dt2:Territory );
Use N Items() to see how many countries appear in each data table:
N Items( aa1 );
23
N Items( aa2 );
239
Use the <<Intersect message to find the common values:
aa1 = Associative Array( dt1:Country );
aa1 << Intersect( aa2 );
Look at the results:
Show( N Items( aa1 ), aa1 << Get Keys );
N Items(aa1) = 21;
aa1 << get keys = {"Australia", "Austria", "Belgium", "France", "Greece", "Ireland", "Israel", "Italy", "Japan", "Mauritius", "Netherlands", "New Zealand", "Norway", "Panama", "Poland", "Portugal", "Romania", "Switzerland", "Tunisia", "United Kingdom", "United States"};
This example uses a set operation called intersection. For more examples of using set operations with associative arrays to compare values, see Associative Arrays in Set Operations.
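The complementary question, which countries in the first data table are missing from the second, can be answered with a set difference. The following is a minimal sketch, assuming the <<Remove message accepts another associative array and removes every key that the two arrays share; the variable name aa1only is illustrative:

```jsl
dt1 = Open( "$SAMPLE_DATA/BirthDeathYear.jmp" );
dt2 = Open( "$SAMPLE_DATA/World Demographics.jmp" );
aa1only = Associative Array( dt1:Country );
aa2 = Associative Array( dt2:Territory );
// Remove every key that also appears in aa2; what remains is unique to dt1
aa1only << Remove( aa2 );
Show( N Items( aa1only ), aa1only << Get Keys );
```

Given the counts shown earlier (23 countries in the first table, 21 of them shared), two keys should remain.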