Publication date: 07/08/2024

Identify Differences Between Strings, Lines, or Sequences

The Shortest Edit Script() JSL function compares two strings, lines, or sequences and returns a list of changes or a matrix that describes the changes. You might use Shortest Edit Script() to identify differences between two columns, lists, or matrices. The function returns a shortest list of instructions that converts sequence A into sequence B. More than one shortest script can exist, so the result is one valid shortest script, not necessarily the only one.
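The point about multiple shortest scripts can be seen on a tiny input. The following is a minimal sketch in Python, not JSL, and it is not how JMP computes its result. The helper name `shortest_scripts` and the "delete"/"keep"/"add" labels are hypothetical. It enumerates every minimal script by brute force to show that converting "ab" into "ba" has two equally short answers.

```python
from functools import lru_cache


def shortest_scripts(a, b):
    """Return every minimal delete/keep/add script that turns a into b.

    Hypothetical helper for illustration only; brute force, small inputs.
    """

    @lru_cache(maxsize=None)
    def cost(i, j):
        # Minimum number of deletes plus adds needed to turn a[i:] into b[j:].
        if i == len(a):
            return len(b) - j
        if j == len(b):
            return len(a) - i
        best = min(cost(i + 1, j), cost(i, j + 1)) + 1
        if a[i] == b[j]:
            best = min(best, cost(i + 1, j + 1))
        return best

    def walk(i, j):
        # Follow only the steps that stay on a minimum-cost path.
        if i == len(a) and j == len(b):
            yield ()
            return
        here = cost(i, j)
        if i < len(a) and j < len(b) and a[i] == b[j] and cost(i + 1, j + 1) == here:
            for rest in walk(i + 1, j + 1):
                yield (("keep", a[i]),) + rest
        if i < len(a) and cost(i + 1, j) + 1 == here:
            for rest in walk(i + 1, j):
                yield (("delete", a[i]),) + rest
        if j < len(b) and cost(i, j + 1) + 1 == here:
            for rest in walk(i, j + 1):
                yield (("add", b[j]),) + rest

    return list(walk(0, 0))


scripts = shortest_scripts("ab", "ba")
for script in scripts:
    print(script)
```

Both scripts contain one delete and one add, so either is a valid shortest script. A tool that returns a single script has to pick one of them.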

Example of Sequences

t1 = New Table( "t1",
	New Column( "Column 1",
		Numeric,
		Continuous,
		Format( "Best", 12 ),
		Set Values( [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 7] )
	)
);
 
t2 = New Table( "t2",
	New Column( "Column 1",
		Numeric,
		Continuous,
		Format( "Best", 12 ),
		Set Values( [2, 3, 4, 5, 2, 3, 4, 5, 2, 3, 4, 5, 6] )
	)
);
EditScript = Shortest Edit Script( // compares column 1 in each table
	sequences(
		N Rows( t1 ),
		N Rows( t2 ),
		Function( {a, b}, // subscripts the columns
			t1:column 1[a] == t2:column 1[b] // in data tables t1 and t2
		)
	)
);

EditScript contains the following matrix:

[-1 1 . 1, // delete 1 item at position 1 in sequence a
0 2 1 4, // keep 4 items at position 2 in sequence a, position 1 in sequence b
-1 6 . 2, // delete 2 items at position 6 in sequence a
1 . 5 4, // add 4 items at position 5 in sequence b
0 8 9 5, // keep 5 items at position 8 in sequence a, position 9 in sequence b
-1 13 . 1] // delete 1 item at position 13 in sequence a

Here are the columns in the matrix:

Column 1: delete the items (-1), keep the common items (0), or add the items (1)

Column 2: the starting position in sequence a

Column 3: the starting position in sequence b

Column 4: the number of items the instruction uses

A missing value (.) indicates that the position does not apply to that instruction: a delete instruction has no position in b, and an add instruction has no position in a.
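One way to check a reading of the matrix is to replay it. The following is a minimal sketch in Python, not JSL. The function name `apply_edit_script` is hypothetical, and `None` stands in for the JSL missing value. It copies the kept items from a and the added items from b, skips the deleted items, and confirms that the result equals b for the data in this example.

```python
# Column values from t1 and t2 in the example above.
a = [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 7]
b = [2, 3, 4, 5, 2, 3, 4, 5, 2, 3, 4, 5, 6]

# The matrix from the example; None replaces the missing value (.).
# Each row: (instruction, position in a, position in b, number of items)
script = [
    (-1, 1, None, 1),
    (0, 2, 1, 4),
    (-1, 6, None, 2),
    (1, None, 5, 4),
    (0, 8, 9, 5),
    (-1, 13, None, 1),
]


def apply_edit_script(a, b, script):
    """Rebuild b by replaying the instructions (positions are 1-based)."""
    result = []
    for op, pos_a, pos_b, count in script:
        if op == 0:
            # Keep: copy the common items from a.
            result.extend(a[pos_a - 1 : pos_a - 1 + count])
        elif op == 1:
            # Add: copy the new items from b.
            result.extend(b[pos_b - 1 : pos_b - 1 + count])
        # op == -1 (delete): the items from a are skipped.
    return result


rebuilt = apply_edit_script(a, b, script)
print(rebuilt == b)
```

The rows also account for every item: the delete and keep counts sum to the 13 items in a, and the add and keep counts sum to the 13 items in b.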

Example of Separators

The following example considers “@” and “$” to be separators.

aa = "this is$a test of@shortest$edit script lines$with several words";
bb = "this is a$test of$shortest$edit script lines with several@words";

// @ and $ separators
Shortest Edit Script( lines( aa, bb, separators( "@$" ) ) );

{{"Remove", "this is$a test of@"},
{"Insert", "this is a$test of$"},
{"Common", "shortest$"},
{"Remove", "edit script lines$with several words"},
{"Insert", "edit script lines with several@words"}}
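The returned list can be read as follows: the "Common" and "Remove" pieces together make up the first string, and the "Common" and "Insert" pieces together make up the second. The following is a minimal sketch in Python, not JSL, that checks this reading against the example. The helpers `split_lines` and `rebuild` are hypothetical names, and the splitting rule (each line ends with its separator) is an assumption inferred from the output above.

```python
import re

aa = "this is$a test of@shortest$edit script lines$with several words"
bb = "this is a$test of$shortest$edit script lines with several@words"

# The list from the example, transcribed as Python tuples.
edits = [
    ("Remove", "this is$a test of@"),
    ("Insert", "this is a$test of$"),
    ("Common", "shortest$"),
    ("Remove", "edit script lines$with several words"),
    ("Insert", "edit script lines with several@words"),
]


def split_lines(text, separators="@$"):
    """Split text after each separator, keeping the separator on its line."""
    sep = "[" + re.escape(separators) + "]"
    not_sep = "[^" + re.escape(separators) + "]"
    return re.findall(f"{not_sep}*{sep}|{not_sep}+", text)


def rebuild(edits, drop):
    """Concatenate every piece except those tagged with `drop`."""
    return "".join(text for tag, text in edits if tag != drop)


print(split_lines(aa))
print(rebuild(edits, drop="Insert") == aa)  # Common + Remove pieces
print(rebuild(edits, drop="Remove") == bb)  # Common + Insert pieces
```

Under this splitting rule, "shortest$" is the only line that appears in both strings, which matches the single "Common" entry in the result.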

See Shortest Edit Script( A, B ) in the JSL Syntax Reference.
