Publication date: 07/08/2024

Parse Strings in Fixed Fields

Sometimes data is arranged in fixed-width fields. The Pat Tab(), Pat R Tab(), Pat Len(), Pat Pos(), and Pat R Pos() functions make it easy to split out the fields of a fixed-field string. Pat Tab() and Pat R Tab() take a number as their argument, counted from the left end and the right end of the string, respectively. They succeed by matching forward to the specified tab position. For example:

p = Pat Pos(10) + Pat Tab(15);

Pat Pos(10) matches the null string if the current position is 10. So at match time, the matcher works its way forward to position 10, and then Pat Tab(15) matches text from the current position (10) forward to position 15. This pattern is equivalent to Pat Pos(10) + Pat Len(5). Another example:

p = Pat Pos(0) + Pat R Tab(0);

This example matches the entire string, from 0 characters from the start to 0 characters from the end. The Pat Rem() function takes no argument and is shorthand for Pat R Tab(0); it matches the remainder of the string. Pattern matching can also be anchored to the beginning of the string like this:

Pat Match( "now is the time", Pat Len(15) + Pat R Pos(0), NULL, ANCHOR );

The above call passes NULL in place of a replacement value and ANCHOR as an option. Both are uppercase, as shown. NULL means that no replacement is done. ANCHOR means that the match is anchored to the beginning of the string. The default is UNANCHORED.
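The functions above can be combined to split a fixed-field record. The following is a minimal sketch, not taken from this guide: the sample record, its column layout (positions 0 to 10, 10 to 15, and the remainder), and the variable names are hypothetical. It assumes that Pat Conditional() stores the text matched by its first argument in the variable named by its second argument when the match succeeds.

```jsl
// Hypothetical record: last name in positions 0-10, first name in 10-15, code in the rest.
record = "Smith     John 042";

rc = Pat Match(
	record,
	Pat Conditional( Pat Tab( 10 ), lastName ) +   // start of string up to position 10
	Pat Conditional( Pat Tab( 15 ), firstName ) +  // position 10 up to position 15
	Pat Conditional( Pat Rem(), code ),            // remainder of the string
	NULL,    // no replacement
	ANCHOR   // the match must begin at the start of the string
);

// Trim() removes the padding blanks left in the fixed-width fields.
Show( rc, Trim( lastName ), Trim( firstName ), code );

// Pat Pos(10) + Pat Tab(15) and Pat Pos(10) + Pat Len(5) should capture the same text.
digits = "0123456789abcdefghij";
Pat Match( digits, Pat Pos( 10 ) + Pat Conditional( Pat Tab( 15 ), viaTab ) );
Pat Match( digits, Pat Pos( 10 ) + Pat Conditional( Pat Len( 5 ), viaLen ) );
Show( viaTab, viaLen );
```

Because the first pattern is anchored, it needs no leading Pat Pos(0). The last two matches are unanchored, so the matcher scans forward until Pat Pos(10) succeeds.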

Patterns can be built up from other patterns like this, but the result is not recursive:

p = "a" | "b"; // matches one character
p = p + p; // two characters
p = p + p; // four characters
Pat Match( "babb", Pat Pos(0) + p + Pat R Pos(0) );
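One way to check that the built-up pattern matches exactly four characters is to try source strings of different lengths. This is an illustrative sketch; the variable names and test strings are hypothetical. Because each assignment evaluates p at that moment, the final p is a fixed sequence of four single-character alternatives.

```jsl
p = "a" | "b";   // one character
p = p + p;       // two characters
p = p + p;       // four characters

whole = Pat Pos( 0 ) + p + Pat R Pos( 0 );  // require the whole string to match

four = Pat Match( "babb", whole );    // four characters, all a or b
three = Pat Match( "bab", whole );    // only three characters
other = Pat Match( "bacb", whole );   // contains a character other than a or b
Show( four, three, other );
```

Only the first match is expected to succeed; the other two should return 0.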

A recursive pattern refers to its current definition using Expr():

p = "<" + Expr(p) + "*" + Expr(p) + ">" | "x";
Pat Match( "<<x*<x*x>>*x>", Pat Pos(0) + p + Pat R Pos(0) );

Remember, Expr() is the procrastination function; when the pattern is assigned to the variable p, Expr() delays evaluating its argument (p) until later. In the next statement, Pat Match() performs the pattern match operation, and each time it encounters Expr(), it looks up the current value of the argument. (In this example, the value does not change during the match.) So, if p is defined in terms of itself, how can this possibly work?

p consists of two alternatives. The right-hand choice is easy: a single letter x. The left-hand choice is harder: <p*p>. Each p could be a single letter x, since that is one of the choices p could match, or it could be <p*p>.

The last few examples have used Pat Pos(0) + ... + Pat R Pos(0) to make sure the pattern matches the entire source text. Sometimes this is what you want, and sometimes you would rather have the pattern match a substring. If you are experimenting with these examples by changing the source text, you probably want to match the entire string so that you can easily tell what was matched. The result from Pat Match() is 0 or 1.
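The difference between matching the entire source text and matching part of it can be seen by running the recursive pattern both ways on the same string. This is a sketch with a hypothetical source string and variable names.

```jsl
p = "<" + Expr( p ) + "*" + Expr( p ) + ">" | "x";

source = "<x*x> tail";  // hypothetical text with extra characters after the match

// Bracketed by Pat Pos(0) and Pat R Pos(0), the pattern must account for every character.
entire = Pat Match( source, Pat Pos( 0 ) + p + Pat R Pos( 0 ) );

// Without the brackets, it is enough for p to match somewhere in the string.
partial = Pat Match( source, p );

Show( entire, partial );
```

Here the bracketed match is expected to return 0 because of the trailing text, while the unbracketed match should return 1.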

This example uses “Left” recursion:

x = Expr(x) + "a" | "b"; // + binds tighter than |

If the pattern is used in FULLSCAN mode, it eventually uses up all memory as it expands. By default, the Pat Match() function does not use FULLSCAN; instead, it makes some assumptions that allow the recursion to stop and the match to succeed. The pattern matches either a “b”, or anything the pattern matches followed by an “a”.

rc = Pat Match( "baaaaa", x );