Publication date: 07/08/2024

Regex Function

Regex() searches a source string for a pattern and returns a string. Use it either to identify a pattern in a string or to transform a string into another string.

Regex(source, pattern, (<replacement string>, <GLOBALREPLACE>), <format>, <IGNORECASE>);

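IGNORECASE makes the match case-insensitive. GLOBALREPLACE applies the replacement string to every match, continuing until the entire source string has been processed. format specifies the result string; it can contain backreferences to the matched text, such as \0 for the entire match or \1 for the first capturing group. Without GLOBALREPLACE, Regex() returns missing if the match fails.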
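The following minimal sketch shows the effect of IGNORECASE. It assumes a hypothetical variant of the sentence used in the examples below, with “BUS” written in capitals. The explicit "\0" format keeps the default result (the entire match) so that IGNORECASE can follow the format argument, as in the syntax above.

```jsl
// Hypothetical sentence: the vehicle is written in capitals.
sentence = "I took the BUS to work.";
// Case-sensitive: neither lowercase "bus" nor "car" occurs, so the result is missing.
exact = Regex( sentence, "bus|car" );
// Case-insensitive: "\0" returns the matched text as it appears in the source, "BUS".
anyCase = Regex( sentence, "bus|car", "\0", IGNORECASE );
Show( exact, anyCase );
```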

Example of Matching a String

In this example, bus|car is the regular expression. It appears in quotation marks because it is passed as a string. The | means “or”, so the expression matches either “bus” or “car”.

sentence = "I took the bus to work.";
vehicle = Regex( sentence, "bus|car" );

"bus"
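Because a failed match returns missing, it can be useful to test the result before using it. This is a sketch under that assumption; the pattern train|plane is a hypothetical one that does not occur in the sentence.

```jsl
sentence = "I took the bus to work.";
// Neither "train" nor "plane" occurs in sentence, so Regex() returns missing.
vehicle = Regex( sentence, "train|plane" );
If( Is Missing( vehicle ),
	Print( "No vehicle found." ),
	Print( "Found: " || vehicle )
);
```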

Examples of Replacing a String

The optional third argument in Regex() specifies the result string. Its default value, \0, is a backreference to everything that was matched by the regular expression. In the preceding example, the regular expression matches the word “bus” in sentence. Because the default third argument is \0, Regex() returns only the matched text, “bus”, rather than the entire sentence.
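To make the role of the third argument concrete, the sketch below passes the default "\0" explicitly and then uses a custom result string. The label text vehicle: is only an illustration; any text around \0 is copied into the result.

```jsl
sentence = "I took the bus to work.";
// "\0" is the default format, so these two calls return the same string, "bus".
defaultResult = Regex( sentence, "bus|car" );
explicitResult = Regex( sentence, "bus|car", "\0" );
// A custom format builds the result around the match: "vehicle: bus".
labeled = Regex( sentence, "bus|car", "vehicle: \0" );
Show( defaultResult, explicitResult, labeled );
```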

A more interesting variation uses parentheses to create additional backreferences.

sentence = "I took the bus to work.";
Regex( sentence, "(.*) bus (.*)", "\1 car \2" );

"I took the car to work."

The (.*) groups before and after bus are part of the regular expression. Each pair of parentheses creates a capturing group. The . matches any character, and the * matches zero or more occurrences of the preceding expression. As a result, the first group matches everything before bus, and the second group matches everything after bus. Because the spaces around bus are part of the pattern, neither group includes them. The third argument, \1 car \2, reassembles the text: it leaves out bus, substitutes car, and restores the surrounding spaces.
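One way to see what each capturing group holds is to return a single backreference as the result string. This sketch reuses the pattern above; the variable names textBefore and textAfter are only illustrative.

```jsl
sentence = "I took the bus to work.";
// \1 holds the text before " bus ": "I took the".
textBefore = Regex( sentence, "(.*) bus (.*)", "\1" );
// \2 holds the text after " bus ": "to work."
textAfter = Regex( sentence, "(.*) bus (.*)", "\2" );
Show( textBefore, textAfter );
```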

See Backreferences and Capturing Groups.

Example of Global Replacement

GLOBALREPLACE changes the behavior of Regex(). If the match succeeds, the entire source string is returned with a substitution made at each place where the pattern matches. If there are no matches, the source string is returned unchanged.

sentence = "I took the red bus followed by the blue bus to get to work today.";
Regex( sentence, "bus", "car", GLOBALREPLACE );

"I took the red car followed by the blue car to get to work today."
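The no-match case described above can be checked with a short sketch. The pattern train is a hypothetical one that does not occur in the sentence, so with GLOBALREPLACE the result should be the unchanged source rather than missing.

```jsl
sentence = "I took the red bus followed by the blue bus to get to work today.";
// "train" never matches, so GLOBALREPLACE returns the source string unchanged.
result = Regex( sentence, "train", "car", GLOBALREPLACE );
// Compare with the original; a true comparison shows as 1.
Show( result == sentence );
```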

Global replacement also works with backreferences. This example starts with a different sentence, in which the second vehicle is a car.

sentence = "I took the red bus followed by the blue car to get to work today.";
Regex(
	sentence,
	"(\w*) (bus|car)",
	"bicycle (not \2) that was \1",
	GLOBALREPLACE
);

"I took the bicycle (not bus) that was red followed by the bicycle (not car) that was blue to get to work today."

In this pattern, \w* matches zero or more word characters (here, the color before each vehicle) and becomes backreference 1 because it is enclosed in parentheses. bus|car becomes backreference 2 for the same reason. The third argument, bicycle (not \2) that was \1, describes how to build the substitution text for each part of the source text that was matched.

Notice that backreferences can appear in the result string in any order, so they can be used to swap the positions of data. For example, this is useful for swapping the positions of first names and last names.
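A minimal sketch of that name swap, assuming a hypothetical string of “Last, First” names separated by semicolons and names made only of word characters:

```jsl
// Hypothetical input: "Last, First" pairs separated by "; ".
names = "Smith, John; Doe, Jane";
// \1 captures the last name and \2 the first name; the result string
// writes them back in the opposite order at every match.
swapped = Regex( names, "(\w+), (\w+)", "\2 \1", GLOBALREPLACE );
// "John Smith; Jane Doe"
Show( swapped );
```

Names that contain hyphens, apostrophes, or spaces would not be matched by \w+; a real data set would need a broader pattern.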
