Publication date: 07/08/2024

Character Functions

In the JMP Formula Editor, you can create a formula that accepts character arguments or returns character strings and converts the data type of a value from numeric to character, or character to numeric. When you create these formulas, note that:

Character functions can result in either character or numeric data. If you calculate a data type different from the one specified, the data type of the computed column is automatically changed to match the result.

Arguments that are literal character strings must be enclosed in quotation marks.

See Character Functions in the Scripting Guide for more information about syntax.

Char

Produces a character string that corresponds to the digits in its numeric argument. For example, Char(1.123) evaluates as 1.123. See Numbers and Strings in the Scripting Guide.

Collapse Whitespace

Trims leading and trailing whitespace and replaces interior whitespace with single space. That is, if more than one white space character is present, the Collapse Whitespace command replaces the two spaces with one space.

Concat ||

Concatenates character strings to produce a new string with the function’s second character argument appended to the first. For example, "Dr." || " " || name produces a new string consisting of the title Dr. followed by a space and the contents of the name string.

Contains

Returns the numeric position within the first argument of the first instance of the second argument, if it exists. The second argument can contain one or more characters. If the second argument does not exist, Contains returns a zero. For example, Contains("Veronica Layman", "ron") evaluates as 3. Contains("Lillie Layman", "L") evaluates as 1. Contains("Lillie Layman", "Veronica") evaluates as 0. The third argument, offset, is optional. Offset is a numeric value that specifies the search starting position. Contains("Lillie Layman", "L", 5) evaluates as 8 as the search starts at the 5th position, skipping the first “L”. If offset is negative, Contains searches backward from offset from the end of the string.

Munger

Computes new character strings from existing strings by inserting or deleting characters. It can also produce substrings, calculate indices, and perform other tasks depending on how you specify its arguments. The Munger function treats uppercase and lowercase letters as different characters.

Text is a character expression. Munger applies the other three arguments to this string to compute a result.

Offset is a numeric expression indicating the starting position to search in the string. If Offset is greater than the position of the first instance of the find argument, the first instance is disregarded.

Find/Length is a character or numeric expression. Use a character string as search criterion, or use a positive integer to return that number of consecutive characters starting from the Offset position. If you specify a negative integer as the Length value, Munger returns all characters from the Offset through to the end of the string.

Replace (optional argument) can be a string or unspecified. If it is a string and the Find/Offset value is numeric, Munger replaces the search criterion with the Replace string to form the result. If the Find/Offset value is numeric and no string is specified, Munger calculates a substring. If the Find/Length value is a character string, Munger always returns the numeric offset, disregarding the Replace value if it exists. To insert the Replace argument, click any argument in the Munger function and then click the insert button. Press Delete or click the Delete button (Image shown here) on the Formula Editor keypad to remove the Replace argument.

Lowercase, Uppercase

The Lowercase function converts any uppercase character found in its argument to the equivalent lowercase character. For example, Lowercase("VERONICA LAYMAN") evaluates as veronica layman. The Uppercase function converts any lowercase character found in its argument to the equivalent uppercase character. For example, Uppercase("Veronica Layman") evaluates as VERONICA LAYMAN.

Length

Calculates the length of its argument. For example, Length("Veronica") evaluates as 8. If the argument is

a string, length returns the number of characters;

a list, length returns the number of items in the list;

a blob (binary object), the number of bytes.

Num

Produces a numeric value that corresponds to its character string argument when the character string consists of numbers only. If a character string contains a non-numeric value, the result is a missing value. For example, Num(“1.123”) evaluates as 1.123.

Substr

Extracts the characters that are the portion of the first argument. Begins at the position given by the second argument, and ends based on the number of characters specified in the third argument. The first argument can be either a character column or a literal value. The starting argument and the length argument can be numbers of expressions that evaluate to numbers. For example, to show the first name only, Substr("Veronica Layman", 10, 6) starts at position 10 and reads through position 15, which yields Layman.

If start is negative, Substr searches backward from start from the end of the string. If length is negative or absent, Substr returns a string that begins with start and continues to the end of the text string.

Substr can also be used with lists.

Titlecase

Converts the string to title case, that is, an initial uppercase character and subsequent lowercase characters. For example, Titlecase(“Veronica layman”) results in Veronica Layman.

Trim

Produces a new character string from its argument, removing any leading and trailing whitespace. The second argument determines if whitespace is removed from the left, the right, or both ends of the string. If no second argument is used, whitespace is removed from both ends. For example, Trim("john ") evaluates as john. Trim(" john ", both) also evaluates as john.

Word

Extracts the nth word from a character string. One or more spaces define where each word begins and ends unless the optional delimiters argument is specified. For example, Word(2, "Veronica Layman") returns the word Layman.

To insert the delimiters argument, click any argument in the Word function and then click the insert button Image shown here on the Formula Editor keypad. Press Delete or click the Delete button Image shown here on the Formula Editor keypad to remove the delimiters argument. If you do not specify a delimiter, space is used as the delimiter. If you define the delimiter as an empty string, each character is treated as a separate word.

Most special characters act as single delimiters. You can enter any character or set of characters to act as a word delimiter. For example, to extract the last name in the following example, use a comma and blank together as the delimiting characters and ask for the first word. Word(1, "Layman, Veronica", ", ") returns the word Layman.

Words

Extracts the words from text according to the delimiters listed in the optional second argument. The default delimiter is space. For example, Words("the quick brown fox") returns {"the","quick","brown","fox"}.

If you include a second argument, any and all characters in that argument are taken to be delimiters. For example, Words("Doe, Jane P.",", .") returns {"Doe","Jane","P"}.

To insert the delimiters argument, click any argument in the Words function and then click the insert button Image shown here on the Formula Editor keypad. Press Delete or click the Delete button Image shown here on the Formula Editor keypad to remove the delimiters argument. If you do not specify a delimiter, white space is used as the delimiter. If you define the delimiter as an empty string, each character is treated as a separate word.

Left, Right

Returns a substring of the left-most or right-most n characters of the string text, respectively. Both functions also work with lists.

Starts With, Ends With

Returns 1 if whole begins or ends with part, respectively. Returns 0 otherwise. Both functions also work with lists.

Item

Is different than the Word function because of the way it treats word delimiters. If a delimiter is found multiple times, or you enter a delimiter with multiple characters, the Word function treats them as a single delimiter. The Item function uses each delimiter to define a new word position. To compare, suppose a name is of the form lastname, firstname. The delimiter is a comma followed by a blank, such as:

Item(2, "Layman, Veronica", ", ")

Word(2, "Layman, Veronica", ", ")

The Item function returns a missing value because it treats the comma and blank separately and finds nothing between them. The Word function treats the comma and blank as a single delimiter and finds Veronica as the second word.

If you do not specify a delimiter, white space (blank space) is used as the delimiter. If you define the delimiter as an empty string, each character is treated as a separate item.

Hex and Hex to Number

Converts between hex and hex and numbers.

Hex returns the hexadecimal representation of its argument. If the argument is character (in quotes), then the result is a character string twice as long containing the hexadecimal codes for the character values. For example, Hex("A") returns the string 41.

If the argument is numeric and “integer” is specified, the Hex function returns an 8-hexadecimal-character representation of the integer returned. For example, Hex(12, “integer”) returns the string 0000000C.

Hex to Number converts hexadecimals to numbers.

See Hexadecimal and BLOB Functions in the Scripting Guide.

Repeat

Creates a string that is the first argument repeated the number of times specified by the second argument. The first argument can be either a character literal, a character variable, or a character expression. For example, Repeat(“Katie”, 3) creates KatieKatieKatie.

A third argument applies when Repeat is used in a JSL script to repeat a matrix. When the first argument is a matrix, the second argument is the rowwise repeat and the third argument is the columnwise repeat.

Insert

Insert inserts a new item into the list or expression at the given position. If position is not given, it is inserted at the end.

Reverse

Reverse reverses the characters in the string.

Substitute

The first argument is a string, the second is a pattern, and the third is a replacement string. Substitute finds all matches to the pattern in the string, and replaces them with the replacement string.

Regex

The first argument is the source string that Regex searches for a match to the pattern. The second argument is the pattern, in the form of a regular expression. The Formula Editor prompts you for these two required arguments.

Tip: For more information about using regular expressions, search the Internet for regular expression tutorial.

By default, Regex performs a case-sensitive search and returns the parts of the source string that match the pattern that you specified (or returns MISSING if the match fails). There are two optional arguments that you can add. You can type a third argument—the format—that specifies the string to return. If you choose, you can use regular expressions to specify replacement text in the returned string. If you specify the third argument, you can also specify IGNORECASE so that Regex ignores capitalization when searching the source string for a match.

Table A.1 Regex Examples

Sample Regex function

String that is returned

Regex( "@ q3 #", "([a-z])([0-9])" )

q3

The function is case sensitive, so q3 matches but Q3 would not.

Regex( "@ Q3 #", "([a-z])([0-9])", "\0",IGNORECASE)

Q3

Although \0 is the default argument, it is required in this example so that IGNORECASE can be specified.

Regex( "@ Q3 #", "([a-z])([0-9])", "\2\1",IGNORECASE)

3Q

For more information and an example that you can run, select Help > Scripting Index and do a search for Regex.

Want more information? Have questions? Get answers in the JMP User Community (community.jmp.com).