The Term and Phrase Lists report contains tables of terms and phrases found in the text after tokenization has occurred. See Figure 12.8 for an example of the Term and Phrase Lists report. The Count column in the Term List indicates the number of occurrences of the term in the corpus. The Count column in the Phrase List indicates the number of occurrences of the phrase in the corpus; the N column indicates the number of words in the phrase.
By default, the Term List is sorted in descending count order; terms that are tied in count are sorted alphabetically. The Phrase List is also sorted in descending count order; phrases that are tied in count are then sorted in descending length (N) order, and any remaining ties are sorted alphabetically. The sort order of each list can be changed to alphabetical sorting using the options in each list.
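The default Phrase List ordering described above can be pictured as a three-part sort key. The following is a minimal Python sketch, not JMP code; the rows and the function name `default_phrase_order` are hypothetical and exist only for illustration.

```python
# Hypothetical (phrase, count, n_words) rows; the values are illustrative only.
phrases = [
    ("text box", 4, 2),
    ("data table", 7, 2),
    ("make into data table", 4, 4),
    ("column property", 4, 2),
]

def default_phrase_order(rows):
    """Sort by descending count, then descending N, then alphabetically."""
    return sorted(rows, key=lambda r: (-r[1], -r[2], r[0]))

ordered = default_phrase_order(phrases)
for phrase, count, n_words in ordered:
    print(f"{phrase:<22} Count={count} N={n_words}")
```

Dropping the middle element of the key gives the Term List default: descending count, with ties sorted alphabetically.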
The phrases that appear in the Phrase List are determined by the settings of the Maximum Words per Phrase and Maximum Number of Phrases options in the launch window. Phrases that occur only one time in the data table do not appear in the Phrase List.
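To make the effect of these settings concrete, here is a rough Python sketch of phrase counting. It assumes whitespace tokenization, treats a phrase as a run of two or more adjacent words, and applies the phrase limit by truncating the sorted list. All of these are simplifying assumptions, and the names `count_phrases`, `max_words`, and `max_phrases` are hypothetical; the tokenization and selection that JMP actually performs might differ.

```python
from collections import Counter

def count_phrases(documents, max_words=3, max_phrases=100):
    """Count word n-grams (2..max_words) and keep those seen more than once."""
    counts = Counter()
    for doc in documents:
        tokens = doc.lower().split()  # assumption: simple whitespace tokens
        for n in range(2, max_words + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    # Phrases that occur only once are dropped, as in the Phrase List.
    repeated = [(p, c) for p, c in counts.items() if c > 1]
    # Descending count, then descending length, then alphabetical.
    repeated.sort(key=lambda pc: (-pc[1], -len(pc[0].split()), pc[0]))
    return repeated[:max_phrases]  # assumption: the limit truncates the list

docs = ["open the data table", "save the data table", "close the report"]
result = count_phrases(docs, max_words=3)
print(result)
```

In this toy corpus, lowering `max_words` to 2 removes the three-word phrase "the data table" from the result, which mirrors how a smaller Maximum Words per Phrase value shortens the candidate phrases.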
Phrases can be specified as terms at various scopes. Phrases in the Phrase List that have been specified as terms are colored based on the scope of the phrase specification (Table 12.1). For more information about specifying phrases in different scopes, see Term Options Management Windows.
Scope | Color |
---|---|
Built-in | Red |
User Library | Green |
Project | Blue |
Column Property | Orange |
Local | Gray |
You can access options in the Term List and Phrase List tables by selecting items and then right-clicking in the left-most column of each table. You can save each table as a data table by right-clicking in the Count column of each table and selecting Make into Data Table.
When you right-click in the Term column of the Term List table, a pop-up menu appears with the following options:
Select Rows
Selects rows in the data table that contain the selected terms.
Show Text
Shows the documents that contain the selected terms.
Note: By default, only the first 10,000 documents are shown. If there are more than 10,000 documents that contain a selected term, a window appears that enables you to increase this limit.
Alphabetical Order
Specifies the sort order of the Term List. When this option is selected, the terms are sorted in alphabetical order. When this option is not selected, the terms are sorted in descending Count order.
Numerical Order
(Available only when the Alphabetical Order option is selected.) Specifies the sort order of the Term List. When this option is selected, the items are split into string and numeric segments, and the numeric segments are then sorted in numerical order. For more information about the sorting rules used by the Numerical Order option, see Numerical Order in Using JMP.
Copy
Places the selected terms onto the clipboard.
Color
Enables you to assign a color to the selected terms.
Label
Places labels on the corresponding points in the Term SVD Plot for the selected terms.
Containing Phrases
Selects the phrases in the Phrase List table that contain the selected terms.
Save Indicators
Saves an indicator column to the data table for each term selected in the Term List. The value of the indicator column for each row is 1 if the document in that row contains the term and 0 otherwise.
Save Formula
Saves a column formula to the data table for each term selected in the Term List. The column formula for each row evaluates to 1 if the document in that row contains the term and 0 otherwise. Because the formula is evaluated for every row, it is useful for scoring new documents that are added to the data table.
Recode
Enables you to change the values for one or more terms. Select the terms in the list before selecting this option. After you select this option, the Recode window appears. See Recode Data in a Column in Using JMP.
Add Stop Word
Adds the selected terms to the list of stop words and removes those terms from the Term List. This action also updates the Phrase List.
Note: If you add a stemmed word as a stop word, all of the tokens that correspond to that stem are added as stop words.
Add Stem Exception
(Available only when the Language option is set to English, German, Spanish, French, or Italian.) Adds the selected terms to the list of terms that are excluded from stemming.
Remove Phrase
(Available only when a specified phrase is selected in the Term List.) Removes the selected phrase from the set of specified phrases and updates the Term Counts accordingly.
Note: If a phrase has been added as a Sentiment Phrase, the Remove Phrase option also removes the phrase from the list of sentiment terms in the current Sentiment Analysis report.
Add Sentiment
(Available only when a Sentiment Analysis report is open in the current report window.) Adds the selected terms to the list of sentiment terms in the current Sentiment Analysis report.
Note: If you add a stemmed word as a sentiment term, all of the tokens that correspond to that stem are added as sentiment terms.
Show Filter
Shows or hides a search filter above the Term List. See Search Filter Options.
Make into Data Table
Creates a JMP data table from the report table.
Make Combined Data Table
Searches the report for other tables like the one you selected and combines them into a single JMP data table.
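The columns produced by the Save Indicators option above can be pictured with a short Python sketch. It assumes lowercase whitespace tokenization with no stemming or stop words, which is simpler than what the platform does, and the function name `save_indicators` is hypothetical.

```python
def save_indicators(documents, terms):
    """Return one 0/1 column per term: 1 if the document contains the term."""
    columns = {}
    for term in terms:
        columns[term] = [
            1 if term in doc.lower().split() else 0  # assumption: whitespace tokens
            for doc in documents
        ]
    return columns

docs = ["The battery died", "Great screen", "Battery and screen are fine"]
indicators = save_indicators(docs, ["battery", "screen"])
print(indicators)
```

Each list has one entry per document, in row order, which is the shape of an indicator column in the data table.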
When you right-click in the Phrase column of the Phrase List table, a pop-up menu appears with the following options:
Select Rows
Selects rows in the data table that contain the selected phrases.
Show Text
Shows the documents that contain the selected phrases.
Save Indicators
Saves an indicator column to the data table for each phrase selected in the Phrase List. The value of the indicator column for each row is 1 if the document in that row contains the phrase and 0 otherwise.
Alphabetical Order
Specifies the sort order of the Phrase List. When this option is selected, the phrases are sorted in alphabetical order. When this option is not selected, the phrases are sorted in descending Count order.
Numerical Order
(Available only when the Alphabetical Order option is selected.) Specifies the sort order of the Phrase List. When this option is selected, the items are split into string and numeric segments, and the numeric segments are then sorted in numerical order. For more information about the sorting rules used by the Numerical Order option, see Numerical Order in Using JMP.
Copy
Places the selected phrases onto the clipboard.
Select Contains
Selects larger phrases in the Phrase List that contain the selected phrase.
Select Contained
Selects smaller phrases in the Phrase List and terms in the Term List that are contained by the selected phrase.
Add Phrase
Adds the selected phrases to the Term List and updates the Term Counts accordingly.
Add Stop Word
Adds the selected phrases to the list of stop words. This action also updates the Term List.
Add Sentiment Phrase
(Available only when a Sentiment Analysis report is open in the current report window.) Adds the selected phrases to the Term List and to the list of sentiment terms in the current Sentiment Analysis report.
Show Filter
Shows or hides a search filter above the Phrase List. See Search Filter Options.
Make into Data Table
Creates a JMP data table from the report table.
Make Combined Data Table
Searches the report for other tables like the one you selected and combines them into a single JMP data table.
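The Numerical Order option in both lists splits items into string and numeric segments. The following Python sketch shows a generic version of that idea, often called a natural sort. It is not the exact JMP rule set, which is documented in Using JMP; the function name `numeric_segment_key` and the choice to place numeric segments before text segments are assumptions made for this illustration.

```python
import re

def numeric_segment_key(item):
    """Split an item into text and digit runs; compare digit runs as integers."""
    parts = [p for p in re.split(r"(\d+)", item) if p != ""]
    # Tag each segment so numbers and text are never compared directly.
    return [(0, int(p), "") if p.isdigit() else (1, 0, p.lower()) for p in parts]

items = ["item 10", "item 2", "item 1"]
alphabetical = sorted(items)
numerical = sorted(items, key=numeric_segment_key)
print(alphabetical)
print(numerical)
```

With plain alphabetical sorting, "item 10" lands before "item 2" because the comparison is character by character; comparing the numeric segments as numbers puts 2 before 10.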
Click the down arrow button next to the search box to refine your search.
Contains Terms
Returns items that contain a part of the search criteria. A search for “ease oom” returns items such as “Release Zoom”.
Contains Phrase
Returns items that contain the exact search criteria. A search for “text box” returns entries that contain “text” followed directly by “box” (for example, “Context Box” and “Text Box”).
Starts With Phrase
Returns items that start with the search criteria.
Ends With Phrase
Returns items that end with the search criteria.
Whole Phrase
Returns items that consist of the entire string. A search for “text box” returns entries that contain only “text box”.
Regular Expression
Enables you to use regular expression syntax, such as the wildcard (*) and period (.), in the search box. A search for “get.*name” returns items that contain “get” followed by any sequence of characters and then “name”, such as “Get Color Theme Names”, “Get Name Info”, and “Get Effect Names”.
Invert Result
Returns items that do not match the search criteria.
Match All Terms
Returns items that contain all of the search strings. A search for “t test” returns items that contain both “t” and “test”, such as “Pat Test”, “Shortest Edit Script”, and “Paired t test”.
Ignore Case
Ignores the case in the search criteria.
Match Whole Words
Returns items that contain each word in the string based on the Match All Terms setting. If you search for “data filter”, and Match All Terms is selected, entries that contain both “data” and “filter” are returned.
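Several of the search modes above can be approximated in a few lines of Python, which might help when deciding which mode to use. This is only a sketch: the function names are hypothetical, case is always ignored (as if Ignore Case were selected), and Python's `re` module stands in for the regular expression engine, whose dialect in JMP might differ.

```python
import re

items = ["Get Color Theme Names", "Get Name Info", "Set Name",
         "Text Box", "Context Box", "Release Zoom"]

def contains_phrase(item, query):
    """Contains Phrase: the exact query appears somewhere in the item."""
    return query.lower() in item.lower()

def whole_phrase(item, query):
    """Whole Phrase: the item consists of the entire query and nothing else."""
    return item.lower() == query.lower()

def matches_regex(item, pattern):
    """Regular Expression: the pattern matches somewhere in the item."""
    return re.search(pattern, item, re.IGNORECASE) is not None

def match_all_terms(item, query):
    """Match All Terms: every space-separated part appears in the item."""
    return all(part in item.lower() for part in query.lower().split())

print([i for i in items if contains_phrase(i, "text box")])
print([i for i in items if whole_phrase(i, "text box")])
print([i for i in items if matches_regex(i, "get.*name")])
print([i for i in items if match_all_terms(i, "ease oom")])
```

Inverting any of these predicates (for example, `not contains_phrase(i, "text box")`) corresponds to the Invert Result option.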