JMP 13.2 Online Documentation (English)
Text Processing Steps
The text is processed in three stages: tokenizing, phrasing, and terming.
Tokenizing Stage
The Tokenizing stage performs the following operations:
1. Convert text to lowercase.
2. Apply the Tokenizing method (either Basic Words or Regex) to group characters into tokens.
3. Recode tokens based on the specified recode definitions. Note that recoding occurs before stemming.
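The three tokenizing operations can be illustrated outside of JMP with a short Python sketch. This is not JMP's implementation: the function name tokenize, the default regular expression, and the dictionary form of the recode definitions are all assumptions made for illustration.

```python
import re

def tokenize(text, recodes=None, pattern=r"\w+(?:'\w+)?"):
    """Hypothetical sketch of the Tokenizing stage (not JMP's code)."""
    recodes = recodes or {}
    lowered = text.lower()                             # 1. convert to lowercase
    tokens = re.findall(pattern, lowered)              # 2. group characters into tokens
    return [recodes.get(tok, tok) for tok in tokens]   # 3. recode (before any stemming)

# Usage: "low" is recoded to "empty" after lowercasing and splitting.
tokens = tokenize("The Printer's toner is LOW; printers jam.",
                  recodes={"low": "empty"})
print(tokens)
```

Because recoding runs on the raw lowercase tokens, a recode definition in this sketch has to match the unstemmed form of the word.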
Phrasing Stage
The Phrasing stage collects phrases that occur in the corpus (collection of documents) and enables you to specify that individual phrases be treated as terms. Phrases cannot start or end with a stop word, but they can contain a stop word.
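The stop word rule for phrases can be sketched as an n-gram count over tokenized documents. The sketch below is an illustration only: the function name collect_phrases and the max_words and min_count parameters are hypothetical, and the source does not state how JMP bounds phrase length or frequency.

```python
from collections import Counter

def collect_phrases(docs, stop_words, max_words=3, min_count=2):
    """Count candidate phrases (hypothetical sketch, not JMP's code).

    docs is a list of token lists. A phrase may contain a stop word
    but may not start or end with one.
    """
    counts = Counter()
    for tokens in docs:
        for n in range(2, max_words + 1):
            for i in range(len(tokens) - n + 1):
                gram = tuple(tokens[i:i + n])
                if gram[0] in stop_words or gram[-1] in stop_words:
                    continue  # rejected: stop word at the start or end
                counts[gram] += 1
    # max_words and min_count are assumed thresholds for this sketch.
    return {gram: c for gram, c in counts.items() if c >= min_count}

docs = [["out", "of", "toner"], ["out", "of", "toner", "again"]]
print(collect_phrases(docs, stop_words={"of"}))
```

In this example, "out of toner" survives because the stop word "of" sits in the middle, while "out of" and "of toner" are rejected.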
Terming Stage
The Terming stage creates the Term List from the tokens and phrases that result from the previous stages.
For each token, the Terming stage performs the following operations:
1. Check that the minimum and maximum length requirements specified in the launch window are met. Tokens that contain only numbers are excluded from this operation.
2. Check that the token is qualified to become a term; tokens parsed by the Basic Words tokenization method must contain at least one alphabetical or Unicode character. Tokens that contain only numbers are excluded from this operation. The Regex tokenization method uses regular expressions to determine what characters are part of a token.
3. Check that the token is not a stop word.
4. Apply stemming and stem exceptions.
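The four per-token checks can be sketched as a single filter function. Several points here are assumptions rather than documented behavior: the sketch reads "excluded from this operation" as meaning that number-only tokens skip the check, the min_len and max_len defaults are placeholders for the launch window settings, and simple_stem is a toy suffix stripper standing in for JMP's stemmer.

```python
def simple_stem(token):
    """Toy suffix stripper; a stand-in, not JMP's stemming algorithm."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def token_to_term(token, stop_words, min_len=1, max_len=50,
                  stem_exceptions=frozenset()):
    """Return the term for a token, or None if the token is rejected."""
    numeric = token.isdigit()
    # 1. Length check (assumed to be skipped for number-only tokens).
    if not numeric and not (min_len <= len(token) <= max_len):
        return None
    # 2. Qualification check: at least one alphabetic character
    #    (also assumed to be skipped for number-only tokens).
    if not numeric and not any(ch.isalpha() for ch in token):
        return None
    # 3. Stop word check.
    if token in stop_words:
        return None
    # 4. Stemming, unless the token is listed as a stem exception.
    if token in stem_exceptions:
        return token
    return simple_stem(token)

for tok in ["printers", "the", "2017", "---"]:
    print(tok, "->", token_to_term(tok, stop_words={"the"}))
```

The order matters in this sketch: a stop word is dropped before stemming, so the stop word list is matched against the unstemmed token.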
For each phrase that you add, the Terming stage performs the following operations:
1. Add the phrase to the Term List. Each word in the phrase that is stemmed in the Term List is also stemmed within the phrase. Phrases that have different raw tokens but the same stems are combined in the Term List.
2. Remove the occurrences of individual token terms that fall within the phrase.
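The effect of adding a phrase on the term counts can be sketched as follows. The function name add_phrase, the Counter representation of the Term List, and the stem callable are hypothetical choices for this illustration; JMP's internal bookkeeping is not described in the source.

```python
from collections import Counter

def add_phrase(term_counts, phrase, occurrences, stem=lambda w: w):
    """Add a phrase to a term-count table (hypothetical sketch).

    term_counts maps terms to counts, phrase is a tuple of raw tokens,
    and occurrences is how often the phrase appears in the corpus.
    """
    stemmed = [stem(w) for w in phrase]
    key = " ".join(stemmed)
    # 1. Phrases with different raw tokens but the same stems share one entry.
    term_counts[key] += occurrences
    # 2. Token occurrences that fall inside the phrase no longer count separately.
    for word in stemmed:
        if word in term_counts:
            term_counts[word] -= occurrences
            if term_counts[word] <= 0:
                del term_counts[word]
    return term_counts

# Toy stemmer for the example only: strip trailing "s".
strip_s = lambda w: w.rstrip("s")
counts = Counter({"printer": 5, "jam": 3})
add_phrase(counts, ("printer", "jams"), 2, stem=strip_s)
add_phrase(counts, ("printers", "jam"), 1, stem=strip_s)
print(dict(counts))
```

Both raw phrases collapse into the single entry "printer jam", and the counts for "printer" and "jam" shrink by the number of occurrences absorbed into the phrase.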
Help created on 9/19/2017