Publication date: 06/21/2023

Additional Example of the Text Explorer Platform

This example examines aircraft incident reports from the National Transportation Safety Board for events occurring in 2001 in the United States. You want to explore the text that contains a description of the results of the investigation into the cause of each incident. You also want to find themes in the collection of incident reports.

1. Select Help > Sample Data Folder and open Aircraft Incidents.jmp.

2. Select Rows > Color or Mark by Column.

3. Select Fatal from the columns list and click OK.

The rows that contain accidents involving fatalities are colored red.

4. Select Analyze > Text Explorer.

5. Select Narrative Cause from the Select Columns list and click Text Columns.

6. From the Language list, select English.

7. From the Stemming list, select Stem All Terms.

8. From the Tokenizing list, select Basic Words.

9. Click OK.

Figure 12.14 Text Explorer Report for Narrative Cause 

From the report, you see that there are almost 51,000 tokens and about 1,900 unique terms.
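
To make the distinction between tokens and unique terms concrete, here is a minimal Python sketch. It assumes a simple lowercase word tokenizer and skips stemming, so it only approximates the Basic Words and Stem All Terms settings. The three narratives are invented for illustration and are not taken from Aircraft Incidents.jmp.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

# Hypothetical narratives, invented for illustration only.
docs = [
    "The pilot failed to maintain airspeed.",
    "Loss of engine power for undetermined reasons.",
    "The pilot did an inadequate preflight, resulting in a loss of engine power.",
]

# Count every word occurrence across all documents.
term_counts = Counter(token for doc in docs for token in tokenize(doc))

n_tokens = sum(term_counts.values())  # total word occurrences (tokens)
n_terms = len(term_counts)            # distinct words (unique terms)

print(n_tokens, n_terms)
```

The token count grows with every word in every document, while the unique term count grows only when a new word appears. That is why the report shows far fewer terms than tokens.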

10. Right-click pilot· in the term list and select Select Rows.

From the number of selected rows in the data table, you see that some form of the word “pilot” occurs in more than 1,300 of the incident reports.

11. Right-click pilot· and select Add Stop Word.

Because some form of the word “pilot” occurs frequently compared to other terms, these terms do not provide much information to differentiate among documents. All of the terms that stem to pilot· are added to the stop word list.
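
The two actions above (counting the documents that contain a term, then treating it as a stop word) can be sketched in Python as follows. This is an illustration under stated assumptions: the helper `is_pilot_form` is a hypothetical, crude stand-in for stemming that matches any token beginning with "pilot", and the narratives are invented.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def is_pilot_form(token):
    """Crude stand-in for stemming (an assumption): any token starting with 'pilot'."""
    return token.startswith("pilot")

# Hypothetical narratives, invented for illustration only.
docs = [
    "The pilot failed to maintain airspeed.",
    "Both pilots misjudged the landing flare.",
    "Loss of engine power for undetermined reasons.",
    "Improper piloting technique during the takeoff roll.",
]

tokenized = [tokenize(doc) for doc in docs]

# Number of documents that contain at least one form of "pilot".
doc_frequency = sum(any(is_pilot_form(t) for t in tokens) for tokens in tokenized)

# Treat every form of "pilot" as a stop word and drop it from each document.
filtered = [[t for t in tokens if not is_pilot_form(t)] for tokens in tokenized]

print(doc_frequency, "of", len(docs), "documents contain a form of 'pilot'")
```

A term that appears in nearly every document shifts all documents in the same direction, so removing it leaves the terms that actually distinguish one report from another.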

The remaining steps of this example can be completed only in JMP Pro.

12. Click the red triangle next to Text Explorer for Narrative Cause and select Latent Semantic Analysis, SVD.

This is the first analysis step toward topic analysis, which performs a rotation of the SVD.

13. In the Specifications window, type 50 for Minimum Term Frequency.

Because there are approximately 51,000 tokens, a minimum frequency of 50 limits the analysis to terms that each account for about 0.1% or more of all tokens.
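
The arithmetic behind that percentage, and the effect of the threshold, can be checked with a short Python sketch. It assumes that the threshold keeps terms whose total count is at least the minimum. The function `keep_frequent_terms` and the term counts are hypothetical and only illustrate the idea.

```python
total_tokens = 51_000      # approximate token count shown in the report
min_term_frequency = 50    # value typed in the Specifications window

# Share of all tokens that a term at the threshold represents.
share = min_term_frequency / total_tokens
print(f"{share:.4%}")

def keep_frequent_terms(term_counts, minimum):
    """Keep only the terms that occur at least `minimum` times (assumed behavior)."""
    return {term: count for term, count in term_counts.items() if count >= minimum}

# Hypothetical term counts, invented for illustration only.
example_counts = {"engine": 310, "power": 295, "flare": 48, "gust": 12}
kept = keep_frequent_terms(example_counts, min_term_frequency)
print(sorted(kept))
```

Raising the threshold shrinks the document-term matrix and removes rare terms that would otherwise add noise to the SVD.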

14. Click OK.

Figure 12.15 SVD Plots for Narrative Cause 

The document SVD plot shows little separation between fatal and non-fatal incidents.
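
To give a rough sense of where the points in a document SVD plot come from, the following Python sketch approximates the leading singular vectors of a small weighted document-term matrix by power iteration. It is not JMP's implementation: it skips the centering and scaling that the platform applies, it finds only the first singular triplet, and the matrix values are invented.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def leading_singular_triplet(matrix, iterations=200):
    """Approximate the largest singular value and its vectors by power iteration on A^T A."""
    transposed = [list(column) for column in zip(*matrix)]
    v = normalize([1.0] * len(matrix[0]))
    for _ in range(iterations):
        v = normalize(matvec(transposed, matvec(matrix, v)))
    av = matvec(matrix, v)
    sigma = math.sqrt(sum(x * x for x in av))
    u = [x / sigma for x in av]
    return sigma, u, v

# Hypothetical weighted document-term matrix: rows are documents, columns are terms.
weights = [
    [2.0, 1.0, 0.0],
    [1.0, 2.0, 0.0],
    [0.0, 0.0, 1.0],
]

sigma, u, v = leading_singular_triplet(weights)

# One coordinate per document along the first SVD axis.
document_coordinates = [sigma * x for x in u]
print(document_coordinates)
```

In this toy matrix the first two documents share terms and land close together on the first axis, while the third document uses a different term and lands near zero. Documents with similar vocabulary cluster in the same way in the SVD plots.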

15. Click the red triangle next to SVD Centered and Scaled TF IDF and select Topic Analysis, Rotated SVD.

You want to look for groups of terms that form topics.

16. Type 5 for Number of Topics.

17. Click OK.

Figure 12.16 Top Loadings by Topics for Narrative Cause 

The terms for each topic with the highest loadings enable you to interpret whether the topic is capturing a theme in the incident reports.

For example, Topic 1 has high loadings for power, loss, and engine, indicating a theme of losing power to the engine as a cause of the incident. This corresponds to the phrase “loss of engine power” occurring 273 times in the set of incident reports.

The words with high loadings in Topic 2 suggest that this topic relates to incidents that involved darkness or low altitude.
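
Reading a topic by its strongest terms amounts to sorting the terms by the size of their loadings. The following Python sketch shows that step. The function `top_terms` is hypothetical, and the loading values are invented; they are not the values in the report.

```python
def top_terms(loadings, n=3):
    """Return the n terms with the largest absolute loading, strongest first."""
    return sorted(loadings, key=lambda term: abs(loadings[term]), reverse=True)[:n]

# Hypothetical loadings for one topic, invented for illustration only.
topic_loadings = {
    "power": 0.62,
    "loss": 0.58,
    "engine": 0.55,
    "fuel": 0.21,
    "land": -0.05,
}

print(top_terms(topic_loadings))
```

Sorting by absolute value matters because a large negative loading is as informative as a large positive one when you interpret a topic.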

18. Click the gray disclosure icon next to Topic Scores Plots.

Figure 12.17 Topic Scores Plots for Narrative Cause 

Each topic score plot contains a point for each document in the corpus. You can select points in these plots to further examine the text of specific documents.

You want to further explore the subject matter of Topic 2.

19. Select the three right-most points in the Topic 2 plot and click the Show Text button at the top left of the graph.

The text of the three documents with the highest scores in Topic 2 appears in a new window. From these documents, you can confirm that Topic 2 relates to low altitude.
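
Selecting the right-most points in a topic scores plot is equivalent to picking the documents with the highest scores for that topic. A minimal Python sketch of that selection follows. The function `highest_scoring` is hypothetical, and the scores and texts are invented.

```python
def highest_scoring(scores, texts, n=3):
    """Return (score, text) pairs for the n documents with the highest topic scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]
    return [(scores[i], texts[i]) for i in ranked]

# Hypothetical topic scores and narratives, invented for illustration only.
topic_scores = [0.4, 5.1, -0.2, 3.8, 4.6]
narratives = [
    "loss of engine power",
    "low altitude flight in dark night conditions",
    "loss of directional control",
    "failure to maintain altitude at night",
    "low altitude maneuver over dark terrain",
]

for score, text in highest_scoring(topic_scores, narratives):
    print(score, text)
```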

At this stage of the text analysis, you have many choices in how to proceed. Text analysis is an iterative process, so you might use topic information to further curate your term list by adding stop words or specifying phrases. You might also save the weighted document-term matrix, or the vectors from the SVD or rotated SVD, as numeric columns in your data table and use them in other JMP analysis platforms. When you use these columns in other platforms, you can include other columns from your data table in the same analyses.
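
As an illustration of what saving a weighted document-term matrix as numeric columns produces, the following Python sketch builds one column per term next to an existing column. It assumes one common TF-IDF definition (term count times the natural log of the number of documents divided by the number of documents that contain the term), which might differ from the weighting that JMP applies. The rows and the `tfidf_` column names are invented.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

# Hypothetical rows, invented for illustration only.
rows = [
    {"Fatal": "No", "Narrative Cause": "loss of engine power"},
    {"Fatal": "Yes", "Narrative Cause": "low altitude flight in dark night conditions"},
    {"Fatal": "No", "Narrative Cause": "loss of directional control"},
]

counts = [Counter(tokenize(row["Narrative Cause"])) for row in rows]
n_docs = len(rows)

# Number of documents that contain each term.
doc_freq = Counter(term for count in counts for term in count)
vocabulary = sorted(doc_freq)

def tf_idf(count, term):
    """Assumed weighting: term count times ln(number of documents / document frequency)."""
    return count[term] * math.log(n_docs / doc_freq[term])

# One numeric column per term, kept alongside the existing Fatal column.
table = [
    {"Fatal": row["Fatal"], **{f"tfidf_{term}": tf_idf(count, term) for term in vocabulary}}
    for row, count in zip(rows, counts)
]

print(table[0]["Fatal"], round(table[0]["tfidf_engine"], 3))
```

Because each row keeps its Fatal value next to the term weights, the numeric columns can serve as predictors of Fatal in a separate model.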
