Pareto Chart
What is a Pareto chart?
A Pareto chart shows the ordered frequency counts for levels of a nominal variable.
How are Pareto charts used?
Pareto charts help people decide which problems to solve first. They are useful for identifying the most frequent outcome of a categorical variable.
Pareto charts show the ordered frequency counts of data
A Pareto chart is a special example of a bar chart. For a Pareto chart, the bars are ordered by frequency counts from highest to lowest. These charts are often used to identify areas to focus on first in process improvement.
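As a rough illustration of the ordering step, the sketch below counts the levels of a nominal variable and sorts them from highest to lowest frequency. The finding labels and counts are made up for this example (they are not the audit data in the figures), and `pareto_order` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical audit findings; labels and counts are invented for illustration.
findings = (
    ["SOP not followed"] * 12
    + ["Missing signature"] * 7
    + ["Outdated form"] * 3
    + ["Training record gap"] * 5
)


def pareto_order(values):
    """Return (category, count) pairs sorted from highest to lowest count."""
    counts = Counter(values)
    # Sort by descending count; break ties alphabetically so the order is stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


ordered = pareto_order(findings)

# Print a simple text version of the chart, one bar per category.
for category, count in ordered:
    print(f"{category:<22}{count:>3} {'#' * count}")
```

The same ordered pairs could be passed to any plotting tool; the sorting, not the drawing, is what makes a bar chart a Pareto chart.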
Pareto charts show the ordered frequency counts for the levels of a nominal (categorical) variable. The charts are based on the “80/20” rule, which says that about 80% of problems result from about 20% of causes. This rule is also described as the “vital few and trivial many.” The idea is that you can focus on the vital few root causes of a problem and set aside the trivial many.
Figure 1 is an example of a Pareto chart. The chart shows the types of findings from an audit of business processes. The most common finding is that a standard operating procedure (SOP) was not followed.
What is the difference between Pareto charts and bar charts?
As mentioned above, a Pareto chart is a special type of bar chart in which the bars are ordered from highest to lowest. A general bar chart does not require this ordering. Bar charts often use alphabetical ordering or some other logical order.
Figure 2 shows a bar chart for the same audit data as the Pareto chart in Figure 1.
Although you can still use the bar chart to identify the most frequent problem, it is not as effective for that purpose as the Pareto chart.
Pareto chart example
Most people use software to create Pareto charts. Some tools allow you to add custom features.
The Pareto chart for the audit findings in Figure 1 above showed the basic results. To help make decisions, you can add a note to the chart as shown in Figure 3 below.
Adding a cumulative frequency line
Pareto charts can also include a line for the cumulative frequency. Figure 4 shows a cumulative frequency line added to the results from the audit.
The cumulative percent curve is read against the cumulative percent axis on the right. The first two findings account for about 75% of all findings. (Keep in mind that the 80/20 rule is approximate.) Here, the business is likely to focus on the first two findings. This example also uses colors to highlight the top two findings.
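The cumulative percent behind such a line is a running total of the ordered counts divided by the overall total. The sketch below shows one way to compute it; the counts are hypothetical (not the audit data in Figure 4) and `cumulative_percent` is an illustrative helper name.

```python
from itertools import accumulate

# Hypothetical counts, already sorted from highest to lowest.
ordered_counts = [
    ("SOP not followed", 12),
    ("Missing signature", 7),
    ("Training record gap", 5),
    ("Outdated form", 3),
]


def cumulative_percent(pairs):
    """Return (category, cumulative percent) for counts sorted high to low."""
    total = sum(count for _, count in pairs)
    # Running sum of the counts, in the same order as the bars.
    running = accumulate(count for _, count in pairs)
    return [(cat, 100.0 * r / total) for (cat, _), r in zip(pairs, running)]


cum = cumulative_percent(ordered_counts)
for category, pct in cum:
    print(f"{category:<22}{pct:6.1f}%")
```

Reading down the result shows how many leading categories are needed to reach a chosen share of all findings. The last value is always 100%.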
Combining categories with few responses
In addition to the “vital few” categories, some data will also have many “trivial many” categories. Figure 5 shows results from an investigation of complaints about a Help Desk.
We can see that the Pareto chart shows several types of complaints with only a few responses. With JMP, we can combine the causes for bars 6 through 9. The Pareto chart in Figure 6 shows the results of combining these causes into an “Other” category.
A different color is used for the last bar, which combines multiple causes into the Other category. When combining categories, the best practice is to place the combined category as the last bar. JMP does this automatically. This approach highlights the fact that the bar is composed of combined categories and avoids mixing the combined bar in with the bars for individual causes. For the Help Desk data, it's clear the focus needs to be on determining the root cause for the first three types of complaints.
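Outside JMP, the same combining step can be sketched in a few lines. The example below is only an illustration under stated assumptions: the complaint labels and counts are invented, and `combine_small` is a hypothetical helper that keeps the largest categories and pools the rest into an “Other” bar placed last, even when the pooled count is larger than some individual bars.

```python
# Hypothetical Help Desk complaint counts; invented for illustration.
complaints = [
    ("Long wait time", 40),
    ("Issue not resolved", 31),
    ("Unhelpful response", 25),
    ("Transferred repeatedly", 9),
    ("Call dropped", 6),
    ("No callback", 3),
    ("Wrong information", 2),
    ("Confusing menu", 2),
    ("Billing question", 1),
]


def combine_small(pairs, keep):
    """Keep the `keep` largest categories; pool the rest into 'Other', placed last."""
    ordered = sorted(pairs, key=lambda item: (-item[1], item[0]))
    head, tail = ordered[:keep], ordered[keep:]
    if tail:
        # The combined bar goes at the end regardless of its size.
        head.append(("Other", sum(count for _, count in tail)))
    return head


# Combine bars 6 through 9 by keeping the first five.
combined = combine_small(complaints, keep=5)
for category, count in combined:
    print(f"{category:<24}{count:>3}")
```

In this made-up data the Other bar has a larger count than the fifth bar, yet it still appears last, which keeps it visually separate from the individual causes.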
Packed bar chart
When a variable has many categories, the Pareto chart may become too wide for useful visualization. One solution is to combine categories into an Other category, as shown in Figure 6. An alternative is to use a packed bar chart.
Pareto charts and types of data
Pareto charts make sense for data with counts for values of a nominal variable. Pareto charts are not a good option for data that have values for a continuous variable.
With ordinal data, a type of categorical data, the sample is divided into groups with a defined order. For example, in a survey where you are asked to give your opinion on a scale from “Strongly Disagree” to “Strongly Agree,” your responses are ordinal. A Pareto chart is not likely to be useful here, because it orders the data by frequency counts and not by the defined order for the variable.
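A small sketch can make the ordinal-data point concrete. The survey responses below are hypothetical; the code compares the order a Pareto chart would impose (by frequency) with the order defined by the scale itself.

```python
from collections import Counter

# The defined order of the ordinal scale.
SCALE = ["Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree"]

# Hypothetical survey responses; counts are invented for illustration.
responses = (
    ["Agree"] * 14
    + ["Neutral"] * 6
    + ["Strongly Disagree"] * 9
    + ["Disagree"] * 4
    + ["Strongly Agree"] * 11
)

counts = Counter(responses)

# Pareto ordering: highest count first, ignoring the scale.
by_frequency = [label for label, _ in sorted(counts.items(), key=lambda kv: -kv[1])]

# Ordinal ordering: follow the scale, whatever the counts are.
by_scale = [label for label in SCALE if label in counts]

print("Pareto order: ", by_frequency)
print("Ordinal order:", by_scale)
```

With these counts the frequency ordering places “Strongly Disagree” between “Strongly Agree” and “Neutral,” which breaks the natural reading of the scale. A bar chart in scale order keeps that structure.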