Types of Graphs
Below is a list of several types of graphs that can be used in exploratory data analysis (EDA). For each type, the list gives the number of variables the graph uses and a description of its purpose.
Histograms
- Number of variables: 1.
- Displays the shape or distribution of data; may help identify outliers.
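The shape a histogram displays comes from counting how many values fall into each bin. A minimal sketch of that counting step follows, using only the Python standard library. The function name `histogram_counts`, the sample data, and the bin width of 5 are assumptions made for illustration; they are not taken from any particular software.

```python
from collections import Counter

def histogram_counts(values, bin_width, start=0):
    """Count values in equal-width bins [left, left + bin_width).

    Assumes every value is greater than or equal to `start`.
    """
    counts = Counter(int((v - start) // bin_width) for v in values)
    # Include empty bins so gaps in the data stay visible.
    return {start + i * bin_width: counts[i] for i in range(max(counts) + 1)}

# Hypothetical sample: most values are small, one value sits far away.
data = [2, 3, 3, 4, 5, 5, 5, 6, 7, 21]
bins = histogram_counts(data, bin_width=5)

for left, count in bins.items():
    print(f"[{left:>2}, {left + 5:>2}) {'#' * count}")
```

In the printed text histogram, the isolated bar at 20 to 25 is separated from the rest by two empty bins, which is how a histogram can hint at a possible outlier.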
Side-by-Side Histograms
- Number of variables: 2.
- Displays the shapes or distributions for groups of data; may help identify outliers.
Bar Charts
- Number of variables: 1.
- Displays the frequency count of values for a categorical variable; may be vertical (as shown below in Figure 3) or horizontal.
Grouped Bar Charts
- Number of variables: 2 or more, depending on how many variables are used to define groups.
- Displays bar charts for groups defined by another variable. Grouped bar charts have a separate chart within each level of the grouping variable.
Stacked Bar Charts
- Number of variables: 2 or more, depending on how many variables are used to define groups.
- Displays bar charts for groups defined by another variable. Stacked bar charts have a single bar for each level of the grouping variable. Within each bar, the counts for another variable are stacked and distinguished by color or pattern.
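Grouped and stacked bar charts both start from the same two-way table of counts; they differ only in whether the counts within a group are drawn as separate bars or stacked in one bar. A rough sketch of building that table is shown here. The function name `cross_counts` and the region/product data are hypothetical.

```python
from collections import defaultdict

def cross_counts(pairs):
    """Count (group, category) pairs into a nested dict: group -> category -> count."""
    table = defaultdict(lambda: defaultdict(int))
    for group, category in pairs:
        table[group][category] += 1
    return {group: dict(cats) for group, cats in table.items()}

# Hypothetical observations: (grouping variable, categorical variable).
observations = [
    ("north", "A"), ("north", "B"), ("north", "A"),
    ("south", "B"), ("south", "B"),
]
counts = cross_counts(observations)

for group, cats in counts.items():
    # Each group would be one stacked bar, or one cluster of grouped bars.
    print(group, cats, "total:", sum(cats.values()))
```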
Pareto Charts
- Number of variables: 1.
- Displays ordered frequency counts for a variable. Useful for highlighting the “vital few.” A type of bar chart, Pareto charts often include a cumulative percent curve.
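The ordering and the cumulative percent curve of a Pareto chart can be computed directly from the frequency counts. The following is a small sketch under assumed data; `pareto_table` and the defect categories are illustrative names only.

```python
from collections import Counter

def pareto_table(observations):
    """Return (category, count, cumulative percent) rows, largest count first."""
    ordered = Counter(observations).most_common()
    total = sum(count for _, count in ordered)
    rows, running = [], 0
    for category, count in ordered:
        running += count
        rows.append((category, count, round(100 * running / total, 1)))
    return rows

# Hypothetical defect log with 20 entries.
defects = ["scratch"] * 12 + ["dent"] * 5 + ["stain"] * 2 + ["crack"] * 1
table = pareto_table(defects)

for category, count, cumulative in table:
    print(f"{category:<8} {count:>3} {cumulative:>6}%")
```

With this sample, the first two categories account for 85% of all defects, which is the kind of "vital few" pattern the chart is meant to expose.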
Packed Bar Charts
- Number of variables: 1.
- Displays ordered frequency counts for a variable. Used instead of a Pareto chart, especially when there are many categories. Useful for highlighting the “vital few.”
Mosaic Plots
- Number of variables: 2 or more.
- Shows possible relationships between categorical variables. Useful for finding data errors, such as mistyped categories. A special type of stacked bar chart that shows more than one variable on the x-axis.
Treemaps
- Number of variables: 2 or more.
- Shows possible relationships between variables. A special type of stacked bar chart that colors, orders, and sizes by different variables.
Box Plots
- Number of variables: 1.
- Shows the distribution of data. The box marks the 25th percentile, the median (50th percentile), and the 75th percentile. Depending on the data, the whiskers extend to the minimum and maximum; when outliers are present, they appear as individual points beyond the whiskers. Used for finding data errors and exploring one variable.
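The numbers behind a box plot can be sketched in a few lines. The text above does not say how outliers are separated from the whiskers, so this example assumes one common convention: points more than 1.5 times the interquartile range beyond the box are treated as outliers. The function name `box_plot_summary`, the multiplier `k`, the "inclusive" quantile method, and the data are all assumptions; statistical packages may use different rules.

```python
from statistics import quantiles

def box_plot_summary(values, k=1.5):
    """Return the quartiles, whisker ends, and outliers for one variable.

    Assumes the k * IQR fence rule; other software may differ.
    """
    ordered = sorted(values)
    q1, median, q3 = quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in ordered if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],    # smallest value that is not an outlier
        "whisker_high": inside[-1],  # largest value that is not an outlier
        "outliers": [v for v in ordered if v < low_fence or v > high_fence],
    }

# Hypothetical sample with one suspicious value.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

Here the value 30 falls outside the upper fence, so the upper whisker stops at 9 and 30 is reported separately, which is how a box plot can flag a possible data error.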
Side-by-Side Box Plots
- Number of variables: 2 or more, depending on how many variables are used to define groups.
- Displays box plots for groups defined by another variable. Used for finding data errors and exploring two or more variables.
Normal Quantile Plots
- Number of variables: 1.
- Helps assess whether it is reasonable to assume that a variable has a normal distribution. Points that fall close to a straight line support the assumption.
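A normal quantile plot pairs each sorted data value with the value expected at the same position in a standard normal distribution. The sketch below computes those pairs with the standard library. The plotting positions (i + 0.5) / n are one common choice and are an assumption here, as are the function name `normal_quantile_pairs` and the sample values.

```python
from statistics import NormalDist

def normal_quantile_pairs(values):
    """Pair each sorted value with its theoretical standard normal quantile."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [
        (std_normal.inv_cdf((i + 0.5) / n), x)  # assumed plotting position
        for i, x in enumerate(ordered)
    ]

# Hypothetical measurements.
pairs = normal_quantile_pairs([4.9, 5.1, 5.0, 5.3, 4.7])

for theoretical, observed in pairs:
    print(f"{theoretical:6.2f}  {observed}")
```

Plotting the observed values against the theoretical quantiles gives the graph; the closer the points follow a straight line, the more reasonable the normality assumption looks.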
Line Graphs
- Number of variables: 2.
- Shows changes over time. The x-axis must have values ordered by time. Line graphs, also called line charts or run charts, are useful for finding outliers.
Line Graphs with Categories
- Number of variables: 2 or more, depending on how many variables are used to define groups.
- Displays multiple line graphs for groups defined by another variable. Used for understanding changes over time for multiple variables and for finding outliers.
Scatter Plots
- Number of variables: 2 or more, depending on how many variables are used to define groups for colors and markers.
- Shows a possible relationship between two variables and identifies outliers. Adding colors and/or markers for other variables can help with EDA. Adding reference lines or specification limits can help identify outliers.
Scatter Plot Matrix
- Number of variables: Many.
- Shows possible relationships between multiple variables, looking at all two-way combinations. Additional graphs can be added: histograms for each variable to identify outliers, density ellipses for each scatter plot to identify multidimensional outliers, and heatmaps of correlations to clarify possible relationships.
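The "all two-way combinations" idea, together with the correlations that a companion heatmap would color, can be sketched as follows. This example assumes the Pearson correlation coefficient as the measure; the function names and the three small columns of data are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def pairwise_correlations(columns):
    """Correlation for every two-way combination of named columns."""
    return {
        (a, b): pearson(columns[a], columns[b])
        for a, b in combinations(columns, 2)
    }

# Hypothetical data: three variables measured on four items.
columns = {
    "height": [1, 2, 3, 4],
    "weight": [2, 4, 6, 8],
    "age": [4, 3, 2, 1],
}
correlations = pairwise_correlations(columns)

for pair, r in correlations.items():
    print(pair, round(r, 2))
```

Each key in the result corresponds to one panel of the scatter plot matrix, so three variables produce three distinct pairs.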
Pie Charts
- Number of variables: 1 or more.
- Displays part-to-whole relationships for a variable. Multiple pie charts, one for each category of another variable, are more useful than a single pie chart. For a single variable, a bar chart makes small differences in values easier to distinguish.
Heatmaps
- Number of variables: 2 or more.
- Shows possible relationships between variables. Most often used for data that changes over time. Uses color to explore relationships between variables.
Stem-and-Leaf Plots
- Number of variables: 1.
- Shows the shape of data and identifies outliers. More widely used before computers were available; histograms are used more often now.
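Because a stem-and-leaf plot is built from the digits of the data, it is easy to produce as plain text. The sketch below assumes non-negative whole numbers, with the tens as the stem and the ones digit as the leaf; the function name `stem_and_leaf` and the sample values are illustrative.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return text rows of a stem-and-leaf plot for non-negative integers."""
    stems = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # tens become the stem, ones the leaf
        stems[stem].append(leaf)
    rows = []
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        rows.append(f"{stem:>2} | {leaves}".rstrip())  # keep empty stems
    return rows

# Hypothetical sample.
lines = stem_and_leaf([12, 15, 21, 23, 23, 47])
print("\n".join(lines))
```

Turned on its side, the output resembles a histogram with a bin width of 10, and the empty stem between the 20s and the 40s makes the isolated value 47 stand out.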