DOE/EIA-0465(98)
EIA Guidelines for Statistical Graphs
Vertical and Horizontal Bars, Pie and Dot Charts, and Three-Dimensional Features
This chapter discusses the design of four graph formats that are commonly used when the focus of interest is the categories into which a quantity is divided. The four formats are vertical bar charts, horizontal bar charts, pie charts, and dot charts.
Simple, stacked, and clustered (or grouped) bar charts and their design criteria (i.e., bar ordering, scales, shading, color, and labels) are discussed first. Pie charts, including multiple pies, are discussed next. Pie charts (Figure 16) are often not the best format for presenting proportions of a whole because it is difficult to compare the relative magnitude of slices in a pie. This is not the case with bar charts and dot charts. Data presented in bar charts (Figures 14 and 15) and dot charts (Figure 17) are read more quickly and accurately than when the same data are graphed as a pie chart.
Three-dimensional (3-D) elements in graphs are the final topic. They often distort or hide data, making comprehension difficult or impossible. Further, some software packages (such as Harvard Graphics) compound these difficulties by ignoring the rules of depth perception when presenting 3-D charts: they do not draw distant objects smaller than closer ones of the same nominal size.
Bar Charts
Bar charts are used for direct comparison of magnitude for descriptively labeled categories (e.g., fuel use by various household appliances in a given year). Bar charts can also be used to show time series data when the number of time intervals is small.
Simple Bar Charts
The simple bar chart is the major method to compare the magnitude of two or more discrete categories of a variable. When the bars are of equal width, the length of each bar is proportional to the value it depicts. Thus, comparison is based on direct linear values, which is easier than assessing relative areas. Occasionally, uneven widths are needed and the comparison is more difficult.
Figure 14. U.S. Electric Utility Hydroelectric Net Generation, Change in Daily Rate as a Percent of the Previous Month, 1992
Figure 14 is a simple vertical bar chart. It portrays the daily rate percent change from the previous month in 1992 for electric utility hydroelectric net generation in the United States. The graph clearly shows a varying pattern of decreases and increases throughout the year. For example, there is a small (1.5 percent) decrease in January, followed by 11- and 12-percent increases in February and March, respectively, and a 7-percent decrease in April, and so on throughout 1992. If readers know what regions of the country use the most hydroelectric power, they could deduce 1992 weather conditions in these regions. For example, from the July through October decreases and November and December increases, readers could assume these regions had drought in the summer and heavy rainfall later in the year. The visual impact would be less obvious if the original values were plotted instead of the percent change in the daily rate.
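The quantity plotted in Figure 14 can be illustrated with a minimal sketch. It assumes that the "daily rate" is the monthly total divided by the number of days in the month; the function name and the input values are hypothetical and are not EIA data.

```python
import calendar


def daily_rate_percent_change(monthly_totals, year):
    """Percent change in the average daily rate versus the previous month.

    monthly_totals: list of (month_number, total) pairs in calendar order.
    The first month in the list has no predecessor, so it yields no value.
    Assumption: daily rate = monthly total / days in that month.
    """
    rates = []
    for month, total in monthly_totals:
        days = calendar.monthrange(year, month)[1]  # days in the month
        rates.append((month, total / days))

    changes = []
    for (_, previous), (month, current) in zip(rates, rates[1:]):
        changes.append((month, 100.0 * (current - previous) / previous))
    return changes


# Hypothetical totals: identical in every month, so any change in the
# daily rate comes only from the differing month lengths.
example = [(1, 310.0), (2, 310.0), (3, 310.0)]
for month, pct in daily_rate_percent_change(example, 1992):
    print(f"{calendar.month_abbr[month]}: {pct:+.1f}%")
```

Using the daily rate rather than the monthly total keeps short months, such as February, from appearing as decreases merely because they contain fewer days.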
Bars may be vertical or horizontal. The only difference between horizontal and vertical bars is that horizontal bar charts are seldom used to portray time series. The horizontal bar chart is used, characteristically, for direct comparisons of categorical data. The vertical bar chart can be used for comparisons of both categorical data and variables over time. In both types, the length of the bars is measured on a continuous scale.
The purpose of the Y-axis scale in the bar chart is to gauge the length of the bars with accuracy. The scale must always start at zero, and the bars are measured continuously from zero without any break. Single "freak" bars (i.e., bars that are much taller than the others in the graph) may have a break under certain circumstances.
-
Horizontal bar charts have the advantage that there is more room on the vertical axis for labels, especially if the number of bars to be portrayed is relatively large (e.g., a distribution for the 50 States).
-
The specifications and suggestions that pertain to ordering of bars, shading, coloring, labeling, and the title are generally applicable to both horizontal and vertical bar charts. The discussion below is in terms of vertical bars.
Subdivided (Stacked) Bar Chart
Stacked bar charts consist of one or more segmented bars. Each segment of a bar chart represents the relative share of a total that a component contributes. If the subdivided bar chart is horizontal, the characteristics of the chart are the same as those of a subdivided vertical bar chart, except that the bar segmentation is horizontal.
Subdivided bar charts are not a preferred format. They have to be used with caution because they possess the same problems as cumulative line graphs. It is difficult to make comparisons among the second, third, or subsequent segments in a stacked bar because judgment is not being made from a common base. This is why the restrictions that govern the use of cumulative line graphs (as discussed in the chapter on "Measuring from the Baseline") also govern the use of stacked bar graphs:
-
The components do not exhibit marked irregularities; i.e., irregular change in relative share.
-
The components do not exhibit seasonality; i.e., systematic changes.
-
The components do not exhibit any sharp upward or downward trends.
Grouped (Clustered) Bar Chart
Two or more bars representing different series, or different classes in the same series, can be grouped together side by side. In grouping, the bars may be joined together or separated by a narrow space. If bars within a group overlap, only part of the bar is visible and this visually distorts the comparison. Grouped bars (Figure 15) are preferable to stacked bars.
Figure 15. U.S. Household Expenditures of Major Fuels by Census Region, 1990
Figure 15 portrays four fuels as individual bars within a group of bars for each region, once as a vertical bar chart and once as a horizontal bar chart. The horizontal graph has more space on the vertical axis (X-axis) for labels. A legend is not necessary. In either graph, readers can see that in every region the largest expenditure is for electricity, followed by natural gas. Readers can also quickly see that electricity expenditures are greatest in the South, while natural gas expenditures are greatest in the Midwest. Finally, there is a reversal of the relative magnitudes of fuel oil and LPG in the West compared with other regions. Nevertheless, the component (fuel) bars are in the same order in each group. Bars arranged in descending or ascending order are most effective, unless there is a logical reason for a different order. Yet, once an order is selected, it is essential that the order remains unchanged for all groups in the graph. If total expenditures of fuels stratified by each region are of interest, another bar depicting the total can be added to each group. Alternatively, a separate graph of totals can be presented.
Design Criteria for Bar Charts
The basic design criterion of bar charts is that the areas enclosed are proportional to the quantity depicted. Bar widths are essentially a scaling device, so they need to be equal when comparison is to be made within or between graphs. Special accommodation is necessary for any bar of an odd width. In designing a bar chart, the columns should not be excessively wide or narrow, or disproportionally long or short. As a working principle, the spaces between the columns can be from one-fourth to the full width of the bars. The number of columns and the size and proportions of the chart are ultimately the determining factors for both the width of the bars and the spacing between them.
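The working principle for spacing can be expressed as a short calculation. The sketch below is illustrative only: the function name, the `gap_ratio` parameter, and the assumption that the bars and the gaps between them exactly fill the available width are not part of the guideline.

```python
def bar_layout(chart_width, n_bars, gap_ratio=0.5):
    """Return (bar_width, gap_width) for equal-width bars.

    gap_ratio is the gap between bars as a fraction of the bar width.
    The text suggests one-fourth to the full width of a bar, so values
    outside 0.25..1.0 are rejected.
    Assumption: n bars plus (n - 1) gaps fill chart_width exactly.
    """
    if n_bars < 1:
        raise ValueError("need at least one bar")
    if not 0.25 <= gap_ratio <= 1.0:
        raise ValueError("gap_ratio should be between 0.25 and 1.0")
    bar_width = chart_width / (n_bars + (n_bars - 1) * gap_ratio)
    return bar_width, bar_width * gap_ratio


bar_width, gap_width = bar_layout(chart_width=100.0, n_bars=4, gap_ratio=1.0)
print(bar_width, gap_width)
```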
Ordering of Bars
To facilitate comparison and analysis, it is desirable that columns be arranged in some systematic order. The most common and visually effective schema is according to size or value (i.e., ascending or descending order). For the reader interested in a particular element, such as a specific State, it is of course easier to locate the value alphabetically, but the contribution of that element relative to the others is not easily perceived. It is common practice to place categories, such as "unclassified," "miscellaneous," and "all other," at the end of the series of bars.
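The ordering rule can be sketched as follows. The function name, the `CATCH_ALL` set, and the sample values are hypothetical; only the rule itself (order by size, with catch-all categories last) comes from the text above.

```python
# Labels treated as catch-all categories, compared case-insensitively.
CATCH_ALL = {"unclassified", "miscellaneous", "all other"}


def order_bars(values, descending=True):
    """Order bars by value, keeping catch-all categories at the end.

    values: dict mapping category label to value.
    Returns a list of (label, value) pairs in plotting order.
    """
    named = [(k, v) for k, v in values.items() if k.lower() not in CATCH_ALL]
    catch_all = [(k, v) for k, v in values.items() if k.lower() in CATCH_ALL]
    named.sort(key=lambda pair: pair[1], reverse=descending)
    return named + catch_all


# Hypothetical values, for illustration only.
print(order_bars({"All other": 50, "Coal": 30, "Natural gas": 40}))
```

Note that the catch-all bar stays last even when, as in this example, its value is the largest.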
Scale
Always use a zero base line. The vertical scale that represents size or amount is, with one exception, never broken because the design is based on the principle that the area displayed is proportional to the quantity of interest. Occasionally, there are bar charts where a "freak" bar is so disproportionately long that all other bars are dwarfed by comparison; in such cases, it is permissible to break the bar. The break is neat and simple, at a point higher than the next longest column, and the bar's value should be indicated above it. But it should be clear, either with a footnote or a message in the graph, that - for this bar only - the value cannot be read from the scale, which does not extend as far as this large value.
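A minimal sketch of how a "freak" bar might be detected and where its break could be placed is given below. The threshold `ratio` and the `headroom` factor are assumptions chosen for illustration; the guideline itself requires only that the break fall above the next longest column and that the bar's true value be labeled.

```python
def plan_bar_break(values, ratio=3.0, headroom=1.1):
    """Suggest a break for a single disproportionately long bar.

    Returns None when no bar is at least `ratio` times the next longest
    (assumed threshold). Otherwise returns a dict with the index of the
    bar to break, the height at which to break it (just above the next
    longest bar), and the value to print above the broken bar.
    """
    if len(values) < 2:
        return None
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    longest, runner_up = ranked[0], ranked[1]
    if values[runner_up] <= 0 or values[longest] < ratio * values[runner_up]:
        return None
    return {
        "bar_index": longest,
        "break_at": headroom * values[runner_up],
        "label": values[longest],
    }


# Hypothetical values: the third bar dwarfs the others.
print(plan_bar_break([10, 12, 90, 8]))
```

A chart that applies such a break still needs the footnote or in-graph message described above, since the broken bar can no longer be read from the scale.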
The digits on the scale have equal intervals expressed in whole numbers. For vertical bar graphs, the vertical scale digits and label are placed on the left side of the grid. If the chart is very wide, the scale digits and label may also be placed on the right side. Each bar or group of bars carries its own label, which is located directly beneath the X-axis. For horizontal bars, the scale is horizontal, and the labels are usually on the left side.
In time series graphs that portray unequal time intervals, the width used for a bar must be proportional to the depicted time interval. For example, if one bar represents 1961 - 1980 and another 1981 - 1990, the 1961 - 1980 bar is exactly twice as wide as the 1981 - 1990 bar. Using unequal time intervals is not recommended because of the need for height adjustment if totals are being plotted. This issue is discussed in the Frequency Distribution chapter (histograms).
In addition to the width of the bars, attention must be paid to distances between the bars. If, for example, data are to be presented for 1980, 1985, 1990, and 1992, the distances between bars are proportional to 5, 5, and 2.
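Both proportionality rules (bar width for unequal intervals, and distance between bars for unevenly spaced years) reduce to simple arithmetic. The sketch below uses the examples from the text; the function names and the `unit` parameter (plot units per year) are hypothetical.

```python
def interval_widths(intervals, unit=1.0):
    """Bar widths proportional to the number of years each bar covers.

    intervals: list of (start_year, end_year) pairs, both inclusive.
    """
    return [(end - start + 1) * unit for start, end in intervals]


def bar_positions(years, unit=1.0):
    """Horizontal positions so that the distance between neighboring
    bars is proportional to the number of years separating them."""
    return [(year - years[0]) * unit for year in years]


widths = interval_widths([(1961, 1980), (1981, 1990)])
print(widths)  # the first bar covers 20 years, the second 10

positions = bar_positions([1980, 1985, 1990, 1992])
gaps = [b - a for a, b in zip(positions, positions[1:])]
print(gaps)  # distances between neighboring bars
```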
There may also be time series bar charts with open-ended time categories, e.g., "Before 1900" or "After 1950" at the beginning or end of the time scale. In these cases, the open-ended category is separated from the finite categories by break marks on the axis. Also, it is preferable that the boundary lines for open-ended bars be dotted or dashed to indicate that, although the total area is known, the approximate height and width are not known.
Shading and Color
It is accepted practice to use shading in bar charts. The choice of shading patterns can enhance or distort the data. Use of densely to lightly cross-hatched patterns (or dark to light ones) clearly represents high to low data values, but coarse, wavy, and bizarre patterns hide the trends and comparisons in the display. Further, some patterns can create optical illusions and some are hardly distinguishable from others. These patterns are undesirable.
This latter point is particularly important when using subdivided and grouped bar charts where several different shading categories are involved. Shading with distinct patterns is essential to differentiate the components. To avoid the problems discussed above, and to save time, the array of patterns available in the software being used should be checked before constructing the bar chart. Finally, if colors are used instead of shading patterns in subdivided and grouped bar charts, the same "rules" for using color in line graphs that are discussed in the Design Criteria chapter (under "Color") apply. Since segments and/or bars are touching each other in subdivided and grouped bar charts, it is particularly important not to choose a color or colors that may overpower or create "noise" on the adjacent segments and/or bars.
Pie Charts
Pie charts have limited utility. They are sufficient to display simple messages (those which communicate one or two points) but they do not communicate complex messages well. This is one reason graphics researchers argue the format is not a good one in which to display data. The research has demonstrated that other formats, i.e., bar charts and dot charts, display data, particularly complex data, more accurately and clearly than pie charts:
-
Edward Tufte, in The Visual Display of Quantitative Information, wrote "the only worse design than a pie chart is several of them." [14] Tufte reiterated this position in a statistical graphs seminar in February 1992 in Washington, D.C. [15]
-
Howard Wainer of the Educational Testing Service stated in a 1987 Independent Expert Review of EIA Statistical Graphs policies that "the use of pie charts is almost never justified" and that they "ought not to be used." Wainer recommended to EIA that dot charts be used instead of pie charts in EIA products. [16]
-
William Eddy of Carnegie-Mellon University, formerly vice chair of the American Statistical Association (ASA) Committee on Energy Statistics, said of pie charts at the April 1988 ASA committee meetings in a session on the EIA Standards Manual, "death to pie charts." [17]
Proponents argue that pies are an effective format to communicate both simple and complex messages. Some use the analogy of telling time. They contend that readers are accustomed to reading clocks and can readily discern the relative distance that each segment occupies along the circumference. For example, a wedge with a 30-degree angle (5 minutes on the clock face) can be readily distinguished from one with a 60-degree angle (10 minutes on the clock face), and it is easy to assess that they represent 1/12 and 1/6 of the total, respectively. In short, readers are used to pie charts. Others contend that pies are a pleasant change from straight bars and lines and that they clearly portray the concept of the sharing of a total among multiple segments.
The question, thus, is: can readers visually deduce more data, more quickly and accurately, from pie charts than from other formats? [18] The examples in Figure 16 attempt to clarify this debate. The segments in the hypothetical pies in Figure 16 differ from each other by only a few degrees, as the quantities differ from each other by 4 to 6 percent. Thus, it is easy to pick out the largest and smallest segments, but it is hard to determine (even in the size ordered pie) the relative size of the others. In the similarly ordered bar chart, the relative size of all the segments can be discerned. Yet, only in the size ordered bar chart can the reader discern both the relative magnitude of the segments and the relative size differences between them, without the need to place the numeric value next to each segment. In short, more information can be quickly and accurately drawn from the size ordered bar graph than from the others in Figure 16. This agrees with Cleveland and McGill's research that it is not easy to judge the order of pie chart values. [19]
It seems fair to conclude that relative sizes of pie segments are hard to distinguish unless they differ by more than 5 percent. This, in turn, implies that there is an upper limit to the number of segments that can be presented in a pie.
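The arithmetic behind the clock-face analogy and the 5-percent observation can be sketched as follows. The function names are hypothetical, and the sketch assumes that "differ by more than 5 percent" means 5 percentage points of the whole; the sample values are illustrative only.

```python
def wedge_angles(values):
    """Angle in degrees of each pie segment (the whole pie is 360)."""
    total = sum(values.values())
    return {label: 360.0 * v / total for label, v in values.items()}


def hard_to_compare(values, min_gap_points=5.0):
    """Pairs of size-adjacent segments whose shares differ by no more
    than min_gap_points percentage points (assumed interpretation)."""
    total = sum(values.values())
    shares = sorted(
        ((100.0 * v / total, label) for label, v in values.items()),
        reverse=True,
    )
    pairs = []
    for (big, big_label), (small, small_label) in zip(shares, shares[1:]):
        if big - small <= min_gap_points:
            pairs.append((big_label, small_label))
    return pairs


# A 1/12 share and a 1/6 share, as in the clock-face example.
print(wedge_angles({"a": 1, "b": 2, "c": 9}))

# Hypothetical shares that differ by only 3 to 4 points.
print(hard_to_compare({"a": 30, "b": 27, "c": 23, "d": 20}))
```

Under this reading, a pie whose segments must each differ by more than 5 points cannot hold many segments, which is the upper limit the paragraph above refers to.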
Figure 16. Advantages of a Bar Chart Over a Pie Chart
Shading and Color in Pie Charts
Pie chart proponents believe that proper segment shading can enhance clarity in pie charts. The "rules" for shading and color for bar charts outlined earlier in this chapter are also applicable to pie charts. Particular care must be used in the selection of shading to avoid making the visual comparison more difficult. Striped shading is misleading because a selected pattern has a constant orientation on a page, and its appearance in relation to a wedge-shaped segment changes according to the orientation of the segment. It is not possible to produce shading of concentric circles with existing packages.
The purpose of color is to differentiate, to further facilitate comparison. If color is used, it needs to be used thematically. For example, if color were to be used to differentiate the sectors in the size-ordered pie chart in Figure 16, a progression of dark to light hues could be used effectively from the largest slice to the smallest one. Conversely, a dark color and a light color could not be used successfully to differentiate two (or more) slices of equal or near equal size. This creates a visual illusion where the sector with the dark color looks larger than the one with the light color.
Dot Charts
While readers comprehend bar charts more quickly and accurately than pie charts, another format, dot charts, is preferred over bars. Dot charts are an underutilized format in EIA. In dot charts, readers judge position along a common baseline, "the most accurate (visual) elementary task," rather than from an angle (pie charts) or a moving baseline (stacked bar charts).
In Figure 17, the reader can quickly and accurately see the comparative volumes of the six fuel types used to generate electricity in the United States in December 1993. The order of the use of the six fuels, particularly natural gas and hydroelectric, would not be as obvious in a pie chart or a stacked bar chart. Dot charts can also be used to display grouped data by using different symbols (e.g., diamonds, triangles, open circles) for each component, and by adjusting the vertical spacing.
Figure 17. U.S. Net Generation by Energy Sources, December 1993
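The structure of a dot chart (one row per category, each dot positioned along a common zero baseline, rows ordered by size) can be sketched in plain text. The function name, the layout details, and the sample values are hypothetical and do not reproduce the data in Figure 17.

```python
def dot_chart(values, width=40):
    """Return the lines of a plain-text dot chart.

    Rows are ordered from largest to smallest value. Every row starts
    at the same zero baseline ("|"); a dotted guide leads to the dot
    ("o"), whose position is proportional to the value.
    """
    top = max(values.values())
    label_width = max(len(label) for label in values)
    lines = []
    ordered = sorted(values.items(), key=lambda pair: pair[1], reverse=True)
    for label, value in ordered:
        position = round(width * value / top)
        lines.append(f"{label:<{label_width}} |{'.' * position}o {value:g}")
    return lines


# Hypothetical values, for illustration only.
for line in dot_chart({"Source A": 150, "Source B": 55, "Source C": 20}):
    print(line)
```

Because every dot is measured from the same baseline, the reader compares positions rather than angles or stacked lengths.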
Three-Dimensional Features
Three-dimensional (3-D) graphs can be useful for displaying three related, continuous variables, but no examples of this have been found in EIA products. To add a nondata spatial dimension to a graph often introduces distortion and ambiguity that distracts from the clarity of presentation. The reader does not know if the quantities displayed on a bar chart, for example, are represented by length, volume, or area. The reader spends much more time and effort than should be necessary to judge effectively the data presentation. Three-dimensional features also take up space that otherwise could be used to present more data in the graph.
In a now common technique of exploratory data analysis, researchers can analyze large clusters of data points in three-dimensional space on a computer monitor. They also can "rotate" these clusters in this three vector or three-variable space and see shapes, bumps, outliers, and other features of the data set that would not be visible in two-dimensional space (William Cleveland, Visualizing Data, 1993).
An Unacceptable 3-D Graph:
U.S. Average Imports of Petroleum Products, 1985 Through 1992
This figure is an example of the distortions 3-D puts into graphs. First, the visual perception is given (correctly) that the variables are not being measured or presented from the same plane (base line). On the Y-axis, it appears that jet fuel starts at a point below all the other components while, conversely, it appears that residual starts at a point higher than the other components. This makes visual determination of the values difficult.
Another problem with this figure is that readers cannot analyze residual imports. The 1985 and 1990 through 1992 bars are hidden behind the "other fuels" bars for those years. Readers only know that in 1985 and 1990 through 1992 less residual fuel oil than "other fuels" was imported into the United States. It is difficult to know that something exists, let alone measure it, if it is hidden. This problem reinforces the first distortion.
Finally, the Y-axis and X-axis scales are distended from the data. In a line graph, bar chart, or any graph with a horizontal and a vertical scale plotted on a flat plane, the data are "flush" (on the same plane) against the scale, and the human eye can quickly and accurately read the values for the components both cross-sectionally and longitudinally. In this figure, trying to quickly and accurately measure "other fuels" in 1986 or distillate in 1987, for example, takes time. (The actual values of "other fuels" in 1986 and distillate in 1987 are 504,000 and 255,000 barrels per day, respectively.)
The visual difficulties are compounded because "many designers of three-dimensional charts lack familiarity with the principles of projection techniques." [20] The third dimension is only conveyed satisfactorily when distant objects are drawn smaller than closer objects of the same size. In the figure above, the horizontal line at the top is not drawn shorter than the horizontal line at the bottom of the graph; the diminishing effect of distance is ignored throughout. Among EIA supported software, SAS/GRAPH can plot three variables (x, y, and z axes) and draw the three dimensions in proper perspective to each other. [21]
Pie graphs are particularly difficult to understand when prepared in 3-D because all information is on an elliptical surface of a cylinder. Instead of wedges of a circle, the comparison must now be made among wedges of an ellipse. This means that it is not possible to judge relative size by looking at the arcs of the circumference. As radii are now not constant, and the length of the arcs depends on whether they are in the front or the back of the pie, the task is nearly impossible.
In Figure 18, the 3-D graph from above is redrawn as a line graph. Figure 18 has six data lines, which often can be excessive. Yet, in this graph, there is little overlap among the lines, which makes the figure much less confusing than the 3-D example above. It now clearly shows the trends and comparisons that are hidden in the 3-D graph. Readers can readily see, for example, the wide fluctuations of "other fuels" and residual imports during this time period in contrast to less varying levels of imports for the other petroleum products. Figure 18 also has a right-side Y-axis, which allows a user to more easily relate the recent data points to their values.
Figure 18. U.S. Average Imports of Petroleum Products, 1984 Through 1991
Chapter Summary
-
Bar charts and dot charts are both used to display the relative share of a total that each component contributes. Bar charts, generally, are a good format to use, but dot charts are more effective than bar charts.
-
Vertical bar charts are often used to display time series. Compared with line graphs, bar charts can provide greater emphasis for relatively few periods of time in portraying a single time series and greater contrast in portraying two or more series, but they are less effective for five or more data points.
-
The choice between vertical and horizontal bar charts will depend, largely, on the number of bars, extent of labeling, and complexity. The graph designer is advised to try both and see which one is more easily understandable to readers.
-
Because pie charts have perceptual weaknesses when the relative differences among the partitions are small, bar charts and dot charts are more effective than pie charts for multiple partitions.
-
It is ill-advised to add 3-D elements in any published graph if the purpose is to provide an accurate data presentation. Three-dimensional elements complicate and distort the purpose (i.e., to display the relative shares of a total) of graph formats. Any perceived advantage in making the graph more attractive is outweighed by the increased difficulty a reader will have in comprehending the data quickly and accurately. The primary purpose of a statistical graph is to illustrate the data effectively, not hide them.
The statistical packages that produce 3-D graphics usually do not handle perspective properly. They are geared towards a "tinker toy graphics" type of presentation that is used for purposes other than the clear presentation of data. [22] This type of presentation may be a useful technique for a verbal presentation, but it is seldom appropriate in a factual publication.