Discussion


camoesjo

Registered: 02/05/06
Posts: 32
Reply with quote  #1 
Stephen

I was looking at the traditional ways of representing data and wondering whether those representations still make sense today. Take population pyramids, for example.

I think a paired bar chart should be used only if we have or expect asymmetric data, and you (almost) always have a symmetric distribution of sexes, with small differences at the top. So, why shouldn't we "fold" the chart and show the bars superimposed (never juxtaposed)? It is much easier to see the differences between sexes this way, and you can still see the overall shape, something that demographers always like to know. We should only "unfold" the chart when comparing two populations. And a "folded" chart saves space...

The image below shows three variations of a "folded" population pyramid. I like dot plots, but this is not a good place to use them. Using lines, you can put more series on the chart, but I would only use them if I had data by age, not by age groups. So, with one or two populations and ages by age group I would use a bar chart. It works well to show both the differences between sexes and the general structure.

(By the way, have you seen those colorful population pyramids at the US Census Bureau?)

Regards
Jorge Camoes

Attached Images
Name: pyramidsm.gif, Views: 4884, Size: 8.75 KB


sfew

Moderator
Registered: 12/30/05
Posts: 615
Reply with quote  #2 
Jorge,

I think that comparisons and evaluations such as the one you've proposed here are an excellent use of this discussion forum. I would like to propose other visualization solutions to this problem as well, critique the effectiveness of each approach, and invite others to weigh in on the topic. Could you please email the data set that you used to produce your charts to sfew@perceptualedge.com? Having the data will save me some time.

Thanks,

Steve


__________________
Stephen Few
sfew

Moderator
Registered: 12/30/05
Posts: 615
Reply with quote  #3 
In the case that Jorge presents, we want to find the best way to display male and female populations so their distributions can be compared. The best solution depends on the nature of the comparison that must be made: either the overall shapes of the distributions or the values within a particular age group.

Jorge suggests that a Paired Bar Graph could be improved by folding the display, placing the two sets of bars on top of one another. Figure 1 shows an example of how the Paired Bar Graph solution might look.

Notice that I reversed the order of the age groups so they proceed from the youngest (least) at the bottom upwards to the oldest (greatest) at the top, which I believe is a bit more intuitive.

In Figure 2, I've folded the graph to unite the two sets of bars and also reoriented it to proceed from left to right with vertical bars, which is an arrangement that most business people would find more familiar. In the book Show Me the Numbers I refer to a similar display as a Correlation Bar Graph.

The next example, Figure 3, is a more typical Grouped Bar Graph.

Figures 2 and 3 make it easy to compare the male and female populations within a given age group, supporting this task better than the Paired Bar Graph in Figure 1. None of these examples that use bars, however, do a very good job of enabling a comparison of the shapes of the two distributions.

Perhaps two bar graphs, arranged one on top of the other, would help us see and compare the shapes of the distributions more easily. Figure 4 shows how this would look.

It is a little easier to see the shape of each distribution when the two data sets are displayed separately. Removing the gaps between the bars also makes it a bit easier to see the overall shape rather than focus on individual bars, but I think we can still do better. While we can now see the shapes of the two distributions fairly well, because the two sets of bars are not right next to one another, it is still a little too hard to compare their distributions.

What about a Dot Plot? Figure 5 shows how this would look.

The shapes of the distributions are perhaps a little easier to see and compare with the Dot Plot, but not good enough.

I believe that lines will work best to help us see and compare the shapes of these two distributions, shown in Figure 6.

The presence of light grid lines makes it easy to compare the population values of females and males in a given age group without distracting from the overall shapes of the distributions.

Another alternative is to use a Box Plot to compare the distributions, but boxes in this case would mask the differences in their shapes.

I believe that if a single display must be selected, the one that supports all comparisons best is a line graph with subtle grid lines to mark the age groups. What do you think?

Attached Images
Name: All_Figures.jpg, Views: 4881, Size: 166.98 KB



__________________
Stephen Few

PSu

Registered: 05/09/06
Posts: 32
Reply with quote  #4 

Stephen:

 

I vote for solution 4 for sure, but would suggest changing the green bars into blank bars and using a border to display them. It's then a mix between Jorge's fig 1 and your fig 4. As this is essentially a classic "histogram", I prefer displaying the classes horizontally over the bar graph representation of Jorge's fig 1. The transparent bar with a clear border in Jorge's fig 1 is less disturbing than the green fill, although I appreciate it may be a matter of taste. However, filling them both is less discriminating than one filled-without-a-border vs one non-filled-with-a-distinctive-border. To enhance this effect, the non-filled-with-a-distinctive-border bar is best made wider than the filled one, as you did in fig 4, whereas Jorge used the same width, making them stand out less clearly.

As you discussed, I think now the picture says it all:

- differences per age class are clear

- shapes/distribution as a whole are clear

 
 


__________________
Henk
sfew

Moderator
Registered: 12/30/05
Posts: 615
Reply with quote  #5 
Henk,

Please clarify which figure you prefer. You referred to solution 4, but your description doesn't seem to match the small multiples solution in Figure 4.

Thanks,

Steve


__________________
Stephen Few
sfew

Moderator
Registered: 12/30/05
Posts: 615
Reply with quote  #6 
I just noticed that the grid lines in my preferred solution--Figure 6 above--are not positioned properly. Here's the corrected image.

Attached Images
Name: Line_Graph_-_Distribution_Comparison.jpg, Views: 4795, Size: 39.52 KB



__________________
Stephen Few

camoesjo

Registered: 02/05/06
Posts: 32
Reply with quote  #7 
Stephen

Tufte, discussing Chernoff faces, says (in The Visual Display..., p. 97) "bilateral symmetry doubles the space consumed by the design in a graphic, without adding new information" and "an asymmetrical full face can be used to report additional variables".

Population pyramids report different data on each side, so they are not symmetric, but I think we can use the same general principle. We can fold the chart to put men and women on the same side, saving some space, and we can use the full pyramid to report additional data (two regions instead of one). This is one of the advantages I see of putting the age groups on the vertical axis. You lose that if you reorient the chart.

I do think the traditional population pyramid is inefficient, but it is almost an icon of demographic knowledge, and this raises another interesting and general question: how far should we go? Should we say "let's change everything, because this is more efficient" or "let's see if we can have a more efficient design with some minor changes"? The shape of a population pyramid is central to demographic analysis (there are three major pyramid types: expansive, constrictive and stationary). In this case, I wouldn't reorient the chart, only fold it, because this still fits the traditional analysis in the field.

I like bars/columns with different widths, but is there an easy way to do it in Excel without an add-in?

sfew

Moderator
Registered: 12/30/05
Posts: 615
Reply with quote  #8 
Jorge,

Tufte's argument that redundancy can be eliminated by not repeating both sides of a symmetrical Chernoff face definitely doesn't apply to this comparison of male and female population distributions. In this case we have two different sets of data, which happen to share the same categorical scale (age groups). I believe that the only relevant question is, "What design displays this information most clearly and in a way that is easiest to compare and understand?"

I believe that the line graph I've proposed offers sufficient advantages over the paired bar graph (or any bar chart, for that matter) to warrant replacing the paired bar graph design for comparing population distributions across age groups. Many traditional graphical approaches to representing data work poorly compared to other solutions and ought to be replaced.

Regarding the creation of the overlapping bars with different widths, I created my example using Excel. To do this, you simply associate one data series with the primary axis and the other with the secondary axis, set the bars to overlap 100% in the Options tab of the Format Data Series dialog box, and then set one of the data series to have a greater gap width than the other (also in the Options tab), which causes the associated bars to be thinner.
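
To get a feel for how a gap width setting translates into bar thickness, here is a small arithmetic sketch in Python. It assumes that Excel expresses gap width as a percentage of the bar width and that each axis carries a single data series (as in the setup described above), so that a bar occupies 100 / (100 + gap width) of its category slot. The function name is made up for illustration.

```python
def bar_width_fraction(gap_width_percent):
    """Fraction of a category slot taken up by one bar.

    Assumption: the gap is given as a percentage of the bar width and
    there is exactly one series on the axis, so slot = bar + gap.
    """
    if gap_width_percent < 0:
        raise ValueError("gap width cannot be negative")
    return 100.0 / (100.0 + gap_width_percent)


# Two overlapping series with different gap widths give bars of different thickness.
for gap in (50, 300):
    print(f"gap width {gap}% -> bar fills {bar_width_fraction(gap):.0%} of its slot")
```

Under these assumptions, gap widths of 50 and 300 would give bars filling roughly 67% and 25% of each slot, which is the kind of wide-behind-narrow effect described above.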




__________________
Stephen Few
nixnut

Registered: 12/27/06
Posts: 68
Reply with quote  #9 

As I was interested in how things would look with more data points, I've made some graphs with population data per age in 1-year intervals. The data is for the population of the city of Delft as of 1-1-06.

 

The first image is the line graph. The second is two sets of superimposed bars. I made these since histograms can be pretty data dense while still readable. I'm not too disappointed actually. Both the shape of the overall data sets and the values of individual data points can be made out. In both graphs the shape gets harder to discern if the values of the two sets are close, but the line graph does slightly better IMO.

 

The last image is basically the same graph as the second, but the width has been substantially increased. I think the superimposed bars make it a bit easier to compare the values for male and female for the same age than lines would (but that could be just me :) ).

If space is short I think I'd go for the first line graph. If the space were available I would seriously consider the last one.

Attached Images
Name: ages1.gif, Views: 4640, Size: 4.18 KB

Name: ages2.gif, Views: 4600, Size: 6.05 KB

Name: ages3.gif, Views: 4572, Size: 8.38 KB


sfew

Moderator
Registered: 12/30/05
Posts: 615
Reply with quote  #10 
Nicely done. In addition to providing a nice weighing of the benefits of bars versus lines for comparing the distributions of males versus females by age, you have introduced two new considerations to the discussion: aspect ratio and level of data granularity.

Beginning with the advantages of bars versus lines, as expected, the bars make it easier to compare male and female populations for a given age, but the overall shape of the distribution still seems easier to see and compare with the lines. I agree that your superimposed bars, made distinct as unfilled versus gray-filled bars, can be viewed separately, but it requires a perceptual effort that isn't necessary with the lines.

Regarding aspect ratio, with this much detail (one value per age), it definitely helps to extend these values across more space by widening the graph. Some interesting work is being conducted at the University of California, Berkeley, by my friends Maneesh Agrawala and Jeff Heer in the application of William Cleveland's principle of "banking to 45 degrees." They have tested the efficacy of several automated algorithms for setting a graph's aspect ratio for optimal viewing. I hope that eventually commercial software products will incorporate this functionality to automatically adjust or suggest aspect ratios that are optimal for various viewing purposes.
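
For anyone who wants to experiment with the idea, here is a minimal Python sketch of one simple variant, often called median-absolute-slope banking: pick the width/height ratio at which the median absolute slope of the line segments is drawn at 45 degrees. The function name and the sample series are made up for illustration, and this is only a sketch of the basic principle, not a reproduction of the algorithms tested at Berkeley.

```python
from statistics import median


def bank_to_45(xs, ys):
    """Return a width/height ratio that draws the median absolute slope at 45 degrees.

    A segment with data slope s is drawn with slope s * (h / w) * (x_range / y_range),
    so setting the median drawn slope to 1 gives w / h = median|s| * x_range / y_range.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least two points")
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    if x_range == 0 or y_range == 0:
        raise ValueError("data must vary in both x and y")
    # Absolute slope of each consecutive segment, skipping vertical ones.
    slopes = [
        abs((y1 - y0) / (x1 - x0))
        for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:])
        if x1 != x0
    ]
    mid_slope = median(slopes)
    if mid_slope == 0:
        raise ValueError("median slope is zero; banking is undefined")
    return mid_slope * x_range / y_range


# Hypothetical zigzag series: every segment rises or falls by 10 over 1 unit of x.
ratio = bank_to_45([0, 1, 2, 3, 4], [0, 10, 0, 10, 0])
print(f"suggested width/height ratio: {ratio:.1f}")
```

For the zigzag above the suggested graph is four times as wide as it is tall, since each of the four segments then spans one height-unit of width and the full height.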

Regarding the level of granularity, it is always important to determine the right level of detail for the intervals across a distribution. Dividing the distribution into too many intervals results in a jagged distribution shape, which fails to reveal the general shape, and dividing it into too few intervals fails to reveal important detail. I think that with this particular distribution, setting the intervals at the year level, rather than grouping them into multi-year spans, works well.
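
To try different levels of granularity on the same data, single-year counts can be summed into wider age groups before plotting. The Python sketch below shows one way this might be done; the function name and the counts are hypothetical and are not taken from the Delft data.

```python
def group_ages(counts_by_age, width=5):
    """Sum single-year counts into consecutive age groups of `width` years."""
    if width < 1:
        raise ValueError("width must be at least 1")
    groups = {}
    for age, count in sorted(counts_by_age.items()):
        start = (age // width) * width  # first age in this group
        label = f"{start}-{start + width - 1}"
        groups[label] = groups.get(label, 0) + count
    return groups


# Hypothetical counts per single year of age.
counts = {0: 10, 1: 12, 4: 8, 5: 7, 9: 3, 10: 5}
print(group_ages(counts, width=5))
```

Plotting the same series at widths of 1, 5 and 10 years side by side is a quick way to see where the shape turns from jagged to overly smooth.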

Thanks for contributing to this discussion.


__________________
Stephen Few
nixnut

Registered: 12/27/06
Posts: 68
Reply with quote  #11 

Quote:
Originally Posted by sfew
Regarding aspect ratio, with this much detail (one value per age), it definitely helps to extend these values across more space by widening the graph. Some interesting work is being conducted at the University of California, Berkeley, by my friends Maneesh Agrawala and Jeff Heer in the application of William Cleveland's principle of "banking to 45 degrees." They have tested the efficacy of several automated algorithms for setting a graph's aspect ratio for optimal viewing.

Quite interesting. From reading their paper 'Multi-Scale Banking to 45°' [1], I get the impression that their multi-scale banking technique is particularly interesting for data with multiple cycles (frequencies). It also considers only the banking of line segments, if I understand things correctly. For research that makes perfect sense, but for practical use other things need to be considered as well: limitations on the space available for the graph, the resolution of the medium the graph is displayed on, the legibility of the scales and other text elements of the graph, etc.

 

I find their example of spark-lines on page 7 very interesting. I've taken the liberty of adding a screen shot of those spark-lines below. Their argument that major trends (low frequency) are better presented in the set of spark-lines on the right is convincing. However, that set of spark-lines looks very jagged. Which brings me to granularity.

 

Quote:
Originally Posted by sfew
Regarding the level of granularity, it is always important to determine the right level of detail for the intervals across a distribution. Dividing the distribution into too many intervals results in a jagged distribution shape, which fails to reveal the general shape, and dividing it into too few intervals fails to reveal important detail.

I agree completely. That's why I wonder if the set of spark-lines on the right would look better if it were generated with less granular data. An interesting line of research might be to not only consider banking, but also the impact of data granularity on readability of graphs generated with banking techniques.

 

All in all their technique is very interesting. It could be very helpful in automated generation of a set of graphs for a data set from which one could choose the graph most suitable for the situation in which it will be presented.


Quote:
Originally Posted by sfew
Thanks for contributing to this discussion.

My pleasure. The subject is fascinating and this forum is a great place to learn and exchange views and ideas.

 

edit: perhaps this post should be moved elsewhere, since we're drifting away from the subject of population pyramids :-)

[1] http://vis.berkeley.edu/papers/banking/2006-Banking-InfoVis.pdf

Attached Images
Name: sparklines.gif, Views: 4508, Size: 6.06 KB


PSu

Registered: 05/09/06
Posts: 32
Reply with quote  #12 
Sorry, Stephen, for not being clear, and for my slow reply.  I don't know what I was thinking!  I meant your figure 2 has my preference, with the suggested modifications. 

 

__________________
Henk
koday

Registered: 04/28/06
Posts: 9
Reply with quote  #13 

Stephen:

 

Great discussion. I'd like to follow up on your comments about aspect ratio, banking to 45°, and Heer & Agrawala's work.

 

 

I used their techniques to build an Excel bank to 45° workbook. Here's the link:

 

http://processtrends.com/pg_data_vis_bank_to_45.htm

 

 

In Beautiful Evidence (page 60), Tufte builds on Cleveland's work by asking the question "How should a sparkline aspect ratio be chosen?". Part of Tufte's answer is reproduced below....

 

".. a graphic's width/height [w/h] ratio makes a big difference in displaying data." ...


"In general, statistical graphics should be moderately greater in length than in height. And, as William Cleveland discovered, for judging slopes and velocities up and down hills in time-series, best is an aspect ratio that yields hill-slopes averaging 45°, over all the cycles in the time-series. That is, variations in slopes are best detected when the slopes are around 45°, uphill or downhill. ... the aspect ratio should be such that the time-series graphics tend toward a lumpy profile rather than a spiky profile .. or a flat profile." Tufte, Beautiful Evidence

 

How does one measure lumpy versus spiky?

 

I'd like to hear your thoughts on the need for banking to 45° with sparklines. If we force chart height and width, we control the aspect ratio without regard to the data.

 

 

Dashboards and editors tend to want all charts the same size. Banking to 45° implies the opposite: the data pattern should specify the aspect ratio.

 

Kelly O'Day

http://processtrends.com

 

sfew

Moderator
Registered: 12/30/05
Posts: 615
Reply with quote  #14 
Kelly,

Unfortunately, banking to 45 degrees is not a practice that can be applied to dashboards, because it would cause graphs to be resized automatically with changes to the data, which would play havoc with the layout of the dashboard. It is a useful practice, but just not one that can be used with any display that combines a number of graphs in a limited amount of space, such as a dashboard, where changes to the dimensions of graphs caused by automatic data updates will alter the overall layout.

I think it's great that you incorporated this practice into Excel. Kudos to you for taking on and conquering this challenge. I plan to send a link to your workbook to Maneesh Agrawala and Jeff Heer, who are both friends of mine at the University of California, Berkeley.

Steve

__________________
Stephen Few