Discussion


bpierce

Moderator
Registered:
Posts: 98
Reply with quote  #1 
For the January/February/March 2014 Visual Business Intelligence Newsletter article, Stephen examines a type of graph often found in statistical visualization tools, called a "mosaic plot," and asks the question: "Are Mosaic Plots Worthwhile?" Currently, mosaic plots aren't standard in Excel or most general Business Intelligence (BI) tools, but that could change in time. In conducting his review, Stephen's goal is to determine whether mosaic plots work more effectively than their alternatives, so he can either encourage their widespread adoption or try to "head them off at the pass." 
 
What are your thoughts about mosaic plots? Do you agree with Stephen's conclusions? We invite you to post your comments here.

-Bryan
grasshopper

Registered:
Posts: 245
Reply with quote  #2 
Interesting article.

I was wondering if you could elaborate a little on the distinction between Mosaic Plots and Tree Maps
(http://en.wikipedia.org/wiki/Treemapping). And do you consider one more (or less) evil than the other? :)
wd

Registered:
Posts: 167
Reply with quote  #3 

These sentences from Steve's article describe one of the most frequent problems in data visualization:

"In addition to realizing that it isn’t necessary to force everything into a single graph, it’s also important to realize that no single view of data will ever answer every question. This is an underappreciated fact of visual analysis. Much time and effort is wasted trying to cram everything into a single view when moving fluidly from one view to the next usually works much better".


 

Filling the page with one large, complicated graph or chart is inefficient for the reader and often ineffective.  Several smaller graphs, each with a simpler message, can be placed on one page to tell the story that's in the data with much greater effectiveness.  Unfortunately, the default mindset seems to be the 'one big one'.

Keep at it Steve - and thank you!


__________________
Bill Droogendyk
danz

Registered:
Posts: 186
Reply with quote  #4 
A mosaic plot is just a treemap which uses a combination of strip and slice-and-dice algorithms for tiling. The optional additional space is not relevant.

I assume both methods are similarly useful when a large number of items have to be explored. For a small number, as presented in Stephen's article, simple bars are easier to read.

I consider that a simple stacked bar graph performs better than a mosaic plot in most cases. For a larger number of elements, a wrapped stacked bar can also be used.
sfew

Moderator
Registered:
Posts: 814
Reply with quote  #5 
Grasshopper,

Tree maps and mosaic plots share some obvious characteristics, but they also differ in fundamental ways. To address your question regarding comparative evil (or usefulness), as I mentioned in the article, the trade-off in perceptibility is justified with tree maps because they do something that cannot be done in a more effective way: they give us a way to view and compare a huge number of values at once. Other than the visualization called "wrapped bars" or "wrapped dots" that I proposed in my newsletter last year, there is no better way to compare a huge number of values than a tree map. Mosaic plots, however, don't handle large numbers of values, so the trade-off in perceptibility that they impose isn't worthwhile when compared to a properly designed pair of bar graphs.

Other differences between tree maps and mosaic plots include:

1) Tree maps can encode two quantitative variables (e.g., sales revenues and profits), but mosaic plots encode a single quantitative variable--typically counts or percentages--per the items of two or more categories (e.g., sex, class, and age in the Titanic example).

2) Tree maps encode a single set of proportions as rectangle areas, but mosaic plots encode two sets of proportions in each rectangle using height for one and width for the other.

3) Tree maps are arranged hierarchically (e.g., products within product families within product lines). The categories that are displayed in mosaic plots do not necessarily represent hierarchies and are not arranged hierarchically.

4) Mosaic plots are divided into columns--one per item in the dominant category--but tree maps arrange groups of rectangles in a way that best fills the space.
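
To make points 2 and 4 above concrete, here is a minimal sketch of how the rectangles of a two-category mosaic plot could be computed: one column per item of the dominant category, with width encoding that item's share of the total and height encoding the share within the column. The function name `mosaic_rectangles` and the counts in the usage note are hypothetical, made up purely for illustration; this is not code from the article or from any particular tool.

```python
def mosaic_rectangles(counts):
    """Compute unit-square rectangles for a two-category mosaic plot.

    counts: dict mapping each column category to a dict of
            {row category: count}. All names here are illustrative.
    Returns a list of dicts with x, y, width, and height in [0, 1].
    """
    grand_total = sum(sum(rows.values()) for rows in counts.values())
    if grand_total <= 0:
        raise ValueError("counts must contain at least one positive value")

    rects = []
    x = 0.0
    for col, rows in counts.items():
        col_total = sum(rows.values())
        if col_total == 0:
            continue  # an empty column has no width, so nothing to draw
        width = col_total / grand_total      # first proportion: column share
        y = 0.0
        for row, n in rows.items():
            height = n / col_total           # second proportion: share within column
            rects.append({"col": col, "row": row,
                          "x": x, "y": y,
                          "width": width, "height": height})
            y += height
        x += width
    return rects
```

With made-up counts such as `{"A": {"yes": 30, "no": 10}, "B": {"yes": 20, "no": 40}}`, column A gets 40% of the width and its "yes" rectangle 75% of the height. Each rectangle's area equals its count divided by the grand total, which is why the reader has to judge width and height separately to recover the two proportions.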


__________________
Stephen Few
neilism

Registered:
Posts: 7
Reply with quote  #6 
Excellent article -- the critique with a demonstrably better alternative is very powerful. I wonder if the vendors who add in the more gimmicky functionality will take the well-articulated hint and add some easier ways to present multivariate data as auto-formatted graphs?

One tricky issue is having both proportions and counts. You need the counts to allow you to judge whether the outlying proportion is just a small-numbers effect. But the data are far apart. The standard solution is to overlay confidence intervals (particularly for surveys, but you could do something similar here...) or to arbitrarily suppress the 'invalid' proportions (alternatively highlight the valid ones and grey out the invalid ones). But none of these is very elegant, and they seem to confuse more than illuminate. Has anyone found a clever way of combining proportion/count/confidence?
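
For readers who want to see the small-numbers effect in figures, here is a minimal sketch of one common way to compute a confidence interval for a proportion, the Wilson score interval. The choice of the Wilson method, the function name `wilson_interval`, and the counts in the usage note are assumptions made for illustration; nothing in the article prescribes them.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion.

    successes: number of cases with the attribute of interest
    n:         total number of cases (must be positive)
    z:         normal quantile; 1.96 corresponds to roughly 95% confidence
    Returns (low, high) as fractions between 0 and 1.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")

    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width
```

Both `wilson_interval(9, 10)` and `wilson_interval(900, 1000)` describe a 90% proportion (the counts are made up), but the interval for the group of 10 is far wider than the one for the group of 1000. That width is the quantity a reader is trying to estimate by glancing between the percentage and the count.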
jlbriggs

Registered:
Posts: 194
Reply with quote  #7 
Quote:
Originally Posted by wd
Unfortunately, the default mindset seems to be the 'one big one'.


Indeed.

This is painfully evident in all of the headlines and tweets that exclaim ".......  in one chart" as if the author has captured some hidden secret or wisdom in this way.

Also painfully evident in the technical help questions at places like http://stackoverflow.com/ , where an incredibly large number of questions for charting libraries include something along the lines of "how can I show this, but also this, and this, and this other thing on my chart?"

Also unfortunate: people get outright angry when you suggest that they should display the data more effectively on multiple plots.
sfew

Moderator
Registered:
Posts: 814
Reply with quote  #8 
Neilism,

By "proportions and counts," I assume you're referring to the information that a mosaic plot displays as the width versus the height of a rectangle. Both are proportions, expressed either as counts or percentages. In my redesigns, I displayed the proportions that were expressed in the mosaic plots by rectangle width as a separate set of bars in the bottom graph to make those proportions directly accessible when viewing the other set of proportions in the upper graph. In the article's examples, there were no cases when confidence intervals were required, because there was no uncertainty in the measures. If there were, however, the bar graphs could include regular confidence/error bars as well.

Did I understand your question correctly?

__________________
Stephen Few
Thorri

Registered:
Posts: 21
Reply with quote  #9 
I want to show you a sketch that might be useful in this conversation.

A client once asked me to draw "brand usage diagrams", which are similar to mosaic plots. I did, but I couldn't resist trying out better ways to represent the data. "Brand usage diagrams" are supposed to show the percentage of people who like or dislike the brand, and to what degree. The question has six possible answers ranging from "Like very much" to "Dislike very much". There is no neutral answer available.

On the sketch, each box represents one brand. Deep blue is "Like very much", and deep red is "Dislike very much". Below each box is the same data, represented with eight columns. The first two columns, with grey background, are total positive and negative answers. The next six are the individual answers.
brand-usage-diagram-study.png 
Adding the answer popularity/count below, in another chart, is a very good idea, which I will add to my arsenal.

neilism

Registered:
Posts: 7
Reply with quote  #10 
SFew,

I was specifically thinking about the graph on page 12, which shows the percentage deaths by class, age and gender, as well as the raw counts.

When I first started looking, I was drawn to the percentages first. I noticed the high proportion of female crew (ladies first!) in comparison with the third class females (adults and children) and then started constructing some fanciful narrative in my head about loyalty to your workmates (blah, blah...). Then I noticed the small number of female crew, which made me doubt the significance of the high percentage.

So to interpret the percentages I had to bounce backwards and forwards visually to check whether I thought the high percentages were meaningful or possibly the result of simple randomness. It's this bouncing backwards and forwards between the percentages and the associated counts that I'd like to eliminate.

In this case, confidence intervals wouldn't necessarily be relevant, but where I've used them in surveys they just seem to confuse people more -- they cope better with the point estimate. Similarly, I've 'greyed out' results that I've decided are more dubious (e.g. where a statistical test suggests that the result isn't significant) to bring attention to the items that are significant ('why's that greyed out again...?'), and I've tried highlighting the ones that are significant with a symbol, which gets confused for being important rather than significant.

I was wondering if anyone had come up with a visual way to combine the percentage with the associated count to help the reader intuit that one percentage is meaningfully greater than another.

Kind regards

Neil
danz

Registered:
Posts: 186
Reply with quote  #11 
Stephen,
 
For part-to-whole information, a "containment" approach has the merit of being visually intuitive for any audience. Using the percentages encoded as bars in the first column can, perhaps, lead to a visual misinterpretation. Swapping the columns and using another encoding method for the percentages would avoid such a situation. Below I made a quick "hack" of your graph using dots instead of bars for the percentages. One immediate advantage is the possibility of encoding group percentages as well.

sfew-dots-vs-bars.png 
 
Dan

sfew

Moderator
Registered:
Posts: 814
Reply with quote  #12 
Dan,

The concept of containment is easy to understand, but neither a mosaic plot nor a tree map is intuitive when you first see one. Even a simple pie chart is only intuitive because we were introduced to pie charts in school as children. We tend to put too much emphasis on making visualizations intuitive. There is value in presenting data in familiar ways, but that does not trump the value of presenting data effectively. If I can show data much more effectively using an unfamiliar form of display that requires a minute of instruction, the end result in understanding is worth the cost of training.

I understand what you're trying to achieve in your redesign above, but I don't agree that it's necessary or effective. Firstly, it is easy to point out that the two quantitative scales are different from one another without using different encoding methods: bars and dots. Secondly, using different encoding methods does not tell the reader that the scales are different. And finally, it is slightly easier to see and understand two sets of bars in relation to one another than it is to relate one set of bars and another set of dots.

Neilism,

Thanks for the further explanation. It is true that your eyes must bounce a bit between the upper and lower graphs to understand the importance of data in the upper graph in relation to the volume of data in the lower graph, although you can get a rough sense of the data in the lower graph without shifting your gaze from the upper graph. I don't believe that this results in greater perceptual or cognitive effort, however, than shifting between focusing on the heights and widths of rectangles in a mosaic plot. To answer your question about ways that the importance of the data could be added to the upper graph, several possibilities exist. If variations in color intensity were already being used, you could vary the color of the bars from light to dark to encode importance (in this case based on the number of people that each bar represents). If you can't use color, you could also add a grayscale border around the bars and vary its color intensity or its stroke weight to indicate this. You could, of course, vary the widths of the bars to do this, much as a mosaic plot varies the widths of the rectangles, and it would still work better than the mosaic plot because the bars are always aligned along a common baseline, so comparing their heights is easy.
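
As a rough illustration of the light-to-dark idea described above, the following sketch maps the number of people behind each bar to a gray level that could be used for a bar fill or border. The function name `count_to_gray`, the linear scale, and the default lightest and darkest levels are all assumptions chosen for illustration, not settings from the article.

```python
def count_to_gray(count, max_count, lightest=220, darkest=40):
    """Map a count to a hex gray: small counts are light, large counts dark.

    count:     number of people the bar represents
    max_count: largest count among the bars (must be positive)
    lightest, darkest: 0-255 gray levels for the two ends of the scale
                       (illustrative defaults)
    """
    if max_count <= 0:
        raise ValueError("max_count must be positive")

    # Clamp to [0, 1] so out-of-range counts still produce a valid color.
    fraction = min(max(count / max_count, 0.0), 1.0)
    level = round(lightest + (darkest - lightest) * fraction)
    return f"#{level:02x}{level:02x}{level:02x}"
```

A bar representing very few people would then be drawn pale and one representing many people dark, so the reader gets a sense of the volume behind each percentage without leaving the upper graph. A linear scale is only one option; a stepped scale with a few distinct levels may be easier to read.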

__________________
Stephen Few
danz

Registered:
Posts: 186
Reply with quote  #13 
Stephen,

My reason for using a different method for encoding percentages was not to suggest a different measure or scale, but to avoid ink color ratio, which already carries a "containment" perception in a stacked bar. Also (though it is not the case above), in some cases a non-zero-based scale could be more appropriate for encoding the second measure.

And yes, two bar charts next to each other are also my choice when I have to encode, for instance, sales and growth. It is just easier to relate the values.


Dan
sfew

Moderator
Registered:
Posts: 814
Reply with quote  #14 
Dan,

I don't follow what you're saying. You offered your redesign as a way to avoid a "visual misinterpretation." Are you now saying that the "ink color ratio" is the "visual misinterpretation" that you were trying to avoid? What do you mean by "ink color ratio"?

__________________
Stephen Few
danz

Registered:
Posts: 186
Reply with quote  #15 
Stephen,

The stacked bar chart contains colored bars. Each color has its own meaning (the legend is there). The area covered by each color is proportional to the total count for the corresponding category: men, women, survivors, or deaths.

Because the area of the colored bars in the percentage graph is not proportional to the absolute values, it can lead to visual misinterpretation (see the long bars for children). I used a dot graph as the encoding method for percentages to avoid this. I assume that using different widths for the percentage bars is also possible.