Discussion


jlbriggs

Registered:
Posts: 194
Reply with quote  #1 
Xan Gregg has written a blog post about a display type that he is calling "packed bars".

When I first started to read, my immediate reaction was "this isn't new", and it holds true that parts of it are certainly not new.

But I find the technique very interesting, with a lot of potential, for data sets that you might be inclined to plot as a treemap, or that some users might be inclined to plot in the horrendous form of packed bubbles.

Post: https://community.jmp.com/t5/JMP-Blog/Introducing-packed-bars-a-new-chart-form/ba-p/39972

Thoughts?

I did think initially about Stephen's wrapped bar chart as well. I think there is definitely an overlap of data sets that could be plotted with either.

My main line of thought at the moment is that, if I were going to plot a data set as a tree map, and found this method, I would much rather plot as a "packed bar" than a tree map. I like the consistent axis, and the fact that the length of each segment still represents a standard amount that can be compared, unlike a tree map where you need to consider the area.

I am curious to know what other people think about potential shortcomings that either aren't addressed in the blog post, or whose explanations you don't find satisfactory.


danz

Registered:
Posts: 190
Reply with quote  #2 
I very much appreciate the effort of the author. But I don't think a packed bar is a serious alternative to a treemap or wrapped bar.

A packed bar is a variation of a treemap with one level of depth. As a programmer, but also as a fan of data visualization, I developed around 20 experimental parametric tiling algorithms for treemaps for different purposes. The best-known technique aims to "squarify" the tiles (for better comparison), ordering them in descending order in a spiral-like direction, top->down, left->right. This technique is usually the only tiling algorithm that vendors provide for their treemap representations. Treemaps were designed to display additive measures in a hierarchical way, across several levels. "Packed Bars" is the result of a trivial parametric tiling algorithm applied to one level of depth only.

Let's see a few aspects of a Packed Bar against a Treemap.

Clarity. I am sure that we may be seduced by the clarity of the first picture, but I assure you that a perfect fill of the rectangular space does not happen often. The author briefly mentioned this, but the situation is actually much worse in the real world. The usual result of packed bars will be a bunch of distracting unequal stacked bars with no meaning for the stacked value.

Axis. Except for the first column of tiles, the rest of the rectangles do not benefit much from the presence of the axis. On the contrary, they might even suggest a sort of stacked logic, which does not actually exist. The claim that we can estimate the total amount by multiplying the number of rows by the axis max value is sort of funny. We can display the total as a number; there is no need to do this exercise at all.

Constant bar width. The only real advantage over a treemap is the constant width of the tiles, yet the fact that we can compare only the first few items against a scale, and that small items are really difficult to decode, makes the gained advantage too small to be considered.

Order. A standard squarifying treemap algorithm has the advantage of a spiral-like parsing direction of tiles in descending size order, an order which can hardly be achieved in a Packed Bar without seriously affecting the right edge (the first picture suggests a top->down, down->top order, but I cannot be sure).

Hierarchy. Obviously no extra level depth is possible.

Amount of items. As the author mentioned, it is more difficult to estimate tiles when they go below pixel size. 1x12 is still visible, but 0.3x12?

Manual tuning. A standard treemap tiling algorithm will always fill 100% of the reserved rectangular space, regardless of the number of items and their values, unlike a packed bar, which will require a lot of manual tuning (number of rows) for an acceptable fit.

Dan




xan

Registered:
Posts: 44
Reply with quote  #3 
Thanks for taking the time to look over the packed bars work, Jamie and Dan. Though I started packed bars as a way to bring Focus+Context to bar charts, framing it as a special bar chart does increase the learning curve because you have to unlearn the stacked bar interpretation. Framing it as a treemap-like tiling that trades imperfect filling and nesting for a natural x axis is often easier to understand. But it really is neither a bar chart nor a treemap.

I do not claim objectivity, of course, but I would like to comment on Dan's observations in case some things might merit another look.


Quote:
Clarity. ... The usual result of packed bars will be a bunch of distracting unequal stacked bars with no meaning for the stacked value.

Surprising even to me, the packing comes out pretty evenly in most real world data sets I've tried, including all of them (about 25) that fit the many-category/skewed-distribution target. My original algorithm included extra logic to smooth out the right edge by spreading out the final values, but it turned out to be unnecessary, and I got rid of it to keep the purity of the bar-length value encoding.

Here's a data set that I'm putting near the low end of "skewed-distribution" scale -- the 200 values have a mostly linear distribution, and its right edge is not too uneven, for instance.

collegeEnrollment2014.png 
Quote:
Axis. Except for the first column of tiles, the rest of the rectangles do not benefit much from the presence of the axis. On the contrary, they might even suggest a sort of stacked logic, which does not actually exist. The claim that we can estimate the total amount by multiplying the number of rows by the axis max value is sort of funny. We can display the total as a number; there is no need to do this exercise at all.


In the above example I think I do get some benefit from the axis for the secondary bars, such as estimating the Kentucky enrollment at a little over 20K even though it's far from the axis (it's actually 25K) and noticing that the enrollments are above 10K until almost the end.

Here's a squarified treemap of the same data for comparison.

enrollmentTreemap.png 
Good to get the area multiplication feedback. It was one of those fun (not funny!) observations that's at most a minor freebie.
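
For what it's worth, the multiplication shortcut can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `estimate_total` is a hypothetical helper, and the row lengths are made-up numbers, not values read from any of the charts in this thread.

```python
def estimate_total(row_lengths):
    """Approximate the grand total as (number of rows) x (a typical right edge).

    `row_lengths` holds the right-edge position of each row, as it would be
    read off the shared axis. The upper-median row is used as the "typical" edge.
    """
    n_rows = len(row_lengths)
    typical_edge = sorted(row_lengths)[n_rows // 2]
    return n_rows * typical_edge


# Hypothetical right edges for a four-row chart with a slightly ragged edge.
row_lengths = [13.0, 12.5, 13.5, 12.0]
approx = estimate_total(row_lengths)
exact = sum(row_lengths)
relative_error = abs(approx - exact) / exact
```

The more ragged the right edge, the worse this estimate gets, which is in line with Dan's point that the shortcut depends on how well the bars fill the rectangle.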

Quote:
Constant bar width. The only real advantage over a treemap is the constant width of the tiles, yet the fact that we can compare only the first few items against a scale, and that small items are really difficult to decode, makes the gained advantage too small to be considered.


Perhaps my example above shows the advantage of the axis on the secondary bars isn't so little, but, you're right: I can't be sure it's of practical value at this point.

But I have to think packed bars are significantly better than treemaps at understanding the primary bars, both for absolute and relative value estimations.

Quote:
Order. A standard squarifying treemap algorithm has the advantage of a spiral-like parsing direction of tiles in descending size order, an order which can hardly be achieved in a Packed Bar without seriously affecting the right edge (the first picture suggests a top->down, down->top order, but I cannot be sure).

I updated my post to explain the order a little better. In most of the examples, the secondary bars are ordered left to right by size, which really means the left edges of the bars are ordered left to right, regardless of row. If bar A's left edge is farther left than bar B's left edge, then bar A has a larger size.

In at least one example, I ordered the secondary bars alphabetically for interactive look-up of names; however, that does result in a ragged right edge.
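
One way to get the left-edge ordering described above is a greedy placement: sort the values in descending order and always append the next bar to whichever row is currently shortest. The Python sketch below assumes that rule and positive values; it is not claimed to be the exact JMP implementation, and `pack_bars` is a hypothetical name.

```python
import heapq


def pack_bars(values, n_rows):
    """Greedy packed-bar layout (a sketch; assumes all values are positive).

    Returns one list per row, each holding (left_edge, width) tuples.
    The n_rows largest values become the primary bars, one per row.
    """
    ordered = sorted(values, reverse=True)
    rows = [[] for _ in range(n_rows)]
    # Min-heap of (current row length, row index): shortest row is popped first.
    heap = [(0.0, r) for r in range(n_rows)]
    heapq.heapify(heap)
    for width in ordered:
        length, r = heapq.heappop(heap)
        rows[r].append((length, width))          # bar starts at the row's current end
        heapq.heappush(heap, (length + width, r))
    return rows


rows = pack_bars([10, 8, 6, 5, 4, 3, 2, 1], n_rows=3)
for row in rows:
    print(row)
```

Because the shortest row length never decreases as bars are added, each bar starts at or to the right of every larger bar placed before it, which is the ordering property described in the post. Sorting the secondary bars alphabetically instead breaks that property and tends to leave a ragged right edge.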

Quote:
Hierarchy. Obviously no extra level depth is possible.


Yes, not trying to make a better treemap.

Quote:
Amount of items. As the author mentioned, it is more difficult to estimate tiles when they go below pixel size. 1x12 is still visible, but 0.3x12?

Yes, that does happen with 15K+ categories, but they still work to contribute context. That is, 15 bars of width 0.3 will contribute 4.5 to the total length.

Quote:
Manual tuning. A standard treemap tiling algorithm will always fill 100% of the reserved rectangular space, regardless of the number of items and their values, unlike a packed bar, which will require a lot of manual tuning (number of rows) for an acceptable fit.


Yes, it's certainly not for all data sets and is not a general replacement for treemaps.

Xan


jlbriggs

Registered:
Posts: 194
Reply with quote  #4 
Perhaps I set the tone somewhat wrong by making such a direct comparison to tree maps.

Danz - extremely thoughtful insight as always. I do, however, disagree with quite a few things :)

I think the only thing that a tree map actually has over the packed bars concept is the hierarchical aspect. But plenty of tree maps are not used in that way. I think if you have a single layer data set, the packed bars make a much more clear alternative.

I think Xan's response covers a lot of my thoughts, so I won't repeat them.

To me, the question is, is the packed bar something that works great for the right data set, but not in general (like the connected scatter plot, for instance)? Or will it be something that can be widely applicable?

More than the tree map, I would like to see more thought in relation to the wrapped bar chart. Are there different circumstances that would make one better than the other for different purposes, or would the cleaner, more precise layout of the wrapped bars be preferable?



xan

Registered:
Posts: 44
Reply with quote  #5 
I had forgotten about wrapped bars when starting this effort, but I've now added a small bit about it to my blog post (which I will happily update if it misses the mark):
  • Compared to wrapped bars, which reorganize the bars into multiple columns, packed bars are more space efficient, which helps support more categories (1000s instead of 100s), and packed bars add a sense of total area. Wrapped bars support more accurate reading of the “secondary” bar values since their bases are aligned, and wrapped bars should have a lower learning curve since they look more like regular bar charts.
I think the last part is most relevant for new audiences. Wrapped bars have a small hurdle to overcome: the resemblance to separate (possibly linked) bar charts. Packed bars have a bigger hurdle: the resemblance to stacked bars.

Since packed bars use the secondary bars mainly for context, the reduced accuracy is OK (even a feature: "don't focus here"). Wrapped bars would be better when you need a truer sense of those values.

danz

Registered:
Posts: 190
Reply with quote  #6 
Thank you, jlbriggs. I am trying my best, even if we are not always on the same page :)

I made the comparison against treemaps because they share a tiling concept. And, of course, because I have had quite an extensive look into tiling algorithms.

If you would like to compare with a Wrapped Bars design, the only small advantage of Packed Bars is the number of items that can fit in the same space. Otherwise, the ambiguous sense of the axis, the unaligned elements, the lack of a clear order, and the impossibility of automating the design (number of rows) are too many serious issues to consider Packed Bars a replacement for Wrapped Bars.

However, I do like Xan's article; his work has certain value in exploring bar arrangement in a fixed space. I consider his article an invitation to reflection and debate, and I wish more participants would express their opinions.

PS. While I was writing my post, Xan posted his comment above. I would not worry too much about the learning curve. If a certain concept is valuable, learning it will never be a serious problem.
sfew

Moderator
Registered:
Posts: 823
Reply with quote  #7 
Xan,

In what sense do packed bars communicate total area? Are you referring to the total number of values? If so, based on the way that our brains assess large numbers of items, I would argue that packed bars merely communicate that there are "many" values.

__________________
Stephen Few
xan

Registered:
Posts: 44
Reply with quote  #8 
By total area, I was referring to the 2D area of the blue and gray bars combined, which is usually a rectangle, like in the S&P 500 example. There we can estimate, for instance, that the sum of the primary bars is about 15 - 20% of the total.

With the drug trials example, we can see that the primary bars account for about half of the total reported side effects.

With many rows or with really skewed data, like the GDP with 7 rows, the combined area is not a rectangle. Then it's not so easy to estimate a proportion, but I think it's informative in highlighting the skew. One feature of those cases, is that you can readily tell that the top bar accounts for over 1/nth of the grand sum, where n is the number of rows.
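
The 1/n observation follows from a simple bound: if the top bar alone reaches at least as far right as every other row, then no row is longer than the top bar, so the grand sum is at most n times the top bar. A small Python check of that reasoning, using made-up widths and a hypothetical helper name:

```python
def top_bar_share(rows):
    """rows: list of rows, each a list of bar widths; rows[0][0] is the top bar.

    Returns (sticks_out, share, bound), where `sticks_out` says whether the top
    bar alone reaches at least as far as every row, `share` is the top bar's
    fraction of the grand sum, and `bound` is 1 / (number of rows).
    """
    row_lengths = [sum(row) for row in rows]
    top = rows[0][0]
    total = sum(row_lengths)
    sticks_out = top >= max(row_lengths)
    return sticks_out, top / total, 1 / len(rows)


# Hypothetical skewed data in three rows: the top bar has its row to itself.
rows = [[60], [20, 10, 6], [15, 12, 8, 4]]
sticks_out, share, bound = top_bar_share(rows)
```

Whenever `sticks_out` is true, `share` cannot be smaller than `bound`; when the rows fill a clean rectangle instead, the bound says nothing and the area comparison takes over.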
sfew

Moderator
Registered:
Posts: 823
Reply with quote  #9 
Xan,

We can roughly estimate the proportion of total area that the highlighted bars on the left represent, but this seems to be the only area comparison that can be done with a potentially useful degree of precision. This comparison, however, is fairly arbitrary, for its usefulness depends on highlighting a set of bars that one would want to compare to the total. If the highlighted number of bars represented 20% of the total bars, this particular area comparison could tell us if the data set exhibited the Pareto Principle, but it will never represent 20% of the total if the data set is large.

__________________
Stephen Few
xan

Registered:
Posts: 44
Reply with quote  #10 
Quote:
If the highlighted number of bars represented 20% of the total bars, this particular area comparison could tell us if the data set exhibited the Pareto Principle, but it will never represent 20% of the total if the data set is large.


Steve, I'm not sure what you mean by large, but 20% of the value total occurs pretty often with 100s of categories. Maybe you meant 20% of the count representing 80% of the area. The packed bars usually highlight far fewer than 20% of the bars, so it takes less than 80% of the area to suggest a Pareto-like skew.

Here are two packed bar charts of Medicare drug spending. Both show the Pareto-like skew. And since they're on the same scale, together they illustrate another use of having total area available: comparing the two value totals. The 2015 spending can be estimated as close to 2x the 2011 spending (pretty accurately because of the axis, and roughly without it).


medicareSpending.png 
Thanks for taking the time here. I realize that the usefulness of these charts (whatever that might be) requires some learning (or unlearning) and that my blog exposition was terse (blog-sized). However, other useful chart forms require some learning (horizon charts and parallel coordinates come to mind; jlbriggs mentioned connected scatter plots in this category).


Speaking of my terse exposition, I mentioned without support in my blog post that some sense of distribution can be gleaned from the sizes of the bars (highlighted and secondary). You've hit on the main aspect: that one can sense a Pareto-like exponential distribution when it exists. To demonstrate, here are three packed bar charts of data with different value distributions: exponential, linear, and uniform. I think the difference is visible.

athleticRoyalties2014.png 

collegeEnrollment2014.png 

collegeTuition2014.png 










danz

Registered:
Posts: 190
Reply with quote  #11 
That proportion can be estimated only if the values in the first column are quite similar. The estimation becomes more difficult when the values in the first column vary more or the bars fit the rectangle poorly (Xan's skewed values example).

==============
 
We all know that bar lengths are easier to compare, but their alignment is very important as well. Some time ago, in an exchange of emails with Steve (around the time of Steve's Wrapped Bars article), an interesting idea came up that I never had the time to research properly. The design was even announced by Steve on this forum (Wrapped Bars article comments) under the name of Layered Graphs, yet it was never presented in a proper post. That design was considered a viable alternative to Wrapped Bars.

In short, if all the values are aligned to the same edge, sharing the same axis, and drawn in descending order, and instead of WRAPPING onto the next column we LAYER them on top of the previous column, we can get pictures (in very first drafts) like the following:


unnamed1.png

unnamed2.png  


The design easily accommodates a large number of values, both positive and negative, making comparison much easier than in packed bars. I am aware that the choice of colors can be somewhat limited and requires more thoughtful interpretation, that some values (especially small values or many similar values) can overlap, and that the design can also be confused with stacked bars, yet the Layered Graph can easily use bars, lines, areas or dots in a vertical or horizontal layout. The initial name of the design, Overlapped Graphs, was dropped in favor of Layered Graphs.

I share the idea with the participants of this forum; I admit I don't have the right skills or time to write an articulate post about it. What's more, it is holiday time for me, so if there is any interest in this design I will answer back in a couple of weeks or so.

Dan



sfew

Moderator
Registered:
Posts: 823
Reply with quote  #12 
Xan,

The Pareto Principle would be exhibited if 20% of the number of values (e.g., 20 out of 100) represented roughly 80% of the total sum (not count) of the values. Wrapped bars could indeed exhibit this if the display consisted of 100 values, but there is no reason to use packed bars for such a small data set. We can display 100 values in a regular bar graph. Neither of your Medicare examples can be read to detect a Pareto-like pattern. Even though they both consist of 20 featured bars, they consist of far more than 100 values in total, which prevents a Pareto assessment.

__________________
Stephen Few
xan

Registered:
Posts: 44
Reply with quote  #13 
Dan, thanks for the pointers. I missed Layered Graphs, but it does look interesting.

Steve, I see I misunderstood your "20% of the total" to be about value sum rather than counts. Yes, the number of primary bars will rarely come to 20% of the number of bars. However, 20/80 is just one cut point of an ideal Pareto distribution.

I think having less than 1% of the bars representing more than 10% of the value sum is an indicator of a Pareto-like distribution. However, it's an interesting idea to look at how the 20% are represented by coloring the top 20% a different color:

drug20.png 
That's ~600 red bars and ~2400 blue bars.
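
Both cut points discussed here (20% of the items versus 80% of the sum, and 1% of the items versus 10% of the sum) come down to the same calculation: the share of the value sum held by the largest fraction of items. A minimal Python sketch, using a hypothetical `top_share` helper and synthetic data rather than the drug or Medicare figures:

```python
def top_share(values, top_fraction):
    """Fraction of the value sum held by the largest `top_fraction` of the items."""
    ordered = sorted(values, reverse=True)
    k = max(1, round(len(ordered) * top_fraction))  # at least one item
    return sum(ordered[:k]) / sum(ordered)


# Synthetic, heavily skewed data (value proportional to 1/rank); an assumption
# made for illustration only.
skewed = [1000 / rank for rank in range(1, 101)]
uniform = [5] * 100

share_top20 = top_share(skewed, 0.20)   # share held by the top 20% of items
share_top1 = top_share(skewed, 0.01)    # share held by the top 1% of items
flat_top20 = top_share(uniform, 0.20)   # no skew: 20% of items hold 20% of the sum
```

Coloring the top 20% of bars, as in the chart above, is the visual counterpart of `top_share(values, 0.20)`: the colored area divided by the total area.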

