Discussion


bpierce

Moderator
In the July/August/September 2014 Visual Business Intelligence Newsletter article, titled "Distribution Displays, Conventional and Potential," Stephen describes the five most common graphs for displaying distributions, examines their strengths and weaknesses, and proposes ways that software vendors could enhance some of them to improve their effectiveness.
 
What are your thoughts about the article? We invite you to post your comments here.

-Bryan
jlbriggs

Great article. It's a nice primer on distributions that should be useful for introducing the concept to a user base that, as mentioned, often neglects their importance.

"Now, to check for more subtle differences in the shape of Clinic A’s and Doctor’s Office A’s distributions, imagine holding down a mouse button to have the following set of frequency polygons temporarily appear..."

I use almost exactly this functionality on several of my dashboards, where a box plot is displayed for the sake of space and simplicity and the user can click the box plot to pop open a full histogram and a statistics summary table. Several can be opened at once and dragged around the screen or resized at will.

A link on each histogram can be clicked to load the data in a separate full page display that includes additional statistical analyses and visualization.

To overcome the 'problem' of histograms not showing the central tendency, I usually include a box plot of the same data below the histogram or, if I need only the center line, plot a vertical line at the median (or mean) of the data. I often also include controls that let the user show or hide the relevant measures, whether median and quartiles, or mean and standard deviation / normal distribution curve, etc.
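As an illustration of the show/hide reference lines described above, here is a minimal Python sketch that computes where those vertical lines would go. The function name `reference_lines` and the `measure` switch are hypothetical, and the drawing itself (jQuery or otherwise) is left out; it only uses the standard `statistics` module (Python 3.8+ for `quantiles`).

```python
import statistics

def reference_lines(values, measure="median"):
    """Return x positions for vertical lines to overlay on a histogram.

    measure="median": first quartile, median and third quartile.
    measure="mean":   mean and mean +/- one sample standard deviation.
    """
    if measure == "median":
        # method="inclusive" interpolates linearly between sorted data points
        q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        return {"q1": q1, "median": median, "q3": q3}
    if measure == "mean":
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # sample standard deviation, needs >= 2 values
        return {"mean - 1 sd": mean - sd, "mean": mean, "mean + 1 sd": mean + sd}
    raise ValueError(f"unknown measure: {measure!r}")

sample = [2, 4, 4, 5, 5, 7, 9]
print(reference_lines(sample))  # {'q1': 4.0, 'median': 5.0, 'q3': 6.0}
```

A user control would simply switch `measure` and redraw the lines over the unchanged histogram bars.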

I do all of this with custom-designed, web-based dashboards that make heavy use of a variety of jQuery tools, along with a variety of custom-developed statistical calculation tools in PHP (though we are currently trying to tie R into the back end and run it through preconfigured commands sent from PHP).

I like the hierarchical histogram concept. I've used similar displays in standard bar charts (large bars for a weekly number, small inner bars for the daily numbers), but not in a histogram.  I'm sure I have some examples where it will be useful.




TomMowlam

Awesome article as usual. I really look forward to these.

Quote:

...imagine holding down a mouse button to have the following set of frequency polygons temporarily appear. ... What couldn’t be seen in the box plot can now be seen with ease. I’d love this feature, but I’ve not yet found it in a product.


I'm sorry if this has been covered elsewhere, but for display in browsers, interactive charts built with d3.js (http://d3js.org/) could provide this sort of interactivity. D3 separates the data from the display, which allows the display to be changed easily. A simple example: http://bl.ocks.org/mbostock/3885705 

A gallery of D3 examples is here (ignore the pie charts; their authors haven't had the good fortune to read Stephen's book on dashboard design ;-)): 
http://bl.ocks.org/mbostock/3943967

Most are not interactive, and most of the interactions switch from one display to another within the same chart type, but I believe it should be possible to change the chart type as well.

There is even a bullet chart example, so these authors could be a good point of contact for Stephen if he wanted to collaborate:
http://bl.ocks.org/mbostock/4061961
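Whatever rendering tool is used, switching chart types while a button is held mostly means keeping two summaries of the same data ready. The Python sketch below computes a box-plot summary and frequency-polygon vertices from one list of values. The names `box_summary`, `frequency_polygon` and `chart_data` are hypothetical, the whiskers are simply drawn at the minimum and maximum (no outlier rule), and drawing and event handling are not shown.

```python
import statistics

def box_summary(values):
    """Five-number summary for a simple box plot (whiskers at min and max)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}

def frequency_polygon(values, start, width, n_bins):
    """Vertices of a frequency polygon over half-open bins [start + i*width, start + (i+1)*width).

    One vertex per bin midpoint, plus a zero-count vertex one bin beyond each
    end so the line returns to the axis. Values outside the bins are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    mids = [start + (i + 0.5) * width for i in range(-1, n_bins + 1)]
    return list(zip(mids, [0] + counts + [0]))

def chart_data(values, button_held, start, width, n_bins):
    """Hypothetical toggle: polygon data while the button is held, box data otherwise."""
    if button_held:
        return frequency_polygon(values, start, width, n_bins)
    return box_summary(values)

sample = [2, 4, 4, 5, 5, 7, 9]
print(chart_data(sample, False, 0, 5, 2))  # {'min': 2, 'q1': 4.0, 'median': 5.0, 'q3': 6.0, 'max': 9}
print(chart_data(sample, True, 0, 5, 2))   # [(-2.5, 0), (2.5, 3), (7.5, 4), (12.5, 0)]
```

Because both summaries come from the same values, releasing the button only has to swap which one is drawn.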
sfew

Moderator
Tom,

Mike Bostock's graphics programming tool D3 is wonderful, but I don't feature it because it isn't an exploratory data analysis tool, nor is it a data presentation tool that is accessible to non-programmers. It is powerful and flexible, but accessible to relatively few. Mike intended it to be used by graphic designers with programming skills, not by a broad, general audience.

__________________
Stephen Few
danz

What I have learned over time is that "conventional" is easier for a larger audience to understand, while innovations tend to be rejected. Based on feedback from my audience, distribution displays rank in this order of ease of perception: strip plot, histogram, box plot, frequency polygon. Because the strip plot was the most intuitive for them, I quickly came up with a "quantile plot": an XY dot graph that plots the values on the Y axis against their ranks on the X axis. However, as Stephen mentioned in his article, space can be an issue, limiting the display to a few dozen dots.  


[Image: orig_sfew_m.png]
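The quantile plot described above reduces to sorting the values and pairing each one with its rank. A minimal Python sketch (the function name is hypothetical) follows; the `as_fraction` option swaps the rank for (rank - 0.5) / n, a common alternative convention that puts samples of different sizes on a shared 0-1 axis, and is not part of the original description.

```python
def quantile_plot_points(values, as_fraction=False):
    """Points for a quantile plot: values on the Y axis, their rank on the X axis.

    With as_fraction=True the X value is (rank - 0.5) / n instead of the rank.
    """
    ordered = sorted(values)
    n = len(ordered)
    points = []
    for rank, value in enumerate(ordered, start=1):
        x = (rank - 0.5) / n if as_fraction else rank
        points.append((x, value))
    return points

print(quantile_plot_points([30, 10, 20]))  # [(1, 10), (2, 20), (3, 30)]
```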
 
For a higher density of information I used circles instead of filled dots; I found this technique easier and more printer-friendly than transparency. A combination of both can also be beneficial. Still, it seemed to me that this kind of representation wasted a fair amount of space.

Some time ago, I exchanged emails with Stephen about a dense representation of data using an overlapping (layering) technique, which I have found beneficial for dot charts and even for bar charts. While I never found the time to write about it, I'll use this opportunity to show how this technique can improve the use of space in a "quantile plot".

[Image: layered_m.png]

As you can easily see, I simply split Stephen's graph into four pieces and overlapped them in a single representation. The second version uses background color to mark the quartiles. This is a very compact and clear representation of a quantile plot; you could consider it a variation of a wrapped graph. Obviously, a horizontal design is also possible.

[Image: layered2_m.png]
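The layered quantile plot can be sketched by cutting the rank-ordered values into four consecutive pieces and restarting the X position in each piece. Because the values are sorted, each piece occupies its own band on the Y axis, so overlapping the pieces on X mostly avoids collisions. This is a hypothetical Python sketch of that idea, not danz's actual implementation; `layer` is the value one would map to color or to a quartile background band.

```python
def layered_quantile_points(values, layers=4):
    """Split a rank-ordered quantile plot into `layers` consecutive pieces
    and restart the X position in each piece, so the pieces overlap on one axis.

    Returns (layer, local_rank, value) triples.
    """
    ordered = sorted(values)
    n = len(ordered)
    points = []
    for layer in range(layers):
        lo = layer * n // layers          # piece boundaries by rank
        hi = (layer + 1) * n // layers
        for local_rank, value in enumerate(ordered[lo:hi], start=1):
            points.append((layer, local_rank, value))
    return points

pts = layered_quantile_points(range(1, 9))
# The X axis now spans 1-2 instead of 1-8.
print(pts)  # [(0, 1, 1), (0, 2, 2), (1, 1, 3), (1, 2, 4), (2, 1, 5), (2, 2, 6), (3, 1, 7), (3, 2, 8)]
```

When the count does not divide evenly, the pieces differ in size by at most one point.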

I used similar ideas to enhance histograms and frequency polygons with a great deal of detail; you can see the results in the following images.

[Images: histo-plot-m.png, freq-plot-m.png]

As I mentioned before, drawing circles instead of filled dots allows an even higher density of information in the same space, but for the purpose of these remarks I simply took Stephen's graphs, cut them, and rearranged the pieces into these alternative displays.

A strip plot, although intuitive, provides no statistical information and is limited to a relatively small number of elements. A flexible approach that combines a quantile plot with other representations, using techniques like wrapping and overlapping (layering), can produce a better result that is both intuitive and consistent in its statistical information.

Dan

giannisg5

Hi Stephen.
"Quite often, when we use box plots to compare distributions, there are moments when it would be useful to see the shape of the distributions in greater detail. Wouldn't it be nice if we could press a mouse button while hovering over a control and have that action causes the boxes to be temporarily replaced with histograms or frequency polygons, then switch immediately back to the boxes when the button is released?"
 
QlikView has similar functionality. It is called fast type change, and it lets you easily switch between different visualizations on the same chart (see attached images). The only difference is that you need to click rather than hover.
 
Thank you, Ioannis.

[Images: FromTable.PNG, ToBars.PNG, OrLineChart.PNG]

sfew

Moderator
Ioannis,

Does Qlik support a box plot?

giannisg5

Hi Stephen.
Yes, it does, and it is native to the product. I mention that because you can also use custom-made or third-party charts and visualizations in QlikView, but the box plot is there by default:

[Image: Capture.PNG]

Apologies for the poor example. Generally speaking, QlikView is a very powerful tool, of course with some disadvantages and flaws like every tool :)
I am really looking forward to meeting you in person next year at the Copenhagen workshop!
BR, Ioannis.

UPDATE: The box plot is not included in the fast type change functionality I described in my previous post.

jlbriggs

Stephen - I fully understand your point in regard to ad-hoc design vs an established tool, but I think it's still important to talk about programmatic solutions.

While they may not be something that the average end user can implement, there are certainly plenty of end users working for companies that do have programmers and designers available to build web-based dashboards and analysis tools.

I think that many such people would benefit greatly from pushing for internal design instead of hoping and waiting for vendors to offer tools that provide the functionality they need (thankfully, my company is one such place, and I have been able to do a great deal of work building web-based dashboards and data analysis tools for our users).
Berry

Nice overview - certainly worth passing it on to my students when I'm discussing distributions!

In the last two histograms, you label the second bar, for example, "20s," although it seems to represent people between 10 and 20, who would be in their 10s. Is that correct?
I like to avoid this kind of potential confusion by labeling only the bar borders, as R does by default. That's also much cleaner and easier to read than labels like >=20 & <25 and >=25 & <30. Is there a drawback to this convention that I'm not seeing?

sfew

Moderator
Berry,

People label the ranges in histograms and frequency polygons in various ways, some of which are ambiguous. I try to eliminate the ambiguity. In the last two histograms, the age labels 10s, 20s, etc., mean what they suggest: the label 10s refers to people from age 10 through 19, the label 20s refers to people from 20 through 29, and so on. I avoid labeling only the bar borders, as software such as R often does, because that isn't clear. For example, would a bar whose borders span the scale from 10 to 20 include people who are age 20? Unless you know the convention and are confident that the software interprets it as you do, you really don't know the answer.
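The ambiguity is real: R's `hist()` defaults to `right = TRUE`, meaning right-closed intervals such as (10, 20], so an age of exactly 20 lands in the bar whose borders read 10 and 20. The Python sketch below (hypothetical function names) uses the opposite, half-open convention [10, 20) via floor division, which matches the "10s = 10 through 19" reading, and also produces an explicit inclusive label for whole-number bins.

```python
def decade_counts(ages):
    """Count integer ages per decade using half-open intervals [10k, 10k + 10).

    The bin labelled "20s" holds ages 20 through 29, so age 20 is never
    counted in the "10s" bin.
    """
    counts = {}
    for age in ages:
        start = (age // 10) * 10          # 19 -> 10, 20 -> 20
        counts[start] = counts.get(start, 0) + 1
    return {f"{s}s": counts[s] for s in sorted(counts)}

def inclusive_label(start, width=10):
    """Label a bin of whole numbers by both inclusive endpoints, e.g. '20-29'."""
    return f"{start}-{start + width - 1}"

print(decade_counts([19, 20, 29, 30, 10]))  # {'10s': 2, '20s': 2, '30s': 1}
print(inclusive_label(20))                  # 20-29
```

The inclusive endpoint label only makes sense for whole-number data such as ages in years; for continuous values the interval notation itself has to state which end is closed.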

Derek_C

I like danz's overlapping quartiles, and it occurs to me that there's no reason they have to be equal in size. For instance, in a distribution of income in a population, we are often more interested in the top 1% than in the bottom 1%. A variant of danz's scheme could show the 100-10%, 10-1%, 1-0.1%, and 0.1-0.01% bands, and even more rarefied upper income percentiles. (0.01% is 20,000-50,000 individuals in a population of 200-500 million.)  

The horizontal scale need not be logarithmic as such, as long as it's understood to be shifting up a decade for each higher quantile. 
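A possible sketch of this unequal-band variant: rank the values from the top and assign each one to a band whose size shrinks by a factor of ten (bottom 90%, then top 10% but not top 1%, and so on). Each band could then become one layer in danz's overlapping display. The function name, the `decades` parameter and the floor-to-whole-individuals rule are assumptions, and the example values are illustrative only, not income data.

```python
def upper_tail_bands(values, decades=4):
    """Group values by rank from the top into bands that shrink by a factor of 10.

    band 0 = the bottom 90%, band 1 = in the top 10% but not the top 1%,
    band 2 = in the top 1% but not the top 0.1%, ..., band `decades` = the
    top 10**-decades share. Cut points are floored to whole individuals.
    """
    ordered = sorted(values, reverse=True)
    n = len(ordered)
    cuts = [n // 10**j for j in range(1, decades + 1)]   # sizes of top 10%, 1%, 0.1%, ...
    bands = [[] for _ in range(decades + 1)]
    for i, value in enumerate(ordered):                  # i = 0 is the largest value
        band = sum(1 for c in cuts if i < c)             # how many top slices contain rank i
        bands[band].append(value)
    return bands

bands = upper_tail_bands(range(1, 10_001))               # 10,000 illustrative values
print([len(b) for b in bands])                           # [9000, 900, 90, 9, 1]
```

With `decades=4` the smallest band is the top 0.01%; for a population of 200 to 500 million that is `200_000_000 // 10**4` = 20,000 up to `500_000_000 // 10**4` = 50,000 individuals, consistent with the figure above.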

jotrak

Hello,
Just wondering if I can put a question out there about dashboards and service catalogues. I'm hoping to get a better look at examples that deal with monitoring catalogue usage and effectiveness, particularly the types of metrics used in such a dashboard.

thanks,
John


__________________
john