bpierce
Moderator
Registered:1182202224 Posts: 99
Posted 1380822864
#1
In the October/November/December 2013 Visual Business Intelligence Newsletter article, titled Variation and Its Discontents, Stephen Few and his colleague Katherine Rowell introduce "funnel plots," a type of graph that's not well known but is nevertheless quite useful. Not to be confused with "funnel charts," funnel plots enable fair comparisons between data samples that vary in size.

What are your thoughts about funnel plots? We invite you to post your comments here.

-Bryan

Mike
Registered:1380826107 Posts: 1
Posted 1380826411
#2
Nice article. I'm working with data that is essentially a sample size of 1, so it is a good reminder that noise could easily overwhelm any signal. OK, I'll bite on this one: "Suffice it so say that there are better ways to display decreasing values at sequential stages in a process." What would you recommend over the typical sales funnel chart?

GiovanniMilan
Registered:1380878409 Posts: 4
Posted 1380881808
#3
In my opinion, you'd do better to display O/E ratios on a log axis, so that a 4/2 = 2.0 ratio would stray from the reference value (which is 1.0; why did you take the sample mean instead?) by the same distance as a 2/4 = 0.5 ratio. Of course, you couldn't display 0/n ratios this way, but they are significant only in big samples. Moreover, because the O/E sampling distribution is asymmetrical, you'd do better to use a log-normal distribution to compute the confidence limits of the funnel plot at the bottom of page 8 (as you did in the graph on page 6, I suppose: they are asymmetrical). This issue doesn't apply to percentages and proportions, so I think that the graph at the top of page 8 is correct.

Here you can find some useful templates, especially in epidemiology:
- for count data: http://www.apho.org.uk/resource/view.aspx?RID=47239
- for proportions: http://www.apho.org.uk/resource/view.aspx?RID=47241

But instead of drawing those boundary lines, which are too puzzling to untrained people, I suggest you could just draw the outliers in a different colour: it would result in a much plainer 'unfun(nel) plot', conveying the same information.
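The symmetry argument for a log axis can be checked numerically. The following is only a minimal Python sketch, using the ratios 2.0 and 0.5 from the post above; the function name `log_distance` is made up for illustration.

```python
import math

def log_distance(ratio, reference=1.0):
    """Distance of a ratio from the reference value on a log scale."""
    return abs(math.log(ratio / reference))

# On a log scale, doubling and halving are equally far from 1.0.
doubled = log_distance(2.0)
halved = log_distance(0.5)

# On a linear scale, the same two ratios are not equally far from 1.0.
linear_doubled = abs(2.0 - 1.0)
linear_halved = abs(0.5 - 1.0)

print(doubled, halved)
print(linear_doubled, linear_halved)
```

On a linear axis the halved ratio sits only half as far from the reference as the doubled one, which is the distortion the log axis is meant to remove.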

bpierce
Moderator
Registered:1182202224 Posts: 99
Posted 1380908698
#4
Hi Mike, In response to your question about sales funnel charts, Stephen asked me to post the section that addresses them from his book Information Dashboard Design, Second Edition:

Quote:

The funnel chart is one of the most absurd graphs that is often used on dashboards. What it intends to do isn't absurd, but the way it's usually designed is. The intention is usually to display a count of items or amounts of revenue that flows through a series of sequential stages and decreases in value with each. More than anything else, funnel charts are used to track sales from initial leads, stage by stage, until those that remain result in actual revenue. The following example tracks visitors to a website, which shows decreasing numbers of those who progress through greater stages of commitment to the site.

In this case, the funnel metaphor was definitely taken too literally. As a metaphor, a funnel works well, but depicting the values of each stage as parts of an actual funnel does not. Even if the parts were accurately sized to represent the number of visitors in each stage, it wouldn't support useful comparisons. We must rely on the numbers to see and compare the values as they decrease. This perceptual problem is easily solved, however, by using a simple bar graph, such as the one below.

A funnel process may be enriched in several ways. In the example below, short black lines appear relative to the blue bars to show the previous day’s values, and a second graph with red bars directly displays the percentage loss between each stage of the process and the next.

-Bryan

sfew
Moderator
Registered:1135986598 Posts: 823
Posted 1380934972
#5
Giovanni, I'm familiar with the templates that were created by APHO. My Excel template was partially derived from an earlier version of APHO's template. Regarding your suggestions, are you essentially saying that I should have transformed the values logarithmically to make the distribution normal before placing them in the funnel plot? Regarding the boundary lines, I don't agree that people would have any difficulty understanding them after a brief explanation. Seeing the boundaries rather than merely highlighting the outliers is informative.
__________________ Stephen Few

GiovanniMilan
Registered:1380878409 Posts: 4
Posted 1380987893
#6
Stephen, thanks for your prompt reply. First, I think you should stress that the aim of a funnel plot is to show the results of comparing each value in the data with a fixed reference value. We are performing a statistical test on each single value to see whether it falls outside the confidence bands. If so, then we can say that it's statistically different from the reference value. When reading a funnel plot, it's easy to fall into the trap of comparing the values of two points with each other, instead of comparing one single point with the reference value. That's a crucial point.

To answer your question, I'd like to point out three different issues:

1. Calculation of confidence intervals. The O/E ratio is a statistical variable with a very complex sampling distribution, because its numerator can be assumed to follow a simple Poisson sampling distribution, while its denominator is a weighted average of many Poisson variables. In any case, the range of the O/E ratio goes from 0 to infinity, so it clearly has a right-skewed distribution that can't be correctly approximated by the normal. (See Fleiss, Rothman, and other widely used manuals of epidemiology.) The simplest way to cope with this issue is to apply a logarithmic (or a square-root) transformation to your data before calculating confidence limits based on the normal. Then, when you take their antilog, you will get correctly right-skewed confidence intervals. By the way, in the graph on page 6 you show a series of O/E values with their confidence intervals. I can see that many of them are asymmetrical, so they can't be based on the normal curve. But it puzzles me that some of them (the first red one on the left, for instance) are left-skewed instead of all being right-skewed. That's why I wonder where they come from.

2. Calculation of control limits in a funnel plot. First, you should choose your reference value: external (usually 1.0 for O/E ratios) or internal (usually the mean). In the caterpillar plot on page 6 and in your funnel plot you use different reference values (1.0 in the first case, the mean of your data in the second), so the two results aren't comparable. Then, you can calculate your control limits by applying a transformation as above or some more complicated statistical technique. In any case, for O/E ratios they (I mean, their antilogs) will turn out to be asymmetrical.

3. Representation. As O/E is a ratio, I think it should be represented with a logarithmic, and therefore additive, scale. So, my answer is that you should have transformed the values logarithmically to make the distribution normal before calculating the control limits, and then plotted their logs and the logs of the control limits in the funnel plot on a logarithmic scale, while showing their antilogs on the scale labels. (I agree that it'd be a nightmare in Excel!)

4. Post-Surgical_Mortality_OE_Ratios_Funnel_Plot.xlsx. 'Incident( case)s' and 'Incident Rate' have a precise meaning in epidemiology, very different from 'Observed' and 'O/E Ratio'. The labels in columns 'J', 'K' and 'M' should be corrected accordingly. The labels of the y-axis are wrongly formatted as '%' (or is it a conversion bug of my old MS Excel 2002?).
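The log-transformation approach described in item 1 above can be illustrated with a rough Python sketch. It rests on simplifying assumptions that are not from the thread: the observed count O is treated as Poisson with mean E, the expected count E is treated as fixed, and the standard error of log(O/E) is approximated as 1/sqrt(E) (a delta-method approximation). The function name `oe_log_limits` is hypothetical, and this is not necessarily how the article's limits were computed.

```python
import math
from statistics import NormalDist

def oe_log_limits(expected, coverage=0.95, reference=1.0):
    """Approximate limits for an O/E ratio, computed on the log scale.

    Assumes O ~ Poisson(expected) with `expected` treated as fixed, so that
    the standard error of log(O/E) is roughly 1 / sqrt(expected).
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # two-sided normal quantile
    half_width = z / math.sqrt(expected)          # symmetric on the log scale
    lower = reference * math.exp(-half_width)     # antilog of the lower limit
    upper = reference * math.exp(half_width)      # antilog of the upper limit
    return lower, upper

lower, upper = oe_log_limits(25.0)
print(lower, upper)
```

The limits are symmetric around the reference on the log scale, so after taking antilogs the upper limit lies farther from 1.0 than the lower one, which gives the right-skewed interval described above.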

sfew
Moderator
Registered:1135986598 Posts: 823
Posted 1380996978
#7
Giovanni, Thanks for taking the time to bring these problems to my attention. Regarding your initial point, I believe that the funnel plot serves a broader purpose than merely comparing individual values to a reference value. It also makes it possible for us to compare individual values to one another in light of varying degrees of confidence due to differing sample sizes. In other words, it allows us to say, this is greater than that but the difference could be entirely due to chance, not an actual difference in performance that should concern us.

Regarding item #1, I understand your point about the O/E ratio distribution being positively skewed and therefore needing a transformation to make it normal in shape before the mean and standard deviation based calculations for confidence intervals can be applied. I failed to do this and, more importantly, I failed to address this issue in the article, which I will soon correct in a revised version of the article. Thank you for pointing out the error.

Regarding item #2, using the observed vs. expected (O/E) ratio as an example in this article was probably a poor choice. It is too complicated a measure to use when simply trying to demonstrate a funnel plot. I chose to use the value of 1.0 as the reference for the caterpillar plot and the mean as the reference in the funnel plot, however, for a meaningful reason. I was using the caterpillar plot to illustrate how measures like this are typically compared in the healthcare community. I was using the funnel plot to show a better way of comparing the measures, which involves a statistical measure of center rather than the value of 1.0. My error was in failing to transform the distribution into normal form before displaying it in a funnel plot, as described above.

Regarding item #3, I understand why the log transformation should have been done, but not why the plot would need a log scale on the Y-axis. It is complicated enough as it is. Using a log scale would only create confusion for the typical users of this data in the healthcare community.

Regarding item #4, thanks for pointing out that I failed to change some of the terms in the Post-Surgical Mortality O/E Ratios version of the funnel plot Excel to reflect the appropriate terms regarding O/E ratios. I have now fixed this in the Excel file. You’re still using Excel 2002 (actually, I believe it’s 2003)? Unfortunately, I cannot blame the improperly formatted labels on your old version of Excel. I have corrected the formatting to reflect a ratio without the inappropriate percentage sign. Thanks!

__________________ Stephen Few

sfew
Moderator
Registered:1135986598 Posts: 823
Posted 1381008783
#8
Mea culpa once again. I have now realized that I made another mistake in the article in that the Excel template that I provided for funnel plots will only work for proportional measures (rates between 0 and 1; percentages between 0% and 100%). Proportional values are produced when you divide the number of times a particular outcome occurred by the total number of possibilities, such as the number of post-surgical infections that occurred divided by the total number of surgeries. In the article, I used the example of an observed vs. expected (O/E) mortality ratio to illustrate the funnel plot, which produces values greater than one and therefore cannot be handled by the calculations that I built into the Excel Funnel Chart Template. I will get all of my errors fixed soon and reissue the article. Please forgive my errors. They occurred because, in my excitement, I chose to write about funnel plots before taking the time to thoroughly understand them.
__________________ Stephen Few
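For proportional measures of the kind described in this post, funnel limits are often computed with a normal approximation to the binomial. The Python sketch below is one such approximation, not necessarily the calculation built into the Excel template; the function names and the sample counts are hypothetical, and the 99.8% coverage is only a commonly used choice.

```python
import math
from statistics import NormalDist

def overall_proportion(events, totals):
    """Pooled proportion across all units, used here as the reference value."""
    return sum(events) / sum(totals)

def proportion_funnel_limits(p_ref, n, coverage=0.998):
    """Normal-approximation control limits for a proportion at sample size n."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    se = math.sqrt(p_ref * (1 - p_ref) / n)  # binomial standard error
    # Clip to [0, 1], since a proportion cannot fall outside that range.
    return max(0.0, p_ref - z * se), min(1.0, p_ref + z * se)

# Hypothetical data: infections and surgeries for three units.
events = [3, 12, 40]
totals = [50, 400, 1000]
p_ref = overall_proportion(events, totals)
for n in totals:
    print(n, proportion_funnel_limits(p_ref, n))
```

The limits widen as the sample size shrinks, which produces the funnel shape. The normal approximation is weakest for small samples and for proportions near 0 or 1, where exact binomial limits may be preferable.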

Berry
Registered:1380833438 Posts: 6
Posted 1381685387
#9
Hey there, I implemented funnel plots for proportions in R: https://dl.dropboxusercontent.com/u/4836866/Sonstiges/FunnelPlotsBerry.r
This was a nice programming exercise, as I tried a couple of new things. Any feedback and ideas for a suitable package are very welcome!
Berry
Potsdam, Germany

acraft
Registered:1306510245 Posts: 51
Posted 1381941847
#10
I like how funnel plots work and have already used them for some of our data: A system of reports I'm working on looks at invoices received from vendors that need to be processed, and one of its concerns is the rate at which invoices are problematic (as these take additional time to resolve). I was able to plot problem rates by department, vendor, etc., which gives us some good information (specifically, where to focus process improvement methods).
However, the main project is a dashboard, and I'm reluctant to add a funnel plot to it, as it seems a little complex for the user to interpret quickly. Does anybody consider funnel plots to be a useful display for a dashboard? Maybe it's just me.
Though without a funnel plot, how else might one compare measurements on a dashboard while taking variation in sample size into account? The best I can do right now is use the usual barplot/ranking method described at the start of the article, but use actual problem invoice counts and not rates. This might meet my customer's needs, but they do have an interest in the rates still.
Any ideas?

sfew
Moderator
Registered:1135986598 Posts: 823
Posted 1381942591
#11
acraft, A funnel plot is unfamiliar to most people, but not difficult to understand given a brief explanation. Public health organizations in the UK have been using them for several years with great success. Assuming that the dashboard that you mentioned will be used by people on a regular basis, it is worth a few minutes of time to introduce an unfamiliar form of display that adds significant value but requires relatively little training. Always sticking to what's already familiar won't get you very far.
__________________ Stephen Few

acraft
Registered:1306510245 Posts: 51
Posted 1381950634
#12
Thanks Stephen. I'm not so much concerned with familiarity of the funnel plot (I have no problem introducing new forms of visualization on dashboards) as I am accessibility of the information presented.
Seeing that entities lie outside the 99.8% is certainly useful, but to the user they are just dots. Do I annotate these entities? If so, what happens when they become cluttered and the plot becomes difficult to read? I could require the user to hover or click on them, though I normally reserve interaction for deeper analysis and not things that the user should be able to see clearly on the dashboard.
Another question that might resolve that issue - while I understand the reasons for not ranking entities due to variation in sample size (particularly for the entities with small sample sizes), if we can identify entities that are outside the 99.8% lines, are they fair game for ranking? Obviously they become items-of-interest, but is it still a mistake to list them sorted by value, perhaps with a barchart for quick comparison? It seems that would be okay, or am I overlooking something?

sfew
Moderator
Registered:1135986598 Posts: 823
Posted 1381951856
#13
acraft, You're asking good questions. I appreciate the fact that you don't want to clutter a funnel plot that is part of a dashboard by labeling dots that fall outside the limits. I recommend that you identify them in two ways:
- Enable access to information about them in response to hovering over them, as you suggested.
- Make it possible for the user to click on the funnel plot to access a complementary display separate from the dashboard that provides more information about the values that exceed the limits. This could be another graph with more detail, a table, or both, plus some functionality for interacting with the data for analytical purposes.

In a separate display with more details, you could order the items in some way, but when sample sizes vary significantly, the ranking could be misleading. You can alleviate this problem somewhat, however, by showing the varying levels of uncertainty for each of the items and ranking them by the degree to which they exceed the limits rather than on the Y-axis value in the funnel plot.
__________________ Stephen Few
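One possible reading of "ranking by the degree to which they exceed the limits" is to rank flagged items by a standardized distance (z-score) from the reference proportion. The Python sketch below takes that reading as an assumption; the function names, the unit names, and the counts are all hypothetical.

```python
import math
from statistics import NormalDist

def z_score(events, total, p_ref):
    """Standardized distance of a unit's observed rate from the reference."""
    se = math.sqrt(p_ref * (1 - p_ref) / total)
    return (events / total - p_ref) / se

def rank_outliers(units, p_ref, coverage=0.998):
    """Keep units outside the limits, sorted by how far they exceed them."""
    z_limit = NormalDist().inv_cdf(0.5 + coverage / 2)
    scored = [(name, z_score(e, n, p_ref)) for name, e, n in units]
    flagged = [(name, z) for name, z in scored if abs(z) > z_limit]
    return sorted(flagged, key=lambda item: abs(item[1]), reverse=True)

# Hypothetical units: (name, problem invoices, total invoices).
units = [("A", 25, 100), ("B", 160, 1000), ("C", 11, 100)]
ranked = rank_outliers(units, p_ref=0.10)
for name, z in ranked:
    print(name, round(z, 2))
```

In this made-up data, unit A has the higher raw rate (25% versus 16%), yet unit B ranks first because its larger sample puts it farther beyond the limits. Unit C stays inside the limits and is not ranked at all.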