Discussion


Porter_Thorndike

Registered:
Posts: 3
Reply with quote  #1 

Hello Stephen, I have a question about the best way to visualize three ordinal metrics derived from survey data.  For this example let’s assume that I want to compare 25 products based on three survey questions that respondents would score from 1-10.  Those questions are

1. Rate the product's visual appeal
2. Rate the product's taste
3. Rate the product's perceived value

Is it credible to visualize these three metrics on a scatterplot with the z-axis being color or size like seen below?

product.png 

Is there a better way to visualize data like this?

Assuming that this is a good way to visualize this data, how would you decide which metric to use as the x, y and z axis?  Usually users look to the upper right in a visualization like this and assume these are the best performers.  The products in the upper right could change based on the selection of which metrics to use on which axis.



__________________
Porter Thorndike
Stephen Few Superfan
jannepyykko

Registered:
Posts: 39
Reply with quote  #2 
Hello,

If your goal is to create a graph for print media, here's my answer -- with a different data set, but you probably get the point.

[23243610636_8a2fb25c31_b] 

Here I have sorted each column independently to emphasize the Nordic countries among the others. In your case, if there is no reason to emphasize any particular group of products, you had better sort the products on one variable (the left column) and display the other variables unsorted.

Regards,
-Janne

__________________
-- Mr. Janne Pyykkö, Espoo, Finland, Europe
danz

Registered:
Posts: 190
Reply with quote  #3 

Porter,

There is no "best way to visualize three ordinal metrics" without defining an objective first.

I can only assume that you are in the process of learning the basics of data visualization and that you are using Tableau, one of the most prominent tools on the market, to do so. Tableau supports learning by example through its Show Me panel. However, Show Me says little about the purpose of a visualization; it only tells you the type of data that each chart requires. You need to define your objectives before you perform visual analysis. Once you have them, the task of selecting the most appropriate method becomes easier.

To me, your question sounds very much like: "What can we analyze given three ordinal metrics for X products, and what are the most appropriate visualizations for each analysis?"

In no particular order, here are just a few:

1. Comparison of independent absolute values: side-by-side, independently sorted bar charts (as Janne suggested)
2. Distribution of independent values: side-by-side strip plots, histograms, frequency polygons, box plots, quantile plots
3. Correlation between the absolute values of two measures, with extra information added from the third: bubble chart (better than your example)
4. Correlation between the absolute values of three measures: correlated bar charts (usually sorted by one measure), parallel plot, three scatter charts (if the available space allows it)
5. Correlation between the rankings of the studied measures: I think Tableau has an example that they call a bump chart, which is a sort of parallel plot applied to ranks instead of absolute values
...
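
As a rough illustration of item 5, the ranks behind a bump chart can be computed per measure before plotting. This is only a sketch: the product names and scores are made up, `rank_by` is a hypothetical helper rather than a Tableau feature, and ties are not handled.

```python
# Hypothetical mean survey scores (1-10) for three made-up products.
scores = {
    "A": {"appeal": 7.2, "taste": 6.1, "value": 8.0},
    "B": {"appeal": 5.4, "taste": 8.3, "value": 6.5},
    "C": {"appeal": 8.9, "taste": 7.0, "value": 5.1},
}
measures = ("appeal", "taste", "value")


def rank_by(measure):
    """Return {product: rank} for one measure, where rank 1 is the highest score."""
    ordered = sorted(scores, key=lambda p: scores[p][measure], reverse=True)
    return {product: position for position, product in enumerate(ordered, start=1)}


# One rank table per measure; a bump chart plots these ranks, not the raw scores.
ranks = {m: rank_by(m) for m in measures}
for product in scores:
    print(product, [ranks[m][product] for m in measures])
```

Each product then becomes one line across the three measures, with rank on the vertical scale.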


Dan

sfew

Moderator
Registered:
Posts: 827
Reply with quote  #4 
Hi Porter,

The approach that you've illustrated is certainly one of the viable ways to show the relationship between three quantitative variables. (To correct one of Dan's statements, these variables are not ordinal--they're quantitative.) The approach that you've shown works fine as long as you only need an approximate sense of the variable that you're displaying using color variation, which I suspect is true in this case.

When using color variation for the third variable, I suggest that you make two changes to improve the visualization slightly: 1) change the shape of the data points from diamonds to dots (i.e., filled circles), and 2) change the colors from a diverging scale (two hues) to a sequential scale (one hue only). The diverging scale would be appropriate if there is a break point somewhere in the middle of the quantitative scale that the change from one hue to another would represent. In your case, however, the scale is continuous without a natural break point, so a sequential color scale would represent that best.

Despite what I've just said, I suggest that you try switching from color variation to size variation for the third variable. I think you'll find that size differences are slightly more perceptible than color differences. If you do this, I suggest that you switch to an unfilled circle as the mark. That way if values overlap in space, one won't hide the other.

Also, whether using color or size variation for the third variable, be sure to scale your axes so that they each begin just slightly below the lowest value and end slightly above the highest value. Currently, the scale on your Y axis extends well beyond the highest value, which wastes the upper portion of the plot area.
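
The axis-scaling advice above can be sketched in a few lines. This is an illustration under assumptions: `padded_limits` is a hypothetical helper, the 5% padding is an arbitrary choice, and the scores are made up.

```python
def padded_limits(values, pad_fraction=0.05):
    """Return (low, high) axis limits that sit slightly outside the data range."""
    low, high = min(values), max(values)
    pad = (high - low) * pad_fraction  # assumed padding: 5% of the data range
    return low - pad, high + pad


# Made-up mean taste scores for three products.
taste = [4.0, 6.5, 8.0]
y_low, y_high = padded_limits(taste)
print(y_low, y_high)
```

With matplotlib, for instance, the resulting pair could be passed to `Axes.set_xlim` or `Axes.set_ylim` so the plot area is not wasted on empty scale.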

The visualization that Janne suggested would only work if you wish to feature a small set of values, as he did by highlighting particular bars. Otherwise, the fact that each variable is sorted in a different order makes it difficult to locate all three values for a single product, which requires you to find the matching label because they are not aligned in the same row. This problem could be addressed in Janne's example by sorting the values on the first variable (i.e., leftmost column of bars) and then keeping them in the same order for all three variables. This approach would provide an equally precise representation of all three variables, but it would not necessarily work better overall than your bubble plot.
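
The shared-order fix described above (sort once on the first variable, then reuse that order for every column) might look like this. The product names and scores are invented for illustration.

```python
# Hypothetical mean scores per product: (visual appeal, taste, perceived value).
products = {
    "A": (7.2, 6.1, 8.0),
    "B": (5.4, 8.3, 6.5),
    "C": (8.9, 7.0, 5.1),
}

# Sort once on the first variable (highest first) ...
row_order = sorted(products, key=lambda name: products[name][0], reverse=True)

# ... and reuse that order for every column, so each product stays on one row.
for name in row_order:
    appeal, taste, value = products[name]
    print(f"{name}  appeal={appeal:.1f}  taste={taste:.1f}  value={value:.1f}")
```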

__________________
Stephen Few
danz

Registered:
Posts: 190
Reply with quote  #5 
True, Stephen. They are quantitative measures. The term "ordinal" comes from Porter's statement (and the topic title), which I quoted, but I should have mentioned that myself.
Porter_Thorndike

Registered:
Posts: 3
Reply with quote  #6 
Thank you Stephen, Dan and Janne for your replies.

Do you feel that consumers of a visualization like this have any bias about the relative importance of the x, y, or z axis? For example, do they assign greater importance to the y-axis than to the z-axis, even if this is incorrect?

__________________
Porter Thorndike
Stephen Few Superfan
sfew

Moderator
Registered:
Posts: 827
Reply with quote  #7 
Porter,

The answer to your question probably depends on the nature of your audience. Those who are well versed in the use of graphs like this would usually be inclined to see the variables on the X and Y axes as more important than the variable that uses size or color rather than a scale on an axis. This is because the use of 2-D location in relation to an axis represents values in a way that is easier to interpret and compare. Someone who is untrained, however, might be inclined to assign more importance to a variable that is represented using size if it is salient and catches their eyes. If there is a hierarchy of importance among the variables that you want your audience to understand, you might want to give the chart a title that emphasizes what's most important.

__________________
Stephen Few
jannepyykko

Registered:
Posts: 39
Reply with quote  #8 
Porter,

As a small supplement, here's one of the slides that I show when talking about scatterplots in general -- and particularly Gapminder World, perhaps the most well-known scatterplot.

[23291810636_d8bae6a34e_b] 
The point is, there are both good and bad ways to assign four variables to a scatterplot: X, Y, size, color.

First, you should distinguish between absolute variables and relative variables. With relative variables, you can compare smaller and bigger countries (or regions, products, clients, anything) in a reasonable manner, because they easily fit into the same scale. (The distinction absolute/relative is not always clear.)

Secondly, you should recognize the benefit of grouping variables that provide structure and make the scatterplot more enlightening.

Best selection:
  • X (relative variable), Y (relative variable), size (absolute variable), color (grouping variable)

Other good selections:
  • Use absolute variables for both X and Y axis.
  • Use any variable type in color.

Bad selection:
  • Use a relative variable in X and an absolute variable in Y (or the other way round).

In your case, your variables are visual appeal (relative), taste (relative), and perceived value (absolute if measured in total sales or profit). Based on this, it's easy to make a choice:
  • X (visual appeal), Y (taste), size (perceived value), color (for this you can use product group, if available)
In addition, if possible, when selecting variables for X and Y, X should be the influencing variable and Y should be the result. Because taste could be seen as a result of visual appeal, I chose X (visual appeal) and Y (taste).

In summary, this is how I teach people to get the most out of scatterplots in data analysis. It's not something that I've read in books, so I'm interested to hear whether you have different opinions.

Regards,
-Janne

PS. Porter, oops, after reading your original question again, I recognize that "perceived value" is a relative variable too -- not an absolute one as I supposed above.

__________________
-- Mr. Janne Pyykkö, Espoo, Finland, Europe
sfew

Moderator
Registered:
Posts: 827
Reply with quote  #9 
Janne,

Please explain the reasoning behind your guidelines for the assignment of variables to axes, size, and color. Offhand, they don't make any sense to me. I can think of no reason why rates (what you call relative variables) should be treated differently from other quantitative variables. Also, why are you assuming that the variables "visual appeal" and "taste" are rates? As I understand it, they are not.

__________________
Stephen Few
jannepyykko

Registered:
Posts: 39
Reply with quote  #10 

Stephen,

OK. I'll try to clarify my reasoning. Let's see how far I can get...

First, I think it's reasonable to make a distinction between (what I call) absolute variables (such as total sales, total expenses, total profit) and relative variables or rates (profit %). Many novel products sell little, while some others have kept selling a hundred or a thousand times more for a long time. If you compare total sales, the graph shows that new products are small, and you can hardly distinguish among the many small products because the big ones dominate the scale. On the other hand, if you put products side by side using profit %, the potential of many new products appears in a meaningful comparison with bigger ones and with each other.

Then, if you use a scatter plot as an analysis tool, you probably want to use variables that show hundreds of products in a meaningful comparison with each other. Thus, it makes sense to me to prefer relative variables for both axes, such as profit %, GDP per capita, etc. (Hans Rosling uses relative variables in his Gapminder videos.)

When there are hundreds or thousands of data points in a scatter plot, such as products, it's hard to identify them. To recognize individual products more easily (if that's important), the other two variables, size and color, are able to help you do this by emphasizing the magnitude of a product (size, absolute variable) and type (color).

That's how I concluded the best selection of variable types in a scatterplot.

I also gave one example of bad variable type selection in a scatterplot. I admit I cannot give valid reasons for that -- other than I don't remember that kind of variable type selection used in a beneficial manner. So I withdraw this guideline.

PS. When many people answer survey questions and give points for "visual appeal", you eventually calculate the average of all answers, i.e., the sum of the points given to a product divided by the number of people who gave points. To me, that qualifies as a relative variable or a rate.


__________________
-- Mr. Janne Pyykkö, Espoo, Finland, Europe
sfew

Moderator
Registered:
Posts: 827
Reply with quote  #11 
Janne,

Yes, it makes sense to make a distinction between rates and actual quantities. As you said, for many purposes rates enable more useful comparisons between things that vary significantly in quantity. Not for all purposes, however. Whether you use rates or quantities on the axes of scatter plots depends on what you wish to see and compare. This is sometimes rates (e.g., number of traffic deaths per 100,000 people) and sometimes actual quantities (e.g., number of traffic deaths). Both are useful.
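
To make the traffic-death example concrete, here is a small sketch that shows both views side by side. The regions and figures are invented, and `rate_per_100k` is a hypothetical helper.

```python
def rate_per_100k(count, population):
    """Convert an actual quantity into a rate per 100,000 people."""
    return count * 100_000 / population


# Made-up regions: (traffic deaths, population).
regions = {"Large": (1_200, 10_000_000), "Small": (30, 150_000)}

for name, (deaths, population) in regions.items():
    # The actual quantity and the rate answer different questions.
    print(name, deaths, rate_per_100k(deaths, population))
```

In this invented data the large region has far more deaths, while the small region has the higher rate, which is why the choice depends on what you wish to compare.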

When size or color are used to represent quantitative values in scatterplots, the same reasoning applies. We choose between actual values and rates depending on what we want to understand, not because one has more inherent value than the other.

A measure of average (e.g., the mean) is not a rate, it is a measure of the center of a distribution of values. It is an example of an aggregate value, along with a sum, in that it summarizes (i.e., aggregates) a series of individual values as a single number.
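
A minimal sketch of the two aggregates mentioned above, using made-up survey responses for a single product:

```python
from statistics import mean

# Hypothetical 1-10 "visual appeal" scores from five respondents for one product.
responses = [7, 8, 6, 9, 7]

total = sum(responses)     # aggregate: the sum of the individual values
average = mean(responses)  # aggregate: the center of the distribution
print(total, average)
```

Both numbers summarize the same five values as a single figure; neither relates one quantity to another the way a rate does.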


__________________
Stephen Few
jannepyykko

Registered:
Posts: 39
Reply with quote  #12 

Hmm. Although I'm straying far from the original post above, I'd like to elaborate on the distinction between (what I call) absolute and relative variables.

Hans Rosling shows "Children per woman (total fertility)" and "Life expectancy (years)" in his TED video; these are relative variables. If you're not sure, here's how to decide.

1. Take measures of two countries, such as population (US 317 million, Nigeria 173 million), children per woman (US 2.0, Nigeria 6.0), and life expectancy (US 77 years, Nigeria 60 years).

2. Now combine the countries and calculate measures again, such as population (317+173 = 490 million), children per woman (something between 2 and 6), and life expectancy (between 60 and 77).

If you can use simple addition, it's an absolute variable (that disables meaningful comparisons of small things in a graph).

If you can't use simple addition, it's a relative variable (that allows useful comparisons between things that vary significantly in quantity).
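
The addition test above can be tried with the figures quoted in the post. One loud assumption: the post does not give the number of women in each country, so this sketch weights children per woman by total population as a stand-in, purely to show that the combined value lands between the two inputs.

```python
# Figures quoted in the post: (population in millions, children per woman).
us = (317, 2.0)
nigeria = (173, 6.0)

# Population is additive: the combined value is a plain sum.
combined_population = us[0] + nigeria[0]

# Children per woman is not additive. ASSUMPTION: total population is used as
# the weight in place of the number of women, which the post does not give.
combined_fertility = (us[0] * us[1] + nigeria[0] * nigeria[1]) / combined_population

print(combined_population)           # 490
print(round(combined_fertility, 2))  # 3.41
```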


__________________
-- Mr. Janne Pyykkö, Espoo, Finland, Europe
danz

Registered:
Posts: 190
Reply with quote  #13 
Janne,

Absolute vs. relative.
As far as I understand, you use the terms "absolute" and "relative" to express whether it makes sense to sum the values of a variable. I am not a native speaker of English, but I have to say these terms do not sound right to me.

How do you see the following: absolute or relative? Temperature, Price, Age. Are they absolute because they are the result of a direct measurement, or relative because it makes no sense to add their values together?

Scatter plot usage restriction.
I have sales volume, quantities, and prices for this year and the previous year for 2000 products. I extend my variables with extra calculations: price variation in %, sales volume variation in %, and quantity variation in %. Can I combine the following in a scatter chart? Last year's sales vs. current year's sales? Current year's sales vs. price variation in %? Current year's sales vs. sales volume variation? Current year's quantities vs. price variation? Current year's sales vs. last year's sales vs. growth?
Almost any combination of last year's sales, last year's quantities, last year's prices, this year's sales, this year's quantities, price variation, sales volume variation, and quantity variation can be used as measures in a scatter/bubble graph. I see no obvious restriction, other than common sense, on combining any two or three of them (this year's sales vs. last year's prices does not make much sense to me).

Derived variables (rates are derived calculations) are important for good analysis. That does not mean we cannot analyze them in relation to the original measurements. I would dare to say that sometimes it becomes mandatory.
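
As a sketch of such derived variables, the percent variations can be computed next to the original measurements so that both are available as scatter plot measures. The product figures are invented and `pct_change` is a hypothetical helper.

```python
def pct_change(previous, current):
    """Percent variation from the previous value to the current one."""
    return (current - previous) / previous * 100


# Hypothetical product: (last year, this year).
price = (4.00, 4.50)
sales = (20_000, 18_000)

# Original measurement (this year's sales) paired with a derived one (price variation).
point = (sales[1], pct_change(*price))
print(point, pct_change(*sales))
```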


Dan

jannepyykko

Registered:
Posts: 39
Reply with quote  #14 
Dan,

Absolute vs. relative variables: Your questions are good, because you found a flaw in the terminology. My objective is to find terms for two types of variables: those that allow you to easily compare things that vary significantly in quantity, and those that don't. The terms "absolute" (total sales, total profit) and "relative" (profit %) work pretty well for financial variables. However, Temperature, Price, and Age make me look for better alternatives, because I would classify them as "relative" (in the sense that you can effortlessly compare mean temperatures of large cities and small villages).

I'm not a native English speaker either. I wish somebody would see the need for such terms and help.

Scatter plot usage restriction: As I wrote earlier, on second thought I cannot defend my original argument for restricting the use of different types of variables on the X and Y axes, so I withdrew it. I would like to see a good example of that, though.


__________________
-- Mr. Janne Pyykkö, Espoo, Finland, Europe
danz

Registered:
Posts: 190
Reply with quote  #15 
Janne,

I see no difference between studying values that vary between 0 and 100 and values that vary between 0 and 100 million.

What I think bothers you is the way an unbalanced distribution of values makes a scatter/bubble plot look very cluttered in a small area. In such cases, the outliers are worth studying in separate views. Just be careful with outlier identification in multivariate analysis. The metrics required to calculate the median and distances in two or three dimensions are quite different from those calculated for a box plot (one dimension). Tukey's generalization of the box plot to two dimensions is called a bagplot. The shape and orientation of the "bag", isolated from the outliers, lend themselves nicely to visual interpretation.

Unfortunately, none of the major tools I checked can provide bagplots as an aid to bivariate analysis (the scatter/bubble plot is the only decent possibility), yet they provide box plots as a simplified method of representing a distribution. If they stopped focusing on meaningless but colorful visualizations like packed bubble charts, Voronoi diagrams, or Wordle, they might eventually build the tools we all need.
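
For contrast, the one-dimensional box plot rule is simple enough to sketch with the standard library. This covers only the 1-D case; it is not a bagplot, which needs a bivariate notion of depth. The `tukey_fences` helper, the usual 1.5 multiplier, and the data are assumptions for illustration.

```python
from statistics import quantiles


def tukey_fences(values, k=1.5):
    """Return (lower, upper) box plot fences: the quartiles extended by k * IQR."""
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Made-up values with one extreme point that dominates the scale.
sales = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
low, high = tukey_fences(sales)
outliers = [v for v in sales if v < low or v > high]
print(outliers)
```

Points flagged this way could be moved to a separate view so the remaining values spread across the plot area.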

Dan