sfew
Moderator
Registered:1135986598 Posts: 803
Posted 1448646610
#16
Janne,

One of the problems with your term "relative" values is that all values are relative. Even a sum is relative to the total set of values included in the sum (i.e., the total divided by one). You are selecting particular examples of relative values and investing them with special merit. It is absolutely true that rates (a sum divided by a number greater than one, such as the number of deaths per 100,000 people or the number of births divided by the number of women) sometimes offer advantages over sums. The point that I'm making is that rates are not better than sums; they are merely useful for different purposes. Rates do not have general precedence over sums when selecting the values that are displayed along the axes of a scatterplot.

Your term "absolute" values also suffers from the fact that it has a specific mathematical meaning. I would encourage you to consider using more precise terms, such as rates vs. sums or additive vs. non-additive variables. Sums are meaningful if you add them up. Rates are not meaningful if you add them up. We must treat them differently. This difference, however, does not give one precedence over the other when creating scatterplots.
__________________ Stephen Few
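
To make the additive vs. non-additive distinction concrete, here is a minimal sketch using made-up figures for two hypothetical regions (the names and numbers are assumptions for illustration, not data from this thread). Counts can be summed across regions; rates cannot, so the combined rate has to be recomputed from the underlying sums.

```python
# Hypothetical figures for two regions; not real data.
regions = {
    "Region A": {"deaths": 500, "population": 1_000_000},
    "Region B": {"deaths": 300, "population": 200_000},
}


def rate_per_100k(deaths, population):
    """Deaths per 100,000 people."""
    return deaths * 100_000 / population


# Sums are additive: the combined count is the sum of the counts.
total_deaths = sum(r["deaths"] for r in regions.values())
total_population = sum(r["population"] for r in regions.values())

# Rates are not additive: adding them up gives a number with no meaning.
naive_rate_sum = sum(
    rate_per_100k(r["deaths"], r["population"]) for r in regions.values()
)

# The combined rate must be recomputed from the underlying sums.
combined_rate = rate_per_100k(total_deaths, total_population)

print("Total deaths:", total_deaths)
print("Sum of the two rates (not meaningful):", naive_rate_sum)
print("Combined rate per 100,000:", round(combined_rate, 1))
```

Neither kind of value is better than the other here; they simply answer different questions and have to be aggregated differently.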

jannepyykko
Registered:1185524236 Posts: 39
Posted 1448874856
· Edited
#17
Stephen,

Thanks for your valuable input. I will try to adopt the terms additive vs. in-additive when describing the two types of variables.

I previously argued that in Gapminder-like scatterplots, where you want to identify each data point, it's best to use:
- X = in-additive variable
- Y = in-additive variable
- Size = additive variable
- Color = grouping variable

I admit that I cannot give perfect proof for my argument. However, I took the time to go through all the predefined graphs in Gapminder (click "Open graph menu" in the top-left corner at http://www.gapminder.org/world/). Most of the graphs are scatterplots (bubble charts), but not all. Below is what I picked up:

Is child mortality falling?

X = GDP per capita [log scale]
Y = Child mortality (0-5 year-olds dying per 1,000 born) [log scale]
Bubble size = Total population
Bubble color = Geographic region
(From now on, when size and color are total population and geographic region, I don't mention it.)

The Bangladesh Miracle

X = Child mortality (0-5 year-olds dying per 1,000 born) [log scale]
Y = Children per woman (total fertility)

CO2 emissions since 1820

X = GDP per capita [log scale]
Y = CO2 emissions (tonnes per person)

USA or China, who emits most CO2 - Same as previous but different pre-selected countries

People killed in earthquakes - Map based visualization where:

Bubble size = Earthquake - deaths annual number
Bubble color = Geographic region

People killed in floods - Map based visualization, where:

Bubble size = Floods - deaths annual number
Bubble color = Geographic region

Asia's rise - how and when? - Line graph, where:

Y = Income per person, with projections [log scale]
Bubble size = Total population
Bubble color = Geographic region

Does geography matter?

X = GDP per capita [log scale]
Y = How far to the north (latitude)

Asia best in math

X = GDP per capita [log scale]
Y = Math achievement - 8th grade

Arab women marry later and later

X = Children per woman (total fertility)
Y = Age at 1st marriage (women)

High age at marriage – a long tradition - Same as previous but different pre-selected countries

200 years that changed the world

X = GDP per capita [log scale]
Y = Life expectancy (years)

Africa is not a country - Same as previous but different pre-selected countries

High age at marriage – a long tradition - Same graph as before but now under title "Global Trends"

Smaller families and longer lives

X = Children per woman (total fertility)
Y = Life expectancy (years)

US spends most on health

X = Total health spending per person (international $) [log scale]
Y = Life expectancy (years)

Wealth & Health of Nations

X = GDP per capita [log scale]
Y = Life expectancy (years)

HIV epidemic 1980-2009

X = GDP per capita [log scale]
Y = Adults with HIV (%, age 15-49)

HIV is concentrated in a few countries - Map based visualization where:

Bubble size = People living with HIV (number, all ages)
Color = Adults with HIV (%, age 15-49)

The rise, fall and rise of health in Botswana

X = GDP per capita [log scale]
Y = Life expectancy (years)

Who has the best teeth?

X = GDP per capita [log scale]
Y = Bad teeth per child (12 yr)

That's it. Here's a summary of the variables on the X/Y axes:
- Adults with HIV (%, age 15-49)
- Age at 1st marriage (women)
- Bad teeth per child (12 yr)
- Child mortality (0-5 year-olds dying per 1,000 born) [log scale]
- Children per woman (total fertility)
- CO2 emissions (tonnes per person)
- GDP per capita [log scale]
- How far to the north (latitude)
- Income per person, with projections [log scale]
- Life expectancy (years)
- Math achievement - 8th grade
- Total health spending per person (international $) [log scale]

All of these are in-additive variables. Moreover, bubble sizes were always additive variables, and bubble color was always a grouping variable -- with only one exception: in the map-based graph "HIV is concentrated in a few countries" the bubble color was an in-additive variable, "Adults with HIV (%, age 15-49)".

So what I found was that in every predefined Gapminder scatterplot, the variable types were used consistently with my recommendation. This should give some grounds that the recommendation is valid -- at least in Gapminder-like scatterplots, where data points vary significantly in quantity and you want to identify each data point as easily as possible.
__________________ -- Mr. Janne Pyykkö, Espoo, Finland, Europe
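
The tally described in the post above can be sketched as a small consistency check. The variable classifications below are the poster's own (written here as "non-additive"), only four of the listed scatterplots are included, and the dictionary layout and the `follows_recommendation` helper are hypothetical names chosen for illustration.

```python
# Variable types as classified in the post above (the poster's classification,
# not an authoritative taxonomy).
NON_ADDITIVE = "non-additive"
ADDITIVE = "additive"
GROUPING = "grouping"

variable_type = {
    "GDP per capita": NON_ADDITIVE,
    "Child mortality (0-5 year-olds dying per 1,000 born)": NON_ADDITIVE,
    "Children per woman (total fertility)": NON_ADDITIVE,
    "Life expectancy (years)": NON_ADDITIVE,
    "Total population": ADDITIVE,
    "Geographic region": GROUPING,
}

# Size and color default to total population and geographic region,
# as stated in the post.
DEFAULTS = {"size": "Total population", "color": "Geographic region"}

# A subset of the predefined scatterplots listed above.
charts = [
    {"title": "Is child mortality falling?",
     "x": "GDP per capita",
     "y": "Child mortality (0-5 year-olds dying per 1,000 born)", **DEFAULTS},
    {"title": "The Bangladesh Miracle",
     "x": "Child mortality (0-5 year-olds dying per 1,000 born)",
     "y": "Children per woman (total fertility)", **DEFAULTS},
    {"title": "Smaller families and longer lives",
     "x": "Children per woman (total fertility)",
     "y": "Life expectancy (years)", **DEFAULTS},
    {"title": "Wealth & Health of Nations",
     "x": "GDP per capita",
     "y": "Life expectancy (years)", **DEFAULTS},
]


def follows_recommendation(chart):
    """True if X and Y are non-additive, size is additive, color is grouping."""
    return (
        variable_type[chart["x"]] == NON_ADDITIVE
        and variable_type[chart["y"]] == NON_ADDITIVE
        and variable_type[chart["size"]] == ADDITIVE
        and variable_type[chart["color"]] == GROUPING
    )


for chart in charts:
    print(chart["title"], "->", follows_recommendation(chart))
```

A check like this only shows that the charts are consistent with the recommendation; it does not show that the recommendation holds for scatterplots in general.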

sfew
Moderator
Registered:1135986598 Posts: 803
Posted 1448903453
#18
Janne,

The terms that I mentioned were "additive vs. non-additive," not "in-additive." In English, "in-additive" wouldn't be recognized.

Regarding Gapminder, Rosling often uses rates (i.e., non-additive variables) on the axes of his bubble plots, not because they are more appropriate than additive variables on scatter plot or bubble plot axes in general, but because they better serve his purpose when telling those particular stories. When you're comparing entities that vary significantly in size, such as countries, which vary significantly in population, variables that measure things like wealth, health, and education would be difficult to compare as sums because countries with small populations will always have tiny values compared to large countries. He therefore chooses rates, such as GDP per capita, average number of children per woman, or average years of education, to make it possible for the countries to be compared.

Other stories, however, might require additive rather than non-additive variables on the axes. For example, imagine that Rosling wants to show the amount of fossil fuel production per country relative to the amount of debt. In this case, if he wants to show how much in total each country contributes to the world's fossil fuel production, he would use the sum rather than the per capita rate. In other words, if you want to derive a guideline from Rosling's choice of variables--additive vs. non-additive--the better guideline would be: choose the variable that best supports the understanding that you wish people to gain from the data. There is no general guideline regarding the superiority of additive vs. non-additive variables on the axes of scatter plots.
__________________ Stephen Few
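
As a rough illustration of why the choice depends on the question being asked, the sketch below ranks two hypothetical countries by a sum (total GDP) and by a rate (GDP per capita). The country names and figures are invented for the example and are not data from Gapminder.

```python
# Hypothetical countries and figures; not real data.
countries = [
    {"name": "Largeland", "population": 1_000_000_000, "gdp": 5_000_000_000_000},
    {"name": "Smallland", "population": 5_000_000, "gdp": 250_000_000_000},
]


def gdp_per_capita(country):
    """Rate (non-additive): total GDP divided by population."""
    return country["gdp"] / country["population"]


# Question 1: who contributes most to the world total? Use the sum.
by_total = sorted(countries, key=lambda c: c["gdp"], reverse=True)

# Question 2: where is the average person wealthier? Use the rate.
by_rate = sorted(countries, key=gdp_per_capita, reverse=True)

print("Largest total GDP:", by_total[0]["name"])
print("Highest GDP per capita:", by_rate[0]["name"])
```

The two rankings disagree, and both are correct: each variable supports a different understanding of the same data.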

jannepyykko
Registered:1185524236 Posts: 39
Posted 1448910247
#19
How can I be so blind! Of course, non-additive. All I can say is thanks for your patience, Stephen.

By the way, I tried to create the suggested scatterplot (fossil fuel production / amount of debt) to see if it gives some insight. The nearest available variables in Gapminder are total oil production / total external debt (US$, not inflation-adjusted): available in this link. Unfortunately, only a few countries are displayed because of a lack of data (Saudi Arabia and the United States are not listed). It's possible that this graph would have told an interesting story if more data were available.

As for why Hans Rosling only recommends non-additive axis variables in his videos, I do not know.
__________________ -- Mr. Janne Pyykkö, Espoo, Finland, Europe

sfew
Moderator
Registered:1135986598 Posts: 803
Posted 1448911078
#20
Janne,

Rosling uses rates in his displays because they provide the most useful comparisons for the particular stories that he's telling. He's making good choices.
__________________ Stephen Few