Discussion


sfew

Moderator
Registered:
Posts: 814
Reply with quote  #16 
Dan -- Adding separate trend lines to specific time periods does not reveal a time series. Potentially, it reveals patterns of correlation during specific time periods, but not how the values changed from year to year. The purpose of a connected scatterplot of paired time series, according to the research paper, is to show patterns of change through time for two related quantitative variables. The patterns that you revealed in your scatterplot with color-coded time periods demonstrated the usefulness of a scatterplot combined with the ability to distinguish time periods, not the usefulness of a connected scatterplot. Had you connected the points with a line to show the chronological sequence, it would not have worked nearly as well.
__________________
Stephen Few
danz

Registered:
Posts: 186
Reply with quote  #17 
Yes, Steve. This is what I wanted to prove by redesigning Janne's chart. With the right interpretation, several trend lines show much better how the correlation changes across different intervals of time, not how the two variables change over time. A chart with two lines showing each variable's variation over time would not reveal the correlation as well as a scatterplot enriched with trend lines.

Overcomplicating charts is a common temptation among designers, and the connected scatterplot looks to me like one instance of it. I don't deny that connecting points might sometimes be helpful, but there are better ways to enrich a scatterplot. My example is just one of them. An effective design is usually the result of data sensemaking rather than of applying arbitrary methods.
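
For readers who want to try the per-period trend lines on their own data, here is a minimal sketch in Python. It is not the tool used for the redesigned chart; it assumes an ordinary least-squares fit, and the function names and the way periods are passed in (inclusive start and end years) are hypothetical choices for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values are constant; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def trends_by_period(years, xs, ys, periods):
    """Fit one trend line per (start_year, end_year) period, both ends inclusive."""
    result = {}
    for start, end in periods:
        # Keep only the points whose year falls inside this period.
        idx = [i for i, year in enumerate(years) if start <= year <= end]
        result[(start, end)] = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
    return result
```

A slope that changes sign or size from one period to the next is what the color-coded trend lines show visually.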
sfew

Moderator
Registered:
Posts: 814
Reply with quote  #18 
Well said, Dan.
__________________
Stephen Few
jlbriggs

Registered:
Posts: 194
Reply with quote  #19 
Janne, I have to say that after reading all of your explanations, and viewing all of your graphics, I don't see how any of the benefits you talk about actually exist.

What I am seeing is that you have been able to identify information outside of the data plotted that helps to explain some of the patterns that exist in the connected scatter plots - but not that the connected scatter plot is what brought the question(s) to light.

While I won't say that the technique is entirely invalid, it's essential to note that every single point of interest that you mention is very readily apparent in a line chart (to my eyes, more so), and would invite the same questions, which would result in the same answers.

And the line chart would make the change over time much more straightforward and clear, of course.

I don't see how this results in the conclusion you've reached.

 




jannepyykko

Registered:
Posts: 39
Reply with quote  #20 
jlbriggs, thanks for the comment. I'm curious how you would do the following task without a connected scatterplot:

1. Download the data for France from Gapminder (life expectancy, children per woman); the years 1840-2015 are enough.
2. The first years of this data tell the story of agricultural France, where both variables vary within a certain range without any development towards longer lives and smaller families. Show visually how you find that range for both variables (exclude individual outliers, if any exist during that time).
3. Find a way to determine the year when the development towards longer lives and smaller families starts. Show how you read it from the graph that you are using.

I'm not saying this is not possible. I'm only curious to see your way step-by-step, so that we can compare methods.
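
To make the comparison concrete, steps 2 and 3 could also be sketched numerically, roughly as below. This is only an illustration in Python, not a method anyone in this thread has used. The function names, the 10% trimming rule for outliers, and the "never returns to the box" definition of the start year are all assumptions, and the numbers in the example are made up, not Gapminder values.

```python
def baseline_range(values, trim=0.1):
    """Return (low, high) after dropping the `trim` share of values at each end."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return kept[0], kept[-1]


def first_sustained_exit(years, xs, ys, baseline_years):
    """First year after which the (x, y) pair never returns to the baseline box.

    Returns None if the series never enters the box or is still inside it
    in the final year.
    """
    base = [i for i, year in enumerate(years) if year in baseline_years]
    x_lo, x_hi = baseline_range([xs[i] for i in base])
    y_lo, y_hi = baseline_range([ys[i] for i in base])
    inside = [x_lo <= x <= x_hi and y_lo <= y <= y_hi for x, y in zip(xs, ys)]
    # The exit year is the one right after the last year spent inside the box.
    last_inside = max((i for i, flag in enumerate(inside) if flag), default=None)
    if last_inside is None or last_inside + 1 >= len(years):
        return None
    return years[last_inside + 1]


# Made-up illustrative values (NOT real Gapminder data).
years = list(range(1900, 1910))
children = [5.0, 4.9, 5.1, 5.0, 4.9, 5.0, 4.6, 4.4, 4.2, 4.0]
life_exp = [40, 41, 39, 40, 41, 40, 42, 44, 46, 48]
exit_year = first_sustained_exit(years, children, life_exp, range(1900, 1906))
```

The result depends heavily on which years are chosen as the baseline and how outliers are trimmed, so different choices can give different start years.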

__________________
-- Mr. Janne Pyykkö, Espoo, Finland, Europe
sfew

Moderator
Registered:
Posts: 814
Reply with quote  #21 
Janne,

There really isn't a need for anyone to go to this trouble. It appears that only you have the opinion that line graphs wouldn't clearly show this particular change in these two variables. If you take the time to build the line graphs, I believe that you will see for yourself that changes in the slopes of the lines clearly show the point in time when people in Finland began to live longer and have smaller families.

__________________
Stephen Few
acraft

Registered:
Posts: 51
Reply with quote  #22 
@Janne - Sorry for the late response.

Quote:
I don't want to throw away the tangled mess, because it carries an important information.

You missed my point.  Connecting the dots in that tangled mess provides no value whatsoever, as you cannot follow any trend, you can only recognize that both variables are fluctuating.  You can see that (as well as examine trends) more clearly using a line chart.

Quote:
During years 1840-1909 Finland was an agricultural country, where family size averaged in 4.5-5.2 children and life expectancy varied between 33 and 47 years. In the big picture, there is no need to analyze micro movements within those years. As a person interested in history, I can summarize it by saying: It was just normal life for that era.

Was it?  From the CS plot there's no way to tell.  Maybe there are trends.  Maybe there are loops.  Maybe there are important changes happening there, but we can't see them in a CS plot because it just looks like a ball of yarn.

Quote:
In 1910 modernization (better health care) overtook Finland enough to be seen in the connected scatterplot. Since then, there was no way back to the "old normal". A new normal was a long development towards longer lives and smaller families.

Is that what you see in the scatterplot?  Or did the fertility rate start to decline in a way that moved the subsequent datapoints to a less-crowded area of the chart?  The line charts show that life expectancy had a general upward trend for some time prior to 1910, but its fluctuations combined with those in the fertility rate made it impossible to see this on the CS plot.  Furthermore, had fertility rate not decreased from 1910 to WWII, you'd just be looking at a bigger ball of yarn, unable to see any trends in life expectancy (which are clearly there).

My point is that following time-based trends in a connected scatterplot is really hard when the data fluctuates.  That's because it isn't what a scatterplot is meant to do.  The axes (X and Y) represent variables that are not time, therefore even a connected scatterplot is a scatterplot first, and a timeline (maybe) second.  If you are analyzing the relationship of the two variables, then a scatterplot is what you need, datapoints connected or not.  If you are trying to follow a timeline - even to compare relationships between multiple variables over time - then time should be one of your axes, which means you should be using a line chart.
jlbriggs

Registered:
Posts: 194
Reply with quote  #23 
While I fully agree with Stephen that this is an unnecessary exercise, I'll play along for the sake of illustration.

Using the data from Gapminder, as suggested, for France, from 1800-2015, for 'life expectancy' and 'number of children per woman'.

A set of basic line charts thrown together in Excel:

france_1800-2015.png 

It is clear that there is a general trend of people living increasingly longer, with increasingly smaller family sizes.

It is clear that there are some significant interruptions to this general pattern.

It is quite obvious at first glance that WWI and WWII see significant drops in both life expectancy and children per mother.
A quick look at Wikipedia shows that the 1870/1871 drop can be explained by the Franco-Prussian War.

Around 1890 we seem to see a move to a somewhat more steadily increasing life expectancy, until the war.
Following WWII, we have a modest uptick of life expectancy, beyond the return to normal, and the upward trend continues, modestly.
Following WWII, we have a significant increase in the number of children per mother, as expected, followed by a gradual reduction, and what may be a settling in at around 2.

These are things that are extremely readily apparent, more or less at a glance, from these two line charts.
I spent more time writing the first sentence than I spent understanding the points I've laid out.

In comparison, with a connected scatter plot:

france_1800-2015_csp.png 

There are some significant aberrations that I can begin to explore and eventually figure out, I can obviously annotate certain points to call out significant events, and I can piece together much of what is already clear and obvious in the line charts. But I can also annotate the line charts and enrich the information and the story there, so that's a moot point.

I can't possibly see any argument that truthfully says "the scatter plot more clearly displays the trends over time", or "the scatter plot gives a more complete representation of the data" by looking at the above example.


jannepyykko

Registered:
Posts: 39
Reply with quote  #24 
jlbriggs, I appreciate the time that you took to do the exercise and write out the observations that you made by looking at the line graphs of life expectancy and children per woman in France. I agree that the observations are the same ones that I would have made.

Regarding the connected scatterplot that you included, I modified it a bit.

[23557059614_4d3650c57f_b] 

I drew a red box around the "tangled mess" to show more clearly where France's "old normal" locates -- in fact two boxes, because with this precision at hand, there are two possibilities to draw. Then, if this image were an online graph, I could use a tool tip to check that the first year outside the inner box is 1885 (outer box 1887).

You wrote: "Around 1890 we seem to see a move to a somewhat more steadily increasing life expectancy, until the war". Based on the connected scatterplot, the change started 1885-1887 -- though rather with reduced family size, not with increased life expectancy.

I tried to showcase with this example that there are questions that are quickly and more precisely answered by using a connected scatterplot, such as the moment when some 2-dimensional "normal limits" are overrun.

I use line graphs probably 20-50 times more often than connected scatterplots, because line graphs are clearer and more understandable for the public -- and also more understandable for myself, when I start to dig into new data. However, every now and then, I also like to see how data looks in a connected scatterplot, because some phenomena become more prominent, and that might raise new interesting questions (and provide answers).

I hope my reasoning makes it clear how and why I use connected scatterplots in my work. It's not a big part, but not a nonexistent part either. This is not about a rivalry between line graphs and connected scatterplots. I just want to keep both in my toolset now and tomorrow.

During this debate I have become aware that perhaps my point (expressed in the paragraphs above) has been somewhat unclear, and that might have lengthened the conversation needlessly. If that is the case, I apologize and will try to be more precise in the future.

__________________
-- Mr. Janne Pyykkö, Espoo, Finland, Europe
jannepyykko

Registered:
Posts: 39
Reply with quote  #25 
acraft,

Thanks for the response to my previous response, where I tried to say that "balls of yarn" are important pieces of information as well. It's true that when saying this, I might be generalizing too much, because I've stopped looking inside the ball, where interesting fluctuations may occur. When watching Finland's course 1840-2015 in the big picture, however, I'm not interested in those micro fluctuations.

On the other hand, if my objective were to learn more about years 1840-1910, then of course I would:
- zoom inside the ball in order to explore what happened there
- also use line graphs for those years
...whichever seems to suit best.

As said in my previous response to jlbriggs, for me this is not a rivalry between line graphs and connected scatterplots (though I might have given a wrong impression on that). I use both.

__________________
-- Mr. Janne Pyykkö, Espoo, Finland, Europe
jlbriggs

Registered:
Posts: 194
Reply with quote  #26 
Janne - "to show more clearly where France's "old normal" locates"

I have to continue to disagree, it seems.

You have highlighted an area where the numbers moved around in similar spaces for a few years (out of 215). I don't think that can be called in any way the "old normal". Both numbers were trending in a direction, had a few years of variation, and then continued on their trend.

I can see how using the connected scatter plot in conjunction with a line chart could help enhance understanding - the conversation has been enlightening in that sense.

But it seems quite easy to draw conclusions from the scatter plot that don't hold up, and rather difficult to find ones that jump out at you when viewing the line chart.


jlbriggs

Registered:
Posts: 194
Reply with quote  #27 
To be more clear in my point: concluding that 1885, or 1887, is the year that things began to change from some sort of "old normal" state to the larger trend is an incorrect conclusion, as enticing as the scatter plot makes it.

A quick look at the line charts makes it clear why this conclusion is incorrect.

The area that you hail as the "normal" is really just a period of low or no correlation between variables.
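
If anyone wants to check the "low or no correlation" reading against the numbers, a rolling correlation is one way to do it. Below is a minimal sketch in Python, assuming a plain Pearson coefficient over a fixed-length window; the function names and the window length are hypothetical, and this is not something I ran on the France data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; None when either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined without variation
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def rolling_correlation(years, xs, ys, window):
    """Return (last_year_of_window, correlation) for each full window."""
    out = []
    for start in range(len(years) - window + 1):
        end = start + window
        out.append((years[end - 1], pearson(xs[start:end], ys[start:end])))
    return out
```

Windows where the coefficient hovers near zero would correspond to the "tangled" stretches, while strongly negative values would mark periods where life expectancy rises as family size falls.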
sfew

Moderator
Registered:
Posts: 814
Reply with quote  #28 
Bilal,

As promised, I spent some time examining the data that appears in Moritz Stefaner's chart. I downloaded the data from Gapminder.org, as Moritz did. To start, I recreated Moritz's scatterplot and found that there are several significant differences when my version is compared to his. Had the values in Moritz's chart matched mine, his chart would not be as readable as it is because it would exhibit more overlapping. I have no idea why our charts are different despite the fact that we used the same data source.

If we ignore potential problems with the data, the one benefit of Moritz's scatterplot over other approaches is its compactness. Every approach that would present the data in more understandable ways would require more space. If I were telling the stories contained in this data set, I would display the data in multiple ways. I might begin with an animated scatterplot as Rosling often does to point out a few key observations. I would then immediately shift to line graphs. The first set might consist of a series of small multiples of line graphs featuring life expectancy only and I would point out some of the interesting patterns of change. I might then add a second Y axis to these graphs to display fertility rate and point out some of the interesting patterns of change and relationships between the two variables. Finally, I might switch the scales for both variables to show rate of change relative to the first year. This would give both variables the same scale and would make patterns of change across the full set of countries much easier to compare.
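
The last step, rescaling both variables to show change relative to the first year, amounts to a simple calculation. Here is a minimal sketch in Python, assuming percent change from the first value is what is wanted; the function name is hypothetical and the sample numbers are made up, not taken from the Gapminder data.

```python
def relative_change(values):
    """Percent change of each value relative to the first value in the series."""
    base = values[0]
    if base == 0:
        raise ValueError("first value is zero; relative change is undefined")
    return [(v - base) / base * 100 for v in values]


# Made-up illustrative values (NOT real Gapminder data).
life_expectancy_pct = relative_change([40, 50, 60])
fertility_rate_pct = relative_change([4.0, 3.0, 2.0])
```

Once both series are expressed this way, they share one percentage scale, so a single Y axis can serve both variables.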

While playing with the data, I spent time comparing particular countries as they were represented in the connected scatterplot vs. line graphs. I continued to find that patterns of change that were incredibly easy to see and compare using line graphs required mental gymnastics to understand when viewing the scatterplot. I also found myself questioning the value of viewing relationships between life expectancy and fertility rate. There is no causal relationship between them. What we know is that, as conditions in countries improve over time, in general life expectancy goes up and fertility rate goes down. Comparing these patterns of change does not lead to much insight.

__________________
Stephen Few
acraft

Registered:
Posts: 51
Reply with quote  #29 
@Janne:
Quote:
...whichever seems to suit best.
Quote:
...for me this is not a rivalry between line graphs and connected scatterplots... I use both.

I think that's the issue here: "whichever seems to suit best."

When you plot data, it's important to choose the right variables for your axes.  You can't clearly see a relationship between a variable that has a scale and a variable that doesn't.  Connected scatterplots try to show time without a scale, so trying to examine a relationship between anything and time is really difficult, and often impossible.  Therefore, connected scatterplots are poorly suited to showing relationships with time.

This isn't a rivalry between line graphs and connected scatterplots for me either.  Your goals for examining data should dictate how you visualize it.  Seeing how anything relates to time requires that time be one of the axes; it requires a line chart.
kris_erickson

Registered:
Posts: 3
Reply with quote  #30 
I have found an example where a CS could have been used, but the author instead chose to show just the dots with some light annotation on the chart. It is from NPR and shows the change in income inequality over time. I cannot see how adding lines between the years would make it better. Instead, the author added two directional arrows to guide the eye through the change over time. Would a connected scatterplot have shown the 'right angle' better? Not any better than the two lines do. Would a connected scatterplot have shown the recent financial crisis better? Possibly; it may have shown loops or 'something' around the crisis. Overall, though, I think a CS would have weakened the message the author was trying to convey: that there has been an abrupt change since the early 80s.
Unconnected_Scatterplot_NPR.PNG 
http://www.npr.org/sections/money/2015/02/11/384988128/the-fall-and-rise-of-u-s-inequality-in-2-graphs

I don't see the usefulness of plotting a CS for a single series. As seen in Gapminder or Moritz Stefaner-type images, there could be some usefulness in showing the trend of one series among many. In this chart, the author has a connected scatterplot but also has annotations for several 'spikes' and does not expect the user to instantly see or know what the unique shapes actually mean.

Connected_Scatterplot_Nelson.PNG 
https://public.tableau.com/shared/K8Y69QW6W?:display_count=yes

I think when our brain sees a certain 'shape' it attempts to find something familiar in it. In the 'spike' above I kind of see a thorn / sharp point / water drip, but that mental image has no relation to the information.


