Registered: 1182202224 Posts: 97
Reply with quote #1
For the January/February/March 2015 Visual Business Intelligence Newsletter article, titled "Displaying Missing Values and Incomplete Periods in Time Series," Stephen describes two relatively simple challenges in time series displays that are frequently mishandled. He shows the ineffective solutions that are often created (sometimes because of poor defaults in the software), and highlights more effective alternatives.
What are your thoughts about the article? Are you aware of other good solutions? We invite you to post your comments here.
Registered: 1423687128 Posts: 2
Reply with quote #2
Good article. One point I disagree with:
"If you use Excel to produce this graph, the single data point would not appear unless you enabled data points for all the values." I prefer to plot a marker at each data point, so I can see how every point contributes to the line. A missing value isn't as obvious as it would be with a gap, or even a dashed line, but it still shows up as a larger space between values.
Registered: 1348995178 Posts: 182
Reply with quote #3
I agree that the two matters, missing values and incomplete periods, deserve special attention. Here are my thoughts about them.
About Missing Values. Missing values usually arise from a database or pivot aggregation. Unfortunately, designers very often ignore the time sequence and use irregular intervals to display "time series" graphs. Software that provides graphing support usually has to handle missing values so that the time sequence stays correct. A special case arises when certain periods are always excluded from analysis (weekends, for instance). But in general, from my experience, a missing value in a time sequence can be interpreted in three ways: as NULL (empty), as ZERO, or as the previous value. Skipping the null values by directly connecting the non-null points is a mistake.
NULL example: The number of work accidents in a company per day. Zero means no accidents on a working day, but NULL means no activity on that day (a national holiday, for instance). For the usual representation, a discontinuous line combined with markers for isolated values is my personal choice as well.
ZERO example: Sales volume at the detail level. In most cases, when we drill down to product-level detail, there are periods when no sales occur for certain products. No sale means zero sales. In this case a continuous line, even with zero values, makes sense. Moreover, ZERO can be used in statistics and in mathematical modeling for regression or forecasting, but NULL (or NaN) cannot. Sometimes, to decide between the NULL and ZERO meanings, I do the following exercise: which formula makes more sense for an average, sum/existing_count or sum/total_count? If the answer is the first, I go with the NULL meaning; otherwise, with the ZERO meaning.
Previous value example: Cumulative sales per period (week, month, year). Unless unusual things happen, such a graph is monotonically increasing. Obviously, missing values for certain intervals do not break the graph. Missing sales days have the same meaning as zero sales, so after 10 days I still have the total of the previous 9 days even if no sale occurs on the 10th day. Periods with no sales will appear in the graph as horizontal lines. If necessary, extra markers can be drawn for non-NULL values or, if it is easier, special markers can be drawn for missing values only.
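The three readings of a missing value described above, plus the average test for choosing between NULL and ZERO, can be sketched in a few lines of Python. This is only an illustration: the function names and the sample numbers are invented here, and `None` stands in for NULL.

```python
def fill_as_zero(values):
    """ZERO reading: a missing value means nothing happened (no sale = 0)."""
    return [0 if v is None else v for v in values]


def fill_with_previous(values, initial=0):
    """Previous-value reading: carry the last known value forward,
    as for a cumulative total. `initial` is used before the first value."""
    filled = []
    last = initial
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def mean_existing(values):
    """sum / existing_count: the NULL reading, missing periods are ignored."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


def mean_total(values):
    """sum / total_count: the ZERO reading, missing periods count as zero."""
    if not values:
        return None
    return sum(fill_as_zero(values)) / len(values)


# Invented sample data; None marks a missing period.
daily = [4, None, 2, None, None, 6]
cumulative = [4, None, 6, None, None, 12]

print(fill_as_zero(daily))             # [4, 0, 2, 0, 0, 6]
print(fill_with_previous(cumulative))  # [4, 4, 6, 6, 6, 12]
print(mean_existing(daily))            # 4.0
print(mean_total(daily))               # 2.0
```

For the NULL reading nothing is filled at all: the `None` entries stay in the series and the line is simply broken at those positions.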
About Incomplete Periods. There is no way a graph can know in advance that the summary for the last year was based on only the first 10 months instead of 12. It is the full responsibility of the analyst to interpret the values correctly and design the graph accordingly. A projected value, as Stephen suggested, requires data sense-making. While simple statistics might be enough, there are cases where complex factors make the projected value hard to predict. Sometimes it makes sense to design the graph as below. This way, the two line charts show the evolution of the measure in equivalent contexts, without making any predictive assumptions.
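One possible reading of "equivalent contexts" is a like-for-like comparison: total each year over only the months that the incomplete year actually has. The Python sketch below assumes that reading; the function names and the monthly figures are invented for illustration and are not taken from the article or the original graph.

```python
def total_over_months(monthly, months):
    """Sum the values for the given month numbers (1-12).
    Returns None if any of those months is missing."""
    picked = [monthly.get(m) for m in months]
    if any(v is None for v in picked):
        return None
    return sum(picked)


def like_for_like(by_year, current_year):
    """Total every year over the months available in `current_year`."""
    available = sorted(
        m for m, v in by_year[current_year].items() if v is not None
    )
    return {
        year: total_over_months(monthly, available)
        for year, monthly in by_year.items()
    }


# Invented data: 2013 is complete, 2014 has only its first 10 months.
by_year = {
    2013: {m: 10 for m in range(1, 13)},
    2014: {m: 12 for m in range(1, 11)},
}

full_totals = {year: sum(monthly.values()) for year, monthly in by_year.items()}
print(full_totals)                   # {2013: 120, 2014: 120}
print(like_for_like(by_year, 2014))  # {2013: 100, 2014: 120}
```

With these made-up numbers the raw yearly totals look flat, while the January-to-October comparison shows growth, and no projection is needed to see it.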
Registered: 1135986598 Posts: 803
Reply with quote #4
You don't actually seem to be disagreeing with anything that I wrote. If I understand you correctly, you're saying that you prefer to always show the data points along the line. If so, data points along the line are usually not needed. They add no real value. They are only useful when the person reading the graph compares values across multiple lines at a particular point in time because the data points show exactly where along the lines to make the comparison. Otherwise, we use lines to see and compare patterns in data, and the data points aren't needed for this and should be eliminated because they create unnecessary clutter. __________________ Stephen Few
Registered: 1282229480 Posts: 191
Reply with quote #5
In the majority of cases, I find data points displayed on a line to be nothing but a distraction.
Registered: 1345819952 Posts: 15
Reply with quote #6
I like your article. Your solutions are very straightforward. Your point about having to show all markers in Excel, however, got my attention. It is indeed possible to turn on just individual markers, which solves the issue at hand. I've attached a sample file. The way I got there is by first creating the graph without markers, then clicking on any one data point, which selects them all, and then selecting only the one invisible data point I want to enable. Then you can format it so it turns on (Marker Options - Built in; Fill Color - your choice). Hope this helps! Best, Matthias
Registered: 1135986598 Posts: 803
Reply with quote #7
Thanks for pointing out this omission in my article. While it is possible, with some effort, to turn on individual data points in an Excel line graph, this shouldn't be necessary. It would be useful if Excel automatically turned on a data point in a line graph when a value has no adjacent values. Much of Excel's value is that it allows us to do with additional effort what some products don't allow us to do with any amount of effort. We often applaud Excel, not because it is well designed, but because it can be hacked if you've spent enough hours to learn workarounds for its many deficiencies. __________________ Stephen Few
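The rule suggested here, showing a marker only for a value that has no adjacent values, is easy to state in code. The following Python sketch is a hypothetical illustration of that rule, not a description of how Excel or any other product works; `None` marks a missing value.

```python
def isolated_points(values):
    """Return the indices of present values whose neighbours on both
    sides are missing (or lie beyond the ends of the series).
    A line drawn with gaps cannot show these points without a marker."""
    n = len(values)
    isolated = []
    for i, v in enumerate(values):
        if v is None:
            continue
        left_missing = i == 0 or values[i - 1] is None
        right_missing = i == n - 1 or values[i + 1] is None
        if left_missing and right_missing:
            isolated.append(i)
    return isolated


# Invented series: the value at index 3 sits between two gaps.
series = [5, 7, None, 6, None, 8, 9]
print(isolated_points(series))  # [3]
```

A charting routine could draw the line without markers and then add a marker only at the returned indices.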
Registered: 1345819952 Posts: 15
Reply with quote #8
"It would be useful if Excel automatically turned on a data point in a line graph when a value has no adjacent values. "
--> Agreed. I looked for an opportunity to submit this feedback on the Microsoft site. This option is either well-hidden or does not exist, sadly.
Registered: 1142997573 Posts: 36
Reply with quote #9
As a tools developer, I just want to say thanks to Steve for writing these kinds of articles. It reminded me that we were not making it easy to show these kinds of missing-data gaps in JMP's Graph Builder, and I've already made improvements for our next release.
Registered: 1435777941 Posts: 2
Reply with quote #10
We let users toggle between "row based" and "time based" trends, because some of our customers' data sets span 2-3 years but contain only a few periods, at irregular intervals, of some hours of actual data with values every second. In the row-based view, data is displayed only for rows that have values, which looks like one continuous concatenated series. In the time-based view, you get the 2-year view with little patches of "squiggles" where the data values occur. Both are helpful. Missing values on steroids!
__________________ Thanks, Carl
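The difference between the two views can be sketched as a choice of x position: the row number for the row-based view, the timestamp for the time-based view, with the line broken wherever consecutive timestamps are further apart than expected. The Python below is only a hypothetical illustration of that idea, not the product's implementation; times are plain numbers (seconds) and the names and data are invented.

```python
def row_based_x(points):
    """Row-based view: x is the row number, so the rows that exist
    are drawn as one concatenated series."""
    return list(range(len(points)))


def split_at_gaps(points, max_gap):
    """Time-based view: split (time, value) points, sorted by time,
    into runs; a new run starts when the gap exceeds `max_gap`."""
    segments = []
    for t, v in points:
        if segments and t - segments[-1][-1][0] <= max_gap:
            segments[-1].append((t, v))
        else:
            segments.append([(t, v)])
    return segments


# Invented data: values every second, then a one-hour gap.
points = [(0, 1.0), (1, 1.2), (2, 1.1), (3600, 2.0), (3601, 2.1)]

print(row_based_x(points))                               # [0, 1, 2, 3, 4]
print([len(seg) for seg in split_at_gaps(points, 1)])    # [3, 2]
```

Each run would be drawn as its own line segment at its true position on the time axis, which produces the small patches of data separated by empty stretches.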
Registered: 1411394839 Posts: 1
Reply with quote #11
Originally Posted by jlbriggs: "In the majority of cases, I find data points displayed on a line to be nothing but a distraction."
Not wanting to drag this off topic, I agree. We only include data point markers when the fact that a data point exists is significant in its own right. An example of this would be when lab or environmental samples have to be taken at specific time intervals. If a sample has not been recorded at the allotted time then its absence must be conspicuous.
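For scheduled samples like these, the missed slots can be found by comparing the schedule with what was actually recorded, so they can be flagged on the graph. A minimal Python sketch follows, assuming times are plain numbers (minutes) and that a sample counts if it falls within a tolerance of its slot; the function name, the tolerance, and the data are all invented for illustration.

```python
def missed_samples(scheduled, recorded, tolerance=0):
    """Return the scheduled times that have no recorded sample
    within `tolerance` of them."""
    return [
        s for s in scheduled
        if not any(abs(r - s) <= tolerance for r in recorded)
    ]


# Invented example: samples due every 60 minutes, one slot skipped.
scheduled = [0, 60, 120, 180]
recorded = [1, 59, 182]
print(missed_samples(scheduled, recorded, tolerance=5))  # [120]
```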