Discussion


#1  sfew (Moderator)
A discussion began on my blog about the distinction between fit models (a.k.a. curve fitting) and smoothing techniques. In my experience, people tend to use these terms interchangeably. It is not clear to me that a clear distinction exists between the purpose and use of fit models and smoothing techniques; perhaps they differ only at the algorithmic level, rather than in purpose and use. Fit models and smoothing techniques are both used to represent the essential pattern that exists in a data set.

Part of the confusion is that both terms are sometimes used to describe correlation patterns (e.g., a linear regression) and time-series patterns (e.g., a moving average), and software products often apply the same models to both situations, even though this does not always make sense. For example, most software products that produce a "line of best fit" based on a linear regression to describe a correlation pattern in a scatter plot will apply the same model to time-series values in a line graph, even though a linear trend line is not an appropriate way to display overall trends of change through time.

I'd like to see if we can clarify our use of these terms. If you have any insights that will help us clearly distinguish between fit models and smoothing techniques, or between ways of displaying trends of change through time (e.g., a moving average) versus correlation patterns (e.g., a linear regression), I would appreciate your help.
__________________
Stephen Few
#2  acraft

If Daniel continues this discussion here, I'd like to see an example or screenshot of what he's talking about.  I keep thinking I understand what he's talking about, but then each new comment I read indicates that I'm still missing something.

I agree that there isn't really a distinction between fit models and smoothing techniques. That is, the means might be different, but not the ends - "smoothing techniques" might mean manipulating each data point individually based on some algorithm rather than plotting a best-fit curve, but the goal is the same. So maybe Daniel is just concerned with the effectiveness of the technique - does it generally achieve the end purpose successfully? (Of course, this depends entirely on the chosen smoothing algorithm.)

It could be that Daniel is talking about replacing the actuals with the "smoothed" data, which seems to imply a different purpose than plotting the actuals with a fit curve.  But hiding the actuals is just hiding the actuals - I think the goal is roughly the same.  If that is what he means, then the research topic should be "Determine the effectiveness of hiding the actual data when plotting fit curves or smoothed data."

So again, examples are welcome.

#3  danz
It is true that both techniques (curve fitting especially) can also be applied to bivariate analysis, yet I would like to keep this discussion related to time-series analysis only. The data used for this comment was taken from a Spotfire example: http://spotfire.tibco.com/demos/loess-smoothing?type=Interactive
[Image: raw_data.png]

1. Curve fitting (also known as data fitting or data modelling)

Curve fitting is the method of finding the parameters of a mathematical model function, with a known expression and shape, that best fits a given set of data. Several models have been studied extensively, and a few of them are relatively easy to understand. They are used as models in analytic tools because each exhibits a distinct behaviour that translates easily to our visual perception. I will mention only a few:

- Constant: y = a. Values are mostly constant.
- Linear: y = a*x + b. Values constantly increase or decrease.
- Exponential: y = a*exp(b*x). Values increase in an accelerated way.
- Logarithmic: y = a*ln(x) + b. Values increase in a decelerated way.
- Polynomial of 2nd degree (quadratic): y = a*x^2 + b*x + c. Values exhibit a possible peak and two distinct trends.
- Polynomial of 3rd degree (cubic): y = a*x^3 + b*x^2 + c*x + d. Values exhibit a possible local minimum, a local maximum, and three distinct trends.

The method of finding the parameter values that best fit the studied data set is called regression, which is why the resulting curves are often called regression curves. Regression algorithms are computationally intensive iterative routines; their variety and complexity are not the subject of this discussion. In order to find the "best fit", a function called the merit function is evaluated at each iteration of the algorithm. The parameters are considered found when they no longer vary significantly between iterations. The most common merit function is the sum of squared differences between model and data, which, when minimized, gives the method known in the literature as least squares.
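For illustration only (this sketch is mine, not part of Daniel's post), here is a minimal least-squares fit in Python using scipy.optimize.curve_fit; the exponential model and the synthetic data are assumptions chosen for the example:

```python
import numpy as np
from scipy.optimize import curve_fit

# Assumed model: exponential growth, y = a * exp(b * x).
def exponential(x, a, b):
    return a * np.exp(b * x)

# Synthetic "time series": an exponential trend plus noise (illustrative only).
rng = np.random.default_rng(0)
x = np.linspace(0, 5, 50)
y = 2.0 * np.exp(0.4 * x) + rng.normal(scale=0.5, size=x.size)

# curve_fit iteratively minimizes the sum of squared residuals --
# the least-squares merit function described above.
params, _ = curve_fit(exponential, x, y, p0=(1.0, 0.1))
a, b = params
print(f"fitted parameters: a={a:.3f}, b={b:.3f}")

# One model and one merit function yield one set of parameters,
# hence one curve, computed from ALL of the raw data points.
y_fit = exponential(x, a, b)
```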

We perform analysis via curve fitting when we want to see whether our data exhibits a certain behaviour. This is why we choose from several models with known shape and interpretation in order to draw conclusions. Even a trivial single linear trend line that does not approximate the analysed data well will show the general tendency of the data's variation. For a more detailed trend, we can choose a polynomial of higher degree, which will show us different segments (see below). When we are in doubt, we can investigate linear, logarithmic, and exponential models to get a general idea about the speed of variation (accelerated, constant, decelerated). The fitted model is usually the answer to our investigation; the similarity between model and raw data gives us answers. Curve fitting does not smooth data. Curve fitting supports the comparison of our data with a model of known behaviour from a given library, which is why the two are usually displayed together. All the values of the raw data are used to compute the curve parameters. For a given model and merit function, there will be only one set of parameters, and therefore just one resulting curve.
[Image: curve-fitting.png]

2. Data smoothing (also known as data filtering or noise removal)

There are many processes whose variation curves have a highly irregular aspect. These irregularities, perceived as variations around some local averages, make the patterns of change difficult to interpret in their original shape. The method of simplifying the variation is called data smoothing, data filtering, or noise removal. The replacement values can be considered a kind of average of the surrounding values. The smoothing level is usually adjustable and is related to our perception of the curve's shape. The resulting set is not a known function at all, but a collection of connected smooth segments. There are many methods used for data smoothing: the moving average (a.k.a. rolling average), the Savitzky–Golay filter, LOWESS (locally weighted scatterplot smoothing), and so on. Do not be confused by the term "scatterplot": the LOWESS algorithm applies to time series as well.
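To make the three named methods concrete, here is a minimal sketch (mine, not Daniel's) using numpy, scipy.signal.savgol_filter, and the lowess function from statsmodels; the window sizes and the frac value are arbitrary assumptions that control the smoothing level:

```python
import numpy as np
from scipy.signal import savgol_filter
from statsmodels.nonparametric.smoothers_lowess import lowess

# Synthetic noisy series (illustrative only).
rng = np.random.default_rng(1)
t = np.arange(200)
y = np.sin(t / 20.0) + rng.normal(scale=0.3, size=t.size)

# Moving (rolling) average: each point becomes the mean of a window
# of surrounding points.
window = 11
moving_avg = np.convolve(y, np.ones(window) / window, mode="same")

# Savitzky-Golay filter: fits a low-degree polynomial within each window.
sg = savgol_filter(y, window_length=11, polyorder=2)

# LOWESS: locally weighted regression; frac sets the share of points
# used around each location, i.e., the smoothing level.
smoothed = lowess(y, t, frac=0.1, return_sorted=False)
```

Note that each method uses only surrounding points to compute each replacement value, which is the local character described below.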

We apply data smoothing to raw data when we need to clean up the shape of the curve to a level that helps us investigate its properties further. It is, if you will, a preprocessing operation. The result of data smoothing is not an answer by itself, because it does not yield a function with a known shape. Moreover, unlike in the curve-fitting case, there is no optimal result, because there is no metric by which to measure one. Can anybody tell me whether the smoothed approximation below has the correct level? Smoothing algorithms are built so that they consider only surrounding points when calculating the approximation of a given value. The result of a smoothing process is an approximation of the original data, which is not always the case for a curve-fitting model.
[Image: smooth_data.png]

Several questions come to mind, in no particular order. Does data smoothing bring any benefit to visual data interpretation? (I think so.) Is smoothed data a possible replacement for the actual data? (Probably.) What data qualifies for smoothing? (Not all data requires smoothing, but how can we decide that?) Because the level of smoothness can be controlled, can we define some metric that would measure the similarity between smoothed data and raw data? (Keep in mind that the best approximation is the raw data itself, so it makes no sense simply to minimize the differences.) Do we still have to keep a minimum level of noise in smoothed data to keep the visual analysis consistent, and how do we measure that? Is there anything, other than our subjective opinion, that helps us decide the ideal level of smoothness from which to start the visual analysis process?
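To make the metric question concrete, here is a small sketch of one conceivable pair of quantities: fidelity to the raw data versus roughness of the smoothed curve. This is purely my illustration of what such a metric could look like, not a proposal from the thread, and both definitions are assumptions:

```python
import numpy as np

def fidelity_and_roughness(raw, smoothed):
    # Fidelity: RMS distance between smoothed and raw values. It is zero
    # for the raw data itself, which is why, as Daniel notes, simply
    # minimizing it makes no sense.
    fidelity = np.sqrt(np.mean((smoothed - raw) ** 2))
    # Roughness: RMS second difference of the smoothed curve; large for
    # noisy curves, small for smooth ones.
    roughness = np.sqrt(np.mean(np.diff(smoothed, n=2) ** 2))
    return fidelity, roughness
```

A chosen smoothing level could then be judged by the trade-off between the two numbers, though where to strike the balance remains exactly the open question raised above.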

A fitted model is an answer by itself; smoothed data is just a pre-analysis step. A fitted model should not be displayed independently of the original data, because the technique always involves comparison. Smoothed data, by contrast, can be displayed alone as a replacement for the original data. The terms "curve fitting" and "data smoothing" blend somewhere, probably at the semantic level (I am not a native English speaker), so they are used interchangeably. Both results look "smooth", but for different reasons. The two are also implemented in several tools in a manner that suggests more similarity than there should be. From a data-analysis point of view, they belong to different chapters. Both were intended to help visual analysis, yet they are distinct enough to require separate attention.

As a last picture, I have overlaid both techniques in one chart, with some descriptions.

[Image: fit-vs-smooth.png]

Daniel 

#4  sfew (Moderator)

Daniel,

The first time that William Cleveland mentions a loess curve in his book The Elements of Graphing Data, he does so in the context of a scatterplot that displays the relationship between wind speed and ozone. He writes:

Quote:
Loess provides a graphical summary that helps our assessment of the dependence; now we can see the dependence of ozone on wind speed is nonlinear. One important property of loess is that it is quite flexible and can do a good job of following a very wide variety of patterns.

He goes on to say, “Loess, a method of smoothing data, is used to compute a curve summarizing the dependence of ozone on wind speed.” He describes loess in a manner that makes no clear distinction between smoothing and curve fitting. In fact, he interchangeably uses the terms “smoothing” and “fitting” in his books to describe the loess method. Overall, he describes loess as a way to summarize the essential pattern in a data set. When I fit a curve to a data set, whether it be linear, exponential, logarithmic, polynomial, or loess, I do it for the same reason: to summarize the essential pattern that exists.

Regarding a research study, I suspect that it would be impossible to test the effectiveness of smoothing in general, both because smoothing and curve fitting are not clearly distinct from one another and because different methods have different strengths, limitations, and sometimes specific use cases (e.g., seasonal loess). At most, a specific method could be tested to determine its effectiveness for one or more specific purposes when applied to data sets that exhibit specific conditions. If you can suggest a specific case, I can add it to the list of proposed projects.


__________________
Stephen Few
#5  danz
Stephen,

I think that if the two techniques had more often gone by the alternative terms data modeling and noise removal (a.k.a. digital filtering), they might be distinct enough not to be considered interchangeable. The fact that the two techniques became useful in visual data analysis should not blend them together under the term "smooth". They were developed for different purposes, and they are treated as different chapters in statistics.

One of your readers mentioned something about the danger of using smoothing techniques. Obviously, both should be used with care, with more responsibility than a trivial "try and see". Yet they are great aids for data sensemaking professionals.

As for possible research, I don't think I can make my position any clearer. There is no particular case in my mind right now. When can I visually decide that a certain time series requires noise removal, what are the metrics for detecting this particular need, and how far can I go in approximating the data in a way that helps interpretation without altering it? These are questions to which I will probably not get easy answers soon.

This whole discussion was triggered by the violin-plot debate. Leave aside for now its symmetrical aspect. A violin plot uses a smoothing technique called kernel density estimation to approximate the distribution of the data. When I first learned about kernel density estimation as a method of estimating a population distribution from a sample of data, I saw the advantages of smoothing techniques beyond time series or scatter-plot modeling. A smoothing method does not approximate raw data with an existing, known model; it shapes the data into a more readable form. Smoothing goes beyond bivariate analysis. I wish I knew how far we can go with that from a visual perspective.

It is possible that this subject is not of the highest interest to your readers, considering that it was almost a two-sided conversation, yet I enjoyed it. I just hope that it raised enough questions for your readers to do their own investigations.

Thank you,
Daniel
#6  sfew (Moderator)
danz,

I'm sorry that we haven't been able to translate your interest in the use and effectiveness of smoothing techniques into a research proposal. I think there is value here, but smoothing cannot be studied in general because its meaning and purpose vary to some degree from technique to technique. Your assertion that curve fitting and smoothing were developed for distinct purposes isn't clearly in evidence. You mentioned that smoothing is for noise removal, but that is also true of curve fitting. Also, your statement that curve fitting is different in that it is used to compare data sets to known models doesn't hold true in all cases: polynomial curve fitting is not based on a specific model. In bivariate data analysis, it just so happens that the relationship between two quantitative variables often exhibits particular shapes, especially linear, exponential, and logarithmic, and for this reason we have developed fit models to match those common shapes. We use them, however, for the same purposes as the curve-fitting and smoothing techniques that accommodate other shapes, such as polynomial and loess. Sometimes those techniques have particular strengths that lead us to use them, such as the fact that loess is highly resistant to outliers. In all cases, these techniques are used to get a general, overall sense of the essential pattern in the data.

The violin plot, which prompted this discussion, is a good example of this. When it is used as originally intended (i.e., to display a kernel density estimate), it functions in some respects like moving averages for time series and loess for bivariate analysis, in that it smooths out some of the variation that might be meaningless and unrepresentative of the essential pattern. Kernel density estimation is especially useful for eliminating some of the noise that plagues distribution displays based on bins, such as histograms and frequency polygons. Bin boundaries--the somewhat arbitrary points where one bin is divided from the next--are notorious for introducing patterns that are noisy and, as such, fail to represent the essential pattern in the data set. Kernel density estimation takes all values into account rather than bins of values, which results in a smoother and often more useful summary of the essential pattern. For this reason, some people recommend kernel density plots as a standard replacement for bin-based distribution displays. Perhaps this lends itself to a research study that would evaluate the effectiveness of kernel density plots versus frequency polygons for representing the underlying pattern of a distribution in quantitative variables.
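As an illustration of this comparison (my sketch, not part of Stephen's post), the following contrasts histogram bin counts with a kernel density estimate via scipy.stats.gaussian_kde; the bimodal sample and the bin count are assumptions:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic sample (illustrative only): a bimodal distribution.
rng = np.random.default_rng(2)
sample = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 0.8, 200)])

# Histogram: the counts depend on the somewhat arbitrary bin edges.
counts, edges = np.histogram(sample, bins=20)

# Kernel density estimation: every value contributes a smooth kernel,
# so no bin boundaries are involved.
kde = gaussian_kde(sample)      # bandwidth via Scott's rule by default
grid = np.linspace(sample.min(), sample.max(), 200)
density = kde(grid)             # smooth estimate of the underlying density
```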

__________________
Stephen Few
#7  duaneatat
I agree with Daniel. I understand smoothing/noise removal to be very different from curve fitting or regression. Smoothing is used to pre-process data: it doesn't fit a curve to the data or regress it. Smoothing is a mapping from the original data set to a new data set, whereas regression can model the places between the data points. The two are often used in conjunction.
#8  sfew (Moderator)
Duane,

I have no doubt that the terms can be understood in these ways, but statisticians do not use them consistently in these ways. For example, loess is referred to as a smoothing technique, but it does not pre-process data. It fits a curve to the data.

__________________
Stephen Few