Time series analysis: smoothing
A time series is a sequence of values that vary over time. In this article, I’ll describe some simple but effective approaches to working with these sequences. There are many examples of this kind of data: currency quotations; sales; client requests; and data in different applied sciences, such as sociology, meteorology, geology, observations in physics, and many more.
Time series are a popular and very important way of describing data, because they show the entire history of changes in the value under consideration. This lets us assess the “typical” behavior of the value and the deviations from this behavior.
I had to pick a data set that would reveal the peculiarities of time series. I decided to use the passenger traffic of international airlines, as this data set is very illustrative and has become a kind of standard (http://robjhyndman.com/tsdldata/data/airpass.dat, Time Series Data Library, R. J. Hyndman). The series describes the number of passengers of international airlines per month (measured in thousands) for the period from 1949 to 1960.
My Prognoz Platform is always at hand. It has a smart tool called Time Series Analysis, so I’ll use it for my research. Before importing the data, I have to add two columns to the file: a date column that binds each value to its time, and a column with the series name for each observation. Below, you can see what my source file looks like. I’ve imported it into the Prognoz Platform right from the Time Series Analysis tool.
The first thing you have to do with a time series is to display it on the chart. In the Prognoz Platform, you can build a chart just by dragging series to your workbook.
The letter M after the name of the series means monthly dynamics; that is, the interval between observations is one month.
The chart shows that the series demonstrates two features:
- Trend. The chart illustrates a long-term growth of the observable values. You can see the trend is almost linear.
- Seasonality. The chart shows periodic fluctuations of the value. In my next article related to time series, I will tell you how to calculate the period.
Our series looks quite nice, but very often series have, in addition to the two characteristics above, one more feature: noise, that is, random variations in one form or another. The chart below shows an example of such a series. It is a sinusoidal signal mixed with a random variable.
When analyzing time series, you need to identify their structure and estimate all major components: namely, the trend, seasonality, noise, and other features. You also need to be able to forecast how the values will change in future periods.
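If you want to reproduce a series like this yourself, here is a minimal Python sketch. The function name, the period, the noise level, and the Gaussian noise model are all my assumptions for illustration; they are not the values behind the chart.

```python
import math
import random

def noisy_sine(n=300, period=50, amplitude=1.0, noise_std=0.5, seed=0):
    """Return n samples of a sine wave with additive Gaussian noise.

    All defaults are illustrative assumptions, not values from the article.
    """
    rng = random.Random(seed)  # fixed seed keeps the series reproducible
    return [
        amplitude * math.sin(2 * math.pi * i / period) + rng.gauss(0.0, noise_std)
        for i in range(n)
    ]

series = noisy_sine()
```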
When you work with series, you may notice that noise often hinders the analysis of the series structure. To exclude its impact and see the structure of the series better, you can use series smoothing methods.
The simplest method of smoothing series is the moving average. The idea is that you take a window with an odd number of data points, 2k + 1, and replace the central data point of the window with the arithmetic mean of all the data points in it:
$$s_i = \frac{1}{2k+1} \sum_{j=-k}^{k} x_{i+j}$$
where $x_i$ is the given series, and $s_i$ is the smoothed series.
Below, you can see the result of applying this algorithm to our two series. By default, the Prognoz Platform suggests smoothing with five data points in the window (k equals 2 in our formula above). Notice that the smoothed signal is less affected by the noise; however, along with the noise you obviously lose some useful information about the series dynamics. You can also see that the smoothed series lacks the first and last k points. This is because the smoothing is performed for the central data point of the window (the third data point, in our case); after that, the window is shifted by one data point, and the calculations are repeated. For the second series, the one with random noise, I used a window of 30 data points to reveal the structure of the series better, because the series has many data points.
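The sliding-window procedure can be sketched in a few lines of Python. This is only an illustration under my own naming (moving_average is a hypothetical helper, not the platform's implementation); it averages all 2k + 1 points of each window and returns nothing for the first and last k points.

```python
def moving_average(x, k=2):
    """Centered moving average with a window of 2*k + 1 points.

    Returns len(x) - 2*k values: the first and last k points of the
    input have no smoothed counterpart.
    """
    window = 2 * k + 1
    if len(x) < window:
        return []
    # Slide the window one point at a time and average its contents.
    return [sum(x[i - k:i + k + 1]) / window for i in range(k, len(x) - k)]

print(moving_average([1, 2, 3, 4, 5, 6, 7], k=2))  # [3.0, 4.0, 5.0]
```

Seven input points with k = 2 give only three smoothed points, which is exactly the shortening you see on the charts.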
The moving average method has certain disadvantages:
- The moving average is computationally inefficient in its straightforward form. The average is recalculated over the whole window for each data point, without reusing the result calculated for the previous data point.
- You can’t extend the moving average to the first and last data points of the series. This is a problem if those are exactly the data points you need.
- The moving average is not defined outside the series, so it can’t be used for forecasting.
Exponential smoothing is a more advanced smoothing method that can be used for forecasting as well. Its version with trend and seasonality is often called the Holt-Winters method, after the names of its authors.
There are several varieties of this method:
- Simple exponential smoothing for series with no trend and seasonality
- Double exponential smoothing for series with a trend and no seasonality
- Triple exponential smoothing for series with both trend and seasonality
Exponential smoothing computes the values of the smoothed series by updating the values produced in the previous step using the information from the current step. Information from the previous and current steps has different weights that can be managed.
The simplest form of exponential smoothing is given by the formula:
$$s_i = \alpha x_i + (1 - \alpha) s_{i-1}$$
The parameter α defines the relation between the non-smoothed value in the current step and the smoothed value from the previous step. When α = 1, we’ll take only data points from the given series; which is to say, there’ll be no smoothing. When α = 0, we’ll take only smoothed values from the previous steps; the series will become a constant.
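Here is a minimal Python sketch of simple exponential smoothing, assuming the usual recurrence in which the new smoothed value is α times the current observation plus (1 − α) times the previous smoothed value. Starting the smoothed series from the first observation is my assumption; other initializations are possible.

```python
def simple_exp_smoothing(x, alpha):
    """Simple exponential smoothing.

    s[0] = x[0] (an assumed initialization),
    s[i] = alpha * x[i] + (1 - alpha) * s[i - 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if not x:
        return []
    s = [float(x[0])]
    for value in x[1:]:
        # Blend the current observation with the previous smoothed value.
        s.append(alpha * value + (1 - alpha) * s[-1])
    return s

print(simple_exp_smoothing([0, 2, 4], 0.5))  # [0.0, 1.0, 2.5]
```

With alpha equal to 1 the function returns the original values, and with alpha equal to 0 it returns a constant, matching the two limiting cases described above.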
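To understand where exponential smoothing gets its name, we need to expand the equation recursively:
$$s_i = \alpha x_i + (1 - \alpha) s_{i-1} = \alpha x_i + \alpha (1 - \alpha) x_{i-1} + \alpha (1 - \alpha)^2 x_{i-2} + \dots + (1 - \alpha)^i s_0$$
The equation shows that all previous values of the series contribute to the current smoothed value; however, this contribution fades exponentially, because the factor (1 − α), which is less than 1, is raised to a higher power for each older value.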
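To see the fading numerically, here is a small Python sketch. It assumes the usual recurrence s_i = α x_i + (1 − α) s_{i−1}, under which the observation j steps back enters the smoothed value with the weight α(1 − α)^j; the helper name is mine.

```python
def smoothing_weights(alpha, n):
    """Weights of x[i], x[i-1], ..., x[i-n+1] in the smoothed value s[i]."""
    return [alpha * (1 - alpha) ** j for j in range(n)]

print(smoothing_weights(0.5, 4))  # [0.5, 0.25, 0.125, 0.0625]
```

Each older observation gets half the weight of the next one here, and over a long history the weights sum to almost 1.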
But if the data has a trend, simple smoothing lags behind it. You could compensate by taking α values close to 1, but then the smoothing would be insufficient. In this case, you have to use double exponential smoothing.
Double exponential smoothing has two equations. The first one assesses the trend as the difference between the current and the previous smoothed values and then smooths the trend using simple smoothing. The second equation performs simple smoothing, but the second summand uses the sum of the previous smoothed value and the trend.
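The two equations described in words above can be sketched in Python as follows. The names (level, trend, beta) and the initialization (first value as the level, first difference as the trend) are my assumptions; the source does not say how the platform initializes them.

```python
def double_exp_smoothing(x, alpha, beta):
    """Double exponential smoothing (level plus trend).

    level = alpha * x[i] + (1 - alpha) * (prev_level + prev_trend)
    trend = beta * (level - prev_level) + (1 - beta) * prev_trend
    Returns the smoothed series and the final level and trend.
    """
    if len(x) < 2:
        raise ValueError("need at least two data points")
    level = float(x[0])          # assumed initialization
    trend = float(x[1] - x[0])   # assumed initialization
    smoothed = [level]
    for value in x[1:]:
        prev_level = level
        level = alpha * value + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        smoothed.append(level)
    return smoothed, level, trend

smoothed, level, trend = double_exp_smoothing([1, 2, 3, 4, 5], 0.3, 0.2)
# A forecast h steps ahead extends the last level along the last trend.
forecast = [level + h * trend for h in (1, 2)]
```

On a perfectly linear series the smoothed values follow the data without lag, which is the behavior simple smoothing cannot provide.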
Triple exponential smoothing includes one more component, seasonality, and uses one more equation. There are two types of seasonal patterns: additive and multiplicative. In the first case, the amplitude of the seasonal component is constant, so it does not depend on the base level of the series over time. In the second case, the amplitude changes together with the base level of the series. According to the chart, this is exactly our case: as the series grows, the amplitude of the seasonal fluctuations increases, too.
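For the multiplicative case, a minimal Python sketch might look like the one below. It follows the commonly used Holt-Winters recurrences; the function name, the parameter names, and the initialization from the first two seasons are my assumptions and may differ from what the Prognoz Platform does. The series must be positive, because the seasonal factors are ratios.

```python
def holt_winters_multiplicative(x, period, alpha, beta, gamma, horizon=0):
    """Triple exponential smoothing with multiplicative seasonality.

    Returns (fitted, forecast): one-step-ahead fitted values for every
    point of x, and `horizon` forecast values beyond the end of x.
    """
    if len(x) < 2 * period:
        raise ValueError("need at least two full seasons of data")
    first = sum(x[:period]) / period
    second = sum(x[period:2 * period]) / period
    level = first                                   # assumed initialization
    trend = (second - first) / period               # assumed initialization
    season = [x[i] / first for i in range(period)]  # initial seasonal factors

    fitted = []
    for i, value in enumerate(x):
        s = season[i % period]
        fitted.append((level + trend) * s)  # prediction made before seeing x[i]
        prev_level = level
        level = alpha * value / s + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[i % period] = gamma * value / level + (1 - gamma) * s

    n = len(x)
    forecast = [(level + h * trend) * season[(n + h - 1) % period]
                for h in range(1, horizon + 1)]
    return fitted, forecast
```

For monthly data with a yearly pattern, such as the airline series, the period would be 12.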
Since the first series has both trend and seasonality, I’ve decided to select the parameters of triple exponential smoothing for it. With the Prognoz Platform, it’s quite easy. Once the value of a parameter is updated, the program redraws the chart of the smoothed series immediately, so you can see how well it describes our initial series. I’ve chosen the following values:
In the next article on time series, I’ll tell you how to calculate the period.
As initial approximations, we typically take values between 0.2 and 0.4. The Prognoz Platform also uses a model with an additional parameter ɸ, which damps the trend so that it approaches a constant in the future. For ɸ, I’ve taken the value 1, which corresponds to the ordinary model without damping.
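To give a feeling for what the damping parameter does, here is a small Python sketch of a damped-trend forecast. I am assuming the common formulation in which the trend contribution h steps ahead is multiplied by ɸ + ɸ² + … + ɸ^h; I don't know the exact form the Prognoz Platform uses, and the function name is hypothetical.

```python
def damped_forecast(level, trend, phi, horizon):
    """Forecast with a damped trend (assumed common formulation).

    The h-step forecast is level + (phi + phi**2 + ... + phi**h) * trend.
    """
    forecasts = []
    damp = 0.0
    for h in range(1, horizon + 1):
        damp += phi ** h  # accumulated damping factor
        forecasts.append(level + damp * trend)
    return forecasts

print(damped_forecast(10, 2, 1.0, 3))  # [12.0, 14.0, 16.0]
print(damped_forecast(10, 2, 0.5, 3))  # [11.0, 11.5, 11.75]
```

With ɸ equal to 1 the forecast keeps growing linearly, while with ɸ below 1 the increments shrink and the forecast levels off.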
I also used this method to forecast values of the series for the last two years. In the figure below, I’ve drawn a vertical line to mark the starting point of the forecast. As you can see, the given series and the smoothed series match fairly well, including in the forecast period. Not bad for such a simple method!
The Prognoz Platform can also select the optimal parameter values automatically by searching systematically among the parameter values and minimizing the sum of squared deviations of the smoothed series from the given series.
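The idea of such a systematic search can be sketched in Python for the simplest case with a single parameter α. This is not the platform's algorithm, just an illustration with hypothetical names. One assumption matters: I measure the error of one-step-ahead predictions (the previous smoothed value against the next observation), because comparing each smoothed value with the observation it was just computed from would trivially favor α = 1.

```python
def one_step_sse(x, alpha):
    """Sum of squared one-step-ahead errors of simple exponential smoothing."""
    level = float(x[0])  # assumed initialization
    total = 0.0
    for value in x[1:]:
        total += (value - level) ** 2  # level is the forecast for this point
        level = alpha * value + (1 - alpha) * level
    return total

def best_alpha(x, steps=100):
    """Grid search over alpha in [0, 1] with the given number of steps."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda a: one_step_sse(x, a))
```

For models with several parameters, the same search runs over a grid of parameter combinations, at a correspondingly higher cost.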
All these methods are quite simple and easy to use, and they serve as a starting point for structure analysis and time series forecasting.
Read more about time series in our next article.