In this tutorial, a catalog company wants to forecast monthly sales of its men's clothing line, based on 10 years of sales data.
In the Forecast bandwidth utilization tutorial, you learned how the Expert Modeler can decide which model is most appropriate for your time series. Now it's time to take a closer look at the two methods that are available when you choose a model: exponential smoothing and ARIMA. Before you choose a model, plot the series and ask these questions:
- Does the series have an overall trend? If so, does the trend appear constant or does it appear to be dying out with time?
- Does the series show seasonality? If so, do the seasonal fluctuations seem to grow with time or do they appear constant over successive periods?
Try the tutorial
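In this tutorial, you will complete these tasks:
- Task 1: Open the sample project
- Task 2: Examine the Data Asset and Type nodes
- Task 3: Visualize the data
- Task 4: Build the model
- Task 5: Build an ARIMA model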
Sample modeler flow and data set
This tutorial uses the Forecasting Catalog Sales flow in the sample project. The data file used is catalog_seasfac.csv. The following image shows the sample modeler flow.
Task 1: Open the sample project
The sample project contains several data sets and sample modeler flows. If you don't already have the sample project, then refer to the Tutorials topic to create the sample project. Then follow these steps to open the sample project:
- In Cloud Pak for Data, from the Navigation menu, choose Projects > View all Projects.
- Click SPSS Modeler Project.
- Click the Assets tab to see the data sets and modeler flows.
Check your progress
The following image shows the project Assets tab. You are now ready to work with the sample modeler flow associated with this tutorial.
Task 2: Examine the Data Asset and Type nodes
Forecasting Catalog Sales includes several nodes. Follow these steps to examine the Data Asset and Type nodes:
- From the Assets tab, open the Forecasting Catalog Sales modeler flow, and wait for the canvas to load.
- Double-click the catalog_seasfac.csv node. This node is a Data Asset node that points to the catalog_seasfac.csv file in the project.
- Review the File format properties.
- Optional: Click Preview data to see the full data set.
- Double-click the Type node.
- Click Read Values.
- For the men field, verify that the role is set to Target.
- Verify that all other fields have the role set to None.
- Click Save.
- Optional: Click Preview data to see the filtered data set.
Check your progress
The following image shows the Type node. You are now ready to visualize the data.
Task 3: Visualize the data
Follow these steps to use a Time plot node to visualize the data:
- Add a Time plot node:
- In the node palette, expand the Graphs section.
- Drag the Time plot node onto the canvas.
- Connect the Type node to the new Time plot node.
- Double-click the Time plot node to set its properties.
- In the Series section, click Add columns.
- Select the men field.
- Click OK.
- Select Use custom x axis field label.
- For the X axis label, select date.
- Clear the Normalize option.
- Click Save.
- Hover over the [men] v. date node, and click the Run icon.
- In the Outputs and models pane, click the results with the name [men] v. date to view the graph.
The series shows a general upward trend; that is, the series values tend to increase over time. The upward trend is seemingly constant, which indicates a linear trend.
The series also has a distinct seasonal pattern with annual highs in December, as indicated by the vertical lines on the graph. The seasonal variations appear to grow with the upward series trend, which suggests multiplicative rather than additive seasonality.
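One rough way to check this visual impression numerically is to compare each year's average level with its seasonal range. The following Python sketch is an illustration only and is not part of SPSS Modeler. It uses a synthetic series (hypothetical values, not the catalog data) and a hypothetical helper named `yearly_level_and_range`.

```python
# Synthetic monthly series (hypothetical values, not the catalog data):
# a linear trend multiplied by a fixed seasonal pattern with a December peak.
seasonal = [0.9, 0.85, 0.95, 1.0, 1.0, 0.95, 0.9, 0.95, 1.0, 1.05, 1.1, 1.35]
series = [(100 + 2 * t) * seasonal[t % 12] for t in range(120)]


def yearly_level_and_range(values, period=12):
    """Return (mean, max - min) for each complete season in the series."""
    stats = []
    for start in range(0, len(values) - period + 1, period):
        year = values[start:start + period]
        stats.append((sum(year) / period, max(year) - min(year)))
    return stats


stats = yearly_level_and_range(series)
first_level, first_range = stats[0]
last_level, last_range = stats[-1]

# Additive seasonality: the range stays roughly constant as the level rises.
# Multiplicative seasonality: the range grows along with the level.
print(f"first year: level={first_level:.1f}, range={first_range:.1f}")
print(f"last year:  level={last_level:.1f}, range={last_range:.1f}")
```

If the seasonal range in the last year is clearly larger than in the first year, as it is for this synthetic series, a multiplicative seasonal component is the more plausible choice.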
Now that you've identified the characteristics of the series, you're ready to try modeling it. The exponential smoothing method is useful for forecasting series that exhibit trend, seasonality, or both. As previously seen, this data exhibits both characteristics.
Check your progress
The following image shows a graph. You are now ready to build the model.
Task 4: Build the model
Building a best-fit exponential smoothing model involves determining the model type (whether the model needs to include trend, seasonality, or both) and then obtaining the best-fit parameters for the chosen model.
The plot of men's clothing sales over time suggested a model with both a linear trend component and a multiplicative seasonality component. This implies a Winters' model. First, however, you explore a simple model (no trend and no seasonality) and then a Holt's model (incorporates linear trend but no seasonality). This will give you practice in identifying when a model is not a good fit to the data, an essential skill in successful model building.
Follow these steps to build a simple exponential smoothing model:
- Double-click the Men (Time Series) node to view its properties.
- Expand the Observations and time interval section, and set these properties:
- Verify that the Time/date is set to date.
- Verify that the Time Interval is set to Months.
- Expand the Build options - general section, and set these properties:
- Verify that the Method is set to Exponential Smoothing.
- Verify that the Model Type is set to Simple.
- Click Save.
- Click Run all.
- In the Outputs and models pane, click the output results with the name Time plot of [men $TS-men] v. date to view the graph. The men plot represents the actual data, while $TS-men denotes the time series model.
Although the simple model does, in fact, exhibit a gradual (and rather ponderous) upward trend, it takes no account of seasonality. You can safely reject this model.
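To see why a simple model cannot follow seasonality, it can help to look at the calculation behind it. The following Python sketch is a minimal illustration and does not reproduce SPSS Modeler's implementation. The function names, the synthetic data, and the grid search over the smoothing parameter `alpha` are all assumptions made for this example.

```python
def simple_exp_smoothing(values, alpha):
    """Return (one-step-ahead fitted values, final level).

    fitted[t] is the forecast for values[t] made before values[t] is seen.
    """
    level = values[0]  # initialise the level with the first observation
    fitted = []
    for y in values:
        fitted.append(level)
        level = alpha * y + (1 - alpha) * level
    return fitted, level


def sse(values, alpha):
    """Sum of squared one-step-ahead errors for a given alpha."""
    fitted, _ = simple_exp_smoothing(values, alpha)
    return sum((y - f) ** 2 for y, f in zip(values, fitted))


# Synthetic monthly series (hypothetical values, not the catalog data).
seasonal = [0.9, 0.85, 0.95, 1.0, 1.0, 0.95, 0.9, 0.95, 1.0, 1.05, 1.1, 1.35]
series = [(100 + 2 * t) * seasonal[t % 12] for t in range(120)]

# Crude best-fit search: try alpha = 0.05, 0.10, ..., 0.95.
alphas = [a / 20 for a in range(1, 20)]
best_alpha = min(alphas, key=lambda a: sse(series, a))

# A simple model has only a level, so every future forecast is the same value.
_, final_level = simple_exp_smoothing(series, best_alpha)
forecast = [final_level] * 12
print(best_alpha, forecast[0])
```

Because the model carries a single level and nothing else, its forecast for the next 12 months is a flat line, whatever value of `alpha` fits best.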
Now try a Holt's linear model. This should at least model the trend better than the simple model, although it is also unlikely to capture the seasonality.
- Double-click the Men (Time Series) node and set these properties:
- Expand the Build options - general section.
- Set the Model Type to Holt's linear trend.
- Click Save.
- Click Run all.
- In the Outputs and models pane, click the output results with the name Time plot of [men $TS-men] v. date to view the graph.
Holt's model displays a smoother upward trend than the simple model, but it still takes no account of the seasonality, so you can disregard this one too.
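The reason Holt's model cannot capture seasonality is visible in its update equations: it tracks only a level and a trend. The following Python sketch shows the textbook form of Holt's linear trend method. It is not SPSS Modeler's implementation; the function name `holt_linear`, the initialisation from the first two observations, and the parameter values are assumptions for illustration.

```python
def holt_linear(values, alpha, beta, horizon):
    """Holt's linear trend method; returns forecasts for the next `horizon` steps.

    Requires at least two observations to initialise the trend.
    """
    level = values[0]
    trend = values[1] - values[0]
    for y in values[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # Each forecast extends the last level along the last trend.
    return [level + h * trend for h in range(1, horizon + 1)]


# Synthetic monthly series (hypothetical values, not the catalog data).
seasonal = [0.9, 0.85, 0.95, 1.0, 1.0, 0.95, 0.9, 0.95, 1.0, 1.05, 1.1, 1.35]
series = [(100 + 2 * t) * seasonal[t % 12] for t in range(120)]

forecast = holt_linear(series, alpha=0.3, beta=0.1, horizon=12)
steps = [b - a for a, b in zip(forecast, forecast[1:])]
print(forecast[0], steps[0])
```

Every step between consecutive forecasts is the same size, so the forecast is a straight line with no December peak.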
You may recall that the initial plot of men's clothing sales over time suggested a model incorporating a linear trend and multiplicative seasonality. A more suitable candidate, therefore, might be Winters' model.
- Double-click the Men (Time Series) node and set these properties:
- Expand the Build options - general section.
- Set the Model Type to Winters' multiplicative.
- Click Save.
- Click Run all.
- In the Outputs and models pane, click the output results with the name Time plot of [men $TS-men] v. date to view the graph.
This looks better. The model reflects both the trend and the seasonality of the data. The dataset covers a period of 10 years and includes 10 seasonal peaks occurring in December of each year. The 10 peaks present in the predicted results match up well with the 10 annual peaks in the real data.
However, the results also underscore the limitations of the Exponential Smoothing procedure. Looking at both the upward and downward spikes, there is significant structure that's not accounted for.
If you're primarily interested in modeling a long-term trend with seasonal variation, then exponential smoothing may be a good choice. To model a more complex structure such as this one, you need to consider using the ARIMA procedure.
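For comparison with the simple and Holt's models, the Winters' multiplicative approach adds a seasonal factor for each month and multiplies it into the forecast. The following Python sketch shows one common textbook formulation. It is not SPSS Modeler's implementation; the function name, the initialisation from the first two seasons, the synthetic data, and the smoothing parameter values are assumptions chosen for illustration.

```python
def winters_multiplicative(values, period, alpha, beta, gamma, horizon):
    """Winters' multiplicative method; needs at least two full seasons of data."""
    first = values[:period]
    second = values[period:2 * period]
    level = sum(first) / period
    trend = (sum(second) - sum(first)) / period ** 2
    seasonals = [y / level for y in first]  # one factor per position in the season

    for t in range(period, len(values)):
        y = values[t]
        s = seasonals[t % period]
        prev_level = level
        # Deseasonalise the observation before updating the level.
        level = alpha * (y / s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonals[t % period] = gamma * (y / level) + (1 - gamma) * s

    n = len(values)
    return [
        (level + h * trend) * seasonals[(n + h - 1) % period]
        for h in range(1, horizon + 1)
    ]


# Synthetic monthly series (hypothetical values, not the catalog data),
# starting in January so that index 11 of each year is December.
seasonal = [0.9, 0.85, 0.95, 1.0, 1.0, 0.95, 0.9, 0.95, 1.0, 1.05, 1.1, 1.35]
series = [(100 + 2 * t) * seasonal[t % 12] for t in range(120)]

forecast = winters_multiplicative(series, 12, alpha=0.3, beta=0.1, gamma=0.2, horizon=12)
peak_month = forecast.index(max(forecast)) + 1
print(peak_month)
```

Unlike the two earlier sketches, this forecast rises and falls within the year, because each month's forecast is scaled by its own seasonal factor.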
Check your progress
The following image shows the flow. You are now ready to build an ARIMA model.
Task 5: Build an ARIMA model
With the ARIMA procedure, you can create an autoregressive integrated moving-average (ARIMA) model that is suitable for finely tuned modeling of time series.
ARIMA models provide more sophisticated methods for modeling trend and seasonal components than do exponential smoothing models, and they have the added benefit of being able to include predictor variables in the model.
Continuing the example of the catalog company that wants to develop a forecasting model, you have seen how the company has collected data on monthly sales of men's clothing along with several series that might be used to explain some of the variation in sales. Possible predictors include the number of catalogs mailed and the number of pages in the catalog, the number of phone lines open for ordering, the amount spent on print advertising, and the number of customer service representatives.
Are any of these predictors useful for forecasting? Is a model with predictors really better than one without? Using the ARIMA procedure, you can create a forecasting model with predictors, and see if there's a significant difference in predictive ability over the exponential smoothing model with no predictors.
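One way to make "predictive ability" concrete is to compare error measures for the two models on the same data. The following Python sketch computes two common measures, RMSE and MAPE. The helper names and all of the numbers are hypothetical and are not taken from the catalog data or from SPSS Modeler output.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical values for illustration only.
actual = [200.0, 180.0, 220.0, 300.0]
model_a = [190.0, 185.0, 210.0, 260.0]   # for example, a model without predictors
model_b = [198.0, 182.0, 215.0, 290.0]   # for example, a model with predictors

for name, predicted in [("model A", model_a), ("model B", model_b)]:
    print(name, round(rmse(actual, predicted), 2), round(mape(actual, predicted), 2))
```

Lower values on both measures indicate a closer fit to the observed series. Whether the difference justifies the added predictors is still a judgment call.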
With the ARIMA method, you can fine-tune the model by specifying orders of autoregression, differencing, and moving average, along with seasonal counterparts to these components. Determining the best values for these components manually can be a time-consuming process that involves a good deal of trial and error, so for this example you let the Expert Modeler choose an ARIMA model for you.
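Of these components, differencing is the easiest to illustrate by hand. The following Python sketch shows ordinary differencing and seasonal differencing on small synthetic series. The `difference` helper and the data are assumptions for illustration; the sketch does not show how the Expert Modeler selects orders.

```python
def difference(values, lag=1):
    """Return values[i] - values[i - lag] for every i where both terms exist."""
    return [values[i] - values[i - lag] for i in range(lag, len(values))]


# A linear trend becomes a constant after one ordinary difference (lag 1).
trend_only = [100 + 2 * t for t in range(24)]
print(difference(trend_only)[:3])

# A pattern that repeats every 12 months disappears after one seasonal
# difference (lag 12), leaving only the year-over-year change.
pattern = [0, -5, 3, 4, 4, 2, -1, 2, 5, 8, 12, 30]
seasonal_series = [100 + 2 * t + pattern[t % 12] for t in range(36)]
print(difference(seasonal_series, lag=12)[:3])
```

In ARIMA notation, the number of ordinary differences applied is the order usually written as d, and the number of seasonal differences is usually written as D.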
Next, you build a better model by treating some of the other variables in the dataset as predictor variables. The ones that seem most useful to include as predictors are the number of catalogs mailed (mail), the number of pages in the catalog (page), the number of phone lines open for ordering (phone), the amount spent on print advertising (print), and the number of customer service representatives (service).
Follow these steps to build an ARIMA model:
- Double-click the Type node to set its properties.
- Verify that the roles for the mail, page, phone, print, and service fields are set to Input.
- Verify that the role for men is set to Target.
- Set the Role for all of the remaining fields to None.
- Click Save.
- Double-click the Men (Time Series) node and set these properties:
- Expand the Build options - general section.
- Set the Method to Expert Modeler.
- Set the Model Type to ARIMA models only.
- Select the Expert Modeler considers seasonal models option.
- Click Save.
- Click Run all.
- In the Outputs and models pane, click the model with the name men to view the model details.
- On the Models page, click men in the Target column.
- Click the Model Information page. Notice how the Expert Modeler has chosen only two of the five specified predictors as being significant to the model.
- Close the two model windows.
- In the Outputs and models pane, click the output results with the name Time plot of [men $TS-men] v. date to view the graph.
This model improves on the previous one by capturing the large downward spike as well, making it the best fit so far.
You could refine the model even further, but any improvements from this point on are likely to be minimal. You've established that the ARIMA model with predictors is preferable, so use this model to forecast sales for the coming year.
- Close the graph window.
- Double-click the Men (Time Series) node and set these properties:
- Expand the Model options section.
- Select the Extend records into the future option, and set the value to 12.
- Select the Compute future values of inputs option.
- Click Save.
- Click Run all.
- In the Outputs and models pane, click the output results with the name Time plot of [men $TS-men] v. date to view the graph.
The forecast looks good. As expected, there's a return to normal sales levels following the December peak, and a steady upward trend in the second half of the year, with sales in general better than those for the previous year.
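Conceptually, extending the records by 12 adds one new row per future month after the last observed date. The following Python sketch generates such a list of month-start dates. The `next_months` helper and the example last date are hypothetical, and the sketch says nothing about how SPSS Modeler computes the future input values.

```python
from datetime import date


def next_months(last, count):
    """Return the first day of each of the `count` months that follow `last`."""
    year, month = last.year, last.month
    result = []
    for _ in range(count):
        month += 1
        if month > 12:  # roll over into the next year
            month = 1
            year += 1
        result.append(date(year, month, 1))
    return result


# Hypothetical last observed month, chosen only for illustration.
future = next_months(date(2020, 12, 1), 12)
print(future[0], future[-1])
```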
Check your progress
The following image shows a graph using the ARIMA model.
Summary
You've successfully modeled a complex time series, incorporating not only an upward trend but also seasonal and other variations. You've also seen how, through trial and error, you can get closer and closer to an accurate model, which you can then use to forecast future sales.
In practice, you would need to reapply the model as your actual sales data are updated; for example, every month or every quarter, and produce updated forecasts.
Next steps
You are now ready to try other SPSS® Modeler tutorials.