This tutorial builds a logistic regression model, which is a statistical technique for classifying records based on values of input fields. It is analogous to linear regression, but takes a categorical target field instead of a numeric one.
For example, suppose that a telecommunications provider has segmented its customer base by service usage patterns, categorizing the customers into four groups. If demographic data can be used to predict group membership, you can customize offers for individual prospective customers.
Try the tutorial
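In this tutorial, you will complete these tasks:
- Task 1: Open the sample project
- Task 2: Examine the Data Asset, Type and Filter nodes
- Task 3: View the Logistic node
- Task 4: Browse the model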
Sample modeler flow and data set
This tutorial uses the Classifying Telecommunication Customers flow in the sample project. The data file used is telco.csv. The following image shows the sample modeler flow.
The following image shows the data set used with this modeler flow.
The custcat field has four possible values that correspond to the four customer groups, as follows:
Value | Label |
---|---|
1 | Basic Service |
2 | E-Service |
3 | Plus Service |
4 | Total Service |
Because the target has multiple categories, a multinomial model is used. If the target has two distinct categories, such as yes/no, true/false, or churn/don't churn, a binomial model might be created instead.
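To make the multinomial idea concrete, the following is a minimal Python sketch of how a multinomial logistic model can turn one score per category into probabilities that sum to 1. The coefficients and the choice of two inputs (age and income) are invented for illustration; they are not estimates from telco.csv, and the way SPSS Modeler parameterizes the model internally may differ.

```python
import math

# Labels for the four custcat values, taken from the table above.
LABELS = {1: "Basic Service", 2: "E-Service", 3: "Plus Service", 4: "Total Service"}

# Hypothetical coefficients (intercept, age, income) for each category.
# These numbers are made up for illustration only.
COEFFS = {
    1: (0.0, 0.0, 0.0),      # treated here as the reference category
    2: (-0.5, 0.01, 0.002),
    3: (-1.0, 0.03, 0.001),
    4: (-2.0, 0.02, 0.010),
}

def predict_proba(age, income):
    """Return a dict of category -> probability by using the softmax function."""
    scores = {k: b0 + b1 * age + b2 * income for k, (b0, b1, b2) in COEFFS.items()}
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {k: math.exp(s - top) for k, s in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

def predict(age, income):
    """Return the label of the most probable category."""
    probs = predict_proba(age, income)
    return LABELS[max(probs, key=probs.get)]
```

With only two categories, the same calculation reduces to the familiar binomial logistic function, which is why a binomial model is enough for a yes/no target.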
Task 1: Open the sample project
The sample project contains several data sets and sample modeler flows. If you don't already have the sample project, then refer to the Tutorials topic to create the sample project. Then follow these steps to open the sample project:
- In Cloud Pak for Data, from the Navigation menu, choose Projects > View all Projects.
- Click SPSS Modeler Project.
- Click the Assets tab to see the data sets and modeler flows.
Check your progress
The following image shows the project Assets tab. You are now ready to work with the sample modeler flow associated with this tutorial.
Task 2: Examine the Data Asset, Type and Filter nodes
The Classifying Telecommunication Customers modeler flow includes several nodes. Follow these steps to examine three of the nodes:
- From the Assets tab, open the Classifying Telecommunication Customers modeler flow, and wait for the canvas to load.
- Double-click the telco.csv node. This node is a Data Asset node that points to the telco.csv file in the project.
- Review the File format properties.
- Optional: Click Preview data to see the full data set.
- Double-click the Type node and click Read Values. This node specifies field properties, such as measurement level (the type of data that the field contains), and the role of each field as a target or input in modeling. Make sure that all measurement levels are set correctly. For example, most fields with values of 0.0 and 1.0 can be regarded as flags. However, gender is more correctly considered as a field with a set of two values, instead of a flag, so leave its measurement level as Nominal.
- Set the role for the custcat field to Target. Leave the role for all other fields set to Input.
- Double-click the Filter node to see its properties.
- Notice that this node keeps only the relevant fields (region, age, marital, address, income, ed, employ, retire, gender, reside, and custcat). Other fields are excluded from this analysis.
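For readers who think in code, the Type and Filter steps above can be approximated outside the flow. The following Python sketch is only an analogy: the three sample rows are invented stand-ins for telco.csv, the tenure column stands in for any field that the filter drops, and the helper names are hypothetical.

```python
import csv
import io

# Fields that the Filter node keeps, as listed in the step above.
KEEP = ["region", "age", "marital", "address", "income", "ed",
        "employ", "retire", "gender", "reside", "custcat"]

# Tiny made-up sample standing in for telco.csv (not real data).
SAMPLE = """region,tenure,age,marital,address,income,ed,employ,retire,gender,reside,custcat
2,13,44,1.0,9,64,4,5,0.0,0,2,1
3,11,33,1.0,7,136,5,5,0.0,0,6,4
3,68,52,0.0,24,116,1,29,1.0,1,2,3
"""

def filter_fields(rows, keep):
    """Keep only the named fields in each record, similar to a Filter node."""
    return [{name: row[name] for name in keep} for row in rows]

def looks_like_flag(rows, field):
    """Return True if every value of the field is 0.0 or 1.0."""
    return all(float(row[field]) in (0.0, 1.0) for row in rows)

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
filtered = filter_fields(rows, KEEP)

# A purely value-based check also picks up gender, which is why the
# measurement levels still need a manual review in the Type node.
flag_candidates = [name for name in KEEP if looks_like_flag(filtered, name)]
```

In this sample, the value-based check reports marital, retire, and gender as flag candidates, which illustrates why the tutorial asks you to review measurement levels by hand and leave gender as Nominal.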
Check your progress
The following image shows the Filter node. You are now ready to view the Logistic node.
Task 3: View the Logistic node
Follow these steps to classify customers by using multinomial logistic regression:
- Double-click the custcat (Logistic) node to see its properties.
- In the Model Settings section, select the Multinomial procedure.
  - A Binomial model is used when the target field is a flag or nominal field with two discrete values.
  - A Multinomial model is used when the target field is a nominal field with more than two values.
- Next, select the Stepwise method and Main Effects model type. Also, select the Include constant in equation checkbox.
- In the Expert Options section, select Expert mode.
- Click Output. Select Classification table, and click OK.
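A classification table cross-tabulates the observed category of each record against the category that the model predicts. The following Python sketch shows the idea with a hypothetical helper and eight made-up custcat values; it is not the output of the Logistic node, and the layout of the table in SPSS Modeler may differ.

```python
from collections import Counter

def classification_table(observed, predicted):
    """Count (observed, predicted) pairs and return the table and the accuracy.

    Rows of the table are observed categories; columns are predicted categories.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    categories = sorted(set(observed) | set(predicted))
    counts = Counter(zip(observed, predicted))
    table = {o: {p: counts[(o, p)] for p in categories} for o in categories}
    correct = sum(counts[(c, c)] for c in categories)
    return table, correct / len(observed)

# Made-up custcat values for eight records (not model output).
observed = [1, 1, 2, 3, 3, 4, 4, 4]
predicted = [1, 3, 2, 3, 3, 4, 1, 4]
table, accuracy = classification_table(observed, predicted)
```

The diagonal cells, such as table[3][3], count the records that are classified correctly; off-diagonal cells show which groups the model confuses with each other.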
Check your progress
The following image shows the Logistic node. You are now ready to browse the model.
Task 4: Browse the model
Follow these steps to browse the model:
- Hover over the custcat (Logistic) node, and click the Run icon.
- In the Outputs and models pane, click the custcat model to view the results.
You can then explore the model information, feature (predictor) importance, and parameter estimates.
These results are based on the training data only. To assess how well the model generalizes to other data in the real world, you can use a Partition node to hold out a subset of records for purposes of testing and validation.
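The holdout idea can be sketched in a few lines of Python. This is a simplified stand-in, not the algorithm that the Partition node uses: the function name, the 70/30 split, and the seed are all assumptions chosen for illustration.

```python
import random

def partition(records, train_fraction=0.7, seed=42):
    """Shuffle a copy of the records and split it into training and testing sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # a fixed seed makes the split repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

records = list(range(1000))  # stand-in for 1000 customer records
train, test = partition(records)
```

The model is then fitted on the training records only, and its accuracy on the testing records gives a more realistic estimate of how it performs on new customers.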
Check your progress
Summary
This example showed you how to use demographic data to predict usage patterns by building a logistic regression model for classifying records based on values of input fields.
Next steps
You are now ready to try other SPSS® Modeler tutorials.