Last updated: May 23, 2024
The CHAID node generates decision trees by using chi-square statistics to identify optimal splits. Unlike the C&R Tree and Quest nodes, CHAID can generate nonbinary trees, meaning that some splits have more than two branches. Target and input fields can be numeric range (continuous) or categorical. Exhaustive CHAID is a modification of CHAID that does a more thorough job of examining all possible splits but takes longer to compute.
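As a rough illustration of the statistics behind those splits, the following sketch computes the Pearson and likelihood-ratio chi-square statistics for a contingency table of predictor categories (rows) against target categories (columns). It is plain Python and not SPSS Modeler code. The function names are hypothetical, and it assumes that every row and column contains at least one record.

```python
import math


def expected_counts(table):
    """Expected cell counts under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = float(sum(row_totals))
    return [[r * c / total for c in col_totals] for r in row_totals]


def pearson_chi_square(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def likelihood_ratio_chi_square(table):
    """2 * sum of observed * ln(observed / expected); empty cells add 0."""
    expected = expected_counts(table)
    return 2.0 * sum(
        o * math.log(o / e)
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
        if o > 0
    )


# Rows: two categories of a predictor; columns: two target categories.
observed = [[10, 20], [20, 10]]
pearson = pearson_chi_square(observed)
likelihood_ratio = likelihood_ratio_chi_square(observed)
```

A larger statistic means a stronger association between the predictor and the target. The two statistics usually give similar values, as they do for this table.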
Example
stream = modeler.script.stream()
sourcenode = stream.findByID("id46WRP1285C")
node = stream.createAt("chaid", "My node", 200, 100)
stream.link(sourcenode, node)
node.setPropertyValue("custom_fields", True)
node.setPropertyValue("target", "Drug")
node.setPropertyValue("inputs", ["Age", "Na", "K", "Cholesterol", "BP"])
node.setPropertyValue("use_model_name", True)
node.setPropertyValue("model_name", "CHAID")
node.setPropertyValue("method", "Chaid")
node.setPropertyValue("model_output_type", "InteractiveBuilder")
node.setPropertyValue("use_tree_directives", True)
node.setPropertyValue("tree_directives", "Test")
node.setPropertyValue("split_alpha", 0.03)
node.setPropertyValue("merge_alpha", 0.04)
node.setPropertyValue("chi_square", "Pearson")
node.setPropertyValue("use_percentage", False)
node.setPropertyValue("min_parent_records_abs", 40)
node.setPropertyValue("min_child_records_abs", 30)
node.setPropertyValue("epsilon", 0.003)
node.setPropertyValue("max_iterations", 75)
node.setPropertyValue("split_merged_categories", True)
node.setPropertyValue("bonferroni_adjustment", True)
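The repeated setPropertyValue calls in the example can also be driven from a list of name and value pairs. The following is a minimal sketch, assuming only the setPropertyValue method shown in the example. The apply_properties helper and the FakeNode class are hypothetical: FakeNode stands in for a node so that the sketch runs outside SPSS Modeler.

```python
# Settings taken from the example; the order of the pairs is preserved.
CHAID_SETTINGS = [
    ("method", "Chaid"),
    ("split_alpha", 0.03),
    ("merge_alpha", 0.04),
    ("chi_square", "Pearson"),
    ("bonferroni_adjustment", True),
]


def apply_properties(node, settings):
    """Call setPropertyValue on the node once for each (name, value) pair."""
    for name, value in settings:
        node.setPropertyValue(name, value)
    return node


class FakeNode(object):
    """Hypothetical stand-in that records the values it is given."""

    def __init__(self):
        self.values = {}

    def setPropertyValue(self, name, value):
        self.values[name] = value


fake = apply_properties(FakeNode(), CHAID_SETTINGS)
```

In a Modeler script, the node returned by stream.createAt would be passed in place of FakeNode().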
chaidnode Properties | Datatype or values | Property description |
---|---|---|
target | field | CHAID models require a single target and one or more input fields. You can also specify a frequency. For more information, see Common modeling node properties. |
continue_training_existing_model | flag | |
objective | | psm is used for large datasets, and requires a server connection. |
model_output_type | | |
use_tree_directives | flag | |
tree_directives | string | |
method | | |
use_max_depth | | |
max_depth | integer | Maximum tree depth, from 0 to 1000. Used only if use_max_depth = Custom. |
use_percentage | flag | |
min_parent_records_pc | number | |
min_child_records_pc | number | |
min_parent_records_abs | number | |
min_child_records_abs | number | |
use_costs | flag | |
costs | structured | Structured property. |
trails | number | Number of component models for boosting or bagging. |
set_ensemble_method | | The default rule for combining categorical targets. |
range_ensemble_method | | Default combining rule for continuous targets. |
large_boost | flag | Applies boosting for large data sets. |
split_alpha | number | Significance level for splitting. |
merge_alpha | number | Significance level for merging. |
bonferroni_adjustment | flag | Adjust significance values by using the Bonferroni method. |
split_merged_categories | flag | Allow resplitting of merged categories. |
chi_square | | The method used to calculate the chi-square statistic: Pearson or Likelihood Ratio. |
epsilon | number | Minimum change in expected cell frequencies. |
max_iterations | number | Maximum iterations for convergence. |
set_random_seed | integer | |
seed | number | |
calculate_variable_importance | flag | |
calculate_raw_propensities | flag | |
calculate_adjusted_propensities | flag | |
adjusted_propensity_partition | | |
maximum_number_of_models | integer | |
train_pct | double | The algorithm internally separates records into a model building set and an overfit prevention set. The overfit prevention set is an independent set of data records used to track errors during training, which prevents the method from modeling chance variation in the data. Specify a percentage of records. The default is 30. |
use_customize_layer | Boolean | The default value is false. You can set this property to true if you want to designate specific fields as points to split the decision tree at. |
customize_layer | list | This property is used only when use_customize_layer is set to true. This property is a list of objects. Each of the objects has two attributes: When the SPSS Modeler flow runs, the CHAID algorithm evaluates and returns a candidate list of fields to split at based on the p value for each layer. For a custom layer, each field that you specified for the layer is compared to the full candidate list of fields. The first field to match a field from the candidate list is used for the split. The rest of the specified fields are ignored. If none of the fields match, a warning message appears and the tree splits as normal. |
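The matching rule described for customize_layer can be sketched in plain Python. This is an illustration of the described behavior only, not the SPSS Modeler implementation. The function name is hypothetical, and returning None to mean "split as normal" is an assumption made for this sketch.

```python
import warnings


def pick_custom_split_field(specified_fields, candidate_fields):
    """Return the first specified field that is also a split candidate.

    Later specified fields are ignored. If no field matches, a warning is
    issued and None is returned so the caller can split as normal.
    """
    candidates = set(candidate_fields)
    for field in specified_fields:
        if field in candidates:
            return field
    warnings.warn("No specified field is a split candidate; splitting as normal.")
    return None


# "K" is not a candidate, so "Na" is the first match and "Age" is ignored.
chosen = pick_custom_split_field(["K", "Na", "Age"], ["Na", "Age", "BP"])
```

The order of the specified fields matters here: a field that appears earlier in the list takes priority whenever it is among the candidates.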