chaidnode properties | IBM Cloud Pak for Data as a Service

chaidnode properties

The CHAID node generates decision trees by using chi-square statistics to identify optimal splits. Unlike the C&R Tree and QUEST nodes, CHAID can generate nonbinary trees, meaning that some splits have more than two branches. Target and input fields can be numeric range (continuous) or categorical. Exhaustive CHAID is a modification of CHAID that examines all possible splits more thoroughly but takes longer to compute.
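To illustrate the kind of statistic that drives a split, the following is a minimal sketch of the Pearson chi-square calculation for a contingency table of one input field against the target. It is not the SPSS Modeler implementation. The function name `pearson_chi_square` and the counts are hypothetical, and the sketch assumes that every row and column total is nonzero.

```python
def pearson_chi_square(table):
    """Return (statistic, degrees_of_freedom) for a table of observed counts.

    Rows are categories of an input field, columns are target categories.
    Assumes every row total and column total is greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = float(sum(row_totals))

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the input field and the target were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts: three input categories (rows) by two target classes (columns).
counts = [[30, 10], [20, 20], [10, 30]]
stat, dof = pearson_chi_square(counts)
print("chi-square = %.2f, degrees of freedom = %d" % (stat, dof))
```

The statistic is converted to a p-value, and a smaller p-value indicates a stronger association between the input field and the target. The `split_alpha` and `merge_alpha` properties set the significance levels that such p-values are compared against.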

Example

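# Get the current stream and look up the upstream source node by its ID.
stream = modeler.script.stream()
sourcenode = stream.findByID("id46WRP1285C")

# Create the CHAID node on the canvas and connect it to the source node.
node = stream.createAt("chaid", "My node", 200, 100)
stream.link(sourcenode, node)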

node.setPropertyValue("custom_fields", True)
node.setPropertyValue("target", "Drug")
node.setPropertyValue("inputs", ["Age", "Na", "K", "Cholesterol", "BP"])
node.setPropertyValue("use_model_name", True)
node.setPropertyValue("model_name", "CHAID")
node.setPropertyValue("method", "Chaid")
node.setPropertyValue("model_output_type", "InteractiveBuilder")
node.setPropertyValue("use_tree_directives", True)
node.setPropertyValue("tree_directives", "Test")
node.setPropertyValue("split_alpha", 0.03)
node.setPropertyValue("merge_alpha", 0.04)
node.setPropertyValue("chi_square", "Pearson")
node.setPropertyValue("use_percentage", False)
node.setPropertyValue("min_parent_records_abs", 40)
node.setPropertyValue("min_child_records_abs", 30)
node.setPropertyValue("epsilon", 0.003)
node.setPropertyValue("max_iterations", 75)
node.setPropertyValue("split_merged_categories", True)
node.setPropertyValue("bonferroni_adjustment", True)
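The same pattern can be used for the ensemble and stopping-rule properties listed in Table 1. The following sketch groups the settings in a list and applies them in a loop. The `RecordingNode` class and the `apply_properties` helper are hypothetical and exist only so that the sketch runs outside SPSS Modeler. The values shown are illustrative, not recommended defaults.

```python
class RecordingNode(object):
    """Hypothetical stand-in for a Modeler node; it only records property values."""

    def __init__(self):
        self.properties = {}

    def setPropertyValue(self, name, value):
        self.properties[name] = value


def apply_properties(node, settings):
    """Call setPropertyValue for each (name, value) pair, in order."""
    for name, value in settings:
        node.setPropertyValue(name, value)


# Property names come from Table 1; the values are illustrative assumptions.
boosting_settings = [
    ("objective", "Boosting"),
    ("trails", 10),                       # number of component models
    ("set_ensemble_method", "Voting"),    # combining rule for categorical targets
    ("use_max_depth", "Custom"),
    ("max_depth", 5),                     # used only when use_max_depth is Custom
    ("use_percentage", True),             # use the *_pc thresholds instead of *_abs
    ("min_parent_records_pc", 2),
    ("min_child_records_pc", 1),
]

# Inside SPSS Modeler, pass the node returned by stream.createAt("chaid", ...) instead.
node = RecordingNode()
apply_properties(node, boosting_settings)
```

Keeping the settings in one list makes it easier to review which properties a script changes and to reuse the same configuration for several nodes.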

Table 1. chaidnode properties
`chaidnode` Properties	Datatype or values	Property description
`target`	field	CHAID models require a single target and one or more input fields. You can also specify a frequency. For more information, see Common modeling node properties.
`continue_training_existing_model`	flag
`objective`	`Standard` `Boosting` `Bagging` `psm`	`psm` is used for large datasets, and requires a server connection.
`model_output_type`	`Single` `InteractiveBuilder`
`use_tree_directives`	flag
`tree_directives`	string
`method`	`Chaid` `ExhaustiveChaid`
`use_max_depth`	`Default` `Custom`
`max_depth`	integer	Maximum tree depth, from 0 to 1000. Used only if `use_max_depth = Custom`.
`use_percentage`	flag
`min_parent_records_pc`	number
`min_child_records_pc`	number
`min_parent_records_abs`	number
`min_child_records_abs`	number
`use_costs`	flag
`costs`	structured	Structured property.
`trails`	number	Number of component models for boosting or bagging.
`set_ensemble_method`	`Voting` `HighestProbability` `HighestMeanProbability`	The default rule for combining categorical targets.
`range_ensemble_method`	`Mean` `Median`	Default combining rule for continuous targets.
`large_boost`	flag	Applies boosting for large data sets.
`split_alpha`	number	Significance level for splitting.
`merge_alpha`	number	Significance level for merging.
`bonferroni_adjustment`	flag	Adjust significance values by using the Bonferroni method.
`split_merged_categories`	flag	Allow resplitting of merged categories.
`chi_square`	`Pearson` `LR`	The method used to calculate the chi-square statistic: Pearson or Likelihood Ratio.
`epsilon`	number	Minimum change in expected cell frequencies.
`max_iterations`	number	Maximum iterations for convergence.
`set_random_seed`	integer
`seed`	number
`calculate_variable_importance`	flag
`calculate_raw_propensities`	flag
`calculate_adjusted_propensities`	flag
`adjusted_propensity_partition`	`Test` `Validation`
`maximum_number_of_models`	integer
`train_pct`	double	The algorithm internally separates records into a model building set and an overfit prevention set. The overfit prevention set is an independent set of data records used to track errors during training, which prevents the method from modeling chance variation in the data. Specify a percentage of records. The default is `30`.
`use_customize_layer`	Boolean	The default value is `false`. Set this property to `true` if you want to designate specific fields for the decision tree to split on at particular layers.
`customize_layer`	list	This property is used only when `use_customize_layer` is set to `true`. This property is a list of objects. Each of the objects has two attributes: `Layer` is an integer that indicates the specific n-th layer in the decision tree that you want to customize. In SPSS Modeler, layers start from `0` (root). `Fields` is a list of names. Each name is one of the fields that you want the decision tree to potentially split on for that `Layer`. These fields are evaluated by SPSS Modeler in the order that they are listed. When the SPSS Modeler flow runs, the CHAID algorithm evaluates and returns a candidate list of fields to split at based on the `p` value for each layer. For a custom layer, each field that you specified for the layer is compared to the full candidate list of fields. The first field to match a field from the candidate list is used for the split. The rest of the specified fields are ignored. If none of the fields match, a warning message appears and the tree splits as normal.
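The matching rule described for `customize_layer` can be sketched in plain Python. This is an illustration of the documented behavior, not the SPSS Modeler implementation. Representing each object as a dictionary with `Layer` and `Fields` keys is an assumption, and the `pick_split_field` helper and the field names are hypothetical.

```python
# Assumed representation: one dictionary per customized layer.
customize_layer = [
    {"Layer": 0, "Fields": ["BP", "Age"]},
    {"Layer": 1, "Fields": ["Cholesterol", "K"]},
]


def pick_split_field(layer, candidates, custom_layers):
    """Return the first specified field for this layer that is also a candidate.

    Returns None when the layer is not customized or when none of its fields
    match; in both cases the tree would split as normal.
    """
    for entry in custom_layers:
        if entry["Layer"] == layer:
            # Fields are evaluated in the order in which they are listed.
            for field in entry["Fields"]:
                if field in candidates:
                    return field
            return None
    return None


# Hypothetical candidate lists, as the algorithm might return them per layer.
root_choice = pick_split_field(0, ["Na", "Age", "BP"], customize_layer)
layer1_choice = pick_split_field(1, ["Na"], customize_layer)
print(root_choice)
print(layer1_choice)
```

In this sketch, `BP` is chosen at the root because it is the first listed field that also appears in the candidate list, even though `Age` is a candidate too. Layer 1 has no matching field, so the helper returns `None` to represent the fallback to a normal split.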