Last updated: Jan 17, 2024
The Tree-AS node is similar to the CHAID node; however, the Tree-AS node is designed to process big data to create a single tree, and it displays the resulting model in the output viewer. The node generates a decision tree by using chi-square statistics (CHAID) to identify optimal splits. This use of CHAID can generate nonbinary trees, meaning that some splits have more than two branches. Target and input fields can be numeric range (continuous) or categorical. Exhaustive CHAID is a modification of CHAID that examines all possible splits more thoroughly but takes longer to compute.
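
To make the idea of chi-square-based splitting concrete, the following is a minimal sketch of the Pearson chi-square statistic for one candidate split. It is an illustration only, not the Tree-AS node's implementation; the function name `pearson_chi_square` and the example counts are assumptions made for this sketch.

```python
def pearson_chi_square(table):
    """Pearson chi-square for a table of observed counts.

    Rows are the branches of a candidate split; columns are target categories.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip empty rows/columns to avoid division by zero
                stat += (observed - expected) ** 2 / expected
    return stat


# A two-branch split that separates the target categories well.
print(pearson_chi_square([[30, 10], [10, 30]]))  # → 20.0

# A nonbinary (three-branch) split is scored the same way.
print(pearson_chi_square([[20, 5], [10, 10], [5, 20]]))
```

A larger statistic indicates a stronger association between the branches and the target, which is the kind of evidence a CHAID-style procedure weighs when it compares candidate splits.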
treeas Properties | Values | Property description |
---|---|---|
target | field | In the Tree-AS node, CHAID models require a single target and one or more input fields. A frequency field can also be specified. See Common modeling node properties for more information. |
method | chaid, exhaustive_chaid | |
max_depth | integer | Maximum tree depth, from 0 to 20. The default value is 5. |
num_bins | integer | Only used if the data is made up of continuous inputs. Set the number of equal frequency bins to be used for the inputs; options are: 2, 4, 5, 10, 20, 25, 50, or 100. |
record_threshold | integer | The number of records at which the model will switch from using p-values to Effect sizes while building the tree. The default is 1,000,000; increase or decrease this in increments of 10,000. |
split_alpha | number | Significance level for splitting. The value must be between 0.01 and 0.99. |
merge_alpha | number | Significance level for merging. The value must be between 0.01 and 0.99. |
bonferroni_adjustment | flag | Adjust significance values using Bonferroni method. |
effect_size_threshold_cont | number | Set the Effect size threshold when splitting nodes and merging categories when using a continuous target. The value must be between 0.01 and 0.99. |
effect_size_threshold_cat | number | Set the Effect size threshold when splitting nodes and merging categories when using a categorical target. The value must be between 0.01 and 0.99. |
split_merged_categories | flag | Allow resplitting of merged categories. |
grouping_sig_level | number | Used to determine how groups of nodes are formed or how unusual nodes are identified. |
chi_square | pearson, likelihood_ratio | Method used to calculate the chi-square statistic: Pearson or Likelihood Ratio. |
minimum_record_use | use_percentage, use_absolute | |
min_parent_records_pc | number | Default value is 2. Minimum 1, maximum 100, in increments of 1. Parent branch value must be higher than child branch. |
min_child_records_pc | number | Default value is 1. Minimum 1, maximum 100, in increments of 1. |
min_parent_records_abs | number | Default value is 100. Minimum 1, maximum 100, in increments of 1. Parent branch value must be higher than child branch. |
min_child_records_abs | number | Default value is 50. Minimum 1, maximum 100, in increments of 1. |
epsilon | number | Minimum change in expected cell frequencies. |
max_iterations | number | Maximum iterations for convergence. |
use_costs | flag | |
costs | structured | Structured property. The format is a list of 3 values: the actual value, the predicted value, and the cost if that prediction is wrong. For example: tree.setPropertyValue("costs", [["drugA", "drugB", 3.0], ["drugX", "drugY", 4.0]]) |
default_cost_increase | none, linear, square, custom | Only enabled for ordinal targets. Set default values in the costs matrix. |
calculate_conf | flag | |
display_rule_id | flag | Adds a field in the scoring output that indicates the ID for the terminal node to which each record is assigned. |
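
The ranges documented in the table can be checked before values are passed to `setPropertyValue`. The following is a minimal sketch under stated assumptions: the helpers `check_treeas_settings` and `apply_treeas_settings` and the `FakeNode` stand-in are hypothetical names invented for this example, and only a subset of the properties is validated. In a real script, the node object would come from the stream rather than from `FakeNode`.

```python
# Limits copied from the property table; helper names are hypothetical.
ALLOWED_BINS = {2, 4, 5, 10, 20, 25, 50, 100}
ALPHA_PROPS = ("split_alpha", "merge_alpha",
               "effect_size_threshold_cont", "effect_size_threshold_cat")


def check_treeas_settings(settings):
    """Return a list of problems found in a dict of treeas property values."""
    problems = []
    if settings.get("method", "chaid") not in ("chaid", "exhaustive_chaid"):
        problems.append("method must be chaid or exhaustive_chaid")
    if not 0 <= settings.get("max_depth", 5) <= 20:
        problems.append("max_depth must be from 0 to 20")
    if "num_bins" in settings and settings["num_bins"] not in ALLOWED_BINS:
        problems.append("num_bins must be one of 2, 4, 5, 10, 20, 25, 50, 100")
    for name in ALPHA_PROPS:
        if name in settings and not 0.01 <= settings[name] <= 0.99:
            problems.append(name + " must be between 0.01 and 0.99")
    # Parent branch value must be higher than child branch (documented defaults).
    for suffix, parent_default, child_default in (("pc", 2, 1), ("abs", 100, 50)):
        parent = settings.get("min_parent_records_" + suffix, parent_default)
        child = settings.get("min_child_records_" + suffix, child_default)
        if parent <= child:
            problems.append("min_parent_records_" + suffix +
                            " must be higher than min_child_records_" + suffix)
    return problems


def apply_treeas_settings(node, settings):
    """Validate the settings, then set each one on the node."""
    problems = check_treeas_settings(settings)
    if problems:
        raise ValueError("; ".join(problems))
    for name, value in settings.items():
        node.setPropertyValue(name, value)


class FakeNode:
    """Stand-in for a Tree-AS node so the sketch runs outside the product."""

    def __init__(self):
        self.props = {}

    def setPropertyValue(self, name, value):
        self.props[name] = value


node = FakeNode()
apply_treeas_settings(node, {
    "method": "exhaustive_chaid",
    "max_depth": 8,
    "num_bins": 10,
    "split_alpha": 0.05,
    "merge_alpha": 0.05,
    "bonferroni_adjustment": True,
})
print(node.props["max_depth"])
print(check_treeas_settings({"max_depth": 25}))
```

Validating first means an out-of-range value is reported with the property name before any property on the node has been changed.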