treeas properties | IBM Cloud Pak for Data as a Service

treeas properties

The Tree-AS node is similar to the CHAID node; however, the Tree-AS node is designed to process big data, creates a single tree, and displays the resulting model in the output viewer. The node generates a decision tree by using chi-square statistics (CHAID) to identify optimal splits. Because it uses CHAID, the node can generate nonbinary trees, in which some splits have more than two branches. Target and input fields can be numeric range (continuous) or categorical. Exhaustive CHAID is a modification of CHAID that examines all possible splits more thoroughly but takes longer to compute.
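CHAID ranks candidate splits with a chi-square test of independence between a predictor's categories and the target's categories. As an illustration only, the sketch below computes the two statistics that the `chi_square` property chooses between (Pearson and likelihood ratio) for a contingency table of observed counts. The function names are hypothetical; this is the textbook formula rather than the node's internal code, and it stops short of the p-value and category-merging steps.

```python
import math
from typing import List, Sequence

Table = Sequence[Sequence[float]]


def expected_counts(table: Table) -> List[List[float]]:
    """Expected cell counts under independence: row_total * col_total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi_square(table: Table) -> float:
    """Pearson statistic: sum of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
        if e > 0  # an all-zero row or column has no expected count
    )


def likelihood_ratio_chi_square(table: Table) -> float:
    """Likelihood-ratio statistic G^2 = 2 * sum of observed * ln(observed / expected)."""
    expected = expected_counts(table)
    return 2.0 * sum(
        o * math.log(o / e)
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
        if o > 0  # empty cells contribute 0 (the limit of o*ln(o/e) as o -> 0)
    )


def degrees_of_freedom(table: Table) -> int:
    """(rows - 1) * (columns - 1) for an r x c table."""
    return (len(table) - 1) * (len(table[0]) - 1)


# Rows: two categories of a candidate predictor; columns: two target classes.
counts = [[10, 20], [20, 10]]
print(pearson_chi_square(counts))           # about 6.667
print(likelihood_ratio_chi_square(counts))  # about 6.796
print(degrees_of_freedom(counts))           # 1
```

A larger statistic (for the same degrees of freedom) indicates a stronger association between the predictor and the target, and therefore a more attractive split.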

Table 1. treeas properties
`treeas` Properties	Values	Property description
`target`	field	In the Tree-AS node, CHAID models require a single target and one or more input fields. A frequency field can also be specified. See Common modeling node properties for more information.
`method`	`chaid` `exhaustive_chaid`	Tree-growing method: CHAID or Exhaustive CHAID.
`max_depth`	integer	Maximum tree depth, from 0 to 20. The default value is 5.
`num_bins`	integer	Used only if the data contains continuous inputs. The number of equal-frequency bins to use for the inputs; options are 2, 4, 5, 10, 20, 25, 50, or 100.
`record_threshold`	integer	The number of records at which the model switches from using p-values to effect sizes while building the tree. The default is 1,000,000; increase or decrease this value in increments of 10,000.
`split_alpha`	number	Significance level for splitting. The value must be between 0.01 and 0.99.
`merge_alpha`	number	Significance level for merging. The value must be between 0.01 and 0.99.
`bonferroni_adjustment`	flag	Adjust significance values by using the Bonferroni method.
`effect_size_threshold_cont`	number	Effect size threshold for splitting nodes and merging categories when the target is continuous. The value must be between 0.01 and 0.99.
`effect_size_threshold_cat`	number	Effect size threshold for splitting nodes and merging categories when the target is categorical. The value must be between 0.01 and 0.99.
`split_merged_categories`	flag	Allow resplitting of merged categories.
`grouping_sig_level`	number	Used to determine how groups of nodes are formed or how unusual nodes are identified.
`chi_square`	`pearson` `likelihood_ratio`	Method used to calculate the chi-square statistic: Pearson or likelihood ratio.
`minimum_record_use`	`use_percentage` `use_absolute`	Whether the minimum branch sizes are specified as percentages (`min_parent_records_pc`, `min_child_records_pc`) or as absolute record counts (`min_parent_records_abs`, `min_child_records_abs`).
`min_parent_records_pc`	number	The default value is 2. Minimum 1, maximum 100, in increments of 1. The parent branch value must be higher than the child branch value.
`min_child_records_pc`	number	The default value is 1. Minimum 1, maximum 100, in increments of 1.
`min_parent_records_abs`	number	The default value is 100. Minimum 1, maximum 100, in increments of 1. The parent branch value must be higher than the child branch value.
`min_child_records_abs`	number	The default value is 50. Minimum 1, maximum 100, in increments of 1.
`epsilon`	number	Minimum change in expected cell frequencies.
`max_iterations`	number	Maximum iterations for convergence.
`use_costs`	flag	Whether to use the misclassification costs specified in `costs`.
`costs`	structured	Structured property. The format is a list of entries, each of which is a list of three values: the actual value, the predicted value, and the cost if that prediction is wrong. For example: `tree.setPropertyValue("costs", [["drugA", "drugB", 3.0], ["drugX", "drugY", 4.0]])`
`default_cost_increase`	`none` `linear` `square` `custom`	Only enabled for ordinal targets. Set default values in the costs matrix.
`calculate_conf`	flag
`display_rule_id`	flag	Adds a field in the scoring output that indicates the ID for the terminal node to which each record is assigned.
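Several rows in Table 1 state allowed values or ranges. The following is a minimal sketch of a client-side check that a flow script might run before assigning values. The helpers `validate_treeas_properties` and `apply_properties` are hypothetical, encode only the constraints listed in Table 1, and the node itself remains the authority on what it accepts.

```python
from typing import Any, Dict, List

# Allowed values and ranges transcribed from Table 1 (hypothetical helper, not an IBM API).
ENUMS = {
    "method": {"chaid", "exhaustive_chaid"},
    "chi_square": {"pearson", "likelihood_ratio"},
    "minimum_record_use": {"use_percentage", "use_absolute"},
    "default_cost_increase": {"none", "linear", "square", "custom"},
    "num_bins": {2, 4, 5, 10, 20, 25, 50, 100},
}
RANGES = {
    "max_depth": (0, 20),
    "split_alpha": (0.01, 0.99),
    "merge_alpha": (0.01, 0.99),
    "effect_size_threshold_cont": (0.01, 0.99),
    "effect_size_threshold_cat": (0.01, 0.99),
    "min_parent_records_pc": (1, 100),
    "min_child_records_pc": (1, 100),
    "min_parent_records_abs": (1, 100),  # range as listed in Table 1
    "min_child_records_abs": (1, 100),
}


def validate_treeas_properties(props: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means no documented rule was violated."""
    problems = []
    for name, allowed in ENUMS.items():
        if name in props and props[name] not in allowed:
            problems.append(f"{name}: {props[name]!r} is not one of {sorted(allowed)}")
    for name, (low, high) in RANGES.items():
        if name in props and not low <= props[name] <= high:
            problems.append(f"{name}: {props[name]!r} is outside [{low}, {high}]")
    # "Increments of 10,000" is interpreted here as "a multiple of 10,000".
    if "record_threshold" in props and props["record_threshold"] % 10_000 != 0:
        problems.append("record_threshold: must change in increments of 10,000")
    # The parent branch value must be higher than the child branch value.
    for suffix in ("pc", "abs"):
        parent = props.get(f"min_parent_records_{suffix}")
        child = props.get(f"min_child_records_{suffix}")
        if parent is not None and child is not None and parent <= child:
            problems.append(f"min_parent_records_{suffix} must be higher than min_child_records_{suffix}")
    return problems


def apply_properties(node: Any, props: Dict[str, Any]) -> None:
    """Validate, then assign each property with setPropertyValue."""
    problems = validate_treeas_properties(props)
    if problems:
        raise ValueError("; ".join(problems))
    for name, value in props.items():
        node.setPropertyValue(name, value)


settings = {"method": "exhaustive_chaid", "max_depth": 5, "split_alpha": 0.05,
            "min_parent_records_pc": 2, "min_child_records_pc": 1}
print(validate_treeas_properties(settings))  # []
```

Collecting all problems before raising, rather than stopping at the first one, makes it easier to fix a whole settings dictionary in one pass.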
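The `num_bins` property sets the number of equal-frequency bins for continuous inputs. As a hedged illustration of the general technique (not the node's actual binning code, whose tie handling and boundary rules are not documented here), the sketch assigns each value a bin by its rank so that every bin receives roughly the same number of records. `equal_frequency_bins` is a hypothetical name.

```python
from typing import List, Sequence

ALLOWED_NUM_BINS = (2, 4, 5, 10, 20, 25, 50, 100)  # values listed for num_bins in Table 1


def equal_frequency_bins(values: Sequence[float], num_bins: int) -> List[int]:
    """Return a bin index (0 .. num_bins - 1) for each value, assigned by rank.

    Ties are broken by original position, so equal values can land in
    different bins; a production binner would usually keep ties together.
    """
    if num_bins not in ALLOWED_NUM_BINS:
        raise ValueError(f"num_bins must be one of {ALLOWED_NUM_BINS}")
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # indices in ascending value order
    bins = [0] * n
    for rank, index in enumerate(order):
        bins[index] = rank * num_bins // n  # rank-proportional bin number
    return bins


print(equal_frequency_bins([3.2, 0.5, 9.9, 4.4, 7.1, 1.8, 6.0, 2.7, 8.3, 5.5], 5))
# [1, 0, 4, 2, 3, 0, 3, 1, 4, 2]
```

With 10 values and 5 bins, each bin receives exactly two values; when the record count is not a multiple of `num_bins`, bin sizes differ by at most one.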
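The `bonferroni_adjustment` flag adjusts significance values with the Bonferroni method. In the standard CHAID formulation, the multiplier counts the ways a predictor's original categories can be merged into the final groups: for an ordinal predictor it is a binomial coefficient, and for a nominal predictor it is a Stirling number of the second kind. The sketch below uses those textbook multipliers; whether Tree-AS computes exactly these values is an assumption, and the function names are hypothetical.

```python
from math import comb, factorial


def ordinal_multiplier(c: int, r: int) -> int:
    """Ways to merge c ordered categories into r contiguous groups: C(c - 1, r - 1)."""
    return comb(c - 1, r - 1)


def nominal_multiplier(c: int, r: int) -> int:
    """Ways to partition c unordered categories into r non-empty groups (Stirling number S(c, r))."""
    total = sum((-1) ** i * comb(r, i) * (r - i) ** c for i in range(r + 1))
    return total // factorial(r)


def bonferroni_adjust(p_value: float, multiplier: int) -> float:
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * multiplier)


# A predictor with 4 categories merged into 2 groups.
print(nominal_multiplier(4, 2))                           # 7
print(ordinal_multiplier(4, 2))                           # 3
print(bonferroni_adjust(0.01, nominal_multiplier(4, 2)))  # about 0.07
```

The adjusted p-value is then compared with `split_alpha`; a nominal predictor is penalized more than an ordinal one with the same number of categories because it can be merged in more ways.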
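The `costs` property takes a list of `[actual, predicted, cost]` entries. If misclassification costs are kept as a mapping, a small helper can flatten them into that structure before the value is assigned with `setPropertyValue`, as in the `costs` example in Table 1. The helpers `costs_property` and `set_costs` are hypothetical, and enabling `use_costs` in the same step is an assumption about how the flag is meant to be combined with `costs`.

```python
from typing import Any, Dict, List, Tuple

CostMatrix = Dict[Tuple[str, str], float]  # (actual, predicted) -> cost of that error


def costs_property(matrix: CostMatrix) -> List[List[Any]]:
    """Flatten a cost mapping into the [actual, predicted, cost] entries of the costs property.

    Diagonal entries (actual == predicted) are skipped because a correct
    prediction is not a misclassification. Output is sorted for a stable order.
    """
    return [
        [actual, predicted, float(cost)]
        for (actual, predicted), cost in sorted(matrix.items())
        if actual != predicted
    ]


def set_costs(node: Any, matrix: CostMatrix) -> None:
    """Enable costs and assign them on a Tree-AS node object from a flow script."""
    node.setPropertyValue("use_costs", True)  # assumption: costs apply only when this flag is set
    node.setPropertyValue("costs", costs_property(matrix))


print(costs_property({("drugA", "drugB"): 3, ("drugX", "drugY"): 4, ("drugA", "drugA"): 0}))
# [['drugA', 'drugB', 3.0], ['drugX', 'drugY', 4.0]]
```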