randomtrees properties | IBM Cloud Pak for Data as a Service

randomtrees properties

The Random Trees node is similar to the C&RT Tree node; however, the Random Trees node is designed to process big data to create a single tree. The Random Trees node generates a decision tree that you use to predict or classify future observations. The method uses recursive partitioning to split the training records into segments by minimizing the impurity at each step, where a node in the tree is considered pure if 100% of cases in the node fall into a specific category of the target field. Target and input fields can be numeric ranges or categorical (nominal, ordinal, or flags); all splits are binary (only two subgroups).
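The notion of impurity can be illustrated with a small worked example. The following is a minimal sketch that assumes Gini impurity as the measure; this page does not state which impurity measure the node uses, and the function names and sample labels are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 minus the sum of squared class proportions (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_split_impurity(left, right):
    """Impurity of a binary split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


# Hypothetical target values for four training records.
parent = ["drugA", "drugA", "drugB", "drugB"]

print(gini_impurity(parent))  # 0.5 (evenly mixed node)
print(weighted_split_impurity(["drugA", "drugA"], ["drugB", "drugB"]))  # 0.0 (both children pure)
```

In this sketch, a split that separates the two categories completely drives the weighted impurity from 0.5 to 0.0, which is the kind of split that recursive partitioning prefers at each step.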

Table 1. randomtrees properties
`randomtrees` Properties	Values	Property description
`target`	field	In the Random Trees node, models require a single target and one or more input fields. A frequency field can also be specified. See Common modeling node properties for more information.
`number_of_models`	integer	Determines the number of models to build as part of the ensemble modeling.
`use_number_of_predictors`	flag	Determines whether `number_of_predictors` is used.
`number_of_predictors`	integer	Specifies the number of predictors to be used when building split models.
`use_stop_rule_for_accuracy`	flag	Determines whether model building stops when accuracy can't be improved.
`sample_size`	number	Reduce this value to improve performance when processing very large datasets.
`handle_imbalanced_data`	flag	If the target of the model is a particular flag outcome, and the ratio of the desired outcome to a non-desired outcome is very small, then the data is imbalanced and the bootstrap sampling that's conducted by the model may affect the model's accuracy. Enable imbalanced data handling so that the model will capture a larger proportion of the desired outcome and generate a stronger model.
`use_weighted_sampling`	flag	When False, variables for each node are randomly selected with the same probability. When True, variables are weighted and selected accordingly.
`max_node_number`	integer	Maximum number of nodes allowed in individual trees. If the number would be exceeded on the next split, tree growth halts.
`max_depth`	integer	Maximum tree depth before growth halts.
`min_child_node_size`	integer	Determines the minimum number of records allowed in a child node after the parent node is split. If a child node would contain fewer records than specified here, the parent node won't be split.
`use_costs`	flag
`costs`	structured	Structured property. The format is a list of 3 values: the actual value, the predicted value, and the cost if that prediction is wrong. For example: `tree.setPropertyValue("costs", [["drugA", "drugB", 3.0], ["drugX", "drugY", 4.0]])`
`default_cost_increase`	`none` `linear` `square` `custom`	Sets the default values in the costs matrix. This property is enabled only for ordinal targets.
`max_pct_missing`	integer	If the percentage of missing values in any input is greater than the value specified here, the input is excluded. Minimum 0, maximum 100.
`exclude_single_cat_pct`	integer	If one category value represents a higher percentage of the records than specified here, the entire field is excluded from model building. Minimum 1, maximum 99.
`max_category_number`	integer	If the number of categories in a field exceeds this value, the field is excluded from model building. Minimum 2.
`min_field_variation`	number	If the coefficient of variation of a continuous field is smaller than this value, the field is excluded from model building.
`num_bins`	integer	Used only if the data is made up of continuous inputs. Sets the number of equal-frequency bins to be used for the inputs; options are: 2, 4, 5, 10, 20, 25, 50, or 100.
`topN`	integer	Specifies the number of rules to report. Default value is 50, with a minimum of 1 and a maximum of 1000.
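Several properties in Table 1 have stated limits, so it can be useful to check values before setting them. The following is a minimal sketch, not an official API: `check_randomtrees_properties`, `apply_properties`, and `FakeNode` are hypothetical helpers, and the field name `Drug` is an assumed example. Only `setPropertyValue` and the property names come from this page. In a real script, `tree` would be the Random Trees node object rather than the stand-in.

```python
# Allowed values and bounds, as listed in Table 1.
ALLOWED_NUM_BINS = {2, 4, 5, 10, 20, 25, 50, 100}

# (minimum, maximum); None means Table 1 states no bound on that side.
RANGES = {
    "max_pct_missing": (0, 100),
    "exclude_single_cat_pct": (1, 99),
    "max_category_number": (2, None),
    "topN": (1, 1000),
}


def check_randomtrees_properties(props):
    """Raise ValueError if a value falls outside the limits in Table 1."""
    for name, value in props.items():
        if name == "num_bins" and value not in ALLOWED_NUM_BINS:
            raise ValueError("num_bins must be one of %s" % sorted(ALLOWED_NUM_BINS))
        if name in RANGES:
            low, high = RANGES[name]
            if value < low or (high is not None and value > high):
                raise ValueError("%s is out of range: %r" % (name, value))
    return props


def apply_properties(node, props):
    """Validate the settings, then set each one with setPropertyValue."""
    for name, value in check_randomtrees_properties(props).items():
        node.setPropertyValue(name, value)


class FakeNode(object):
    """Hypothetical stand-in so the sketch runs outside the product."""

    def __init__(self):
        self.values = {}

    def setPropertyValue(self, name, value):
        self.values[name] = value


settings = {
    "target": "Drug",  # assumed field name
    "number_of_models": 10,
    "max_depth": 10,
    "min_child_node_size": 5,
    "use_costs": True,
    # Each entry: actual value, predicted value, cost of that wrong prediction.
    "costs": [["drugA", "drugB", 3.0], ["drugX", "drugY", 4.0]],
    "max_pct_missing": 70,
    "num_bins": 10,
    "topN": 50,
}

tree = FakeNode()
apply_properties(tree, settings)
```

Validating first means an out-of-range value such as `topN` set to 0 is rejected before any property on the node is changed.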