featureselectionnode properties

Last updated: Feb 11, 2025

The Feature Selection node screens input fields for removal based on a set of criteria (such as the percentage of missing values), then ranks the importance of the remaining inputs relative to a specified target. For example, given a data set with hundreds of potential inputs, which are most likely to be useful in modeling patient outcomes?

Example

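stream = modeler.script.stream()  # the stream that this script runs against
node = stream.create("featureselection", "My node")
# Screening: remove fields dominated by one category or with too many missing values
node.setPropertyValue("screen_single_category", True)
node.setPropertyValue("max_single_category", 95)
node.setPropertyValue("screen_missing_values", True)
node.setPropertyValue("max_missing_values", 80)
# Ranking: importance measure, thresholds, and a custom label
node.setPropertyValue("criteria", "Likelihood")
node.setPropertyValue("unimportant_below", 0.8)
node.setPropertyValue("important_above", 0.9)
node.setPropertyValue("important_label", "Check Me Out!")
# Selection: use TopN mode with a cutoff of 15
node.setPropertyValue("selection_mode", "TopN")
node.setPropertyValue("top_n", 15)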
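
The same pattern applies to the other selection modes. The following is a minimal sketch, not part of the Modeler API: the helper name `configure_importance_level` and its `keep_marginal` argument are hypothetical, and `node` is assumed to be a Feature Selection node such as the one created above. It uses only `setPropertyValue` and property names listed in Table 1.

```python
def configure_importance_level(node, keep_marginal=True):
    """Hypothetical helper: select fields by importance level.

    `node` is assumed to be a Feature Selection node, for example the
    object returned by stream.create("featureselection", ...).
    """
    settings = [
        ("selection_mode", "ImportanceLevel"),
        ("select_important", True),
        ("select_marginal", keep_marginal),
        ("select_unimportant", False),
    ]
    # Apply each property in order through the standard scripting call.
    for name, value in settings:
        node.setPropertyValue(name, value)
    return node
```

In a Modeler script this could be called as `configure_importance_level(node)`. Passing `keep_marginal=False` would keep only the fields ranked as important.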

Table 1. featureselectionnode properties
`featureselectionnode` Properties	Values	Property description
`target`	field	Feature Selection models rank predictors relative to the specified target. Weight and frequency fields are not used. See Common modeling node properties for more information.
`screen_single_category`	flag	If `True`, screens fields that have too many records falling into the same category relative to the total number of records.
`max_single_category`	number	Specifies the threshold used when `screen_single_category` is `True`.
`screen_missing_values`	flag	If `True`, screens fields with too many missing values, expressed as a percentage of the total number of records.
`max_missing_values`	number	Specifies the threshold used when `screen_missing_values` is `True`.
`screen_num_categories`	flag	If `True`, screens fields with too many categories relative to the total number of records.
`max_num_categories`	number	Specifies the threshold used when `screen_num_categories` is `True`.
`screen_std_dev`	flag	If `True`, screens fields with a standard deviation of less than or equal to the specified minimum.
`min_std_dev`	number	Specifies the threshold used when `screen_std_dev` is `True`.
`screen_coeff_of_var`	flag	If `True`, screens fields with a coefficient of variation less than or equal to the specified minimum.
`min_coeff_of_var`	number	Specifies the threshold used when `screen_coeff_of_var` is `True`.
`criteria`	`Pearson` `Likelihood` `CramersV` `Lambda`	When ranking categorical predictors against a categorical target, specifies the measure on which the importance value is based.
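`unimportant_below`	number	Together with `important_above`, specifies the threshold p values used to rank variables as important, marginal, or unimportant. This is the cutoff below which a field is ranked as unimportant. Accepts values from 0.0 to 1.0.
`important_above`	number	The cutoff above which a field is ranked as important. Accepts values from 0.0 to 1.0.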
`unimportant_label`	string	Specifies the label for the unimportant ranking.
`marginal_label`	string	Specifies the label for the marginal ranking.
`important_label`	string	Specifies the label for the important ranking.
`selection_mode`	`ImportanceLevel` `ImportanceValue` `TopN`	Specifies how fields are selected: by importance level, by importance value, or by top N ranking.
`select_important`	flag	When `selection_mode` is set to `ImportanceLevel`, specifies whether to select important fields.
`select_marginal`	flag	When `selection_mode` is set to `ImportanceLevel`, specifies whether to select marginal fields.
`select_unimportant`	flag	When `selection_mode` is set to `ImportanceLevel`, specifies whether to select unimportant fields.
`importance_value`	number	When `selection_mode` is set to `ImportanceValue`, specifies the cutoff value to use. Accepts values from 0 to 100.
`top_n`	integer	When `selection_mode` is set to `TopN`, specifies the cutoff value to use, that is, the number of top-ranked fields to select. Accepts values from 0 to 1000.
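
To make the relationship between `unimportant_below`, `important_above`, and the three labels concrete, the following plain-Python sketch assigns a label to an importance value. It is an illustration under stated assumptions, not Modeler's implementation: the function name `rank_label` is hypothetical, the importance value is assumed to be 1 - p on a 0.0 to 1.0 scale, the default label strings are placeholders, and treating a value that falls exactly on a threshold as marginal is an assumption. The default thresholds are the ones used in the example on this page.

```python
def rank_label(importance, unimportant_below=0.8, important_above=0.9,
               labels=("Unimportant", "Marginal", "Important")):
    """Hypothetical helper: map an importance value (0.0 to 1.0) to a label."""
    # Both thresholds are documented as accepting values from 0.0 to 1.0.
    for name, value in (("importance", importance),
                        ("unimportant_below", unimportant_below),
                        ("important_above", important_above)):
        if not 0.0 <= value <= 1.0:
            raise ValueError("{0} must be between 0.0 and 1.0".format(name))
    if unimportant_below > important_above:
        raise ValueError("unimportant_below must not exceed important_above")

    unimportant, marginal, important = labels
    if importance < unimportant_below:
        return unimportant
    if importance > important_above:
        return important
    # Assumption: values on or between the two thresholds count as marginal.
    return marginal
```

With the thresholds from the example (0.8 and 0.9), `rank_label(0.95)` returns the important label and `rank_label(0.85)` returns the marginal label. A custom label such as `"Check Me Out!"` can be supplied through the `labels` argument.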
