binningnode properties | IBM Cloud Pak for Data as a Service

binningnode properties

The Binning node automatically creates new nominal (set) fields based on the values of one or more existing continuous (numeric range) fields. For example, you can transform a continuous income field into a new categorical field containing groups of income as deviations from the mean. After you create bins for the new field, you can generate a Derive node based on the cut points.
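To make the "deviations from the mean" idea concrete, the following is a minimal sketch in plain Python 3 that runs outside the product. It is an illustration under stated assumptions, not the node's own algorithm: the function names `sdev_cut_points` and `assign_bins` are hypothetical, the population standard deviation is used, and a value equal to a cut point is placed in the upper bin.

```python
from bisect import bisect_right
from statistics import mean, pstdev


def sdev_cut_points(values, count=1):
    """Return cut points at mean +/- k standard deviations, k = 1..count.

    Assumption: population standard deviation; the real node may differ.
    """
    m = mean(values)
    s = pstdev(values)
    offsets = [k for k in range(-count, count + 1) if k != 0]
    return [m + k * s for k in offsets]


def assign_bins(values, cut_points):
    """Map each value to a 1-based bin index given sorted cut points.

    Assumption: a value equal to a cut point goes to the upper bin.
    """
    return [bisect_right(cut_points, v) + 1 for v in values]


incomes = [1, 2, 3, 4, 5]
cuts = sdev_cut_points(incomes, count=1)  # two cut points, three bins
bins = assign_bins(incomes, cuts)
```

In this sketch, `count=1` produces two cut points and therefore three bins; `count=2` and `count=3` produce five and seven bins.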

Example

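# Get the current stream from the scripting context
stream = modeler.script.stream()
node = stream.create("binning", "My node")
# Fields to bin and the binning method
node.setPropertyValue("fields", ["Na", "K"])
node.setPropertyValue("method", "Rank")
# Fixed-width settings (relevant when method is "FixedWidth")
node.setPropertyValue("fixed_width_name_extension", "_binned")
node.setPropertyValue("fixed_width_add_as", "Suffix")
node.setPropertyValue("fixed_bin_method", "Count")
node.setPropertyValue("fixed_bin_count", 10)
node.setPropertyValue("fixed_bin_width", 3.5)
# Equal-count (tile) setting (relevant when method is "EqualCount")
node.setPropertyValue("tile10", True)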
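The example above sets `method` to `Rank` while also setting fixed-width and tile properties. When only one method is wanted, it can be tidier to group the properties that belong to it. The following is a minimal sketch, assuming a node object that exposes `setPropertyValue(name, value)` as in the example; the helper `set_properties` and the constant `FIXED_WIDTH_BY_COUNT` are hypothetical names introduced for illustration, and the property names and values come from Table 1.

```python
def set_properties(node, properties):
    """Apply (name, value) pairs to a node in the order given.

    `node` is any object with a setPropertyValue(name, value) method,
    such as the node returned by stream.create(...) in a script.
    """
    for name, value in properties:
        node.setPropertyValue(name, value)
    return node


# Properties for fixed-width binning by bin count (names from Table 1).
# A list of pairs keeps the order explicit.
FIXED_WIDTH_BY_COUNT = [
    ("fields", ["Na", "K"]),
    ("method", "FixedWidth"),
    ("fixed_width_name_extension", "_binned"),
    ("fixed_width_add_as", "Suffix"),
    ("fixed_bin_method", "Count"),
    ("fixed_bin_count", 10),
]

# Inside a stream script (requires the product's scripting context):
# stream = modeler.script.stream()
# node = stream.create("binning", "Fixed-width bins")
# set_properties(node, FIXED_WIDTH_BY_COUNT)
```

The commented lines show the intended use in a script; they are left as comments because the `modeler` object exists only in the scripting context.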

Table 1. binningnode properties
`binningnode` properties	Data type	Property description
`fields`	[field1 field2 ... fieldn]	Continuous (numeric range) fields pending transformation. You can bin multiple fields simultaneously.
`method`	`FixedWidth` `EqualCount` `Rank` `SDev` `Optimal`	Method used for determining cut points for new field bins (categories).
`recalculate_bins`	`Always` `IfNecessary`	Specifies whether the bins are recalculated and the data reassigned to the relevant bin every time the node is executed (`Always`), or whether data is added only to existing bins and to any new bins that have been added (`IfNecessary`).
`fixed_width_name_extension`	string	The default extension is _BIN.
`fixed_width_add_as`	`Suffix` `Prefix`	Specifies whether the extension is added to the end (suffix) of the field name or to the start (prefix). For example, adding the default extension as a suffix to a field named income yields income_BIN.
`fixed_bin_method`	`Width` `Count`
`fixed_bin_count`	integer	Specifies an integer used to determine the number of fixed-width bins (categories) for the new field(s).
`fixed_bin_width`	real	Value (integer or real) for calculating width of the bin.
`equal_count_name_extension`	string	The default extension is _TILE.
`equal_count_add_as`	`Suffix` `Prefix`	Specifies whether the extension is added as a suffix or a prefix to field names generated by standard p-tiles. The default extension is _TILE plus N, where N is the tile number.
`tile4`	flag	Generates four quartile bins, each containing 25% of cases.
`tile5`	flag	Generates five quintile bins.
`tile10`	flag	Generates 10 decile bins.
`tile20`	flag	Generates 20 vingtile bins.
`tile100`	flag	Generates 100 percentile bins.
`use_custom_tile`	flag
`custom_tile_name_extension`	string	The default extension is _TILEN (that is, _TILE plus N, where N is the tile number).
`custom_tile_add_as`	`Suffix` `Prefix`
`custom_tile`	integer
`equal_count_method`	`RecordCount` `ValueSum`	The `RecordCount` method seeks to assign an equal number of records to each bin, while `ValueSum` assigns records so that the sum of the values in each bin is equal.
`tied_values_method`	`Next` `Current` `Random`	Specifies the bin in which tied values are placed.
`rank_order`	`Ascending` `Descending`	Specifies the ranking order: `Ascending` (lowest value is marked 1) or `Descending` (highest value is marked 1).
`rank_add_as`	`Suffix` `Prefix`	Specifies whether the extension is added as a suffix or a prefix. This option applies to rank, fractional rank, and percentage rank.
`rank`	flag
`rank_name_extension`	string	The default extension is _RANK.
`rank_fractional`	flag	Ranks cases so that the value of the new field equals the rank divided by the sum of the weights of the nonmissing cases. Fractional ranks fall in the range of 0–1.
`rank_fractional_name_extension`	string	The default extension is _F_RANK.
`rank_pct`	flag	Each rank is divided by the number of records with valid values and multiplied by 100. Percentage fractional ranks fall in the range of 1–100.
`rank_pct_name_extension`	string	The default extension is _P_RANK.
`sdev_name_extension`	string
`sdev_add_as`	`Suffix` `Prefix`
`sdev_count`	`One` `Two` `Three`
`optimal_name_extension`	string	The default extension is _OPTIMAL.
`optimal_add_as`	`Suffix` `Prefix`
`optimal_supervisor_field`	field	Field chosen as the supervisory field to which the fields selected for binning are related.
`optimal_merge_bins`	flag	Specifies that any bins with small case counts will be added to a larger, neighboring bin.
`optimal_small_bin_threshold`	integer
`optimal_pre_bin`	flag	Indicates that prebinning of the dataset is to take place.
`optimal_max_bins`	integer	Specifies an upper limit to avoid creating an inordinately large number of bins.
`optimal_lower_end_point`	`Inclusive` `Exclusive`
`optimal_first_bin`	`Unbounded` `Bounded`
`optimal_last_bin`	`Unbounded` `Bounded`
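The descriptions of `rank`, `rank_fractional`, and `rank_pct` in Table 1 can be illustrated with a small calculation. The following is a minimal sketch in plain Python that runs outside the product, not the node's implementation. It assumes unweighted data (so the sum of the weights equals the number of nonmissing cases), distinct values (tie handling as selected by `tied_values_method` is not modeled), and `None` as the missing-value marker; the function name `rank_fields` is hypothetical.

```python
def rank_fields(values, descending=False):
    """Return (rank, fractional rank, percentage rank) for each value.

    Missing values (None) yield (None, None, None).
    Assumptions: unweighted cases and distinct values (no ties).
    """
    valid = [v for v in values if v is not None]
    n = len(valid)
    ordered = sorted(valid, reverse=descending)
    # Rank 1 goes to the lowest value (ascending) or highest (descending).
    rank_of = {v: i + 1 for i, v in enumerate(ordered)}

    rows = []
    for v in values:
        if v is None:
            rows.append((None, None, None))
            continue
        r = rank_of[v]
        fractional = r / float(n)   # rank / number of nonmissing cases
        percentage = 100.0 * r / n  # rank / valid records * 100
        rows.append((r, fractional, percentage))
    return rows


rows = rank_fields([10, 30, 20, None, 40])
# rows[0] is (1, 0.25, 25.0): the lowest value has rank 1 of 4 valid cases
```

With four valid cases, the ascending ranks 1 through 4 map to fractional ranks 0.25 through 1.0 and percentage ranks 25 through 100.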