Setting options for values
The Value mode column under the Type node settings displays a drop-down list of predefined values. Choosing the Specify option on this list and then clicking the gear icon opens a new screen where you can set options for reading, specifying, labeling, and handling values for the selected field.
Many of the controls are common to all types of data. These common controls are discussed here.
Measure. Displays the currently selected measurement level. You can change this setting to reflect the way that you intend to use data. For instance, if a field called day_of_week contains numbers that represent individual days, you might want to change this to nominal data in order to create a distribution node that examines each category individually.
Role. Used to tell modeling nodes whether fields will be Input (predictor fields) or Target (predicted fields) for a machine-learning process. Other roles are also available, such as Both, None, Partition, Split, Frequency, or Record ID.
- Read. Select to read values when the node runs.
- Pass. Select not to read data for the current field.
- Specify. Options here are used to specify values and labels for the selected field. Use this option together with value checking to specify values that are based on your knowledge of the current field. This option activates unique controls for each type of field. You can't specify values or labels for a field whose measurement level is Typeless.
- Extend. Select to append the current data with the values that you enter here. For example, if field_1 has a range from (0,10) and you enter a range of values from (8,16), the range is extended by adding the 16 without removing the original minimum. The new range would be (0,16).
- Current. Select to keep the current data values.
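The Extend behavior described above can be illustrated with a small sketch. This is plain Python, not the product's scripting API; the function name extend_range is hypothetical, and the assumption that a lower entered minimum would also widen the range is an inference from the (0,10) and (8,16) example rather than something the text states.

```python
def extend_range(current, entered):
    """Return a range that covers both the current and the entered bounds.

    Hypothetical helper: mirrors the documented example, where the
    original minimum is kept and the larger maximum is added.
    """
    cur_lo, cur_hi = current
    new_lo, new_hi = entered
    # Assumption: bounds only ever widen, they never shrink.
    return (min(cur_lo, new_lo), max(cur_hi, new_hi))


extended = extend_range((0, 10), (8, 16))
print(extended)  # (0, 16)
```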
Value Labels (Add/Edit Labels). In this section you can enter custom labels for each value of the selected field.
Max list length. Only available for data with a measurement level of either Geospatial or Collection. Set the maximum length of the list by specifying the number of elements the list can contain.
Max string length. Only available for typeless data. Use this field when you're generating SQL to create a table. Enter the value of the largest string in your data; this generates a column in the table that's big enough for the string. If the string length value is not available, a default string size is used that may not be appropriate for the data (for example, if the value is too small, errors can occur when writing data to the table; too large a value could adversely affect performance).
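As a rough illustration of why the largest string length matters when generating SQL, the following sketch sizes a column from sample data. It is plain Python and does not reflect how the product generates SQL; the helper name varchar_column and the fallback size of 255 are assumptions made for this example only.

```python
def varchar_column(name, values, default_size=255):
    """Build a column definition sized to the longest string in `values`.

    Hypothetical helper: `default_size` stands in for whatever default
    is used when no string length is available.
    """
    lengths = [len(v) for v in values if isinstance(v, str)]
    size = max(lengths) if lengths else default_size
    return f"{name} VARCHAR({size})"


print(varchar_column("city", ["Oslo", "Amsterdam"]))  # city VARCHAR(9)
```

A column sized this way is large enough for every value in the sample; a size taken from a default instead may be too small (write errors) or needlessly large.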
Check. Select a method of coercing values to conform to the specified continuous, flag, or nominal values. This option corresponds to the Check column in the main Type node settings, and a selection made here overrides the selection in the main settings. Used with the options for specifying values and labels, value checking allows you to make values in the data conform to expected values. For example, if you specify values as 1, 0 and then use the Discard option here, you can discard all records with values other than 1 or 0.
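The effect of the Discard example above can be sketched in plain Python. This is only an illustration of the filtering logic, not the product's implementation; the function name discard_unexpected and the sample records are hypothetical.

```python
def discard_unexpected(records, field, allowed):
    """Keep only the records whose value for `field` is one of `allowed`."""
    allowed_set = set(allowed)
    return [r for r in records if r[field] in allowed_set]


# Hypothetical sample data: the specified values for "flag" are 1 and 0.
records = [
    {"id": 1, "flag": 1},
    {"id": 2, "flag": 0},
    {"id": 3, "flag": 2},     # not a specified value, so it is discarded
    {"id": 4, "flag": None},  # not a specified value, so it is discarded
]

kept = discard_unexpected(records, "flag", [1, 0])
print([r["id"] for r in kept])  # [1, 2]
```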
- Missing values. Use this field to define specific values (such as 99 or 0) as blanks. The value should be appropriate for the storage type of the field.
- Range. Used to specify a range of missing values (such as ages 1–17 or greater than 65). If a bound value is blank, then the range is unbounded. For example, if you specify a lower bound of 100 with no upper bound, then all values greater than or equal to 100 are defined as missing. The bound values are inclusive. For example, a range with a lower bound of 5 and an upper bound of 10 includes 5 and 10 in the range definition. You can define a missing value range for any storage type, including date/time and string (in which case the alphabetic sort order is used to determine whether a value is within the range).
- Null/White space. You can also specify system nulls (displayed in the data as $null$) and white space (string values with no visible characters) as blanks. Note that the Type node also treats empty strings as white space for purposes of analysis, although they are stored differently internally and may be handled differently in certain cases.
$null$, use the Filler node.
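The missing-value rules above (specific values, inclusive ranges with optional bounds, and nulls or white space) can be summarized in a short sketch. This is plain Python written for illustration, not the product's logic; the function name is_blank and its parameters are hypothetical, and Python's None stands in for a system null.

```python
def is_blank(value, missing_values=(), lower=None, upper=None,
             null_is_blank=True, whitespace_is_blank=True):
    """Decide whether `value` counts as a blank under the given definitions.

    A bound of None means that side of the range is unbounded.
    Bounds are inclusive. Strings compare by Python's string ordering,
    used here as a stand-in for alphabetic sort order.
    """
    if value is None:
        return null_is_blank
    # Empty strings are treated the same as white space here.
    if isinstance(value, str) and value.strip() == "":
        return whitespace_is_blank
    if value in missing_values:
        return True
    if lower is None and upper is None:
        return False  # no range defined
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True


print(is_blank(99, missing_values=(99, 0)))  # True
print(is_blank(100, lower=100))              # True
print(is_blank(10, lower=5, upper=10))       # True
print(is_blank(11, lower=5, upper=10))       # False
```

The bounds and the value are expected to share a storage type; comparing, say, a string value against numeric bounds would raise an error in this sketch.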