Handling records with system missing values
What are system missing values?
System missing values represent data values that aren't known or not applicable. In databases, these values are often referred to as NULL values. System missing values are different from blank values. Blank values are typically defined in the Type node as particular values, or ranges of values, that can be regarded as user-defined-missing. Blank values are handled differently in the context of modeling.
Constructing system missing values
System missing values might be present in data that's read from a data source (for example,
database tables might contain NULL values). System missing values can be constructed by
using the value undef
in expressions. For example, the following CLEM expression
returns the Age
, if less than or equal to 30
, or a missing value
if greater than 30
:
if Age > 30 then undef else Age endif
Missing values can also be created when an outer join is carried out, when a number is divided by zero, when the square root of a negative number is computed, and in other situations.
Displaying system missing values
System missing values are displayed in tables and other output as $null$
.
Testing for system missing values
Use the special function @NULL to return true
if the argument
value is a system missing value. For example:
if @NULL(MyFieldName) then 'It is null' else 'It is not null' endif
System missing values passed to functions
System missing values that are passed to functions usually propagate missing values to the
output. For example, if the value of field f1
is a system missing value in a
particular row, then the expression log(f1)
also evaluates to a system missing
value for that row. An exception is the @NULL function.
System missing values in expressions that involve arithmetic operators
Applying arithmetic operators to values that include a system missing value results in a system
missing value. For example, if the value of field f1
is a system missing value in a
particular row, then the expression f1 + 10
also evaluates to a system missing
value for that row.
System missing values in expressions that involve logical operators
When you work with system missing values in expressions that involve logical operators, the rules of three-valued logic (true, false, and missing) apply and can be described in truth tables. The truth tables for the common logical operators of not, and, and or are shown in the following tables.
Operand | NOT Operand |
---|---|
true | false |
false | true |
missing | missing |
Operand1 | Operand2 | Operand1 AND Operand2 |
---|---|---|
true | true | true |
true | false | false |
true | missing | missing |
false | true | false |
false | false | false |
false | missing | false |
missing | true | missing |
missing | false | false |
missing | missing | missing |
Operand1 | Operand2 | Operand1 OR Operand2 |
---|---|---|
true | true | true |
true | false | true |
true | missing | true |
false | true | true |
false | false | false |
false | missing | missing |
missing | true | true |
missing | false | missing |
missing | missing | missing |
System missing values in expressions that involve comparison operators
When you compare a system missing value and a non-system-missing value, the outcome evaluates to a system missing value rather than a true or false result. System missing values can be compared with each other; two system missing values are considered to be equal.
System missing values in if/then/else/endif expressions
When you use conditional expressions, and the conditional expression returns a system missing
value, the value from the else
clause is returned from the conditional
expression.
System missing values in the Select node
When, for a particular record, the selection expression evaluates to a missing value, the record is not output from the Select node (this action applies to both Include and Discard modes).
System missing values in the Merge node
When you merge by using a key, any records that have system missing values in a key field are not merged.
System missing values in aggregation
When aggregating data on columns, missing values are not included in the calculation. For
example, in a column with three values { 1
, 2
, and
undef
}, the sum of the values in the column is computed as 3
; the
mean value is computed as 1.5
.