This section describes how to set up the data model attributes based on pyspark.sql.StructField.
spss.datamodel.Role Objects
This class enumerates valid roles for each field in a data model.
BOTH
: Indicates that this field can be either an antecedent or a consequent.
FREQWEIGHT
: Indicates that this field is to be used as a frequency weight; it is not
displayed to the user.
INPUT
: Indicates that this field is a predictor or an antecedent.
NONE
: Indicates that this field is not used directly during modeling.
TARGET
: Indicates that this field is predicted or a consequent.
PARTITION
: Indicates that this field identifies the data partition.
RECORDID
: Indicates that this field identifies the record ID.
SPLIT
: Indicates that this field splits the data.
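The roles above fall into two groups: INPUT, TARGET, and BOTH decide whether a field is a predictor or a predicted value, while FREQWEIGHT, PARTITION, RECORDID, and SPLIT give a field a special job and NONE keeps it out of modeling. The sketch below shows one way to read the first group, with a BOTH field eligible on either side. The helper split_by_role is hypothetical and not part of the spss.datamodel API, and plain role-name strings stand in for the Role members so the example runs outside SPSS Modeler.

```python
# Plain strings stand in for spss.datamodel.Role members (an assumption
# so this sketch runs without SPSS Modeler installed).
INPUT_ROLES = {"INPUT", "BOTH"}   # predictors / antecedents
TARGET_ROLES = {"TARGET", "BOTH"}  # predicted fields / consequents


def split_by_role(field_roles):
    """Split a {field_name: role_name} mapping into (inputs, targets).

    BOTH fields appear in both lists. NONE, FREQWEIGHT, PARTITION,
    RECORDID and SPLIT fields appear in neither, because they are not
    predictors or predicted values.
    """
    inputs = [name for name, role in field_roles.items() if role in INPUT_ROLES]
    targets = [name for name, role in field_roles.items() if role in TARGET_ROLES]
    return inputs, targets


roles = {"age": "INPUT", "churn": "TARGET", "basket": "BOTH", "id": "RECORDID"}
inputs, targets = split_by_role(roles)
print(inputs)   # ['age', 'basket']
print(targets)  # ['churn', 'basket']
```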
spss.datamodel.Measure Objects
This class enumerates measurement levels for fields in a data model.
UNKNOWN
: Indicates that the measure type is unknown.
CONTINUOUS
: Indicates that the measure type is continuous.
NOMINAL
: Indicates that the measure type is nominal.
FLAG
: Indicates that the field value is one of two values.
DISCRETE
: Indicates that the field value should be interpreted as a collection
of values.
ORDINAL
: Indicates that the measure type is ordinal.
TYPELESS
: Indicates that the field can have any value compatible with its
storage.
pyspark.sql.StructField Objects
A StructField object represents a field in a StructType. It comprises four fields:
name (string)
: The name of the StructField.
dataType (pyspark.sql.DataType)
: The specific data type.
nullable (bool)
: Whether the values of the StructField can contain None values.
metadata (dictionary)
: A Python dictionary that stores the optional attributes under the following keys:
measure
: The keyword for the measure attribute.
role
: The keyword for the role attribute.
displayLabel
: The keyword for the label attribute.
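The following example attaches all three attributes to a non-nullable string field:

```python
from pyspark.sql.types import StructField, StringType
from spss.datamodel.Role import Role
from spss.datamodel.Measure import Measure

# Collect the SPSS data model attributes in the metadata dictionary
_metadata = {}
_metadata['measure'] = Measure.TYPELESS
_metadata['role'] = Role.NONE
_metadata['displayLabel'] = "field label description"

# Attach the attributes to a string field that may not contain None values
StructField("userName", StringType(), nullable=False,
            metadata=_metadata)
```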
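Because measure, role, and displayLabel are ordinary dictionary keys, the metadata can also be built with a small helper that sets only the attributes you supply. The function build_field_metadata below is a hypothetical convenience, not part of SPSS Modeler or PySpark, and it assumes that any of the three attributes may be omitted.

```python
def build_field_metadata(measure=None, role=None, display_label=None):
    """Return a StructField metadata dict containing only the given attributes.

    Hypothetical helper: the keys 'measure', 'role' and 'displayLabel'
    are the ones described above; attributes left as None are omitted.
    """
    metadata = {}
    if measure is not None:
        metadata["measure"] = measure
    if role is not None:
        metadata["role"] = role
    if display_label is not None:
        metadata["displayLabel"] = display_label
    return metadata


# Strings stand in for Measure/Role members so this runs without SPSS Modeler.
print(build_field_metadata(role="TARGET", display_label="Customer churned"))
# {'role': 'TARGET', 'displayLabel': 'Customer churned'}
```

Inside SPSS Modeler you would pass the enumeration members instead, for example StructField("age", IntegerType(), True, build_field_metadata(Measure.CONTINUOUS, Role.INPUT, "Age in years")), with IntegerType imported from pyspark.sql.types.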