You can invoke native Python APIs from your scripts to interact with SPSS Modeler.
The following APIs are supported.
To see an example, download the sample stream python-extension-str.zip and import it into SPSS Modeler from the Assets tab. Then open the Extension node properties in the flow to see example syntax.
APIs for data models
modelerpy.isComputeDataModelOnly()
You can use this API to check whether the current run computes the output data or only the output data model. When it returns true, your script must not perform any task that depends on input or output data; otherwise, the run fails.
modelerpy.getDataModel()
This API contacts SPSS Modeler to get the data model for an input dataset. The return value is an instance of class DataModel, which describes the metadata of the input dataset, including the field count, field names, field storage types, and so on.
modelerpy.setOutputDataModel(dataModel)
This API sends an instance of class DataModel back to SPSS Modeler, and must be invoked before your script passes a dataset to SPSS Modeler. SPSS Modeler uses the metadata that is described in this DataModel instance to handle your data on the SPSS Modeler side.
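The three data model calls are usually combined in one place at the top of a script. The following is a minimal sketch under stated assumptions, not reference code from the product: declare_output_model is a hypothetical helper, the modelerpy module is passed in as the parameter mp so that the snippet can be read and exercised outside SPSS Modeler, and the field name Score is made up for illustration.

```python
def declare_output_model(mp):
    """Declare the output data model; `mp` is the modelerpy module.

    Hypothetical helper: it starts from the input data model and appends
    one illustrative field named "Score" (an assumed name).
    """
    data_model = mp.getDataModel()  # metadata of the input dataset
    data_model.addField(mp.Field("Score", "real", measure="continuous"))
    # The output data model must be sent before any dataset is written back.
    mp.setOutputDataModel(data_model)
    # True means that only the data model is wanted in this run, so the
    # caller must not read or write any data afterwards.
    return mp.isComputeDataModelOnly()
```

Inside an Extension node, the call would presumably be declare_output_model(modelerpy) after import modelerpy, with all data access skipped when it returns True.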
APIs for modeling
modelerpy.saveModel(model, name='model')
This API transforms a Python model into an SPSS Modeler model, which SPSS Modeler then saves. The saved model is copied to a generated model nugget. Invoke this API from a modeling node when a Python model is built.
modelerpy.loadModel(name='model')
This API loads an SPSS Modeler saved model and creates a Python object for the saved model. Invoke this API from the model nugget to load the saved model for further processing, such as scoring.
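The split between the modeling node and the model nugget can be sketched as two small functions. This is an illustration only: both function names are hypothetical, the modelerpy module is passed in as mp so that the snippet stands on its own, and the predict() call assumes a scikit-learn style model, which the text above does not require.

```python
def save_trained_model(mp, model, name="model"):
    """Modeling-node side: hand a fitted Python model to SPSS Modeler."""
    mp.saveModel(model, name=name)


def score_with_saved_model(mp, rows, name="model"):
    """Model-nugget side: reload the saved model and score `rows`."""
    model = mp.loadModel(name=name)
    # Assumption: the saved object exposes a predict() method.
    return model.predict(rows)
```

Using the same name value on both sides is what ties the nugget to the model that the modeling node saved.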
APIs for input/output datasets
modelerpy.readPandasDataframe()
This API reads a dataset from SPSS Modeler to Python. The return value is a Python Pandas DataFrame (a two-dimensional data structure, like a two-dimensional array, or a table with rows and columns).
modelerpy.writePandasDataframe(df)
This API writes a Python Pandas DataFrame from Python to SPSS Modeler.
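A typical read, transform, and write cycle might look like the sketch below. It is a minimal example under stated assumptions: add_row_number and the RowNumber column are hypothetical, and the modelerpy module is passed in as mp so that the function can be read and tested outside SPSS Modeler.

```python
def add_row_number(mp, column="RowNumber"):
    """Append a 1-based row counter column; `mp` is the modelerpy module."""
    # Declare the output metadata first: setOutputDataModel must be
    # invoked before a dataset is passed back to SPSS Modeler.
    data_model = mp.getDataModel()
    data_model.addField(mp.Field(column, "integer", measure="continuous"))
    mp.setOutputDataModel(data_model)

    # In a data-model-only run, touching the data would make the run fail.
    if mp.isComputeDataModelOnly():
        return None

    df = mp.readPandasDataframe().copy()
    df[column] = range(1, len(df) + 1)  # plain pandas column assignment
    mp.writePandasDataframe(df)
    return df
```

The early return keeps the data-model-only path free of any dataset access, which is the condition described for modelerpy.isComputeDataModelOnly().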
APIs for packages
modelerpy.installPackage(package)
This API pulls a package from pypi.org and installs it.
modelerpy.uninstallPackage(package)
This API uninstalls an installed package.
modelerpy.listPackages()
This API provides a list of all the installed packages.
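As a small illustration of the package calls, the sketch below installs several packages and then asks for the installed list. The helper name install_all is hypothetical, the modelerpy module is passed in as mp, and the result of listPackages() is returned untouched because its exact format is not specified here.

```python
def install_all(mp, packages):
    """Install each named package, then return the installed-package listing."""
    for package in packages:
        mp.installPackage(package)  # pulled from pypi.org
    # Returned as-is: the exact shape of this value is not assumed.
    return mp.listPackages()
```

A call such as install_all(modelerpy, ["scikit-learn"]) would be one way to prepare dependencies before the rest of a script runs.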
APIs for metadata
Use the following metadata-related classes with modelerpy.getDataModel and modelerpy.setOutputDataModel.
modelerpy.DataModel
This class is the main entry point for the metadata. It contains an array of instances of class Field and includes the following methods.
modelerpy.DataModel.getFields
This method returns the array of class Field instances.
modelerpy.DataModel.addField
This method adds an instance of Field to the metadata array.
modelerpy.Field
The Field class is where the actual metadata is stored, including the field name, storage, and measurement.
modelerpy.Field.getName
This method returns the name of the field.
modelerpy.Field.getStorage
This method returns the storage of the field. Valid storage values include integer, real, string, date, time, and timestamp.
modelerpy.Field.getMeasure
This method returns the measurement of the field. Valid measurement values include discrete, flag, nominal, ordinal, and continuous.
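The accessor methods can be combined to summarize a data model, for example for logging. This is a minimal sketch: describe_fields is a hypothetical helper, and it relies only on the getFields, getName, getStorage, and getMeasure methods described here.

```python
def describe_fields(data_model):
    """Return one (name, storage, measure) tuple per field in a DataModel."""
    return [
        (field.getName(), field.getStorage(), field.getMeasure())
        for field in data_model.getFields()
    ]
```

Inside SPSS Modeler, the argument would presumably be the object returned by modelerpy.getDataModel().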
The following example constructs a DataModel object by invoking the modelerpy.DataModel constructor with an array of modelerpy.Field instances. The modelerpy.Field constructor accepts field name, field storage, and field measurement as its input parameters (field name and field storage are required; field measurement is optional).
dataModel = modelerpy.DataModel([
    # %FieldName%, %StorageType%, %MeasurementType%
    modelerpy.Field('StringField', 'string', 'nominal'),
    modelerpy.Field('FloatField', 'real', 'continuous'),
    modelerpy.Field('IntegerField', 'integer', 'ordinal'),
    modelerpy.Field('BooleanField', 'integer', 'flag'),
    modelerpy.Field('DatetimeField', 'timestamp', 'continuous'),
    modelerpy.Field('TimeField', 'time', 'continuous'),
    modelerpy.Field('DateField', 'date', 'continuous'),
])
# StorageType could be: integer, real, string, date, time, timestamp
# MeasurementType could be: discrete, flag, nominal, ordinal, continuous
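# modelerDataModel is assumed to be an existing DataModel, for example the one returned by modelerpy.getDataModel();
# field_outlier and field_dist_hp are assumed to hold the names of the new fields.
outputDataModel = modelerDataModel
outputDataModel.addField(modelerpy.Field(field_outlier, "real", measure="flag"))
outputDataModel.addField(modelerpy.Field(field_dist_hp, "real", measure="continuous"))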
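Because storage and measurement are passed as plain strings, a typo is easy to make. The sketch below checks a field specification against the values listed on this page before it is handed to modelerpy.Field. The helper check_field_spec is hypothetical, and whether modelerpy performs a similar check itself is not stated here.

```python
VALID_STORAGE = ("integer", "real", "string", "date", "time", "timestamp")
VALID_MEASURE = ("discrete", "flag", "nominal", "ordinal", "continuous")


def check_field_spec(name, storage, measure=None):
    """Validate a field specification and return it as a tuple.

    Raises ValueError for a storage or measurement value that is not in
    the lists documented on this page. The measurement is optional.
    """
    if storage not in VALID_STORAGE:
        raise ValueError(f"unknown storage {storage!r} for field {name!r}")
    if measure is not None and measure not in VALID_MEASURE:
        raise ValueError(f"unknown measurement {measure!r} for field {name!r}")
    return (name, storage, measure)
```

The returned tuple can then be unpacked into the constructor, for example modelerpy.Field(*check_field_spec('FloatField', 'real', 'continuous')).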