Last updated: Feb 11, 2025
With the Extension Transform node, you can take data from a flow and apply
transformations to the data using R scripting or Python for Spark scripting.
Python for Spark example
import modeler.api

# Create an Extension Transform node in the current flow.
stream = modeler.script.stream()
node = stream.create("extension_process", "extension_process")
node.setPropertyValue("syntax_type", "Python")

process_script = """
import spss.pyspark.runtime
from pyspark.sql.types import *

cxt = spss.pyspark.runtime.getContext()

if cxt.isComputeDataModelOnly():
    # Only the output schema is needed: declare it without reading data.
    _schema = StructType([StructField("Age", LongType(), nullable=True),
                          StructField("Sex", StringType(), nullable=True),
                          StructField("BP", StringType(), nullable=True),
                          StructField("Na", DoubleType(), nullable=True),
                          StructField("K", DoubleType(), nullable=True),
                          StructField("Drug", StringType(), nullable=True)])
    cxt.setSparkOutputSchema(_schema)
else:
    # Read the input data and keep the six fields declared in the schema.
    df = cxt.getSparkInputData()
    print(df.dtypes[:])
    _newDF = df.select("Age", "Sex", "BP", "Na", "K", "Drug")
    print(_newDF.dtypes[:])
    cxt.setSparkOutputData(_newDF)
"""

node.setPropertyValue("python_syntax", process_script)
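The script in this example branches on cxt.isComputeDataModelOnly(): one branch only describes the output fields, the other works on the data itself. The following is a minimal sketch of that two-branch pattern that runs outside Modeler. StubContext, FIELDS, and run_script are hypothetical names introduced for illustration, and plain Python lists and dictionaries stand in for the Spark schema and DataFrame objects.

```python
class StubContext(object):
    """Hypothetical stand-in for the object returned by
    spss.pyspark.runtime.getContext(); it is not the real runtime."""

    def __init__(self, schema_only, rows=None):
        self._schema_only = schema_only
        self._rows = rows or []
        self.output_schema = None
        self.output_data = None

    def isComputeDataModelOnly(self):
        return self._schema_only

    def getSparkInputData(self):
        return self._rows

    def setSparkOutputSchema(self, schema):
        self.output_schema = schema

    def setSparkOutputData(self, data):
        self.output_data = data


# Assumed output fields for the sketch: (name, type label) pairs.
FIELDS = [("Age", "long"), ("Drug", "string")]


def run_script(cxt):
    """Follow the same branch structure as the Python for Spark script."""
    if cxt.isComputeDataModelOnly():
        # Schema branch: describe the output fields, read no data.
        cxt.setSparkOutputSchema(FIELDS)
    else:
        # Data branch: keep only the declared fields from each input row.
        names = [name for name, _ in FIELDS]
        rows = cxt.getSparkInputData()
        cxt.setSparkOutputData([{n: row[n] for n in names} for row in rows])


schema_pass = StubContext(schema_only=True)
run_script(schema_pass)

data_pass = StubContext(
    schema_only=False,
    rows=[{"Age": 23, "Sex": "F", "Drug": "drugY"}],
)
run_script(data_pass)
```

In this sketch the schema branch sets only output_schema and the data branch sets only output_data, which is why the fields selected in the data branch should match the fields declared in the schema branch.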
R example
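# Add a "Next day" date field derived from the dob field.
node.setPropertyValue("syntax_type", "R")
node.setPropertyValue("r_syntax", """day<-as.Date(modelerData$dob, format="%Y-%m-%d")
next_day<-day + 1
modelerData<-cbind(modelerData,next_day)
var1<-c(fieldName="Next day",fieldLabel="",fieldStorage="date",fieldMeasure="",fieldFormat="",
fieldRole="")
modelerDataModel<-data.frame(modelerDataModel,var1)""")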
| extensionprocessnode properties | Data type | Property description |
| --- | --- | --- |
| syntax_type | R Python | Specify which script runs – R or Python (R is the default). |
| r_syntax | string | The R scripting syntax to run. |
| python_syntax | string | The Python scripting syntax to run. |
| use_batch_size | flag | Enable use of batch processing. |
| batch_size | integer | Specify the number of data records to include in each batch. |
| convert_flags | | Option to convert flag fields. |
| convert_missing | flag | Option to convert missing values to the R NA value. |
| convert_datetime | flag | Option to convert variables with date or datetime formats to R date/time formats. |
| convert_datetime_class | | Options to specify to what format variables with date or datetime formats are converted. |
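Several of these properties are often set together, so a small helper can apply them in one call and catch misspelled names early. The following is a minimal sketch, not part of the Modeler API: apply_properties, KNOWN_PROPERTIES, and RecordingNode are hypothetical names, and RecordingNode only stands in for a real node so that the sketch can run outside Modeler. The only node method assumed is setPropertyValue, as used in the examples above.

```python
# Property names copied from the extensionprocessnode properties table.
KNOWN_PROPERTIES = frozenset([
    "syntax_type", "r_syntax", "python_syntax", "use_batch_size",
    "batch_size", "convert_flags", "convert_missing",
    "convert_datetime", "convert_datetime_class",
])


def apply_properties(node, properties):
    """Set each property on the node, rejecting names not in the table."""
    unknown = sorted(set(properties) - KNOWN_PROPERTIES)
    if unknown:
        raise ValueError(
            "Unknown extensionprocessnode properties: %s" % ", ".join(unknown)
        )
    for name in sorted(properties):
        node.setPropertyValue(name, properties[name])
    return node


class RecordingNode(object):
    """Hypothetical stand-in for a Modeler node; it only records values."""

    def __init__(self):
        self.values = {}

    def setPropertyValue(self, name, value):
        self.values[name] = value


demo = apply_properties(RecordingNode(), {
    "syntax_type": "R",
    "use_batch_size": True,
    "batch_size": 1000,
    "convert_missing": True,
})
```

Inside Modeler, the same helper could be passed the node returned by stream.create("extension_process", "extension_process") in place of RecordingNode. The helper checks property names only; it does not validate the values.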