Python for Spark scripts
SPSS Modeler supports Python scripts for Apache Spark.
Note:
- Python nodes depend on the Spark environment.
- Python scripts must use the Spark API because data is presented in the form of a Spark DataFrame.
- When installing Python, make sure all users have permission to access the Python installation.
- If you want to use the Machine Learning Library (MLlib), you must install a version of Python that includes NumPy.
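To confirm that a Python installation meets the NumPy requirement, you could run a short check like the one below with that interpreter, for example from an Extension Output node. This is a minimal sketch that uses only the standard library. The helper name `numpy_available` is hypothetical and is not part of the SPSS Modeler API.

```python
import importlib.util
import sys


def numpy_available() -> bool:
    """Return True if NumPy can be found by this Python interpreter (hypothetical helper)."""
    return importlib.util.find_spec("numpy") is not None


# Report which interpreter is running and whether it has NumPy.
print(sys.version)
if numpy_available():
    import numpy
    print("NumPy", numpy.__version__, "is installed")
else:
    print("NumPy is not installed; this installation does not meet the MLlib requirement")
```

`find_spec` only locates the package and does not import it, so the check is cheap and does not fail when NumPy is missing.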
Tips
You can run the following Python scripts from an Extension Output node:
- To view information about the distribution of Python included with SPSS Modeler:
  import sys
  print(sys.version)
- To list all installed Python packages:
  import subprocess
  import sys
  subprocess.check_call([sys.executable, '-m', 'pip', 'list'])
- To install Python packages in an air-gapped environment, use the --index-url option, which lets pip install packages from a specified Python package repository (the repository must comply with PEP 503). For more information, including a list of all options, see https://pip.pypa.io/en/stable/cli/pip_install/.
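The --index-url tip can be combined with the pip list snippet above. The following is a minimal sketch, not an official SPSS Modeler procedure. It assumes a hypothetical internal PEP 503 "simple" repository at `https://pypi.example.internal/simple/`, and the helper names `build_install_command` and `install` are also hypothetical. Replace the URL with your own mirror.

```python
import subprocess
import sys

# Hypothetical internal mirror; replace with your own PEP 503 "simple" repository URL.
INDEX_URL = "https://pypi.example.internal/simple/"


def build_install_command(packages, index_url=INDEX_URL):
    """Build a pip command that resolves packages only from index_url instead of PyPI."""
    return [sys.executable, "-m", "pip", "install", "--index-url", index_url, *packages]


def install(packages, index_url=INDEX_URL):
    """Run pip; raises subprocess.CalledProcessError if the installation fails."""
    subprocess.check_call(build_install_command(packages, index_url))


# Show the command that would run. Uncomment the install call only where the mirror is reachable.
print(" ".join(build_install_command(["numpy"])))
# install(["numpy"])
```

Using `sys.executable` ensures that pip installs into the same Python installation that runs the script, rather than into whichever `pip` happens to be first on the PATH.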