Last updated: Sep 10, 2024
K-means is one of the most commonly used clustering algorithms. It clusters data points into a predefined number of clusters. The K-Means-AS node in SPSS Modeler is implemented in Spark. For more information about k-means algorithms, see Clustering.¹
Note: The K-Means-AS node performs one-hot encoding automatically for categorical variables.
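To illustrate what one-hot encoding of a categorical variable means in general, here is a minimal sketch in plain Python. It is not the node's implementation: the helper name `one_hot` and the alphabetical category order are assumptions made for the example, and the encoding that the node performs in Spark may differ in detail.

```python
def one_hot(values):
    """Encode a list of category labels as 0/1 indicator rows.

    Hypothetical helper for illustration only; categories are ordered
    alphabetically, which is an assumption of this sketch.
    """
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one indicator is set per row
        rows.append(row)
    return categories, rows


categories, encoded = one_hot(["red", "green", "red", "blue"])
```

Each distinct category becomes its own numeric column, so the distance computations that k-means relies on can be applied to fields that were originally categorical.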
| kmeansasnode Properties | Values | Property description |
|---|---|---|
| `roleUse` | string | Specify `predefined` to use predefined roles, or `custom` to use custom field assignments. Default is `predefined`. |
| `autoModel` | Boolean | Specify `true` to use the default name (`$S-prediction`) for the new generated scoring field, or `false` to use a custom name. Default is `true`. |
| `features` | field | List of the field names for input when the `roleUse` property is set to `custom`. |
| `name` | string | The name of the new generated scoring field when the `autoModel` property is set to `false`. |
| `clustersNum` | integer | The number of clusters to create. Default is `5`. |
| `initMode` | string | The initialization algorithm. Possible values are `k-means\|\|` or `random`. Default is `k-means\|\|`. |
| `initSteps` | integer | The number of initialization steps when `initMode` is set to `k-means\|\|`. Default is `2`. |
| `advancedSettings` | Boolean | Specify `true` to make the following four properties available. Default is `false`. |
| `maxIteration` | integer | Maximum number of iterations for clustering. Default is `20`. |
| `tolerance` | string | The tolerance to stop the iterations. Possible settings are `1.0E-1`, `1.0E-2`, ..., `1.0E-6`. Default is `1.0E-4`. |
| `setSeed` | Boolean | Specify `true` to use a custom random seed. Default is `false`. |
| `randomSeed` | integer | The custom random seed when the `setSeed` property is `true`. |
| `displayGraph` | Boolean | Select this option if you want a graph to be included in the output. |
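
As a rough sketch of how these properties might be set from a Python script, the following example collects the documented defaults, checks the enumerated values, and applies the result to a node object. It assumes only that the node exposes a `setPropertyValue(name, value)` method, as node objects do in SPSS Modeler Python scripting. The helpers `build_settings` and `apply_settings` and the `FakeNode` stand-in are hypothetical names introduced for the example, and the checks that `features`, `name`, and `randomSeed` are supplied when needed are assumptions inferred from the property descriptions, not documented behavior.

```python
# Defaults and allowed values copied from the kmeansasnode property table.
DEFAULTS = {
    "roleUse": "predefined",
    "autoModel": True,
    "clustersNum": 5,
    "initMode": "k-means||",
    "initSteps": 2,
    "advancedSettings": False,
    "maxIteration": 20,
    "tolerance": "1.0E-4",
    "setSeed": False,
}
ALLOWED = {
    "roleUse": {"predefined", "custom"},
    "initMode": {"k-means||", "random"},
    "tolerance": {"1.0E-%d" % i for i in range(1, 7)},  # 1.0E-1 ... 1.0E-6
}


def build_settings(**overrides):
    """Merge overrides into the defaults and validate them (hypothetical helper)."""
    settings = dict(DEFAULTS)
    settings.update(overrides)
    for name, allowed in ALLOWED.items():
        if settings[name] not in allowed:
            raise ValueError("%s must be one of %s" % (name, sorted(allowed)))
    # Assumed dependencies, inferred from the property descriptions.
    if settings["roleUse"] == "custom" and not settings.get("features"):
        raise ValueError("features is needed when roleUse is custom")
    if not settings["autoModel"] and not settings.get("name"):
        raise ValueError("name is needed when autoModel is false")
    if settings["setSeed"] and "randomSeed" not in settings:
        raise ValueError("randomSeed is needed when setSeed is true")
    return settings


def apply_settings(node, settings):
    """Apply each setting through the node's setPropertyValue method."""
    for name, value in settings.items():
        node.setPropertyValue(name, value)


class FakeNode(object):
    """Stand-in for a real node so the sketch runs outside SPSS Modeler."""

    def __init__(self):
        self.props = {}

    def setPropertyValue(self, name, value):
        self.props[name] = value


node = FakeNode()
apply_settings(node, build_settings(
    clustersNum=3,
    advancedSettings=True,
    maxIteration=50,
    setSeed=True,
    randomSeed=42,
))
```

In a real stream, `FakeNode` would be replaced by the K-Means-AS node object obtained from the stream. How that node is created or looked up is not covered by this property table and is not shown here.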
1 "Clustering - RDD-based API." Apache Spark. MLlib: Main Guide. Aug 2024.