kmeansasnode properties | IBM Cloud Pak for Data as a Service

kmeansasnode properties

K-means is one of the most commonly used clustering algorithms. It clusters data points into a predefined number of clusters. The K-Means-AS node in SPSS Modeler is implemented in Spark. For more information about k-means algorithms, see Clustering.¹

Note: The K-Means-AS node performs one-hot encoding automatically for categorical variables.
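
For readers unfamiliar with the term, the following minimal sketch shows what one-hot encoding does to a categorical field. It is an illustration only and not the node's Spark implementation: the `one_hot` helper and the sample color values are hypothetical, and the node may order or represent the encoded columns differently.

```python
def one_hot(values):
    """Encode each categorical value as a 0/1 vector with one slot per category."""
    # Sort the distinct categories so the column order is deterministic.
    categories = sorted(set(values))
    encoded = [[1 if value == category else 0 for category in categories]
               for value in values]
    return categories, encoded


# Hypothetical categorical field with three distinct values.
categories, encoded = one_hot(["red", "green", "red", "blue"])
print(categories)  # ['blue', 'green', 'red']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

Each record ends up with exactly one `1`, so a distance-based algorithm such as k-means does not treat the categories as ordered numbers.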

Table 1. kmeansasnode properties
`kmeansasnode` Properties	Values	Property description
`roleUse`	string	Specify `predefined` to use predefined roles, or `custom` to use custom field assignments. Default is `predefined`.
`autoModel`	Boolean	Specify `true` to use the default name (`$S-prediction`) for the newly generated scoring field, or `false` to use a custom name. Default is `true`.
`features`	field	List of the field names for input when the `roleUse` property is set to `custom`.
`name`	string	The name of the newly generated scoring field when the `autoModel` property is set to `false`.
`clustersNum`	integer	The number of clusters to create. Default is `5`.
`initMode`	string	The initialization algorithm. Possible values are `k-means||` or `random`. Default is `k-means||`.
`initSteps`	integer	The number of initialization steps when `initMode` is set to `k-means||`. Default is `2`.
`advancedSettings`	Boolean	Specify `true` to make the following four properties available. Default is `false`.
`maxIteration`	integer	Maximum number of iterations for clustering. Default is `20`.
`tolerance`	string	The tolerance to stop the iterations. Possible settings are `1.0E-1`, `1.0E-2`, ..., `1.0E-6`. Default is `1.0E-4`.
`setSeed`	Boolean	Specify `true` to use a custom random seed. Default is `false`.
`randomSeed`	integer	The custom random seed when the `setSeed` property is `true`.
`displayGraph`	Boolean	Specify `true` to include a graph in the output.
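
The properties in Table 1 depend on one another: `features` applies only when `roleUse` is `custom`, and `randomSeed` applies only when `setSeed` is `true`. The following sketch shows one way to assemble the settings in a consistent order and apply them through the `setPropertyValue` method used in SPSS Modeler Python scripting. The helper names (`build_kmeansas_settings`, `apply_settings`), the `RecordingNode` stand-in, and the field names `Age` and `Income` are hypothetical and exist only so that the example runs outside SPSS Modeler. In a real script, `node` would be the K-Means-AS node object obtained from your flow.

```python
def build_kmeansas_settings(features, clusters=5, seed=None):
    """Return (property, value) pairs for a K-Means-AS node, in application order."""
    if clusters < 1:
        raise ValueError("clustersNum must be a positive integer")
    settings = [
        ("roleUse", "custom"),            # required for `features` to take effect
        ("features", list(features)),
        ("clustersNum", clusters),
        ("initMode", "k-means||"),        # table default; the alternative is "random"
        ("initSteps", 2),                 # used only with k-means|| initialization
        ("advancedSettings", True),       # makes the four properties below available
        ("maxIteration", 20),
        ("tolerance", "1.0E-4"),          # a string, per Table 1
    ]
    if seed is not None:
        # randomSeed is read only when setSeed is true.
        settings += [("setSeed", True), ("randomSeed", seed)]
    return settings


def apply_settings(node, settings):
    """Apply each property through the node's setPropertyValue method."""
    for name, value in settings:
        node.setPropertyValue(name, value)


class RecordingNode:
    """Hypothetical stand-in for a Modeler node; it only records what was set."""

    def __init__(self):
        self.properties = {}

    def setPropertyValue(self, name, value):
        self.properties[name] = value


node = RecordingNode()
apply_settings(node, build_kmeansas_settings(["Age", "Income"], clusters=3, seed=42))
print(node.properties["clustersNum"], node.properties["randomSeed"])  # 3 42
```

Setting `advancedSettings` before the iteration, tolerance, and seed properties mirrors the table's statement that those four properties become available only when it is `true`.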

¹ "Clustering - RDD-based API." Apache Spark. MLlib: Main Guide. Aug 2024.