Last updated: Jan 17, 2024
The Kohonen node generates a type of neural network that can be used to cluster the data set into distinct groups. When the network is fully trained, records that are similar should be close together on the output map, while records that are different will be far apart. You can look at the number of observations captured by each unit in the model nugget to identify the strong units. This may give you a sense of the appropriate number of clusters.
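As a rough illustration of identifying strong units, the sketch below counts how many records fall on each unit of the output map and keeps the units that hold at least a chosen share of the data. It is plain Python, not Modeler scripting: the `scored` list, the function names, and the `min_share` threshold are all hypothetical, and it assumes you have already exported one (x, y) map coordinate per record from the scored data.

```python
from collections import Counter


def count_records_per_unit(unit_coords):
    """Count how many records landed on each (x, y) unit of the output map."""
    return Counter(unit_coords)


def strong_units(counts, min_share=0.1):
    """Return (unit, count) pairs holding at least min_share of all records,
    ordered from the most populated unit to the least."""
    total = sum(counts.values())
    if total == 0:
        return []
    # Compare against min_share * total to avoid integer division surprises.
    return [(unit, n) for unit, n in counts.most_common()
            if n >= min_share * total]


# Hypothetical scored output: one (x, y) map coordinate per record.
scored = [(0, 0), (0, 0), (0, 0), (2, 2), (2, 2),
          (2, 2), (2, 2), (1, 0), (0, 2), (2, 2)]

counts = count_records_per_unit(scored)
for unit, n in strong_units(counts, min_share=0.2):
    print("%s: %d records" % (unit, n))
```

With this made-up data, two units pass the 20% threshold, which would hint at two dominant clusters. The threshold is a judgment call and should be tuned to the data.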
Example
# Get the current stream, then add a Kohonen modeling node to it
stream = modeler.script.stream()
node = stream.create("kohonen", "My node")
# "Model" tab
node.setPropertyValue("use_model_name", False)
node.setPropertyValue("model_name", "Symbolic Cluster")
node.setPropertyValue("stop_on", "Time")
node.setPropertyValue("time", 1)
node.setPropertyValue("set_random_seed", True)
node.setPropertyValue("random_seed", 12345)
node.setPropertyValue("optimize", "Speed")
# "Expert" tab
node.setPropertyValue("mode", "Expert")
node.setPropertyValue("width", 3)
node.setPropertyValue("length", 3)
node.setPropertyValue("decay_style", "Exponential")
node.setPropertyValue("phase1_neighborhood", 3)
node.setPropertyValue("phase1_eta", 0.5)
node.setPropertyValue("phase1_cycles", 10)
node.setPropertyValue("phase2_neighborhood", 1)
node.setPropertyValue("phase2_eta", 0.2)
node.setPropertyValue("phase2_cycles", 75)
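When many properties are set at once, it can be convenient to keep them in one list and apply them in a loop. The following is a minimal sketch of that idea: `apply_properties` and `EXPERT_SETTINGS` are hypothetical names introduced here, and the only call assumed on the node is `setPropertyValue`, as used in the example above. The pairs are kept in a list so that they are applied in the same order as in the example.

```python
def apply_properties(node, properties):
    """Call node.setPropertyValue(name, value) for each pair, in order."""
    for name, value in properties:
        node.setPropertyValue(name, value)
    return node


# "Expert" tab settings from the example, as ordered (name, value) pairs.
EXPERT_SETTINGS = [
    ("mode", "Expert"),
    ("width", 3),
    ("length", 3),
    ("decay_style", "Exponential"),
    ("phase1_neighborhood", 3),
    ("phase1_eta", 0.5),
    ("phase1_cycles", 10),
    ("phase2_neighborhood", 1),
    ("phase2_eta", 0.2),
    ("phase2_cycles", 75),
]
```

Inside a Modeler script, this would be used as `apply_properties(node, EXPERT_SETTINGS)` after the node has been created.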
kohonennode Properties | Values | Property description |
---|---|---|
inputs | [field1 ... fieldN] | Kohonen models use a list of input fields, but no target. Frequency and weight fields are not used. See Common modeling node properties for more information. |
continue | flag | |
show_feedback | flag | |
stop_on | Default, Time | |
time | number | |
optimize | Speed, Memory | Use to specify whether model building should be optimized for speed or for memory. |
cluster_label | flag | |
mode | Simple, Expert | |
width | number | |
length | number | |
decay_style | Linear, Exponential | |
phase1_neighborhood | number | |
phase1_eta | number | |
phase1_cycles | number | |
phase2_neighborhood | number | |
phase2_eta | number | |
phase2_cycles | number | |
set_random_seed | Boolean | If no random seed is set, the sequence of random values used to initialize the network weights will be different every time the node runs. This can cause the node to create different models on different runs, even if the node settings and data values are exactly the same. By selecting this option, you can set the random seed to a specific value so the resulting model is exactly reproducible. |
random_seed | integer | Seed |
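The value types in the table can be turned into a simple pre-flight check that catches typos before a script sets the properties. The sketch below is plain Python and is not part of the Modeler API: `check_settings` and the three lookup tables are hypothetical helpers, with their contents transcribed from the table above. Property names that the table does not list (for example, common modeling node properties such as `model_name`) are skipped rather than reported.

```python
# Allowed values and types transcribed from the property table above.
ALLOWED_VALUES = {
    "stop_on": ("Default", "Time"),
    "optimize": ("Speed", "Memory"),
    "mode": ("Simple", "Expert"),
    "decay_style": ("Linear", "Exponential"),
}
FLAG_PROPERTIES = ("continue", "show_feedback", "cluster_label",
                   "set_random_seed")
NUMBER_PROPERTIES = ("time", "width", "length",
                     "phase1_neighborhood", "phase1_eta", "phase1_cycles",
                     "phase2_neighborhood", "phase2_eta", "phase2_cycles",
                     "random_seed")


def check_settings(settings):
    """Return a list of problem messages; an empty list means no issues found."""
    problems = []
    for name, value in sorted(settings.items()):
        if name in ALLOWED_VALUES:
            allowed = ALLOWED_VALUES[name]
            if value not in allowed:
                problems.append("%s: %r is not one of %s"
                                % (name, value, ", ".join(allowed)))
        elif name in FLAG_PROPERTIES:
            if not isinstance(value, bool):
                problems.append("%s: expected True or False" % name)
        elif name in NUMBER_PROPERTIES:
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append("%s: expected a number" % name)
        # Names not listed in the table are left unchecked.
    return problems


print(check_settings({"mode": "Expert", "decay_style": "Quadratic"}))
```

The check covers only value types and enumerations. It does not verify ranges or whether a combination of settings is meaningful to the node.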