twostepAS properties | IBM Cloud Pak for Data as a Service

twostepAS properties

TwoStep Cluster is an exploratory tool designed to reveal natural groupings (or clusters) within a data set that would otherwise not be apparent. The algorithm has several features that differentiate it from traditional clustering techniques: it handles both categorical and continuous variables, selects the number of clusters automatically, and scales to large data sets.

Table 1. twostepAS properties
`twostepAS` Properties	Values	Property description
`inputs`	[f1 ... fN]	TwoStepAS models use a list of input fields, but no target. Weight and frequency fields are not recognized.
`use_predefined_roles`	Boolean	Default=`True`
`use_custom_field_assignments`	Boolean	Default=`False`
`cluster_num_auto`	Boolean	Default=`True`
`min_num_clusters`	integer	Default=`2`
`max_num_clusters`	integer	Default=`15`
`num_clusters`	integer	Default=`5`
`clustering_criterion`	`AIC` `BIC`
`automatic_clustering_method`	`use_clustering_criterion_setting` `Distance_jump` `Minimum` `Maximum`
`feature_importance_method`	`use_clustering_criterion_setting` `effect_size`
`use_random_seed`	Boolean
`random_seed`	integer
`distance_measure`	`Euclidean` `Loglikelihood`
`include_outlier_clusters`	Boolean	Default=`True`
`num_cases_in_feature_tree_leaf_is_less_than`	integer	Default=`10`
`top_perc_outliers`	integer	Default=`5`
`initial_dist_change_threshold`	integer	Default=`0`
`leaf_node_maximum_branches`	integer	Default=`8`
`non_leaf_node_maximum_branches`	integer	Default=`8`
`max_tree_depth`	integer	Default=`3`
`adjustment_weight_on_measurement_level`	integer	Default=`6`
`memory_allocation_mb`	number	Default=`512`
`delayed_split`	Boolean	Default=`True`
`fields_not_to_standardize`	[f1 ... fN]
`adaptive_feature_selection`	Boolean	Default=`True`
`featureMisPercent`	integer	Default=`70`
`coefRange`	number	Default=`0.05`
`percCasesSingleCategory`	integer	Default=`95`
`numCases`	integer	Default=`24`
`include_model_specifications`	Boolean	Default=`True`
`include_record_summary`	Boolean	Default=`True`
`include_field_transformations`	Boolean	Default=`True`
`excluded_inputs`	Boolean	Default=`True`
`evaluate_model_quality`	Boolean	Default=`True`
`show_feature_importance_bar_chart`	Boolean	Default=`True`
`show_feature_importance_word_cloud`	Boolean	Default=`True`
`show_outlier_clusters_interactive_table_and_chart`	Boolean	Default=`True`
`show_outlier_clusters_pivot_table`	Boolean	Default=`True`
`across_cluster_feature_importance`	Boolean	Default=`True`
`across_cluster_profiles_pivot_table`	Boolean	Default=`True`
`withinprofiles`	Boolean	Default=`True`
`cluster_distances`	Boolean	Default=`True`
`cluster_label`	`String` `Number`
`label_prefix`	`String`
`evaluation_maxNum`	integer	The maximum number of outliers to display in the output. If there are more than twenty outlier clusters, a pivot table will be displayed instead.
`across_cluster_profiles_table_and_chart`	Boolean	Table and charts of feature importance and cluster centers for each input (field) used in the cluster solution. Selecting different rows in the table displays a different chart. For categorical fields, a bar chart is displayed. For continuous fields, a chart of means and standard deviations is displayed.
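
The properties in Table 1 are typically set one at a time from a script. The following is a minimal sketch of how a handful of them might be checked and applied. It is not the product's API: `StubNode`, `check_settings`, and `apply_settings` are hypothetical helpers written for this illustration, and the field names passed to `inputs` are made up. The stub exposes a `setPropertyValue(name, value)` method, on the assumption that the real node object is driven through a call of that shape. The rule that `min_num_clusters` must not exceed `max_num_clusters` is also an assumption; the table does not state it.

```python
# Defaults copied from Table 1 (only a subset of the properties is shown).
TWOSTEP_AS_DEFAULTS = {
    "use_predefined_roles": True,
    "use_custom_field_assignments": False,
    "cluster_num_auto": True,
    "min_num_clusters": 2,
    "max_num_clusters": 15,
    "num_clusters": 5,
    "include_outlier_clusters": True,
    "max_tree_depth": 3,
}

# Allowed values for some of the enumerated properties in Table 1.
TWOSTEP_AS_ENUMS = {
    "clustering_criterion": {"AIC", "BIC"},
    "distance_measure": {"Euclidean", "Loglikelihood"},
    "cluster_label": {"String", "Number"},
}


class StubNode:
    """Hypothetical stand-in for a twostepAS node; it only records settings."""

    def __init__(self):
        self.properties = dict(TWOSTEP_AS_DEFAULTS)

    def setPropertyValue(self, name, value):
        self.properties[name] = value


def check_settings(settings):
    """Reject values that contradict the types or enumerations in Table 1.

    Properties not covered by the two dictionaries above (for example
    `inputs`) are passed through unchecked.
    """
    for name, value in settings.items():
        if name in TWOSTEP_AS_ENUMS:
            allowed = TWOSTEP_AS_ENUMS[name]
            if value not in allowed:
                raise ValueError(f"{name}: {value!r} is not one of {sorted(allowed)}")
        elif name in TWOSTEP_AS_DEFAULTS:
            expected = type(TWOSTEP_AS_DEFAULTS[name])
            # Exact type match, so a Boolean is not accepted for an integer.
            if type(value) is not expected:
                raise TypeError(f"{name}: expected {expected.__name__}, got {value!r}")


def apply_settings(node, settings):
    """Validate the settings, then apply each one with setPropertyValue."""
    check_settings(settings)
    merged = dict(node.properties)
    merged.update(settings)
    # Assumed sanity check; not stated in Table 1.
    if merged["min_num_clusters"] > merged["max_num_clusters"]:
        raise ValueError("min_num_clusters must not exceed max_num_clusters")
    for name, value in settings.items():
        node.setPropertyValue(name, value)


node = StubNode()
apply_settings(node, {
    "inputs": ["Age", "Income", "Region"],  # hypothetical field names
    "cluster_num_auto": False,              # use a fixed number of clusters
    "num_clusters": 4,
    "distance_measure": "Loglikelihood",
})
```

Validating before applying means that a misspelled enumeration value or a wrongly typed number raises an error before any property on the node has been changed.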