Column Statistics content model and Pairwise Statistics content model
The Column Statistics content model provides access to statistics that can be computed for each field (univariate statistics). The Pairwise Statistics content model provides access to statistics that can be computed between pairs of fields or values in a field.
The following statistical measures are available:
Count
UniqueCount
ValidCount
Mean
Sum
Min
Max
Range
Variance
StandardDeviation
StandardErrorOfMean
Skewness
SkewnessStandardError
Kurtosis
KurtosisStandardError
Median
Mode
Pearson
Covariance
TTest
FTest
Some values are appropriate only for single-column statistics, while others are appropriate only for pairwise statistics.
Nodes that produce these are:
- Statistics node produces column statistics and can produce pairwise statistics when correlation fields are specified.
- Data Audit node produces column statistics and can produce pairwise statistics when an overlay field is specified.
- Means node produces pairwise statistics when comparing pairs of fields or comparing a field's values with other field summaries.
Which content models and statistics are available depends on both the particular node's capabilities and the settings within the node.
Method | Return types | Description |
---|---|---|
getAvailableStatistics() | List<StatisticType> | Returns the available statistics in this model. Not all fields necessarily have values for all statistics. |
getAvailableColumns() | List<String> | Returns the column names for which statistics were computed. |
getStatistic(String column, StatisticType statistic) | Number | Returns the statistic values associated with the column. |
reset() | void | Flushes any internal storage associated with this content model. |
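
The column statistics methods above can be combined to collect every available statistic for every column. The following is a minimal sketch rather than part of the product API: `collect_column_statistics` is a hypothetical helper, and `StubColumnStats` is a stand-in class with placeholder values that mimics the listed methods so the snippet runs outside Modeler. It uses plain strings where the real model returns `StatisticType` values, and it assumes that a statistic with no value for a column comes back as `None`, which the table does not state explicitly.

```python
def collect_column_statistics(model):
    """Return {column: {statistic name: value}} for a column statistics model."""
    table = {}
    for column in model.getAvailableColumns():
        row = {}
        for statistic in model.getAvailableStatistics():
            value = model.getStatistic(column, statistic)
            # Assumption: a statistic without a value is returned as None; skip it.
            if value is not None:
                row[str(statistic)] = value
        table[column] = row
    return table


class StubColumnStats(object):
    """Hypothetical stand-in for a column statistics content model."""

    def __init__(self, values):
        self._values = values  # {(column, statistic): number}

    def getAvailableColumns(self):
        return sorted(set(key[0] for key in self._values))

    def getAvailableStatistics(self):
        return sorted(set(key[1] for key in self._values))

    def getStatistic(self, column, statistic):
        return self._values.get((column, statistic))

    def reset(self):
        self._values = {}


# Placeholder numbers for illustration only, not real results.
stub_columns = StubColumnStats({
    ("Age", "Mean"): 40.0,
    ("Age", "Mode"): 35.0,
    ("Na", "Mean"): 0.5,
})
column_table = collect_column_statistics(stub_columns)
for column in sorted(column_table):
    print("%s: %s" % (column, sorted(column_table[column].items())))
```

In a stream script, the object returned by `getContentModel("columnStatistics")` would be passed to the helper in place of the stub.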
Method | Return types | Description |
---|---|---|
getAvailableStatistics() | List<StatisticType> | Returns the available statistics in this model. Not all fields necessarily have values for all statistics. |
getAvailablePrimaryColumns() | List<String> | Returns the primary column names for which statistics were computed. |
getAvailablePrimaryValues() | List<Object> | Returns the values of the primary column for which statistics were computed. |
getAvailableSecondaryColumns() | List<String> | Returns the secondary column names for which statistics were computed. |
getStatistic(String primaryColumn, String secondaryColumn, StatisticType statistic) | Number | Returns the statistic values associated with the columns. |
getStatistic(String primaryColumn, Object primaryValue, String secondaryColumn, StatisticType statistic) | Number | Returns the statistic values associated with the primary column value and the secondary column. |
reset() | void | Flushes any internal storage associated with this content model. |
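
The pairwise methods above can be looped over in the same way to gather one statistic for every primary/secondary column pair. This is a sketch under stated assumptions: `collect_pairwise` is a hypothetical helper, `StubPairwiseStats` is a stand-in class with placeholder values so the snippet runs outside Modeler, the statistic is a plain string rather than a `StatisticType`, and pairs that were not computed are assumed to come back as `None`.

```python
def collect_pairwise(model, statistic):
    """Return {(primary column, secondary column): value} for one statistic."""
    pairs = {}
    for primary in model.getAvailablePrimaryColumns():
        for secondary in model.getAvailableSecondaryColumns():
            value = model.getStatistic(primary, secondary, statistic)
            # Assumption: pairs that were not computed are returned as None.
            if value is not None:
                pairs[(primary, secondary)] = value
    return pairs


class StubPairwiseStats(object):
    """Hypothetical stand-in for a pairwise statistics content model."""

    def __init__(self, values):
        self._values = values  # {(primary, secondary, statistic): number}

    def getAvailablePrimaryColumns(self):
        return sorted(set(key[0] for key in self._values))

    def getAvailableSecondaryColumns(self):
        return sorted(set(key[1] for key in self._values))

    def getStatistic(self, primary, secondary, statistic):
        return self._values.get((primary, secondary, statistic))


# Placeholder correlations for illustration only, not real results.
stub_pairs = StubPairwiseStats({
    ("Age", "Na", "Pearson"): 0.1,
    ("Age", "K", "Pearson"): -0.2,
    ("Na", "K", "Pearson"): 0.3,
})
pearson = collect_pairwise(stub_pairs, "Pearson")
for pair in sorted(pearson):
    print("%s vs %s: Pearson = %s" % (pair[0], pair[1], pearson[pair]))
```

The four-argument `getStatistic` overload, which also takes a value from `getAvailablePrimaryValues()`, could be handled with one more nested loop.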
Nodes and outputs
This table lists nodes that build outputs that include this type of content model.
Node name | Output name | Container ID | Notes |
---|---|---|---|
"means" (Means node) | "means" | "columnStatistics" | |
"means" (Means node) | "means" | "pairwiseStatistics" | |
"dataaudit" (Data Audit node) | "means" | "columnStatistics" | |
"statistics" (Statistics node) | "statistics" | "columnStatistics" | Only generated when specific fields are examined. |
"statistics" (Statistics node) | "statistics" | "pairwiseStatistics" | Only generated when fields are correlated. |
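
Because a container may not be generated for a given node configuration, a script can probe both container IDs before using them. The sketch below is illustrative only: `available_content_models` is a hypothetical helper, `StubOutput` is a stand-in for a node output object so the snippet runs outside Modeler, and it assumes `getContentModel` returns `None` for a container that was not generated.

```python
CONTAINER_IDS = ("columnStatistics", "pairwiseStatistics")


def available_content_models(output):
    """Return {container ID: content model} for the containers that exist."""
    found = {}
    for container_id in CONTAINER_IDS:
        model = output.getContentModel(container_id)
        # Assumption: a container that was not generated is returned as None.
        if model is not None:
            found[container_id] = model
    return found


class StubOutput(object):
    """Hypothetical stand-in for an output object produced by running a node."""

    def __init__(self, models):
        self._models = models  # {container ID: content model}

    def getContentModel(self, container_id):
        return self._models.get(container_id)


# Simulate an output that contains column statistics but no pairwise statistics.
stub_output = StubOutput({"columnStatistics": object()})
models = available_content_models(stub_output)
print(sorted(models))
```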
Example script
from modeler.api import StatisticType
stream = modeler.script.stream()
# Set up the input data
varfile = stream.createAt("variablefile", "File", 96, 96)
varfile.setPropertyValue("full_filename", "$CLEO/DEMOS/DRUG1n")
# Now create the statistics node. This can produce both
# column statistics and pairwise statistics
statisticsnode = stream.createAt("statistics", "Stats", 192, 96)
statisticsnode.setPropertyValue("examine", ["Age", "Na", "K"])
statisticsnode.setPropertyValue("correlate", ["Age", "Na", "K"])
stream.link(varfile, statisticsnode)
results = []
statisticsnode.run(results)
statsoutput = results[0]
# Column (univariate) statistics for the examined fields
statscm = statsoutput.getContentModel("columnStatistics")
if statscm is not None:
    cols = statscm.getAvailableColumns()
    stats = statscm.getAvailableStatistics()
    print "Column stats:", cols[0], str(stats[0]), " = ", statscm.getStatistic(cols[0], stats[0])
# Pairwise statistics for the correlated fields
statscm = statsoutput.getContentModel("pairwiseStatistics")
if statscm is not None:
    pcols = statscm.getAvailablePrimaryColumns()
    scols = statscm.getAvailableSecondaryColumns()
    stats = statscm.getAvailableStatistics()
    corr = statscm.getStatistic(pcols[0], scols[0], StatisticType.Pearson)
    print "Pairwise stats:", pcols[0], scols[0], " Pearson = ", corr