Use this node to create a temporal causal model (TCM).
Temporal causal modeling attempts to discover key causal relationships in time series data. In temporal causal modeling, you specify a set of target series and a set of candidate inputs to those targets. The procedure then builds an autoregressive time series model for each target and includes only those inputs that have a causal relationship with the target. This approach differs from traditional time series modeling where you must explicitly specify the predictors for a target series. Since temporal causal modeling typically involves building models for multiple related time series, the result is referred to as a model system.
In the context of temporal causal modeling, the term causal refers to Granger causality. A time series X is said to "Granger cause" another time series Y if regressing Y on past values of both X and Y results in a better model for Y than regressing Y only on its own past values.
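The definition of Granger causality can be illustrated with a minimal sketch. The example below is not the algorithm used by the node; it only compares the residual sum of squares of two least-squares regressions on synthetic data, and it does not perform a significance test. The helper names (`solve_least_squares`, `lagged_rss`), the single lag, and the generated series are all assumptions made for illustration.

```python
import random

def solve_least_squares(rows, targets):
    """Solve the normal equations (A^T A) beta = A^T y by Gauss-Jordan elimination."""
    n = len(rows[0])
    # Augmented matrix [A^T A | A^T y]
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, targets))]
        for i in range(n)
    ]
    for col in range(n):
        # Partial pivoting for numerical stability
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(n):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def lagged_rss(y, predictors, lags):
    """Residual sum of squares from regressing y[t] on an intercept and the
    previous `lags` values of every series in `predictors`."""
    rows, targets = [], []
    for t in range(lags, len(y)):
        row = [1.0]
        for series in predictors:
            row.extend(series[t - k] for k in range(1, lags + 1))
        rows.append(row)
        targets.append(y[t])
    beta = solve_least_squares(rows, targets)
    return sum(
        (target - sum(b * v for b, v in zip(beta, row))) ** 2
        for row, target in zip(rows, targets)
    )

# Synthetic data (assumption): Y depends on its own past and on the past of X.
rng = random.Random(0)
n = 200
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.0]
for t in range(1, n):
    y.append(0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.gauss(0, 0.1))

rss_restricted = lagged_rss(y, [y], lags=1)   # past values of Y only
rss_full = lagged_rss(y, [y, x], lags=1)      # past values of both X and Y
print(rss_restricted, rss_full)
```

Because the full model fits Y much better than the restricted model in this synthetic case, X would be a plausible Granger cause of Y. A real analysis would also test whether the improvement is statistically significant.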
Examples
Business decision makers can use temporal causal modeling to uncover causal relationships within a large set of time-based metrics that describe the business. The analysis might reveal a few controllable inputs that have the largest impact on key performance indicators.
Managers of large IT systems can use temporal causal modeling to detect anomalies in a large set of interrelated operational metrics. The causal model then makes it possible to go beyond anomaly detection and discover the most likely root causes of the anomalies.
Field requirements
There must be at least one target. By default, fields with a predefined role of None are not used.
Data structure
Temporal causal modeling supports two types of data structures:
- Column-based data
- For column-based data, each time series field contains the data for a single time series. This structure is the traditional structure of time series data, as used by the Time Series Modeler.
- Multidimensional data
- For multidimensional data, each time series field contains the data for multiple time series. Separate time series, within a particular field, are then identified by a set of values of categorical fields referred to as dimension fields. For example, sales data for two different sales channels (retail and web) might be stored in a single sales field. A dimension field named channel, with values retail and web, identifies the records that are associated with each of the two sales channels.
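To make the multidimensional structure concrete, the following sketch splits a single sales field into one series per value of the channel dimension field, which yields the equivalent column-based view. The records, the date field, and the helper `split_by_dimension` are hypothetical and exist only for this illustration; they are not part of the product.

```python
from collections import defaultdict

# Hypothetical multidimensional records: one `sales` field holds two series,
# distinguished by the dimension field `channel`.
records = [
    {"date": "2024-01", "channel": "retail", "sales": 120},
    {"date": "2024-01", "channel": "web", "sales": 80},
    {"date": "2024-02", "channel": "retail", "sales": 130},
    {"date": "2024-02", "channel": "web", "sales": 95},
]

def split_by_dimension(records, value_field, dimension_field, time_field):
    """Group the values of one field into separate series, keyed by the
    dimension value and ordered by the time field."""
    series = defaultdict(list)
    for rec in sorted(records, key=lambda r: r[time_field]):
        series[rec[dimension_field]].append(rec[value_field])
    return dict(series)

sales_by_channel = split_by_dimension(records, "sales", "channel", "date")
print(sales_by_channel)
```

Each key of the resulting dictionary corresponds to one time series, as a separate field would in column-based data.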
The number of data points must satisfy the following condition:
m > (L + KL + 1)
where m is the number of data points, L is the number of lags, and K is the number of predictors. Make sure your data set is big enough so that the number of data points (m) satisfies the condition.
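The condition m > (L + KL + 1) is easy to check before building a model. The sketch below is a plain arithmetic check; the function names are assumptions chosen for illustration and do not correspond to any product API.

```python
def has_enough_data(m, lags, predictors):
    """Return True if m data points satisfy m > L + K*L + 1."""
    return m > lags + predictors * lags + 1

def min_data_points(lags, predictors):
    """Smallest integer m that satisfies m > L + K*L + 1."""
    return lags + predictors * lags + 2

# Example: L = 5 lags and K = 10 predictors give L + K*L + 1 = 56,
# so at least 57 data points are needed.
print(min_data_points(5, 10))
print(has_enough_data(56, 5, 10))
print(has_enough_data(57, 5, 10))
```

Because the required number of data points grows with the product of K and L, adding candidate inputs or lags can quickly raise the minimum size of the data set.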