Other important modeling components

Clustering

Clustering is required for dimension analysis.

For discrete dimensions, any value that represents less than X% (X=2) of the primary numerator is aggregated into “other”.
For continuous dimensions, cut points are chosen with a weighted decision tree, so that the resulting buckets are coherent.
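The discrete rule can be sketched in a few lines. This is only an illustration, not the tool's implementation: the function name, the data layout and the example figures are hypothetical, and the 2% threshold is taken from the description above.

```python
from collections import defaultdict


def cluster_discrete(numerator_by_value, threshold=0.02, other_label="other"):
    """Fold values whose share of the total numerator is below `threshold`
    into a single bucket (hypothetical helper, for illustration only)."""
    total = sum(numerator_by_value.values())
    clustered = defaultdict(float)
    for value, numerator in numerator_by_value.items():
        share = numerator / total if total else 0.0
        # Values at or above the threshold keep their own bucket.
        key = value if share >= threshold else other_label
        clustered[key] += numerator
    return dict(clustered)


# Made-up example: primary numerator (e.g. sessions) by country.
sessions = {"FR": 600, "US": 300, "DE": 85, "BE": 10, "CH": 5}
clustered_sessions = cluster_discrete(sessions)
print(clustered_sessions)
```

Here BE (1%) and CH (0.5%) fall below the threshold and are merged, while the other values keep their own bucket.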

Read the docs related to continuous dimensions.

Interdependence

In ‘Safe Mode’, the most correlated dimensions are flagged. Interdependencies between dimensions are tested using a Chi-Square test and a simple business calculation.
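For readers unfamiliar with the Chi-Square test of independence, here is a minimal sketch of the statistic computed from volumes cross-tabulated over two dimensions. How the tool applies the test and which threshold it uses for flagging are not described here, so the function name, the input layout and the example data are assumptions.

```python
from collections import defaultdict


def chi_square_statistic(counts):
    """Pearson chi-square statistic and degrees of freedom for a
    contingency table given as {(value_a, value_b): volume}."""
    n = sum(counts.values())
    rows, cols = defaultdict(float), defaultdict(float)
    for (a, b), volume in counts.items():
        rows[a] += volume
        cols[b] += volume
    chi2 = 0.0
    for a, row_total in rows.items():
        for b, col_total in cols.items():
            # Volume expected in the cell if the two dimensions were independent.
            expected = row_total * col_total / n
            if expected == 0:
                continue
            observed = counts.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(cols) - 1)
    return chi2, dof


# Made-up data: Device fully determines Browser, so the two are dependent.
dependent = {("mobile", "safari"): 50, ("desktop", "chrome"): 50}
# Made-up data: every combination has the same volume, so they are independent.
independent = {
    ("mobile", "safari"): 25, ("mobile", "chrome"): 25,
    ("desktop", "safari"): 25, ("desktop", "chrome"): 25,
}
print(chi_square_statistic(dependent))
print(chi_square_statistic(independent))
```

A statistic that is large relative to its degrees of freedom suggests that the two dimensions are not independent. Turning it into a p-value requires the chi-square distribution, which `scipy.stats.chi2_contingency` provides if SciPy is available.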

Combined dimension

The combined dimension is created by concatenating all clustered dimensions into a single “Combined_Dimension”. It is then treated like any other dimension, and its contribution to the performance variation is assessed in the same way.
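The concatenation might look like the following sketch. The row layout, the column names other than “Combined_Dimension” and the separator are assumptions made for the example.

```python
def add_combined_dimension(rows, dimensions, separator=" | "):
    """Return copies of `rows` with an extra "Combined_Dimension" column
    built from the (already clustered) dimension values."""
    return [
        {**row, "Combined_Dimension": separator.join(str(row[d]) for d in dimensions)}
        for row in rows
    ]


# Made-up rows; "other" stands for a value produced by clustering.
rows = [
    {"Device": "mobile", "Country": "FR", "Sessions": 120},
    {"Device": "desktop", "Country": "other", "Sessions": 80},
]
combined = add_combined_dimension(rows, ["Device", "Country"])
for row in combined:
    print(row["Combined_Dimension"], row["Sessions"])
```

Because the combined column has one value per combination of segments, it behaves like a single, finer-grained dimension in the rest of the analysis.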

Significance

In ‘Safe Mode’, a simple check verifies that the given metric reaches a minimal volume (entered manually) in both Start and End. You can also use Datama Impact to properly assess the significance of variations.
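A volume check of this kind could be as simple as the sketch below. The function name, the input layout and the figures are hypothetical; only the idea of comparing Start and End volumes with a manually entered minimum comes from the text above.

```python
def flag_low_volume(segments, min_volume):
    """Return the segments whose volume is below `min_volume`
    in Start or in End. `segments` maps name -> (start, end)."""
    return [
        name
        for name, (start, end) in segments.items()
        if start < min_volume or end < min_volume
    ]


# Made-up volumes per segment: (Start, End).
volumes = {"FR": (1200, 1500), "BE": (40, 90), "CH": (300, 20)}
low_volume = flag_low_volume(volumes, min_volume=50)
print(low_volume)
```

This only guards against conclusions drawn from very small segments; it is not a statistical test.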

Scope

The ‘Out’ segment defined in the ‘Scope’ column is excluded from the analysis and is simply stacked on the Start and End columns of the waterfall chart.

Covariance

For waterfall analysis, covariance is distributed across the steps. The user should check that it remains reasonable (typically <30%).
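To see where a covariance term comes from, consider a KPI that is the product of two steps. Its variation splits into one effect per step plus a cross term that belongs to neither step alone. The sketch below computes that cross term and its share of the total variation. The two-step KPI, the figures and the choice to measure the share against the total variation are assumptions, and the rule used to distribute the term across steps is not reproduced here.

```python
def two_step_decomposition(a0, b0, a1, b1):
    """Split the variation of KPI = A * B into the effect of each step
    and the cross (covariance) term."""
    da, db = a1 - a0, b1 - b0
    return {
        "step_a": b0 * da,       # A moves, B held at its Start value
        "step_b": a0 * db,       # B moves, A held at its Start value
        "covariance": da * db,   # joint movement of A and B
    }


# Made-up example: traffic goes 1000 -> 1200, conversion goes 0.5 -> 0.75.
effects = two_step_decomposition(1000, 0.5, 1200, 0.75)
total_variation = 1200 * 0.75 - 1000 * 0.5
covariance_share = abs(effects["covariance"]) / abs(total_variation)
print(effects, covariance_share)
```

In this example the cross term is 12.5% of the total variation, below the 30% guideline. When both steps move a lot at the same time, the cross term grows and the per-step attribution becomes less reliable.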

For dimension analysis, covariance is distributed on neither the mix nor the performance sizing. The user should therefore be careful when interpreting dimension impacts.
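The sketch below shows one common way to split a rate variation by segment into mix, performance and an undistributed covariance term. It is not necessarily the convention used by the tool; the function name, the input layout and the figures are hypothetical.

```python
def mix_performance_covariance(segments):
    """`segments` maps name -> (share_start, rate_start, share_end, rate_end).
    The overall rate is the sum over segments of share * rate."""
    result = {}
    for name, (s0, r0, s1, r1) in segments.items():
        ds, dr = s1 - s0, r1 - r0
        result[name] = {
            "mix": r0 * ds,          # the segment's share changes, rate held at Start
            "performance": s0 * dr,  # the segment's rate changes, share held at Start
            "covariance": ds * dr,   # both change together; left undistributed
        }
    return result


# Made-up segments: (share Start, rate Start, share End, rate End).
by_device = {
    "mobile": (0.5, 0.25, 0.75, 0.5),
    "desktop": (0.5, 0.5, 0.25, 0.5),
}
impacts = mix_performance_covariance(by_device)
totals = {
    key: sum(segment[key] for segment in impacts.values())
    for key in ("mix", "performance", "covariance")
}
print(totals)
```

In this example the overall rate moves from 0.375 to 0.5, a variation of 0.125. Mix (-0.0625) and performance (0.125) alone add up to 0.0625, and the remaining 0.0625 sits in the covariance term. Reading only the mix and performance figures would therefore miss half of the variation.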