Clustering Settings
This section helps you understand the different clustering methods available.
Within Datama, it is possible to cluster your data directly in Prep (see here) or in the settings of the app. To find it, go to the settings in the top right-hand corner of each solution:
Then in Dimensions section you will find a menu Clustering settings:
As you can see above, there is a Choose clustering method input, which offers three options:
- Auto: clustering will be done using the Other (<X%) method (for discrete dimensions) or the Regression method (for continuous dimensions).
- Manual: allows you to select the clustering methods used for computation.
- None: your data will not be aggregated at all.
When Choose clustering method is on Manual, different methods and settings for clustering are available, depending on the type of your data.
Clustering Methods
There are three categories of clustering methods:
For discrete dimensions
Other (< X%)
With this method, any segment representing less than X% of the KPI numerator will be grouped into an ‘Other’ cluster. You can adjust the Aggregation level of the clustering.
Here is an example of this method with an aggregation level of 2:
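The logic behind this method can be sketched as follows. This is an illustrative example only, not Datama's actual implementation, and the function name and data are hypothetical:

```python
# Sketch of the Other (< X%) method: any segment whose share of the KPI
# numerator falls below the threshold is merged into an "Other" cluster.
# (Hypothetical helper, not Datama's actual code.)

def cluster_other(segments, threshold_pct):
    """Group segments contributing less than threshold_pct% of the total."""
    total = sum(segments.values())
    clustered = {}
    for name, value in segments.items():
        if value / total * 100 < threshold_pct:
            clustered["Other"] = clustered.get("Other", 0) + value
        else:
            clustered[name] = value
    return clustered

sales = {"A": 500, "B": 300, "C": 120, "D": 50, "E": 30}
# D (5%) and E (3%) fall below the 10% threshold and are merged.
print(cluster_other(sales, 10))  # {'A': 500, 'B': 300, 'C': 120, 'Other': 80}
```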
Long tail
With this method, segments will be clustered into an ‘Other’ group such that the total sum of the segments in this bucket represents less than X% of the KPI numerator. Similarly to the Other (< X%) method, you can adjust the Aggregation level of the clustering.
Here is an example of this method with an aggregation level of 10:
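The difference with the previous method is that here the threshold applies to the cumulative size of the ‘Other’ bucket rather than to each segment individually. A minimal sketch (hypothetical helper, not Datama's implementation):

```python
# Sketch of the Long tail method: fill the "Other" bucket with the smallest
# segments, as long as the bucket's cumulative share stays below the
# threshold. (Hypothetical helper, not Datama's actual code.)

def cluster_long_tail(segments, threshold_pct):
    """Group the smallest segments into 'Other' while the bucket's
    cumulative total stays below threshold_pct% of the overall total."""
    total = sum(segments.values())
    limit = total * threshold_pct / 100
    clustered, other = {}, 0
    # Walk segments from smallest to largest, filling the 'Other' bucket.
    for name, value in sorted(segments.items(), key=lambda kv: kv[1]):
        if other + value < limit:
            other += value
        else:
            clustered[name] = value
    if other:
        clustered["Other"] = other
    return clustered

sales = {"A": 500, "B": 300, "C": 120, "D": 50, "E": 30}
# E (30) and D (50) together stay below 10% of 1000; adding C would not.
print(cluster_long_tail(sales, 10))
```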
Binning based on rank
With this method, segments will first be ranked by value on the KPI numerator, and then divided into X bins. You can adjust the number of bins created in the Number of bins input.
Here is an example of this method with 2 bins:
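A sketch of rank-based binning (again, an illustrative example with a hypothetical helper, not Datama's actual code):

```python
# Sketch of Binning based on rank: sort segments by value, then split the
# ranked list into a fixed number of bins. (Hypothetical helper.)

def bin_by_rank(segments, n_bins):
    """Rank segments by value (descending) and split them into n_bins groups."""
    ranked = sorted(segments.items(), key=lambda kv: kv[1], reverse=True)
    size = -(-len(ranked) // n_bins)  # ceiling division: segments per bin
    bins = {}
    for i in range(n_bins):
        chunk = ranked[i * size:(i + 1) * size]
        if chunk:
            bins[f"Bin {i + 1}"] = sum(v for _, v in chunk)
    return bins

sales = {"A": 500, "B": 300, "C": 120, "D": 50, "E": 30}
# With 2 bins: the top 3 segments form Bin 1, the remaining 2 form Bin 2.
print(bin_by_rank(sales, 2))  # {'Bin 1': 920, 'Bin 2': 80}
```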
For continuous dimensions
Regression
This method clusters continuous dimensions using a recursive partitioning regression tree. You can adjust the tree’s depth with the Maximum depth for continuous grouping input, which defines the granularity of the clustering. If you are not familiar with this concept, here is a brief explanation:
The tree’s depth controls how many times the data can be split into smaller groups. A shallow tree (low depth) creates fewer, larger groups, giving a more general overview of the data. A deeper tree (higher depth) makes more splits, creating smaller, more detailed groups. However, if the tree is too deep, it might overfit, meaning it could capture noise or random patterns in the data instead of the important trends.
Here is an example of this method:
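To make the idea concrete, here is a simplified one-dimensional sketch of recursive partitioning. This is a toy illustration of the general technique, not Datama's implementation: at each step, it picks the split boundary that minimizes the squared error of the two resulting groups, and the depth limit bounds how many times the data can be subdivided.

```python
# Toy sketch of a depth-limited recursive partitioning regression tree on
# one continuous dimension. (Illustrative only, not Datama's actual code.)

def sse(ys):
    """Sum of squared errors around the group mean."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def partition(points, max_depth):
    """Return the sorted list of split boundaries found by the tree.
    points is a list of (x, y) pairs; max_depth limits the recursion."""
    if max_depth == 0 or len(points) < 2:
        return []
    xs = sorted(x for x, _ in points)
    best = None
    for cut in set(xs[1:]):  # candidate boundaries between sorted x values
        left = [y for x, y in points if x < cut]
        right = [y for x, y in points if x >= cut]
        if not left or not right:
            continue
        score = sse(left) + sse(right)
        if best is None or score < best[0]:
            best = (score, cut)
    if best is None:
        return []
    _, cut = best
    left_pts = [(x, y) for x, y in points if x < cut]
    right_pts = [(x, y) for x, y in points if x >= cut]
    return (partition(left_pts, max_depth - 1) + [cut]
            + partition(right_pts, max_depth - 1))

# The KPI jumps between x=2 and x=3, so a depth-1 tree splits there.
print(partition([(1, 10), (2, 11), (3, 50), (4, 52)], max_depth=1))  # [3]
```

A higher `max_depth` would let the tree keep splitting each side, producing smaller, more detailed groups, which is exactly the overfitting trade-off described above.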
Bins
With this method, segments will be divided into X bins. You can adjust the number of bins created in the Number of bins input.
Here is an example of this method with 2 bins:
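One common way to implement this kind of binning is with equal-width intervals, sketched below. This is an assumption for illustration; it is not confirmed to be how Datama computes its bins:

```python
# Sketch of the Bins method using equal-width intervals: the value range is
# divided into n_bins intervals of equal width. (Hypothetical helper.)

def bin_continuous(values, n_bins):
    """Split continuous values into n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    bins = {}
    for v in values:
        # Clamp the maximum value into the last bin.
        i = min(int((v - lo) / width), n_bins - 1)
        label = f"[{lo + i * width:g}, {lo + (i + 1) * width:g})"
        bins.setdefault(label, []).append(v)
    return bins

# With 2 bins over [1, 10], the boundary falls at 5.5.
print(bin_continuous([1, 2, 3, 8, 9, 10], 2))
```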
For dates
Regression & Bins
These two methods are available both for continuous dimensions and dates. Here is an example with the Regression method:
Here is an example with the Bins method:
Truncation by Period
This method is only available for dates, truncating them by time periods such as day, week, month, or year. You can select the time period in the Period for clustering input.
Here is an example of this method used on “Month”:
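Truncation by period can be sketched in a few lines of Python. The helper below is hypothetical and only covers month and year; a full implementation would also handle day and week:

```python
# Sketch of Truncation by Period: dates are truncated to the start of the
# chosen period, then KPI values are grouped by the truncated date.
# (Hypothetical helper, not Datama's actual code.)
from collections import defaultdict
from datetime import date

def truncate(d, period):
    """Truncate a date to the start of its period ('month' or 'year')."""
    if period == "month":
        return d.replace(day=1)
    if period == "year":
        return d.replace(month=1, day=1)
    raise ValueError(f"unsupported period: {period}")

sales = {date(2024, 3, 5): 100, date(2024, 3, 22): 50, date(2024, 4, 1): 70}
by_month = defaultdict(int)
for d, v in sales.items():
    by_month[truncate(d, "month")] += v
# Both March dates collapse into 2024-03-01.
print(dict(by_month))
```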