Prep - Refine data

This section helps you to refine your data in Datama Prep

Find here how you need to format your dataset for Datama solutions.

Here is the available actions you can do so far in Datama Prep to refine your dataset:


Clean

Clean action allows you to edit order and types of your columns.

Why ?

Because you might need to append later on this dataset with another one, and you will have to coordinate both of their characteristics (such as the name of the metrics and dimensions, their format etc) so that they match in your dataflow afterwards. This is necessary to create a functionnal and interesting market equation, which is the foundation of Datama’s solutions.

How to use it ?

You can adapt your dataset and transform it in the right format so that its lines and columns concur with other datasets.

In the action menu, you can select the “Clean” option to use it.

Once the block is created you will be able to see all of the existing columns generated by the previous action.

Configuration explain

When you click on the Clean action block, you’ll be able to see the configuration menu on the sidebar as a list of columns corresponding to the previous action on a dataset. Thus, you’ll be able to:

You’ll also be able to edit a column type by using the select contain on items. If the conversion is not possible, for example converting a product name such as “hair dryer” to a date type, the clean action will show a warning.

In addition, Datama prep gives you the possibility to create Calculated Field using the Create a calculated field button.

This a powerfull tool to create columns using conditionals or aggregation functions without the need to manually edit your dataset.

Once you’re done make sure to click on the “Ok” button to validate the creation of the calculated column.


Append

Append action allow you to merge two source flow in one.

Why ?

Datama Prep allow you to import data from multiple source to a single output ready for Datama Core’s solutions. In order to join our multiple source we need an Append action.

How to use it ?

For instance, if you have input a Google sheet and a Google trend file, you might want to make a fusion of both of their data.

To do so, in the action menu of the block where you want to merge datasets select “Append”. It will automatically merge datasets.

On the sidebar, you can also notice the summary of each of the dataset’s columns.

If there are some columns shared between both dataset, it will merge them instead of duplicate them (careful it’s case sensitive). All non-shared columns of each dataset will be seed with NULL value.

This is not a JOIN, it acts like puting one dataset below the other.


Filter

Filter action allow you to filter data in columns.

Why ?

Let’s say you want , from your dataset, to fetch products where the “product name” contains ‘Hair’ and the “price” is above $20. The filter action helps you do such tasks so you don’t need to do them manualy.

How to use it ?

By using the button “new filter” you must first choose the desired column.

Then make sure to choose the appropriate type of filter. For text base data the currently availbalbe filters are:

As for numbers and dates:

Finally you have to add one or more elements to be used to apply the filter on the column.

A filter action holds a single filter condition on each field and works by applying a logical OR to the column data. It’s possible to chain multiple filters which will be the equivalent of using a logical AND on each simillar column.

Once you have some filters applied you can see them by their column name. By clicking on it you open it and see a description of the filter.

Sometimes you can see the following error message “It seems that there are too many unique values” it means that the number of unique values in the dataset is higher than the limit set by Datama.

This was implemented to avoid application performance problems.

It has the effect of not offering the unique values in the filter value selector.


Pivot & Unpivot

You might want to pivot and unpivot your dataset in Datama PREP.

What is that for?

Pivot operations

A pivot table is a table of grouped values that aggregates the individual items of a more extensive table within one or more discrete categories. This summary might include sums, averages, or other statistics.

UnPivot example

We have a column gender containing only two unique values, {males, females}. We want to split sessions according to those values. To do so, we will unpivot the column gender using sessions as an aggregates column.

For example, this unpivot configuration creates two columns {Sessions.female and Sessions.males} containing the same values but in two distinct columns.

Congrats! Your dataset is ready to be uploaded in Datama’s solution!

To see how to save and share your dataflow, click here