Building DWH Solution With Analytical Database

Our client is a leading commerce content platform focusing on the delivery of consumer rewards, cashback, savings opportunities, and shopping content. By assisting millions of consumers to make smarter purchasing decisions, the client’s products enable advertisers to reach high-intent users in 20+ markets around the world.


The existing data warehouse (DWH) of our client is built in a way that transforms the regular page refresh into a rather time-consuming process. The end-user lacks or doesn’t have enough accurate data and is missing out on seeing the full performance of the offers running on the website.

Primarily, our team was tasked to understand the business requirements for the client’s emerging needs, since the number of domains and countries the client was operating in, was increasing very fast. In the meantime, the requests for new reporting and new features on their websites based on the data they received, were also acquired to be done efficiently and with high accuracy.

More importantly, the accuracy of the provided reports was put into question and users didn’t trust the data as before. In addition, using the provided data to do some in-depth analysis, generating some sort of advanced reporting or personalizing offers for users was now very hard to do and at times even impossible.

At this point, Microsoft Excel was used as the main tool for displaying the reports.


Having in mind all of the above, we began identifying all sources used for the reporting and tried to develop a strategy on how all reports would fit into the global reporting. We continued our process of work by redoing the ETL (Extract, Transform, Load) process used for report generation with the intention of obtaining and providing refreshed reports every hour rather than once a day.

Along the way, we encountered another bigger issue. The transaction data was fetched from the affiliate networks with PHP, Java and in most cases manually.

Taking this into consideration, we then unified and automated the process and every hour, using the Pentaho Data Integration, we were able to fetch data from around 50 different API’s.

MySQL DB was replaced with AWS Redshift as a result of the amount of data the client received. The data exceeded 2.5 TB which only resulted in delaying the time of doing analysis or even simple changes to the DB tables. In this case, using Redshift we gained stability, scalability, and speed in the overall ETL processes and also faster development of other sub-projects related to using data from the DWH.

We also introduced the Data Lake concept using AWS S3 where we structurally store the data from all the various sources such as:

  • Affiliate Networks,
  • Google Analytics,
  • Google Search Console,
  • Google Adwords,
  • Salesforce,
  • In-house Tracking,
  • Serpbook etc. in its raw format.

This approach proved to be useful for tracking changes in history, doing ad-hoc analysis or simply reloading some information that at first was not necessary.

We also used Jenkins as a scheduling tool with the purpose of controlling when and how the ETL process works, while ensuring that all dependencies are taken into consideration when we run the process.

Result provided the client with a stable ETL process that runs hourly and also provides data for newly developed dashboards and reports.

The solution is scalable and we are constantly introducing new sources of data and developing new features such as the automatic send-out of personalized newsletter offers based on the user’s search history and preferences or giving away gift cards to users who purchased something from the retailer by combining data from various sources.

Using all of the products we introduced and Tableau as a visualization tool, the client now has insight into how their business is running and enabling them to do data-driven decisions on a daily basis.

Tanja Zlatanovska.png
Tanja Zlatanovska
Nov 24, 2020
You may also like

Let's talk

Have an idea? Tell us more about your project.

By submitting this, you agree to our Terms and Conditions & Privacy Policy