What are the benefits of using a data warehouse?
Most businesses have 5+ tools and platforms that are constantly producing and storing user data. This includes their analytics platforms, CRM, CMS, ad platforms, social media, marketing automation tools, A/B testing platforms, and more.
Having this data spread across multiple platforms will, sooner or later, cause problems like:
• Tools reporting different values for similar metrics (number of sessions, conversion rate, etc.)
• Troubles compiling data from multiple sources
• Tools aggregating data in a way that doesn’t make sense for your use case (i.e. session vs. user)
If your teams are reporting on similar metrics with different data, it’s a recipe for chaos. This can lead to people questioning your efforts. In many cases, the problem isn’t that the numbers are wrong, it’s that the numbers are calculated differently.
If any of those problems sound familiar, then you need a data warehouse; a central location where data from various tools and other sources is pulled together, in a raw format. This means that you can define how certain metrics are calculated, join data from different sources and find answers to important business questions.
What data should you store in a data warehouse?
The more the better.
Since the cost of storing data in the cloud has become relatively cheap, there is no reason why you shouldn’t store all the data you possibly can in your data warehouse.
Examples of the kind of data you might store in a data warehouse;
• Google Analytics (raw, hit-level, unsampled)
• Google Ads and other ad platforms (campaigns, cost, impressions, etc.)
• Back-end data (i.e. Shopify orders, leads, order data, customer data)
• Marketing platforms (Mailchimp, Klaviyo, Hubspot, etc)
• Stock/shipping/warehouse data
What's the difference between a data warehouse and data mining?
A data warehouse is a system used to store data. The data can then be queried using a query language like SQL.
Data mining, on the other hand, is a process. This can be part of the process of getting data into a data warehouse, but more commonly, it describes the process of accessing data already stored in a data warehouse to discover patterns, correlations, and find actionable insights.
What are the key elements required for a data warehouse?
To have a fully functional data warehouse you need to have at least the following elements;
• Data pipeline or an ETL process to get data into a data warehouse
• Data warehousing software (BigQuery, Snowflake, etc.)
• Schema of how the data is being transferred, modified, and stored
• A business intelligence tool for accessing data (Looker, Tableau, Power BI)
What is the purpose of a data warehouse?
Joining data from various sources
When sending data into a data warehouse, there should always be an ID/key that is also found in some other source/table. This can be a user ID, campaign ID, order ID, or something else. This then enables you to answer questions like;
• How many leads from [campaign X, experiment Y, source Z] converted into paying customers, and what is their LTV?
• What is the churn rate from a specific traffic source or user segment? How does it compare with other traffic sources or segments?
• What are all of the touchpoints for a specific user or a (high-value) user segment? (ad, email, website, support chat, phone call, etc.)
You can define your metrics
In Google Analytics (and many other tools) you will start collecting data after you define a metric. If you change a metric, you will start collecting from that moment on with no view on the historical data.
If you have all the data available in your data warehouse, you can define your metrics, KPIs, goals, and funnels on the go. Modify them to your liking and immediately create a report based on your historical data. This helps you to answer questions like;
• What is the conversion rate of each step in an alternative funnel that I didn’t define before?
• What is the conversion rate for each traffic channel for specific user action, that I didn’t define as a goal?
Gain access to unsampled and unaggregated data
Google Analytics and many other tools aggregate data based on how they see fit. This includes how they define a conversion, conversion rate, session, bounce rate, etc. In many cases, this is not how the business itself would define them. I.e. the use of session vs. user for calculating metrics.
• Websites that receive lots of traffic are also affected by sampling.
• This means the reports you are looking at aren't based on 100% of the sessions (collected data), it may be an estimate based on say 5% of the data.
• With a data warehouse, there is no sampling. This helps you answer questions like;
Advanced dashboards and automated reports
With data warehousing, you can define metrics in your own way. This means that you can also use those metrics to build interactive dashboards, automated reports, and other types of visualizations/presentations based on your data. This allows you to build reports/dashboards for;
• Monitoring your A/B testing program, including data from Google Analytics, testing tools, and the back-end data.
• General business performance, including data from various sources, funnels with touchpoints in different platforms, etc.
• Specialized reports such as LTV, churn, user journey, etc.
Monitoring, machine learning, and other clever use cases
With a data warehouse, you have access to raw hit-level data, often in (near) real-time. This is the kind of data you can use for monitoring systems or building machine learning for things like marketing automation and product recommendations.
Some ideas;
• Use machine learning to detect the traits of high LTV customers. Show ads and send customized marketing messages to this audience.
• Configure a real-time monitoring system to detect changes in any critical metric. (i.e. alert you if the conversion rate for a specific device goes down)
• Use data from GA and other sources to show/send more accurate product recommendations.