Posted by Peder Enhorning
January 29, 2019
Your data needs to be easily accessible and stored in a way that leads to insightful analysis. That means using a data architecture to create that perfect data warehouse – an amalgamation of disparate data sources optimized and organized into one. This will enable informed decision-making. And will make your job more meaningful.
Bringing data into the data warehouse and organizing it using ETL (Extract, Transform, Load) functions is critical in preparing the data for reporting and data visualization. It’s imperative that you separate the ETL for data import and that for data integration into two distinct operations, or tiers, thus making them independent. This creates a solution that is flexible and scalable to support inevitable reporting enhancements.
The first tier copies data from the original source systems into a special purpose database designed to bring source data together in a single location. This is called the “landing database” and the data processing that occurs between the source system and the landing database is called “landing ETL”.
Once the data consolidation is complete, the second tier introduces an additional set of transformation scripts that apply business rules to the landed data and makes it suitable for end-user consumption. This is called “integration ETL”. This makes the data ideally suited for reporting and data visualization and may be of a variety of possible structures, including an enterprise data warehouse (EDW), analytical data store, etc. The integration ETL will be designed to properly transform the data into the relevant data warehouse architecture. The third tier allows for data visualization to be done autonomously from the underlying data warehouse.
The purpose of the 3-tier approach is to abstract data consolidation from the application of the business rules to the data. This ensures that any changes in source systems, or changes to business rules will not incur excessive effort to integrate. Abstracting the landing and integration layers insulates each layer from changes in the other layers. This greatly reduces the maintenance required to keep a data warehouse up to date with systems or procedural changes within the organization.
Besides minimizing upkeep, another benefit of the three-tier data architecture is that different business rules can be applied to the same landing data for different end uses. For example, one set of integration ETL can summarize source data from raw transactions up to organizational summaries, while another set of integration ETL can populate a database for data visualization and analytics, and yet another set of ETL can populate a different database for integration with other systems. In all three examples, only one set of landing ETL is required, despite the varying data repositories the data ends up in.
Let us know your experiences and what challenges you have faced.