Posted by Peder Enhorning

June 6, 2017

10:16 pm

Why a Virtual Data Warehouse?

At Unilytics, we’re all about creating visualizations that deliver better insights. But creating meaningful visual dashboards is often difficult, and generally much more challenging than clients expect.

That’s mostly because it’s not always easy to access corporate data and make sense of it. Disconnected customer and product information is often stored in a multitude of silos including financial systems, CRM, transactional sources, and various Excel files created by individuals or regional departments.

Accessing and then integrating data so it makes sense is the greatest hurdle to producing effective reporting. Typically, this is done by building a data warehouse: a central repository fine-tuned for data visualization and reporting. While the end solution often works well, it means moving all data from disparate systems into that central location. Replicating data and maintaining hardwired links between the various applications and data sources can be expensive, time-consuming, and inflexible.

Also, many organizations have information in places that they are either unable (perhaps for governance reasons or because the data is incompatible) or unwilling (either for security or cost reasons) to consolidate into a single data repository. Finally, many companies need to incorporate external data, either from the cloud or from partners, into their own environment.

What if instead you could mimic the solution without doing all the work? Rather than copying and moving all your data into a new repository, data virtualization creates lines of communication to all that data but keeps it in its original form and location. This eliminates the movement of data, greatly speeds up the process, and allows you to interrogate live data. It also means you can include data that, for security or other restrictive reasons, can’t be moved to a data warehouse. Some effort is still needed, but often much less. It’s not the right solution for every situation, but where it fits it can reduce cost and time to market.

In 2017, Gartner stated that “By 2018, organizations with data virtualization capabilities will spend 40% less on building & managing data integration processes for connecting distributed data assets”.

Data Warehouse Construction

Building a data warehouse relies on ETL (Extract, Transform, and Load). ETL is designed to process very large amounts of data as it copies complete data sets from the source systems; translates and often cleanses the data to improve its quality; and loads the resultant data set into a data warehouse.
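To make the three ETL steps concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular ETL tool works: the function names, the `sales` table, and the sample record layout are all hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3


def extract(source_rows):
    """Extract: copy the complete data set out of the source system."""
    return list(source_rows)


def transform(rows):
    """Transform: cleanse and standardize each (customer, region, amount) record."""
    cleaned = []
    for customer, region, amount in rows:
        if amount is None:
            # Hypothetical quality rule: skip records with no amount.
            continue
        cleaned.append(
            (customer.strip().title(), region.strip().upper(), float(amount))
        )
    return cleaned


def load(conn, rows):
    """Load: write the cleansed data set into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(source_rows):
    """Run extract, transform, and load against an in-memory stand-in warehouse."""
    warehouse = sqlite3.connect(":memory:")
    load(warehouse, transform(extract(source_rows)))
    return warehouse
```

Note that every record is physically copied into the `sales` table, so reports run against the copy rather than the live source. That copy is exactly what data virtualization avoids.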

ETL and the resulting data warehouse are best suited to applications that require access to the complete consolidated data set and do not need real-time data. Examples include historical trend analysis and data mining, since these applications need to process complete data sets to perform their analysis.

ETL is great for very structured data sets such as relational databases. However, ETL does not handle semi-structured or unstructured data very well, if at all.

Data Virtualization

Data virtualization makes all data, regardless of where it’s located and what format it’s in, look as if it is in one place and in a consistent format. It provides access to data directly from one or more disparate data sources without physically moving the data, and presents it in such a manner that the technical aspects of location, structure, and access language are transparent to the analyst.

These data sources can be anything including relational data, data in Hadoop, XML and JSON documents, flat file data, and spreadsheet data. It can even be files that you can’t move over to a data warehouse such as data from a partner, a website, or from the cloud.
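The idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the design of any commercial data virtualization platform: the `VirtualView` class, the reader functions, and the field names (`customer`, `amount`, `name`, `total`) are all hypothetical. Each source stays where it is and is read only when a query runs.

```python
import csv
import io
import json
import sqlite3


class VirtualView:
    """Presents several sources as one stream of uniform records, read on demand."""

    def __init__(self):
        self._sources = {}

    def register(self, name, reader):
        """Register a zero-argument callable that yields dict records."""
        self._sources[name] = reader

    def query(self, predicate=lambda record: True):
        """Read every source at query time; nothing is copied to a central store."""
        for name, reader in self._sources.items():
            for row in reader():
                record = {"source": name, **row}
                if predicate(record):
                    yield record


def sqlite_reader(conn):
    """Adapter for a relational source (hypothetical `orders` table)."""
    def read():
        for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
            yield {"customer": customer, "amount": float(amount)}
    return read


def csv_reader(text):
    """Adapter for flat-file or spreadsheet-style data."""
    def read():
        for row in csv.DictReader(io.StringIO(text)):
            yield {"customer": row["customer"], "amount": float(row["amount"])}
    return read


def json_reader(text):
    """Adapter for a JSON document, mapping its field names to the common format."""
    def read():
        for item in json.loads(text):
            yield {"customer": item["name"], "amount": float(item["total"])}
    return read
```

An analyst would register each source once and then call `view.query(...)` without needing to know whether a record came from a database, a file, or a partner feed. Because the adapters run at query time, a row added to the relational source appears in the next query with no reload step.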

Some of the advantages we have seen are:

  • Reduces number of database licenses by eliminating data replication
  • Reduces end-user application licenses by accessing data via data virtualization platform
  • Gains agility to easily accommodate new requirements with changing business needs
  • Accesses information directly from the source in real-time
  • Quickly validates new business models using an agile approach to data integration
  • Reduces IT operational costs
  • Increases end-user productivity by empowering users with better information access
  • Achieves time and cost savings over traditional integration methods. Projects can be completed in 4-6 weeks; ROI can be realized in less than 6 months

Data virtualization isn’t suitable for all applications, and ETL is still necessary for some systems. Data virtualization complements ETL, but it doesn’t replace it. That’s because data virtualization doesn’t work well when source data needs significant transformation or complex business logic applied before the business user can work with it. However, data virtualization provides another option when you are looking for better reporting and want to explore both data visualization and dashboard creation.

Contact our Data Management Experts to learn more.
