Posted by Peder Enhorning
June 6, 2017
At Unilytics, we’re all about creating visualizations that deliver better insights. But creating meaningful visual dashboards is often really difficult. It’s generally much more challenging than clients think.
That’s mostly because it’s not always easy to access corporate data and make sense of it. Disconnected customer and product information is often stored in a multitude of silos including financial systems, CRM, transactional sources, and various Excel files created by individuals or regional departments.
Accessing and then integrating data so it makes sense is the greatest hurdle to producing effective reporting. Typically, this is done by building a data warehouse to create a central repository fine-tuned for data visualization and reporting. While the end solution often works well, it means moving all data from disparate systems into that central location. It can be expensive, time-consuming, and inflexible to replicate data and maintain these hardwired links between the various applications and data sources.
Also, many organisations have information in places that they are either unable (perhaps for governance reasons or because the data is incompatible) or unwilling (either for security or cost reasons) to consolidate into a single data repository. Finally, many companies need to incorporate external data, either from the cloud or partners, into their own environment.
What if instead you could mimic the solution without doing all the work? Rather than copying and moving all your data into a new repository, data virtualization creates lines of communication to all that data but keeps it in its original form and location. This eliminates the movement of data, greatly speeds up the process, and allows you to interrogate live data. It also means you can include data that for security or other restrictive reasons can’t be moved to a data warehouse. Some effort is still needed, but often much less. It’s not the right solution for all situations but can reduce cost and time to market for others.
In 2017, Gartner stated that “By 2018, organizations with data virtualization capabilities will spend 40% less on building & managing data integration processes for connecting distributed data assets”.
Building a data warehouse relies on ETL (Extract, Transform, and Load). ETL is designed to process very large amounts of data as it copies complete data sets from the source systems; translates and often cleanses the data to improve its quality; and loads the resultant data set into a data warehouse.
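To make the three ETL steps concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption for illustration: the sample CSV stands in for an export from a source system, the `sales` table stands in for a warehouse table, and an in-memory SQLite database stands in for the warehouse itself. Real ETL tools do far more, but the shape is the same: copy the data out, cleanse it, then load the result.

```python
import csv
import io
import sqlite3

# Hypothetical export from a source system (for example, a CRM).
SOURCE_CSV = """customer,region,amount
 alice ,East,100.50
BOB,west,20
carol,East,
"""


def extract(text):
    """Extract: copy the complete data set out of the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim and normalize text, drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # cleansing rule assumed for this example
        cleaned.append(
            (
                row["customer"].strip().title(),
                row["region"].strip().title(),
                float(amount),
            )
        )
    return cleaned


def load(rows, warehouse):
    """Load: write the cleansed rows into the warehouse table."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(customer TEXT, region TEXT, amount REAL)"
    )
    warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    warehouse.commit()


def run_etl(text=SOURCE_CSV):
    """Run all three steps against an in-memory stand-in warehouse."""
    warehouse = sqlite3.connect(":memory:")
    load(transform(extract(text)), warehouse)
    return warehouse
```

Note that the warehouse now holds its own copy of the data. If the source changes, nothing in `sales` changes until the ETL job runs again.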
ETL and the resulting data warehouse are best suited for applications that require access to the complete consolidated data set and do not need real-time data. Some examples are historical trend analysis and data mining operations, since these applications need to process complete data sets to perform their analysis.
ETL is great for very structured data sets such as relational databases. However, ETL does not handle semi-structured or unstructured data very well, if at all.
Data virtualization makes all data, regardless of where it’s located and regardless of what format it’s in, look as if it is in one place and in a consistent format. It provides access to data directly from one or more disparate data sources without physically moving the data, and presents it in such a manner that the technical aspects of location, structure, and access language are transparent to the analyst.
These data sources can be anything, including relational data, data in Hadoop, XML and JSON documents, flat file data, and spreadsheet data. They can even include data that you can’t move over to a data warehouse, such as data from a partner, a website, or the cloud.
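The idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any commercial data virtualization product works: `VirtualCatalog`, the three adapter functions, and the sample sources (a SQLite table, a JSON partner feed, and a CSV spreadsheet export) are all hypothetical names invented here. The point is that the virtual layer stores only how to reach each source, and reads the data in place at query time.

```python
import csv
import io
import json
import sqlite3


class VirtualCatalog:
    """Hypothetical virtual layer: holds how to reach each source, not its data."""

    def __init__(self):
        self._sources = {}

    def register(self, name, fetch):
        # fetch is a function that reads the source when called.
        self._sources[name] = fetch

    def query(self, predicate=lambda row: True):
        # Read every source in place and present rows in one format.
        for name, fetch in self._sources.items():
            for row in fetch():
                if predicate(row):
                    yield {"source": name, **row}


def sqlite_source(conn):
    def fetch():
        sql = "SELECT customer, amount FROM orders ORDER BY rowid"
        for customer, amount in conn.execute(sql):
            yield {"customer": customer, "amount": float(amount)}
    return fetch


def json_source(get_text):
    def fetch():
        for item in json.loads(get_text()):
            yield {"customer": item["name"], "amount": float(item["total"])}
    return fetch


def csv_source(get_text):
    def fetch():
        for row in csv.DictReader(io.StringIO(get_text())):
            yield {"customer": row["Customer"], "amount": float(row["Amount"])}
    return fetch


def build_demo():
    """Wire up three stand-in sources with different formats and field names."""
    crm = sqlite3.connect(":memory:")
    crm.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    crm.execute("INSERT INTO orders VALUES ('Alice', 100.5)")
    partner_feed = '[{"name": "Bob", "total": 20}]'
    spreadsheet = "Customer,Amount\nCarol,7.25\n"

    catalog = VirtualCatalog()
    catalog.register("crm", sqlite_source(crm))
    catalog.register("partner", json_source(lambda: partner_feed))
    catalog.register("spreadsheet", csv_source(lambda: spreadsheet))
    return catalog, crm
```

Because nothing is copied, a row added to the `crm` database after the catalog is built shows up in the next call to `catalog.query()`. The analyst sees `customer` and `amount` everywhere, even though each source names and stores those fields differently.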
Some of the advantages we have seen are:
Data virtualization isn’t suitable for all applications, and ETL is still necessary for some systems. Data virtualization complements ETL, but it doesn’t replace it. That’s because data virtualization doesn’t work well when the source data needs significant transformation or complex business logic applied before business users can work with it. However, data virtualization provides another option when you are looking for better reporting and want to explore both data visualization and dashboard creation.