Posted by Eric
July 3, 2019
One of the most common types of data we see at Unilytics is survey data. We’ve worked with surveys ranging from hundreds of respondents and dozens of questions to large surveys with tens of thousands of respondents and hundreds of questions. While there are some idiosyncrasies between small and large surveys, there are far more commonalities.
Most people believe that survey data is “well-structured”, but this is virtually never the case. In all the projects we’ve worked on, the survey data has never been properly structured for visualization purposes. The average analyst assumes that because each row is a survey respondent and each column is a question, the data is ready for visualizing. It is not.
Leaving the data in the “normal” survey data format (i.e., respondent-per-row and question-per-column) will lead to significant performance issues and a surprising inability to create basic visuals like stacked bar charts and time series/trend lines, or to combine related questions (e.g., Question 1: part 1, part 2, optional part 3, alternate part 4, etc.) into a single chart/visual.
In fact, if you leave the data in the normal survey format, you will end up writing some really awful, complex, slow formulas. We regularly hear from clients saying, “Why are these survey formulas so complicated? Should it take this much work to get what I need?”
Yet another complicating issue is how to handle the freeform or “open-ended” survey questions, for example, “What are some things we could do to improve our service?” Understanding those freeform answers is not easy. Having a human go through all that text to understand the “gist” of respondent perception is extremely time-consuming and prone to error.
And for the cherry on top of survey complexity: if you are working with longitudinal surveys, you need to account for how time series data is stored and for how survey questions are ‘tweaked’ or refined from year to year, which changes their content. Multi-year (or multi-period) surveys need to be “stitched” back together across historical survey editions/versions. Without this, you lose access to quality time series data and end up with a lot of extra “footnotes” wherever a question series breaks.
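One way to picture the “stitching” described above is a crosswalk table that maps each edition's question ID to a stable key that survives renumbering and rewording. The sketch below is only an illustration: the years, question IDs, key names, and the `stitch_editions` helper are all hypothetical, not taken from a real survey.

```python
# Hypothetical crosswalk: (survey year, question ID in that edition) -> stable key.
crosswalk = {
    (2018, "Q7"): "overall_satisfaction",
    (2019, "Q9"): "overall_satisfaction",  # same question, renumbered in 2019
    (2018, "Q8"): "likelihood_to_recommend",
    (2019, "Q10"): "likelihood_to_recommend",
}

# Hypothetical responses drawn from two survey editions.
responses = [
    {"year": 2018, "question": "Q7", "response": 4},
    {"year": 2019, "question": "Q9", "response": 5},
    {"year": 2019, "question": "Q11", "response": 2},  # new in 2019, no history
]


def stitch_editions(rows, mapping):
    """Attach a stable key to each row; collect rows the crosswalk does not cover."""
    stitched, unmapped = [], []
    for row in rows:
        stable_key = mapping.get((row["year"], row["question"]))
        if stable_key is None:
            unmapped.append(row)  # a series break that needs a decision or a footnote
        else:
            stitched.append({**row, "stable_key": stable_key})
    return stitched, unmapped


stitched, unmapped = stitch_editions(responses, crosswalk)
```

Here the 2018 and 2019 satisfaction responses end up under the same `stable_key`, so they can be trended together, while the unmapped 2019 question is surfaced explicitly instead of silently breaking a series.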
All this complexity arises because the optimal structure for visualizing data is “tall” rather than “wide”. Data visualization tools are particularly prone to performance problems when data contains lots of columns. If you have to choose between data that has 1,000 rows and 300 columns or 300,000 rows and 3 columns, you’ll find better performance from the latter.
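To make “tall” versus “wide” concrete, here is a minimal sketch of the reshaping step using only the Python standard library. The column names (`respondent_id`, `Q1`, ...) and the `wide_to_tall` helper are assumptions for illustration; in practice this step is usually done with a data prep tool or a library's unpivot/melt feature.

```python
# Hypothetical wide export: one row per respondent, one column per question.
wide_rows = [
    {"respondent_id": 1, "Q1": 5, "Q2": 3, "Q3": 4},
    {"respondent_id": 2, "Q1": 2, "Q2": 4, "Q3": None},  # Q3 left unanswered
]


def wide_to_tall(rows, id_column="respondent_id"):
    """Turn every answered (respondent, question) cell into its own row."""
    tall = []
    for row in rows:
        for column, value in row.items():
            if column == id_column or value is None:
                continue  # skip the ID column itself and unanswered questions
            tall.append(
                {"respondent_id": row[id_column], "question": column, "response": value}
            )
    return tall


tall_rows = wide_to_tall(wide_rows)
for row in tall_rows:
    print(row)
```

Each output row has just three fields (respondent, question, response). That is how 1,000 respondents by 300 questions becomes up to 300,000 rows of 3 columns.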
The next issue is that, after restructuring, it becomes difficult to group questions together. What you need is a survey design document, also known as a data dictionary or survey map. This document helps you transform the survey data into the right structure for visualizing while still being able to organize it the way you designed the survey.
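As a rough sketch of how a survey map can restore grouping after the data has gone tall, the example below looks up each question in a small dictionary and attaches its group and label. The question IDs, group names, labels, and helper functions are hypothetical; a real survey map would typically live in a spreadsheet or table and carry more fields.

```python
from collections import defaultdict

# Hypothetical survey map (data dictionary): question ID -> metadata.
survey_map = {
    "Q1a": {"group": "Service", "label": "Staff were helpful"},
    "Q1b": {"group": "Service", "label": "Wait time was acceptable"},
    "Q2": {"group": "Product", "label": "Product met expectations"},
}

# Hypothetical tall survey data: one row per respondent per question.
tall_rows = [
    {"respondent_id": 1, "question": "Q1a", "response": 5},
    {"respondent_id": 1, "question": "Q1b", "response": 3},
    {"respondent_id": 1, "question": "Q2", "response": 4},
    {"respondent_id": 2, "question": "Q1a", "response": 4},
]


def attach_survey_map(rows, mapping):
    """Add the group and label from the survey map to every tall row."""
    return [{**row, **mapping[row["question"]]} for row in rows]


def responses_by_group(rows):
    """Collect responses under the question group defined in the survey map."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["group"]].append(row["response"])
    return dict(grouped)


enriched_rows = attach_survey_map(tall_rows, survey_map)
grouped = responses_by_group(enriched_rows)
```

With the group attached to every row, related questions such as the parts of Question 1 can be placed in a single chart by filtering or coloring on `group`.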
There are a few additional challenges in handling quantitative variables versus qualitative variables. Both need to be properly set up for you to achieve the cohort analysis that you want with survey data. For example, “For Gen X respondents, what was the distribution of responses for question #14?” Properly setting up your qualitative variable (i.e., generation) versus your quantitative variable (i.e., question #14) lets you filter the data while still achieving the most common visuals, such as NPS, Likert, percent-to-whole, etc.
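The Gen X example above can be sketched as a filter on the qualitative variable followed by a summary of the quantitative one. The data below is made up, and treating question #14 as a 0 to 10 “likelihood to recommend” item is an assumption made purely so that an NPS can be shown; the NPS itself uses the standard definition (percent scoring 9 or 10 minus percent scoring 0 through 6).

```python
from collections import Counter

# Hypothetical tall rows with a qualitative cohort column attached.
tall_rows = [
    {"respondent_id": 1, "generation": "Gen X", "question": "Q14", "response": 10},
    {"respondent_id": 2, "generation": "Gen X", "question": "Q14", "response": 9},
    {"respondent_id": 3, "generation": "Gen X", "question": "Q14", "response": 7},
    {"respondent_id": 4, "generation": "Gen X", "question": "Q14", "response": 3},
    {"respondent_id": 5, "generation": "Millennial", "question": "Q14", "response": 6},
]


def cohort_scores(rows, generation, question):
    """Filter on the qualitative variable, return the quantitative responses."""
    return [
        row["response"]
        for row in rows
        if row["generation"] == generation and row["question"] == question
    ]


def net_promoter_score(scores):
    """Percent promoters (9-10) minus percent detractors (0-6)."""
    if not scores:
        raise ValueError("cannot compute NPS without responses")
    promoters = sum(1 for score in scores if score >= 9)
    detractors = sum(1 for score in scores if score <= 6)
    return 100.0 * (promoters - detractors) / len(scores)


gen_x_q14 = cohort_scores(tall_rows, "Gen X", "Q14")
distribution = Counter(gen_x_q14)  # response value -> number of respondents
gen_x_nps = net_promoter_score(gen_x_q14)
```

Because the cohort is just another column on each tall row, the same two functions work for any generation and any question without per-question formulas.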
Referring back to the issue of freeform text, there is an entire industry built around understanding the content of text. Understanding how “natural language processing” (NLP) can extract themes from user text entries is a huge benefit to analytical productivity. A wide variety of capabilities exist to parse the open-ended questions and generate analytically useful results that can be visualized.
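To give a feel for what “extracting themes” produces, here is a deliberately crude stand-in: keyword matching with the Python standard library. This is not real NLP (production approaches use dedicated NLP tooling and handle synonyms, negation, and context), and the theme names, keyword lists, and comments are all invented for illustration.

```python
import re
from collections import Counter

# Hypothetical themes and the keywords assumed to signal each one.
theme_keywords = {
    "wait_time": {"wait", "waiting", "slow", "queue"},
    "staff": {"staff", "rude", "friendly", "helpful"},
    "price": {"price", "expensive", "cost"},
}

# Hypothetical open-ended responses.
comments = [
    "The wait was too long and the staff seemed rushed.",
    "Friendly staff, but prices are expensive.",
    "Too slow.",
]


def tag_themes(comment, keywords):
    """Return the sorted themes whose keywords appear in the comment."""
    words = set(re.findall(r"[a-z']+", comment.lower()))
    return sorted(theme for theme, terms in keywords.items() if words & terms)


# Count how many comments mention each theme.
theme_counts = Counter(
    theme for comment in comments for theme in tag_themes(comment, theme_keywords)
)
```

The result is one count per theme, which is the kind of tall, categorical output that can be charted alongside the closed-ended questions.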
As you can see, there’s a lot more to survey data than just attaching your dashboard to the survey output. At Unilytics we have worked through much of this pain over the years and now offer services to help you achieve the optimal results for your survey data. We’ve refined these services into a specific training course on structuring and visualizing survey data. Contact us to learn more.
Alternatively, our consultants are experts in survey data and can jump-start the analytical results you need. We can provide a lot of assistance in the up-front survey design to ensure your questions will yield your desired results during the post-survey analytics phase. We can also assist with the restructuring process, NLP, visualization design and construction, and the application of common data science principles (e.g., regression, k-means clustering, distribution/variance, kurtosis/skewness, etc.).
With our services, you will have statistically sound survey visuals that optimize the experience for your end users and handle the inevitable complications that arise in analyzing survey data.