🏆 Pro tip: use 'Paste special' to paste 'Values Only' in the Hotjar analysis template, so no formulas or formatting are copied over.
How to analyze open-ended questions in 5 steps [template included]
Open-ended questions are great for getting authentic feedback because they give people a chance to describe what they’re experiencing in their own voice. Analyzing such survey questions yourself is an excellent opportunity to empathize with your audience, gather essential insights, and make the right decisions.
But you may be wondering...
How do you efficiently analyze more than 100 replies? Or even 1,000?
Here’s a system we use at Hotjar to categorize and visually represent large volumes of qualitative data—and it’s easier than you might think! You’ll have to work with the technique a bit before you become comfortable with it, but once you get it, you’ll be sorting through mountains of qualitative data in no time.
What you’ll need:
- Working knowledge of spreadsheets (Google Sheets or Excel)
- A quiet space with some uninterrupted focus time
- Hotjar’s open-ended question analysis template
To help you learn this technique, we created a data sample that you can download and use to follow along.
Now let’s begin…
Step 1: get your data into the template
1) Export the data from your survey or poll into a .CSV or .XLS file.
2) Copy the data from your .CSV or .XLS file and paste it into the template’s ‘CSV Export’ sheet.
3) From the ‘CSV Export’ sheet, copy the column containing the first open-ended question you want to analyze, and paste it into the ‘Question 1’ sheet, in the cell marked < Paste answers to first open-ended question here >.
4) Apply text wrapping to the entire column so each answer fits the column width and is easier to read later on.
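If you prefer to prepare the answers with a script before pasting them in, here is a minimal Python sketch of items 2 and 3 above, assuming a CSV export (Python's standard csv module does not read .XLS files). The column header and file name below are hypothetical placeholders; replace them with the ones from your own export.

```python
import csv
import io
from typing import Iterable, List


def extract_answers(rows: Iterable[dict], question: str) -> List[str]:
    """Return the non-empty answers to one open-ended question.

    `question` must match the column header in your export exactly.
    """
    answers = []
    for row in rows:
        text = (row.get(question) or "").strip()
        if text:  # skip respondents who left this question blank
            answers.append(text)
    return answers


# Hypothetical header; for a real file, replace the StringIO below with
# open("survey_export.csv", newline="", encoding="utf-8")  (hypothetical file name).
QUESTION = "How does your employer measure your performance?"
sample = io.StringIO(
    "id," + QUESTION + "\n"
    "1,Revenue\n"
    "2,\n"
    "3,Conversions and traffic\n"
)
answers = extract_answers(csv.DictReader(sample), QUESTION)
print(answers)  # ['Revenue', 'Conversions and traffic']
```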
Step 2: identify response categories
A response category is a set of replies that can be grouped because they are part of the same theme, even if they’re worded differently.
In the sample dataset we use for this tutorial, we asked Hotjar customers to explain how their employer measures their performance (e.g., revenue, conversions, traffic). In theory, you could go through every answer to identify your response categories one by one, but that wouldn’t be very efficient. Instead, we’re going to use a series of techniques that help you identify the broad categories.
A) Use a text analyzer: a text analyzer counts the most frequently used words in your data, which helps you identify broad categories of responses.
🏆 Pro tip: Textalyser is a simple, free resource that does this well.
If you do this with the sample data we’ve provided above, you’ll find that ‘sales,’ ‘conversion,’ and ‘traffic’ are some of the most commonly used words in the dataset.
These words point to some of the most popular replies to the question we asked. They don’t represent all the answers, of course, but they’re a good place to start when building the list of response categories.
Add each category to the top of a separate column (replacing the placeholder text that reads 'Response Category 01,' 'Response Category 02,' etc.).
Note: some of the popular words the text analyzer surfaces mean the same thing (e.g., 'sales' and 'revenue'), so you’ll want to create a single category for those responses called 'Sales/Revenue.' Other popular words will NOT become categories because, as stand-alone words, they tell us nothing useful (e.g., 'our,' 'rate').
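A text analyzer's core job is word counting. The Python sketch below is a rough, hypothetical stand-in for a tool like Textalyser, not its actual method: it lowercases each answer, drops a small illustrative stop-word list (including 'our' and 'rate' from the note above), and folds synonyms such as 'sales' and 'revenue' into one 'sales/revenue' key before counting. Both word lists are assumptions you would tune for your own data.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

# Small illustrative stop-word list. Words like "our" and "rate" tell us
# nothing on their own, so they are skipped. Real tools use longer lists.
STOP_WORDS: Set[str] = {"a", "and", "by", "is", "of", "our", "rate", "the", "then", "to", "we"}

# Hypothetical synonym map: each word on the left is counted under the key on the right.
SYNONYMS: Dict[str, str] = {
    "sales": "sales/revenue",
    "revenue": "sales/revenue",
    "conversions": "conversion",
}


def top_words(answers: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Count the most frequent meaningful words across all answers."""
    counts: Counter = Counter()
    for answer in answers:
        for word in re.findall(r"[a-z']+", answer.lower()):
            if word in STOP_WORDS:
                continue
            counts[SYNONYMS.get(word, word)] += 1
    return counts.most_common(n)


answers = ["Revenue", "Sales and conversions", "Our conversion rate", "Traffic, then revenue"]
print(top_words(answers, 3))  # [('sales/revenue', 3), ('conversion', 2), ('traffic', 1)]
```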
B) Sort your responses alphabetically: when you sort alphabetically, you’ll notice that specific patterns emerge, and you can create more categories based on the trends you spot.
In our sample data, every response beginning with the word 'Revenue' ends up in one block when you sort alphabetically. Of course, we already have a category for 'Sales/Revenue,' so there’s no need to add that category in this case, but sorting makes clusters like this easy to spot.
Alphabetical sorting will also draw your attention to certain stand-alone responses. For example, someone replied 'Huh?' and another person told us they didn’t understand the question. This allows us to add a new category called 'Didn’t understand the question.'
Scan the alphabetically sorted responses for other categories, such as 'It’s not measured,' 'Traffic,' 'Conversions,' etc. Be on the lookout for synonyms, but don’t worry if you create a few redundant categories for now. You will combine the categories that mean the same thing at the end.
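The alphabetical scan can also be sketched in Python. This hypothetical helper sorts the answers case-insensitively and clusters them by their first word, so a block of answers starting with 'Revenue' shows up as one group and one-off replies like 'Huh?' stand on their own. Sorting on the first word before the full text keeps each cluster contiguous.

```python
from itertools import groupby
from typing import Dict, List


def first_word(answer: str) -> str:
    """Lowercased first word of an answer, with surrounding punctuation removed."""
    words = answer.split()
    return words[0].strip(".,!?;:").lower() if words else ""


def group_by_first_word(answers: List[str]) -> Dict[str, List[str]]:
    """Sort answers alphabetically and cluster them by their first word."""
    # Sorting on the first word, then the full text, keeps each cluster contiguous,
    # which groupby() requires.
    ordered = sorted(answers, key=lambda a: (first_word(a), a.casefold()))
    return {key: list(group) for key, group in groupby(ordered, key=first_word)}


answers = ["Traffic", "Revenue, then conversion rate, then traffic.", "Huh?", "revenue", "Conversions"]
for word, cluster in group_by_first_word(answers).items():
    print(word, cluster)
```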
Step 3: record the individual responses
1) Place a '1' in each cell where a response (the row) matches a category (the column) to identify a positive response in each category. Add categories as you go.
For example, if you sort our sample data alphabetically, you’ll find that the response in Row 36 reads 'Huh?' If you added 'Didn’t understand the question' to Column E, place a '1' in cell E36.
Note: in our example, many respondents indicated that their performance is measured by multiple factors (e.g., lead gen + sales + customer satisfaction). Be sure to place a '1' in each matching category. For example, the row for the single answer 'Revenue, then conversion rate, then traffic.' records three positive responses.
When you enter your first '1,' the cell in Row 3 (below the category name) updates to show the number of positive responses in that category, and Row 4 changes from a '#DIV/0!' error to the percentage of responses that fall into that category.
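To make the Row 3 and Row 4 arithmetic concrete, here is a Python sketch that represents each response as the set of categories it has a '1' in (a representation chosen for illustration). We haven't reproduced the template's exact Row 4 formula; this sketch assumes the percentage is taken over responses that have at least one '1', which fits the #DIV/0! error you see before anything is marked. Because one answer can count toward several categories, the percentages can add up to more than 100%.

```python
from typing import Dict, List, Optional, Set

# Each response is the set of categories it has a '1' in (illustrative representation).
Marks = List[Set[str]]


def row_totals(marks: Marks, categories: List[str]) -> Dict[str, int]:
    """Row 3 equivalent: number of responses with a '1' in each category."""
    return {c: sum(1 for m in marks if c in m) for c in categories}


def row_percentages(marks: Marks, categories: List[str]) -> Dict[str, Optional[float]]:
    """Row 4 equivalent, ASSUMING the denominator is the number of responses
    with at least one '1' (check the template's actual formula)."""
    categorized = sum(1 for m in marks if m)
    if categorized == 0:
        # Nothing marked yet: the spreadsheet shows #DIV/0! at this point.
        return {c: None for c in categories}
    totals = row_totals(marks, categories)
    return {c: 100 * totals[c] / categorized for c in categories}


categories = ["Sales/Revenue", "Conversions", "Traffic", "Didn't understand the question"]
marks: Marks = [
    {"Sales/Revenue", "Conversions", "Traffic"},  # "Revenue, then conversion rate, then traffic."
    {"Didn't understand the question"},           # "Huh?"
    {"Sales/Revenue"},                            # "Revenue"
    set(),                                        # not categorized yet
]
print(row_totals(marks, categories))
print({c: round(p, 1) for c, p in row_percentages(marks, categories).items()})
```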
2) Use the 'Find' feature to search for words related to each category: begin with the first category (in our example, that’s 'sales') and search the data column for any response that mentions 'sales.' Read the entire response to ensure it fits the category you searched for, then place a '1' in the appropriate column for that response.
3) Fill in the gaps: read each row that hasn’t been categorized and place a '1' under the appropriate category, creating new categories as necessary. As you create new categories, search your data for those terms to quickly find similar responses.
⚠️ Important: when adding a new category as you go through the responses, make sure to retroactively check previous answers that might fit in this new category.
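The 'Find' pass in item 2, together with the retroactive check in the warning above, can be sketched in Python as a keyword search that always runs over the full list of responses, so rows you categorized earlier are checked again whenever you add a category. The function name and keywords are illustrative. Like 'Find,' it only suggests candidates; you still need to read each hit before placing a '1'.

```python
import re
from typing import List


def find_candidates(answers: List[str], keywords: List[str]) -> List[int]:
    """Indexes of answers where any keyword appears at the start of a word.

    Matching is case-insensitive and not anchored at the end, so
    "conversion" also finds "conversions". It can still produce false hits,
    so read every candidate before placing a '1'.
    """
    if not keywords:
        return []
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(k) for k in keywords) + ")", re.IGNORECASE)
    # Always scan the FULL list, so earlier rows are re-checked for new categories.
    return [i for i, answer in enumerate(answers) if pattern.search(answer)]


answers = [
    "Revenue, then conversion rate, then traffic.",
    "Huh?",
    "Number of form submissions",
    "Sales and lead gen",
]
print(find_candidates(answers, ["sales", "revenue"]))  # [0, 3]
# Adding a 'Lead Gen / Form Submissions' category later: search again from row 0.
print(find_candidates(answers, ["lead gen", "form submission"]))  # [2, 3]
```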
Step 4: organize your categories
1) Group your data: you will almost certainly find separate categories that describe the same concept, because respondents used different words for it. In our sample data, 'Lead Gen' and 'Form Submissions' belong in the same category.
Drag these columns next to each other, and apply a color (any color) to the group of columns you plan to merge—this marks them as a group so you can return to them in a bit when it’s time to combine them. Repeat this step for each set of categories you plan to join.
Add a new column to the left-hand side of each group. For example, with 'Lead Gen' and 'Form Submissions,' you’ll create a new category called 'Lead Gen / Form Submissions,' add up the Row 3 totals for the two old categories (counting any response marked in both only once), and enter the new total in Row 3 of the new column. Copy and paste the percentage formula from any Row 4 cell, then delete the old categories.
⚠️ Important: when merging multiple categories, make sure to re-add the '1s' under the newly merged category, or you run the risk of losing your data.
Repeat this step for every group you plan to merge.
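One detail to watch when merging: if a single response has a '1' in both old columns, simply adding up the Row 3 totals counts it twice. This Python sketch (one set of category names per response, a representation chosen for illustration) merges columns with an 'any of' rule, so each response counts once in the new category.

```python
from typing import List, Set


def merge_categories(marks: List[Set[str]], old: Set[str], new: str) -> List[Set[str]]:
    """Replace the `old` categories with a single `new` one.

    A response marked in ANY old category gets exactly one mark in `new`,
    so it is never counted twice. The input list is not modified.
    """
    merged = []
    for m in marks:
        kept = m - old  # categories that are not being merged
        if m & old:
            kept = kept | {new}
        merged.append(kept)
    return merged


marks = [
    {"Lead Gen", "Form Submissions"},  # mentions both terms
    {"Form Submissions"},
    {"Sales/Revenue"},
]
merged = merge_categories(marks, {"Lead Gen", "Form Submissions"}, "Lead Gen / Form Submissions")
print(sum("Lead Gen / Form Submissions" in m for m in merged))  # 2, not 1 + 2 = 3
```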
2) Arrange your categories from large to small: order your categories from left to right, largest first. Categories that account for only a small share of responses (2% or less) should be merged, using the grouping method above, into one category called 'Others,' which you’ll leave on the far right.
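The ordering and the 'Others' rule can be sketched the same way. This hypothetical Python helper sorts categories by count, largest first, and folds every category at or below a 2% threshold into 'Others,' counting each response in 'Others' only once. As in the Row 4 sketch, percentages are assumed to be taken over responses with at least one category, and the example numbers are made up.

```python
from collections import Counter
from typing import List, Set, Tuple


def order_with_others(marks: List[Set[str]], threshold_pct: float = 2.0) -> List[Tuple[str, int]]:
    """Categories sorted largest first, with small ones folded into 'Others'.

    Percentages are ASSUMED to be taken over responses with at least one category.
    """
    categorized = [m for m in marks if m]
    if not categorized:
        return []
    n = len(categorized)
    totals = Counter(c for m in categorized for c in m)
    small = {c for c, t in totals.items() if 100 * t / n <= threshold_pct}
    ordered = sorted(
        ((c, t) for c, t in totals.items() if c not in small),
        key=lambda pair: pair[1],
        reverse=True,
    )
    # Count each response once in 'Others', even if it hits several small categories.
    others = sum(1 for m in categorized if m & small)
    if others:
        ordered.append(("Others", others))
    return ordered


# Illustrative numbers only: 50 categorized responses.
marks = (
    [{"Sales/Revenue"}] * 30 + [{"Conversions"}] * 15 + [{"Traffic"}] * 4 + [{"Customer satisfaction"}]
)
print(order_with_others(marks))
# [('Sales/Revenue', 30), ('Conversions', 15), ('Traffic', 4), ('Others', 1)]
```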
Step 5: represent your data visually
1) Prep your data to create a bar chart. First, select and copy the top three rows of your spreadsheet (those that make up the 'Response Categories,' 'Total respondents who answered X,' and '% respondents who answered X').
Paste them into the ‘Graph Question 1’ sheet using the 'Paste special' feature to paste only the values (so the formulas don’t copy over).
Select and copy the table you just pasted, and choose 'Paste special' again—this time using 'Paste transposed' to invert the rows and columns (this makes your data more chart-friendly).
2) Create your chart: insert your chart, selecting the percentage column as your 'Series' and the categories as your 'X-axis.' Resize the chart however you see fit.
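For a quick check outside the spreadsheet, this Python sketch mirrors Step 5: a transpose function that does what 'Paste transposed' does, and a plain-text bar chart of the percentage column. The text chart is only a stand-in for the spreadsheet's chart, and the figures in the example are made up for illustration, not taken from our sample data.

```python
from typing import List, Sequence


def transpose(table: Sequence[Sequence]) -> List[list]:
    """Swap rows and columns, like the spreadsheet's 'Paste transposed'."""
    return [list(column) for column in zip(*table)]


def text_bar_chart(rows: Sequence[Sequence], width: int = 40) -> str:
    """Plain-text bar chart of (category, total, percent) rows, scaled so the
    largest percentage fills `width` characters."""
    top = max(pct for _, _, pct in rows)
    label_width = max(len(str(cat)) for cat, _, _ in rows)
    lines = []
    for cat, _, pct in rows:
        bar = "#" * round(width * pct / top) if top else ""
        lines.append(f"{str(cat):<{label_width}} | {bar} {pct:.1f}%")
    return "\n".join(lines)


# Illustrative values laid out like the top three rows of the 'Question 1' sheet.
table = [
    ["Response Categories", "Sales/Revenue", "Conversions", "Traffic"],
    ["Total respondents who answered X", 30, 15, 4],
    ["% respondents who answered X", 60.0, 30.0, 8.0],
]
transposed = transpose(table)  # first row now holds the three labels
print(text_bar_chart(transposed[1:], width=10))
```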
And there you have it—a visual representation of your data! Feel free to experiment with different formats if you’re putting the chart into a formal presentation.
Analyzing open-ended questions efficiently and empathizing with your audience take some practice, but the more you do it, the easier it becomes. Your mind will begin to recognize patterns the more you practice this technique, so don’t be afraid to dive into it.