Think of data codification as the bridge that takes you from raw data to actionable insights. It usually applies when you have to process open-ended data, such as polls, surveys, reviews, NPS data, chat logs, and more.
We know that data collection without analyses is fruitless, but that doesn’t mean we should draw conclusions from mere passing observations. Codifying data using tried and tested methodologies helps you find patterns and connections that inform your hypotheses and drive better user experience. By finding trends and correlations in the way your users interface with your site, you’ll not only understand them so much better, but you’ll also get better at predicting their needs.
Check out this 5-minute mental model video on the topic. And if you want to geek out on this topic, the bible for this activity is “The Coding Manual for Qualitative Researchers” by Johnny Saldaña.
Getting started with data coding
Coding is all about pattern recognition and information processing. A code is a word or short phrase that captures the essence or meaning of something. In our case, that “something” is an open-ended response to a survey question. When we code, we’re classifying each piece of raw data - meaning each survey question response - to help us search for quantifiable information.
As a rule of thumb, you want approximately 200-300 open-ended responses to be confident in the strength of signal within the analysis (that is, 95% confidence that the true signal strength is within roughly 5-10% of what is reported; here is an article on stats for reference). But, and there’s always a but, the more niche your product or service is, the fewer responses you typically need to grasp the mindset of your users.
There’s no need to capture more than 400 responses. Even for brands with mass appeal, returns often diminish after you pass that ceiling. An interaction without an answer, or a bare “No,” generally doesn’t count as a response.
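To see where those response counts come from, here is a minimal sketch of the standard margin-of-error formula for a proportion. It assumes a simple random sample and the normal approximation, and it uses the worst case of a 50% signal; the function name is just an illustration, not something from a specific tool.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate margin of error for a share p observed in n responses.

    z = 1.96 corresponds to roughly 95% confidence (normal approximation).
    p = 0.5 is the worst case, so real signals are usually a bit tighter.
    """
    return z * math.sqrt(p * (1 - p) / n)


for n in (200, 300, 400):
    print(f"{n} responses: +/- {margin_of_error(n):.1%}")
# 200 responses: +/- 6.9%
# 300 responses: +/- 5.7%
# 400 responses: +/- 4.9%
```

Going from 300 to 400 responses only tightens the estimate by under a percentage point, which is one way to picture the diminishing return.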
Creating categories
After you have the data, start reading through it and finding general groupings, usually based on similarity of responses (more on other types of patterns in the next section).
To illustrate how we codify data, we’ll work with survey responses from users on a jewelry site.
Creating a coding report
Pictured above is a table of our responses and our categories. Set up all responses as rows, and all categories/classifications as columns. As you read through each survey response, add a “1” in each category that the response falls into. Responses can easily fall into multiple categories.
There are handy templates you can use that allow you to paste in your raw data and start adding categories. From there, some templates will automatically summarize and report the “signal strength” of those codes. Whether automated or manual, the resulting report is still only giving you a bunch of data.
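If you'd rather see the mechanics than use a template, the table can be sketched in a few lines. The responses and codes below are invented for illustration (they are not from our survey), and `signal_strength` is a hypothetical helper name.

```python
# Hypothetical codes and hand-coded responses, invented for illustration only.
CODES = ["Price", "Quality", "Style"]

coded = [
    ("I want something that lasts and looks good", {"Quality", "Style"}),
    ("Has to fit my budget", {"Price"}),
    ("Good materials at a fair price", {"Quality", "Price"}),
    ("A design that matches my taste", {"Style"}),
    ("Quality matters most", {"Quality"}),
]

# One row per response, one column per code, 1 where the code applies.
matrix = [[1 if code in codes else 0 for code in CODES] for _, codes in coded]


def signal_strength(matrix, codes):
    """Share of responses that received each code."""
    total = len(matrix)
    return {
        code: sum(row[i] for row in matrix) / total
        for i, code in enumerate(codes)
    }


strength = signal_strength(matrix, CODES)
for code, share in sorted(strength.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{code}: {share:.0%}")
```

Because one response can land in several categories, the shares can add up to more than 100%.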
Plenty of data processing articles stop right here, leaving you with a basically useless report. Luckily, this is where we’re going to take it further - to show how you can really gain strategic value from the data you’ve just sorted.
Identifying data patterns
Once you’ve categorized your data, you want to identify patterns, stitching together all your code signals. There are a few general patterns that you can look for, specifically:
- Similarities (things that happen the same way)
- Differences (they happen in predictably different ways)
- Frequency (they happen often or seldom)
- Sequence (they happen in a certain order)
- Correspondence (they happen in relation to other things)
- Causation (one appears to cause the other)
Coding is only one of many steps toward our goals. The way we connect that data is what will transform all our codification into information.
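Two of the patterns above, frequency and correspondence, are easy to check once responses are coded. Here is a rough sketch that counts how often each code appears and which codes show up together. The coded responses are made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical coded responses: each set holds the codes given to one response.
coded_responses = [
    {"Quality", "Price"},
    {"Style", "Sentimental value"},
    {"Quality", "Price"},
    {"Style"},
    {"Quality", "Style", "Sentimental value"},
]

# Frequency: how often each code appears across responses.
frequency = Counter(code for codes in coded_responses for code in codes)

# Correspondence: how often two codes appear in the same response.
# Sorting each set keeps the pair order stable, e.g. ("Price", "Quality").
pairs = Counter(
    pair
    for codes in coded_responses
    for pair in combinations(sorted(codes), 2)
)

print(frequency.most_common())
print(pairs.most_common(3))
```

Sequence and causation are harder to pull out of a count like this. They usually need timestamps, follow-up questions, or an experiment.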
The value of anomalies
Anomalies are just as important as patterns when it comes to data. There are different schools of thought about coding, and one of them supports being rigidly machine-like about the process. You can find a lot of tools and SaaS platforms, such as UserLeap, that rely on machine learning algorithms and natural language processing models to codify.
By nature, the process of coding is reductionist. We’re taking a complex scope of information and trying to break it down into small, simple pieces of information. If you’re too rigid about the coding process, you risk overlooking key details and not reaching your end goal - which is understanding how customers feel.
I like to emphasize that coding analysis is heuristic. It’s a mental shortcut to help us reach our end goals. But you don’t want to be a machine about it. You need a little bit of “science meets art.” With that in mind, codification will help you leap from information to actionable user knowledge. As an example, say you’re asking “What’s important to you when buying jewelry online?” and you get these coded category results:
Price: 35%
Quality: 20%
Style: 20%
Brand reputation: 10%
A social or environmentally conscious brand: 4%
Referral: 3%
Easy to find and purchase: 2%
You might obsess over the top 3, right? But if you’re in the game of growth and margins, then perhaps it’s the smaller signals that hold the kernels or seeds that could grow adjacent audiences. It likely depends a lot on your brand or brand direction, but perhaps you want to start a line of jewelry that supports some social cause and link it to an influencer who can refer traffic. Suddenly you have up to 7% of your audience that is triggered by this combination. Not bad for a campaign strategy experiment.
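A quick note on the arithmetic of combining the 4% and 3% signals: since a single response can fall into several categories, the two groups may overlap. This small sketch (the function name is hypothetical) shows the range the combined audience can fall in.

```python
def combined_share_bounds(share_a: float, share_b: float) -> tuple[float, float]:
    """Lower and upper bounds on the share of respondents mentioning A or B."""
    lower = max(share_a, share_b)        # one group sits entirely inside the other
    upper = min(1.0, share_a + share_b)  # the two groups do not overlap at all
    return lower, upper


# Socially or environmentally conscious brand (4%) and referral (3%).
low, high = combined_share_bounds(0.04, 0.03)
print(f"{low:.0%} to {high:.0%}")  # 4% to 7%
```

To know where you land in that range, count the responses that carry either code in your coding table.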
Using a codification lens
Your codification lens shapes the way you interpret your data and helps you translate responses into quantifiable data. It is determined by the question you ask and the goal of that question.
For example, when it comes to jewelry, you might want to understand a user’s motivation and tap into their fears, uncertainties, and doubts (FUDs).
The above sample response reads, "I really don't see why this is so expensive, seems I can get it elsewhere for less." It's an open-ended response, and we don’t know what question was asked. Depending on your question and your goals, you can categorize or code this response in at least these three different ways.
- Price - The response is clearly price-oriented, which gives us a simple way to understand the answer.
- Value - On a deeper level, the response also indicates that your users don’t understand the value of the product. There’s some kind of uncertainty of success with that product. And because of that missing link, they don't see that the price is justified.
- Competition - Depending on the survey question, you can code this response in a “competition” category. In other words, the user's emphasis is that they can purchase this product elsewhere.
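To make the idea of a lens concrete, here is a toy sketch where the same response gets a different code depending on the lens you pick. The lens names and keyword lists are assumptions made up for this example, and keyword matching is far cruder than a human read.

```python
# Hypothetical lenses: each maps a code to keywords that hint at it.
LENSES = {
    "pricing": {"Price": ["expensive", "cost", "price"]},
    "motivation": {"Value": ["don't see why", "worth"]},
    "competitive": {"Competition": ["elsewhere", "other store"]},
}


def code_response(text: str, lens: str) -> list[str]:
    """Return the codes from the chosen lens whose keywords appear in the text."""
    lowered = text.lower()
    return [
        code
        for code, keywords in LENSES[lens].items()
        if any(keyword in lowered for keyword in keywords)
    ]


response = (
    "I really don't see why this is so expensive, "
    "seems I can get it elsewhere for less."
)

for lens in LENSES:
    print(lens, code_response(response, lens))
```

The point is not the matching rule. It is that the lens, which comes from your question and goal, decides which code you are even looking for.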
Coding your data
Let’s go deeper by looking at a specific customer survey my team conducted. We asked 300 users the question, “What matters to you most when buying jewelry?”
First, we bucketed the data into codes such as quality and style/design. As you can see above, we ranked the signal of all codes we found based on the number of respondents that mentioned those codes.
Our highest ranking signal was quality, since almost 50% of respondents mentioned it. Style and design were also important, and price was ranked third. There were also a number of high-ranking categories that overlapped with our top three, such as sentimental value and personal style.
Grouping your codes
Coding your data is a cyclical process. You might go through it once and create your codes, but in the process of trying to find patterns, you’ll continue to regroup and rearrange.
Your user responses will likely touch on a few signals at once, so trying to parse patterns from coding alone is tricky. How can you get a better look at your information? The answer: start identifying your patterns by sub-categorizing your codes. Group your codes to condense the data, then begin identifying patterns or trends.
When you start to group this sample survey, for example, you can see the following:
- Many codes can be categorized in quality and durability
- Many codes are around price and value
- The strongest signal, the category in which the most codes qualify, is style and meaning.
That last signal was hiding out in all this data, and we wouldn’t have found it without grouping and pattern-searching our codes.
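Here is a small sketch of how grouping can surface a signal that no single code shows. The group names follow the ones above, but the coded responses and the assignment of codes to groups are invented for illustration.

```python
# Hypothetical mapping of groups to the codes they contain.
GROUPS = {
    "Quality and durability": {"Quality", "Durability"},
    "Price and value": {"Price", "Value"},
    "Style and meaning": {"Style", "Sentimental value", "Personal style"},
}

# Hypothetical coded responses: each set holds the codes for one response.
coded_responses = [
    {"Quality"},
    {"Style", "Price"},
    {"Sentimental value"},
    {"Personal style", "Quality"},
    {"Quality", "Value"},
    {"Style", "Sentimental value"},
]


def group_shares(responses, groups):
    """Share of responses that touch at least one code in each group."""
    total = len(responses)
    return {
        name: sum(1 for codes in responses if codes & members) / total
        for name, members in groups.items()
    }


shares = group_shares(coded_responses, GROUPS)
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {share:.0%}")
```

In this toy data, "Quality" is the most frequent single code, yet "Style and meaning" is the strongest group, because several smaller codes add up.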
Creating theories and hypotheses
Now that we’ve found the patterns in our data and presented them in an easily digestible format, we can generate the hypotheses that move our messaging and marketing forward.
It’s time to ask ourselves a few more questions:
- How do these categories relate to each other?
- How can we put them in relationships with each other in terms of importance and sequence?
- How might we position value statements or show benefits to customers based on these insights?
Again, the goal of codification in this example is to understand how our customers feel when they’re buying jewelry so that we can tailor our approach to their experience - not just to collect data.
From the survey, we know user responses centered on quality, price, and meaning. What can we theorize from these facts?
In this case, we came up with the following theory: Meaning and material values are the core motivating themes for our customers. Customers want jewelry to connect to something in their lives, and they need to trust that the jewelry is well-made and worth the price.
From this theory, we can create new value propositions for campaigns. We can also start testing user experiences and behaviors based on this model. Through this process, from coding and categorizing to deriving themes and concepts, we generated a new, strategic hypothesis for presenting products and improving our digital experiences.
Tools for codification
There are a range of tools that process information and simplify the codifying experience. Here are several that I’ve personally used and enjoyed.
Amazon Comprehend is a powerful, DIY, natural language processing service. You can call it through its API, so I wouldn't doubt that a lot of the tools below are actually just powered by Amazon Comprehend. It's sort of the Amazon Amtrak of the natural language processing world.
SEOScout is a free, cheap, quick, cool topic modeling service. Just go to their site and play around with it. It’s pretty basic, but I’ve had fun exploring it. What’s unique about this option is that it does the surveying and some of the natural language processing and coding for you.
At the enterprise level, Chattermill and Luminoso are worth playing with for the data processing stage.
Documenting your data process
As with every process, documentation is crucial. When carrying out polls, surveys and user testing — your user research and UX research platform — you need to be really diligent about documenting the metadata behind the process.
Keep in mind questions like:
- Who are you talking to?
- When you talked to them, what was the goal?
- What was the channel?
- What was the tool used?
- How many responses did you get?
Create a separate link out to the raw data set, and a link out to the mainstream reporting of that data. You can see how we build a template below.
If you've got this documentation in one central repository, you’re set up for long-term benefits, such as simpler onboarding for new team members and avoiding redundant practices.
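If you want to keep this metadata in a structured form, one possible shape is a simple record per study. This is only a sketch: the field names and example values are assumptions based on the questions above, and the links are placeholders.

```python
from dataclasses import dataclass, asdict


@dataclass
class StudyRecord:
    """Metadata for one poll, survey, or user test (hypothetical fields)."""

    audience: str        # Who are you talking to?
    goal: str            # What was the goal?
    channel: str         # What was the channel?
    tool: str            # What was the tool used?
    response_count: int  # How many responses did you get?
    raw_data_link: str   # Link out to the raw data set
    report_link: str     # Link out to the reporting of that data


record = StudyRecord(
    audience="Visitors to the jewelry site",
    goal="Learn what matters most when buying jewelry",
    channel="On-site survey",
    tool="(survey tool name)",
    response_count=300,
    raw_data_link="https://example.com/raw-data",  # placeholder URL
    report_link="https://example.com/report",      # placeholder URL
)

print(asdict(record))
```

Keeping every study in the same shape makes it easier to search old research before running a new survey.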
Quick recap
Coding is vital for transforming raw data into valuable information. When you go even deeper, codification can take you from information to the next stage: knowledge. After that leap from information to knowledge, you can generate data-driven strategies and hypotheses.
It’s a fine combination of structured process and creative thinking. You need structure, but not at the expense of robotic coding that prevents you from gaining deeper insights about your users. For that reason, a human eye is critical, so don’t solely rely on a machine to run the codification process.
Lastly, don’t forget to document your practices. Documentation is a key touchstone for progress, and will ensure you don’t waste time repeating your efforts and relearning what you’d already uncovered.