Qualitative and Quantitative Data — Let's Blur the Line
It is common to classify data as either quantitative or qualitative. In this post, I talk about why this distinction is not always useful, and about some exciting directions for future development.
Motivation
There is value in dissolving the boundary between "qualitative" and "quantitative" data and in freeing up how we think about information. In doing so, we open ourselves up to better collaboration between humans and machines, and to tools that can streamline and automate more of our knowledge workflows.
Qualitative & Unstructured Data
Qualitative data typically refers to "unstructured" and non-numerical data. The most common form of qualitative data is text-heavy documents and notes.
It is "unstructured" in the sense that two documents with very different content look much the same to a machine: each is just a sequence of characters. A human, on the other hand, can interpret them and make sense of them, given the right prior experience.
Quantitative & Structured Data
Quantitative or "structured" data generally refers to numerical data to which we can apply mathematical, statistical, and computational techniques.
For example, given the heights of a group of students, we can sort them by height in ascending or descending order.
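To make the example concrete, here is a minimal Python sketch. The student names and heights are made up for illustration; the point is only that once height is stored as a number, ordering the group is a single call to the built-in `sorted`.

```python
# Hypothetical student records; names and heights are invented for illustration.
students = [
    {"name": "Ana", "height_cm": 162},
    {"name": "Ben", "height_cm": 178},
    {"name": "Chloe", "height_cm": 155},
]

# Because "height_cm" is numeric, sorting needs only a key function.
ascending = sorted(students, key=lambda s: s["height_cm"])
descending = sorted(students, key=lambda s: s["height_cm"], reverse=True)

print([s["name"] for s in ascending])   # ['Chloe', 'Ana', 'Ben']
print([s["name"] for s in descending])  # ['Ben', 'Ana', 'Chloe']
```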
Simplifying the World
Both quantitative and qualitative data are the outputs of recording what we notice and observe in the world. We reduce reality according to how we experience it, record the important pieces of information, and throw away the rest.
We have to discard information; otherwise there is too much noise. We won't get anywhere if we try to keep track of every particle of every object we come across. Abstractions help us here by letting us focus only on the attributes that matter in a given scenario.
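As a rough illustration of abstraction, the sketch below starts from a hypothetical raw observation that holds far more detail than a height-related task needs, and keeps only two attributes. The `StudentHeight` class and all field names are assumptions made for this example, not part of any particular system.

```python
from dataclasses import dataclass

# A hypothetical raw observation: more detail than most tasks need.
observation = {
    "name": "Ana",
    "height_cm": 162,
    "shirt_color": "green",
    "weather": "cloudy",
    "time_observed": "09:14",
}

@dataclass(frozen=True)
class StudentHeight:
    """A hypothetical abstraction keeping only what a height task needs."""
    name: str
    height_cm: int

def abstract(raw: dict) -> StudentHeight:
    # Every attribute except the two we care about is discarded here.
    return StudentHeight(name=raw["name"], height_cm=raw["height_cm"])

record = abstract(observation)
print(record)  # StudentHeight(name='Ana', height_cm=162)
```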
Plaintext is Limited
For qualitative data, we mostly write it down and store it as plaintext. That is good enough for most purposes. Ever since we invented language (and later writing), note-taking and documentation have helped us pass down knowledge.
But there is an issue: in most cases, plaintext needs a human being to interpret it and act on it, and human attention is expensive. Even though we have powerful computational tools available, we can't apply them directly to plaintext. This limits the extent to which we can enlist the help of machines.
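A small sketch of the problem, using made-up notes: keyword search is one of the few operations we can apply directly to plaintext, and it only sees surface wording. All three hypothetical notes below are about height, yet searching for "tall" finds just one of them, and nothing here can answer a question like "who is taller than 160 cm?" without someone (or some extra extraction step) reading the text.

```python
# Hypothetical free-text notes; all three are about students' heights.
notes = [
    "Ben is the tallest student in the class.",
    "Chloe is a few centimetres shorter than Ana.",
    "Ana measured 162 cm at the start of term.",
]

# Substring search works on plaintext, but only matches literal wording.
hits = [note for note in notes if "tall" in note.lower()]

print(len(hits))  # 1, even though every note is about height
```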
Structured Data is Limited, in Other Ways
Quantitative and structured data, on the other hand, are limited in other ways. We have more powerful tools at our disposal to handle structured data, but the data itself is harder to edit.
Data models are typically designed with a specific use case in mind. For instance, you would have a "Comment" and a "Post" data model for a forum app. This works well for the designated scenario. But the data models are rigidly tied to one system.
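A sketch of what such forum data models might look like, written as Python dataclasses. The class and field names are hypothetical. The last few lines show the rigidity described above: a `Comment` accepts exactly the fields it was designed with, so recording anything else (here, an invented `mood` field) fails until the model itself is changed.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Comment:
    author: str
    body: str
    created_at: datetime

@dataclass
class Post:
    author: str
    title: str
    body: str
    created_at: datetime
    # Each post owns its comments; default to an empty list.
    comments: list[Comment] = field(default_factory=list)

post = Post("ana", "Welcome!", "Introduce yourselves here.", datetime(2024, 1, 1))
post.comments.append(Comment("ben", "Hi, I'm Ben.", datetime(2024, 1, 2)))

# The schema is fixed: an attribute the model wasn't designed for is rejected.
try:
    Comment("chloe", "Hello!", datetime(2024, 1, 3), mood="cheerful")
    rejected = False
except TypeError:
    rejected = True

print(len(post.comments), rejected)  # 1 True
```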
Our current ability to communicate in structured data, both in interpreting it and in specifying it, is limited.
Getting the Best of Both Worlds
The distinction between the analysis of "qualitative" and "quantitative" data leads us to reach for different tools. When we have a table of data, we immediately open up a spreadsheet. And for knowledge with a less obvious structure, we use a note-taking tool.
As a result, our knowledge ends up scattered across disparate tools over time. We can unlock a lot of value if we can integrate the "quantitative" and "qualitative" parts of a knowledge worker's workflows more seamlessly. For instance, a hybrid of quantitative and qualitative data could give us documents that are easier to search and filter, more granular control and automation logic, and recommendations of other data relevant to the current context, to name a few.
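One minimal way to picture such a hybrid, sketched in Python under the assumption that a note is simply free text plus a dictionary of structured attributes. The `Note` class and its attribute names and values are hypothetical. The text stays readable for people, while the attributes let a program filter and sort the same notes.

```python
from dataclasses import dataclass, field

@dataclass
class Note:
    # Free-form, human-readable content: the "qualitative" part.
    text: str
    # Structured attributes attached to the same note: the "quantitative" part.
    attributes: dict = field(default_factory=dict)

notes = [
    Note("Call with supplier went well; they can ship early.", {"type": "meeting", "priority": 2}),
    Note("Draft an outline for the quarterly report.", {"type": "task", "priority": 1}),
    Note("Idea: try a shorter onboarding flow.", {"type": "idea"}),
]

# Filter on a structured attribute.
tasks = [n.text for n in notes if n.attributes.get("type") == "task"]

# Sort by priority, skipping notes that don't have one.
by_priority = sorted(
    (n for n in notes if "priority" in n.attributes),
    key=lambda n: n.attributes["priority"],
)

# Plain text search still works on the very same notes.
mentions_flow = [n.text for n in notes if "flow" in n.text]

print(tasks)
print([n.attributes["priority"] for n in by_priority])  # [1, 2]
```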
The following sections explore ways to make that happen.
From Qualitative to Quantitative
To augment unstructured data with attributes, we should first recognize that all attributes (numeric or otherwise) are somewhat arbitrary. For a "Person", we may choose to record and store their "height", "marital status", or even "number of McDonald's visits in lifetime".
There are an infinite number of valid features we can generate from unstructured data, by taking advantage of external knowledge. Here are some promising tools and techniques we can use:
I will discuss each one in more detail in upcoming posts.
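As a deliberately simple illustration of going from qualitative to quantitative (and not necessarily one of the techniques referred to above), the sketch below uses a hand-written regular expression to pull a numeric height attribute out of free-text notes. The notes, names and pattern are assumptions made for this example; real-world text would need something far more robust.

```python
import re
from typing import Optional

# Hypothetical free-text notes, keyed by student name.
notes = {
    "Ana": "Ana measured 162 cm at the start of term.",
    "Ben": "Ben is the tallest in class, at 178 cm.",
    "Chloe": "Chloe hasn't been measured yet.",
}

# Assumed pattern: a number (optionally with decimals) followed by "cm".
HEIGHT_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*cm\b")

def extract_height_cm(text: str) -> Optional[float]:
    """Return the first height in centimetres found in the text, or None."""
    match = HEIGHT_PATTERN.search(text)
    return float(match.group(1)) if match else None

heights = {name: extract_height_cm(text) for name, text in notes.items()}
print(heights)  # {'Ana': 162.0, 'Ben': 178.0, 'Chloe': None}

# Once extracted, the attribute supports ordinary structured operations.
measured = sorted((h, name) for name, h in heights.items() if h is not None)
print(measured)  # [(162.0, 'Ana'), (178.0, 'Ben')]
```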
From Quantitative to Qualitative
Structured data that we capture from applications and sensors may not be easy for humans to consume. There is value in synthesizing a qualitative output from raw data, whether to produce a report or to tell a story.
Similar to how journalists and bloggers write articles based on events in the world (or in the market), it is likely that more summaries, presentations, and visualizations will be synthesized automatically from the underlying data. With generative language models improving at an impressive rate, this seems promising. As with the other direction, there are also many ways to go from quantitative to qualitative data.
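Generative language models are one route; a much older and simpler one is template-based text generation. The sketch below, with made-up data and a hypothetical `summarize` function, turns a small table of numbers into a one-sentence report. It only shows the direction of the transformation, not how a language model would do it.

```python
from statistics import mean

# Hypothetical structured data: student heights in centimetres.
heights_cm = {"Ana": 162, "Ben": 178, "Chloe": 155}

def summarize(heights: dict[str, int]) -> str:
    """Render a few statistics about the data as a readable sentence."""
    tallest = max(heights, key=heights.get)
    shortest = min(heights, key=heights.get)
    avg = mean(list(heights.values()))
    return (
        f"{len(heights)} students were measured. "
        f"{tallest} is the tallest at {heights[tallest]} cm, "
        f"{shortest} is the shortest at {heights[shortest]} cm, "
        f"and the average height is {avg:.1f} cm."
    )

summary = summarize(heights_cm)
print(summary)
```

A template like this is predictable but rigid: every new kind of insight needs a new hand-written sentence, which is the gap generative models promise to fill.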
Codifying & Emulating What Is Inside Our Minds
The more we can blur the line between quantitative and qualitative, structured and unstructured, the more we can offload knowledge work to existing tools and systems. It is an exciting direction to codify and emulate what is inside our minds, for us to digitize our knowledge, and for machines to interpret it.
I am working on tools for thought that build in this direction. If you are interested in chatting, hit me up on Twitter.
Hope you enjoyed this post. Let's stay in touch.