Qualitative and Quantitative Data — Let's Blur the Line

Sep 15 2020

It is common to classify data as either quantitative or qualitative. In this post I discuss why this distinction is not always useful, and point to some exciting directions for future development.



Motivation

There is value in dissolving the boundary between "qualitative" and "quantitative" data, and in freeing up our thinking around information. In doing so, we open ourselves up to better collaboration between humans and machines, and to tools that can streamline and automate more of our knowledge workflows.



Qualitative & Unstructured Data

Qualitative data typically refers to "unstructured" and non-numerical data. The most common form of qualitative data is text-heavy documents and notes.

It is "unstructured" in a sense that two documents with very different content would appear similar to a machine. Whereas a humans can interpret and make sense of them, given the right prior experience.



Quantitative & Structured Data

Quantitative or "structured" data generally refers to numerical data to which we can apply mathematical, statistical, computational techniques.

For example, given the heights of a group of students, we can sort them by height in ascending or descending order.
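
As a minimal illustration (the names and heights below are made up), sorting such records by a numeric attribute is a one-liner in most languages:

```python
# A minimal sketch: sorting structured records by a numeric attribute.
# The names and heights are hypothetical.
students = [
    {"name": "Ada", "height_cm": 162},
    {"name": "Ben", "height_cm": 178},
    {"name": "Chloe", "height_cm": 170},
]

# Ascending by height; pass reverse=True for descending order.
by_height = sorted(students, key=lambda s: s["height_cm"])
print([s["name"] for s in by_height])  # ['Ada', 'Chloe', 'Ben']
```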



Simplifying the World

Both quantitative and qualitative data are the outputs of recording what we notice and observe in the world. We reduce reality according to how we experience it, record the important pieces of information, and throw away the rest.

We have to discard information; otherwise there is too much noise. We won't get anywhere if we try to keep track of every particle of every object we come across. Abstractions help us do this: they let us focus only on the attributes that matter to the scenario at hand.



Plaintext is Limited

For qualitative data, we mostly write it down and store it as plaintext. That is good enough for most purposes. Ever since we invented language (and later writing), note-taking and documentation have been helpful for passing down knowledge.

But there is an issue: in most cases, plaintext requires a human being to interpret and act on it. And human attention is expensive. Even though we have powerful computational tools available, we can't directly apply them to plaintext. This limits the extent to which we can enlist the help of machines.



Structured Data is Limited, in Other Ways

On the other hand, quantitative and structured data are limited in other ways. We have more powerful tools at our disposal for handling structured data, but the data itself is harder to edit.

Data models are typically designed with a specific use case in mind. For instance, you would have a "Comment" and a "Post" data model for a forum app. This works well for the designated scenario. But the data models are rigidly tied to one system.
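
As a rough sketch (the field names here are hypothetical), the forum's data models might look like the following, where every field reflects a decision made for that one application:

```python
# A rough sketch of forum data models; the fields are hypothetical.
# The schema works well for the forum itself, but reusing these records
# in another system typically means re-modelling them.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Post:
    id: int
    author: str
    title: str
    body: str
    created_at: datetime

@dataclass
class Comment:
    id: int
    post_id: int      # rigidly tied to the Post model above
    author: str
    body: str
    created_at: datetime
```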

Our current ability to communicate in structured data (both interpreting it and specifying it) is limited.



Getting the Best of Both Worlds

The distinction between the analysis of "qualitative" and "quantitative" data leads us to reach for different tools. When we have a table of data, we immediately open up a spreadsheet. And for knowledge with a less obvious structure, we use a note-taking tool.

But as a result, our knowledge ends up scattered across disparate tools over time. We can unlock so much potential value if we can more seamlessly integrate the "quantitative" and "qualitative" parts of a knowledge worker's workflows. For instance, a hybrid of quantitative/qualitative data would mean documents that are easier to search or filter, more granular control and automation logic, and recommendations of other data relevant to the context, to name a few.

The following sections explore ways to make that happen.



From Qualitative to Quantitative

To augment unstructured data with attributes, we should first recognize that all attributes (numeric or otherwise) are somewhat arbitrary. For a "Person", we may choose to record and store their "height", "marital status", or even "number of McDonald's visits in lifetime".

There are an infinite number of valid features we can generate from unstructured data, by taking advantage of external knowledge. Here are some promising tools and techniques we can use:

  • Manual collection and labelling: Using a human's understanding to collect and fill in the data. Simple but expensive.
  • Rules and algorithms for generating features: e.g. the word count of a document, or the number of typos in a resume (sketched after this list).
  • In-collection/binary classifiers: Does a piece of content belong to a known collection? If it does, then the rules and logic that apply to the collection are now available.
  • Named-entity recognition: in the sentence "Tom Cruise has a dog", we can match the actor entity Tom Cruise to the first two words of the string, and from there connect the sentence to existing structured data about him (sketched after this list).
  • Interpolating from related data: we can use the most similar "neighbours" of a data point to fill in its numeric attributes. This is similar to how we estimate the value of a real estate property by looking at other properties in the same area (sketched after this list).
I will discuss each of these in more detail in upcoming posts.
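
For the rule-based approach, here is a minimal sketch of turning plaintext into numeric features; the "typo count" relies on a hypothetical mini-dictionary, so treat it as an illustration rather than a real spell checker:

```python
# A minimal sketch of generating numeric features from plaintext with simple rules.
# KNOWN_WORDS is a hypothetical mini-dictionary used to fake a typo count.
import re

KNOWN_WORDS = {"the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"}

def word_count(text: str) -> int:
    return len(re.findall(r"\w+", text))

def typo_count(text: str) -> int:
    words = re.findall(r"[a-zA-Z]+", text.lower())
    return sum(1 for w in words if w not in KNOWN_WORDS)

doc = "The quick brwon fox jumps ovr the lazy dog"
print(word_count(doc), typo_count(doc))  # 9 2
```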
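For named-entity recognition, here is a sketch using spaCy, assuming the library and its small English model (en_core_web_sm) are installed; linking the detected span to an existing entity record is left as a comment:

```python
# A sketch of named-entity recognition with spaCy (assumes the
# en_core_web_sm model has been downloaded separately).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Tom Cruise has a dog")

for ent in doc.ents:
    # e.g. "Tom Cruise" PERSON, covering the first characters of the string
    print(ent.text, ent.label_, ent.start_char, ent.end_char)
    # In a real pipeline, the next step would link this span to an
    # entity record (the actor) in existing structured data.
```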
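And for interpolating from related data, a sketch that estimates a missing price from a data point's most similar neighbours; the property records are made up:

```python
# A minimal sketch of interpolating a missing value from similar "neighbours".
# The property records below are made up for illustration.
import math

properties = [
    {"size_sqm": 70, "distance_to_centre_km": 2.0, "price": 410_000},
    {"size_sqm": 85, "distance_to_centre_km": 2.5, "price": 480_000},
    {"size_sqm": 60, "distance_to_centre_km": 5.0, "price": 300_000},
    {"size_sqm": 95, "distance_to_centre_km": 1.5, "price": 560_000},
]

def estimate_price(size_sqm, distance_km, k=2):
    """Average the prices of the k most similar properties."""
    def dist(p):
        return math.hypot(p["size_sqm"] - size_sqm,
                          10 * (p["distance_to_centre_km"] - distance_km))
    neighbours = sorted(properties, key=dist)[:k]
    return sum(p["price"] for p in neighbours) / k

print(estimate_price(80, 2.2))  # averages the two closest listings
```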

    

From Quantitative to Qualitative

Structured data that we capture from applications and sensors may not be easy for humans to consume. There is value in synthesizing a qualitative output from raw data, to give a report and to tell a story.

Similar to how journalists and bloggers write articles based on events in the world (or in the market), it is likely that more summaries, presentations, and visualizations will be synthesized automatically from the underlying data. With generative language models improving at an impressive rate, this seems promising. As with the other direction, there are also many ways to go from quantitative to qualitative data.
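
As a toy sketch of that direction (a plain template over made-up metrics, with no language model involved), such synthesis could start as simply as:

```python
# A toy sketch of synthesizing a qualitative summary from structured data.
# The metrics are made up; a generative model could replace the template.
weekly_metrics = {"week": "2020-09-07", "signups": 412, "signups_prev": 350, "churned": 18}

def summarize(m):
    change = (m["signups"] - m["signups_prev"]) / m["signups_prev"] * 100
    direction = "up" if change >= 0 else "down"
    return (f"In the week of {m['week']}, signups were {direction} "
            f"{abs(change):.0f}% at {m['signups']}, while {m['churned']} users churned.")

print(summarize(weekly_metrics))
# In the week of 2020-09-07, signups were up 18% at 412, while 18 users churned.
```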

    

    Codifying & Emulating What Is Inside Our Minds

    The more we can blur the line between quantitative and qualitative, structured and unstructured, the more we can offload knowledge work to existing tools and systems. It is an exciting direction to codify and emulate what is inside our minds, for us to digitize our knowledge, and for machines to interpret it.

I am working on tools for thought to build towards this direction. If you are interested in chatting, hit me up on Twitter.

Hope you enjoyed this post. Let's stay in touch.