At the GURT 2012 conference last week, another graduate student whose work was primarily quantitative asked me about the issue of data quality in qualitative research. She asked: if your results are not repeated across a large group of people, how do you know that they are reliable or representative? In the process of answering her, I realized that in some ways quantitative and qualitative research are ideologically opposed. Quantitative research is validated in part by the reliability of a result across a variety of people and contexts, but those people and contexts are not the focus of the research; they are its rationale. In qualitative research, by contrast, the context and the people are the focus. Qualitative research is more about putting a microscope up to an element in the data and closely observing how it works in the context of its surroundings. The research questions are starkly different, but they do clearly complement each other.
In the world of text analytics, the analytic focus is largely quantitative, which in some ways counteracts the very nature of the data. The unbalanced nature of the data collection then calls the quality of the quantitative analysis into question, and the analysis is not used in a traditional way to recreate the microfocus, so the advantages of the qualitative microfocus are also diminished.
We spoke a lot at GURT about the paucity of theory in the field of text analytics. The technical ability applied to the problems is quite strong. There are quite a few very talented programmers hard at work conquering some of the technical issues inherent in text analysis, some on the computer science end of things, some on the computational linguistics end of things, and some fortunate enough to work from both ends with knowledge from both fields. But it is not enough to be able to solve problems and answer questions; we have to know which questions to ask.
It has been relatively easy to blame the fast-growing set of consumers for their immediate hunger for data analysis. In order to satisfy this hunger, companies like Open Amplify double their output monthly and are working as hard and fast as they can to keep up with the demand. But the demand generally comes with the same level of linguistic knowledge that most laypeople have. We, as language users, are constantly inundated with language, and we only consciously process a very small proportion of it. So we don’t instinctively ask the questions that our data is really best suited to answer. The text analytic world is responding to questions of “what are people talking about?” with word frequencies, comparative word frequencies and sentiment analyses that are tied to those word frequencies. But we don’t use language that way! If I ask you about your phone, you’re likely to respond about its features, its usefulness, its price, or how well you’ve adapted to it. If I ask 100 people about their phones, how much good will it do me to aggregate across responses? There is a good deal of work that needs to be done in terms of finding intertextual references to phones (e.g. “springy keypad” or “data plan”) and assigning a negative value to “limited calling plan” and a positive to “limited call interference.” When I asked a coworker how our advisory committee meeting was going while I was at the conference, she answered “delicious.” We communicate by keying on shared knowledge, and as we communicate we build senses of particular topics that are related specifically to our conversation. If my coworker had answered with a comment about the potato salad, and I had played off of that, would we be talking about potato salad in any way equivalent to the way we might at a summer picnic? I would argue that as we joke on, we would in fact be talking about something quite different from the potato salad itself.
In fact, we would likely use the potato salad as a stand-in for the meeting that we were really discussing. Should that conversation be used by market researchers in a potato salad corporation?
In fact, the topics that we discuss are quite variable. The specific meanings that elements take on within a conversation are better understood in the context of that conversation, as part of a qualitative analysis, than in an aggregated quantitative analysis.
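To make the phone example concrete, here is a minimal sketch in Python of the difference between scoring sentiment word by word and scoring it by phrase. Everything in it is an assumption made for illustration: the tiny lexicons, the scores, and the function names are invented, and no real text analytics product is claimed to work this way.

```python
import re
from collections import Counter

# Hypothetical toy lexicons, invented for this illustration only.
WORD_SENTIMENT = {"limited": -1, "interference": -1}
PHRASE_SENTIMENT = {
    ("limited", "calling", "plan"): -1,
    ("limited", "call", "interference"): 1,
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(responses):
    """Aggregate word counts across all responses."""
    counts = Counter()
    for response in responses:
        counts.update(tokenize(response))
    return counts


def word_level_score(text):
    """Sum the per-word scores, ignoring how the words combine."""
    return sum(WORD_SENTIMENT.get(token, 0) for token in tokenize(text))


def phrase_level_score(text):
    """Score known phrases as units; fall back to single words otherwise."""
    tokens = tokenize(text)
    score = 0
    i = 0
    while i < len(tokens):
        for phrase, value in PHRASE_SENTIMENT.items():
            if tuple(tokens[i:i + len(phrase)]) == phrase:
                score += value
                i += len(phrase)
                break
        else:
            # No known phrase starts here, so score the single word.
            score += WORD_SENTIMENT.get(tokens[i], 0)
            i += 1
    return score


responses = ["limited calling plan", "limited call interference"]
print(word_frequencies(responses).most_common(1))
for response in responses:
    print(response, word_level_score(response), phrase_level_score(response))
```

With these toy lexicons, the word-level scorer treats “limited call interference” as more negative than “limited calling plan,” because both “limited” and “interference” carry negative scores on their own. Only the phrase-level scorer gets the positive reading, and it can do so only for phrases that someone has already thought to list. Neither approach says anything about “delicious” as a report on a committee meeting.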
The big essential questions that we need to grapple with, as a field, at this point are questions like:
‘what kind of questions is this kind of data best suited to answer?’
‘how can our knowledge of linguistics and discourse be transferred into quantifiable questions that could feed the field of text analysis?’
‘what kinds of questions can we ask of textual data that will reframe the way that people think about the usefulness of textual data?’
‘how can we best harness this fast growing mass of textual data in the most useful, reliable ways?’
I would argue that these are questions that discourse analysts are best suited to answer, but in order to ask them, we must be able to leave our qualitative bunkers and open our minds to the complementary potential of quantitative analysis. I would also argue that a popularized appreciation for the value of discourse analysis would lend some legitimacy to a field that is largely unknown.
On the way to work this morning, I listened to an interview with Naomi Wolf. She spoke in part of the chutzpah of presenting academic knowledge in a widely accessible format. Academic perspective, she argued, is too often kept within academic circles, far away from the general population who could really use and appreciate it. Georgetown professor Deborah Tannen took some important steps in the popularization of sociolinguistics. I believe that what I am suggesting is a quantitative extension of that popularization. People could not have imagined that a book about something as obscure as conversation analysis could be so interesting or so widely applicable to their own lives. The general population was not rushing the doors, desperate for these books. Hers was not a case of giving people what they wanted. It was a case of giving people something that would be widely useful. And people embraced Dr Tannen’s books as such.
Let us now use the luxury of time, which the academic sector has but the commercial sector certainly does not, to do what we do best: theorize! A new, great plain awaits. Let us head west!