I read a blog post on the LoveStats blog today that referred to one of the most widely regarded critiques of social media research: the lack of demographic information.
In traditional survey research, demographic information is a critically important piece of the analysis. We often ask questions like “Yes 50% of the respondents said they had encountered gender harassment, but what is the breakdown by gender?” The prospect of not having this demographic information is a large enough game changer to cast the field of social media research into the shade.
Here I’d like to take a sidestep and borrow a debate from linguistics. In the linguistic subfield of conversation analysis, there are two main streams of thought about analysis. One believes in gathering as much outside data as possible, often through ethnographic research, to inform a detailed understanding of the conversation. The second stream is rooted in the purity of the data. This stream emphasizes our dynamic construction of identity over the stability of identity. The underlying foundation of this stream is that we continually construct and reconstruct the most important and relevant elements of our identity in the process of our interaction. Take, for example, a study of an interaction between a doctor and a patient. The first school would bring into the analysis a body of knowledge about interactions between doctors and patients. The second would believe that this body of knowledge is potentially irrelevant or even corrupting to the analysis, and if the relationship is in fact relevant it will be constructed within the excerpt of study. This begs the question: are all interactions between doctors and patients primarily doctor patient interactions? We could address this further through the concept of framing and embedded frames (a la Goffman), but we won’t do that right now.
Instead, I’ll ask another question:
If we are studying gender discrimination, is it necessary to have a variable for gender within our datasouce?
My kneejerk reaction to this question, because of my quantitative background, is yes. But looking deeper: is gender always relevant? This does strongly depend on the datasource, so let’s assume for this example that the stimulus was a question on a survey that was not directly about discrimination, but rather more general (e.g. “Additional Comments:”).
What if we took that second CA approach, the purist approach, and say that where gender is applicable to the response it will be constructed within that response. The question now becomes ‘how is gender constructed within a response?’ This is a beautiful and interesting question for a linguist, and it may be a question that much better fits the underlying data and provides deeper insight into the data. It also turns the age old analytic strategy on its head. Now we can ask whether a priori assumptions that the demographics could or do matter are just rote research or truly the productive and informative measures that we’ve built them up to be?
I believe that this is a key difference between analysis types. In the qualitative analysis of open ended survey questions, it isn’t very meaningful to say x% of the respondents mentioned z, and y% of the respondents mentioned d, because a nonmention of z or d is not really meaningful. Instead we go deeper into the data to see what was said about d or z. So the goal is not prevalence, but description. On the other hand, prevalence is a hugely important aspect of quantitative analysis, as are other fun statistics which feed off of demographic variables.
The lesson in all of this is to think carefully about what is meaningful information that is relevant to your analysis and not to make assumptions across analytic strategies.