Yesterday was a big first for research methodologists across many disciplines. For some of the newer methods, it was the first election to which they could be applied in real time. For some of the older methods, this election was the first to bring competing methodologies into play, and not just methodological critiques.
Real-time sentiment analysis sites summarized Twitter’s take on the election. One paper sought to predict electoral turnout using Google searches. InsideFacebook attempted to use Facebook data to track voting. And those are just a few examples from a rapidly proliferating set of data sources, analytic strategies and visualizations.
One could ask, who are the winners? Some (including me) were quick to declare a victory for the well-honed craft of traditional pollsters, who showed that they were able to repeat their studies with little noise, and that their results were predictive of a wider real-world phenomenon. Others could declare a victory for the emerging field of Data Science. Obama’s Chief Data Scientist is already beginning to be recognized. Comparisons of analytic strategies will spring up all over the place in the coming weeks. The election provided a rare opportunity in which so many strategies and so many people were working in one topical area. The comparisons will tell us a lot about where we are in the data horse race.
In fact, most of these methods were successful predictors in spite of their complicated underpinnings. The Google searches took into account searches for variations of “vote,” which worked as a fairly reliable predictor but belied the complicated web of naturalistic search terms (which I alluded to in an earlier post about the natural development of hashtags, as explained by Rami Khater of Al Jazeera’s The Stream, a social-network-generated newscast). I was a real-world example of this methodological complication. Before I went to vote, I googled “sample ballot.” Similar intent, but I wouldn’t have been caught in the analyst’s net.
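To make the “analyst’s net” concrete, here is a minimal Python sketch of a keyword filter built on variations of “vote.” The pattern, the function name `caught_in_net`, and the sample queries are all hypothetical illustrations, not the term list the turnout paper actually used. The point is only that a query with the same intent, like “sample ballot,” slips through.

```python
import re

# Hypothetical keyword net: a few variations of "vote" (vote, votes, voted,
# voter, voters, voting). This is an illustration, not any study's real list.
VOTE_PATTERN = re.compile(r"\bvot(e|es|ed|er|ers|ing)\b", re.IGNORECASE)


def caught_in_net(query: str) -> bool:
    """Return True if a search query matches the 'vote' keyword net."""
    return VOTE_PATTERN.search(query) is not None


# Hypothetical queries, all plausibly from people planning to vote.
queries = [
    "where do I vote",
    "voting hours",
    "polling place near me",
    "sample ballot",
]

for q in queries:
    print(f"{q!r}: {'counted' if caught_in_net(q) else 'missed'}")
```

Run as written, the first two queries are counted and the last two are missed, even though all four express a similar intent.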
If you look deeper at the sentiment analysis tools that allow you to view the specific tweets that comprise their categorizations, you will quickly see that, although the overall trends were in fact predictive of the election results, the data coding was messy, because language is messy.
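One way to see why the coding gets messy is a deliberately naive word-counting scorer. This is a minimal sketch under stated assumptions: the tiny lexicon and the function `naive_sentiment` are hypothetical, and real sentiment tools use much larger lexicons or trained models. Even so, the same basic problems (negation, sarcasm, context) show up.

```python
import re

# Hypothetical toy lexicon: +1 for "positive" words, -1 for "negative" ones.
LEXICON = {"win": 1, "great": 1, "love": 1, "lose": -1, "bad": -1, "hate": -1}


def naive_sentiment(text: str) -> int:
    """Sum word scores; ignores negation, sarcasm, and context entirely."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(w, 0) for w in words)


examples = [
    "Great, another four years of this.",  # sarcasm, scored as positive
    "Not bad at all",                      # negation, scored as negative
    "I hate to lose",
]

for tweet in examples:
    print(f"{naive_sentiment(tweet):+d}  {tweet}")
```

The sarcastic tweet scores +1 and the mildly approving one scores -1, which is the kind of miscoding you find when you read the individual tweets behind an aggregate trend.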
And the victorious predictive ability of traditional polling methods belies the complicated nature of interviewing as a data collection technique. Survey methodologists work hard to standardize research interviews in order to maximize their reliability. Sometimes these interviews are standardized to the point of recording. Sometimes the interviews are so scripted that interviewers are not allowed to clarify questions, only to repeat them. Critiques of this kind of standardization are common in survey methodology, most notably from Nora Cate Schaeffer, who has raised many important considerations within the survey methodology community while still strongly supporting the importance of interviewing as a methodological tool. My reading assignment for my ethnography class this week is a chapter by Charles Briggs from 1986 (Briggs – Learning how to ask) that shows that many of the new methodological critiques are in fact old methodological critiques. But the critiques are rarely heeded, because they are difficult to apply.
I am currently working on a project that demonstrates some of the problems with standardizing interviews. I am revising a script we used to call a representative sample of U.S. high schools. The script was last used four years ago in a highly successful effort that led to an admirable 98% response rate. But to my surprise, when I went to pull up the old script I found instead a system of scripts. What was an online and phone survey had spawned fax and e-mail versions. What was intended to be a survey of principals now had a set of potential respondents from the schools, each with their own strengths and weaknesses. Answers to common questions from school staff were loosely scripted on an addendum to the original script. A set of tips for phonecallers included points such as “make sure to catch the name of the person who transfers you, so that you can specifically say that Ms X from the office suggested I talk to you” and “If you get transferred to the teacher, make sure you are not talking to the whole class over the loudspeaker.”
Heidi Hamilton, chair of the Georgetown Linguistics department, often refers to conversation as “climbing a tree that climbs back.” In fact, we often talk about meaning as mutually constituted between all of the participants in a conversation. The conversation itself cannot be taken outside of the context in which it lives. The many documents I found from the phonecallers show just how relevant these observations can be in an applied research environment.
The big question that arises from all of this is one of practical strategy. In particular, I had to figure out how best to address the interview campaign that we had actually run when preparing to rerun the campaign we had intended to run. My solution was to integrate the feedback from the phonecallers and loosen up the script. But I suspect that this tactic will work differently with different phonecallers. I’ve certainly worked with a variety of phonecallers, from those who preferred a script to those who preferred to talk off the cuff. Which makes the best phonecaller? Neither. Both. The ideal phonecaller works nimbly and professionally with whatever situation is presented, while collecting complete and relevant data from the most reliable source. As much of the time as possible.
At this point, I’ve come pretty far afield from my original point, which is that all of these competing predictive strategies have complicated underpinnings.
And what of that?
I believe that the best research is conscious of its strengths and weaknesses and not afraid to work with other strategies in order to generate the most comprehensive picture. As we see comparisons and horse races develop between analytic strategies, I think the best analyses we’ll see will be the ones that fit the results of each of the strategies together, simultaneously developing a fuller breakdown of the election and a fuller picture of our new research environment.