Instagram is changing the way I see

I recently joined Instagram (I’m late, I know).

I joined because my daughter wanted to, because her friends had, to see what it was all about. She is artistic, and we like to talk about things like color combinations and camera angles, so Instagram is a good fit for us. But it’s quickly changing the way I understand photography. I’ve always been able to set up a good shot, and I’ve always had an eye for color. But I’ve never seriously followed up on any of it. It didn’t take long on Instagram to learn that an eye for framing and color is not enough to make for anything more than accidental great shots. The great shots that I see are the ones that pick deeper patterns or unexpected contrasts out of seemingly ordinary surroundings. They don’t simply capture beauty, they capture an unexpected natural order or a surprising contrast, or they tell a story. They make you gasp or they make you wonder. They share a vision, a moment, an insight. They’re like the beginning paragraph of a novel or the sketch outline of a poem. Realizing that, I have learned that capturing the obvious beauty around me is not enough. To find the good shots, I’ll need to leave my comfort zone, to feel or notice differently, to wonder what or who belongs in a space and what or who doesn’t, and why any of it would capture anyone’s interest. It’s not enough to see a door. I have to wonder what’s behind it. To my surprise, Instagram has taught me how to think like a writer again, how to find hidden narratives, how to feel contrast again.

Sure this makes for a pretty picture. But what is unexpected about it? Who belongs in this space? Who doesn’t? What would catch your eye?

This kind of change has a great value, of course, for a social media researcher. The kinds of connections that people forge on social media, the different ways in which people use platforms and the ways in which platforms shape the way we interact with the world around us, both virtual and real, are vitally important elements in the research process. In order to create valid, useful research in social media, the methods and thinking of the researcher have to follow closely with the methods and thinking of the users. If your sensemaking process imitates the sensemaking process of the users, you know that you’re working in the right direction, but if you ignore the behaviors and goals of the users, you have likely missed the point altogether. (For example, if you think of Twitter hashtags simply as an organizational scheme, you’ve missed the strategic, ironic, insightful and often humorous ways in which people use hashtags. Or if you think that hashtags naturally fall into specific patterns, you’re missing their dialogic nature.)

My current research involves the cycle between social media and journalism, and it runs across platforms. I am asking questions like ‘what gets picked up by reporters and why?’ and ‘what is designed for reporters to pick up?’ Some of these questions lead me to examine the differences between the funny memes that circulate like wildfire through Twitter, leading to trends and a wider stage, and the more in-depth conversation on public Facebook pages, which cannot trend as easily and is far less punchy and digestible. What role does each play in the political process and in constituting news?

Of course, my current research asks more questions than these, but it’s currently under construction. I’d rather not invite you into the workzone until some of the pulp and debris have been swept aside…

Total Survey Error: as Iconic as the Statue of Liberty herself?

In Jan Blommaert’s book, The Sociolinguistics of Globalization, I learned about the iconicity of language. Languages, dialects, phrases and words have the potential to be as iconic as the Statue of Liberty. As I read Blommaert’s book, I am also reading about Total Survey Error, which I believe to be an iconic concept in the field of survey research.

Total Survey Error (TSE) is a relatively new, albeit very comprehensive framework for evaluating a host of potential error sources in survey research. It is often mentioned by AAPOR members (national and local), at JPSM classes and events, and across many other events, publications and classes for survey researchers. But here’s the catch: TSE came about after many of us entered the field. In fact, by the time TSE debuted and caught on as a conceptual framework, many people had already been working in the field for long enough that a framework didn’t seem necessary or applicable.

In the past, survey research was a field that people grew into. There were no degree or certificate programs in survey research. People entered the field from a variety of educational and professional backgrounds and worked their way up through the ranks from data entry, coder or interviewing positions to research assistant and analyst positions, and eventually up to management. Survey research was a field that valued experience, and much of the essential job knowledge came about through experience. This structure strongly characterizes my own office, where the average tenure is fast approaching two decades. The technical and procedural history of the department is alive and well in our collections of artifacts and shared stories. We do our work with ease, because we know the work well, and the team works together smoothly because of our extensive history together. Challenges or questions are an opportunity for remembering past experiences.

Programs such as the Joint Program in Survey Methodology (JPSM, a joint venture between the University of Michigan and University of Maryland) are relatively new, arising, for the most part, once many survey researchers were well established into their routines. Scholarly writings and journals multiplied with the rise of the academic programs. New terms and new methods sprang up. The field gained an alternate mode of entry.

In sociolinguistics, we study evidentiality, because people value different forms of evidence. Toward this end, I did a small study of survey researchers’ language use and mode of evidentials and discovered a very stark split between those who used experience to back up claims and those who relied on research to back up claims. This stark difference matched up well with my own experiences. In fact, when I coach jobseekers who are looking for survey research positions, I draw on this distinction and recommend that they carefully listen to the types of evidentials they hear from the people interviewing them and try to provide evidence in the same format. The divide may not be visible from the outside of the field, but it is a strong underlying theme within it.

The divide is not immediately visible from the outside because the face of the field is formed by academic and professional institutions that readily embrace the academic terminology. The people who participate in these institutions and organizations tend to be long term participants who have been exposed to the new concepts through past events and efforts.

But I wonder sometimes whether the overwhelming public orientation to these methods doesn’t act to exclude some longtime survey researchers in some ways. I wonder whether some excellent knowledge and history get swept away with the new. I wonder whether institutions that represent survey research represent the field as a whole. I wonder what portion of the field is silent, unrepresented or less connected to collective resources and changes.

Particularly as the field encounters a new set of challenges, I wonder how well prepared the field will be- not just those who have been following these developments closely, but also those who have continued steadfast, strong, and with limited errors- not due to TSE adherence, but due to the strength of their experience. To me, the Total Survey Error Method is a powerful symbol of the changes afoot in the field.

For further reference, I’m including a past AAPOR presidential address by Robert Groves

groves aapor

Proceedings of the Fifty-First Annual Conference of the American Association for Public Opinion Research
Source: The Public Opinion Quarterly, Vol. 60, No. 3 (Autumn, 1996), pp. 471-513
ETA other references:

Bob Groves: The Past, Present and Future of Total Survey Error

Slideshow summary of above article

Is there Interdisciplinary hope for Social Media Research?

I’ve been trying to wrap my head around social media research for a couple of years now. I don’t think it would be as hard to understand from any one academic or professional perspective, but, from an interdisciplinary standpoint, the variety of perspectives and the disconnects between them are stunning.

In the academic realm:

There is the computer science approach to social media research. From this standpoint, we see the fleshing out of machine learning algorithms in a stunning horserace of code development across a few programming languages. This is the most likely to be opaque, proprietary knowledge.

There is the NLP or linguistic approach, which overlaps to some degree with the CS approach, although it is often more closely tied to grammatical rules. In this case, we see grammatical parsers, dictionary development, and APIs or shared programming modules, such as NLTK or GATE. Linguistics is divided as a discipline, and many of these divisions have filtered into NLP.

Both the NLP and CS approaches can be fleshed out, trained, or used on just about any data set.

There are the discourse approaches. Discourse is an area of linguistics concerned with meaning above the level of the sentence. This type of research can follow more of a strict Conversation Analysis approach or a kind of Netnography approach. This school of thought is more concerned with context as a determiner or shaper of meaning than the two approaches above.

For these approaches, the dataset cannot just come from anywhere. The analyst should understand where the data came from.

One could divide these traditions by programming skills, but there are enough of us who do work on both sides that the distinction is superficial. Generally speaking, though, the deeper one’s programming or qualitative skills, the less likely one is to cross over to the other side.

There is also a growing tradition of data science, which is primarily quantitative. Although I have some statistical background and work with quantitative data sets every day, I don’t have a good understanding of data science as a discipline. I assume that the growing field of data visualization would fall into this camp.

In the professional realm:

There are many companies in horseraces to develop the best systems first. These companies use catchphrases like “big data” and “social media firehose” and often focus on sentiment analysis or topic analysis (usually topics are gleaned through keywords). These companies primarily market to the advertising industry and market researchers, often with inflated claims of accuracy, which are possible because of the opacity of their methods.

There is the realm of market research, which is quickly becoming dependent on fast, widely available knowledge. This knowledge is usually gleaned through companies involved in the horserace, without much awareness of the methodology. There is an increasing need for companies to be aware of their brand’s mentions and interactions online, in real time, and as they collect this information it is easy, convenient and cost effective to collect more information in the process, such as sentiment analyses and topic analyses. This field has created an astronomically high demand for big data analysis.

There is the traditional field of survey research. This field is methodical and error focused. Knowledge is created empirically and evaluated critically. Every aspect of the survey process is highly researched and understood in great depth, so new methods are greeted with a natural skepticism. Although they have traditionally been the anchors of good professional research methods and the leaders in the research field, survey researchers are largely outside of the big data rush. Survey researchers tend to value accuracy over timeliness, so the big, fast world of big data, with its dubious ability to create representative samples, holds little allure or relevance.

The wider picture

In the wider picture, we have discussions of access and use. We see a growing proportion of the population coming online on an ever greater variety of devices. On the surface, the digital divide is fast shrinking (albeit still significant). Some of the digital access debate has been expanded into an understanding of differential use- essentially that different people do different activities while online. I want to take this debate further by focusing on discursive access or the digital representation of language ideologies.

The problem

The problem with such a wide spread of methods, needs, focuses and analytic traditions is that there isn’t enough crossover. It is very difficult to find work that spreads across these domains. The audiences are different, the needs are different, the abilities are different, and the professional visions are dramatically different across traditions. Although many people are speaking, it seems like people are largely speaking within silos or echo chambers, and knowledge simply isn’t trickling across borders.

This problem has rapidly grown because the underlying professional industries have quickly calcified. Sentiment analysis is not the revolutionary answer to the text analysis problem, but it is good enough for now, and it is skyrocketing in use. Academia is moving too slowly for the demands of industry and not addressing the needs of industry, so other analytic techniques are not being adopted.

Social media analysis would best be accomplished by a team of people, each with different training. But it is not developing that way. And that, I believe, is a big (and fast growing) problem.

Dispatch from the quantitative | qualitative border

On Tuesday evening I attended my first WAPA meeting (Washington Association of Professional Anthropologists). This group meets monthly, first with a happy hour and then with a speaker. Because I have more of a quantitative background, the work of professional anthropologists really blows my mind. The topics are wide ranging and the work interesting and innovative. I’ve been sorry to miss so many of their gatherings.

This week’s topic was near and dear to my heart in two ways.

1. The work was done in a survey context as a qualitative investigation preceding the development of survey questions. As a professional survey methodologist, I have worked through the surprisingly complicated question writing process many hundreds of times, so this approach really fascinates me!

2. The work surrounded the topic of childbirth. As a mother of two and a [partially] trained birth assistant, I love to talk about childbirth.

The purpose of the study at hand was to explore infant mortality in greater depth by investigating certain aspects of the delivery process. The topics of interest included:

– whether the birth was attended by a professional or not
– whether the birth was at home or in a medical facility
– delivery of the placenta
– how soon after the birth the baby was wiped
– cord cutting and tying
– whether the baby was swaddled and whether the baby’s head was covered
– how soon the baby was bathed

The study was based on 80 respondents (half facility births, half homebirths) (half moms of newborns, half moms of 1-2 year olds) from each of two countries. The researchers collected two kinds of data: extensive unstructured interviews and survey questions. The interviews were coded using ATLAS.ti into specific, identifiable, repeated events that were relevant to infant mortality and then placed onto a timeline. The timeline guided the recommended order of the survey questions.

One audience member shared that she would have collected stories of “what is a normal childbirth?” from participants in addition to the women’s personal stories. Her focus with this tactic was to collect the language with which people usually discuss these events in childbirth. She mentioned that her field was linguistic anthropology. The language she was talking about is referred to by survey researchers as “native terms,” essentially the terms that people normally use when discussing a given topic. One of the goals of question writing is to write a question using the terms that a respondent would naturally use to classify their response, making the response process easier for the respondent and collecting higher quality data. The presenters mentioned that, although they did not collect normative stories, collecting native terms was a part of their research process and recommendations.

The topics of focus are problematic ones to investigate. Most women can tell whether or not they gave birth in a facility and whether or not the birth was attended by a professional. Women can usually remember their labor and delivery in detail (usually for the rest of their lives), as well as the first time they held and fed their babies. Often women can also remember the delivery of the placenta or whether or not they hemorrhaged or tore significantly during the birth process.

But other aspects of the birth, such as the cord cutting and tying and the first wiping and swaddling of the baby, are usually done by someone other than the mother (if there is someone else present). They often don’t command the attention of the mother, who is full of emotion and adrenaline and catching her breath from an all encompassing, life changingly powerful experience. These moments are often not as memorable as others, and the mothers are often not as fully aware of them or able to report them.

I wondered if the moms were able to use the same level of detail in retelling these parts of their stories. Was there any indication that these sections of the stories they told were their own personal stories and not a general recounting of events as they are supposed to happen? In survey research, we talk about satisficing, or providing an answer because an answer is expected, not because it is correct. In societies where babies are frequently born at home, people often grow up around childbirth and know the general, expected order of events. How would the results of the study have been different if the researchers had used a slightly different approach? Instead of assuming that the mothers would be able to recount all of these details of their own experiences, the researchers could have taken a deeper look at who performed the target activities, how detailed an account of the activities the mothers were able to provide, and the nature of the moms’ involvement or role in the target activities.

I wondered if working with this alternative approach would have led to questions more like “The next few questions refer to the moments after your baby was born and the first time you held and nursed your baby. Was the baby already wiped when you first held and nursed them? Was the baby’s cord already cut and tied? Was the baby already swaddled? Was the baby’s head already covered?” Although questions like these wouldn’t separate out the first 5 minutes from the first 10, they would likely be easier for the mom to answer and yield more complete and accurate responses.

All in all, this event was a fantastic one. I learned about an area of research that I hadn’t known existed. The speaker was great, and the audience was engaged. If you have an opportunity to attend a WAPA event, I highly recommend it.

Time for some Research Zen

As the new semester kicks into gear and work deadlines loom, I find myself ready for a moment of research zen.

Let’s take a minute to stand in a stream and think about the water. Feel the flow of the water over your feet and by your calves. Feel the pull of constant motion. Feel yourself sink against the current, rooting deeper to keep steady. Breathe the clean outdoor air. Observe the clouds and watch the way the sky reflects in the water in the stream. The stream is not constant. The water passing now is not the water that passed when you started, and the water that passes when you leave will be still different. And yet we call this a stream.

As I observe sources of social media, thinking about sampling, I’m faced with some of the same questions that the stream gives rise to. Although I would define my sources consistently from day to day, their content shifts constantly. The stream is not constant, but rather constantly forming and reforming at my feet.

For a moment, I saw the tide of social media start to turn in favor of taxi drivers. In that moment, I felt both a strong sense of relief from the negativity and a need to revisit my research methods. Today I see that the stream has again turned against the drivers. I could ignore the momentary shift, or I could use this as a moment to again revisit the wisdom of sampling.

If I sample the river at a given point, what should I collect and what does it represent? How, when the water is constantly moving around me, can I represent what I observe within a sample? Could my sampling ever represent a single point in the stream, the stream as a whole, or streams in general? Or will it always be moments in the life of a stream?

In the words of Henry Miller, “The world is not to be put in order. The world is in order. It is for us to put ourselves in unison with this order.” In order to understand this stream, I need to understand what lies beneath it, what gives it its shape and flow, and how it works within its ecosystem.

The ecosystem of public opinion around the taxi system in DC is not one that can be understood purely online. When I see the reflection of clouds on the stream, I need to find the sky. When I see phrases repeated over and over, I need to understand where they come from and how they came to be repeated. In the words of Blaise Pascal, “contradiction is not a sign of falsity, nor the lack of contradiction a sign of truth.” No elements in this ecosystem exist independent of context. Each element has its base.

Good research involves a good deal of reflection. It involves digging in against currents and close observation. It involves finding a moment of stillness in the flow of the stream.

Breathe in. Observe carefully. Breathe out. Repeat, continue, focus, research.

Fertile soil from dry dirt. Thank you, Netherlands!

The mood workshop (microanalysis of online data) in Nijmegen last week was immensely helpful for me. In two short days, my research lost some branches and grew some deeper roots. Definitely worth 21+ hours of travel!

Aerial shot of Greenland. Can’t tell where the clouds end and the snow and ice begin!

The retooling began early on the first day. My first, burning question for the group was about choosing representative data. The shocking first answer: why? To someone with a quantitative background, this question was mind blowing. The sky is up, the ground is down, and data should be representative. But representative of what?

Here we return to the nature of the data. What data are you looking at? What kind of motivated behavior does it represent? Essentially, I am looking at online conversation. We know that counting conversational topics is fruitless- that’s the first truth of conversation analysis. And we know that counting conversational participation is usually misguided. So what was I trying to represent?

My goal is to track a silence that happens across site types, largely independent of stimulus. No matter what kind of news article about taxis in Washington DC, no matter the source, the driver perspective is almost completely absent, and if it is represented the responses are noticeably different or marked. I had thought that if I could find a way to count this underrepresentation I could launch a systematic, grounded critique of the notion of participatory media and pose the question of which values were being maintained from the ground up. What is social capital in online news discourse, who speaks, and which speakers are ratified?

But this is not a question of representative sampling alone. Although sampling could offer a sense of context to the data, the meat and potatoes of the analysis are in fact fodder for conversation analysis. A more useful and interesting research question emerged: how are these online conversations constructed so as to make a pro taxi response dispreferred or marked? This question invokes pronoun usage, intertextuality, conversational reach, crowd based sanctioning, conversational structure and pair parts, register, and more. It provides grounding for a rich, layered analysis. Fertile soil from dry dirt. Thank you, Netherlands.

Canal in Amsterdam (note: the workshop was in Nijmegen, not Amsterdam. Also note: the dangers of parallel parking next to a canal. You’d be safer living in one of these houseboats!)

Turns out Ethnography happens one slice at a time

Some of you may have noticed that I promised to report some research and then didn’t.

Last semester, for my Ethnography of Communication class, I did an Ethnography of DC taxi drivers. The theme of the Ethnography was “the voice of the drivers.” It was multilayered, and it involved data from a great variety of sources. I had hoped to share my final paper for the class here, but that won’t work for three reasons.

1.) The nature of Ethnography. Ethnography involves collecting a great deal of data and then choosing what to report, in what way, and in what context. The goal of the final paper was to reflect on the methodology. This was an important exercise, but I really wanted to share more of my findings and less of my methodology here.

2.) The particular aspect of my findings that I most want to share here has to do with online discourse. Specifically, I want to examine the lack of representation of the drivers’ perspective online. There are quite a few different ways to accomplish this. I have tried to do it a number of ways, using different slices of data and using different analytic strategies. But I haven’t decided which is the best set of data or method of analysis. But I am a very lucky researcher. Next week I’m headed to a workshop at Radboud University in Nijmegen, Netherlands. The workshop is on the Microanalysis of Online Discourse. I am eager to bring my data and methodological questions and to receive insight from such an amazing array of researchers. I am also very eager to see what they bring!

Much of the discussion in the analysis of online discourse either excludes the issue of representation altogether or focuses on it entirely. Social media is often hailed as the great democratizer of communication. Internet access was long seen as the biggest obstacle to this new democracy. From this starting point, much of the research has evolved to consider more of the nuances of differential use, including the complicated nature of internet access as well as the behavior and goals of internet users. This part of my findings is an example of differential use and of different styles of participation. Working with this data has changed the way I see social media and the way I understand the democratization of news.

3.) Scope. The other major reason why I haven’t shared my findings is because of the sheer scope of this project. I was fortunate enough to only have taken one class last semester, which left me the freedom to work much harder on it. Also, as a working/student mom, I chose a project that involved my whole family in an auto-ethnographic way, so much of my work brought me closer to my family, rather than farther apart (spending time away from family to study is one of the hardest parts of working student motherhood!)

I have amassed quite a bit of data at this point, and I plan to write a few different papers using it.

Stay tuned, because I will release slices of it. But have some patience, because each slice will only be released in its own good time.

 

At this point, I feel the need to reference the Hutzler Banana Slicer

Turns out, Ethnography is more like this:

 

than like this:

Data Storytelling

In the beginning of our Ethnography of Communication class, one of the students asked about the kinds of papers one writes about an ethnography. It seemed like a simple question at the time. In order to report on ethnographic data, the researcher chooses a theme and then pulls out the parts of their data that fit the theme. Now that I’m at the point in my ethnography where I’m choosing what to report, I can safely say that this question is not one with an easy answer.

At this point, I’ve gathered together a tremendous amount of data about DC taxi drivers. I’ve already given my final presentation for my class, and written most of my final paper. But the data gathering phase hasn’t ended yet. I have been wondering whether I have enough data gathered together to write a book, and I probably could write a book, but that still doesn’t make my project feel complete. I don’t feel like the window I’ve carved is large enough to do this topic any justice.

The story that I set out to tell about the drivers is one of their absence in the online public sphere. As the wife of a DC driver, I was sick and tired of seeing blog posts and newspaper articles with seemingly unending streams of offensive, ignorant, or simply one sided comments. This story turns out to be one with many layers, one that goes far beyond issues of internet access, delves deeply into matters of differential use of technology, and one that strikes fractures into the soil of the grand potential of participatory democracy. It is also a story grounded in countless daily interactions, involving a large number of participants and situations. The question is large, the data abundant, and the paths to the story many. Each more narrow path begs a depth that is hungry for more data and more analysis. Each answer is defined by more questions. More specifically, do I start with the rides? With a specific ride? With the drivers? With a specific driver? With a specific piece of legislation? With one online discussion or theme? How can I make sure that my analysis is grounded and objective? How far do I trace the story, and which parts of the story does it leave out? What happens with the rest of the story? What is my responsibility and to whom?

This paper will clearly not be the capstone to the ethnography, just one story told through the data I’ve gathered together in the past few months. More stories can be told, and will be told with the data. Specifically, I’m hoping to delve more deeply into the drivers’ social networks, for their role in information exchange. And the fallout from stylistic differences in online discussions. And, more prescriptively, into ways that drivers’ voices can be better represented in the public sphere. And maybe more?

It feels strange to write a paper that isn’t descriptive of the data as a whole. Every other project that I’ve worked on has led to a single publication that summarized the whole set. It seems strange, coming from a quantitative perspective where the data strongly confines the limits of what can and cannot be said in the report and what is more or less important to include in the report, to have a choice of data, and, more importantly, a choice of story to tell. Instead of pages of numbers to look through, compare and describe, I’m entering the final week of this project with the same cloud of ambiguity that has lingered throughout. And I’m looking for ways that my data can determine what can and cannot be reported on and what stories should be told. Where, in this sea of data, is my life raft of objectivity? (Hear that note of drama? That comes from the lack of sleep and heightened anxiety that finals bring about- one part of formal education that I will not miss!!)

I have promised to share my paper here once it has been written. I might end up making some changes before sharing it, but I will definitely share it. My biggest hope is that it will inspire some fresh, better informed conversation on the taxi situation in DC and on what it means to be represented in a participatory democracy.

What do all of these polling strategies add up to?

Yesterday was a big first for research methodologists across many disciplines. For some of the newer methods, it was the first election that they could be applied to in real time. For some of the older methods, this election was the first to bring competing methodologies, and not just methodological critiques.

Real-time sentiment analysis from sites like this summarized Twitter’s take on the election. This paper sought to predict electoral turnout using Google searches. InsideFacebook attempted to use Facebook data to track voting. And those are just a few of a rapid proliferation of data sources, analytic strategies and visualizations.

One could ask, who are the winners? Some (including me) were quick to declare a victory for the well-honed craft of traditional pollsters, who showed that they were able to repeat their studies with little noise, and that their results were predictive of a wider real-world phenomenon. Some could call a victory for the emerging field of Data Science. Obama’s Chief Data Scientist is already beginning to be recognized. Comparisons of analytic strategies will spring up all over the place in the coming weeks. The election provided a rare opportunity where so many strategies and so many people were working in one topical area. The comparisons will tell us a lot about where we are in the data horse race.

In fact, most of these methods were successful predictors in spite of their complicated underpinnings. The Google searches took into account searches for variations of “vote,” which worked as a kind of reliable predictor but belied the complicated web of naturalistic search terms (which I alluded to in an earlier post about the natural development of hashtags, as explained by Rami Khater of Al Jazeera’s The Stream, a social network generated newscast). I was a real-world example of this methodological complication. Before I went to vote, I googled “sample ballot.” Similar intent, but I wouldn’t have been caught in the analyst’s net.
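To make the "analyst's net" concrete, here is a minimal sketch of a keyword filter for variations of "vote." The pattern and the sample queries are my own assumptions for illustration; they are not the term list or the data from the paper mentioned above.

```python
import re

# Assumed pattern for "variations of vote" (hypothetical, not the study's term list).
VOTE_PATTERN = re.compile(r"\bvot(?:e|es|ed|ing|er|ers)\b", re.IGNORECASE)


def is_vote_search(query: str) -> bool:
    """Return True if the query contains a variation of the word 'vote'."""
    return VOTE_PATTERN.search(query) is not None


# Hypothetical search queries that all share a similar intent.
queries = [
    "where do I vote",
    "voting hours",
    "sample ballot",
    "polling place near me",
]

caught = [q for q in queries if is_vote_search(q)]
missed = [q for q in queries if not is_vote_search(q)]

print("caught:", caught)
print("missed:", missed)
```

In this toy list, half of the queries slip through the net even though the intent behind them is the same, which is the complication I mean.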

If you look deeper at the Sentiment Analysis tools that allow you to view the specific tweets that make up their categorizations, you will quickly see that, although the overall trends were in fact predictive of the election results, the data coding was messy, because language is messy.

And the victorious predictive ability of traditional polling methods belies the complicated nature of interviewing as a data collection technique. Survey methodologists work hard to standardize research interviews in order to maximize the reliability of the interviews. Sometimes these interviews are standardized to the point of recording. Sometimes the interviews are so scripted that interviewers are not allowed to clarify questions, only to repeat them. Critiques of this kind of standardization are common in survey methodology, most notably from Nora Cate Schaeffer, who has raised many important considerations within the survey methodology community while still strongly supporting the importance of interviewing as a methodological tool. My reading assignment for my ethnography class this week is a chapter by Charles Briggs from 1986 (Briggs – Learning how to ask) that proves that many of the new methodological critiques are in fact old methodological critiques. But the critiques are rarely heeded, because they are difficult to apply.

I am currently working on a project that demonstrates some of the problems with standardizing interviews. I am revising a script we used to call a representative sample of U.S. high schools. The script was last used four years ago in a highly successful effort that led to an admirable 98% response rate. But to my surprise, when I went to pull up the old script I found instead a system of scripts. What was an online and phone survey had spawned fax and e-mail versions. What was intended to be a survey of principals now had a set of potential respondents from the schools, each with their own strengths and weaknesses. Answers to common questions from school staff were loosely scripted on an addendum to the original script. A set of tips for phonecallers included points such as “make sure to catch the name of the person who transfers you, so that you can specifically say that Ms X from the office suggested I talk to you” and “If you get transferred to the teacher, make sure you are not talking to the whole class over the loudspeaker.”

Heidi Hamilton, chair of the Georgetown Linguistics department, often refers to conversation as “climbing a tree that climbs back.” In fact, we often talk about meaning as mutually constituted between all of the participants in a conversation. The conversation itself cannot be taken outside of the context in which it lives. The many documents I found from the phonecallers show just how relevant these observations can be in an applied research environment.

The big question that arises from all of this is one of practical strategy. In particular, I had to figure out how best to address the interview campaign that we had actually run when preparing to rerun the campaign we had intended to run. My solution was to integrate the feedback from the phonecallers and loosen up the script. But I suspect that this tactic will work differently with different phonecallers. I’ve certainly worked with a variety of phonecallers, from those who preferred a script to those who preferred to talk off the cuff. Which makes the best phonecaller? Neither. Both. The ideal phonecaller works nimbly and professionally with the situation that is presented to them while collecting complete and relevant data from the most reliable source. As much of the time as possible.

At this point, I’ve come pretty far afield of my original point, which is that all of these competing predictive strategies have complicated underpinnings.

And what of that?

I believe that the best research is conscious of its strengths and weaknesses and not afraid to work with other strategies in order to generate the most comprehensive picture. As we see comparisons and horse races develop between analytic strategies, I think the best analyses we’ll see will be the ones that fit the results of each of the strategies together, simultaneously developing a fuller breakdown of the election and a fuller picture of our new research environment.

“Not everything that can be counted counts”

“Not everything that counts can be counted, and not everything that can be counted counts” – sign in Einstein’s Princeton office

This quote is from one of my favorite survey reminder postcards of all time, along with an image from the Emilio Segrè Visual Archives. The postcard layout was an easy and pleasant decision made in association with a straightforward survey we have conducted for nearly a quarter century. …If only social media analysis could be so easy, pleasant or straightforward!

I am in the process of conducting an ethnography of DC taxi drivers. I was motivated to do this study because of the persistent disconnect between the experiences and reports of the taxi drivers and riders I hear from regularly and the snarky (I know this term does not seem technical, but it is absolutely data motivated!) riders who dominate participatory media sources online. My goal at this point of the project is to chase down the disconnect in media participation and see how it maps to policy deliberations and offline experiences. This week I decided to explore ways of quantifying the disconnect.

Inspired by this article in jedem (the eJournal of eDemocracy and Open Government), I decided to start my search using a framework based in Social Network Analysis (SNA), in order to use elements of connectedness, authority and relevance as a base. Fortunately, SNA frameworks are widely available to analysts on a budget in the form of web search engines! I went through the first 22 search results for a particular area of interest to my study: the mandatory GPS policy. Of these 22 sites, only 11 had active web 2.0 components. Across all of these sites, there were just two comments from drivers. Three of the sites that didn’t have any comments from drivers did have one post each that sympathized with or defended DC taxi drivers. The remaining three sites had no responses from taxi drivers and no sympathetic responses in defense of the drivers. Barring a couple of comments that were difficult to divine, the rest of the comments were negative comments about DC taxi drivers or the DC taxi industry. This matched my expectations, and, predictably, didn’t match any of my interviews or offline investigations.

The question at this point is one of denominator.

The easiest and least complicated denominator to use was the number of sites. Using this denominator, only one quarter of the sites had any representation from a DC taxi driver. This is significant, given that the discussions were about aspects of their livelihood, and the drivers will be the most closely affected by the regulatory changes. This is a good, solid statistic from which to investigate the influence of web 2.0 on local policy enactment. However, it doesn’t begin to show the lack of representation the way that a denominator such as number of posts, number of posters, or number of opinions would have. But each one of these alternative denominators has its own set of headaches. Does it matter if one poster expresses an opinion once and another expresses another, slightly different opinion more than once? If everyone agrees, what should the denominator be? What about responses that contain links that are now defunct or insider references that aren’t meaningful to me? Should I consider measures of social capital, endorsements, social connectedness, or the backgrounds of individual posters?
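As a rough illustration of how much the choice of denominator matters, here is a minimal sketch that computes the share of driver voices three ways: by site, by poster, and by post. The comment records, site names and poster names are entirely hypothetical; they are not my study data, and the function name is just something I made up for the example.

```python
# Hypothetical comment records: (site, poster, is_driver). Not the study data.
comments = [
    ("site_a", "rider1", False),
    ("site_a", "rider1", False),
    ("site_a", "rider2", False),
    ("site_a", "driver1", True),
    ("site_b", "rider3", False),
    ("site_b", "rider4", False),
    ("site_c", "rider2", False),
    ("site_c", "rider5", False),
    ("site_d", "rider6", False),
    ("site_d", "rider6", False),
]


def driver_share(records, unit):
    """Share of units (sites, posters, or posts) that carry a driver's voice."""
    if unit == "sites":
        all_units = {site for site, _, _ in records}
        driver_units = {site for site, _, is_driver in records if is_driver}
    elif unit == "posters":
        all_units = {poster for _, poster, _ in records}
        driver_units = {poster for _, poster, is_driver in records if is_driver}
    elif unit == "posts":
        # Every record is one post, so index positions serve as unit ids.
        all_units = set(range(len(records)))
        driver_units = {i for i, (_, _, is_driver) in enumerate(records) if is_driver}
    else:
        raise ValueError(f"unknown unit: {unit}")
    return len(driver_units) / len(all_units)


for unit in ("sites", "posters", "posts"):
    print(unit, round(driver_share(comments, unit), 3))
```

With these made-up records, one driver comment counts as a quarter of the sites, a seventh of the posters and a tenth of the posts. The same data gives three different pictures of representation, and none of the three captures whether a repeated opinion should count once or many times.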

The simplest figure also doesn’t show one of the most striking aspects of this finding: the relative markedness of these posts. In the context of predominantly short, snarky and clever responses, one of the comments began with a formal “Dear DC city councilmembers and intelligent taxpayers,” and the other spread over three dense, winding posts in large paragraph form.

This brings up an important aspect of social media: that of social action. If every comment is a social action with social intentions, what are the intentions of the posters and how can these be identified? I don’t believe that the majority of posts left were intended as a voice in local politics, but the comments from the drivers clearly were. The majority of posts represent attempts to warrant social capital using humor, not attempts to have a voice in local politics. And they repeated phrases that are often repeated in web 2.0 discussions about the DC taxi situation, but rarely repeated elsewhere. This observation, of course, is pretty meaningless without being anchored to the data itself, both quantitatively and qualitatively. And it makes for some interesting ‘next steps’ in a project that is certainly not short of ‘next steps.’

The main point I want to make here is about the nature of variables in social media research. In a survey, you ask a question, determined in advance, and have a set of answers to work with in your analysis. In social media research, by contrast, you are free to choose your own variables for your analysis. Each choice brings with it a set of constraints and advantages, and some fit your data better than others. But this freedom makes the path to analysis more difficult, and it makes justifying your choices more important. For this reason, a quantitative analysis, which can sometimes include arbitrary or unclear choices, is best supplemented with a qualitative analysis that delves into the answers themselves and why they fit the coding structure you have imposed.

In all of this, I have quite a bit of work ahead of me.