More Takeaways from the DC-AAPOR/WSS Summer Conference

Last week I shared my notes from the first two sessions of the DC-AAPOR/WSS Summer Conference Preview/Review. Here are my notes on the remaining sessions:

Session 3: Accessing and Using Records

  • Side note: Some of us may benefit from a support group format re: matching administrative records
  • AIR experiment with incentives & consent to record linkage: $2 incentive sometimes worse than $0. $20 incentive yielded highest response rate and consent rate earliest in the process, cheaper than phone follow-up
    • If relevant data is available, $20 incentive can be tailored to likely nonrespondents
    • Evaluating race & Hispanic origin questions: this was a big theme over the course of this conference. The socially constructed nature of racial/ethnic identity doesn’t map well to survey questions. This Census study found changes in survey answers based on context, location, social position, education, ambiguousness of phenotype, self-perception, question format, census tract, and proxy reports. Also a high number of missing answers.

Session 4: Adaptive Design in Government Surveys

  • A potpourri of quotes from this session that caught my eye:
    • Re: Frauke Kreuter “the mother of all paradata”
      Peter Miller: “Response rates is not the goal”
      Robert Groves: “The way we do things is unsustainable”
    • Response rates are declining, costs are rising
    • Create a dashboard that works for your study. Include the relevant cars you need in order to have a decision making tool that is tailored/dynamic and data based
      • Include paradata, response data
      • Include info re: mode switching, interventions
      • IMPORTANT: prioritize cases, prioritize modes, shift priorities with experience
      • Subsample open cases (not yet responded)
      • STOP data collection at a sensible point, before your response bias starts to grow exponentially and before you waste money on expensive interventions that can actually work to make your data less representative
    • Interviewer paradata
      • Choose facts over inference
      • Presence or absence of key features (e.g. ease of access, condition of property)
        • (for a phone survey, these would probably include presence or absence of answer or answering mechanism, etc.)
        • For a household survey, household factors more helpful than neighborhood factors
    • Three kinds of adaptive design
      • Fixed design (ok, this is NOT adaptive)- treat all respondents the same
      • Preplanned adaptive- tailor mailing efforts in advance based on response propensity models
      • Real-time adaptive- adjust mailing efforts in response to real-time response data and evolving response propensities
    • Important aspect of adaptive design: document decisions and evaluate success, re-evaluate future strategy
    • What groups are under-responding and over-responding?
      • Develop propensity models
      • Design modes accordingly
      • Save $ by focusing resources
    • NSCG used adaptive design
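
For anyone who likes to see an idea in code, here is a minimal sketch of the "what groups are under-responding?" step and the case prioritization that follows from it. This is my own illustration and not anything presented at the session: the field names (`id`, `group`, `responded`) and the sample cases are hypothetical, and a plain response rate per group stands in for a real propensity model.

```python
from collections import defaultdict


def group_response_rates(cases):
    """Return {group: share of cases in that group that have responded}."""
    totals = defaultdict(int)
    responded = defaultdict(int)
    for case in cases:
        totals[case["group"]] += 1
        responded[case["group"]] += int(case["responded"])
    return {group: responded[group] / totals[group] for group in totals}


def prioritize_open_cases(cases):
    """Order open (not yet responded) cases so under-responding groups come first."""
    rates = group_response_rates(cases)
    open_cases = [case for case in cases if not case["responded"]]
    # Lowest group response rate first; ties broken by case id for a stable order.
    return sorted(open_cases, key=lambda case: (rates[case["group"]], case["id"]))


# Hypothetical case records, invented purely for illustration.
cases = [
    {"id": 1, "group": "renter", "responded": True},
    {"id": 2, "group": "renter", "responded": False},
    {"id": 3, "group": "renter", "responded": False},
    {"id": 4, "group": "renter", "responded": False},
    {"id": 5, "group": "owner", "responded": True},
    {"id": 6, "group": "owner", "responded": True},
    {"id": 7, "group": "owner", "responded": True},
    {"id": 8, "group": "owner", "responded": False},
]

rates = group_response_rates(cases)
queue = prioritize_open_cases(cases)
print(rates)
print([case["id"] for case in queue])
```

In a real-time adaptive design, something like this would be rerun as new responses arrive, so the priorities shift with experience. A production version would presumably swap the group rates for a fitted propensity model that uses paradata as well.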

Session 5: Public Opinion, Policy & Communication

  • Marital status checklist: categories not mutually exclusive- checkboxes
    • Cain conducted a meta-analysis of federal survey practices
    • Same sex marriage
      • Because of DOMA, federal agencies were not able to use same sex data. Now that it’s been struck down, the question is more important, has funding and policy issues resting on it
      • Exploring measurement:
        • Review of research
        • Focus groups
        • Cognitive interviews
        • Quantitative testing ← current phase
  • Estimates of same sex marriage dramatically inflated by straight people who select gender incorrectly (size/scope/scale)
  • ACS has revised marriage question
  • Instead of mother, father, parent 1, parent 2, …
    • Yields more same sex couples
    • Less nonresponse overall
    • Allow step, adopted, bio, foster, …
    • Plain language
      • Plain Language Act of 2010
      • See handout on plain language for more info
      • Pretty much just good writing practice in general
      • Data visualization makeovers using Tufte guidance
        • Maybe not ideal makeovers, but the data makeover idea is a fun one. I’d like to see a data makeover event of some kind…

Session 7: Questionnaire Design and Evaluation

  • Getting your money’s worth! Targeting Resources to Make Cognitive Interviews Most Effective
    • When choosing a sample for cognitive interviews, focus on the people who tend to have the problems you’re investigating. Otherwise, the likelihood of choosing someone with the right problems is quite low
    • AIR experiment: cognitive interviews by phone
      • Need to use more skilled interviewers by phone, because more probing is necessary
      • Awkward silences more awkward without clues to what respondent is doing
      • Hard to evaluate graphics and layout by phone
      • When sharing a screen, interviewer should control mouse (they learned this the hard way)
      • On the plus side: more convenient for interviewee and interviewer, interviewers have access to more interviewees, data quality similar, or good enough
      • Try Skype or something?
      • Translation issues (much of the cognitive testing centered around translation issues- I’m not going into detail with them here, because these don’t transfer well from one survey to the next)
        • Education/international/translation: They tried to assign equivalent education groups and reflect their equivalences in the question, but when respondents didn’t agree with the equivalences suggested to them, they didn’t follow the questions as written

Poster session

  • One poster was laid out like Candy Land. Very cool, but people stopped by more to make jokes than substantive comments
  • One poster had signals from interviews that the respondent would not cooperate, or 101 signs that your interview will not go smoothly. I could see posting that in an interviewer break room…

Session 8: Identifying and Repairing Measurement and Coverage Errors

  • Health care reform survey: people believe what they believe in spite of the terms and definitions you supply
  • Paraphrased Groves (1989:449) “Although survey language can be standardized, there is no guarantee that interpretation will be the same”
  • Politeness can be a big barrier in interviewer/respondent communication
  • Reduce interviewer rewording
  • Be sure to bring interviewers on board with project goals (this was heavily emphasized on AAPORnet while we were at this conference- the importance of interviewer training, valuing the work of the interviewers, making sure the interviewers feel valued, collecting interviewer feedback and restrategizing during the fielding period and debriefing the interviewers after the fielding period is done)
  • Response format effects when measuring employment: slides requested

Takeaways from the DC AAPOR & WSS Summer Conference Preview/Review 2013

“The way we do things is unsustainable” – Robert Groves, Census

This week I attended a great conference sponsored by DC-AAPOR. I’m typing up my notes from the sessions to share, but there are a lot of notes. This covers the morning sessions on day 1.

We are coming to a new point of understanding with some of the more recent developments in survey research. For the first time in recent memory, the specter of limited budgets loomed large. Researchers weren’t just asking “How can I do my work better?” but “How can I target my improvements so that my work can be better, faster, and less expensive?”

Session 1: Understanding and Dealing with Nonresponse

  • Researchers have been exploring the potential of nonresponse propensity modeling for a while. In the past, nonresponse propensities were used as a way to cut down on bias and draw samples that should yield a more representative response group.
  • In this session, nonresponse propensity modeling was seen as a way of helping to determine a cutoff point in survey data collection.
  • Any data on mode propensity for individual respondents (in longitudinal surveys) or groups of respondents can be used to target people in their likely best mode from the beginning, instead of treating all respondents to the same mailing strategy. This can drastically reduce field time and costs.
  • Prepaid incentives have become accepted best practice in the world of incentives
  • Our usual methods of contact are continually less successful. It’s good to think outside the box. (Or inside the box: one group used certified UPS mail to deliver prepaid incentives)
  • Dramatic increases in incentives dramatically increased response rates and lowered field times significantly
  • Larger lag times in longitudinal surveys led to a larger dropoff in response rate
  • Remember Leverage Salience Theory- people with a vested interest in a survey are more likely to respond (something to keep in mind when writing invitations, reminders, and other respondent materials, etc.)
  • Nonresponse propensity is important to keep in mind in the imputation phase as well as the mailing or fielding phase of a survey
  • Re-engaging respondents in longitudinal surveys is possible. Recontacting can be difficult, esp. finding updated contact information. It would be helpful to share strategies re: maiden names, Spanish names, etc.


Session 2: Established Modes & New Technologies

  • ACASI>CAPI in terms of sensitive info
  • Desktop & mobile respondents follow similar profiles, vary significantly from distribution of traditional respondent profiles
  • Mobile respondents log frequent re-entries onto the surveys, so surveys must allow for saved progress and reentry
  • Mobile surveys that weren’t mobile optimized had the same completion rates as mobile surveys that were optimized. (There was some speculation that this will change over time, as web optimization becomes more standard)
  • iPhones do some mobile optimization of their own (didn’t yield higher complete rates, though, just a prettier screenshot)
  • The authors of the Gallup paper (McGeeney & Marlar) developed a best practices matrix- I requested a copy
  • Smartphone users are more likely to take a break while completing a survey (according to paradata based on OS)
  • This session boasted a particularly fun presentation by Paul Schroeder (Abt SRBI) about distracted driving (a mobile survey! Hah!) in which he “saw the null hypothesis across a golden field, and they ran toward each other and embraced.” He used substantive responses, demographics, etc. to calculate the ideal number of call attempts for different survey subgroups. (This takes me back to a nonrespondent from a recent survey we fielded with a particularly large number of contact attempts, who replied to an e-mail invitation to ask if we had any self-respect left at that point)

Language use & gaps in STEM education

Today our microanalytic research group focused on videos of STEM education.


Watching STEM classes reminds me of a field trip a fellow researcher and I took to observe a physics class that used project based learning. Project based learning is a more hands on and collaborative teaching approach which is gaining popularity among physics educators as an alternative to traditional lecture. We observed an optics lab at a local university, and after the class we spoke about what we had observed. Whereas the other researcher had focused on the optics and math, I had been captivated by the awkwardness of the class. I had never envisioned the PJBL process to be such an awkward one!


The first video that we watched today involved a student interchangeably using the terms chart and graph and softening their use with the term “thing.” There was some debate among the researchers as to whether the student had known the answer but chosen to distance himself from the response or whether the student was hedging because he was uncertain. The teacher responded by telling the student not to talk about things, but rather to talk to her in math terms.


What does it mean to understand something in math? The math educators in the room made it clear that a lack of the correct terminology signaled that the student didn’t necessarily understand the subject matter. There was no way for the teacher to know whether the student knew the difference between a chart and a graph from their use of the terms. The conversation on our end was not about the conceptual competence that the student showed. He was at the board, working through the problem, and he had begun his interaction with a winding description of the process necessary (as he imagined it) to solve the problem. It was clear that he did understand the problem and the necessary steps to solve it on some level (whether correct or not), but that level of understanding was not one that mattered.


I was surprised at the degree to which the use of mathematical language was framed as a choice on the part of the student. The teacher asked the student to use mathematical language with her. One math educator in our group spoke about students “getting away with fudging answers.” One researcher said that the correct terms “must be used,” and another commented about the lack of correct terms as indication that the student did “not have a proper understanding” of the material. All of this talk seems to belie the underlying truth that the student chose to use inexact language for a reason (whether social or epistemic).


The next video we watched showed a math teacher working through a problem. I was really struck by her lack of enthusiasm. I noticed her sighs, her lack of engagement with the students even when directly addressing them, and her tone when reading the problem from the textbook. Despite her apparent lack of enthusiasm, her mood appeared considerably brighter when she finished working through the problem. I found this interesting, because physics teachers usually report that their favorite part of their job is watching the students’ “a-ha!” moments. Maybe the rewards of technical problem solving are a motivator for both students and teachers alike? But the process of technical problem solving itself is rarely as motivating.


All of this leads me to one particularly interesting question. How do people in STEM learning settings distance themselves from the material? What discursive tools do they use? Who uses these discursive tools? And does the use of these tools change over time? I wonder in particular whether discursive distancing, which often parallels female discursive patterns, is more common among females than males as they progress through their education? Is there any kind of quantitative correlate to the use of discursive distancing? Is it more common among people who believe that they aren’t good at STEM? Is discursive distancing less common among people who pursue STEM careers? Is there a correlation between distancing and test scores?


Awkwardness in STEM education is fertile ground for qualitative researchers. To what extent is the learning or solving process emphasized and to what extent is the answer valued above all else? How is mathematical language socialized? The process of solving technical problems is a messy and uncomfortable one. It rarely goes smoothly, and in fact challenges often lead to more challenges. The process of trying and failing or trying and learning is not a sexy or attractive one, and there is rampant concern that focusing on the process of learning robs students of the ability to demonstrate their knowledge in a way that matters to people who speak the traditional languages of math and science.


We spoke a little about the phenomenon of connected math. It sounds to me very closely parallel to project based learning initiatives in physics. I was left wondering why such a similar teaching process could be valued as a teaching tool for all students in one field and relegated to a teaching tool for struggling students in another neighboring field. I wonder about the similarities and differences between the outcomes of these methods. Much of this may rest on politics, and I suspect that the politics are rooted in deeply held and less questioned beliefs about STEM education.


STEM education initiatives have grown quite a bit in recent years, and it’s clear that there is quite a bit of interesting research left to be done.

Upcoming DC Event: Online Research Offline Lunch

ETA: Registration for this event is now CLOSED. If you have already signed up, you will receive a confirmation e-mail shortly. Any sign-ups after this date will be stored as a contact list for any future events. Thank you for your interest! We’re excited to gather with such a diverse and interesting group.


Are you in or near the DC area? Come join us!

Although DC is a great meeting place for specific areas of online research, there are few opportunities for interdisciplinary gatherings of professionals and academics. This lunch will provide an informal opportunity for a diverse set of online researchers to listen and talk respectfully about our interests and our work and to see our endeavors from new, valuable perspectives.

Date & Time: August 6, 2013, 12:30 p.m.

Location: Near Gallery Place or Metro Center. Once we have a rough headcount, we’ll choose an appropriate location. (Feel free to suggest a place!)

Please RSVP using this form:

Representativeness, qual & quant, and Big Data. Lost in translation?

My biggest challenge in coming from a quantitative background to a qualitative research program was representativeness. I came to class firmly rooted in the principle of Representativeness, and my classmates seemed not to have any idea why it mattered so much to me. Time after time I would get caught up in my data selection. I would pose the wider challenge of representativeness to a colleague, and they would ask “representative of what? why?”


In the survey research world, the researcher begins with a population of interest and finds a way to collect a representative sample of the population for study. In the qualitative world that accompanies survey research, units of analysis are generally people, and people are chosen for their representativeness. Representativeness is often constructed by demographic characteristics. If you’ve read this blog before, you know of my issues with demographics. Too often, demographic variables are used as a knee-jerk variable instead of better considered variables that are more relevant to the analysis at hand. (Maybe the census collects gender and not program availability, for example, but just because a variable is available and somewhat correlated doesn’t mean that it is in fact a relevant variable, especially when the focus of study is a population for whom gender is such an integral societal difference.)


And yet I spent a whole semester studying 5 minutes of conversation between 4 people. What was that representative of? Nothing but itself. It couldn’t have been exchanged for any other 5 minutes of conversation. It was simply a conversation that this group had and forgot. But over the course of the semester, this piece of conversation taught me countless aspects of conversation research. Every time I delved back into the data, it became richer. It was my first step into the world of microanalysis, where I discovered that just about anything can be a rich dataset if you use it carefully. A snapshot of people at a lecture? Well, how are their bodies oriented? A snapshot of video? A treasure trove of gestures and facial expressions. A piece of graffiti? Semiotic analysis! It goes on. The world of microanalysis is built on the practice of layered noticing. It goes deeper than wide.


But what is it representative of? How could a conversation be representative? Would I need to collect more conversations, but restrict the participants? Collect conversations with more participants, but in similar contexts? How much or how many would be enough?


In the world of microanalysis, people and objects constantly create and recreate themselves. You consistently create and recreate yourself, but your recreations generally fall into a similar range that makes you different from your neighbors. There are big themes in small moments. But what are the small moments representative of? Themselves. Simply, plainly, nothing more and nothing else. Does that mean that they don’t matter? I would argue that there is no better way to understand the world around us in deep detail than through microanalysis. I would also argue that macroanalysis is an important part of discovering the wider patterns in the world around us.


Recently a NY Times blog post by Quentin Hardy has garnered quite a bit of attention.

Why Big Data is Not Truth:

This post has really struck a chord with me, because I have had a hard time understanding Hardy’s complaint. Is big data truth? Is any data truth? All data is what it is; a collection of some sort, collected under a specific set of circumstances. Even data that we hope to be more representative has sampling and contextual limitations. Responsible analysts should always be upfront about what their data represents. Is big data less truthful than other kinds of data? It may be less representative than, say, a systematically collected political poll. But it is what it is: different data, collected under different circumstances in a different way. It shouldn’t be equated with other data that was collected differently. One true weakness of many large scale analyses is the blindness to the nature of the data, but that is a byproduct of the training algorithms that are used for much of the analysis. The algorithms need large training datasets, from anywhere. These sets often are developed through massive web crawlers. Here, context gets dicey. How does a researcher represent the data properly when they have no idea what it is? Hopefully researchers in this context will be wholly aware that, although their data has certain uses, it also has certain [huge] limitations.


I suspect that Hardy’s complaint is with the representations of massive datasets collected from webcrawlers as a complete truth from which any analyses could be run and all of the greater truths of the world could be revealed. On this note, Hardy is exactly right. Data simply is what it is, nothing more and nothing less. And any analysis that focuses on an unknown dataset is just that: an analysis without context. Which is not to say that all analyses need to be representative, but rather that all responsible analyses of good quality need to be self aware. If you do not know what the data represents and when and how it was collected, then you cannot begin to discuss the usefulness of any analysis of it.

The curse of the elevator speech

Yesterday I was involved in an innocent watercooler chat in which I was asked what Sociolinguistics is. This should be an easy enough question, because I just got a master’s degree in it. But it’s not. Sociolinguistics is a large field that means different things to different people. For every way of studying language, there are social and behavioral correlates that can also be studied. So a sociolinguist could focus on any number of linguistic areas, including phonology, syntax, semantics, or, in my case, discourse. My studies focus on the ways in which people use language, and the units of analysis in my studies are above the sentence level. Because Linguistics is such a large and siloed field, explaining Sociolinguistics through the lens of discourse analysis feels a bit like explaining vegetarianism through a pescatarian lens. The real vegetarians and the real linguists would balk.

There was a follow up question at the water cooler about y’all. “Is it a Southern thing?” My answer to this was so admittedly lame that I’ve been trying to think of a better one (sometimes even the most casual conversations linger, don’t they?).

My favorite quote of this past semester was from Jan Blommaert: “Language reflects a life, and not just a birth, and it is a life that is lived in a real sociocultural, historical and political space” Y’all has long been considered a southernism, but when I think back to my own experience with it, it was never about southern language or southern identity. One big clue to this is that I do sometimes use y’all, but I don’t use other southern language features along with it.

If I wanted to further investigate y’all from a sociolinguistic perspective, I would take language samples, either from one or a variety of speakers (and this sampling would have clear, meaningful consequences) and track the uses of y’all to see when it was invoked and what function it serves when invoked. My best, uninformed guess is that it does relational work and invokes registers that are more casual and nonthreatening. But without data, that is nothing but an uninformed guess.

This work has likely been done before. It would be interesting to see.
(ETA: Here is an example of this kind of work in action, by Barbara Johnstone)

What is the role of Ethnography and Microanalysis in Online Research?

There is a large disconnect in online research.

The largest, highest profile, highest value and most widely practiced side of online research was created out of a high demand to analyze the large amount of consumer data that is constantly being created and is largely publicly available. This tremendous demand led to research methods that were created in relative haste. Math and programming skills thrived in a realm where social science barely made a whisper. The notion of atheoretical research grew. The level of programming and mathematical competence required to do this work continues to grow higher every day, as the fields of data science and machine learning become continually more nuanced.

The largest, low profile, lowest value and increasingly more practiced side of online research is the academic research. Turning academia toward online research has been like turning a massive ocean liner. For a while online research was not well respected. At this point it is increasingly well respected, thriving in a variety of fields and in a much needed interdisciplinary way, and driven by a search for a better understanding of online behavior and better theories to drive analyses.

I see great value in the intersection between these areas. I imagine that the best programmers have a big appetite for any theory they can use to drive their work in useful and productive ways. But I don’t see this value coming to bear on the market. Hiring is almost universally focused on programmers and data scientists, and the microanalytic work that is done seems largely invisible to the larger entities out there.

It is common to consider quantitative and qualitative research methods as two separate languages with few bilinguals. At the AAPOR conference in Boston last week, Paul Lavrakas mentioned a book he is working on with Margaret Roller which expands the Total Survey Error model to both quantitative and qualitative research methodology. I spoke with Margaret Roller about the book, and she emphasized the importance of qualitative researchers being able to talk more fluently and openly about methodology and quality controls. I believe that this is, albeit a huge challenge in wording and framing, a very important step for qualitative research, in part because quality frameworks lend credibility to qualitative research in the eyes of a wider research community. I wish this book a great deal of success, and I hope that it is able to find an audience and a frame outside the realm of survey research (Although survey research has a great deal of foundational research, it is not well known outside of the field, and this book will merit a wider audience).

But outside of this book, I’m not quite sure where or how the work of bringing these two distinct areas of research together can or will be done.

Also at the AAPOR conference last week, I participated in a panel on The Role of Blogs in Public Opinion Research (intro here and summary here). Blogs serve a special purpose in the field of research. Academic research is foundational and important, but the publish rate on papers is low, and the burden of proof is high. Articles that are published are crafted as an argument. But what of the bumps along the road? The meditations on methodology that arise? Blogs provide a way for researchers to work through challenges and to publish their failures. They provide an experimental space where fields and ideas can come together that previously hadn’t mixed. They provide a space for finding, testing, and crossing boundaries.

Beyond this, they are a vehicle for dissemination. They are accessible and informally advertised. The time frame to publish is short, the burden lower (although I’d like to believe that you have to earn your audience with your words). They are a public face to research.

I hope that we will continue to test these boundaries, to cross over barriers like quantitative and qualitative that are unhelpful and obtrusive. I hope that we will be able to see that we all need each other as researchers, and the quality research that we all want to work for will only be achieved through the mutual recognition that we need.

Revisiting Latino/a identity using Census data

On April 10, I attended a talk by Jennifer Leeman (Research Sociolinguist @Census and Assistant Professor @George Mason) entitled “Spanish and Latino/a identity in the US Census.” This was a great talk. I’ll include the abstract below, but here are some of her main points:

  • Census categories promote and legitimize certain understandings, particularly because the Census, as a tool of the government, has an appearance of neutrality
  • Census must use categories from OMB
  • The distinction between race and ethnicity is fuzzy and full of history.
    • In the past, this category has been measured by surname, mother tongue, birthplace
    • Treated as hereditary (“perpetual foreigner” status)
    • Self-id new, before interviewer would judge, record
  • In the interview context, macro & micro meet
    • Macro level demographic categories
    • Micro:
      • Interactional participant roles
      • Indexed through labels & structure
      • Ascribed vs claimed identities
  • The study: 117 telephone interviews in Spanish
    • 2 questions, ethnicity & race
    • Ethnicity includes Hispano, Latino, Español
      • Intended as synonyms but treated as a choice by respondents
      • Different categories than English (Adaptive design at work!)
  • The interviewers played a big role in the elicitation
    • Some interviewers emphasized standardization
      • This method functions differently in different conversational contexts
    • Some interviewers provided “teaching moments” or on-the-fly definitions
      • Official discourses mediated through interviewer ideologies
      • Definitions vary
  • Race question also problematic
    • Different conceptions of Indioamericana
      • Central, South or North American?
  • Role of language
    • Assumption of monolinguality problematic, bilingual and multilingual quite common, partial and mixed language resources
    • “White” spoken in English different from “white” spoken in Spanish
    • Length of time in country, generation in country belies fluid borders
  • Coding process
    • Coding responses such as “American, born here”
    • ~40% Latino say “other”
    • Other category ~ 90% Hispanic (after recoding)
  • So:
    • Likely result: one “check all that apply” question
      • People don’t read help texts
    • Inherent belief that there is an ideal question out there with “all the right categories”
      • Leeman is not yet ready to believe this
    • The takeaway for survey researchers:
      • Carefully consider what you’re asking, how you’re asking it and what information you’re trying to collect
  • See also Pew Hispanic Center report on Latino/a identity




Censuses play a crucial role in the institutionalization and circulation of specific constructions of national identity, national belonging, and social difference, and they are a key site for the production and institutionalization of racial discourse (Anderson 1991; Kertzer & Arel 2002; Nobles 2000; Urla 1994).  With the recent growth in the Latina/o population, there has been increased interest in the official construction of the “Hispanic/Latino/Spanish origin” category (e.g., Rodriguez 2000; Rumbaut 2006; Haney López 2005).  However, the role of language in ethnoracial classification has been largely overlooked (Leeman 2004). So too, little attention has been paid to the processes by which the official classifications become public understandings of ethnoracial difference, or to the ways in which immigrants are interpellated into new racial subjectivities.

This presentation addresses these gaps by examining the ideological role of Spanish in the history of the US Census Bureau’s classifications of Latina/os as well as in the official construction of the current “Hispanic/Latino/Spanish origin” category. Further, in order to gain a better understanding of the role of census-taking in the production of new subjectivities, I analyze Spanish-language telephone interviews conducted as part of Census 2010.  Insights from recent sociocultural research on language and identity (Bucholtz and Hall 2005) inform my analysis of how racial identities are instantiated and negotiated, and how respondents alternatively resist and take up the identities ascribed to them.

* Dr. Leeman is a Department of Spanish & Portuguese Graduate (GSAS 2000).

Digital Democracy Remixed

I recently transitioned from my study of the many reasons why the voice of DC taxi drivers is largely absent from online discussions into a study of the powerful voice of the Kenyan people in shaping their political narrative using social media. I discovered a few interesting things about digital democracy and social media research along the way, and the contrast between the groups was particularly useful.

Here are some key points:

  • The methods of sensemaking that journalists use in social media are similar to other methods of social media research, except for a few key factors, the most important of which is that the bar for verification is higher.
  • The search for identifiable news sources is important to journalists and stands in contrast with research methods that are built on anonymity. This means that the input that journalists will ultimately use will be on a smaller scale than the automated analyses of large datasets widely used in social media research.
  • The ultimate information sources for journalists will be small, but the phenomena that will capture their attention will likely be big. Although journalists need to dig deep into information, something in the large expanse of social media conversation must first capture or flag their attention.
  • It takes some social media savvy to catch the attention of journalists. This social media savvy outweighs linguistic correctness in the ultimate process of getting noticed. Journalists act as intermediaries between social media participants and a larger public audience, and part of the intermediary process is language correcting.
  • Social media savvy is not just about being online. It is about participating in social media platforms in a publicly accessible way with regard to publicly relevant topics, and using the patterned dialogic conventions of the platform on a scale that can ultimately draw attention. Many people and publics go online but do not do this.

The analysis of social media data for this project was particularly interesting. My data source was the comments following this posting on the Al Jazeera English Facebook feed.


The analysis evolved quite organically. After a number of rounds of coding I noticed that I kept drawing diagrams in the margins of some of the comments. I combined the diagrams into this framework:


Once this framework was built, I looked closely at the ways in which participants used it. Sometimes participants made distinct discursive moves between these levels. But when I tried to map the participants’ movements on their individual diagrams, I noticed that my depictions of their movements rarely matched when I returned to a diagram. Although my coding of the framework was very reliable, my coding of the movements was not reliable at all. This led me to notice that oftentimes the frames were being used more indexically. Participants were indexing levels of the frame, and this indexical process created powerful frame shifts. So, on the level of Kenyan politics exclusively, Uhuru’s crimes had one meaning. But juxtaposed against the crimes of other national leaders, Uhuru’s crimes had a dramatically different meaning. Similarly, when the legitimacy of the ICC was questioned, the charges took on a dramatically different meaning. When Uhuru’s crimes were embedded in the postcolonial East vs West dynamic, they shrank to the degree that the indictments seemed petty and hypocritical. And, ultimately, when religion was invoked, the persecution of one man seemed wholly irrelevant and sacrilegious.

These powerful frame shifts enable the Kenyan public to have a powerful, narrative-changing voice in social media. And their social media savvy enables them to gain the attention of media sources that amplify their voices and thus redefine their public narrative.


Instagram is changing the way I see

I recently joined Instagram (I’m late, I know).

I joined because my daughter wanted to, because her friends had, to see what it was all about. She is artistic, and we like to talk about things like color combinations and camera angles, so Instagram is a good fit for us. But it’s quickly changing the way I understand photography. I’ve always been able to set up a good shot, and I’ve always had an eye for color. But I’ve never seriously followed up on any of it. It didn’t take long on Instagram to learn that an eye for framing and color is not enough to make for anything more than accidental great shots. The great shots that I see are the ones that pick deeper patterns or unexpected contrasts out of seemingly ordinary surroundings. They don’t simply capture beauty, they capture an unexpected natural order or a surprising contrast, or they tell a story. They make you gasp or they make you wonder. They share a vision, a moment, an insight. They’re like the beginning paragraph of a novel or the sketch outline of a poem. Realizing that, I have learned that capturing the obvious beauty around me is not enough. To find the good shots, I’ll need to leave my comfort zone, to feel or notice differently, to wonder what or who belongs in a space and what or who doesn’t, and why any of it would capture anyone’s interest. It’s not enough to see a door. I have to wonder what’s behind it. To my surprise, Instagram has taught me how to think like a writer again, how to find hidden narratives, how to feel contrast again.

Sure this makes for a pretty picture. But what is unexpected about it? Who belongs in this space? Who doesn't? What would catch your eye?

This kind of change has great value, of course, for a social media researcher. The kinds of connections that people forge on social media, the different ways in which people use platforms and the ways in which platforms shape the way we interact with the world around us, both virtual and real, are vitally important elements in the research process. In order to create valid, useful research in social media, the methods and thinking of the researcher have to closely follow the methods and thinking of the users. If your sensemaking process imitates the sensemaking process of the users, you know that you’re working in the right direction, but if you ignore the behaviors and goals of the users, you have likely missed the point altogether. (For example, if you think of Twitter hashtags simply as an organizational scheme, you’ve missed the strategic, ironic, insightful and often humorous ways in which people use hashtags. Or if you think that hashtags naturally fall into specific patterns, you’re missing their dialogic nature.)

My current research involves the cycle between social media and journalism, and it runs across platforms. I am asking questions like ‘what gets picked up by reporters and why?’ and ‘what is designed for reporters to pick up?’ Some of these questions lead me to examine the differences between funny memes, which circulate like wildfire through Twitter and lead to trends and a wider stage, and the more in-depth conversations on public Facebook pages, which cannot trend as easily and are far less punchy and digestible. What role does each play in the political process and in constituting news?

Of course, my current research asks more questions than these, but it’s currently under construction. I’d rather not invite you into the workzone until some of the pulp and debris have been swept aside…