What do all of these polling strategies add up to?

Yesterday was a big first for research methodologists across many disciplines. For some of the newer methods, it was the first election that they could be applied to in real time. For some of the older methods, this election was the first to bring competing methodologies, and not just methodological critiques.

Real-time sentiment analysis from sites like this summarized Twitter’s take on the election. This paper sought to predict electoral turnout using Google searches. InsideFacebook attempted to use Facebook data to track voting. And those are just a few examples of a rapid proliferation of data sources, analytic strategies and visualizations.

One could ask: who are the winners? Some (including me) were quick to declare a victory for the well-honed craft of traditional pollsters, who showed that they were able to repeat their studies with little noise and that their results were predictive of wider real-world phenomena. Some could claim a victory for the emerging field of Data Science. Obama’s Chief Data Scientist is already beginning to be recognized. Comparisons of analytic strategies will spring up all over the place in the coming weeks. The election provided a rare opportunity in which so many strategies and so many people were working in one topical area. The comparisons will tell us a lot about where we are in the data horse race.

In fact, most of these methods were successful predictors in spite of their complicated underpinnings. The Google search analysis took into account searches for variations of “vote,” which worked as a fairly reliable predictor but belied the complicated web of naturalistic search terms (which I alluded to in an earlier post about the natural development of hashtags, as explained by Rami Khater of Al Jazeera’s The Stream, a social network generated newscast). I was a real-world example of this methodological complication. Before I went to vote, I googled “sample ballot.” Similar intent, but I wouldn’t have been caught in the analyst’s net.
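To make that net-and-miss problem concrete, here is a toy sketch. The pattern and the queries are my own invented examples, not the actual study’s terms: a keyword filter built around variants of “vote” catches some voting-intent queries and misses others with exactly the same intent.

```python
import re

# Hypothetical keyword "net": obvious variants of "vote" (illustration only)
VOTE_NET = re.compile(r"\bvot(?:e|er|ers|ing)\b", re.IGNORECASE)

queries = [
    "where do I vote",        # voting intent, caught by the net
    "early voting hours",     # voting intent, caught by the net
    "sample ballot",          # same intent, slips through
    "polling place near me",  # same intent, slips through
]

caught = [q for q in queries if VOTE_NET.search(q)]
missed = [q for q in queries if not VOTE_NET.search(q)]
print(caught)  # ['where do I vote', 'early voting hours']
print(missed)  # ['sample ballot', 'polling place near me']
```

However wide you make the pattern, naturalistic queries like “sample ballot” stay outside it, which is exactly the complication described above.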

If you look deeper at the Sentiment Analysis tools that allow you to view the specific tweets that comprise their categorizations, you will quickly see that, although the overall trends were in fact predictive of the election results, the data coding was messy, because language is messy.
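A deliberately naive lexicon-based coder shows how quickly that messiness bites. The word lists below are invented for illustration (no real tool is quite this simple, but the failure modes are the same): negation and sarcasm both get coded backwards.

```python
# Toy sentiment coder: count positive words minus negative words (hypothetical lexicon)
POSITIVE = {"great", "win", "love", "hope"}
NEGATIVE = {"lose", "bad", "hate", "fear"}

def code_sentiment(tweet: str) -> str:
    words = [w.strip(".,!?") for w in tweet.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(code_sentiment("I love this candidate, great night!"))    # positive (correct)
print(code_sentiment("Oh great, four more years. Just great.")) # positive (sarcasm miscoded)
print(code_sentiment("I don't love either candidate"))          # positive (negation miscoded)
```

The aggregate trend can still be predictive even while individual tweets are coded wrong, which is roughly the pattern visible in the real tools.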

And the victorious predictive ability of traditional polling methods belies the complicated nature of interviewing as a data collection technique. Survey methodologists work hard to standardize research interviews in order to maximize their reliability. Sometimes these interviews are standardized to the point of recording. Sometimes the interviews are so scripted that interviewers are not allowed to clarify questions, only to repeat them. Critiques of this kind of standardization are common in survey methodology, most notably from Nora Cate Schaeffer, who has raised many important considerations within the survey methodology community while still strongly supporting the importance of interviewing as a methodological tool. My reading assignment for my ethnography class this week is a chapter by Charles Briggs from 1986 (Briggs – Learning How to Ask) which shows that many of the new methodological critiques are in fact old methodological critiques. But the critiques are rarely heeded, because they are difficult to apply.

I am currently working on a project that demonstrates some of the problems with standardizing interviews. I am revising a script we used to call a representative sample of U.S. high schools. The script was last used four years ago in a highly successful effort that led to an admirable 98% response rate. But to my surprise, when I went to pull up the old script I found instead a system of scripts. What was an online and phone survey had spawned fax and e-mail versions. What was intended to be a survey of principals now had a set of potential respondents from the schools, each with their own strengths and weaknesses. Answers to common questions from school staff were loosely scripted on an addendum to the original script. A set of tips for phonecallers included points such as “make sure to catch the name of the person who transfers you, so that you can specifically say that Ms X from the office suggested I talk to you” and “If you get transferred to the teacher, make sure you are not talking to the whole class over the loudspeaker.”

Heidi Hamilton, chair of the Georgetown Linguistics department, often refers to conversation as “climbing a tree that climbs back.” In fact, we often talk about meaning as mutually constituted between all of the participants in a conversation. The conversation itself cannot be taken outside of the context in which it lives. The many documents I found from the phonecallers show just how relevant these observations can be in an applied research environment.

The big question that arises from all of this is one of practical strategy. In particular, I had to figure out how best to address the interview campaign that we had actually run when preparing to rerun the campaign we had intended to run. My solution was to integrate the feedback from the phonecallers and loosen up the script. But I suspect that this tactic will work differently with different phonecallers. I’ve certainly worked with a variety of phonecallers, from those who preferred a script to those who preferred to talk off the cuff. Which makes the best phonecaller? Neither. Both. The ideal phonecaller works with the situation presented to them nimbly and professionally while collecting complete and relevant data from the most reliable source. As much of the time as possible.

At this point, I’ve come pretty far afield of my original point, which is that all of these competing predictive strategies have complicated underpinnings.

And what of that?

I believe that the best research is conscious of its strengths and weaknesses and not afraid to work with other strategies in order to generate the most comprehensive picture. As we see comparisons and horse races develop between analytic strategies, I think the best analyses we’ll see will be the ones that fit the results of each of the strategies together, simultaneously developing a fuller breakdown of the election and a fuller picture of our new research environment.

Education from the Bottom Up?

Last night I attended a talk by Shirley Brice Heath about her new book, Words at Work and Play, moderated by Anne Harper Charity Hudley and Frederick Erickson. Dr Brice Heath has been following a group of 300 families for 30 years, and in her talk she addressed many of the changes she’d seen in the kids over the time she’d been observing them. She made one particularly interesting point: the world of assessment, and in fact much of the adult world, hasn’t kept up with the kids’ evolution. The assessments that we subject kids to are traditional, reflecting traditional values and sources. She went as far as to say that we don’t know how to see, appreciate or notice these changes, and she pointed out that many of these new styles of learning came from outside the school environment.

This part of her talk reminded me of an excellent blog post I read yesterday about unschooling. Unschooling is the process of learning outside of a structured environment. It goes further than homeschooling, which can involve structured curricula. It is curricularly agnostic and focused on the learning styles, interests, and natural motivation of the students. I mentioned the blog post to Terrence Wiley, president of the Center for Applied Linguistics, and he emphasized the underlying idealism of unschooling. It rests on the basic belief that everyone is naturally academically motivated and interested and will naturally embrace learning, in their own way, given the freedom to do it. Unschooling is, as some would say, my “spirit animal.” I don’t have the time or the resources to do it with my own kids, and I’m not sure I would even if I were fully able to. I have no idea how it could be instituted in any kind of egalitarian or larger-scale way. But I still love the idea, in all its impracticality. (Dr Wiley gave me a few reading assignments, explaining that “everything old in education is new again.”)

Then today I read a blog post about the potential of using Wikipedia as a textbook. This idea is very striking, not just because Wikipedia is mostly accurate, freely available, covers the vast majority of the material in this professor’s traditional textbooks, and has an app that will help anyone interested create a custom textbook, but because it actually addresses what kids do anyway! Just this past weekend, my daughter was writing a book report, and I kept complaining that she chose to use Wikipedia to look up the spelling of a character’s name rather than walk upstairs and grab the book. Kids use Wikipedia often and for all kinds of things, and it is often more common for parents and educators to forbid or dismiss this practice than to jump right in with them. I suggest that the blogger not only use Wikipedia, but use the text as a way to show what is or is not accurate, how to tell, and where to find other credible, collaborative sources when in doubt. What an amazing opportunity!

So here’s the question that all of this has been leading to: Given that the world around us is rapidly changing and that our kids are more adept at staying abreast of these changes than we are, could it be time to turn the old expert-novice/teacher-student paradigm on its head, at least in part? Maybe we need to find ways to let some knowledge come from the bottom up. Maybe we need to let them be the experts. Maybe we need to, at least in part, rethink our role in the educating process.

Frederick Erickson made an excellent point about teaching: “You have to learn your students in order to teach them.” He talked about spending the first few days in a class gathering the expertise of the students, and using that knowledge when creating assignments or assigning groups. (I believe Dr Charity Hudley mentioned that she did this, too. Or maybe he supplied the quote, and she supplied the example?)

All of this makes me wonder what the potential is for respecting the knowledge and expertise of the students, and working from there. What does bottom-up or student-led education look like? How can it be integrated into the learning process in order to make it more responsive, adaptive and modern?

Of course, this is as much a dream for a wider society as unschooling is for my own family. To a large extent, practicality shoots it all in the foot with the starting gun. But a girl can dream, no?

I conducted my first diversity training today…

One of the perks of my grad program is learning how to conduct diversity training.

Today I was able to put that skill to use for the first time. I conducted a workshop for a local parents group about Talking with your Kids about Race and Diversity. I co-facilitated it with Elvira Magomedova, a recent graduate from the MLC program who has more experience and more of a focus in this area. It was a really interesting and rewarding experience.

We did 4 activities:

1. We introduced ourselves by telling our immigration stories. I saw this last week at an open house at my daughter’s middle school, and it profoundly reminded me of the personal ways in which we all embody global history and the immigrant nature of the US. Between feuding clans in Ireland, narrow escapes from the Holocaust and traveling singers in Europe, this exercise is both powerful and fun. Characters and events really come alive, and everyone is left on a more equal footing.

2. For the second activity, we explored the ways in which we identify ourselves. We each put a circle in the center of a sheet of paper, and then we added four bubble spokes with groups or cultures or ways in which we identify ourselves. The exercise came from Cultural Awareness Learning Module One. At the bottom of the page, we explored these relationships more deeply, e.g. “I’m a parent, but I’m not a stay-at-home parent” or “I’m Muslim, but I’m not practicing my religion.” We spoke in depth about our pages in pairs and then shared some with the group.

3. This is a fun activity for parents and kids alike. We split into two groups, culture A and culture B. Each culture has a list of practices, e.g. standing close or far, making eye contact or not, extensive vs minimal greetings or leave-takings, shaking or not shaking hands, … The groups learn, practice, and then mingle. This is a profoundly awkward activity!

After mingling, we get back into the group and discuss the experience. It soon becomes obvious that people take differences in “culture” personally. People complain that it seemed like their interlocutors were just trying to get away from them, or seemed overly interested in them, or… They also complain about how hard it is to adjust your practices to act in the prescribed way.

This exercise is a good way for people to understand the ways in which conflicting cultural norms play out, and it helps parents to understand how to work out misunderstandings with their kids.

4. Finally, my daughter made a slide show of people from all over the world. The people varied in countless physical ways from each other, and we used them to stimulate conversation about physical differences. As adults, we tend to ascribe a bevy of sociological baggage to these physical differences, but the reality is that, unless we’re Stephen Colbert, we see striking physical differences between people. As parents, we are often taken aback when our kids speak openly about differences that we’ve grown accustomed to not talking about. It’s natural and normal to wonder how to handle these observations.

The upshot of this conversation is that describing anyone by a single physical category doesn’t really make sense. If you’re talking about a physical description of someone, you have a number of physical features to comment on. Whereas referring to anyone by a single physical feature could be offensive, a more detailed description is simply a more accurate physical description. We don’t have to use judgmental words, like “good hair,” but that shouldn’t stop us from talking about curly, straight, wavy, thick or thin. We can talk about people in terms of their height or body shape, face shape, hair texture, color or style, eye shape or color, mouth shape, ear size, nose style, skin tone, and so much more. Artificial racial or ethnic groupings don’t *really* describe what someone looks like, talks like, or has experienced.

More than this, once we have seen people in any kind of action, we have their actions and our relationship with them to use as resources. Given all of those resources, choosing race or ethnicity as a first descriptive level with our kids, or even using that descriptor and stopping, sends the message to the kids that that is the only feature that matters. It draws boundaries before it begins conversations. It passes “us and them” along.

Race and ethnicity are one way to describe a person, but they are far from the only way. And they, more than any other way, carry the most baggage. Does that mean they should be avoided or declared taboo?

This week in my Ethnography of Communication class, we each went to Gallaudet, the deaf university in DC, and observed. One of my classmates commented about her discomfort with her lack of fluency in ASL, or American Sign Language. Her comment reminded me of my kids and their cousins. My kids speak English, and only a little bit of Amharic and Tigrinya. Some of their cousins only spoke Tigrinya when they met. Some only spoke Swedish. Some spoke English with very different accents. But the language barriers never stopped them from playing with each other.

In fact, we talk about teaching our kids about diversity, but our kids should be the ones to teach us!

Here are the main lessons I’ve learned from my kids:

1. Don’t cut yourself off from people because you don’t share a common language. Communication actually runs much deeper than language. I think, for example, of one of my sisters-in-law. When we first met, we didn’t have a common language. But the more I have gotten to know her over time, the more we share. I really cherish my relationship with her, and I wouldn’t have it if I had let my language concerns get in the way of communicating with her.

2. People vary a lot, strikingly, in physical ways. These are worthy of comment, okay to notice, and important parts of what make people unique.

3. If you cut yourself off from discomfort or potential differences, you draw a line between you and many of the people around you.

4. It is okay to be wrong, or to still be learning. Learning is a lifelong process. Just because we’re adults doesn’t mean we have to have it all down pat. Don’t be afraid to fail, to mess up. Your fear will get you nowhere. How could you have learned anything if you were afraid of messing up?

In sum, this experience was a powerful one and an interesting one. I sincerely hope that the conversations we began will continue.

* Edited to Add:

Thandie Newton TED talk, Embracing Otherness

Chimamanda Adichie TED talk: The danger of a single story

GREAT letter with loads of resources: http://goodmenproject.com/ethics-values/why-i-dont-want-to-talk-about-race/

an interesting article that we read in class: why white parents don’t talk about race

another interesting article: Lippi-Green 1997, Teaching Children How to Discriminate

 

Repeating language: what do we repeat, and what does it signal?

Yesterday I attended a talk by Jon Kleinberg entitled “Status, Power & Incentives in Social Media” in Honor of the UMD Human-Computer Interaction Lab’s 30th Anniversary.

 

This talk was dense and full of methods that are unfamiliar to me. He first discussed logical representations of human relationships, including orientations of sentiment and status, and then he ventured into discursive evidence of these relationships. Finally, he introduced formulas for influence in social media and talked about ways to manipulate the formulas by incentivizing desired behavior and disincentivizing less desired behavior.

 

In Linguistics, we talk a lot about linguistic accommodation. In any communicative event, it is normal for participants’ speech patterns to converge in some ways. This can happen through repetition of words or grammatical structures. Kleinberg presented research about the social meaning of linguistic accommodation, showing that participants with less power tend to accommodate participants with more power more than participants with more power accommodate participants with less power. This idea of quantifying social influence is a very powerful notion in online research, where social influence is a more practical and useful research goal than general representativeness.
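One rough way to put a number on accommodation, sketched below with my own toy formulation rather than Kleinberg’s actual measure, is to compare how often speaker B uses a marker word in replies overall with how often B uses it when speaker A’s preceding turn used it; a positive difference suggests B is converging toward A.

```python
def accommodation(exchanges, markers):
    """exchanges: list of (a_turn, b_reply) strings.
    Returns P(B uses a marker | A used one) - P(B uses a marker overall)."""
    def uses(text):
        return any(m in text.lower().split() for m in markers)

    b_after_a = [uses(b) for a, b in exchanges if uses(a)]
    b_overall = [uses(b) for a, b in exchanges]
    return sum(b_after_a) / len(b_after_a) - sum(b_overall) / len(b_overall)

# Invented mini-corpus; the single marker "so" stands in for a function-word class
exchanges = [
    ("so what do you think we should do", "so maybe we wait"),
    ("the report is due friday", "i will send it thursday"),
    ("so this is the plan", "so yes that works"),
    ("any questions", "no questions here"),
]
print(accommodation(exchanges, {"so"}))  # 0.5: B echoes "so" only after A uses it
```

In real work the markers would be whole function-word classes and the corpus would be large, but the comparison of conditional against baseline usage is the basic shape of the quantification.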

 

I wonder what strategies we use, consciously and unconsciously, when we accommodate other speakers. I wonder whether different forms of repetition have different underlying social meanings.

 

At the end of the talk, there was some discussion about both the constitution of iconic speech (unmarked words assembled in marked ways) and the meaning of norm flouting.

 

These are very promising avenues for online text research, and it is exciting to see them play out.

Notes on the Past, Present and Future of Survey Methodology from #dcaapor

I had wanted to write these notes up into paragraphs, but I think the notes will be more timely, relevant and readable if I share them as they are. This was a really great conference- very relevant and timely- based on a really great issue of Public Opinion Quarterly. As I was reminded at the DC African Festival (a great festival, lots of fun, highly recommended) on Saturday, “In order to understand the future you must embrace the past.”

DC AAPOR Annual Public Opinion Quarterly Special Issue Conference

75th Anniversary Edition

The Past, Present and Future of Survey Methodology and Public Opinion Research

Look out for slides from the event here: http://www.dc-aapor.org/pastevents.php

 

Note: Of course, I took more notes in some sessions than others…

Peter Miller:

–       Adaptive design (tracking changes in estimates across mailing waves and tracking response bias) is becoming standard practice at Census

–       Check out Howard Schuman’s article tracking attitudes toward Christopher Columbus

  • Ended up doing some field research in the public library, reading children’s books

Stanley Presser:

–       Findings have no meaning independent of the method with which they were collected

–       Balance of substance and method make POQ unique (this was a repeated theme)

Robert Groves:

–       The survey was the most important invention in Social Science in the 20th century – quote credit?

–       3 eras of survey research (boundaries somewhat arbitrary)

  • 1930-1960
    • Foundation laid, practical development
  • 1960-1990
    • Founders pass on their survey endeavors to their protégés
    • From face to face to phone and computer methods
    • Emergence & Dominance of Dillman method
    • Growth of methodological research
    • Total Survey Error perspective dominates
    • Big increase in federal surveys
    • Expansion of survey centers & private sector organizations
    • Some articles say survey method dying because of nonresponse and inflating costs. This is a perennial debate. Groves speculated that around every big election time, someone finds it in their interest to doubt the polls and assigns a jr reporter to write a piece calling the polls into question.
  • 1990–present
    • Influence of other fields, such as social cognitive psychology
    • Nonresponse up, costs up → volunteer panels
    • Mobile phones decrease cost effectiveness of phone surveys
    • Rise of internet only survey groups
    • Increase in surveys
    • Organizational/ business/ management skills more influential than science/ scientists
    • Now: software platforms, culture clash with all sides saying “Who are these people? Why do they talk so funny? Why don’t they know what we know?”
    • Future
      • Rise of organic data
      • Use of administrative data
      • Combining data sets
      • Proprietary data sets
      • Multi-mode
      • More statistical gymnastics

Mike Brick:

  • Society’s demand for information is Insatiable
  • Re: Heckathorn/ Respondent Driven samples
    • Adaptive/ indirect sampling is better
    • Model based methods
      • Missing data problem
      • Cost the main driver now
      • Estimation methods
      • Future
        • Rise of multi-frame surveys
        • Administrative records
        • Sampling theory w/nonsampling errors at design & data collection stages
          • Sample allocation
          • Responsive & adaptive design
          • Undercoverage bias can’t be fixed at the back end
            • *Biggest problem we face*
            • Worse than nonresponse
            • Doug Rivers (2007)
              • Math sampling
              • Web & volunteer samples
              • 1st shot at a theory of nonprobability sampling
            • Quota sampling failed in 2 high profile examples
              • Problem: sample from interviews/ biased
              • But that’s FIXABLE
            • Observational
              • Case control & eval studies
              • Focus on single treatment effect
              • “tougher to measure everything than to measure one thing”

Mick Couper:

–       Mode an outdated concept

  • Too much variety and complexity
  • Modes are multidimensional
    • Degree of interviewer involvement
    • Degree of contact
    • Channels of communication
    • Level of privacy
    • Technology (used by whom?)
    • Synchronous vs. asynchronous
  • More important to look at dimensions other than mode
  • Mode is an attribute of a respondent or item
  • Basic assumption of mixed mode is that there is no difference in responses by mode, but this is NOT true
    • We know of many documented, nonignorable, nonexplainable mode differences
    • Not “the emperor has no clothes” but “the emperor is wearing suggestive clothes”
    • Dilemma: differences not Well understood
      • Sometimes theory comes after facts
      • That’s where we are now- waiting for the theory to catch up (like where we are on nonprobability sampling)

–       So, the case for mixed mode collection so far is mixed

  • Mail w/web option has been shown to have a lower response rate than mail only across 24-26 studies, at least!!
    • (including Dillman, JPSM, …)
    • Why? What can we do to fix this?
    • Sequential modes?
      • Evidence is really mixed
      • The impetus for this is more cost than response rate
      • No evidence that it brings in a better mix of people

–       What about Organic data?

  • Cheap, easily available
  • But good?
  • Disadvantages:
    • One var at a time
    • No covariates
    • Stability of estimates over time?
    • Potential for mischief
      • E.g. open or call-in polls
      • My e.g. #muslimrage
  • Organic data wide, thin
  • Survey data narrow, deep

–       Face to face

  • Benchmark, gold standard, increasingly rare

–       Interviewers

  • Especially helpful in some cases
    • Nonobservation
    • Explaining, clarifying

–       Future

  • Technical changes will drive dev’t
  • Modes and combinations of modes will proliferate
  • Selection bias The Biggest Threat
  • Further proliferation of surveys
    • Difficult for us to distinguish our work from “any idiot out there doing them”

–       Surveys are tools for democracy

  • Shouldn’t be restricted to tools for the elite
  • BUT
  • There have to be some minimum standards

–       “Surveys are tools and methodologists are the toolmakers”

Nora Cate Schaeffer:

–       Jen Dykema read & summarized 78 design papers- her summary is available in the appendix of the paper

–       Dynamic interactive displays for respondent in order to help collect complex data

–       Making decisions when writing questions

  • See flow chart in paper
    • Some decisions are nested
  • Question characteristics
    • E.g. presence or absence of a feature
      • E.g. response choices

Sunshine Hillygus:

–       Political polling is “a bit of a bar trick”

  • The best value in polls is in understanding why the election went the way it did

–       Final note: “The things we know as a field are going to be important going forward, even if it’s not in the way they’ve been used in the past”

Lori Young and Diana Mutz:

–       Biggest issues:

  • Diversity
  • Selective exposure
  • Interpersonal communication

–       2 kinds of search, influence of each

  • Collaborative filter matching, like Amazon
    • Political targeting
    • Contentious issue: 80% of people said that if they knew a politician was targeting them they wouldn’t vote for that candidate
      • My note: interesting to think about people’s relationships with their superficial categories of identity- it’s taken for granted so much in social science research, yet not by the people within the categories

–       Search engines: the new gatekeepers

  • Page rank & other algorithms
  • No one knows what influence personalization of search results will have
  • Study on search learning: engines given systematically different training input (from the same starting point) produced results that changed Fast and Substantively

Rob Santos:

–       Necessity mother of invention

  • Economic pressure
  • Reduce costs
  • Entrepreneurial spirit
  • Profit
  • Societal changes
    • Demographic diversification
      • Globalization
      • Multi-lingual
      • Multi-cultural
      • Privacy concerns
      • Declining participation

–       Bottom line: we adapt. Our industry Always Evolves

–       We’re “in the midst of a renaissance, reinventing ourselves”

  • Me: That’s framing for you! Wow!

–       On the rise:

  • Big Data
  • Synthetic Data
    • Transportation industry
    • Census
    • Simulation studies
      • E.g. How many people would pay x amount of income tax under y policy?
  • Bayesian Methods
    • Apply to probability and nonprobability samples
  • New generation
    • Accustomed to and EXPECT rapid technological turnover
    • Fully enmeshed in social media

–       3 big changes:

  • Non-probability sampling
    • “Train already left the station”
    • Level of sophistication varies
    • Model based inference
    • Wide public acceptance
    • Already a proliferation
  • Communication technology
    • Passive data collection
      • Behaviors
        • E.g. pos (point of service) apps
        • Attitudes or opinions
      • Real time collection
        • Prompted recall (apps)
        • Burden reduction
          • Gamification
  • Big Data
    • What is it?
    • Data too big to store
      • (me: think “firehoses”)
      • Volume, velocity, variety
      • Fuzzy inferences
      • Not necessarily statistical
      • Coarsens insights

–       We need to ask tough questions

  • (theme of next AAPOR conference is just that)
  • We need to question probability samples, too
    • Flawed designs abound
    • High nonresponse & noncoverage
    • Can’t just scrutinize nonprobability samples
  • Nonprobability designs
    • Some good, well accepted methods
    • Diagnostics for measurement
      • How to measure validity?
      • What are the clues?
      • How to create a research agenda to establish validity?
  • Expanding the players
    • Multidisciplinary
      • Substantive scientists
      • Math stats
      • Modelers
      • Econometricians
  • We need
    • Conversations with practitioners
    • Better listening skills

–       AAPOR’s role

  • Create forum for conversation
  • Encourage transparency
  • Engage in outreach
  • Understanding limitations but learning approaches

–       We need to explore the utility of nonprobability samples

–       Insight doesn’t have to be purely from statistical inferences

–       The biggest players in big data to date include:

  • Computational scientists
  • Modelers/ synthetic data’ers

–       We are not a “one size fits all” society, and our research tools should reflect that

My big questions:

–       “What are the borders of our field?”

–       “What makes us who we are, if we don’t do surveys even primarily?”

Linguistic notes:

–       Use of we/who/us

–       Metaphors: “harvest” “firehose”

–       Use of specialized vocabulary

–       Use of the word “comfortable”

–       Interview as a service encounter?

Other notes:

–       This reminds me of Colm O’Muircheartaigh- from that old JPSM distinguished lecture

  • Embracing diversity
  • Allowing noise
  • Encouraging mixed methods

I wish his voice were a part of this discussion…

A brave new vision of the future of social science

I’ve been typing and organizing my notes from yesterday’s dc-aapor event on the past, present and future of survey research (which I still plan to share soon, after a little grooming). The process has been a meditative one.

I’ve been thinking about how I would characterize these same phases- the past, present and future… and then I had a vision of sorts on the way home today that I’d like to share. I’m going to take a minute to be a little post apocalyptic and let the future build itself. You can think of it as a daydream or thought experiment…

The past, I would characterize as the grand discovery of surveys as a tool for data collection; the honing and evolution of that tool in conjunction with its meticulous scientific development and the changing landscape around it; and the growth to dominance and proliferation of the method. The past was an era of measurement, of the total survey error model, of social science.

The present I would characterize as a rapid coming together, or a perfect storm that is swirling data and ideas and disciplines of study and professions together in a grand sweeping wind. I see the survey folks trudging through the wind, waiting for the storm to pass, feet firmly anchored to solid ground.

The future is essentially the past, turned on its head. The pieces of the past are present, but mixed together and redistributed. Instead of examining the ways in which questions elicit usable data, we look at the data first and develop the questions from patterns in the data. In this era, data is everywhere, of various quality, character and genesis, and the skill is in the sense making.

This future is one of data driven analytic strategies, where research teams intrinsically need to be composed of a spectrum of different, specialized skills.

The kings of this future will be the experts in natural language processing, those with the skill of finding and using patterns in language. All language is patterned. Our job will be to find those patterns and then to discover their social meaning.

The computer scientists and coders will write the code to extract relevant subsets of data, and describe and learn patterns in the data. The natural language processing folks will hone the patterns by grammar and usage. The netnographers will describe and interpret the patterns, the data visualizers will make visual or interactive sense of the patterns, the sociologists will discover constructions of relative social groupings as they emerge and use those patterns. The discourse analysts will look across wider patterns of language and context dependency. The statisticians will make formulas to replicate, describe and evaluate the patterns, and models to predict future behaviors. Data science will be a crucial science built on the foundations of traditional and nontraditional academic disciplines.

How many people does it take to screw in this lightbulb? It depends on the skills of the people or person on the ladder.

Where do surveys fit in to this scheme? To be honest, I’m not sure. The success of surveys seems to rest in part on the failure of faster, cheaper methods with a great deal more inherent error.

This is not the only vision possible, but it’s a vision I saw while commuting home at the end of a damned long week… it’s a vision where naturalistic data is valued and experimentation is an extension of research, where diversity is a natural assumption of the model and not a superimposed dynamic, where the data itself and the patterns within it determine what is possible from it. It’s a vision where traditional academics fit only precariously; a future that could just as easily be ruled out by the constraints of the past as it could be adopted unintentionally, where meaning makers rush to be the rigs in the newest gold rush and theory is as desperately pursued as water sources in a drought.

Remotely following AAPOR conference #aapor

The AAPOR 2012 conference began today in sunny Orlando, Florida. This is my favorite conference of the year, and I am sorry to miss it. Fortunately, the Twitter action is bringing a lot of the action to home viewers like us!

https://twitter.com/#!/search/realtime/%23AAPOR

I will keep retweeting some of the action. For those of you who may be concerned that this represents a new era of heavy tweeting for me, rest assured, it won’t!

And for anyone who has been wondering what happened to me and my blog, please stay tuned. I am working on an exciting new project that I will eagerly share about in due time.

Searching for Social Meanings in Social Media

This next CLIP event looks really fantastic!
Please join us on Wednesday at 11AM in AV Williams room 3258 for the University of Maryland Computational Linguistics and Information Processing (CLIP) colloquium!
May 2: Jacob Eisenstein: Searching for social meanings in social media
Social interaction is increasingly conducted through online platforms such as Facebook and Twitter, leaving a recorded trace of millions of individual interactions. While some have focused on the supposed deficiencies of social media with respect to more traditional communication channels, language in social media features the same rich connections with personal and group identity, style, and social context. However, social media’s unique set of linguistic affordances causes social meanings to be expressed in new and perhaps surprising ways. This talk will describe research that builds on large-scale social media corpora using analytic tools from statistical machine learning. I will focus on some of the ways in which social media data allow us to go beyond traditional sociolinguistic methods, but I will also discuss lessons from the sociolinguistics literature that the new generation of “big data” research might do well to heed.
This research includes collaborations with David Bamman, Brendan O’Connor, Tyler Schnoebelen, Noah A. Smith, and Eric P. Xing.
Bio: Jacob Eisenstein is an Assistant Professor in the School of Interactive Computing at Georgia Tech. He works on statistical natural language processing, focusing on social media analysis, discourse, and non-verbal communication. Jacob was a Postdoctoral researcher at Carnegie Mellon and the University of Illinois. He completed his Ph.D. at MIT in 2008, winning the George M. Sprowls dissertation award.
Location of AV Williams:

http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW

http://maps.google.com/maps?q=av+williams
Webpage for CLIP events:

https://wiki.umiacs.umd.edu/clip/index.php/Events#Colloquia
More rundown on Academedia

So I promised more on Academedia (note: they will add more video and visual resources to the Academedia website in the next few days)…

First, some insightful gems from Robert Cannon (of the FCC, and a member of Panel B, “New Media: A closer look at what works”):

Re: internet “a participatory market of free speech”

Re: kids & social media “It’s not a question of whether kids are writing. Kids are writing all the time. It’s whether parents understand that.”

“The issue is not whether to use Wikipedia, but how to use Wikipedia”

Next, the final panel, “Digital Tools for Communication”: http://gnovis-conferences.com/panel-c/

Hitlin (Pew Project for Excellence in Journalism):

People communicate differently about issues on different kinds of media sources.

Re: Trayvon Martin case –> largest issue by media source

  •      Twitter: 21% Outrage @ Zimmerman
  •      Cable & Talk radio: 17% Gun control legislation
  •      Blogs: 15% Role of race

Re: Crimson Hexagon
Pew is different, because they’re in a partnership with Crimson Hexagon to measure trends in traditional media sources. Also because their standards for error are much higher, and they have a team of hand coders available.

Crimson Hexagon is different, because it combines human coding with machine learning to develop algorithms. It may actually overlap pretty intensely with some of the traditional qualitative coding programs that allow for some machine learning. I can imagine that this feature would appeal especially to researchers who are reluctant to fully embrace machine coding, which is understandable, given the current state of the art. I wonder if, by hosting their users instead of distributing programs, they’re able to store and learn from the codes developed by the users?

CH appears to measure two main domains: topic volume over time and topic sentiment over time. Users get a sense of recall and precision in action as they work with the program, by seeing the results of additions and subtractions to a search lexicon. Through this process, Hitlin got a sense of the crux of the problems with text analysis. He said that it was difficult to find examples that neatly fit into boxes, and that the computer didn’t have an eye for subtlety or for things that fit into multiple categories. What he was commenting on was the nature of language in action, or what sociolinguists call Discourse! Through the process of categorizing language, he could sense how complicated it is. Here I get to reiterate one of the main points of this blog: these problems are the reason why linguistics is a necessary aspect of this process. Linguistics is the study of patterns in language, and the patterns we find are inherently different from the patterns we expect to find. Linguistics is a small field, one that people rarely think of, but it is essential to a high quality analysis of communication. In fact, we find, when we look for patterns in language, that everything in language is patterned, from its basic morphology and syntax, to its many variations (which are more systematic than we would predict), to devices like metaphor use and intertextuality, and more.
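Hitlin’s tuning loop can be made concrete with a toy sketch. Nothing here reflects Crimson Hexagon’s actual implementation; the posts, hand codes, and lexicons below are invented purely to show how adding terms to a search lexicon trades precision against recall.

```python
# All data here is invented for illustration: hand-coded posts
# (True = relevant to voting) and candidate keyword lexicons.
posts = [
    ("remember to vote tomorrow", True),
    ("long lines at my polling place", True),
    ("voted for best pizza in town", False),
    ("election results coming in tonight", True),
    ("my cat is adorable", False),
]

def lexicon_matches(text, lexicon):
    """True if any lexicon term appears in the text."""
    return any(term in text for term in lexicon)

def precision_recall(coded_posts, lexicon):
    """Score the lexicon's hits against the hand codes."""
    hits = [(label, lexicon_matches(text, lexicon)) for text, label in coded_posts]
    true_pos = sum(1 for label, hit in hits if label and hit)
    retrieved = sum(1 for _, hit in hits if hit)
    relevant = sum(1 for label, _ in hits if label)
    precision = true_pos / retrieved if retrieved else 0.0
    recall = true_pos / relevant if relevant else 0.0
    return precision, recall

# A narrow lexicon is precise but misses the "polling" and "election" posts;
# a broader one finds them at the cost of the false "voted for best pizza" hit.
narrow = precision_recall(posts, ["vote "])
broad = precision_recall(posts, ["vote", "polling", "election"])
```

Watching precision fall as recall rises with each added term is, in miniature, the tuning process Hitlin described.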

Linguistics is a key, but it’s not a simple fit. Language is patterned in so many ways that linguistics is a huge field. Unfortunately, the subfields of linguistics divide quickly into political and educational camps. It is rare to find a linguist trained in cognitive linguistics, applied linguistics and discourse analysis, for example. But each of these fields is a necessary part of text analysis.

Just as this blog is devoted to knocking down borders in research methods, it is devoted to knocking down borders between subfields and moving forward with strategic intellectual partnerships.

This next speaker in the panel thoroughly blew my mind!

Rami Khater from Al Jazeera English talked about the generation of ‘The Stream,’ an Al Jazeera program that is entirely driven by social media analysis.

Rami can be found on Twitter: @ramisms , and he shared a bit.ly with resources from his talk: bit.ly/yzST1d

The goal of The Stream is to be “a voice of the voiceless,” by monitoring how the hyperlocal goes global. Rami gave a few examples of things we never would have heard about without social media. He showed how hashtags evolve, by starting with competing tags, evolving and changing, and eventually converging into a trend (incidentally, Rami identified the Kony 2012 trend as synthetic from the get go by pointing out that there was no organic hashtag evolution. It simply started and ended as #Kony2012). He used TrendsMap to show a quick global map of currently trending hashtags. I put a link to TrendsMap in the tools section of the links on this blog, and I strongly encourage you to experiment with it. My daughter and I spent some time looking at it today, and we found an emerging conversation in South Africa about black people on the Titanic. We followed this up with another tool, Topsy, which allowed us to see what the exact conversation was about. Rami gets to know the emerging conversations and then uses local tools to isolate the genesis of the trend and interview people at its source. For our part, my daughter and I looked at WhereTweeting to see what the people around us were tweeting about. We saw some nice words of wisdom from Iyanla Vanzant that were drowning in what appeared to me to be “a whole bunch of crap!” (“Mom-mmy, you just used the C word!”)
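Rami’s organic-versus-synthetic observation suggests a simple diagnostic: track what share of a topic’s traffic the single most common tag variant captures each day. The sketch below is my own illustration, not how The Stream actually works, and the variant names and daily counts are made up.

```python
from collections import Counter

# Invented daily tallies of hashtag variants for one topic; the tags and
# counts are made up to illustrate the pattern, not drawn from real data.
organic = [
    ["#stopK", "#konyvideo", "#kony", "#stopK"],        # day 1: competing tags
    ["#kony2012", "#kony", "#kony2012", "#konyvideo"],  # day 2: still mixed
    ["#kony2012"] * 9 + ["#kony"],                      # day 3: converged
]
synthetic = [["#Kony2012"] * 5, ["#Kony2012"] * 8, ["#Kony2012"] * 12]

def top_tag_share(days):
    """Share of each day's traffic held by that day's most common tag."""
    shares = []
    for tags in days:
        top_count = Counter(tags).most_common(1)[0][1]
        shares.append(top_count / len(tags))
    return shares

# An organic trend starts fragmented and converges toward one tag over
# time; a synthetic campaign sits at 1.0 from day one.
organic_shares = top_tag_share(organic)      # rises toward 1.0
synthetic_shares = top_tag_share(synthetic)  # flat at 1.0
```

A flat line at 1.0 from the first day is exactly the “no organic evolution” signature Rami pointed to in the Kony 2012 trend.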

Anyway, the tools that Rami shared are linked over here —->

I encourage you to play around with them, and I encourage you and me both to go check out the recent Stream interview with Ai Wei Wei!

The final speaker on the panel was Karine Megerdoomian from MITRE. I have encountered a few people from MITRE recently at conferences, and I’ve been impressed with all of them! Karine started with some words that made my day:

“How helpful a word cloud is is basically how much work you put into it”

Exactly! Great point, Karine! And she showed a particularly great word cloud that combined useful words and phrases into a single image. Niiice!

Karine spoke a bit about MITRE’s efforts to use machine learning to identify age and gender among internet users. She mentioned that older users tended to use noses in their smilies ( :-) ) and younger users did not ( :) ). She spoke of how older Iranian users tended to use Persian morphology when creating neologisms, and younger users tended to use English, and she spoke about predicting revolutions and seeing how they are propagated over time.
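The “nose” feature Karine mentioned is easy to sketch as a toy feature extractor. The regexes and sample posts below are my own invention, not MITRE’s method; a real system would combine many such features in a supervised model.

```python
import re

# Toy version of the "nose" feature: emoticons with a hyphen nose ( :-) )
# versus without ( :) ). The regexes and sample posts are invented.
NOSE = re.compile(r"[:;]-[)(DP]")    # matches :-)  ;-(  :-D  etc.
NO_NOSE = re.compile(r"[:;][)(DP]")  # matches :)   ;(   :D   etc.

def nose_ratio(user_posts):
    """Fraction of a user's emoticons that include a nose."""
    with_nose = sum(len(NOSE.findall(p)) for p in user_posts)
    without = sum(len(NO_NOSE.findall(p)) for p in user_posts)
    total = with_nose + without
    return with_nose / total if total else 0.0

# Hypothetical users: an "older" style with noses, a "younger" one without.
older = nose_ratio(["great talk :-)", "see you soon ;-)"])  # 1.0
younger = nose_ratio(["lol :)", "cute cat :D"])             # 0.0
```

On its own a single stylistic ratio like this proves nothing about age; it only becomes predictive when validated against labeled users, which is where the machine learning comes in.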

After this point, the floor was opened up for questions. The first question was a critically important one for researchers. It was about representativeness.

The speakers pointed out that social media has a clear bias toward users who are English speaking, western educated, white, male, liberal, and based in the US & UK. Every network has a different set of flaws, but every network has flaws. It is important not to use these analyses as though they were complete. You simply have to go deeper in your analysis.
There was a bit more great discussion, but I’m going to end here. I hope that others will cover this event from other perspectives. I didn’t even mention the excellent discussions about education and media!

It’s not WHETHER they use it, but how they ENGAGE with it

Yesterday I attended an excellent event, Academedia, sponsored by the Gnovis Journal and the Communication, Culture and Technology program at Georgetown. I plan to post a fuller summary of the event soon, but I wanted to jump right in with some commentary about an exchange at the event that really weighed heavily on me.

One of the attendees was lamenting his child’s lack of engagement with traditional media sources (particularly news magazines) and worrying about the deeper societal implications of all of the fluff that garners more attention (and spreads faster) than larger scale news events do online. I would characterize this concern as what my professor Mima Dedaic calls “technopanic,” and I believe that his concerns demonstrate a lack of understanding of the nature of social media.

I have mentioned Pew’s report on the Kony 2012 viral video phenomenon. One of the main findings of that report was that younger people tend to engage differently with media than older people do. Whereas older people were more likely to find out about the video from traditional news sources, younger people were more likely to have heard about it, and heard about it sooner, from social media sources. They were also less likely to have heard about it through traditional media sources, and more likely to have actually seen the video.

In the past media model, the news was composed of a distinct set of entities that could be avoided. I know of quite a few people who prefer not to watch the news or read the newspapers. This orientation has always existed. But in the age of social media, it is much harder to achieve.

When it snows, I know when the flakes begin to fall and the general swath of the storm, even if it’s not local, from my friends who complain about the storm and post pictures of its aftermath. I heard about Michael Jackson’s tragic passing before it was announced in the news. When Egyptians gathered in Tahrir Square, I knew about it from my friend in Egypt. I kept updated on the conflict and on her safety in a community of concerned friends and relatives on her Facebook page. I hear about American political ads from people who see them and comment about them. I know what aspects of politicians my friends with different political orientations orient to. For me, social media can provide a faster news source, and often a more balanced news source, than traditional media (although I am an avid consumer of all kinds of media).

News is no longer a distinct entity that must be sought out. It is personalized. It is discussed from many angles, from a variety of perspectives, with a great deal of frequency, by people who have various degrees of knowledge about it and a variety of attitudes toward it. The daughter of the man in the audience may spend most of her time giggling over memes or making fan pages, but she is surely also orienting toward the larger world around her in a collaborative and alocative (location independent) way.

As a survey researcher, I like to participate in surveys. Some of these surveys ask about where I heard about something. I’m often very frustrated by the response options, because they are incongruent with the ways that I, and many people I know, learn about things on the internet. Googling is sometimes represented as a process of typing a search term into the box and choosing the first option that pulls up. But how often is that the way we use search engines?

There are two distinct ways of googling that I can think of offhand. One is for a direct, known piece of information, like an address, phone number or a picture of something I am already familiar with. The other is more exploratory. An exploratory search takes some term adjustment, and it requires reading through matches until a contextual understanding can be developed. I have noticed that some people can search far more efficiently than others. There are many tools available on the internet, and a working knowledge of the usefulness and potential of these tools can lead to a much different outcome than a passing use can.

There was a representative from the FCC on the panel who shared some great insights, much of which I will cover later. He spoke about kids being taught in schools that technology is bad (disruptive, disobedient, minimally insightful, …), instead of being taught how to use the technological tools available to them.

He said, it’s not WHETHER they use Wikipedia, but how they ENGAGE with Wikipedia.

This is a crucial point. The more we embrace the usefulness of these tools, the better our capabilities will be.

The other side of technopanic is a fear that engaging online means NOT engaging offline. Data on this topic show quite the opposite. People who engage online are also MORE likely to engage offline. Technology need not replace anything. But it can be an excellent tool when approached without unhelpful prejudices.