What do all of these polling strategies add up to?

Yesterday was a big first for research methodologists across many disciplines. For some of the newer methods, it was the first election that they could be applied to in real time. For some of the older methods, this election was the first to bring competing methodologies, and not just methodological critiques.

Real-time sentiment analysis from sites like this summarized Twitter’s take on the election. This paper sought to predict electoral turnout using Google searches. InsideFacebook attempted to use Facebook data to track voting. And those are just a few examples of a rapidly proliferating set of data sources, analytic strategies and visualizations.

One could ask, who are the winners? Some (including me) were quick to declare a victory for the well honed craft of traditional pollsters, who showed that they were able to repeat their studies with little noise, and that their results were predictive of a wider real-world phenomenon. Some could call it a victory for the emerging field of Data Science. Obama’s Chief Data Scientist is already beginning to be recognized. Comparisons of analytic strategies will spring up all over the place in the coming weeks. The election provided a rare opportunity where so many strategies and so many people were working in one topical area. The comparisons will tell us a lot about where we are in the data horse race.

In fact, most of these methods were successful predictors in spite of their complicated underpinnings. The Google searches took into account searches for variations of “vote,” which worked as a kind of reliable predictor but belied the complicated web of naturalistic search terms (which I alluded to in an earlier post about the natural development of hashtags, as explained by Rami Khater of Al Jazeera’s The Stream, a social network generated newscast). I was a real-world example of this methodological complication. Before I went to vote, I googled “sample ballot.” Similar intent, but I wouldn’t have been caught in the analyst’s net.
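A minimal sketch of the kind of keyword net such an analysis might cast. To be clear, the actual study’s term list isn’t public, so the variants below are my own assumptions, just to illustrate how a “similar intent” query slips through:

```python
import re

# Hypothetical keyword net for election-related searches.
# The term list is an assumption for illustration, not the study's method.
VOTE_PATTERN = re.compile(r"\b(vote|voting|voter|voters|voted)\b", re.IGNORECASE)

def is_election_search(query):
    """Return True if the query falls inside the keyword net."""
    return bool(VOTE_PATTERN.search(query))

print(is_election_search("where do I vote"))  # True: caught by the net
print(is_election_search("sample ballot"))    # False: similar intent, missed
```

Any fixed term list draws an arbitrary boundary around a naturally messy vocabulary, which is exactly the complication described above.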

If you look deeper at the Sentiment Analysis tools that allow you to view the specific tweets that comprise their categorizations, you will quickly see that, although the overall trends were in fact predictive of the election results, the data coding was messy, because language is messy.
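To see why the coding gets messy, consider a toy lexicon-based scorer. The word lists and scoring rule here are my own simplification, not any real tool’s method, but they show how negation and sarcasm defeat naive coding:

```python
import re

# Toy sentiment lexicons -- an illustrative assumption, not a real tool's lists.
POSITIVE = {"great", "win", "love", "good"}
NEGATIVE = {"bad", "lose", "hate", "terrible"}

def score(tweet):
    """Count positive words minus negative words."""
    words = re.findall(r"[a-z']+", tweet.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

# Straightforward cases code cleanly...
print(score("Great debate, I love this candidate"))          # 2
# ...but sarcasm and negation break the coding
print(score("Oh great, another debate. Just what I love."))  # 2, though sarcastic
print(score("Not a bad night for the polls"))                # -1, though positive
```

Individual tweets get miscoded constantly; the aggregate trend can still track the election because the errors partially wash out at scale.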

And the victorious predictive ability of traditional polling methods belies the complicated nature of interviewing as a data collection technique. Survey methodologists work hard to standardize research interviews in order to maximize the reliability of the interviews. Sometimes these interviews are standardized to the point of recording. Sometimes the interviews are so scripted that interviewers are not allowed to clarify questions, only to repeat them. Critiques of this kind of standardization are common in survey methodology, most notably from Nora Cate Schaeffer, who has raised many important considerations within the survey methodology community while still strongly supporting the importance of interviewing as a methodological tool. My reading assignment for my ethnography class this week is a chapter by Charles Briggs from 1986 (Briggs – Learning how to ask) that proves that many of the new methodological critiques are in fact old methodological critiques. But the critiques are rarely heeded, because they are difficult to apply.

I am currently working on a project that demonstrates some of the problems with standardizing interviews. I am revising a script we used to call a representative sample of U.S. high schools. The script was last used four years ago in a highly successful effort that led to an admirable 98% response rate. But to my surprise, when I went to pull up the old script I found instead a system of scripts. What was an online and phone survey had spawned fax and e-mail versions. What was intended to be a survey of principals now had a set of potential respondents from the schools, each with their own strengths and weaknesses. Answers to common questions from school staff were loosely scripted on an addendum to the original script. A set of tips for phonecallers included points such as “make sure to catch the name of the person who transfers you, so that you can specifically say that Ms X from the office suggested I talk to you” and “If you get transferred to the teacher, make sure you are not talking to the whole class over the loudspeaker.”

Heidi Hamilton, chair of the Georgetown Linguistics department, often refers to conversation as “climbing a tree that climbs back.” In fact, we often talk about meaning as mutually constituted between all of the participants in a conversation. The conversation itself cannot be taken outside of the context in which it lives. The many documents I found from the phonecallers show just how relevant these observations can be in an applied research environment.

The big question that arises from all of this is one of practical strategy. In particular, I had to figure out how best to address the interview campaign we had actually run when preparing to rerun the campaign we had intended to run. My solution was to integrate the feedback from the phonecallers and loosen up the script. But I suspect that this tactic will work differently with different phonecallers. I’ve certainly worked with a variety of phonecallers, from those who preferred a script to those who preferred to talk off the cuff. Which makes the best phonecaller? Neither. Both. The ideal phonecaller works nimbly and professionally with the situation presented to them while collecting complete and relevant data from the most reliable source. As much of the time as possible.

At this point, I’ve come pretty far afield of my original point, which is that all of these competing predictive strategies have complicated underpinnings.

And what of that?

I believe that the best research is conscious of its strengths and weaknesses and not afraid to work with other strategies in order to generate the most comprehensive picture. As we see comparisons and horse races develop between analytic strategies, I think the best analyses we’ll see will be the ones that fit the results of each of the strategies together, simultaneously developing a fuller breakdown of the election and a fuller picture of our new research environment.

Notes on the Past, Present and Future of Survey Methodology from #dcaapor

I had wanted to write these notes up into paragraphs, but I think the notes will be more timely, relevant and readable if I share them as they are. This was a really great conference, very relevant and timely, based on a really great issue of Public Opinion Quarterly. As I was reminded at the DC African Festival (a great festival, lots of fun, highly recommended) on Saturday, “In order to understand the future you must embrace the past.”

DC AAPOR Annual Public Opinion Quarterly Special Issue Conference

75th Anniversary Edition

The Past, Present and Future of Survey Methodology and Public Opinion Research

Look out for slides from the event here: http://www.dc-aapor.org/pastevents.php

 

Note: Of course, I took more notes in some sessions than others…

Peter Miller:

–       Adaptive design (tracking changes in estimates across mailing waves and tracking response bias) is becoming standard practice at Census

–       Check out Howard Schuman’s article tracking attitudes toward Christopher Columbus

  • Ended up doing some field research in the public library, reading children’s books

Stanley Presser:

–       Findings have no meaning independent of the method with which they were collected

–       Balance of substance and method make POQ unique (this was a repeated theme)

Robert Groves:

–       The survey was the most important invention in Social Science in the 20th century – quote credit?

–       3 eras of Survey research (boundaries somewhat arbitrary)

  • 1930-1960
    • Foundation laid, practical development
  • 1960-1990
    • Founders pass on their survey endeavors to their protégés
    • From face to face to phone and computer methods
    • Emergence & Dominance of Dillman method
    • Growth of methodological research
    • Total Survey Error perspective dominates
    • Big increase in federal surveys
    • Expansion of survey centers & private sector organizations
    • Some articles say survey method dying because of nonresponse and inflating costs. This is a perennial debate. Groves speculated that around every big election time, someone finds it in their interest to doubt the polls and assigns a jr reporter to write a piece calling the polls into question.
  • 1990 onward
    • Influence of other fields, such as social cognitive psychology
    • Nonresponse up, costs up → volunteer panels
    • Mobile phones decrease cost effectiveness of phone surveys
    • Rise of internet only survey groups
    • Increase in surveys
    • Organizational/ business/ management skills more influential than science/ scientists
    • Now: software platforms, culture clash with all sides saying “Who are these people? Why do they talk so funny? Why don’t they know what we know?”
    • Future
      • Rise of organic data
      • Use of administrative data
      • Combining data sets
      • Proprietary data sets
      • Multi-mode
      • More statistical gymnastics

Mike Brick:

  • Society’s demand for information is Insatiable
  • Re: Heckathorn/ Respondent Driven samples
    • Adaptive/ indirect sampling is better
    • Model based methods
      • Missing data problem
      • Cost the main driver now
      • Estimation methods
      • Future
        • Rise of multi-frame surveys
        • Administrative records
        • Sampling theory w/nonsampling errors at design & data collection stages
          • Sample allocation
          • Responsive & adaptive design
          • Undercoverage bias can’t be fixed at the back end
            • *Biggest problem we face*
            • Worse than nonresponse
            • Doug Rivers (2007)
              • Math sampling
              • Web & volunteer samples
              • 1st shot at a theory of nonprobability sampling
            • Quota sampling failed in 2 high profile examples
              • Problem: sample from interviews/ biased
              • But that’s FIXABLE
            • Observational
              • Case control & eval studies
              • Focus on single treatment effect
              • “tougher to measure everything than to measure one thing”

Mick Couper:

–       Mode an outdated concept

  • Too much variety and complexity
  • Modes are multidimensional
    • Degree of interviewer involvement
    • Degree of contact
    • Channels of communication
    • Level of privacy
    • Technology (used by whom?)
    • Synchronous vs. asynchronous
  • More important to look at dimensions other than mode
  • Mode is an attribute of a respondent or item
  • Basic assumption of mixed mode is that there is no difference in responses by mode, but this is NOT true
    • We know of many documented, nonignorable, nonexplainable mode differences
    • Not “the emperor has no clothes” but “the emperor is wearing suggestive clothes”
    • Dilemma: differences not Well understood
      • Sometimes theory comes after facts
      • That’s where we are now- waiting for the theory to catch up (like where we are on nonprobability sampling)

–       So, the case for mixed mode collection so far is mixed

  • Mail w/web option has been shown to have a lower response rate than mail only across 24-26 studies, at least!!
    • (including Dillman, JPSM, …)
    • Why? What can we do to fix this?
    • Sequential modes?
      • Evidence is really mixed
      • The impetus for this is more cost than response rate
      • No evidence that it brings in a better mix of people

–       What about Organic data?

  • Cheap, easily available
  • But good?
  • Disadvantages:
    • One var at a time
    • No covariates
    • Stability of estimates over time?
    • Potential for mischief
      • E.g. open or call-in polls
      • My e.g. #muslimrage
  • Organic data wide, thin
  • Survey data narrow, deep

–       Face to face

  • Benchmark, gold standard, increasingly rare

–       Interviewers

  • Especially helpful in some cases
    • Nonobservation
    • Explaining, clarifying

–       Future

  • Technical changes will drive dev’t
  • Modes and combinations of modes will proliferate
  • Selection bias The Biggest Threat
  • Further proliferation of surveys
    • Difficult for us to distinguish our work from “any idiot out there doing them”

–       Surveys are tools for democracy

  • Shouldn’t be restricted to tools for the elite
  • BUT
  • There have to be some minimum standards

–       “Surveys are tools and methodologists are the toolmakers”

Nora Cate Schaeffer:

–       Jen Dykema read & summarized 78 design papers- her summary is available in the appendix of the paper

–       Dynamic interactive displays for respondent in order to help collect complex data

–       Making decisions when writing questions

  • See flow chart in paper
    • Some decisions are nested
  • Question characteristics
    • E.g. presence or absence of a feature
      • E.g. response choices

Sunshine Hillygus:

–       Political polling is “a bit of a bar trick”

  • The best value in polls is in understanding why the election went the way it did

–       Final note: “The things we know as a field are going to be important going forward, even if it’s not in the way they’ve been used in the past”

Lori Young and Diana Mutz:

–       Biggest issues:

  • Diversity
  • Selective exposure
  • Interpersonal communication

–       2 kinds of search, influence of each

  • Collaborative filter matching, like Amazon
    • Political targeting
    • Contentious issue: 80% of people said that if they knew a politician was targeting them they wouldn’t vote for that candidate
      • My note: interesting to think about people’s relationships with their superficial categories of identity- it’s taken for granted so much in social science research, yet not by the people within the categories

–       Search engines: the new gatekeepers

  • Page rank & other algorithms
  • No one knows what influence personalization of search results will have
  • Study on search learning: gave systematically different input to train engines (given the same start point); results changed Fast and Substantively

Rob Santos:

–       Necessity mother of invention

  • Economic pressure
  • Reduce costs
  • Entrepreneurial spirit
  • Profit
  • Societal changes
    • Demographic diversification
      • Globalization
      • Multi-lingual
      • Multi-cultural
      • Privacy concerns
      • Declining participation

–       Bottom line: we adapt. Our industry Always Evolves

–       We’re “in the midst of a renaissance, reinventing ourselves”

  • Me: That’s framing for you! Wow!

–       On the rise:

  • Big Data
  • Synthetic Data
    • Transportation industry
    • Census
    • Simulation studies
      • E.g. How many people would pay x amount of income tax under y policy?
  • Bayesian Methods
    • Apply to probability and nonprobability samples
  • New generation
    • Accustomed to and EXPECT rapid technological turnover
    • Fully enmeshed in social media

–       3 big changes:

  • Non-probability sampling
    • “Train already left the station”
    • Level of sophistication varies
    • Model based inference
    • Wide public acceptance
    • Already a proliferation
  • Communication technology
    • Passive data collection
      • Behaviors
        • E.g. POS (point-of-sale) apps
        • Attitudes or opinions
      • Real time collection
        • Prompted recall (apps)
        • Burden reduction
          • Gamification
  • Big Data
    • What is it?
    • Data too big to store
      • (me: think “firehoses”)
      • Volume, velocity, variety
      • Fuzzy inferences
      • Not necessarily statistical
      • Coarsens insights

–       We need to ask tough questions

  • (theme of next AAPOR conference is just that)
  • We need to question probability samples, too
    • Flawed designs abound
    • High nonresponse & noncoverage
    • Can’t just scrutinize nonprobability samples
  • Nonprobability designs
    • Some good, well accepted methods
    • Diagnostics for measurement
      • How to measure validity?
      • What are the clues?
      • How to create a research agenda to establish validity?
  • Expanding the players
    • Multidisciplinary
      • Substantive scientists
      • Math stats
      • Modelers
      • Econometricians
  • We need
    • Conversations with practitioners
    • Better listening skills

–       AAPOR’s role

  • Create forum for conversation
  • Encourage transparency
  • Engage in outreach
  • Understanding limitations but learning approaches

–       We need to explore the utility of nonprobability samples

–       Insight doesn’t have to be purely from statistical inferences

–       The biggest players in big data to date include:

  • Computational scientists
  • Modelers/ synthetic data’ers

–       We are not a “one size fits all” society, and our research tools should reflect that

My big questions:

–       “What are the borders of our field?”

–       “What makes us who we are, if we don’t do surveys even primarily?”

Linguistic notes:

–       Use of we/who/us

–       Metaphors: “harvest” “firehose”

–       Use of specialized vocabulary

–       Use of the word “comfortable”

–       Interview as a service encounter?

Other notes:

–       This reminds me of Colm O’Muircheartaigh- from that old JPSM distinguished lecture

  • Embracing diversity
  • Allowing noise
  • Encouraging mixed methods

I wish his voice were a part of this discussion…

Do you ever think about interfaces? Because I do. All the time.

Did you ever see the movie Singles? It came out in the early 90s, shortly before the alternative scene really blew up and I dyed [part of] my hair blue and thought seriously about piercings. Singles was a part of the growth of the alternative movement. In the movie, there is a moment when one character says to another “Do you ever think about traffic? Because I do. All the time.” I spent quite a bit of time obsessing over that line, about what it meant, and, more deeply, what it signaled.

I still think about that line. As I drove toward the turnoff to my mom’s street during our 4th of July vacation, I saw what looked like the turn lane for her street, but it was actually an intersection-less left-turning split immediately preceding the real left turn lane for her street. It threw me off every time, and I kept remembering that romantic moment in Singles when the two characters were getting to know each other’s quirks, and the man was talking about traffic. And it was okay, even cool, to be quirky and think or talk about traffic, even during a romantic moment.

I don’t think about traffic often. But I am no less quirky. Lately, I tend to think about interfaces. Before my first brush with NLP (Natural Language Processing), I thought quite a bit about alternatives to e-mail. Since I discovered the world of text analytics, I have been thinking quite a bit about ways to integrate the knowledge across different fields about methods for text analysis and the needs of quantitative and qualitative researchers. I want to think outside of the sentiment box, because I believe that sentiment analysis does not fully address the underlying richness of textual data. I want to find a way to give researchers what they need, not what they think they want. Recently, my thinking on this topic has flipped. Instead of thinking from the data end, or the analytic possibilities end, or about what programs already exist and what they do, I have started to think about interfaces. This feels like a real epiphany. Once we think about the problem from an interface, or user experience perspective, we can better utilize existing technology and harness user expectations.

Have you read the new Imagine book about how creativity works? I believe that this strategy is the natural step after spending time zoning out on the web, thinking, or not thinking, about research. The more time you cruise, the better feel you develop for what works and what doesn’t, the more you learn what to expect. Interfaces are simply the masks we put on datasets of all sorts. The data could be the world wide web as a whole, results from a site or time period, a database of merchandise, or even a set of open ended survey responses. The goal is to streamline the searching interface and then make it available for use on any number of datasets. We use NLP every day when we search the internet, or shop. We understand it intuitively. Why don’t we extend that understanding to text analysis?

I find myself thinking about what this interface should look like and what I want this program to do.

Not traffic, not as romantic. But still quirky and all-encompassing.

Question Writing is an Art

As a survey researcher, I like to participate in surveys with enough regularity to keep current on any trends in methodology. As a web designer, I know that one aspect of successful design is seamlessness with the visitor’s expectations. So if the survey design realm has moved toward submit buttons in the upper right-hand corner of individual pages, your idea (no matter how clever) to put a submit button on the upper left can result in a disconnect on the part of the user that will affect their behavior on the page. In fact, the survey design world has evolved quite a bit in the last few years, and it is easy to design something that reflects poorly on the quality of your research endeavor. But these design concerns are less of an issue than they have been, because most researchers are using templates.

Yet there is still value in keeping current.

And sometimes we encounter questions that lend themselves to an explanation of the importance of question writing. These questions are a gift for a field that is so difficult to describe in terms of knowledge and skills!

Here is a question I encountered today (I won’t reveal the source):

How often do you purchase potato chips when you eat out at any quick service and fast food restaurants?

2x a week or more
1x a week
1x every 2-3 weeks
1x a month
1x every 2-3 months
Less than 1x every 3 months
Never

This is a prime example of a double-barreled question, and it is also an especially difficult question to answer. In my case, I rarely eat at quick service restaurants, especially sandwich places, like this one, that offer potato chips. When I do eat at them, I am tempted to order chips. About half the time I will give in to the temptation with a bag of SunChips, which I’m pretty sure are not made of potato.

At bigger firms with more time to work through the process, this information would come out in a cognitive interview or think-aloud during the pretesting phase. Many firms, however, have staunchly resisted these important steps in the surveying process because of their time and expense. It is worth noting that the time and expense involved in trying to make usable answers out of poorly written questions can also be immense.

I have spent some time thinking about alternatives to cognitive testing, because I have close experience with organizations that do not use this method. I suspect that this is a good place for text analytics, because of the power of reaching people quickly and potentially cheaply (depending on your embedded TA processes). Although we are often nervous about web analytics because of concerns about representativeness, the bar for representativeness is significantly lower in the pretesting stage than in the analysis phase.

But, no matter what pretesting model you choose, it is important to look closely at the questions that you are asking. Are you asking a single question, or would these questions be better separated out into a series?

How often do you eat at quick service sandwich restaurants?

When you eat at quick service restaurants, do you order [potato] chips?

What kind of [potato] chips do you order?

The lesson of all of this is that question writing is important, and the questions we write in surveys will determine the kind of survey responses we receive and the usability of our answers.