Total Survey Error: as Iconic as the Statue of Liberty herself?

In Jan Blommaert’s book, The Sociolinguistics of Globalization, I learned about the iconicity of language. Languages, dialects, phrases and words have the potential to be as iconic as the Statue of Liberty. As I read Blommaert’s book, I am also reading about Total Survey Error, which I believe to be an iconic concept in the field of survey research.

Total Survey Error (TSE) is a relatively new, albeit very comprehensive, framework for evaluating a host of potential error sources in survey research. It is often mentioned by AAPOR members (national and local), at JPSM classes and events, and across many other events, publications and classes for survey researchers. But here’s the catch: TSE came about after many of us entered the field. In fact, by the time TSE debuted and caught on as a conceptual framework, many people had been working in the field long enough that a framework didn’t seem necessary or applicable.

In the past, survey research was a field that people grew into. There were no degree or certificate programs in survey research. People entered the field from a variety of educational and professional backgrounds and worked their way up through the ranks, from data entry, coding or interviewing positions to research assistant and analyst positions, and eventually up to management. Survey research was a field that valued experience, and much of the essential job knowledge came about through experience. This structure strongly characterizes my own office, where the average tenure is fast approaching two decades. The technical and procedural history of the department is alive and well in our collections of artifacts and shared stories. We do our work with ease, because we know the work well, and the team works together smoothly because of our extensive history together. Challenges or questions are an opportunity for remembering past experiences.

Programs such as the Joint Program in Survey Methodology (JPSM, a joint venture between the University of Michigan and the University of Maryland) are relatively new, arising, for the most part, once many survey researchers were well established in their routines. Scholarly writings and journals multiplied with the rise of the academic programs. New terms and new methods sprang up. The field gained an alternate mode of entry.

In sociolinguistics, we study evidentiality, because people value different forms of evidence. Toward this end, I did a small study of survey researchers’ language use and mode of evidentials and discovered a very stark split between those who used experience to back up claims and those who relied on research to back up claims. This stark difference matched well with my own experiences. In fact, when I coach jobseekers who are looking for survey research positions, I draw on this distinction and recommend that they listen carefully to the types of evidentials they hear from the people interviewing them and try to provide evidence in the same format. The divide may not be visible from outside the field, but it is a strong underlying theme within it.

The divide is not immediately visible from the outside because the face of the field is formed by academic and professional institutions that readily embrace the academic terminology. The people involved in these institutions and organizations tend to be long-term participants who have been exposed to the new concepts through past events and efforts.

But I wonder sometimes whether the overwhelming public orientation to these methods doesn’t act to exclude some longtime survey researchers in some ways. I wonder whether some excellent knowledge and history get swept away with the new. I wonder whether institutions that represent survey research represent the field as a whole. I wonder what portion of the field is silent, unrepresented or less connected to collective resources and changes.

Particularly as the field encounters a new set of challenges, I wonder how well prepared the field will be: not just those who have been following these developments closely, but also those who have continued steadfast, strong, and with limited errors, not due to TSE adherence but due to the strength of their experience. To me, the Total Survey Error framework is a powerful symbol of the changes afoot in the field.

For further reference, I’m including a past AAPOR presidential address by Robert Groves:

Proceedings of the Fifty-First Annual Conference of the American Association for Public Opinion Research
Source: The Public Opinion Quarterly, Vol. 60, No. 3 (Autumn, 1996), pp. 471-513
ETA other references:

Bob Groves: The Past, Present and Future of Total Survey Error

Slideshow summary of above article

Is there Interdisciplinary hope for Social Media Research?

I’ve been trying to wrap my head around social media research for a couple of years now. I don’t think it would be as hard to understand from any one academic or professional perspective, but, from an interdisciplinary standpoint, the variety of perspectives and the disconnects between them are stunning.

In the academic realm:

There is the computer science approach to social media research. From this standpoint, we see the fleshing out of machine learning algorithms in a stunning horserace of code development across a few programming languages. This is the most likely to be opaque, proprietary knowledge.
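
To make this concrete, here is a minimal, hedged sketch of the kind of supervised classifier at the center of this horserace. The training examples are invented, and I’m using scikit-learn as a stand-in; production systems train on vastly larger corpora with far more sophisticated features.

    # A toy sentiment classifier: the model learns word/label associations
    # from labeled examples rather than from hand-written rules.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Invented training data; real systems use thousands of labeled posts.
    train_texts = ["loved the show", "great product, would recommend",
                   "awful service", "what a total waste of money"]
    train_labels = ["pos", "pos", "neg", "neg"]

    model = make_pipeline(CountVectorizer(), MultinomialNB())
    model.fit(train_texts, train_labels)

    # Classify a new, unseen post.
    print(model.predict(["the service was great"]))  # expected: ['pos']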

There is the NLP or linguistic approach, which overlaps to some degree with the CS approach, although it is often more closely tied to grammatical rules. In this case, we see grammatical parsers, dictionary development, and APIs or shared programming modules, such as NLTK or GATE. Linguistics is divided as a discipline, and many of these divisions have filtered into NLP.
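
As a tiny illustration of this tradition, here is a hedged sketch using NLTK (one of the modules mentioned above) to tokenize and part-of-speech tag a post; it assumes the standard NLTK tokenizer and tagger models are available for download.

    import nltk

    # One-time downloads of the tokenizer and tagger models.
    nltk.download("punkt", quiet=True)
    nltk.download("averaged_perceptron_tagger", quiet=True)

    post = "Just tried the new coffee place downtown. Absolutely loved it!"

    # Split the post into tokens, then tag each token with its part of speech.
    tokens = nltk.word_tokenize(post)
    print(nltk.pos_tag(tokens))
    # e.g. [('Just', 'RB'), ('tried', 'VBD'), ('the', 'DT'), ...]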

Both the NLP and CS approaches can be fleshed out, trained, or used on just about any data set.

There are the discourse approaches. Discourse is an area of linguistics concerned with meaning above the level of the sentence. This type of research can follow more of a strict Conversation Analysis approach or a kind of Netnography approach. This school of thought is more concerned with context as a determiner or shaper of meaning than the two approaches above.

For these approaches, the dataset cannot just come from anywhere. The analyst should understand where the data came from.

One could divide these traditions by programming skills, but there are enough of us who do work on both sides that the distinction is superficial. Generally speaking, though, the deeper one’s programming or qualitative skills, the less likely one is to cross over to the other side.

There is also a growing tradition of data science, which is primarily quantitative. Although I have some statistical background and work with quantitative data sets every day, I don’t have a good understanding of data science as a discipline. I assume that the growing field of data visualization would fall into this camp.

In the professional realm:

There are many companies in horseraces to develop the best systems first. These companies use catchphrases like “big data” and “social media firehose” and often focus on sentiment analysis or topic analysis (usually topics are gleaned through keywords). These companies primarily market to the advertising industry and market researchers, often with inflated claims of accuracy, which are possible because of the opacity of their methods.
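
To illustrate why such claims are hard to evaluate, here is a deliberately crude sketch of the keyword-driven approach: sentiment and topics are gleaned entirely from word lists, so the output is only as good as lists that outsiders never get to see. Every list below is invented for illustration.

    import re

    # Invented keyword lists; commercial lists are proprietary and opaque.
    POSITIVE = {"love", "loved", "great", "awesome", "recommend"}
    NEGATIVE = {"hate", "terrible", "awful", "broken", "refund"}
    TOPICS = {
        "shipping": {"delivery", "shipping", "arrived"},
        "price": {"price", "expensive", "cheap", "cost"},
    }

    def score_post(text):
        # Crude tokenization, then set intersections against the lists.
        words = set(re.findall(r"[a-z']+", text.lower()))
        sentiment = len(words & POSITIVE) - len(words & NEGATIVE)
        topics = [name for name, kws in TOPICS.items() if words & kws]
        return {"sentiment": sentiment, "topics": topics}

    print(score_post("Terrible delivery, I want a refund"))
    # {'sentiment': -2, 'topics': ['shipping']}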

There is the realm of market research, which is quickly becoming dependent on fast, widely available knowledge. This knowledge is usually gleaned through companies involved in the horserace, without much awareness of the methodology. There is an increasing need for companies to be aware of their brand’s mentions and interactions online, in real time, and as they collect this information it is easy, convenient and cost effective to collect more information in the process, such as sentiment analyses and topic analyses. This field has created an astronomically high demand for big data analysis.

There is the traditional field of survey research. This field is methodical and error focused. Knowledge is created empirically and evaluated critically. Every aspect of the survey process is highly researched and understood in great depth, so new methods are greeted with a natural skepticism. Although they have traditionally been the anchors of good professional research methods and the leaders in the research field, survey researchers are largely outside of the big data rush. Survey researchers tend to value accuracy over timeliness, so the big, fast world of big data, with its dubious ability to create representative samples, holds little allure or relevance.

The wider picture

In the wider picture, we have discussions of access and use. We see a growing proportion of the population coming online on an ever greater variety of devices. On the surface, the digital divide is fast shrinking (albeit still significant). Some of the digital access debate has been expanded into an understanding of differential use: essentially, that different people do different activities while online. I want to take this debate further by focusing on discursive access, or the digital representation of language ideologies.

The problem

The problem with such a wide spread of methods, needs, focuses and analytic traditions is that there isn’t enough crossover. It is very difficult to find work that spreads across these domains. The audiences are different, the needs are different, the abilities are different, and the professional visions are dramatically different across traditions. Although many people are speaking, it seems like people are largely speaking within silos or echo chambers, and knowledge simply isn’t trickling across borders.

This problem has grown rapidly because the underlying professional industries have quickly calcified. Sentiment analysis is not the revolutionary answer to the text analysis problem, but it is good enough for now, and it is skyrocketing in use. Academia is moving too slowly for the demands of industry and not addressing its needs, so other analytic techniques are not being adopted.

Social media analysis would best be accomplished by a team of people, each with different training. But it is not developing that way. And that, I believe, is a big (and fast growing) problem.

I conducted my first diversity training today…

One of the perks of my grad program is learning how to conduct diversity training.

Today I was able to put that skill to use for the first time. I conducted a workshop for a local parents group about Talking with your Kids about Race and Diversity. I co-facilitated it with Elvira Magomedova, a recent graduate from the MLC program who has more experience and more of a focus in this area. It was a really interesting and rewarding experience.

We did 4 activities:

1. We introduced ourselves by telling our immigration stories. I saw this last week at an open house at my daughter’s middle school, and it profoundly reminded me of the personal ways in which we all embody global history and the immigrant nature of the US. Between feuding clans in Ireland, narrow escapes from the Holocaust and traveling singers in Europe, this exercise is both powerful and fun. Characters and events really come alive, and everyone is left on a more equal footing.

2. For the 2nd activity, we explored the ways in which we identify ourselves. We each put a circle in the center of a sheet of paper, and then we added four bubble spokes with groups or cultures or ways in which we identify ourselves. The exercise came from Cultural Awareness Learning Module One. At the bottom of the page, we explored these relationships more deeply, e.g. “I’m a parent, but I’m not a stay-at-home parent” or “I’m Muslim, but I’m not practicing my religion.” We spoke in depth about our pages in pairs and then shared some with the group.

3. This is a fun activity for parents and kids alike. We split into two groups, culture A and culture B. Each culture has a list of practices, e.g. standing close or far, making eye contact or not, extensive vs. minimal greetings or leave-takings, shaking or not shaking hands, … The groups learn, practice, and then mingle. This is a profoundly awkward activity!

After mingling, we get back into the group and discuss the experience. It soon becomes obvious that people take differences in “culture” personally. People complain that it seemed like their interlocutors were just trying to get away from them, or seemed overly interested in them, or…. They also complain about how hard it is to adjust your practices to act in the prescribed way.

This exercise is a good way for people to understand the ways in which conflicting cultural norms play out, and it helps parents to understand how to work out misunderstandings with their kids.

4. Finally, my daughter made a slide show of people from all over the world. The people varied in countless physical ways from each other, and we used them to stimulate conversation about physical differences. As adults, we tend to ascribe a bevy of sociological baggage to these physical differences, but the reality is that, unless we’re Stephen Colbert, we do see striking physical differences between people. As parents, we are often taken aback when our kids speak openly about differences that we’ve grown accustomed to not talking about. It’s natural and normal to wonder how to handle these observations.

The upshot of this conversation is that describing anyone by a single physical category doesn’t really make sense. If you’re talking about a physical description of someone, you have a number of physical features to comment on. Whereas referring to anyone by a single physical feature could be offensive, a more detailed description is simply a more accurate physical description. We don’t have to use judgmental words, like “good hair,” but that shouldn’t stop us from talking about curly, straight, wavy, thick or thin. We can talk about people in terms of their height or body shape, face shape, hair texture, color or style, eye shape or color, mouth shape, ear size, nose style, skin tone, and so much more. Artificial racial or ethnic groupings don’t *really* describe what someone looks like, talks like, or has experienced.

More than this, once we have seen people in any kind of action, we have their actions and our relationship with them to use as resources. Given all of those resources, choosing race or ethnicity as a first descriptive level with our kids, or even using that descriptor and stopping, sends the message to the kids that that is the only feature that matters. It draws boundaries before it begins conversations. It passes “us and them” along.

Race and ethnicity are one way to describe a person, but they are far from the only way. And they carry more baggage than any other way. Does that mean they should be avoided or declared taboo?

This week in my Ethnography of Communication class, we each went to Gallaudet, the deaf university in DC, and observed. One of my classmates commented on her discomfort with her lack of fluency in ASL, or American Sign Language. Her comment reminded me of my kids and their cousins. My kids speak English, and only a little bit of Amharic and Tigrinya. Some of their cousins only spoke Tigrinya when they met. Some only spoke Swedish. Some spoke English with very different accents. But the language barriers never stopped them from playing with each other.

In fact, we talk about teaching our kids about diversity, but our kids should be the ones to teach us!

Here are the main lessons I’ve learned from my kids:

1. Don’t cut yourself off from people because you don’t share a common language. Communication actually runs much deeper than language. I think, for example, of one of my sisters-in-law. When we first met, we didn’t have a common language. But the more I was able to get to know her over time, the more we shared. I really cherish my relationship with her, and I wouldn’t have it if I had let my language concerns get in the way of communicating with her.

2. People vary a lot, strikingly, in physical ways. These are worthy of comment, okay to notice, and important parts of what make people unique.

3. If you cut yourself off from discomfort or potential differences, you draw a line between you and many of the people around you.

4. It is okay to be wrong, or to still be learning. Learning is a lifelong process. Just because we’re adults doesn’t mean we have to have it all down pat. Don’t be afraid to fail, to mess up. Your fear will get you nowhere. How could you have learned anything if you were afraid of messing up?

In sum, this experience was a powerful one and an interesting one. I sincerely hope that the conversations we began will continue.

* Edited to Add:

Thandie Newton TED talk: Embracing Otherness

Chimamanda Adichie TED talk: The danger of a single story

GREAT letter with loads of resources: http://goodmenproject.com/ethics-values/why-i-dont-want-to-talk-about-race/

an interesting article that we read in class: why white parents don’t talk about race

another interesting article: Lippi-Green (1997), Teaching Children How to Discriminate

 

Notes on the Past, Present and Future of Survey Methodology from #dcaapor

I had wanted to write these notes up into paragraphs, but I think they will be more timely, relevant and readable if I share them as they are. This was a really great conference, very relevant and timely, based on an excellent issue of Public Opinion Quarterly. As I was reminded at the DC African Festival (a great festival, lots of fun, highly recommended) on Saturday, “In order to understand the future you must embrace the past.”

DC AAPOR Annual Public Opinion Quarterly Special Issue Conference

75th Anniversary Edition

The Past, Present and Future of Survey Methodology and Public Opinion Research

Look out for slides from the event here: http://www.dc-aapor.org/pastevents.php

 

Note: Of course, I took more notes in some sessions than others…

Peter Miller:

–       Adaptive design (tracking changes in estimates across mailing waves and tracking response bias) is becoming standard practice at Census

–       Check out Howard Schuman’s article tracking attitudes toward Christopher Columbus

  • Ended up doing some field research in the public library, reading children’s books

Stanley Presser:

–       Findings have no meaning independent of the method with which they were collected

–       Balance of substance and method make POQ unique (this was a repeated theme)

Robert Groves:

–       The survey was the most important invention in Social Science in the 20th century – quote credit?

–       3 eras of survey research (boundaries somewhat arbitrary)

  • 1930-1960
    • Foundation laid, practical development
  • 1960-1990
    • Founders pass on their survey endeavors to their protégés
    • From face to face to phone and computer methods
    • Emergence & Dominance of Dillman method
    • Growth of methodological research
    • Total Survey Error perspective dominates
    • Big increase in federal surveys
    • Expansion of survey centers & private sector organizations
    • Some articles say survey method dying because of nonresponse and inflating costs. This is a perennial debate. Groves speculated that around every big election time, someone finds it in their interest to doubt the polls and assigns a jr reporter to write a piece calling the polls into question.
  • 1990→
    • Influence of other fields, such as social cognitive psychology
    • Nonresponse up, costs up → volunteer panels
    • Mobile phones decrease cost effectiveness of phone surveys
    • Rise of internet only survey groups
    • Increase in surveys
    • Organizational/ business/ management skills more influential than science/ scientists
    • Now: software platforms, culture clash with all sides saying “Who are these people? Why do they talk so funny? Why don’t they know what we know?”
    • Future
      • Rise of organic data
      • Use of administrative data
      • Combining data sets
      • Proprietary data sets
      • Multi-mode
      • More statistical gymnastics

Mike Brick:

  • Society’s demand for information is Insatiable
  • Re: Heckathorn/ Respondent Driven samples
    • Adaptive/ indirect sampling is better
    • Model based methods
      • Missing data problem
      • Cost the main driver now
      • Estimation methods
      • Future
        • Rise of multi-frame surveys
        • Administrative records
        • Sampling theory w/nonsampling errors at design & data collection stages
          • Sample allocation
          • Responsive & adaptive design
          • Undercoverage bias can’t be fixed at the back end
            • *Biggest problem we face*
            • Worse than nonresponse
            • Doug Rivers (2007)
              • Math sampling
              • Web & volunteer samples
              • 1st shot at a theory of nonprobability sampling
            • Quota sampling failed in 2 high profile examples
              • Problem: sample from interviews/ biased
              • But that’s FIXABLE
            • Observational
              • Case control & eval studies
              • Focus on single treatment effect
              • “tougher to measure everything than to measure one thing”

Mick Couper:

–       Mode an outdated concept

  • Too much variety and complexity
  • Modes are multidimensional
    • Degree of interviewer involvement
    • Degree of contact
    • Channels of communication
    • Level of privacy
    • Technology (used by whom?)
    • Synchronous vs. asynchronous
  • More important to look at dimensions other than mode
  • Mode is an attribute of a respondent or item
  • Basic assumption of mixed mode is that there is no difference in responses by mode, but this is NOT true
    • We know of many documented, nonignorable, nonexplainable mode differences
    • Not “the emperor has no clothes” but “the emperor is wearing suggestive clothes”
    • Dilemma: differences not Well understood
      • Sometimes theory comes after facts
      • That’s where we are now- waiting for the theory to catch up (like where we are on nonprobability sampling)

–       So, the case for mixed mode collection so far is mixed

  • Mail w/web option has been shown to have a lower response rate than mail only across 24-26 studies, at least!!
    • (including Dillman, JPSM, …)
    • Why? What can we do to fix this?
    • Sequential modes?
      • Evidence is really mixed
      • The impetus for this is more cost than response rate
      • No evidence that it brings in a better mix of people

–       What about Organic data?

  • Cheap, easily available
  • But good?
  • Disadvantages:
    • One var at a time
    • No covariates
    • Stability of estimates over time?
    • Potential for mischief
      • E.g. open or call-in polls
      • My e.g. #muslimrage
  • Organic data wide, thin
  • Survey data narrow, deep

–       Face to face

  • Benchmark, gold standard, increasingly rare

–       Interviewers

  • Especially helpful in some cases
    • Nonobservation
    • Explaining, clarifying

–       Future

  • Technical changes will drive dev’t
  • Modes and combinations of modes will proliferate
  • Selection bias The Biggest Threat
  • Further proliferation of surveys
    • Difficult for us to distinguish our work from “any idiot out there doing them”

–       Surveys are tools for democracy

  • Shouldn’t be restricted to tools for the elite
  • BUT
  • There have to be some minimum standards

–       “Surveys are tools and methodologists are the toolmakers”

Nora Cate Schaeffer:

–       Jen Dykema read & summarized 78 design papers; her summary is available in the appendix of the paper

–       Dynamic interactive displays for respondent in order to help collect complex data

–       Making decisions when writing questions

  • See flow chart in paper
    • Some decisions are nested
  • Question characteristics
    • E.g. presence or absence of a feature
      • E.g. response choices

Sunshine Hillygus:

–       Political polling is “a bit of a bar trick”

  • The best value in polls is in understanding why the election went the way it did

–       Final note: “The things we know as a field are going to be important going forward, even if it’s not in the way they’ve been used in the past”

Lori Young and Diana Mutz:

–       Biggest issues:

  • Diversity
  • Selective exposure
  • Interpersonal communication

–       2 kinds of search, influence of each

  • Collaborative filter matching, like Amazon
    • Political targeting
    • Contentious issue: 80% of people said that if they knew a politician was targeting them they wouldn’t vote for that candidate
      • My note: interesting to think about people’s relationships with their superficial categories of identity; it’s taken for granted so much in social science research, yet not by the people within the categories

–       Search engines: the new gatekeepers

  • Page rank & other algorithms
  • No one knows what influence personalization of search results will have
  • Study on search learning: gave systematically different input to train engines (given the same start point); results changed Fast and Substantively

Rob Santos:

–       Necessity mother of invention

  • Economic pressure
  • Reduce costs
  • Entrepreneurial spirit
  • Profit
  • Societal changes
    • Demographic diversification
      • Globalization
      • Multi-lingual
      • Multi-cultural
    • Privacy concerns
    • Declining participation

–       Bottom line: we adapt. Our industry Always Evolves

–       We’re “in the midst of a renaissance, reinventing ourselves”

  • Me: That’s framing for you! Wow!

–       On the rise:

  • Big Data
  • Synthetic Data
    • Transportation industry
    • Census
    • Simulation studies
      • E.g. How many people would pay x amount of income tax under y policy?
  • Bayesian Methods
    • Apply to probability and nonprobability samples
  • New generation
    • Accustomed to and EXPECT rapid technological turnover
    • Fully enmeshed in social media

–       3 big changes:

  • Non-probability sampling
    • “Train already left the station”
    • Level of sophistication varies
    • Model based inference
    • Wide public acceptance
    • Already a proliferation
  • Communication technology
    • Passive data collection
      • Behaviors
        • E.g. POS (point of service) apps
        • Attitudes or opinions
      • Real time collection
        • Prompted recall (apps)
        • Burden reduction
          • Gamification
  • Big Data
    • What is it?
    • Data too big to store
      • (me: think “firehoses”)
      • Volume, velocity, variety
      • Fuzzy inferences
      • Not necessarily statistical
      • Coarsens insights

–       We need to ask tough questions

  • (theme of next AAPOR conference is just that)
  • We need to question probability samples, too
    • Flawed designs abound
    • High nonresponse & noncoverage
    • Can’t just scrutinize nonprobability samples
  • Nonprobability designs
    • Some good, well accepted methods
    • Diagnostics for measurement
      • How to measure validity?
      • What are the clues?
      • How to create a research agenda to establish validity?
  • Expanding the players
    • Multidisciplinary
      • Substantive scientists
      • Math stats
      • Modelers
      • Econometricians
  • We need
    • Conversations with practitioners
    • Better listening skills

–       AAPOR’s role

  • Create forum for conversation
  • Encourage transparency
  • Engage in outreach
  • Understanding limitations but learning approaches

–       We need to explore the utility of nonprobability samples

–       Insight doesn’t have to be purely from statistical inferences

–       The biggest players in big data to date include:

  • Computational scientists
  • Modelers/ synthetic data’ers

–       We are not a “one size fits all” society, and our research tools should reflect that

My big questions:

–       “What are the borders of our field?”

–       “What makes us who we are, if we don’t do surveys even primarily?”

Linguistic notes:

–       Use of we/who/us

–       Metaphors: “harvest” “firehose”

–       Use of specialized vocabulary

–       Use of the word “comfortable”

–       Interview as a service encounter?

Other notes:

–       This reminds me of Colm O’Muircheartaigh, from that old JPSM distinguished lecture

  • Embracing diversity
  • Allowing noise
  • Encouraging mixed methods

I wish his voice was a part of this discussion…

A brave new vision of the future of social science

I’ve been typing and organizing my notes from yesterday’s dc-aapor event on the past, present and future of survey research (which I still plan to share soon, after a little grooming). The process has been a meditative one.

I’ve been thinking about how I would characterize these same phases, the past, present and future… and then I had a vision of sorts on the way home today that I’d like to share. I’m going to take a minute to be a little post-apocalyptic and let the future build itself. You can think of it as a daydream or thought experiment…

The past, I would characterize as the grand discovery of surveys as a tool for data collection; the honing and evolution of that tool in conjunction with its meticulous scientific development and the changing landscape around it; and the growth to dominance and proliferation of the method. The past was an era of measurement, of the total survey error model, of social science.

The present I would characterize as a rapid coming together, or a perfect storm that is swirling data and ideas and disciplines of study and professions together in a grand sweeping wind. I see the survey folks trudging through the wind, waiting for the storm to pass, feet firmly anchored to solid ground.

The future is essentially the past, turned on its head. The pieces of the past are present, but mixed together and redistributed. Instead of examining the ways in which questions elicit usable data, we look at the data first and develop the questions from patterns in the data. In this era, data is everywhere, of various quality, character and genesis, and the skill is in the sense making.

This future is one of data driven analytic strategies, where research teams intrinsically need to be composed of a spectrum of different, specialized skills.

The kings of this future will be the experts in natural language processing, those with the skill of finding and using patterns in language. All language is patterned. Our job will be to find those patterns and then to discover their social meaning.

The computer scientists and coders will write the code to extract relevant subsets of data, and describe and learn patterns in the data. The natural language processing folks will hone the patterns by grammar and usage. The netnographers will describe and interpret the patterns, the data visualizers will make visual or interactive sense of the patterns, the sociologists will discover constructions of relative social groupings as they emerge and use those patterns. The discourse analysts will look across wider patterns of language and context dependency. The statisticians will make formulas to replicate, describe and evaluate the patterns, and models to predict future behaviors. Data science will be a crucial science built on the foundations of traditional and nontraditional academic disciplines.

How many people does it take to screw in this lightbulb? It depends on the skills of the people or person on the ladder.

Where do surveys fit in to this scheme? To be honest, I’m not sure. The success of surveys seems to rest in part on the failure of faster, cheaper methods with a great deal more inherent error.

This is not the only vision possible, but it’s a vision I saw while commuting home at the end of a damned long week… it’s a vision where naturalistic data is valued and experimentation is an extension of research, where diversity is a natural assumption of the model and not a superimposed dynamic, where the data itself and the patterns within it determine what is possible from it. It’s a vision where traditional academics fit only precariously; a future that could just as easily be ruled out by the constraints of the past as it could be adopted unintentionally, where meaning makers rush to be the rigs in the newest gold rush and theory is as desperately pursued as water sources in a drought.

Question Writing is an Art

As a survey researcher, I like to participate in surveys with enough regularity to keep current on any trends in methodology. As a web designer, I know that an aspect of successful design is seamlessness with the visitor’s expectations. So if the survey design realm has moved toward submit buttons in the upper right-hand corner of individual pages, your idea (no matter how clever) to put a submit button on the upper left can result in a disconnect on the part of the user that will affect their behavior on the page. In fact, the survey design world has evolved quite a bit in the last few years, and it is easy to design something that reflects poorly on the quality of your research endeavor. But these design concerns are less of an issue than they have been, because most researchers are using templates.

Yet there is still value in keeping current.

And sometimes we encounter questions that lend themselves to an explanation of the importance of question writing. These questions are a gift for a field that is so difficult to describe in terms of knowledge and skills!

Here is a question I encountered today (I won’t reveal the source):

How often do you purchase potato chips when you eat out at any quick service and fast food restaurants?

2x a week or more
1x a week
1x every 2-3 weeks
1x a month
1x every 2-3 months
Less than 1x every 3 months
Never

This is a prime example of a double-barreled question, and it is also an especially difficult question to answer. In my case, I rarely eat at quick service restaurants, especially sandwich places, like this one, that offer potato chips. When I do eat at them, I am tempted to order chips. About half the time I will give in to the temptation with a bag of SunChips, which I’m pretty sure are not made of potato.

In bigger firms with more time to work through the survey process, this information would come out during a cognitive interview or think-aloud in the pretesting phase. Many firms, however, have staunchly resisted these important steps in the surveying process because of their time and expense. It is important to note that the time and expense involved in trying to make usable answers out of poorly written questions can be immense.

I have spent some time thinking about alternatives to cognitive testing, because I have some close experience with places that do not use this method. I suspect that this is a good place for text analytics, because of the power of reaching people quickly and potentially cheaply (depending on your embedded TA processes). Although oftentimes we are nervous about web analytics because of their representativeness, the bar for representativeness is significantly lower in the pretesting stage than in the analysis phase.
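
As a sketch of what I mean, suppose you attach a quick open-ended probe (“What did this question mean to you?”) to a web pretest. Even a simple word tally over the probe answers can flag a problem question before fielding; the responses below are invented.

    from collections import Counter
    import re

    # Invented answers to the probe for the potato chip question above.
    probe_responses = [
        "I wasn't sure if sunchips count as potato chips",
        "Does a sandwich shop count as a fast food restaurant?",
        "I don't know if tortilla chips count as potato chips",
    ]

    words = Counter()
    for response in probe_responses:
        words.update(re.findall(r"[a-z']+", response.lower()))

    # Recurring content words ("count", "chips") point to definitional
    # confusion that a cognitive interview would normally surface.
    print(words.most_common(10))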

But, no matter what pretesting model you choose, it is important to look closely at the questions that you are asking. Are you asking a single question, or would these questions be better separated out into a series?

How often do you eat at quick service sandwich restaurants?

When you eat at quick service restaurants, do you order [potato] chips?

What kind of [potato] chips do you order?

The lesson of all of this is that question writing is important, and the questions we write in surveys will determine the kind of survey responses we receive and the usability of our answers.

To go big, first think small

We use language all of the time. Because of this, we are all experts in language use. As native speakers of a language, we are experts in the intricacies of that language.

Why, then, do people study linguistics? Aren’t we all linguists?

Absolutely not.

We are experts in *using* language, but we are not experts in the methods we employ. Believe it or not, much of the process of speaking and hearing is not conscious. If it were, we would be sensorially overwhelmed by the sheer volume of words around us. Instead, listening comprehension involves a process of merging what we expect to hear with what we gauge to be the most important elements of what we do hear. The process of speaking involves merging our estimates of what the people we communicate with know and expect to hear with our understanding of the social expectations surrounding our words and our relationships, and distilling these sources into a workable expression. The hearer will reconstruct elements of this process using cues that are sometimes conscious and sometimes not.

We often think of language as simple and mechanistic, but it is not simple at all. As conversation analysts, our job is to study the conversation we have access to in an attempt to reconstruct the elements that constituted the interaction. Even small chunks of conversation encode quite a bit of information.

The process of conversation analysis is very much contrary to our sense of language as regular language users. This makes the process of explaining our research to people outside our field difficult. It is difficult to justify the research, and it is difficult to explain why such small pieces of data can be so useful, when most other fields of research rely on greater volumes of data.

In fact, a greater volume of data can be more harmful than helpful in conversation analysis. Conversation is heavily dependent on its context; on the people conversing, their relationship, their expectations, their experiences that day, the things on their mind, what they expect from each other and the situation, their understanding of language and expectations, and more. The same sentence can have greatly different meanings once those factors are taken into account.

At a time when there is so much talk of the glory of big data, it is especially important to keep in mind the contributions of small data. These contributions are the ones that jeopardize the utility and promise of big data, and if these contributions can be captured in creative ways, they will be the true promise of the field.

The object of study, after all, is not what language users expect to see, but rather what we actually use every day, more or less consciously.

When Code Is Hot

Excellent article on TechCrunch by Jon Evans, “When Code is Hot”

http://techcrunch.com/2012/04/07/when-code-is-hot/

Excerpt:

“That first cited piece above begins with “Parlez-vous Python?”, a cutesy bit that’s also a pet peeve. Non-coders tend to think of different programming languages as, well, different languages. I’ve long maintained that while programming itself — “computational thinking”, as the professor put it — is indeed very like a language, “programming languages” are mere dialects; some crude and terse, some expressive and eloquent, but all broadly used to convey the same concepts in much the same way.

 
Like other languages, though, or like music, it’s best learned by the young. I am skeptical of the notion that many people who start learning to code in their 30s or even 20s will ever really grok the fundamental abstract notions of software architecture and design.

 
Stross quotes Michael Littman of Rutgers: “Computational thinking should have been covered in middle school, and it isn’t, so we in the C.S. department must offer the equivalent of a remedial course.” Similarly, the Guardian recently ran an excellent series of articles on why all children should be taught how to code. (One interesting if depressing side note there: the older the students, the more likely it is that girls will be peer-pressured out of the technical arena.)”

Research and Little League

I recently had a revelation about research methodology.

In my Intercultural Communication class, a presenter showed a picture of a moment in a baseball game. The conversation that followed was about baseball and about Little League. It missed the point.

Look around you. You are flooded with visual data. Open your ears. You are flooded with auditory data. Open your senses. What are you touching? Do you smell anything? The world is full of sensory data, so much data, in fact, that we could never take it all in.

This is where attention comes in. Focus. Foreground. We quickly pick out the sounds to focus on, the points in the visual field that are most meaningful at any given moment. In this way, we are efficient and capable. But we are not researchers.

To conduct research is to focus on a moment in time, an interaction, a photograph, etc. and look more deeply at it. Research begins with careful observation. Research includes deconstructing an element into its constituent pieces and thinking carefully about those pieces.

What does a linguist do? A linguist takes the time to look at language and unpack it to reconstitute its context, creation and motivation. Linguistics is asking ‘What is happening?’, ‘What tools are being used?’ and ‘What is being accomplished?’ Linguistics is taking the time to look more closely at the elements of the picture and not restricting oneself to the natural foreground.

Laypeople talk about the content of language. People talk about the boy in the picture who is jumping for joy. Researchers look at the trajectory of the eyes in the crowd to see where people are focusing their attention. They notice the fence between the audience and the players and the way people interact with it. They notice the baseball on the ground. They notice the sunshine and the clothing that the people are wearing. They can uncover the deeper story of what was happening in that moment, instead of surmising about the apparent focus.

These are the skills we are learning.