Total Survey Error: as Iconic as the Statue of Liberty herself?

In Jan Blommaert’s book, The Sociolinguistics of Globalization, I learned about the iconicity of language. Languages, dialects, phrases and words have the potential to be as iconic as the Statue of Liberty. As I read Blommaert’s book, I am also reading about Total Survey Error, which I believe to be an iconic concept in the field of survey research.

Total Survey Error (TSE) is a relatively new, albeit very comprehensive framework for evaluating a host of potential error sources in survey research. It is often mentioned by AAPOR members (national and local), at JPSM classes and events, and across many other events, publications and classes for survey researchers. But here’s the catch: TSE came about after many of us entered the field. In fact, by the time TSE debuted and caught on as a conceptual framework, many people had already been working in the field for long enough that a framework didn’t seem necessary or applicable.

In the past, survey research was a field that people grew into. There were no degree or certificate programs in survey research. People entered the field from a variety of educational and professional backgrounds and worked their way up through the ranks from data entry, coder or interviewing positions to research assistant and analyst positions, and eventually up to management. Survey research was a field that valued experience, and much of the essential job knowledge came about through experience. This structure strongly characterizes my own office, where the average tenure is fast approaching two decades. The technical and procedural history of the department is alive and well in our collections of artifacts and shared stories. We do our work with ease, because we know the work well, and the team works together smoothly because of our extensive history together. Challenges or questions are an opportunity for remembering past experiences.

Programs such as the Joint Program in Survey Methodology (JPSM, a joint venture between the University of Michigan and University of Maryland) are relatively new, arising, for the most part, once many survey researchers were well established in their routines. Scholarly writings and journals multiplied with the rise of the academic programs. New terms and new methods sprang up. The field gained an alternate mode of entry.

In sociolinguistics, we study evidentiality, because people value different forms of evidence. Toward this end, I did a small study of survey researchers’ language use and mode of evidentials and discovered a very stark split between those who used experience to back up claims and those who relied on research to back up claims. This stark difference matched up well to my own experiences. In fact, when I coach jobseekers who are looking for survey research positions, I draw on this distinction and recommend that they carefully listen to the types of evidentials they hear from the people interviewing them and try to provide evidence in the same format. The divide may not be visible from the outside of the field, but it is a strong underlying theme within it.

The divide is not immediately visible from the outside because the face of the field is formed by academic and professional institutions that readily embrace the academic terminology. The people who participate in these institutions and organizations tend to be long-term participants who have been exposed to the new concepts through past events and efforts.

But I wonder sometimes whether the overwhelming public orientation to these methods doesn’t act to exclude some longtime survey researchers in some ways. I wonder whether some excellent knowledge and history get swept away with the new. I wonder whether institutions that represent survey research represent the field as a whole. I wonder what portion of the field is silent, unrepresented or less connected to collective resources and changes.

Particularly as the field encounters a new set of challenges, I wonder how well prepared the field will be: not just those who have been following these developments closely, but also those who have continued steadfast, strong, and with limited errors, not due to TSE adherence but due to the strength of their experience. To me, the Total Survey Error Method is a powerful symbol of the changes afoot in the field.

For further reference, I’m including a past AAPOR presidential address by Robert Groves:

Proceedings of the Fifty-First Annual Conference of the American Association for Public Opinion Research
Source: The Public Opinion Quarterly, Vol. 60, No. 3 (Autumn, 1996), pp. 471-513
ETA other references:

Bob Groves: The Past, Present and Future of Total Survey Error

Slideshow summary of above article

Is there Interdisciplinary hope for Social Media Research?

I’ve been trying to wrap my head around social media research for a couple of years now. I don’t think it would be as hard to understand from any one academic or professional perspective, but, from an interdisciplinary standpoint, the variety of perspectives and the disconnects between them are stunning.

In the academic realm:

There is the computer science approach to social media research. From this standpoint, we see the fleshing out of machine learning algorithms in a stunning horserace of code development across a few programming languages. This is the most likely to be opaque, proprietary knowledge.

There is the NLP or linguistic approach, which overlaps to some degree with the CS approach, although it is often more closely tied to grammatical rules. In this case, we see grammatical parsers, dictionary development, and APIs or shared programming modules, such as NLTK or GATE. Linguistics is divided as a discipline, and many of these divisions have filtered into NLP.

Both the NLP and CS approaches can be fleshed out, trained, or used on just about any data set.

There are the discourse approaches. Discourse is an area of linguistics concerned with meaning above the level of the sentence. This type of research can follow more of a strict Conversation Analysis approach or a kind of Netnography approach. This school of thought is more concerned with context as a determiner or shaper of meaning than the two approaches above.

For these approaches, the dataset cannot just come from anywhere. The analyst should understand where the data came from.

One could divide these traditions by programming skills, but there are enough of us who do work on both sides that the distinction is superficial. Generally speaking, though, the deeper one’s programming or qualitative skills, the less likely one is to cross over to the other side.

There is also a growing tradition of data science, which is primarily quantitative. Although I have some statistical background and work with quantitative data sets every day, I don’t have a good understanding of data science as a discipline. I assume that the growing field of data visualization would fall into this camp.

In the professional realm:

There are many companies in horseraces to develop the best systems first. These companies use catchphrases like “big data” and “social media firehose” and often focus on sentiment analysis or topic analysis (usually topics are gleaned through keywords). These companies primarily market to the advertising industry and market researchers, often with inflated claims of accuracy, which are possible because of the opacity of their methods.

There is the realm of market research, which is quickly becoming dependent on fast, widely available knowledge. This knowledge is usually gleaned through companies involved in the horserace, without much awareness of the methodology. There is an increasing need for companies to be aware of their brand’s mentions and interactions online, in real time, and as they collect this information it is easy, convenient and cost effective to collect more information in the process, such as sentiment analyses and topic analyses. This field has created an astronomically high demand for big data analysis.

There is the traditional field of survey research. This field is methodical and error focused. Knowledge is created empirically and evaluated critically. Every aspect of the survey process is highly researched and understood in great depth, so new methods are greeted with a natural skepticism. Although they have traditionally been the anchors of good professional research methods and the leaders in the research field, survey researchers are largely outside of the big data rush. Survey researchers tend to value accuracy over timeliness, so the big, fast world of big data, with its dubious ability to create representative samples, holds little allure or relevance.

The wider picture

In the wider picture, we have discussions of access and use. We see a growing proportion of the population coming online on an ever greater variety of devices. On the surface, the digital divide is fast shrinking (albeit still significant). Some of the digital access debate has been expanded into an understanding of differential use: essentially, that different people do different activities while online. I want to take this debate further by focusing on discursive access or the digital representation of language ideologies.

The problem

The problem with such a wide spread of methods, needs, focuses and analytic traditions is that there isn’t enough crossover. It is very difficult to find work that spreads across these domains. The audiences are different, the needs are different, the abilities are different, and the professional visions are dramatically different across traditions. Although many people are speaking, it seems like people are largely speaking within silos or echo chambers, and knowledge simply isn’t trickling across borders.

This problem has rapidly grown because the underlying professional industries have quickly calcified. Sentiment analysis is not the revolutionary answer to the text analysis problem, but it is good enough for now, and it is skyrocketing in use. Academia is moving too slowly for the demands of industry and not addressing the needs of industry, so other analytic techniques are not being adopted.

Social media analysis would best be accomplished by a team of people, each with different training. But it is not developing that way. And that, I believe, is a big (and fast growing) problem.

Dispatch from the quantitative | qualitative border

On Tuesday evening I attended my first WAPA meeting (Washington Association of Professional Anthropologists). This group meets monthly, first with a happy hour and then with a speaker. Because I have more of a quantitative background, the work of professional anthropologists really blows my mind. The topics are wide ranging and the work interesting and innovative. I’ve been sorry to miss so many of their gatherings.

This week’s topic was near and dear to my heart in two ways.

1. The work was done in a survey context as a qualitative investigation preceding the development of survey questions. As a professional survey methodologist, I have worked through the surprisingly complicated question writing process many hundreds of times, so this approach really fascinates me!

2. The work surrounded the topic of childbirth. As a mother of two and a [partially] trained birth assistant, I love to talk about childbirth.

The purpose of the study at hand was to explore infant mortality in greater depth by investigating certain aspects of the delivery process. The topics of interest included:

– whether the birth was attended by a professional or not
– whether the birth was at home or in a medical facility
– delivery of the placenta
– how soon after the birth the baby was wiped
– cord cutting and tying
– whether the baby was swaddled and whether the baby’s head was covered
– how soon the baby was bathed

The study was based on 80 respondents (half facility births, half homebirths; half moms of newborns, half moms of 1-2 year olds) from each of two countries. The researchers collected two kinds of data: extensive unstructured interviews and survey questions. The interviews were coded using ATLAS.ti into specific, identifiable, repeated events that were relevant to infant mortality and then placed onto a timeline. The timeline guided the recommended order of the survey questions.

One audience member shared that she would have collected stories of “what is a normal childbirth?” from participants in addition to the women’s personal stories. Her focus with this tactic was to collect the language with which people usually discuss these events in childbirth. She mentioned that her field was linguistic anthropology. The language she was talking about is referred to by survey researchers as “native terms”: essentially, the terms that people normally use when discussing a given topic. One of the goals of question writing is to write a question using the terms that a respondent would naturally use to classify their response, making the response process easier for the respondent and collecting higher quality data. The presenters mentioned that, although they did not collect normative stories, collecting native terms was a part of their research process and recommendations.

The topics of focus are problematic ones to investigate. Most women can tell whether or not they gave birth in a facility and whether or not the birth was attended by a professional. Women can usually remember their labor and delivery in detail (usually for the rest of their lives), as well as the first time they held and fed their babies. Often women can also remember the delivery of the placenta or whether or not they hemorrhaged or tore significantly during the birth process.

But other aspects of the birth, such as the cord cutting and tying and the first wiping and swaddling of the baby, are usually done by someone other than the mother (if there is someone else present). They often don’t command the attention of the mother, who is full of emotion and adrenaline and catching her breath from an all-encompassing, life-changingly powerful experience. These moments are often not as memorable as others, and the mothers are often not as fully aware of them or able to report them.

I wondered whether the moms were able to use the same level of detail in retelling these parts of their stories. Was there any indication that these sections of the stories they told were their own personal stories and not a general recounting of events as they are supposed to happen? In survey research, we talk about satisficing, or providing an answer because an answer is expected, not because it is correct. In societies where babies are frequently born at home, people often grow up around childbirth and know the general, expected order of events. How would the results of the study have been different if the researchers had used a slightly different approach? Instead of assuming that the mothers would be able to recount all of these details of their own experiences, the researchers could have taken a deeper look at who performed the target activities, how detailed an account of the activities the mothers were able to provide, and the nature of the moms’ involvement or role in the target activities.

I wondered if working with this alternative approach would have led to questions more like “The next few questions refer to the moments after your baby was born and the first time you held and nursed your baby. Was the baby already wiped when you first held and nursed them? Was the baby’s cord already cut and tied? Was the baby already swaddled? Was the baby’s head already covered?” Although questions like these wouldn’t separate out the first 5 minutes from the first 10, they would likely be easier for the mom to answer and yield more complete and accurate responses.

All in all, this event was a fantastic one. I learned about an area of research that I hadn’t known existed. The speaker was great, and the audience was engaged. If you have an opportunity to attend a WAPA event, I highly recommend it.

Storytelling about correlation and causation

Many researchers have great war stories to tell about the perilous waters between correlation and causation. Here is my personal favorite:

In the late ’90s, I was working with neurosurgery patients in a medical psychology clinic in a hospital. We gave each of the patients a battery of cognitive tests before their surgery and then administered the same battery 6 months after the surgery. Our goal was to check for cognitive changes that may have resulted from the surgery. One researcher from outside the clinic focused on our strongest finding: a significant reduction of anxiety from pre-op to post-op. She hypothesized that this dramatic finding was evidence that the neural basis for anxiety was affected by the surgery. Had she only taken a minute to explain her hypothesis in plain terms to a layperson, especially one who could imagine the anxiety a patient could potentially experience hours before brain surgery, she surely would have withdrawn her request for our data and slipped quietly out of our clinic.

“Correlation does not imply causation” is a research catchphrase that is drilled into practitioners from internhood and intro classes onward. It is particularly true when working with language, because all linguistic behavior is highly patterned behavior. Researchers from many other disciplines would kill to have chi square tests as strong as linguists’ chi squares. In fact, linguists have to reach deeper into their statistical toolkits, because the significance levels alone can be misleading or inadequate.
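
To make the point about significance levels concrete, here is a minimal sketch in Python. The counts are invented for illustration (imagine two speaker groups choosing between two variants of a word), and the function names are my own. It shows one way significance alone can mislead: the chi-square statistic grows with the size of the dataset, while an effect-size measure such as Cramér's V does not.

```python
import math

# Critical value of the chi-square distribution for df = 1 at alpha = 0.05.
CRITICAL_05_DF1 = 3.841


def chi_square(table):
    """Return (Pearson chi-square statistic, total count) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n


def cramers_v(table):
    """Return Cramér's V, an effect size that does not grow with sample size."""
    stat, n = chi_square(table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(stat / (n * k))


# Hypothetical counts: rows are speaker groups, columns are word variants.
small = [[52, 48], [48, 52]]
# The same proportions with 100 times as many observations.
large = [[cell * 100 for cell in row] for row in small]

for name, table in (("small", small), ("large", large)):
    stat, n = chi_square(table)
    significant = stat > CRITICAL_05_DF1
    print(f"{name}: n={n}, chi2={stat:.2f}, "
          f"significant={significant}, V={cramers_v(table):.2f}")
```

With these made-up numbers, the small table gives a chi-square of 0.32 (not significant) and the large one gives 32.00 (highly significant), yet Cramér's V is 0.04 in both cases. The relationship is equally weak; only the amount of data changed.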

People who use language but don’t study linguistics usually aren’t aware of the degree of patterning that underlies the communication process. Language learning has statistical underpinnings, and language use has statistical underpinnings. It is because of this patterning that linguistic machine learning is possible. But linguistic patterning is a double-edged sword: potentially helpful in programming and harmful in analysis. Correlations abound, and they’re mostly real correlations, although, statistically speaking, some will be products of peculiarities in a dataset. But outside of any context or theory, these findings are meaningless. They don’t speak to the underlying relationship between the variables in any way.

A word of caution to researchers whose work centers around the discovery of correlations. Be careful with your findings. You may have found evidence that shows that a correlation may exist. But that is all you have found. Take your next steps carefully. First, step back and think about your work in layman’s terms. What did you find, and is that really anything meaningful? If your findings still show some prospects, double down further and dig deeper. Try to get some better idea of what is happening. Get some context.

Because a correlation alone is no gold nugget. You may think you’ve found some fashion, but your emperor could very well still be naked.