Upcoming DC Event: Online Research Offline Lunch

ETA: Registration for this event is now CLOSED. If you have already signed up, you will receive a confirmation e-mail shortly. Any sign-ups after this date will be stored as a contact list for any future events. Thank you for your interest! We’re excited to gather with such a diverse and interesting group.

—–

Are you in or near the DC area? Come join us!

Although DC is a great meeting place for specific areas of online research, there are few opportunities for interdisciplinary gatherings of professionals and academics. This lunch will provide an informal opportunity for a diverse set of online researchers to listen and talk respectfully about our interests and our work and to see our endeavors from new, valuable perspectives.

Date & Time: August 6, 2013, 12:30 p.m.

Location: Near Gallery Place or Metro Center. Once we have a rough headcount, we’ll choose an appropriate location. (Feel free to suggest a place!)

Please RSVP using this form:

Spam, Personal histories and Language competencies

Over the recent holiday, I spent some time sorting through many boxes of family memorabilia. Some of you have probably done this with your families. It is fascinating, sentimental and mind-boggling. Highlights include both the things that strike a chord and things that can be thrown away. It’s a balance of efficiency and sap.

 

I’m always amazed by the way family memorabilia tells both private, personal histories and larger public ones. The boxes I dealt with last week were my mom’s, and her passion was politics. Even the Christmas cards she saved give pieces of political histories. Old thank you cards provide unknown nuggets of political strategy. She had even saved stirrers and plastic cups from an inauguration!

 

Campaign button found in the family files


 

 

My mom continued to work in politics throughout her life, but the work that she did more recently is understandably fresher and more tangible for me. I remember looking through printed Christmas cards from politicians and wondering why she held on to them. In her later years I worried about her tendency to hold on to mail merged political letters. I wondered if her tendency to personalize impersonal documents made her vulnerable to fraud. To me, her belief in these documents made no sense.

 

Flash forward one year to me sorting through boxes of handwritten letters from politicians that mirror the spam she held on to. For many years she received handwritten letters from elected politicians in Washington. At some point, the handwritten letters evolved into typed letters that were hand-corrected and included handwritten sections. These evolved into typed letters on which the only handwriting was the signature. Eventually, even the signatures became printed. But the intention and function of these letters remained the same, even as their typography evolved. She believed in these letters because she had been receiving them for many decades. She believed they were personal because she had seen more of them that were personal than not. The phrases that I believe to be formulaic and spammy were once handwritten, intentional, personal and probably even heartfelt.

 

 

There are a few directions I could go from here:

 

– I better understand why older people complain about the impersonalization of modern society and wax poetic about the old letter writing tradition. I could include a few anecdotes about older family members.

 

– I’m amazed that people would take the time to write long letters using handwriting that may never have been deciphered

 

– I could wax poetic about some of the cool things I found in the storage facility

 

 

But I won’t. Not in this blog. Instead, I’ll talk about competencies.

 

Spam is a manifestation of language competencies, although we often dismiss it as a total lack of language competence. In my Linguistics study, we were quickly taught the mantra “difference, not deficiency.” In fact it takes quite a bit of skill to develop spam letters. In survey research, the survey invitation letters that people so often dismiss have been heavily researched and optimized to yield a maximum response rate. In his book The Sociolinguistics of Globalization, Jan Blommaert details the many competencies necessary to create the Nigerian bank scam letters that were so heavily circulated a few years ago. And now I’ve learned that the political letters that I’m so quick to dismiss as thoughtless mail merges are actually part of a deep tradition of political action. Will that be enough for me to hold on to them? No. But I am saving the handwritten stuff. Boxes and boxes of it!

 

 

One day last week, as I drove to the storage facility, I heard an interview with Michael Pollan about food literacy. Pollan’s point was that the food deserts in some urban areas are not just a function of access. (Food deserts are areas where fresh food is difficult to obtain and grocery stores are few and far between, if they’re available at all.) Pollan believes that even if there were grocery stores available, the people in these neighborhoods lack the basic cooking skills to prepare the food. As part of his argument, he cited a few basic cooking skills that are not basic to me (partly because I’m a vegetarian, and partly because of the cooking traditions I learned from).

 

As a linguist, I find it very interesting to hear the baggage that people attach to language carried over metaphorically to food (“food illiteracy”). I wonder what value the “difference, not deficiency” mantra holds here. I’m not ready to believe that people in food deserts are indeed kitchen illiterate. But I wouldn’t hesitate to agree that their food cultures probably differ significantly from Pollan’s. The basic staples and cooking methods probably differ significantly. Pollan could probably make a lot more headway with his cause if, instead of assuming that the people he is trying to help lack any basic cooking skills, he advocated toward a culture change that included access, attainability, and the potential to learn different practical cooking skills. It’s a subtle shift, but an important one.

 

As a proud uncook, I’m a huge fan of any kind of food preparation that is two steps or less, cheap, easy and fresh. Fast food for me involves putting a sweet potato in the microwave and pressing “potato,” grabbing for an apple or carrots and peanut butter, or tossing chickpeas into a dressing. Slow food involves the basic sautéing, roasting, etc. that Pollan advocates. I imagine that the skills he advocates are more practical and enjoyable for him than they are for people like me, whose mealtimes are usually limited and chaotic. What he calls basic is impractical for many of us. And the differences in time and money involved in uncooking and “basics” add up quickly.

 

 

 

So I’ve taken this post in quite a few directions, but it all comes together under one important point. Different language skills are not a lack of language skills altogether. Similarly, different survival skills are not a total lack of survival skills. We all carry unique skillsets that reflect our personal histories with those skills as well as the larger public histories that our personal histories help to compose. We, as people, are part of a larger public. The political spam I see doesn’t meet my expectations of valuable, personal communication, but it is in fact part of a rich political history. The people who Michael Pollan encounters have ways of feeding themselves that differ from Pollan’s expectations, but they are not without important survival skills. Cultural differences are not an indication of an underlying lack of culture.


 

The curse of the elevator speech

Yesterday I was involved in an innocent watercooler chat in which I was asked what Sociolinguistics is. This should be an easy enough question, because I just got a master’s degree in it. But it’s not. Sociolinguistics is a large field that means different things to different people. For every way of studying language, there are social and behavioral correlates that can also be studied. So a sociolinguist could focus on any number of linguistic areas, including phonology, syntax, semantics, or, in my case, discourse. My studies focus on the ways in which people use language, and the units of analysis in my studies are above the sentence level. Because Linguistics is such a large and siloed field, explaining Sociolinguistics through the lens of discourse analysis feels a bit like explaining vegetarianism through a pescatarian lens. The real vegetarians and the real linguists would balk.

There was a follow-up question at the water cooler about y’all. “Is it a Southern thing?” My answer to this was so admittedly lame that I’ve been trying to think of a better one (sometimes even the most casual conversations linger, don’t they?).

My favorite quote of this past semester was from Jan Blommaert: “Language reflects a life, and not just a birth, and it is a life that is lived in a real sociocultural, historical and political space.” Y’all has long been considered a southernism, but when I think back to my own experience with it, it was never about southern language or southern identity. One big clue to this is that I do sometimes use y’all, but I don’t use other southern language features along with it.

If I wanted to further investigate y’all from a sociolinguistic perspective, I would take language samples, either from one or a variety of speakers (and this sampling would have clear, meaningful consequences) and track the uses of y’all to see when it was invoked and what function it serves when invoked. My best, uninformed guess is that it does relational work and invokes registers that are more casual and nonthreatening. But without data, that is nothing but an uninformed guess.
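The tracking step described above could start with something as simple as a keyword-in-context listing. The following is a minimal sketch, not an established method from this post: the function name `keyword_in_context`, the spelling variants in the pattern, and the sample utterances are all invented for illustration. Real work would need real language samples and a principled sampling plan.

```python
import re

# Assumed spelling variants: "y'all" and "ya'll", with a straight or curly
# apostrophe, in any letter case. Other variants would need to be added.
YALL = re.compile(r"\by(?:['’]a|a['’])ll\b", re.IGNORECASE)


def keyword_in_context(utterances, window=4):
    """Return (speaker, left context, match, right context) for each hit.

    `utterances` is a list of (speaker, text) pairs; `window` is the number
    of whitespace-separated words kept on each side of the match.
    """
    hits = []
    for speaker, text in utterances:
        for m in YALL.finditer(text):
            left = text[:m.start()].split()[-window:]
            right = text[m.end():].split()[:window]
            hits.append((speaker, " ".join(left), m.group(0), " ".join(right)))
    return hits


# Invented sample utterances, for illustration only.
samples = [
    ("A", "Are y'all coming to lunch on Tuesday?"),
    ("B", "I sent the form to everyone yesterday."),
    ("A", "Thanks, y’all, that was really helpful."),
]

for hit in keyword_in_context(samples):
    print(hit)
```

A listing like this only locates the uses. Deciding what function each use serves (relational work, register, and so on) is still an interpretive step done by the analyst.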

This work has likely been done before. It would be interesting to see.
(ETA: Here is an example of this kind of work in action, by Barbara Johnstone)

Revisiting Latino/a identity using Census data

On April 10, I attended a talk by Jennifer Leeman (Research Sociolinguist @Census and Assistant Professor @George Mason) entitled “Spanish and Latino/a identity in the US Census.” This was a great talk. I’ll include the abstract below, but here are some of her main points:

  • Census categories promote and legitimize certain understandings, particularly because the Census, as a tool of the government, has an appearance of neutrality
  • Census must use categories from OMB
  • The distinction between race and ethnicity is fuzzy and full of history.
    • In the past, this category has been measured by surname, mother tongue, birthplace
    • Treated as hereditary (“perpetual foreigner” status)
    • Self-identification is new; previously the interviewer would judge and record
  • In the interview context, macro & micro meet
    • Macro level demographic categories
    • Micro:
      • Interactional participant roles
      • Indexed through labels & structure
      • Ascribed vs claimed identities
  • The study: 117 telephone interviews in Spanish
    • 2 questions, ethnicity & race
    • Ethnicity includes Hispano, Latino, Español
      • Intended as synonyms but treated as a choice by respondents
      • Different categories than English (Adaptive design at work!)
  • The interviewers played a big role in the elicitation
    • Some interviewers emphasized standardization
      • This method functions differently in different conversational contexts
    • Some interviewers provided “teaching moments” or on-the-fly definitions
      • Official discourses mediated through interviewer ideologies
      • Definitions vary
  • Race question also problematic
    • Different conceptions of Indioamericana
      • Central, South or North American?
  • Role of language
    • Assumption of monolinguality problematic; bilingualism and multilingualism quite common, as are partial and mixed language resources
    • “White” spoken in English different from “white” spoken in Spanish
    • Length of time in country, generation in country belie fluid borders
  • Coding process
    • Coding responses such as “American, born here”
    • ~40% of Latinos say “other”
    • Other category ~ 90% Hispanic (after recoding)
  • So:
    • Likely result: one “check all that apply” question
      • People don’t read help texts
    • Inherent belief that there is an ideal question out there with “all the right categories”
      • Leeman is not yet ready to believe this
    • The takeaway for survey researchers:
      • Carefully consider what you’re asking, how you’re asking it and what information you’re trying to collect
  • See also Pew Hispanic Center report on Latino/a identity

 

 

 ABSTRACT

Censuses play a crucial role in the institutionalization and circulation of specific constructions of national identity, national belonging, and social difference, and they are a key site for the production and institutionalization of racial discourse (Anderson 1991; Kertzer & Arel 2002; Nobles 2000; Urla 1994).  With the recent growth in the Latina/o population, there has been increased interest in the official construction of the “Hispanic/Latino/Spanish origin” category (e.g., Rodriguez 2000; Rumbaut 2006; Haney López 2005).  However, the role of language in ethnoracial classification has been largely overlooked (Leeman 2004). So too, little attention has been paid to the processes by which the official classifications become public understandings of ethnoracial difference, or to the ways in which immigrants are interpellated into new racial subjectivities.

This presentation addresses these gaps by examining the ideological role of Spanish in the history of US Census Bureau’s classifications of Latina/os as well as in the official construction of the current “Hispanic/Latino/Spanish origin” category. Further, in order to gain a better understanding of the role of the census-taking in the production of new subjectivities, I analyze Spanish-language telephone interviews conducted as part of Census 2010.  Insights from recent sociocultural research on the language and identity (Bucholtz and Hall 2005) inform my analysis of how racial identities are instantiated and negotiated, and how respondents alternatively resist and take up the identities ascribed to them.

* Dr. Leeman is a Department of Spanish & Portuguese Graduate (GSAS 2000).

Digital Democracy Remixed

I recently transitioned from my study of the many reasons why the voice of DC taxi drivers is largely absent from online discussions into a study of the powerful voice of the Kenyan people in shaping their political narrative using social media. I discovered a few interesting things about digital democracy and social media research along the way, and the contrast between the groups was particularly useful.

Here are some key points:

  • The methods of sensemaking that journalists use in social media are similar to other methods of social media research, except for a few key factors, the most important of which is that the bar for verification is higher.
  • The search for identifiable news sources is important to journalists and stands in contrast with research methods that are built on anonymity. This means that the input that journalists will ultimately use will be on a smaller scale than the automated analyses of large datasets widely used in social media research.
  • The ultimate information sources for journalists will be small, but the phenomena that will capture their attention will likely be big. Although journalists need to dig deep into information, something in the large expanse of social media conversation must capture or flag their initial attention.
  • It takes some social media savvy to catch the attention of journalists. This social media savvy outweighs linguistic correctness in the ultimate process of getting noticed. Journalists act as intermediaries between social media participants and a larger public audience, and part of the intermediary process is language correcting.
  • Social media savvy is not just about being online. It is about participating in social media platforms in a publicly accessible way with regard to publicly relevant topics and using the patterned dialogic conventions of the platform on a scale that can ultimately draw attention. Many people and publics go online but do not do this.

The analysis of social media data for this project was particularly interesting. My data source was the comments following this posting on the Al Jazeera English Facebook feed.

[Image: the Al Jazeera English Facebook posting]

It evolved quite organically. After a number of rounds of coding I noticed that I kept drawing diagrams in the margins of some of the comments. I combined the diagrams into this framework:

[Image: framework diagram (“scales”)]

Once this framework was built, I looked closely at the ways in which participants used this framework. Sometimes participants made distinct discursive moves between these levels. But when I tried to map the participants’ movements on their individual diagrams, I noticed that my depictions of their movements rarely matched when I returned to a diagram. Although my coding of the framework was very reliable, my coding of the movements was not at all. This led me to notice that oftentimes the frames were being used more indexically. Participants were indexing levels of the frame, and this indexical process created powerful frame shifts. So, on the level of Kenyan politics exclusively, Uhuru’s crimes had one meaning. But juxtaposed against the crimes of other national leaders, Uhuru’s crimes had a dramatically different meaning. Similarly, when the legitimacy of the ICC was questioned, the charges took on a dramatically different meaning. When Uhuru’s crimes were embedded in the postcolonial East vs West dynamic, they shrank to the degree that the indictments seemed petty and hypocritical. And, ultimately, when religion was invoked, the persecution of one man seemed wholly irrelevant and sacrilegious.
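The mismatch between coding passes described above can be put into numbers by comparing two passes over the same comments. This is only a sketch of one common approach, not the procedure used in this project: the choice of Cohen’s kappa, the function names, the level labels, and the two example passes are all assumptions made for illustration.

```python
from collections import Counter


def percent_agreement(pass1, pass2):
    """Share of items that received the same code in both passes."""
    if len(pass1) != len(pass2):
        raise ValueError("both passes must code the same items")
    return sum(a == b for a, b in zip(pass1, pass2)) / len(pass1)


def cohens_kappa(pass1, pass2):
    """Agreement between two passes, corrected for chance agreement."""
    n = len(pass1)
    observed = percent_agreement(pass1, pass2)
    counts1, counts2 = Counter(pass1), Counter(pass2)
    # Chance agreement: probability that both passes pick the same label
    # if each pass assigned labels independently at its own base rates.
    expected = sum(counts1[label] * counts2[label] for label in counts1) / (n * n)
    if expected == 1:
        return 1.0  # both passes used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes for six comments, coded on two separate occasions.
first_pass = ["kenya", "icc", "icc", "religion", "kenya", "postcolonial"]
second_pass = ["kenya", "icc", "postcolonial", "religion", "kenya", "postcolonial"]

print(percent_agreement(first_pass, second_pass))
print(cohens_kappa(first_pass, second_pass))
```

A low value on a measure like this does not say why the passes disagree. Here the disagreement itself was the clue that the frames were being indexed rather than moved between.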

These powerful frame shifts enable the Kenyan public to have a powerful, narrative-changing voice in social media. And their social media savvy enables them to gain the attention of media sources that amplify their voices and thus redefine their public narrative.

[Image: readyforcnn]

Still grappling with demographics

Last year I wrote about my changing perspective on demographic variables. My grappling has continued since then.
I think of it as an academic puberty of sorts.

I remember the many crazy thought exercises I subjected myself to as a teenager, as I tried to forge my own set of beliefs and my own place in the world. I questioned everything. At times I was under so much construction that it was a wonder I functioned at all. Thankfully, I survived to enter my twenties intact. But lately I have been caught in a similar thought exercise of sorts, second guessing the use of sociological demographic variables in research.

Two sample projects mark two sides of the argument. One is a potential study of the climate for underrepresented faculty members in physics departments. In our exploration of this subject, the meaning of underrepresented was raised. Indeed there are a number of ways in which a faculty member could be underrepresented or made uncomfortable: gender, race, ethnicity, accent, bodily differences or disabilities, sexual orientation, religion, … At some point, one could ask whether it matters which of these inspired prejudicial or different treatment, or whether the hostile climate is, in and of itself, important to note. Does it make sense to tick off which of a set of possible prejudices are stronger or weaker at a particular department? Or does it matter first that the uncomfortable climate exists, and that personal differences that should be professionally irrelevant are coming into professional play? One could argue that the climate should be the first phase of the study, and any demographics could be secondary. One might be particularly tempted to argue for this arrangement given the small sizes of the departments and hesitation among many faculty members to supply information that could identify them personally.

If that were the only project on my mind, I might be tempted to take a more deconstructionist view of demographic variables altogether. But there is another project that I’m working on that argues against the deconstructionist view: the Global Survey of Physicists.

(Side or backstory: The global survey is kind of a pet project of mine, and it was the project that led me to grad school. Working on it involved coordinating survey design, translation and dissemination with representatives from over 100 countries. This was our first translation project. It began in English and was then translated into 7 additional languages. The translation process took almost a full year and was full of unexpected complications. Near the end of this phase, I attended a talk at the Bureau of Labor Statistics by Yuling Pan from Census. The talk was entitled ‘the Sociolinguistics of Survey Translation.’ I attended it never having heard of Sociolinguistics before. During the course of the talk, Yuling detailed and dissected experiences that paralleled my own into useful pieces and diagnosed and described some of the challenges I had encountered in detail. I was so impressed with her talk that I googled Sociolinguistics as soon as I returned to my office and discovered the MLC a few minutes later. One month later I was visiting Georgetown and working on my application for the MLC. I like to say it was like being swept up off my feet and then engaging in a happy shotgun marriage.)

The Global Survey was designed to elicit gender differences in terms of experiences, climate, resources and opportunities, as well as the effects of personal and family constraints and decisions on school and career. The survey worked particularly well, and each dive into the data proves fascinating. This week I delved deeper into the dynamics of one country and saw women’s sources of support erode as they progressed further into school and work, saw the women transition from a virtual parity in school to difficult careers, beginning with their significantly larger chance of having to choose their job because it was the only offer they received, and becoming significantly worse with the introduction of kids. In fact, we found through this survey that kids tend to slow women’s careers and accelerate men’s!

What do these findings say about the use of demographic variables? They certainly validate their usefulness and cause me to wonder whether a lack of focus on demographics would lessen the usefulness of the faculty study. Here I’m reminded that it is important, when discussing demographic variables, to keep in mind that they are not arbitrary. They reflect ways of seeing that are deeply engrained in society. Gender, for example, is the first thing to note about a baby, and it determines a great deal from that point on. Excluding race or ethnicity seems foolish, too, in a society that so deeply engrains these distinctions.

The problem may be in the a priori or unconsidered applications of demographic variables. All too often, the same tired set of variables is dredged up without first considering whether they would even provide a useful distinction or the most useful cuts to a dataset. A recent example of this is the study that garnered some press about racial differences in e-learning. From what I read of the study, all e-learning was collapsed into a single entity, an outcome or dependent variable (as in some kind of measure of success of e-learning), and run by a set of traditional x’s or independent variables, like race and socioeconomic status. In this case, I would have preferred to first see a deeper look into the mechanics of e-learning than a knee-jerk rush to the demographic variables. What kind of e-learning course was it? What kinds of interaction were fostered between the students and the teacher, material and other students? So many experiences of e-learning were collapsed together, and differences in course types and learning environments make for more useful and actionable recommendations than demographics ever could.

In the case of the faculty and global surveys as well, one should ask what approaches to the data would yield the most useful analyses. Finding demographic differences leads to what? An awareness of discrimination? Discrimination is deep-seated and not easily cured. It is easy to document and difficult to fix. And yet, more specific information about climate, resources and opportunities could be more useful or actionable. It helps to ask what we can achieve through our research. Are we simply validating or proving known societal differences or are we working to create actionable recommendations? What are the most useful distinctions?

Most likely, if you take the time to carefully consider the information you collect, the usefulness of your analyses and the validity of your hypotheses, you are one step above anyone rotely applying demographic variables out of ill-considered habit. Kudos to you for that!

Is there Interdisciplinary hope for Social Media Research?

I’ve been trying to wrap my head around social media research for a couple of years now. I don’t think it would be as hard to understand from any one academic or professional perspective, but, from an interdisciplinary standpoint, the variety of perspectives and the disconnects between them are stunning.

In the academic realm:

There is the computer science approach to social media research. From this standpoint, we see the fleshing out of machine learning algorithms in a stunning horserace of code development across a few programming languages. This is the most likely to be opaque, proprietary knowledge.

There is the NLP or linguistic approach, which overlaps to some degree with the CS approach, although it is often more closely tied to grammatical rules. In this case, we see grammatical parsers, dictionary development, and APIs or shared programming modules, such as NLTK or GATE. Linguistics is divided as a discipline, and many of these divisions have filtered into NLP.

Both the NLP and CS approaches can be fleshed out, trained, or used on just about any data set.

There are the discourse approaches. Discourse is an area of linguistics concerned with meaning above the level of the sentence. This type of research can follow more of a strict Conversation Analysis approach or a kind of Netnography approach. This school of thought is more concerned with context as a determiner or shaper of meaning than the two approaches above.

For these approaches, the dataset cannot just come from anywhere. The analyst should understand where the data came from.

One could divide these traditions by programming skills, but there are enough of us who do work on both sides that the distinction is superficial. Generally speaking, though, the deeper one’s programming or qualitative skills, the less likely one is to cross over to the other side.

There is also a growing tradition of data science, which is primarily quantitative. Although I have some statistical background and work with quantitative data sets every day, I don’t have a good understanding of data science as a discipline. I assume that the growing field of data visualization would fall into this camp.

In the professional realm:

There are many companies in horseraces to develop the best systems first. These companies use catchphrases like “big data” and “social media firehose” and often focus on sentiment analysis or topic analysis (usually topics are gleaned through keywords). These companies primarily market to the advertising industry and market researchers, often with inflated claims of accuracy, which are possible because of the opacity of their methods.
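To make the phrase “topics are gleaned through keywords” concrete, here is a deliberately naive sketch of keyword-based topic tagging. It does not describe any particular company’s system, since those methods are opaque. The lexicon, the function name `tag_topics`, and the example sentences are all invented for illustration.

```python
import re

# Hypothetical topic lexicon: each topic is a set of trigger words.
TOPIC_KEYWORDS = {
    "price": {"price", "cost", "expensive", "cheap"},
    "service": {"service", "support", "staff"},
}


def tag_topics(text, lexicon=TOPIC_KEYWORDS):
    """Return the sorted topics whose trigger words appear in the text."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return sorted(topic for topic, words in lexicon.items() if tokens & words)


print(tag_topics("The staff were great but it was expensive."))
print(tag_topics("Customer service? What customer service?"))
print(tag_topics("Serviceable, and it did not cost me my sanity."))
```

The third sentence is tagged as a “price” comment because of the figurative “cost”, while “serviceable” is ignored. Matching on surface keywords with no attention to context is exactly the kind of limitation that accuracy claims tend to gloss over.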

There is the realm of market research, which is quickly becoming dependent on fast, widely available knowledge. This knowledge is usually gleaned through companies involved in the horserace, without much awareness of the methodology. There is an increasing need for companies to be aware of their brand’s mentions and interactions online, in real time, and as they collect this information it is easy, convenient and cost effective to collect more information in the process, such as sentiment analyses and topic analyses. This field has created an astronomically high demand for big data analysis.

There is the traditional field of survey research. This field is methodical and error focused. Knowledge is created empirically and evaluated critically. Every aspect of the survey process is highly researched and understood in great depth, so new methods are greeted with a natural skepticism. Although they have traditionally been the anchors of good professional research methods and the leaders in the research field, survey researchers are largely outside of the big data rush. Survey researchers tend to value accuracy over timeliness, so the big, fast world of big data, with its dubious ability to create representative samples, holds little allure or relevance.

The wider picture

In the wider picture, we have discussions of access and use. We see a growing proportion of the population coming online on an ever greater variety of devices. On the surface, the digital divide is fast shrinking (albeit still significant). Some of the digital access debate has been expanded into an understanding of differential use: essentially, that different people do different activities while online. I want to take this debate further by focusing on discursive access or the digital representation of language ideologies.

The problem

The problem with such a wide spread of methods, needs, focuses and analytic traditions is that there isn’t enough crossover. It is very difficult to find work that spreads across these domains. The audiences are different, the needs are different, the abilities are different, and the professional visions are dramatically different across traditions. Although many people are speaking, it seems like people are largely speaking within silos or echo chambers, and knowledge simply isn’t trickling across borders.

This problem has rapidly grown because the underlying professional industries have quickly calcified. Sentiment analysis is not the revolutionary answer to the text analysis problem, but it is good enough for now, and it is skyrocketing in use. Academia is moving too slowly for the demands of industry and not addressing the needs of industry, so other analytic techniques are not being adopted.

Social media analysis would best be accomplished by a team of people, each with different training. But it is not developing that way. And that, I believe, is a big (and fast growing) problem.

Dispatch from the quantitative | qualitative border

On Tuesday evening I attended my first WAPA meeting (Washington Association of Professional Anthropologists). This group meets monthly, first with a happy hour and then with a speaker. Because I have more of a quantitative background, the work of professional anthropologists really blows my mind. The topics are wide ranging and the work interesting and innovative. I’ve been sorry to miss so many of their gatherings.

This week’s topic was near and dear to my heart in two ways.

1. The work was done in a survey context as a qualitative investigation preceding the development of survey questions. As a professional survey methodologist, I have worked through the surprisingly complicated question writing process many hundreds of times, so this approach really fascinates me!

2. The work surrounded the topic of childbirth. As a mother of two and a [partially] trained birth assistant, I love to talk about childbirth.

The purpose of the study at hand was to explore infant mortality in greater depth by investigating certain aspects of the delivery process. The topics of interest included:

– whether the birth was attended by a professional or not
– whether the birth was at home or in a medical facility
– delivery of the placenta
– how soon after the birth the baby was wiped
– cord cutting and tying
– whether the baby was swaddled and whether the baby’s head was covered
– how soon the baby was bathed

The study was based on 80 respondents from each of two countries. In each country, half of the respondents had facility births and half had homebirths, and half were moms of newborns and half were moms of 1-2 year olds. The researchers collected two kinds of data: extensive unstructured interviews and survey questions. The interviews were coded using ATLAS.ti into specific, identifiable, repeated events that were relevant to infant mortality and then placed onto a timeline. The timeline guided the recommended order of the survey questions.

One audience member shared that she would have collected stories of “what is a normal childbirth?” from participants in addition to the women’s personal stories. Her focus with this tactic was to collect the language with which people usually discuss these events in childbirth. She mentioned that her field was linguistic anthropology. The language she was talking about is referred to by survey researchers as “native terms,” essentially the terms that people normally use when discussing a given topic. One of the goals of question writing is to write a question using the terms that a respondent would naturally use to classify their response, making the response process easier for the respondent and collecting higher quality data. The presenters mentioned that, although they did not collect normative stories, collecting native terms was a part of their research process and recommendations.

The topics of focus are problematic ones to investigate. Most women can tell whether or not they gave birth in a facility and whether or not the birth was attended by a professional. Women can usually remember their labor and delivery in detail (usually for the rest of their lives), as well as the first time they held and fed their babies. Often women can also remember the delivery of the placenta or whether or not they hemorrhaged or tore significantly during the birth process.

But other aspects of the birth, such as the cord cutting and tying and the first wiping and swaddling of the baby, are usually done by someone other than the mother (if there is someone else present). They often don’t command the attention of the mother, who is full of emotion and adrenaline and catching her breath from an all encompassing, life changingly powerful experience. These moments are often not as memorable as others, and the mothers are often not as fully aware of them or able to report them.

I wondered whether the moms were able to retell these parts of their stories with the same level of detail. Was there any indication that these sections of the stories were their own personal stories and not a general recounting of events as they are supposed to happen? In survey research, we talk about satisficing, or providing an answer because an answer is expected, not because it is correct. In societies where babies are frequently born at home, people often grow up around childbirth and know the general, expected order of events. How would the results of the study have been different if the researchers had used a slightly different approach? Instead of assuming that the mothers would be able to recount all of these details of their own experiences, the researchers could have taken a deeper look at who performed the target activities, how detailed an account of the activities the mothers were able to provide, and the nature of the moms’ involvement or role in the target activities.

I wondered if working with this alternative approach would have led to questions more like “The next few questions refer to the moments after your baby was born and the first time you held and nursed your baby. Was the baby already wiped when you first held and nursed them? Was the baby’s cord already cut and tied? Was the baby already swaddled? Was the baby’s head already covered?” Although questions like these wouldn’t separate out the first 5 minutes from the first 10, they would likely be easier for the mom to answer and yield more complete and accurate responses.

All in all, this event was a fantastic one. I learned about an area of research that I hadn’t known existed. The speaker was great, and the audience was engaged. If you have an opportunity to attend a WAPA event, I highly recommend it.

Storytelling about correlation and causation

Many researchers have great war stories to tell about the perilous waters between correlation and causation. Here is my personal favorite:

In the late 90’s, I was working with neurosurgery patients in a medical psychology clinic in a hospital. We gave each of the patients a battery of cognitive tests before their surgery and then administered the same battery 6 months after the surgery. Our goal was to check for cognitive changes that may have resulted from the surgery. One researcher from outside the clinic focused on our strongest finding: a significant reduction of anxiety from pre-op to post-op. She hypothesized that this dramatic finding was evidence that the neural basis for anxiety was affected by the surgery. Had she only taken a minute to explain her hypothesis in plain terms to a layperson, especially one who could imagine the anxiety a patient could potentially experience hours before brain surgery, she surely would have withdrawn her request for our data and slipped quietly out of our clinic.

“Correlation does not imply causation” is a research catchphrase that is drilled into practitioners from internhood and intro classes onward. It is particularly true when working with language, because all linguistic behavior is highly patterned behavior. Researchers from many other disciplines would kill to have chi square tests as strong as linguists’ chi squares. In fact, linguists have to reach deeper into their statistical toolkits, because the significance levels alone can be misleading or inadequate.
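
To make the point about significance levels concrete, here is a minimal sketch in Python. The counts are invented purely for illustration (imagine how often a word does or does not appear in two collections of text), and the function name `chi_square_2x2` is my own. It computes a chi square statistic by hand for a 2x2 table, along with Cramér's V, one common effect size measure that does not grow with the amount of data.

```python
import math

def chi_square_2x2(table):
    """Return (chi square statistic, Cramér's V) for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # For a 2x2 table, Cramér's V reduces to sqrt(chi2 / n)
    v = math.sqrt(chi2 / n)
    return chi2, v

# Hypothetical counts: same proportions, 1000 times more data in the second table
small = [[52, 48], [48, 52]]
large = [[52000, 48000], [48000, 52000]]

chi2_small, v_small = chi_square_2x2(small)
chi2_large, v_large = chi_square_2x2(large)

print(f"small: chi2 = {chi2_small:.2f}, V = {v_small:.3f}")  # small: chi2 = 0.32, V = 0.040
print(f"large: chi2 = {chi2_large:.2f}, V = {v_large:.3f}")  # large: chi2 = 320.00, V = 0.040
```

With one degree of freedom, the usual .05 cutoff is a chi square of about 3.84. The small table falls far short of it and the large table sails past it, yet the strength of the association is identical in both. With enough language data, nearly any pattern will test as significant, which is why the significance level alone says so little.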

People who use language but don’t study linguistics usually aren’t aware of the degree of patterning that underlies the communication process. Language learning has statistical underpinnings, and language use has statistical underpinnings. It is because of this patterning that linguistic machine learning is possible. But linguistic patterning is a double-edged sword: potentially helpful in programming and harmful in analysis. Correlations abound, and they’re mostly real correlations, although, statistically speaking, some will be products of peculiarities in a dataset. But outside of any context or theory, these findings are meaningless. They don’t speak to the underlying relationship between the variables in any way.

A word of caution to researchers whose work centers around the discovery of correlations. Be careful with your findings. You may have found evidence that shows that a correlation may exist. But that is all you have found. Take your next steps carefully. First, step back and think about your work in layman’s terms. What did you find, and is that really anything meaningful? If your findings still show some prospects, double down further and dig deeper. Try to get some better idea of what is happening. Get some context.

Because a correlation alone is no gold nugget. You may think you’ve found some fashion, but your emperor could very well still be naked.

Fertile soil from dry dirt. Thank you, Netherlands!

The mood workshop (microanalysis of online data) in Nijmegen last week was immensely helpful for me. In two short days, my research lost some branches and grew some deeper roots. Definitely worth 21+ hours of travel!

Aerial shot of Greenland. Can’t tell where the clouds end and the snow and ice begin!

The retooling began early on the first day. My first, burning question for the group was about choosing representative data. The shocking first answer: why? To someone with a quantitative background, this question was mind blowing. The sky is up, the ground is down, and data should be representative. But representative of what?

Here we return to the nature of the data. What data are you looking at? What kind of motivated behavior does it represent? Essentially, I am looking at online conversation. We know that counting conversational topics is fruitless; that’s the first truth of conversation analysis. And we know that counting conversational participation is usually misguided. So what was I trying to represent?

My goal is to track a silence that happens across site types, largely independent of stimulus. No matter what kind of news article about taxis in Washington DC, no matter the source, the driver perspective is almost completely absent, and if it is represented, the responses are noticeably different or marked. I had thought that if I could find a way to count this underrepresentation, I could launch a systematic, grounded critique of the notion of participatory media and pose the question of which values were being maintained from the ground up. What is social capital in online news discourse, who speaks, and which speakers are ratified?

But this is not a question of representative sampling alone. Although sampling could offer a sense of context to the data, the meat and potatoes of the analysis are in fact fodder for conversation analysis. A more useful and interesting research question emerged: how are these online conversations constructed so as to make a pro-taxi response dispreferred or marked? This question invokes pronoun usage, intertextuality, conversational reach, crowd-based sanctioning, conversational structure and pair parts, register, and more. It provides grounding for a rich, layered analysis. Fertile soil from dry dirt. Thank you, Netherlands.

Canal in Amsterdam (note: the workshop was in Nijmegen, not Amsterdam). Also note: the dangers of parallel parking next to a canal. You’d be safer living in one of these houseboats!