The surprising unpredictability of language in use

This morning I received an e-mail from an international professional association that I belong to. The e-mail was in English, but it was not written by an American. As a linguist, I recognized the differences in formality and word use as signs that the person who wrote the e-mail is speaking from a set of experiences with English that differ from my own. Nothing in the e-mail was grammatically incorrect (although as a linguist I am hesitant to judge any linguistic differences as correct or incorrect, especially out of context).

Then later this afternoon I saw a tweet from Twitter on the correct use of Twitter abbreviations (RT, MT, etc.). If the growth of new Twitter users has indeed leveled off then Twitter is lucky, because the more Twitter grows the less they will be able to influence the language use of their base.

Language is a living entity that grows, evolves and takes shape based on individual experiences and individual perceptions of language use. If you think carefully about your experiences with language learning, you will quickly see that single exposures and dictionary definitions teach you little, but repeated viewings across contexts teach you much more about language.

Language use is patterned. Every word combination has a likelihood of appearing together, and that likelihood varies based on a host of contextual factors. Language use is complex. We use words in a variety of ways across a variety of contexts. These facts make language interesting, but they also obscure language use from casual understanding. The complicated nature of language in use interferes with analysts who build assumptions about language into their research strategies without realizing that their assumptions would not stand up to careful observation or study.
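As a rough illustration of what "a likelihood of appearing together" can mean, here is a minimal sketch that estimates how likely one word is to follow another in a tiny sample. The function name `bigram_likelihoods` and the sample sentence are made up for this example, and a real analysis would estimate these figures separately for each context of interest.

```python
from collections import Counter

def bigram_likelihoods(tokens):
    """Estimate P(second word | first word) from a list of tokens."""
    # Count each adjacent word pair, and how often each word starts a pair.
    pair_counts = Counter(zip(tokens, tokens[1:]))
    first_counts = Counter(tokens[:-1])
    return {
        (first, second): count / first_counts[first]
        for (first, second), count in pair_counts.items()
    }

# Toy, invented sample; not data from any study.
sample = "language use is patterned and language use is complex".split()
likelihoods = bigram_likelihoods(sample)

print(likelihoods[("language", "use")])   # "use" always follows "language" here
print(likelihoods[("is", "patterned")])   # "is" is followed by two different words
```

Running the same function on samples drawn from different settings (e-mail, tweets, classroom talk) would likely give different figures for the same word pair, which is the point about context.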

I would advise anyone involved in the study of language use (either as a primary or secondary aspect of their analysis) to take language use seriously. Fortunately, linguistics is fun and language is everywhere. So hop to it!

Planning a second “Online Research, Offline Lunch”

In August we hosted the first Online Research, Offline Lunch for researchers involved in online research in any field, discipline or sector in the DC area. Although Washington DC is a great meeting place for specific areas of online research, there are few opportunities for interdisciplinary gatherings of professionals and academics. These lunches provide an informal opportunity for a diverse set of online researchers to listen and talk respectfully about our interests and our work and to see our endeavors from new, valuable perspectives. We kept the first gathering small. But the enthusiasm for this small event was quite large, and it was a great success! We had interesting conversations, learned a lot, made some valuable connections, and promised to meet again.

Many expressed interest in the lunches but weren’t able to attend. If you have any specific scheduling requests, please let me know now. Although I certainly can’t accommodate everyone’s preferences, I will do my best to take them into account.

Here is a form that can be used to add new people to the list. If you’re already on the list you do not need to sign up again. Please feel free to share the form with anyone else who may be interested:

 

Language use & gaps in STEM education

Today our microanalytic research group focused on videos of STEM education.

 

Watching STEM classes reminds me of a field trip a fellow researcher and I took to observe a physics class that used project-based learning (PJBL). Project-based learning is a more hands-on and collaborative teaching approach that is gaining popularity among physics educators as an alternative to traditional lecture. We observed an optics lab at a local university, and after the class we spoke about what we had observed. Whereas the other researcher had focused on the optics and math, I had been captivated by the awkwardness of the class. I had never envisioned the PJBL process to be such an awkward one!

 

The first video that we watched today involved a student interchangeably using the terms chart and graph and softening their use with the term “thing.” There was some debate among the researchers as to whether the student had known the answer but chosen to distance himself from the response or whether the student was hedging because he was uncertain. The teacher responded by telling the student not to talk about things, but rather to talk to her in math terms.

 

What does it mean to understand something in math? The math educators in the room made it clear that a lack of the correct terminology signaled that the student didn’t necessarily understand the subject matter. There was no way for the teacher to know whether the student knew the difference between a chart and a graph from their use of the terms. The conversation on our end was not about the conceptual competence that the student showed. He was at the board, working through the problem, and he had begun his interaction with a winding description of the process necessary (as he imagined it) to solve the problem. It was clear that he did understand the problem and the necessary steps to solve it on some level (whether correct or not), but that level of understanding was not one that mattered.

 

I was surprised at the degree to which the use of mathematical language was framed as a choice on the part of the student. The teacher asked the student to use mathematical language with her. One math educator in our group spoke about students “getting away with fudging answers.” One researcher said that the correct terms “must be used,” and another commented about the lack of correct terms as indication that the student did “not have a proper understanding” of the material. All of this talk seems to belie the underlying truth that the student chose to use inexact language for a reason (whether social or epistemic).

 

The next video we watched showed a math teacher working through a problem. I was really struck by her lack of enthusiasm. I noticed her sighs, her lack of engagement with the students even when directly addressing them, and her tone when reading the problem from the textbook. Despite her apparent lack of enthusiasm, her mood appeared considerably brighter when she finished working through the problem. I found this interesting, because physics teachers usually report that their favorite part of their job is watching the students’ “a-ha!” moments. Maybe the rewards of technical problem solving are a motivator for both students and teachers alike? But the process of technical problem solving itself is rarely as motivating.

 

All of this leads me to one particularly interesting question. How do people in STEM learning settings distance themselves from the material? What discursive tools do they use? Who uses these discursive tools? And does the use of these tools change over time? I wonder in particular whether discursive distancing, which often parallels female discursive patterns, is more common among females than males as they progress through their education? Is there any kind of quantitative correlate to the use of discursive distancing? Is it more common among people who believe that they aren’t good at STEM? Is discursive distancing less common among people who pursue STEM careers? Is there a correlation between distancing and test scores?

 

Awkwardness in STEM education is fertile ground for qualitative researchers. To what extent is the learning or solving process emphasized and to what extent is the answer valued above all else? How is mathematical language socialized? The process of solving technical problems is a messy and uncomfortable one. It rarely goes smoothly, and in fact challenges often lead to more challenges. The process of trying and failing or trying and learning is not a sexy or attractive one, and there is rampant concern that focusing on the process of learning robs students of the ability to demonstrate their knowledge in a way that matters to people who speak the traditional languages of math and science.

 

We spoke a little about the phenomenon of connected math. It sounds to me very closely parallel to project-based learning initiatives in physics. I was left wondering why such a similar teaching process could be valued as a teaching tool for all students in one field and relegated to a teaching tool for struggling students in another neighboring field. I wonder about the similarities and differences between the outcomes of these methods. Much of this may rest on politics, and I suspect that the politics are rooted in deeply held and less questioned beliefs about STEM education.

 

STEM education initiatives have grown quite a bit in recent years, and it’s clear that there is quite a bit of interesting research left to be done.

Fitness for Purpose, Representativeness and the perils of online reviews

Have you ever planned a trip online? In January, when I traveled to Amsterdam, I did all of the legwork online and ended up in a surprising place.

Amsterdam City Center is extremely easy to navigate. From the train station (a quick ride from the airport and a quick ride around The Netherlands), the canals extend outward like spokes. Each canal is flanked by streets. Then the city has a number of concentric rings emanating from the train station. Not only is the underlying map easy to navigate, there is a traveler station at the center and maps available periodically. English speaking tourists will see that not only do many people speak English, but Dutch has enough overlap with English to be comprehensible after even a short exposure.

But the city center experience was not as smooth for me. I studied map after map in the city center without finding my hotel. I asked for directions, and no one had heard of the hotel or the street it was on. The traveler center seemed flummoxed as well. Eventually I found someone who could help and found myself on a long commuter tram ride well outside the city center and tourist areas. The hotel had received great reviews and recommendations from many travelers. But clearly, the travelers who boasted about it were not quite the typical travelers, who likely would have ended up in one of the many hotels I saw from the tram window.

Have you ever discovered a restaurant online? I recently went to a nice, local restaurant that I’d been reading about for years. I ordered the truffle fries (fries with truffle salt and some kind of fondue sauce), because people had really raved about them, only to discover once they arrived that they were fundamentally french fries (totally not my bag; I hate fried food).

These review sites are not representative of anything. And yet we/I repeatedly use them as if they were reliable sources of information. One could easily argue that they may not be representative, but they are good enough for their intended use (fitness for purpose <– big, controversial notion from a recent AAPOR task force report on Nonprobability Sampling). I would argue that they are clearly not excellent for their intended use. But does that invalidate them altogether? They often provide the only window that we have into whatever it is that we intend them for.

Truffle fries aside, the restaurant was great. And location aside, the hotel was definitely an interesting experience.

Toilet capsule in hotel room (with frosted glass rotating pane for some degree of privacy)


The curse of the elevator speech

Yesterday I was involved in an innocent watercooler chat in which I was asked what Sociolinguistics is. This should be an easy enough question, because I just got a master’s degree in it. But it’s not. Sociolinguistics is a large field that means different things to different people. For every way of studying language, there are social and behavioral correlates that can also be studied. So a sociolinguist could focus on any number of linguistic areas, including phonology, syntax, semantics, or, in my case, discourse. My studies focus on the ways in which people use language, and the units of analysis in my studies are above the sentence level. Because Linguistics is such a large and siloed field, explaining Sociolinguistics through the lens of discourse analysis feels a bit like explaining vegetarianism through a pescatarian lens. The real vegetarians and the real linguists would balk.

There was a follow-up question at the water cooler about y’all. “Is it a Southern thing?” My answer to this was so admittedly lame that I’ve been trying to think of a better one (sometimes even the most casual conversations linger, don’t they?).

My favorite quote of this past semester was from Jan Blommaert: “Language reflects a life, and not just a birth, and it is a life that is lived in a real sociocultural, historical and political space.” Y’all has long been considered a southernism, but when I think back to my own experience with it, it was never about southern language or southern identity. One big clue to this is that I do sometimes use y’all, but I don’t use other southern language features along with it.

If I wanted to further investigate y’all from a sociolinguistic perspective, I would take language samples, either from one or a variety of speakers (and this sampling would have clear, meaningful consequences) and track the uses of y’all to see when it was invoked and what function it serves when invoked. My best, uninformed guess is that it does relational work and invokes registers that are more casual and nonthreatening. But without data, that is nothing but an uninformed guess.
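The tracking step described above could start with something as simple as a keyword-in-context listing. The sketch below is one hypothetical way to do that; the function name `keyword_in_context`, the window size and the sample utterances are all invented for illustration, and deciding what function each use serves would still require reading every hit in its context.

```python
def keyword_in_context(text, keyword, window=4):
    """Return each use of `keyword` with up to `window` words on either side."""
    # Normalize curly apostrophes so that "y’all" and "y'all" both match.
    words = text.replace("’", "'").split()
    target = keyword.replace("’", "'").lower()
    hits = []
    for i, word in enumerate(words):
        # Strip surrounding punctuation before comparing.
        if word.strip(".,!?;:\"()").lower() == target:
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            hits.append((left, word, right))
    return hits

# Invented utterances for illustration only, not real language samples.
sample = "Hey, are y'all coming to lunch? I told y'all about it yesterday."
hits = keyword_in_context(sample, "y'all")
for left, word, right in hits:
    print(f"{left:>20} | {word} | {right}")
```

Lining the hits up this way makes it easier to compare the surrounding talk across uses and across speakers.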

This work has likely been done before. It would be interesting to see.
(ETA: Here is an example of this kind of work in action, by Barbara Johnstone)

Fertile soil from dry dirt. Thank you, Netherlands!

The mood workshop (microanalysis of online data) in Nijmegen last week was immensely helpful for me. In two short days, my research lost some branches and grew some deeper roots. Definitely worth 21+ hours of travel!

Aerial shot of Greenland. Can't tell where the clouds end and the snow and ice begin!


The retooling began early on the first day. My first, burning question for the group was about choosing representative data. The shocking first answer: why? To someone with a quantitative background, this question was mind-blowing. The sky is up, the ground is down, and data should be representative. But representative of what?

Here we return to the nature of the data. What data are you looking at? What kind of motivated behavior does it represent? Essentially, I am looking at online conversation. We know that counting conversational topics is fruitless; that’s the first truth of conversation analysis. And we know that counting conversational participation is usually misguided. So what was I trying to represent?

My goal is to track a silence that happens across site types, largely independent of stimulus. No matter what kind of news article about taxis in Washington DC, no matter the source, the driver perspective is almost completely absent, and if it is represented the responses are noticeably different or marked. I had thought that if I could find a way to count this underrepresentation I could launch a systematic, grounded critique of the notion of participatory media and pose the question of which values were being maintained from the ground up. What is social capital in online news discourse, who speaks, and which speakers are ratified?

But this is not a question of representative sampling alone. Although sampling could offer a sense of context to the data, the meat and potatoes of the analysis are in fact fodder for conversation analysis. A more useful and interesting research question emerged: how are these online conversations constructed so as to make a pro-taxi response dispreferred or marked? This question invokes pronoun usage, intertextuality, conversational reach, crowd-based sanctioning, conversational structure and pair parts, register, and more. It provides grounding for a rich, layered analysis. Fertile soil from dry dirt. Thank you, Netherlands.

Canal in Amsterdam (note: the workshop was in Nijmegen, not Amsterdam). Also note the dangers of parallel parking next to a canal. You'd be safer living in one of these houseboats!


“Not everything that can be counted counts”

“Not everything that counts can be counted, and not everything that can be counted counts” – sign in Einstein’s Princeton office

This quote is from one of my favorite survey reminder postcards of all time, along with an image from the Emilio Segrè Visual Archives. The postcard layout was an easy and pleasant decision made in association with a straightforward survey we have conducted for nearly a quarter century. …If only social media analysis could be so easy, pleasant or straightforward!

I am in the process of conducting an ethnography of DC taxi drivers. I was motivated to do this study because of the persistent disconnect between the experiences and reports of the taxi drivers and riders I hear from regularly and the snarky (I know this term does not seem technical, but it is absolutely data motivated!) riders who dominate participatory media sources online. My goal at this point of the project is to chase down the disconnect in media participation and see how it maps to policy deliberations and offline experiences. This week I decided to explore ways of quantifying the disconnect.

Inspired by this article in jedem (the eJournal of eDemocracy and Open Government), I decided to start my search using a framework based in Social Network Analysis (SNA), in order to use elements of connectedness, authority and relevance as a base. Fortunately, SNA frameworks are widely available to analysts on a budget in the form of web search engines! I went through the first 22 search results for a particular area of interest to my study: the mandatory GPS policy. Of these 22 sites, only 11 had active web 2.0 components. Across all of these sites, there were just two comments from drivers. Three of the sites that didn’t have any comments from drivers did have one post each that sympathized with or defended DC taxi drivers. The remaining three sites had no responses from taxi drivers and no sympathetic responses in defense of the drivers. Barring a couple of comments that were difficult to divine, the rest of the comments were negative comments about DC taxi drivers or the DC taxi industry. This matched my expectations, and, predictably, didn’t match any of my interviews or offline investigations.

The question at this point is one of denominator.

The easiest denominator to use, and, in fact, the least complicated, was the number of sites. Using this denominator, only one quarter of the sites had any representation from a DC taxi driver. This is significant, given that the discussions were about aspects of their livelihood, and the drivers will be the most closely affected by the regulatory changes. This is a good, solid statistic from which to investigate the influence of web 2.0 on local policy enactment. However, it doesn’t begin to show the lack of representation the way that a denominator such as number of posts, number of posters, or number of opinions would have. But each one of these alternative denominators has its own set of headaches. Does it matter if one poster expresses an opinion once and another expresses another, slightly different opinion more than once? If everyone agrees, what should the denominator be? What about responses that contain links that are now defunct or insider references that aren’t meaningful to me? Should I consider measures of social capital, endorsements, social connectedness, or the backgrounds of individual posters?
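To make the denominator problem concrete, here is a small sketch showing how the same set of comments yields different shares of driver representation depending on whether the unit is the site, the poster or the post. The records below are entirely made up (they are not the data from this study), and the function name `driver_share` is hypothetical.

```python
# Hypothetical comment records: (site, poster, written_by_driver).
# These are invented for illustration and are NOT the study's data.
comments = [
    ("site_a", "rider1", False),
    ("site_a", "rider1", False),
    ("site_a", "driver1", True),
    ("site_b", "rider2", False),
    ("site_b", "rider3", False),
    ("site_c", "rider4", False),
    ("site_d", "rider1", False),
    ("site_d", "rider5", False),
]

def driver_share(records, unit):
    """Share of units ("site", "poster" or "post") that include a driver voice."""
    if unit == "post":
        # Every comment counts once, however prolific its author.
        flags = [is_driver for _, _, is_driver in records]
        return sum(flags) / len(flags)
    index = {"site": 0, "poster": 1}[unit]
    all_units = {record[index] for record in records}
    driver_units = {record[index] for record in records if record[2]}
    return len(driver_units) / len(all_units)

for unit in ("site", "poster", "post"):
    print(unit, round(driver_share(comments, unit), 3))
```

In this toy set the driver share is one quarter by site, one sixth by poster and one eighth by post, so the choice of denominator alone changes how stark the underrepresentation looks. None of these counts captures opinions, defunct links or insider references, which would need hand coding.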

The simplest figure also doesn’t show one of the most striking aspects of this finding: the relative markedness of these posts. In the context of predominantly short, snarky and clever responses, one of the comments began with a formal “Dear DC city councilmembers and intelligent taxpayers,” and the other spread over three dense, winding posts in large paragraph form.

This brings up an important aspect of social media: that of social action. If every comment is a social action with social intentions, what are the intentions of the posters and how can these be identified? I don’t believe that the majority of posts left were intended as a voice in local politics, but the comments from the drivers clearly were. The majority of posts represent attempts to warrant social capital using humor, not attempts to have a voice in local politics. And they repeated phrases that are often repeated in web 2.0 discussions about the DC taxi situation, but rarely repeated elsewhere. This observation, of course, is pretty meaningless without being anchored to the data itself, both quantitatively and qualitatively. And it makes for some interesting ‘next steps’ in a project that is certainly not short of ‘next steps.’

The main point I want to make here is about the nature of variables in social media research. In a survey, you ask a question, determined in advance, and have a set of answers to work with in your analysis. In social media research, by contrast, you are free to choose your own variables for your analysis. Each choice brings with it a set of constraints and advantages, and some fit your data better than others. But this freedom makes the path to analysis more difficult, and it makes justifying the choices you make more important. For this reason, a quantitative analysis, which can sometimes include very arbitrary or less clear choices, is best supplemented with a qualitative analysis that delves into the answers themselves and why they fit the coding structure you have imposed.

In all of this, I have quite a bit of work ahead of me.