Reflections on Social Network Analysis & Social Media Research from #SMSociety13

A dispatch from the quantitative side of social media research!

Here are a few of my reflections from the Social Media & Society conference in Halifax and my Social Network Analysis class.

I should first mention that I was lucky in two ways.

  1. I finished the James Bond movie ‘Skyfall’ as my last Air Canada flight was landing. (Ok, I didn’t have to mention that)
  2. I finished my online course on Social Network Analysis hours before leaving for a conference that kicked off with an excellent talk about networks and diffusion. Then, on the second day of the conference, I was able to manipulate a network visualization with my hands on a 96-inch touchscreen at the Dalhousie University Social Media Lab. (Great lab, by the way, with some very interesting and freely available tools.)

 


This picture doesn’t do this screen justice. This is *data heaven*

Social networks, in this sense, are networks built to describe relationships among people, for example in social media environments. They contain nodes (dots), which can represent people, usernames, objects, etc., and edges (lines joining nodes), which represent some kind of relationship (friend, follower, contact, or one of a host of other relationships, which can also carry quantitative weights). The course was a particularly good introduction to Social Network Analysis because it included a clear and interesting book, a set of YouTube videos and a website, all of which were built to work together. The instructor (Dr Jen Golbeck, also the author of the book and materials) has a distinctive interest in SNA that gives the class an important added dimension. Her focus is on operational definitions and quantitative measures of trust, and because of this we were taught to carefully consider the role of edges and edge weights in our networks.
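To make the node and edge vocabulary concrete, here is a minimal Python sketch of a small weighted network. The usernames, the edges and the idea that the weights are trust ratings between 0 and 1 are all hypothetical. It stores the network as an adjacency map and compares degree (the number of edges touching a node) with strength (the sum of those edges' weights).

```python
from collections import defaultdict

# Hypothetical weighted, undirected network: (node, node, weight).
# The weights stand in for an assumed trust rating between 0 and 1.
edges = [
    ("ana", "ben", 0.9),
    ("ana", "cai", 0.4),
    ("ben", "cai", 0.7),
    ("cai", "dee", 0.2),
]

def build_graph(edge_list):
    """Return an adjacency map: node -> {neighbor: weight}."""
    graph = defaultdict(dict)
    for u, v, w in edge_list:
        graph[u][v] = w
        graph[v][u] = w  # undirected: store the edge in both directions
    return dict(graph)

graph = build_graph(edges)

# Degree counts connections; strength (weighted degree) sums edge weights.
degree = {node: len(nbrs) for node, nbrs in graph.items()}
strength = {node: round(sum(nbrs.values()), 2) for node, nbrs in graph.items()}

print(degree)    # {'ana': 2, 'ben': 2, 'cai': 3, 'dee': 1}
print(strength)  # {'ana': 1.3, 'ben': 1.6, 'cai': 1.3, 'dee': 0.2}
```

Here `ana` and `cai` end up with the same strength even though `cai` has more connections, which is exactly the kind of difference that disappears when edge weights are ignored.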

Sharad Goel’s plenary at #SMSociety13 took a very different look at networks. He questioned the common notion of viral diffusion online by looking at millions of cases of diffusion, and found that very few of them actually resemble any kind of viral model. Instead, most diffusion happens on a very small scale. He used Justin Bieber as an example. Bieber has the largest number of followers on Twitter, so when he posts something it has a very wide reach (“the Bieber effect”). However, people don’t share content as often as we imagine: only a very small proportion of his followers share it, and only a small proportion of their followers share it in turn. Overall, the path is wide and shallow, with fewer vertical layers than we had previously envisioned.
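One simple way to see "wide and shallow" is to count how many accounts sit at each generation of a sharing cascade. The sketch below assumes a hypothetical cascade stored as a child-to-parent map (who reshared from whom); the accounts and the shape are invented for illustration. Depth and breadth are deliberately simple stand-ins: as I understand it, Goel and colleagues use a more refined measure, structural virality (roughly, the average distance between all pairs of nodes in the cascade tree).

```python
from collections import Counter

# Hypothetical cascade: child -> parent (the account it reshared from).
# "root" is the original post; names and shape are invented for illustration.
parent = {
    "f1": "root", "f2": "root", "f3": "root", "f4": "root", "f5": "root",
    "g1": "f2", "g2": "f4",
}

def generation(node, parent_map):
    """Count reshare hops between a node and the original post."""
    hops = 0
    while node in parent_map:
        node = parent_map[node]
        hops += 1
    return hops

levels = Counter(generation(n, parent) for n in parent)
depth = max(levels)              # deepest generation reached
breadth = max(levels.values())   # largest single generation

print(dict(levels), depth, breadth)  # {1: 5, 2: 2} 2 5
```

A broadcast-style cascade like this one has a large first generation and then dies out quickly; a truly viral one would keep adding generations.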

Goel’s research is an example of Big Data in action. He said that Big Data methods are important when the phenomenon you want to study happens very infrequently (e.g. one in a million), as is the case for actual instances of viral diffusion.
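Some back-of-the-envelope arithmetic shows why rarity pushes this kind of research toward big data. This sketch assumes, purely for illustration, that each diffusion event independently has a one-in-a-million chance of going viral; the independence assumption is mine, not Goel's.

```python
import math

def expected_cases(n_events, one_in):
    """Expected number of rare cases among n_events independent events."""
    return n_events / one_in

def prob_at_least_one(n_events, one_in):
    """Chance of observing at least one case: 1 - (1 - p)^n."""
    return 1 - (1 - 1 / one_in) ** n_events

def poisson_approx(n_events, one_in):
    """Poisson approximation of the same chance: 1 - exp(-n * p)."""
    return 1 - math.exp(-n_events / one_in)

def events_needed(target_cases, one_in):
    """Events needed for target_cases to appear on average."""
    return target_cases * one_in

ONE_IN = 1_000_000  # "one in a million"
for n in (10_000, 1_000_000, 100_000_000):
    print(f"{n:>11,} events: expect {expected_cases(n, ONE_IN):g} cases, "
          f"P(at least one) = {prob_at_least_one(n, ONE_IN):.3f}")
print(events_needed(100, ONE_IN))  # 100 expected cases need 100,000,000 events
```

Under these assumptions, even a million observed events leaves roughly a 63% chance of seeing a single case, and a hundred cases (enough to say anything about them) requires on the order of a hundred million events.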

His conclusions were far-reaching, and this line of research is very informative and useful for anyone trying to communicate on a large scale.

Sidenote: the term ‘ego network’ came up quite a few times during the conference, but not everyone knew what an ego network is. An ego network begins with a single node (the ego) and extends outward by degrees. A one-degree network looks a bit like an asterisk: it simply shows all of the nodes that are directly connected to the original node. A 1.5-degree network includes those first-degree connections as well as the connections between them. A two-degree network adds the direct connections of each node in the one-degree network (friends of friends). And so on.
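The degree definitions above can be sketched in a few lines of Python. The example graph and node names are hypothetical, and the sketch makes one assumption worth flagging: its two-degree network keeps every tie among everyone within two steps of the ego, whereas some tools keep only the ties reached through the first-degree nodes.

```python
# Hypothetical undirected network: each node maps to the set of its neighbors.
graph = {
    "ego": {"a", "b", "c"},
    "a": {"ego", "b", "x"},
    "b": {"ego", "a"},
    "c": {"ego", "y"},
    "x": {"a", "y"},
    "y": {"c", "x"},
}

def ego_network(graph, ego, degree):
    """Return the edges (as frozensets) of ego's 1, 1.5 or 2-degree network."""
    first = graph[ego]
    if degree == 1:
        # The "asterisk": only the ties between the ego and its neighbors.
        return {frozenset((ego, n)) for n in first}
    if degree == 1.5:
        members = {ego} | first
    elif degree == 2:
        # Add the neighbors of every first-degree node (friends of friends).
        members = {ego} | first | set().union(*(graph[n] for n in first))
    else:
        raise ValueError("degree must be 1, 1.5 or 2")
    # Keep every tie whose two endpoints are both members.
    return {frozenset((u, v)) for u in members for v in graph[u] if v in members}

for d in (1, 1.5, 2):
    print(d, len(ego_network(graph, "ego", d)))  # 1 3 / 1.5 4 / 2 7
```

Computing the same kind of network for many egos gives comparable units, which is what makes comparisons across ego networks possible.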

One common research strategy is to compare across ego networks.

My next post will move on from SNA to the more qualitative aspects of the conference.


Source: https://twitter.com/JeffreyKeefer/status/378921564281921537/photo/1
This was the backdrop for a qualitative panel. It says “Every time you say ‘data driven decision’ a fairy dies.”


Upcoming DC Event: Online Research Offline Lunch

ETA: Registration for this event is now CLOSED. If you have already signed up, you will receive a confirmation e-mail shortly. Any sign-ups after this date will be stored as a contact list for any future events. Thank you for your interest! We’re excited to gather with such a diverse and interesting group.

—–

Are you in or near the DC area? Come join us!

Although DC is a great meeting place for specific areas of online research, there are few opportunities for interdisciplinary gatherings of professionals and academics. This lunch will provide an informal opportunity for a diverse set of online researchers to listen and talk respectfully about our interests and our work and to see our endeavors from new, valuable perspectives.

Date & Time: August 6, 2013, 12:30 p.m.

Location: Near Gallery Place or Metro Center. Once we have a rough headcount, we’ll choose an appropriate location. (Feel free to suggest a place!)

Please RSVP using this form:

Spam, Personal histories and Language competencies

Over the recent holiday, I spent some time sorting through many boxes of family memorabilia. Some of you have probably done this with your families. It is fascinating, sentimental and mind-boggling. Highlights include both the things that strike a chord and things that can be thrown away. It’s a balance of efficiency and sap.

 

I’m always amazed by the way family memorabilia tells both private, personal histories and larger public ones. The boxes I dealt with last week were my mom’s, and her passion was politics. Even the Christmas cards she saved give pieces of political histories. Old thank you cards provide unknown nuggets of political strategy. She had even saved stirrers and plastic cups from an inauguration!

 


Campaign button found in the family files

 

 

My mom continued to work in politics throughout her life, but the work that she did more recently is understandably fresher and more tangible for me. I remember looking through printed Christmas cards from politicians and wondering why she held on to them. In her later years I worried about her tendency to hold on to mail-merged political letters. I wondered if her tendency to personalize impersonal documents made her vulnerable to fraud. To me, her belief in these documents made no sense.

 

Flash forward one year to me sorting through boxes of handwritten letters from politicians that mirror the spam she held on to. For many years she received handwritten letters from elected politicians in Washington. At some point, the handwritten letters evolved into typed letters that were hand-corrected and included handwritten sections. These evolved into typed letters on which the only handwriting was the signature. Eventually, even the signatures became printed. But the intention and function of these letters remained the same, even as their typography evolved. She believed in these letters because she had been receiving them for many decades. She believed they were personal because she had seen more of them that were personal than not. The phrases that I believe to be formulaic and spammy were once handwritten, intentional, personal and probably even heartfelt.

 

 

There are a few directions I could go from here:

 

– I better understand why older people complain about the impersonalization of modern society and wax poetic about the old letter writing tradition. I could include a few anecdotes about older family members.

 

– I’m amazed that people would take the time to write long letters in handwriting that may never have been deciphered.

 

– I could wax poetic about some of the cool things I found in the storage facility.

 

 

But I won’t. Not in this blog. Instead, I’ll talk about competencies.

 

Spam is a manifestation of language competencies, although we often dismiss it as a total lack of language competence. In my linguistics studies, we were quickly taught the mantra “difference, not deficiency.” In fact, it takes quite a bit of skill to develop spam letters. In survey research, the survey invitation letters that people so often dismiss have been heavily researched and optimized to yield a maximum response rate. In his book The Sociolinguistics of Globalization, Jan Blommaert details the many competencies necessary to create the Nigerian bank scam letters that were so heavily circulated a few years ago. And now I’ve learned that the political letters I’m so quick to dismiss as thoughtless mail merges are actually part of a deep tradition of political action. Will that be enough for me to hold on to them? No. But I am saving the handwritten stuff. Boxes and boxes of it!

 

 

One day last week, as I drove to the storage facility, I heard an interview with Michael Pollan about food literacy. Pollan’s point was that the food deserts in some urban areas are not just a function of access. (Food deserts are areas where fresh food is difficult to obtain and grocery stores are few and far between, if they’re available at all.) Pollan believes that even if grocery stores were available, the people in these neighborhoods lack the basic cooking skills to prepare the food. As part of his argument, he cited a few basic cooking skills that are not basic to me (partly because I’m a vegetarian, and partly because of the cooking traditions I learned from).

 

As a linguist, I find it very interesting to hear the baggage that people attach to language carried over metaphorically to food (“food illiteracy”). I wonder what value the “difference, not deficiency” mantra holds here. I’m not ready to believe that people in food deserts are indeed kitchen illiterate. But I wouldn’t hesitate to agree that their food cultures probably differ significantly from Pollan’s, from the basic staples to the cooking methods. Pollan could probably make a lot more headway with his cause if, instead of assuming that the people he is trying to help lack any basic cooking skills, he advocated for a culture change that included access, attainability, and the potential to learn different practical cooking skills. It’s a subtle shift, but an important one.

 

As a proud uncook, I’m a huge fan of any kind of food preparation that is two steps or less, cheap, easy and fresh. Fast food for me involves putting a sweet potato in the microwave and pressing “potato,” grabbing for an apple or carrots and peanut butter, or tossing chickpeas into a dressing. Slow food involves the basic sautéing, roasting, etc. that Pollan advocates. I imagine that the skills he advocates are more practical and enjoyable for him than they are for people like me, whose mealtimes are usually limited and chaotic. What he calls basic is impractical for many of us. And the differences in time and money involved in uncooking and “basics” add up quickly.

 

 

 

So I’ve taken this post in quite a few directions, but it all comes together under one important point. Different language skills are not a lack of language skills altogether. Similarly, different survival skills are not a total lack of survival skills. We all carry unique skillsets that reflect our personal histories with those skills as well as the larger public histories that our personal histories help to compose. We, as people, are part of a larger public. The political spam I see doesn’t meet my expectations of valuable, personal communication, but it is in fact part of a rich political history. The people who Michael Pollan encounters have ways of feeding themselves that differ from Pollan’s expectations, but they are not without important survival skills. Cultural differences are not an indication of an underlying lack of culture.


 

Total Survey Error: nanny to some, wise elder to some, strange parental friend to others

Total Survey Error and I are long-time acquaintances, just getting to know each other better. Looking at TSE is, for me, like looking at my work in survey research through a distorted mirror into an alternate universe. This week, I’ve spent some time closely reading Groves’ Past, Present and Future of Total Survey Error, which provided some historical context for the framework, as well as an experienced account of its strengths and weaknesses.

Errors are an important area of study across many fields. Historically, models of error assumed that people didn’t really make errors often. Those attitudes are alive and well in many fields and workplaces today: instead of being carefully considered, errors are often dismissed as indicators of incompetence. However, some workplaces are changing the way they approach errors. I did some collaborative research on medical errors in 2012 and was introduced to the term HRO, or High-Reliability Organization. This is an error-focused model of management that assumes that errors will be made and that not all errors can be anticipated. Therefore, every error should be embraced as a learning opportunity to build a better organizational framework.

From time to time, various members of our working group have been driven to create checklists for particular aspects of our work. In my experience, checklists are very helpful for work that we do infrequently and virtually useless for work that we do daily. Writing a checklist for your daily work is a bit like writing instructions on how you brush your teeth and expecting to update them every time your routine changes. Undoubtedly, you’ll reread the instructions and wonder when you switched from a vertical to a circular motion for a given tooth. And yet there are so many important elements to our work, and so many areas where people could make less than ideal decisions (small or large). From this need came Deming, with the first survey quality checklist. After Deming, a few other models arose. Eventually, TSE became the cumulative, foundational working framework for the field of survey research.

In my last blog, I spoke about the strangeness of coming across a foundational framework after working in the field without one. The framework is a conceptually important one, separating out sources of errors in ways that make shortcomings and strengths apparent and clarifying what is more or less known about a project.

But in practice, this model has not become the applied working model that its founders and biggest proponents expected it to be. I’ll focus on two reasons for this: one that Groves discusses in some detail in this paper, and one that he barely touches on (but that likely drove him out of the field).

1. The framework has mathematical properties, and this has led to its more intensive use on aspects of the survey process that are traditionally quantitative. TSE research in areas of sampling, coverage, response and aspects of analysis is quite common, but TSE research in other areas is much less common. In fact, many of the less quantifiable parts of the survey process are almost dismissed in favor of the more quantifiable parts. A survey with a particularly low TSE value could have huge underlying problems or be of minimal use once complete.
2. The framework doesn’t explicitly consider the human factors that govern research behind the scenes. Groves mentioned that the end users of the data are not deeply considered in the model, but neither are the other financial and personal (and personafinancial) constraints that govern much of the decision making. Ideally, the end goal is high quality research that yields a useful and relevant response at as low a cost as possible. In practice, however, the goal is both to keep costs low and to satisfy a system of interrelated (and often conflicting) personal or professional (personaprofessional?) interests. If the most influential of these interests are not particularly interested in (or appreciative of) the model, practitioners are highly unlikely to take the time to apply it.
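The mathematical side of TSE is usually expressed through mean squared error, where the MSE of an estimate equals its squared bias plus its variance. The sketch below uses hypothetical replicate estimates of a quantity whose true value is assumed to be known, and checks that decomposition numerically. It also illustrates the first point above: the number only reflects the errors that were measured, so a small MSE says nothing about problems the model never captured.

```python
from statistics import fmean, pvariance

def mse_decomposition(estimates, true_value):
    """Return (bias, variance, mse) for replicate estimates of a known true value."""
    mean_est = fmean(estimates)
    bias = mean_est - true_value
    variance = pvariance(estimates, mu=mean_est)
    mse = fmean((e - true_value) ** 2 for e in estimates)
    return bias, variance, mse

# Hypothetical replicate estimates (e.g. percent supporting a policy)
# from repeated survey runs, with an assumed known true value of 50.
replicates = [52.0, 54.0, 50.0, 56.0]
bias, variance, mse = mse_decomposition(replicates, 50.0)
print(bias, variance, mse)          # 3.0 5.0 14.0
print(mse == bias ** 2 + variance)  # True: MSE = bias^2 + variance
```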

Survey research requires very close attention to detail in order to minimize errors. It requires an intimate working knowledge of math and of computer programming. It also benefits from a knowledge of human behavior and the research environment. If I were to recommend any changes to the TSE model, I would recommend a bit more task-based detail, to incorporate more of the highly valued working knowledge that is often inherent and unspoken in the training of new researchers. I would also recommend more of an HRO orientation toward error, anticipating and embracing unexpected errors as a source of additions to the model. And I would recommend some deeper incorporation of the personal and financial constraints and the roles they play (clearly an easier change to introduce than to flesh out in any great detail!). I would recommend a shift of focus, away from the quantitative modeling aspects and toward the overall applicability and importance of a detailed, applied working model.

I’ve suggested before that survey research does not have a strong enough public face for the general public to understand or deeply value our work. A model that is better embraced by the field could form the basis for a public face, but the model would have to appeal to practitioners on a practical level. The question is: how do you get members of a well-established field, who have long been working within it and gaining expertise, to accept a framework that grew into a foundational piece independent of their work?

Repeating language: what do we repeat, and what does it signal?

Yesterday I attended a talk by Jon Kleinberg entitled “Status, Power & Incentives in Social Media” in honor of the UMD Human-Computer Interaction Lab’s 30th anniversary.

 

This talk was dense and full of methods that are unfamiliar to me. He first discussed logical representations of human relationships, including orientations of sentiment and status, and then he ventured into discursive evidence of these relationships. Finally, he introduced formulas for influence in social media and talked about ways to manipulate the formulas by incentivizing desired behavior and disincentivizing less desired behavior.

 

In linguistics, we talk a lot about linguistic accommodation. In any communicative event, it is normal for participants’ speech patterns to converge in some ways, for example through repetition of words or grammatical structures. Kleinberg presented research about the social meaning of linguistic accommodation, showing that participants with less power accommodate to participants with more power more than the reverse. This idea of quantifying social influence is a very powerful notion in online research, where social influence is a more practical and useful research goal than general representativeness.
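One way to quantify accommodation, loosely in the spirit of the function-word measures used in this kind of research, is to ask how much more likely speaker B is to use a marker (here, articles) right after speaker A has used it, compared with B's baseline rate. This is my simplified sketch, not Kleinberg's exact formulation; the exchanges and the marker class are hypothetical.

```python
# Hypothetical (prompt by A, reply by B) exchanges.
exchanges = [
    ("did you read the report", "yes the report was long"),
    ("send me the file", "sent the file"),
    ("any news", "nothing yet"),
    ("thanks", "no problem"),
]

# One assumed marker class: English articles.
ARTICLES = {"a", "an", "the"}

def uses(text, markers):
    """True if any word in text belongs to the marker class."""
    return any(word in markers for word in text.split())

def coordination(pairs, markers):
    """P(reply uses marker | prompt uses marker) - P(reply uses marker)."""
    baseline = sum(uses(reply, markers) for _, reply in pairs) / len(pairs)
    triggered = [uses(reply, markers) for prompt, reply in pairs
                 if uses(prompt, markers)]
    if not triggered:
        return None  # undefined if the prompts never use the marker
    return sum(triggered) / len(triggered) - baseline

print(coordination(exchanges, ARTICLES))  # 0.5
```

Running the same function with the roles reversed (A replying to B) gives the other direction; the power finding is about the asymmetry between those two numbers.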

 

I wonder what strategies we use, consciously and unconsciously, when we accommodate other speakers. I wonder whether different forms of repetition have different underlying social meanings.

 

At the end of the talk, there was some discussion about both the constitution of iconic speech (unmarked words assembled in marked ways) and the meaning of norm flouting.

 

These are very promising avenues for online text research, and it is exciting to see them play out.

The Bones of Solid Research?

What are the elements that make research “research” and not just “observation?” Where are the bones of the beast, and do all strategies share the same skeleton?

Last Thursday, in my Ethnography of Communication class, we spent the first half hour of class time taking field notes in the library coffee shop. Two parts of the experience struck me the hardest.

1.) I was exhausted. Class came at the end of a long, full work day, toward the end of a week that was full of back to school nights, work, homework and board meetings. I began my observation by ordering a (badly needed) coffee. My goal as I ordered was to see how few words I had to utter in order to complete the transaction. (In my defense, I am usually relatively talkative and friendly…) The experience of observing and speaking as little as possible reminded me of one of the coolest things I’d come across in my degree program: Charlotte Linde, SocioRocketScientist at NASA.

2.) Charlotte Linde, SocioRocketScientist at NASA. Dr Linde had come to speak with the GU Linguistics department early in my tenure as a grad student. She mentioned that her thesis had been about the geography of communication; specifically, how did the layout of an (her?) apartment building help shape communication within it?

This idea struck me and stayed with me, but it didn’t really make sense until I began to study Ethnography of Communication. In the coffee shop, I structured my fieldnotes like a map and investigated the space in terms of zones of activity. Then I investigated the expectations and conventions of communication in each zone. As a follow-up to this activity, I’ll either return to the same shop or head to another coffee shop to do some contrastive mapping.

The process of Ethnography embodies the dynamic between quantitative and qualitative methods for me. When I read ethnographic research, I really find myself obsessing over ‘what makes this research?’ and ‘how is each statement justified?’ Survey methodology, which I am still doing every day at work, is so deeply structured that less structured research is, by contrast, a bit bewildering or shocking. Reading about qualitative methodology makes it seem so much more dependable and structured than reading ethnographic research papers does.

Much of the process of learning ethnography is learning about yourself: your priorities, your organization, … learning why you notice what you do and evaluate it the way you do… Conversely, much of the process of reading ethnographic research seems to involve evaluation of, or skepticism about, the researcher, the researcher’s perspective and the researcher’s interpretation. As a reader, I find the places where the researcher’s perspective differs from mine clear and easy to see, while my own perspective remains invisible to me.

All of this leads me back to the big questions I’m grappling with. Is this structured observational method the basis for all research? And how much structure does observation need to have in order to qualify as research?

I’d be interested to hear what you think of these issues!

Unlocking patterns in language

In the study of linguistics, we quickly learn that all language is patterned. Although the actual words we produce vary widely, the process of production does not. The process of constructing baby talk was found to be consistent across kids from 15 different languages. When any two people who do not speak overlapping languages come together and try to speak, the process is the same. When we look at any large body of data, we quickly learn that just about any linguistic phenomenon is subject to statistical likelihood. Grammatical patterns govern the basic structure of what we see in the corpus. Variations in language use may tweak these patterns, but each variation is a patterned tweak with its own set of statistical likelihoods. Variations that people are quick to call bastardizations are actually patterned departures from what those people consider to be “standard” English. Understanding “differences, not deficits” is a crucially important part of understanding and processing language, because any variation, even texting shorthand, “broken English,” or slang, can be better understood and used once its underlying structure is recognized.

The patterns in language extend beyond grammar to word usage. The most frequent words in a corpus are function words such as “a” and “the,” and the most frequent collocations are combinations like “and the” or “and then it.” These patterns govern the findings of a lot of investigations into textual data. A certain phrase may show up as a frequent member of a dataset simply because it is a common or lexicalized expression, and another combination may not appear because it is rarer. This can be particularly problematic, because what is rare is often more noticeable or important.
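A quick frequency count shows how function words dominate even a tiny sample. The sentence below is invented for illustration, and the tokenizer (lowercase runs of letters and apostrophes) is a deliberately crude assumption.

```python
import re
from collections import Counter

# Invented sample text; a real corpus would be far larger.
text = ("We went to the store and the clerk said the store was closed, "
        "and then it rained and the car broke down.")

def tokenize(s):
    """Crude tokenizer: lowercase runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", s.lower())

tokens = tokenize(text)
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))  # adjacent word pairs

print(unigrams.most_common(3))  # [('the', 4), ('and', 3), ('store', 2)]
print(bigrams[("and", "the")])  # 2
```

Every content word except "store" occurs only once here, so by count alone it is indistinguishable from noise. That is why rare but important items need a closer, qualitative look.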

Here are some good starter questions to ask to better understand your textual data:

1) Where did this data come from? What was its original purpose and context?

2) What did the speakers intend to accomplish by producing this text?

3) What type of data or text, or genre, does this represent?

4) How was this data collected? Where is it from?

5) Who are the speakers? What is their relationship to each other?

6) Is there any cohesion to the text?

7) What language is the text in? What is the linguistic background of the speakers?

8) Who is the intended audience?

9) What kind of repetition do you see in the text? What about repetition within the context of a conversation? What about repetition of outside elements?

10) What stands out as relatively unusual or rare within the body of text?

11) What is relatively common within the dataset?

12) What register is the text written in? Casual? Academic? Formal? Informal?

13) Pronoun use. Always look at pronoun use. It’s almost always enlightening.
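Questions 10, 11 and 13 lend themselves to a quick first pass in code. This sketch counts pronouns and lists the words that occur only once (hapaxes); the pronoun list is a partial, assumed English list and the sample sentence is invented.

```python
import re
from collections import Counter

# Partial, assumed list of English personal pronouns and possessives.
PRONOUNS = {"i", "me", "my", "we", "us", "our", "you", "your",
            "he", "him", "his", "she", "her", "they", "them", "their",
            "it", "its"}

def profile(text):
    """Return (pronoun counts, sorted words occurring exactly once)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    pronouns = {w: c for w, c in counts.items() if w in PRONOUNS}
    hapaxes = sorted(w for w, c in counts.items() if c == 1)
    return pronouns, hapaxes

sample = "We told them we would call. They never called us, so I called them."
pronouns, hapaxes = profile(sample)
print(pronouns)  # {'we': 2, 'them': 2, 'they': 1, 'us': 1, 'i': 1}
print(hapaxes)   # ['call', 'i', 'never', 'so', 'they', 'told', 'us', 'would']
```

Even in one sentence the we/them framing is visible, which is the kind of thing question 13 is after.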

These types of questions will take you much further into your dataset than the knee-jerk question “What is this text about?”

Now, go forth and research! …And be sure to report back!