Reflections on Social Network Analysis & Social Media Research from #SMSociety13

A dispatch from a quantitative side of social media research!

Here are a few of my reflections from the Social Media & Society conference in Halifax and my Social Network Analysis class.

I should first mention that I was lucky in two ways.

  1. I finished the James Bond movie ‘Skyfall’ as my last Air Canada flight was landing. (Ok, I didn’t have to mention that)
  2. I finished my online course on Social Network Analysis hours before leaving for a conference that kicked off with an excellent talk about networks and diffusion. And then on the second day of the conference I was able to manipulate a network visualization with my hands using a 96-inch touchscreen at the Dalhousie University Social Media Lab. (Great lab, by the way, with some very interesting and freely available tools.)

 

This picture doesn't do this screen justice. This is *data heaven*

Social networks are networks built to describe human action in social media environments. They contain nodes (dots), which could represent people, usernames, objects, etc., and edges, lines joining nodes that represent some kind of relationship (friend, follower, contact, or a host of other quantitative measures). The course was a particularly great introduction to Social Network Analysis, because it included a book that was clear and interesting, a set of YouTube videos and a website, all of which were built to work together. The instructor (Dr Jen Golbeck, also the author of the book and materials) has a distinctive interest in SNA which gives the class an important added dimension. Her focus is on operational definitions and quantitative measures of trust, and because of this we were taught to carefully consider the role of the edges and edge weights in our networks.

Sharad Goel’s plenary at #SMSociety13 was a very different look at networks. He questioned the common notion of viral diffusion online by looking at millions of cases of diffusion. He discovered that very few diffusions actually resemble any kind of viral model. Instead, most diffusion happens on a very small scale. He used Justin Bieber as an example of diffusion. Bieber has the largest number of followers on Twitter, so when he posts something it has a very wide reach (“the Bieber effect”). However, people don’t share content as often as we imagine. In fact, only a very small proportion of his followers share it, and only a small proportion of their followers share it. Overall, the path is wide and shallow, with fewer vertical layers than we had previously envisioned.
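To make “wide and shallow” a bit more concrete, here is a minimal Python sketch that measures the shape of a single share cascade: how many levels deep it goes, and how many sharers sit at each level. This is only my own illustration, not Goel’s method. The account names are made up, and the code assumes each sharer got the post from exactly one account, so the cascade forms a tree.

```python
from collections import Counter


def cascade_shape(shares, root):
    """Return (depth, widths) for one cascade.

    `shares` maps each sharer to the account they shared the post from.
    Assumption: every chain of shares leads back to `root` (a tree, no loops).
    """
    def depth_of(account):
        # Walk back up the chain of shares until we reach the original poster.
        depth = 0
        while account != root:
            account = shares[account]
            depth += 1
        return depth

    # Count how many sharers sit at each level below the original post.
    widths = Counter(depth_of(account) for account in shares)
    return max(widths, default=0), dict(sorted(widths.items()))


# Hypothetical cascade: four followers share the star's post,
# and only one of their followers passes it on.
example_shares = {
    "fan1": "star",
    "fan2": "star",
    "fan3": "star",
    "fan4": "star",
    "friend1": "fan1",
}

depth, widths = cascade_shape(example_shares, "star")
print(depth, widths)  # 2 {1: 4, 2: 1}
```

A viral cascade would show many levels, with the counts growing from one level to the next. The wide and shallow pattern shows one big first level and very little below it.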

Goel’s research is an example of Big Data in action. He said that Big Data methods are important when the phenomenon you want to study happens very infrequently (e.g. one in a million), as is the case for actual instances of viral diffusion.

His conclusions were big, and this line of research is very informative and useful for anyone trying to communicate on a large scale.

Sidenote: the term ‘ego network’ came up quite a few times during the conference, but not everyone knew what an ego network is. An ego network begins with a single node (the ego) and is measured by degrees. A one degree network looks a bit like an asterisk: it simply shows all of the nodes that are directly connected to the original node. A 1.5 degree network would include the first degree connections as well as the connections between them. A two degree network adds all of the first degree connections of the nodes that were in the one degree network. And so on.

One common research strategy is to compare across ego networks.
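For anyone who likes to see a definition as code, here is a minimal Python sketch of the ego network degrees described above. It is my own illustration under a few assumptions: the network is undirected, edges are given as plain pairs of names, and the function name and the tiny example network are invented for this post. A library like NetworkX offers similar tools, but this version needs only the standard library.

```python
import math
from collections import deque


def ego_network(edges, ego, degree=1):
    """Return (nodes, ties) for the ego network of `ego`.

    `degree` may be 1, 1.5, 2, 2.5, and so on. A ".5" keeps the same
    nodes as the whole number below it and adds the ties among the
    outermost nodes.
    """
    radius = math.floor(degree)
    include_outer_ties = degree > radius

    # Build an undirected adjacency map from the edge list.
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    # Breadth-first search outward from the ego, stopping at `radius` steps.
    distance = {ego: 0}
    queue = deque([ego])
    while queue:
        node = queue.popleft()
        if distance[node] == radius:
            continue
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)

    # Keep a tie when at least one end lies inside the outermost ring,
    # or when this is a ".5" network and both ends were reached.
    ties = set()
    for a, b in edges:
        if a in distance and b in distance:
            if min(distance[a], distance[b]) < radius or include_outer_ties:
                ties.add(frozenset((a, b)))
    return set(distance), ties


# Hypothetical network: ego knows a and b, a and b know each other,
# b knows c, and c knows d.
example_edges = [("ego", "a"), ("ego", "b"), ("a", "b"), ("b", "c"), ("c", "d")]

for k in (1, 1.5, 2):
    nodes, ties = ego_network(example_edges, "ego", k)
    print(k, sorted(nodes), len(ties))
```

In this example the one degree network has three nodes and two ties (the asterisk), the 1.5 degree network adds the tie between a and b, and the two degree network brings in c but not d.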

My next post will move on from SNA to more qualitative aspects of the conference.

Source: https://twitter.com/JeffreyKeefer/status/378921564281921537/photo/1
This was the backdrop for a qualitative panel. It says “Every time you say ‘data driven decision’ a fairy dies.”

MOOCs, Libraries, Online learning and Thirsting for knowledge

Let me begin by telling you a story.

The story began when I was in high school searching for the right college. My mom and I took a road trip the summer after my junior year of high school. We took our time and covered quite a bit of ground. I discovered Hot97 in New York and Pepto Bismol in North Carolina. I fell in love with upstate NY. After our return, I began the application and interview process. The most memorable moment came during my interview with a representative from Cornell. She asked if I had any burning questions, and I decided to go ahead and ask her a question that had really been nagging at me: What is the difference between a class at Cornell and a class at a community college? She was shocked and deeply offended. She told me that anyone could get a great education anywhere they could find a library, and obviously I wasn’t right for Cornell.

This exchange has haunted me ever since. I do love to read, sure, but a library alone could never create the magic that a classroom can create. And the most magical classes happen when the students are engaged, interested, attentive, involved, participating, excited and following through with the homework. Part of this magic comes from the teacher. A great teacher can cultivate this kind of environment with ease, but most really struggle when it doesn’t happen organically.

I’m not sure everyone would agree that classrooms can be magical. I may have been spoiled with great classes. I’ve just finished a master’s program where I loved the classes, loved the reading and loved the assignments, but I’m not sure that every student would approach school with as much relish. I love learning.

Tomorrow I begin an educational experiment. I will start a course in Social Network Analysis from Statistics.com. This is a paid course, and I’ve chosen to be held responsible for my work (you can choose whether or not to submit homework for grading). Next month, the experiment will deepen when I begin my first MOOC. The MOOC is a data analysis class that teaches R. I’m very eager to learn R and to revisit some statistical methods that I haven’t been able to use much. The experiment will not be pure, because three of my coworkers have decided to attend the class as well. We’ll be fortunate enough to experience part of the course in-person.

I’m not sure how I feel about distance education before beginning this experiment. Learning is something that I really love to do in-person. But so many things that happen online can be evaluated the same way. I recently read articles and commentary about a controversial paper on Twitter research (SSRN-id2235423). The research is fodder for some great discussion, but many commenters on the news articles simply chose to trash Twitter. They bemoaned the 140 character limit so strongly that one would think that Twitter is a land of Paris Hiltons and cats. I’d like the critics to know that yes, you can find Paris Hilton and cats on Twitter or just about anywhere else online. But you can also find something deeper, something that interests you. I recently introduced my nephew to Twitter. He’s a news junkie of sorts, and he was fascinated to see how much emerging news and quality commentary was available. The weekly #wjchats alone are reason to follow Twitter (#wjchat is a weekly methods chat between social media journalists). The reach of people on Twitter is unparalleled, and the ability to follow specific areas of interest in deeply engaged ways is also unparalleled. When used correctly, Twitter is a powerful tool.

Online learning has the potential to be a powerful tool as well. But it will require engagement from the people involved. We will need to suspend our natural hesitancy and develop the necessary competencies. I really hope that my classmates will be willing to embrace the experience!

Deeper into the family files; recipes and gendered histories

One picture of my mom really captures me. It is of my parents at a political event, greeting George Will. My mother was heavily involved in politics, and she had likely been one of the event’s organizers, partially responsible for bringing George Will to town. She has a huge smile, and her eyes are sparkling. She is focused on George Will. But George Will is not even looking in her direction. He’s greeting my father, who likely had nothing to do with the event.

The deeper I look into the family files, the deeper my understanding of my mother’s particular brand of feminism grows.

As I grew into a woman, she was proud of me for many reasons, but she never came to grips with my love of cooking. She hated cooking with a passion, and she treated kitchens like they held hostages, like they had latch stations at their gates that clicked into place around unsuspecting ankles as they crossed the threshold. She was a great cook. She created souffles with ease. But she hated the kitchen and hated to see me cook. In fact, I learned to cook against her will, once I’d left home for college. In our lives together we shared many activities, but we never cooked together. She never shared a recipe with me. Granted, I didn’t push. I am a vegetarian, bent on healthy cooking and vegan substitutions, and she was a part of a sour cream generation.

Her kitchen history came alive in a wholly different way as I sorted through her papers. I found boxes, books, clippings and handwritten recipes. I’d seen all of these often in my youth, but I’d never looked through them. As I looked through them now, a new kind of culture began to take shape. These recipes weren’t the anonymous instructions that I find on the internet when I search. They had histories. They belonged to the women that created them. They gave credit to any creative twist on the old standards. They seemed as unique as footprints. And they were clearly passed around quite a bit. I imagined my mom tasting something delicious at a friend’s house and asking for the recipe, and I imagined the pride that the cook had felt in that moment. I can imagine moments like these, but they seem incongruous, deeply out of character for all involved.

Cooking is not simply about our need to fuel our bodies. And it’s a different process for my husband to cook (he loves to cook and has a professional cooking background) than it is for me. My time in the kitchen is part of a deeply gendered history. It is heavy with expectations, ideals and predefined roles. Maybe this is why I avoid recipes? Following a recipe seems to be about creating an ideal and trying to embody it. It’s about believing in your potential to make some fantasy a reality for your family. It’s about embodying a role that has been laid before you. It’s about achieving an unrealistic standard. A successful dish isn’t just food for the belly or a pleasant taste. It’s a sense of accomplishment, a sense of pride, a sense of achievement. It’s about the success of the cook and the nurturing of those around the cook. It connects a woman to a greater tradition of women in the kitchen.

In our histories, people are pegged into traditional societal roles that they may or may not fit into easily. On one hand, they are held back from other roles and relegated to these. But on the other hand, they embody these roles in a way that rises above the call of duty.

These traditions embody uniqueness, a common respect and understanding, a kind of sisterhood, and a common striving. My mom hated the kitchen. But she was a part of a sisterhood that I’m discovering as more of a historian than a participant. Would I trade my professional or academic success for that sisterhood? Absolutely not. But as a woman in the kitchen, I want to understand what these traditions meant to the women who came before me. I want to understand how they redefined them and rose above them. I want to understand how they fit themselves and the women around them into these roles.

I will pass these recipes on to my daughters, not as instructions for cooking or instructions for life, but as a way of carrying on a sisterhood forged by the women who came before us.

I recently accomplished my first successful omelet!

For further (& really interesting) reading: http://www.presenttensejournal.org/vol1/cooking-codes-cookbook-discourses-as-womens-rhetorical-practices/

Fitness for Purpose, Representativeness and the perils of online reviews

Have you ever planned a trip online? In January, when I traveled to Amsterdam, I did all of the legwork online and ended up in a surprising place.

Amsterdam City Center is extremely easy to navigate. From the train station (a quick ride from the airport and a quick ride around The Netherlands), the canals extend outward like spokes. Each canal is flanked by streets. Then the city has a number of concentric rings emanating from the train station. Not only is the underlying map easy to navigate, there is a traveler station at the center and maps available periodically. English speaking tourists will see that not only do many people speak English, but Dutch has enough overlap with English to be comprehensible after even a short exposure.

But the city center experience was not as smooth for me. I studied map after map in the city center without finding my hotel. I asked for directions, and no one had heard of the hotel or the street it was on. The traveler center seemed flummoxed as well. Eventually I found someone who could help and found myself on a long commuter tram ride well outside the city center and tourist areas. The hotel had received great reviews and recommendations from many travelers. But clearly, the travelers who boasted about it were not quite the typical travelers, who likely would have ended up in one of the many hotels I saw from the tram window.

Have you ever discovered a restaurant online? I recently went to a nice, local restaurant that I’d been reading about for years. I ordered the truffle fries (fries with truffle salt and some kind of fondue sauce), because people had really raved about them, only to discover once they arrived that they were fundamentally french fries (totally not my bag- I hate fried food).

These review sites are not representative of anything. And yet we/I repeatedly use them as if they were reliable sources of information. One could easily argue that they may not be representative, but they are good enough for their intended use (“fitness for purpose” is a big, controversial notion from a recent AAPOR task force report on Nonprobability Sampling). I would argue that they are clearly not excellent for their intended use. But does that invalidate them altogether? Often they provide the only window that we have into whatever it is that we intend them for.

Truffle fries aside, the restaurant was great. And location aside, the hotel was definitely an interesting experience.

Toilet capsule in hotel room (with frosted glass rotating pane for some degree of privacy)

What next, after graduation?

A question that recent graduates are often asked is “what next, now that you’ve graduated?” This is a different question for graduates in different stages of their lives. When I finished my bachelor’s I could answer with the types of jobs I was applying to and my plans of where to live next. In fact, I wasn’t one to leave these big questions unanswered: I moved and began a full-time research position within a few weeks of my last set of finals. I was eager to begin my life without school. Nine months later I began another research position, chosen because of the sheer intensity and rigor of the interview (I had two interviewers firing questions at me, and I loved it. Crazy, right?). At this point, I’ve been at the second job for about 14 years.

What keeps you at a job for 14 years? This is an important question, because keeping with a job when everything is not fresh and new is a special sort of challenge. There have been a few keys:

1. Stay in the moment. There are quite a few different projects that I juggle at once, and I work on each project across multiple stages. For each of these stages in the research process, I have elements that I particularly enjoy. I try to focus on these key elements while I work on each project.

2. Know yourself. As a worker, I know that I have little patience for repetitive tasks. I tend to be very hardworking and productive, but when tasks become repetitive I quickly get distracted. If I can, I always delegate these tasks away. If I can’t, I juggle them with other projects that complement them, such as tasks that I need to spend more time thinking strategically about or tasks that either have a deadline or can be given a set of short term goals. This way, I feel productive and maintain my morale.

3. Feed yourself. I’ve also learned that I hunger to learn new things. I take advantage of every opportunity to learn new things, to share the new knowledge with my coworkers, and to integrate the things I learn into my work. This keeps my projects fresh. In addition to the standard, core reports that I produce, for example, I add new kinds of analyses or data. This makes the reports more interesting to produce, and it probably keeps them fresh for the reader as well.

4. Maintain relationships. I’ve been lucky enough to work with people I genuinely enjoy and to see them through marriages, graduations, births, deaths, as well as the silly packages they receive at work. This helps to make work an enjoyable place.

5. Keep moving. Go to the gym, if you can. Go on a walk, if you can. Get up and stretch. Drink a lot of fluids.

Now, back to the question. “What next, after graduation?” For me, this is not a question with a clear, obvious answer. School disturbs the equilibrium of everyday life. Juggling work, school and family left me on a constant cycle of challenges and [mostly] successes. How do you come down from that? What happens to that level of productivity? As a mom, there is a looming stack of laundry, dishes and other household tasks always waiting at the ready. In the past week alone, I’ve spent over 6 hours doing make-up gymnastic lessons (with another 2.5 hours coming tomorrow!). Life expands to fit any empty spaces. But given a trade-off between reading Blommaert and folding laundry…

I read a commencement speech by David Foster Wallace that addressed the monotony of life and the power of being alive through the seemingly routine moments. I plan to do just that, but I was shocked to see it laid out in a commencement address. To be a student is to be saddled with the potential of what life could be, and that stands in such contrast to the smaller, daily joys of life without school. I often wondered how well prepared the students around me who hadn’t yet left academia were for life “on the other side.” Now I can see why some people choose to stay in school! If it weren’t for the many sacrifices my family made in order for me to go to school, I probably would have already enrolled in a PhD program.

The transition is surprisingly difficult, and I haven’t yet figured it out.

Representativeness, qual & quant, and Big Data. Lost in translation?

My biggest challenge in coming from a quantitative background to a qualitative research program was representativeness. I came to class firmly rooted in the principle of Representativeness, and my classmates seemed not to have any idea why it mattered so much to me. Time after time I would get caught up in my data selection. I would pose the wider challenge of representativeness to a colleague, and they would ask “representative of what? why?”

 

In the survey research world, the researcher begins with a population of interest and finds a way to collect a representative sample of the population for study. In the qualitative world that accompanies survey research, units of analysis are generally people, and people are chosen for their representativeness. Representativeness is often constructed by demographic characteristics. If you’ve read this blog before, you know of my issues with demographics. Too often, demographic variables are used as a knee-jerk variable instead of better considered variables that are more relevant to the analysis at hand. (Maybe the census collects gender and not program availability, for example, but just because a variable is available and somewhat correlated doesn’t mean that it is in fact a relevant variable, especially when the focus of study is a population for whom gender is such an integral societal difference.)

 

And yet I spent a whole semester studying 5 minutes of conversation between 4 people. What was that representative of? Nothing but itself. It couldn’t have been exchanged for any other 5 minutes of conversation. It was simply a conversation that this group had and forgot. But over the course of the semester, this piece of conversation taught me countless aspects of conversation research. Every time I delved back into the data, it became richer. It was my first step into the world of microanalysis, where I discovered that just about anything can be a rich dataset if you use it carefully. A snapshot of people at a lecture? Well, how are their bodies oriented? A snapshot of video? A treasure trove of gestures and facial expressions. A piece of graffiti? Semiotic analysis! It goes on. The world of microanalysis is built on the practice of layered noticing. It goes deeper than wide.

 

But what is it representative of? How could a conversation be representative? Would I need to collect more conversations, but restrict the participants? Collect conversations with more participants, but in similar contexts? How much or how many would be enough?

 

In the world of microanalysis, people and objects constantly create and recreate themselves. You consistently create and recreate yourself, but your recreations generally fall into a similar range that makes you different from your neighbors. There are big themes in small moments. But what are the small moments representative of? Themselves. Simply, plainly, nothing more and nothing else. Does that mean that they don’t matter? I would argue that there is no better way to understand the world around us in deep detail than through microanalysis. I would also argue that macroanalysis is an important part of discovering the wider patterns in the world around us.

 

Recently a NY Times blog post by Quentin Hardy has garnered quite a bit of attention.

Why Big Data is Not Truth: http://bits.blogs.nytimes.com/2013/06/01/why-big-data-is-not-truth/

This post has really struck a chord with me, because I have had a hard time understanding Hardy’s complaint. Is big data truth? Is any data truth? All data is what it is; a collection of some sort, collected under a specific set of circumstances. Even data that we hope to be more representative has sampling and contextual limitations. Responsible analysts should always be upfront about what their data represents. Is big data less truthful than other kinds of data? It may be less representative than, say, a systematically collected political poll. But it is what it is: different data, collected under different circumstances in a different way. It shouldn’t be equated with other data that was collected differently. One true weakness of many large scale analyses is the blindness to the nature of the data, but that is a byproduct of the training algorithms that are used for much of the analysis. The algorithms need large training datasets, from anywhere. These sets often are developed through massive web crawlers. Here, context gets dicey. How does a researcher represent the data properly when they have no idea what it is? Hopefully researchers in this context will be wholly aware that, although their data has certain uses, it also has certain [huge] limitations.

 

I suspect that Hardy’s complaint is with the representations of massive datasets collected from webcrawlers as a complete truth from which any analyses could be run and all of the greater truths of the world could be revealed. On this note, Hardy is exactly right. Data simply is what it is, nothing more and nothing less. And any analysis that focuses on an unknown dataset is just that: an analysis without context. Which is not to say that all analyses need to be representative, but rather that all responsible analyses of good quality need to be self aware. If you do not know what the data represents and when and how it was collected, then you cannot begin to discuss the usefulness of any analysis of it.

What is the role of Ethnography and Microanalysis in Online Research?

There is a large disconnect in online research.

The largest, highest profile, highest value and most widely practiced side of online research was created out of a high demand to analyze the large amount of consumer data that is constantly being created and is largely publicly available. This tremendous demand led to research methods that were created in relative haste. Math and programming skills thrived in a realm where social science barely made a whisper. The notion of atheoretical research grew. The level of programming and mathematical competence required to do this work continues to grow higher every day, as the fields of data science and machine learning become continually more nuanced.

The largest, low profile, lowest value and increasingly more practiced side of online research is the academic research. Turning academia toward online research has been like turning a massive ocean liner. For a while online research was not well respected. At this point it is increasingly well respected, thriving in a variety of fields and in a much needed interdisciplinary way, and driven by a search for a better understanding of online behavior and better theories to drive analyses.

I see great value in the intersection between these areas. I imagine that the best programmers have a big appetite for any theory they can use to drive their work in useful and productive ways. But I don’t see this value coming to bear on the market. Hiring is almost universally focused on programmers and data scientists, and the microanalytic work that is done seems largely invisible to the larger entities out there.

It is common to consider quantitative and qualitative research methods as two separate languages with few bilinguals. At the AAPOR conference in Boston last week, Paul Lavrakas mentioned a book he is working on with Margaret Roller which expands the Total Survey Error model to both quantitative and qualitative research methodology. I spoke with Margaret Roller about the book, and she emphasized the importance of qualitative researchers being able to talk more fluently and openly about methodology and quality controls. I believe that this is, albeit a huge challenge in wording and framing, a very important step for qualitative research, in part because quality frameworks lend credibility to qualitative research in the eyes of a wider research community. I wish this book a great deal of success, and I hope that it is able to find an audience and a frame outside the realm of survey research. (Although survey research has a great deal of foundational research, it is not well known outside of the field, and this book will merit a wider audience.)

But outside of this book, I’m not quite sure where or how the work of bringing these two distinct areas of research together can or will be done.

Also at the AAPOR conference last week, I participated in a panel on The Role of Blogs in Public Opinion Research (intro here and summary here). Blogs serve a special purpose in the field of research. Academic research is foundational and important, but the publish rate on papers is low, and the burden of proof is high. Articles that are published are crafted as an argument. But what of the bumps along the road? The meditations on methodology that arise? Blogs provide a way for researchers to work through challenges and to publish their failures. They provide an experimental space where fields and ideas can come together that previously hadn’t mixed. They provide a space for finding, testing, and crossing boundaries.

Beyond this, they are a vehicle for dissemination. They are accessible and informally advertised. The time frame to publish is short, the burden lower (although I’d like to believe that you have to earn your audience with your words). They are a public face to research.

I hope that we will continue to test these boundaries, to cross over barriers like quantitative and qualitative that are unhelpful and obtrusive. I hope that we will be able to see that we all need each other as researchers, and the quality research that we all want to work for will only be achieved through the mutual recognition that we need.

Digital Democracy Remixed

I recently transitioned from my study of the many reasons why the voice of DC taxi drivers is largely absent from online discussions into a study of the powerful voice of the Kenyan people in shaping their political narrative using social media. I discovered a few interesting things about digital democracy and social media research along the way, and the contrast between the groups was particularly useful.

Here are some key points:

  • The methods of sensemaking that journalists use in social media are similar to other methods of social media research, except for a few key factors, the most important of which is that the bar for verification is higher.
  • The search for identifiable news sources is important to journalists and stands in contrast with research methods that are built on anonymity. This means that the input that journalists will ultimately use will be on a smaller scale than the automated analyses of large datasets widely used in social media research.
  • The ultimate information sources for journalists will be small, but the phenomena that will capture their attention will likely be big. Although journalists need to dig deep into information, something in the large expanse of social media conversation must capture or flag their initial attention.
  • It takes some social media savvy to catch the attention of journalists. This social media savvy outweighs linguistic correctness in the ultimate process of getting noticed. Journalists act as intermediaries between social media participants and a larger public audience, and part of the intermediary process is language correcting.
  • Social media savvy is not just about being online. It is about participating in social media platforms in a publicly accessible way in regards to publicly relevant topics and using the patterned dialogic conventions of the platform on a scale that can ultimately draw attention. Many people and publics go online but do not do this.

The analysis of social media data for this project was particularly interesting. My data source was the comments following this posting on the Al Jazeera English Facebook feed.

fb

It evolved quite organically. After a number of rounds of coding I noticed that I kept drawing diagrams in the margins of some of the comments. I combined the diagrams into this framework:

scales

Once this framework was built, I looked closely at the ways in which participants used this framework. Sometimes participants made distinct discursive moves between these levels. But when I tried to map the participants’ movements on their individual diagrams, I noticed that my depictions of their movements rarely matched when I returned to a diagram. Although my coding of the framework was very reliable, my coding of the movements was not at all. This led me to notice that oftentimes the frames were being used more indexically. Participants were indexing levels of the frame, and this indexical process created powerful frame shifts. So, on the level of Kenyan politics exclusively, Uhuru’s crimes had one meaning. But juxtaposed against the crimes of other national leaders, Uhuru’s crimes had a dramatically different meaning. Similarly, when the legitimacy of the ICC was questioned, the charges took on a dramatically different meaning. When Uhuru’s crimes were embedded in the postcolonial East vs West dynamic, they shrank to the degree that the indictments seemed petty and hypocritical. And, ultimately, when religion was invoked the persecution of one man seemed wholly irrelevant and sacrilegious.

These powerful frame shifts enable the Kenyan public to have a powerful, narrative-changing voice in social media. And their social media savvy enables them to gain the attention of media sources that amplify their voices and thus redefine their public narrative.

[Image: readyforcnn]

Still grappling with demographics

Last year I wrote about my changing perspective on demographic variables. My grappling has continued since then.
I think of it as an academic puberty of sorts.

I remember the many crazy thought exercises I subjected myself to as a teenager, as I tried to forge my own set of beliefs and my own place in the world. I questioned everything. At times I was under so much construction that it was a wonder I functioned at all. Thankfully, I survived to enter my twenties intact. But lately I have been caught in a similar thought exercise, second-guessing the use of sociological demographic variables in research.

Two sample projects mark two sides of the argument. One is a potential study of the climate for underrepresented faculty members in physics departments. In our exploration of this subject, the meaning of underrepresented was raised. Indeed there are a number of ways in which a faculty member could be underrepresented or made uncomfortable: gender, race, ethnicity, accent, bodily differences or disabilities, sexual orientation, religion, … At some point, one could ask whether it matters which of these inspired prejudicial or different treatment, or whether the hostile climate is, in and of itself, important to note. Does it make sense to tick off which of a set of possible prejudices are stronger or weaker at a particular department? Or does it matter first that the uncomfortable climate exists, and that personal differences that should be professionally irrelevant are coming into professional play? One could argue that the climate should be the first phase of the study, and any demographics could be secondary. One might be particularly tempted to argue for this arrangement given the small sizes of the departments and the hesitation among many faculty members to supply information that could identify them personally.

If that were the only project on my mind, I might be tempted to take a more deconstructionist view of demographic variables altogether. But there is another project that I’m working on that argues against the deconstructionist view: the Global Survey of Physicists.

(Side or backstory: The global survey is kind of a pet project of mine, and it was the project that led me to grad school. Working on it involved coordinating survey design, translation and dissemination with representatives from over 100 countries. This was our first translation project. It began in English and was then translated into 7 additional languages. The translation process took almost a full year and was full of unexpected complications. Near the end of this phase, I attended a talk at the Bureau of Labor Statistics by Yuling Pan from Census. The talk was entitled ‘the Sociolinguistics of Survey Translation.’ I attended it never having heard of Sociolinguistics before. During the course of the talk, Yuling dissected experiences that paralleled my own into useful pieces, and diagnosed and described in detail some of the challenges I had encountered. I was so impressed with her talk that I googled Sociolinguistics as soon as I returned to my office, and discovered the MLC a few minutes later. One month later I was visiting Georgetown and working on my application for the MLC. I like to say it was like being swept up off my feet and then engaging in a happy shotgun marriage.)

The Global Survey was designed to elicit gender differences in terms of experiences, climate, resources and opportunities, as well as the effects of personal and family constraints and decisions on school and career. The survey worked particularly well, and each dive into the data proves fascinating. This week I delved deeper into the dynamics of one country and saw women’s sources of support erode as they progressed further into school and work. I saw the women transition from virtual parity in school to difficult careers, beginning with their significantly larger chance of having to choose their job because it was the only offer they received, and becoming significantly worse with the introduction of kids. In fact, we found through this survey that kids tend to slow women’s careers and accelerate men’s!

What do these findings say about the use of demographic variables? They certainly validate their usefulness and cause me to wonder whether a lack of focus on demographics would lessen the usefulness of the faculty study. Here I’m reminded that it is important, when discussing demographic variables, to keep in mind that they are not arbitrary. They reflect ways of seeing that are deeply engrained in society. Gender, for example, is the first thing to note about a baby, and it determines a great deal from that point on. Excluding race or ethnicity seems foolish, too, in a society that so deeply engrains these distinctions.

The problem may be in the a priori or unconsidered applications of demographic variables. All too often, the same tired set of variables is dredged up without first considering whether they would even provide a useful distinction or the most useful cuts to a dataset. A recent example of this is the study that garnered some press about racial differences in e-learning. From what I read of the study, all e-learning was collapsed into a single entity, an outcome or dependent variable (as in some kind of measure of success of e-learning), and run by a set of traditional x’s or independent variables, like race and socioeconomic status. In this case, I would have preferred to first see a deeper look into the mechanics of e-learning rather than a knee-jerk rush to the demographic variables. What kind of e-learning course was it? What kinds of interaction were fostered between the students and the teacher, material and other students? So many experiences of e-learning were collapsed together, and differences in course types and learning environments make for more useful and actionable recommendations than demographics ever could.

In the case of the faculty and global surveys as well, one should ask what approaches to the data would yield the most useful analyses. Finding demographic differences leads to what: an awareness of discrimination? Discrimination is deep-seated and not easily cured. It is easy to document and difficult to fix. And yet, more specific information about climate, resources and opportunities could be more useful or actionable. It helps to ask what we can achieve through our research. Are we simply validating or proving known societal differences or are we working to create actionable recommendations? What are the most useful distinctions?

Most likely, if you take the time to carefully consider the information you collect, the usefulness of your analyses and the validity of your hypotheses, you are one step above anyone rotely applying demographic variables out of ill-considered habit. Kudos to you for that!

Total Survey Error: nanny to some, wise elder for some, strange parental friend for others

Total Survey Error and I are long-time acquaintances, just getting to know each other better. Looking at TSE is, for me, like looking at my work in survey research through a distorted mirror to an alternate universe. This week, I’ve spent some time closely reading Groves’ Past, Present and Future of Total Survey Error, and it provided some historical context to the framework, as well as an experienced account of its strengths and weaknesses.

Errors are an important area of study across many fields. Historically, models about error assumed that people didn’t really make errors often. Those attitudes are alive and well in many fields and workplaces today. Instead of being carefully considered, errors are often dismissed as indicators of incompetence. However, some workplaces are changing the way they approach errors. I did some collaborative research on medical errors in 2012 and was introduced to the term HRO or High-Reliability Organization. This is an error-focused model of management that assumes that errors will be made, and not all errors can be anticipated. Therefore, every error should be embraced as a learning opportunity to build a better organizational framework.

From time to time, various members of our working group have been driven to create checklists for particular aspects of our work. In my experience, the checklists are very helpful for work that we do infrequently and virtually useless for work that we do daily. Writing a checklist for your daily work is a bit like writing instructions on how you brush your teeth and expecting to keep those instructions updated whenever you make a change of sorts. Undoubtedly, you’ll reread the instructions and wonder when you switched from a vertical to a circular motion for a given tooth. And yet there are so many important elements to our work, and so many areas where people could make less than ideal decisions (small or large). From this need rose Deming, with the first survey quality checklist. After Deming, a few other models arose. Eventually, TSE became the cumulative working framework or foundational framework for the field of survey research.

In my last blog, I spoke about the strangeness of coming across a foundational framework after working in the field without one. The framework is a conceptually important one, separating out sources of errors in ways that make shortcomings and strengths apparent and clarifying what is more or less known about a project.

But in practice, this model has not become the applied working model that its founders and biggest proponents expected it to be. This is for two reasons (that I’ll focus on), one of which Groves mentioned in some detail in this paper and one of which he barely touched on (but which likely drove him out of the field).

1. The framework has mathematical properties, and this has led to its more intensive use on aspects of the survey process that are traditionally quantitative. TSE research in areas of sampling, coverage, response and aspects of analysis is quite common, but TSE research in other areas is much less common. In fact, many of the less quantifiable parts of the survey process are almost dismissed in favor of the more quantifiable parts. A survey with a particularly low TSE value could have huge underlying problems or be of minimal use once complete.
2. The framework doesn’t explicitly consider the human factors that govern research behind the scenes. Groves mentioned that the end users of the data are not deeply considered in the model, but neither are the other financial and personal (and personafinancial) constraints that govern much decision making. Ideally, the end goal of research is high quality research that yields a useful and relevant response at as low a cost as possible. In practice, however, the goal is both to keep costs low and to satisfy a system of interrelated (and often conflicting) personal or professional (personaprofessional?) interests. If the most influential of these interests are not particularly interested in (or appreciative of) the model, practitioners are highly unlikely to take the time to apply it.
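To make the "mathematical properties" in the first point a little more concrete: total survey error for an estimate is commonly summarized as a mean squared error, which splits into a squared bias term and a variance term. The sketch below uses my own notation and an illustrative list of bias sources, not a formula quoted from Groves’ paper.

```latex
% \hat{\theta}: the survey estimate; \theta: the true population value.
\[
  \mathrm{MSE}(\hat{\theta})
    = \mathbb{E}\big[(\hat{\theta} - \theta)^2\big]
    = \mathrm{Bias}(\hat{\theta})^2 + \mathrm{Var}(\hat{\theta}),
  \qquad
  \mathrm{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta .
\]
% Illustrative assumption: the overall bias is treated as a sum of
% contributions from separate error sources.
\[
  \mathrm{Bias}(\hat{\theta}) \approx
    B_{\text{coverage}} + B_{\text{nonresponse}} + B_{\text{measurement}} + \dots
\]
```

The terms that are easy to estimate (sampling variance, for example) tend to get the attention, while sources that resist being written as a B or a variance component fall outside the arithmetic altogether. That is how a survey can score well on paper and still have serious underlying problems.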

Survey research requires very close attention to detail in order to minimize errors. It requires an intimate working knowledge of math and of computer programming. It also benefits from a knowledge of human behavior and the research environment. If I were to recommend any changes to the TSE model, I would recommend a bit more task-based detail, to incorporate more of the highly valued working knowledge that is often inherent and unspoken in the training of new researchers. I would also recommend more of an HRO orientation toward error, anticipating and embracing unexpected errors as a source of additions to the model. And I would recommend some deeper incorporation of the personal and financial constraints and the roles they play (clearly an easier change to introduce than to flesh out in any great detail!). I would recommend a shift of focus, away from the quantitative modeling aspects and toward the overall applicability and importance of a detailed, applied working model.

I’ve suggested before that survey research does not have a strong enough public face for the general public to understand or deeply value our work. A model that is better embraced by the field could form the basis for a public face, but the model would have to appeal to practitioners on a practical level. The question is: how do you get members of a well-established field, who have long been working within it and gaining expertise, to accept a framework that grew into a foundational piece independent of their work?