November 7, 2012

What do all of these polling strategies add up to?

Yesterday was a big first for research methodologists across many disciplines. For some of the newer methods, it was the first election that they could be applied to in real time. For some of the older methods, this election was the first to bring competing methodologies, and not just methodological critiques.

Real time sentiment analysis from sites like this summarized Twitter’s take on the election. This paper sought to predict electoral turnout using google searches. InsideFacebook attempted to use Facebook data to track voting. And those are just a few of a rapid proliferation of data sources, analytic strategies and visualizations.

One could ask, who are the winners? Some (including me) were quick to declare a victory for the well honed craft of traditional pollsters, who showed that they were able to repeat their studies with little noise, and that their results were predictive of a wider real world phenomena. Some could call a victory for the emerging field of Data Science. Obama’s Chief Data Scientist is already beginning to be recognized. Comparisons of analytic strategies will spring up all over the place in the coming weeks. The election provided a rare opportunity where so many strategies and so many people were working in one topical area. The comparisons will tell us a lot about where we are in the data horse race.

In fact, most of these methods were successful predictors in spite of their complicated underpinnings. The google searches took into account searches for variations of “vote,” which worked as a kind of reliable predictor but belied the complicated web of naturalistic search terms (which I alluded to in an earlier post about the natural development of hashtags, as explained by Rami Khater of Al Jezeera’s The Stream, a social network generated newscast). I was a real-world example of this methodological complication. Before I went to vote, I googled “sample ballot.” Similar intent, but I wouldn’t have been caught in the analyst’s net.

If you look deeper at the Sentiment Analysis tools that allow you to view the specific tweets that comprise their categorizations, you will quickly see that, although the overall trends were in fact predictive of the election results, the data coding was messy, because language is messy.

And the victorious predictive ability of traditional polling methods belies the complicated nature of interviewing as a data collection technique. Survey methodologists work hard to standardize research interviews in order to maximize the reliability of the interviews. Sometimes these interviews are standardized to the point of recording. Sometimes the interviews are so scripted that interviewers are not allowed to clarify questions, only to repeat them. Critiques of this kind of standardization are common in survey methodology, most notably from Nora Cate Schaeffer, who has raised many important considerations within the survey methodology community while still strongly supporting the importance of interviewing as a methodological tool. My reading assignment for my ethnography class this week is a chapter by Charles Briggs from 1986 (Briggs – Learning how to ask) that proves that many of the new methodological critiques are in fact old methodological critiques. But the critiques are rarely heeded, because they are difficult to apply.

I am currently working on a project that demonstrates some of the problems with standardizing interviews. I am revising a script we used to call a representative sample of U.S. high schools. The script was last used four years ago in a highly successful effort that led to an admirable 98% response rate. But to my surprise, when I went to pull up the old script I found instead a system of scripts. What was an online and phone survey had spawned fax and e-mail versions. What was intended to be a survey of principals now had a set of potential respondents from the schools, each with their own strengths and weaknesses. Answers to common questions from school staff were loosely scripted on an addendum to the original script. A set of tips for phonecallers included points such as “make sure to catch the name of the person who transfers you, so that you can specifically say that Ms X from the office suggested I talk to you” and “If you get transferred to the teacher, make sure you are not talking to the whole class over the loudspeaker.”

Heidi Hamilton, chair of the Georgetown Linguistics department, often refers to conversation as “climbing a tree that climbs back.” In fact, we often talk about meaning as mutually constituted between all of the participants in a conversation. The conversation itself cannot be taken outside of the context in which it lives. The many documents I found from the phonecallers show just how relevant these observations can be in an applied research environment.

The big question that arises from all of this is one of a practical strategy. In particular, I had to figure out how to best address the interview campaign that we had actually run when preparing to rerun the campaign we had intended to run. My solution was to integrate the feedback from the phonecallers and loosen up the script. But I suspect that this tactic will work differently with different phonecallers. I’ve certainly worked with a variety of phonecallers, from those that preferred a script to those that preferred to talk off the cuff. Which makes the best phonecaller? Neither. Both. The ideal phonecaller works with the situation that is presented to them nimbly and professionally while collecting complete and relevant data from the most reliable source. As much of the time as possible.

At this point, I’ve come pretty far afield of my original point, which is that all of these competing predictive strategies have complicated underpinnings.

And what of that?

I believe that the best research is conscious of its strengths and weaknesses and not afraid to work with other strategies in order to generate the most comprehensive picture. As we see comparisons and horse races develop between analytic strategies, I think the best analyses we’ll see will be the ones that fit the results of each of the strategies together, simultaneously developing a fuller breakdown of the election and a fuller picture of our new research environment.

Posted in big data, commentary, data, ethnography of communication, event, outside story, research, research methodology, social media, sociolinguistics, survey methodology
Tagged commentary, data, data analysis, design, discourse analysis, facebook, linguistics, linguistics in action, Natural Language Processing, qualitative, quantitative, research methodology, sentiment analysis, social media analysis, text analysis, text analytics, twitter
1 Comment

November 2, 2012

Education from the Bottom Up?

Last night I attended a talk by Shirley Bryce Heath about her new book, Words at Work and Play, moderated by Anne Harper Charity Hudley and Frederick Erickson. Dr Bryce Heath has been following a group of 300 families for 30 years, and in her talk she addressed many of the changes she’d seen in the kids in the time she’d been observing them. She made one particularly interesting point. She mentioned that the world of assessment, and, in fact much of the adult world hasn’t kept up with the kids’ evolution. The assessments that we subject kids to are traditional, reflecting traditional values and sources. She went as far as to say that we don’t know how to see, appreciate or notice these changes, and she pointed out that much of new styles of learning came outside of the school environment.

This part of her talk reminded me of an excellent blog post I read yesterday about unschooling. Unschooling is the process of learning outside of a structured environment. It goes further than homeschooling, which can involve structured curricula. It is curricularly agnostic and focused on the learning styles, interests, and natural motivation of the students. I mentioned the blog post to Terrence Wiley, president of the Center for Applied Linguistics, and he emphasized the underlying idealism of unschooling. It rests on the basic belief that everyone is naturally academically motivated and interested and will naturally embrace learning, in their own way, given the freedom to do it. Unschooling is, as some would say my “spirit animal.” I don’t have the time or the resources to do it with my own kids, and I’m not sure I would even if I were fully able to do it. I have no idea how it could be instituted in any kind of egalitarian or larger scale way. But I still love the idea, in all it’s unpracticality. (Dr Wiley gave me a few reading assignments, explaining that ‘everything old in education is new again’)

Then today I read a blog about the potential of using Wikipedia as a textbook. This idea is very striking, not just because Wikipedia was mostly accurate, freely available, covered the vast majority of the material in this professor’s traditional textbooks, and has an app that will help anyone interested create a custom textbook, but because it actually addresses what kids do anyway! Just this past weekend, my daughter was writing a book report, and I kept complaining that she chose to use Wikipedia to look up the spelling of a character’s name rather than walk upstairs and grab the book. Kids use Wikipedia often and for all kinds of things, and it is often more common for parents and educators to forbid or dismiss this practice than to jump right in with them. I suggest that the blogger not only use Wikipedia, but use the text as a way to show what is or is not accurate, how to tell, and where to find other credible, collaborative sources when it doubt. What an amazing opportunity!

So here’s the question that all of this has been leading to: Given that the world around is is rapidly changing and that our kids are more adept at staying abreast of these changes than they are, could it be time to turn the old expert-novice/ teacher-student paradigm on its head, at least in part? Maybe we need to find ways to let some knowledge come from the bottom up. Maybe we need to let them be the experts. Maybe we need to, at least in part, rethink our role in the educating process?

Frederick Erickson made an excellent point about teaching “You have to learn your students in order to teach them.” He talked about spending the first few days in a class gathering the expertise of the students, and using that knowledge when creating assignments or assigning groups. (I believe Dr Hudley mentioned that she did this, too. Or maybe he supplied the quote, and she supplied the example?)

All of this makes me wonder what the potential is for respecting the knowledge and expertise of the students, and working from there. What does bottom-up or student-led education look like? How can it be integrated into the learning process in order to make it more responsive, adaptive and modern?

Of course, this is as much a dream for a wider society as unschooling is for my own family. To a large extent, practicality shoots it all in the foot with the starting gun. But a girl can dream, no?

Posted in commentary, digital parenting, ethnography of communication, event, social media
Tagged commentary, cool stuff
4 Comments

October 31, 2012

“Not everything that can be counted counts”

“Not everything that counts can be counted, and not everything that can be counted counts” – sign in Einstein’s Princeton office

This quote is from one of my favorite survey reminder postcards of all time, along with an image from from the Emilio Segre visual archives. The postcard layout was an easy and pleasant decision made in association with a straightforward survey we have conducted for nearly a quarter century. …If only social media analysis could be so easy, pleasant or straightforward!

I am in the process of conducting an ethnography of DC taxi drivers. I was motivated to do this study because of the persistent disconnect between the experiences and reports of the taxi drivers and riders I hear from regularly and the snarky (I know this term does not seem technical, but it is absolutely data motivated!) riders who dominate participatory media sources online. My goal at this point of the project is to chase down the disconnect in media participation and see how it maps to policy deliberations and offline experiences. This week I decided to explore ways of quantifying the disconnect.

Inspired by this article in jedem (the eJournal of eDemocracy and Open Government), I decided to start my search using framework based in Social Network Analysis (SNA), in order to use elements of connectedness, authority and relevance as a base. Fortunately, SNA frameworks are widely available to analysts on a budget in the form of web search engines! I went through the first 22 search results for a particular area of interest to my study: the mandatory GPS policy. Of these 22 sites, only 11 had active web 2.0 components. Across all of these sites, there were just two comments from drivers. Three of the sites that didn’t have any comments from drivers did have one post each that sympathized with or defended DC taxi drivers. The remaining three sites had no responses from taxi drivers and no sympathetic responses in defense of the drivers. Barring a couple of comments that were difficult to divine, the rest of the comments were negative comments about DC taxi drivers or the DC taxi industry. This matched my expectations, and, predictably, didn’t match any of my interviews or offline investigations.

The question at this point is one of denominator.

The easiest denominator to use, and, in fact, the least complicated was the number of sites. Using this denominator, only one quarter of the sites had any representation from a DC taxi driver. This is significant, given that the discussions were about aspects of their livelihood, and the drivers will be the most closely affected by the regulatory changes. This is a good, solid statistic from which to investigate the influence of web 2.0 on local policy enactment. However, it doesn’t begin to show the lack of representation the way that a denominator such as number of posts, number of posters, or number of opinions would have. But each one of these alternative denominators has its own set of headaches. Does it matter if one poster expresses an opinion once and another expresses another, slightly different opinion more than once? If everyone agrees, what should the denominator be? What about responses that contain links that are now defunct or insider references that aren’t meaningful to me? Should I consider measures of social capital, endorsements, social connectedness, or the backgrounds of individual posters?

The simplest figure also doesn’t show one of the most striking aspects of this finding; the relative markedness of these posts. In the context of predominantly short, snarky and clever responses, one of the comments began with a formal “Dear DC city councilmembers and intelligent taxpayers,” and the other spread over three dense, winding posts in large paragraph form.

This brings up an important aspect of social media; that of social action. If every comment is a social action with social intentions, what are the intentions of the posters and how can these be identified? I don’t believe that the majority of posts left were intended as a voice in local politics, but the comments from the drivers clearly were. The majority of posts represent attempts to warrant social capital using humor, not attempts to have a voice in local politics. And they repeated phrases that are often repeated in web 2.0 discussions about the DC taxi situation, but rarely repeated elsewhere. This observation, of course, is pretty meaningless without being anchored to the data itself, both quantitatively and qualitatively. And it makes for some interesting ‘next steps’ in a project that is certainly not short of ‘next steps.’

The main point I want to make here is about the nature of variables in social media research. Compared to a survey, where you ask a question, determined in advance, and have a set of answers to work with in your analysis, you are free to choose your own variables for your analysis. Each choice brings with it a set of constraints and advantages, and some fit your data better than others. But the path to analysis can be a more difficult path to take, and more justification about the choices you make is important. To augment this, a quantitative analysis, which can sometimes have very arbitrary or less clear choices included in it, is best supplemented with a qualitative analysis that delves into the answers themselves and why they fit the coding structure you have imposed.

In all of this, I have quite a bit of work out ahead of me.

Posted in commentary, data, ethnography of communication, research, research methodology, social media, sociolinguistics
Tagged commentary, data, data analysis, discourse analysis, intercultural, linguistics, linguistics in action, Natural Language Processing, qualitative, quantitative, register, research methodology, sentiment analysis, social media analysis, text analysis, text analytics
1 Comment

October 25, 2012

I think I’m using “big data” incorrectly

I think I’m using the term “big data” incorrectly. When I talk about big data, I’m referring to the massive amount of freely available information that researchers can collect from the internet. My expectation is that the researchers must choose which firehose best fits their research goals, collect and store the data, and groom it to the point of usability before using it to answer targeted questions or examining it for answers in need of a question.

The first element of this that makes it “big data” to me, is that the data is freely available and not subject to any privacy violations. It can be difficult to collect and store, because of its sheer size, but it is not password protected. For this reason, I would not consider Facebook to be a source for “big data.” I believe that the overwhelming majority of Facebook users impose some privacy controls, and the resulting, freely available information cannot be assigned any kind of validity. There are plenty of measures of inclusion for online research, and ignorance about privacy rules or shear exhibitionism are not a target qualities by any of these standards.

The second crucial element to my definition of “big data” is structure. My expectation is that it is in any researchers interest to understand the genesis and structure of their data as much as possible, both for the sake of grooming, and for the sake of assigning some sense of validity to their findings. Targeted information will be layed out and signaled very differently in different online environments, and the researcher must work to develop both working delimiters to find probable working targets and a sense of context for the data.

The third crucial element is representativeness. What do these findings represent? Under what conditions? “Big data” has a wide array of answers to these questions. First, it is crucial to note that it is not representative of the general population. It represents only the networked members of a population who were actively engaging with an online interface within the captured window of time in a way that left a trace or produced data. Because of this, we look at individual people by their networks, and not by their representativeness. Who did they influence, and to what degree could they influence those people? And we look at other units of analysis, such as the website that the people were contributing on, the connectedness of that website, and the words themselves, and their degree of influence, both directly an indirectly.

Given those elements of understanding, we are able to provide a framework from which the analysis of the data itself is meaningful and useful.

I’m aware that my definition is not the generally accepted definition. But for the time being I will continue to use it for two reasons:

1. Because I haven’t seen any other terms that better fit
2. Because I think that it is critically important that any talk about data use is tied to measures that encourage the researcher to think about the meaning and value of their data

It’s my hope that this is a continuing discussion. In the meantime, I will trudge on in idealistic ignorance.

Posted in big data, commentary, data, research, research methodology, social media
Tagged commentary, data, data analysis, facebook, Natural Language Processing, qualitative, quantitative, research methodology, sentiment analysis, social media analysis, text analysis, text analytics, twitter
Leave a comment

October 21, 2012

Adventures in Digital Puberty

My digital enthusiasm hit a roadblock lately. My oldest daughter discovered the addictive world of social gaming. What began with her checking out an ad on TV for a gaming website soon evolved into pops of smuggled light in a dark room after bedtime. I looked into this gaming website, and I was able to read all kinds of horror stories about it. Parents told tales of bullying, of graphic talk and advances in chatrooms, and of kids receiving points for dating.

Once you consider some features of this site- the chatrooms, the constant clothes changing (into mostly skimpy outfits), the pursuit of cash and fame, and the encouragement to “date,” it’s easy to see this place as a playground for the perverted. It didn’t help that my first questions about the site were answered by my daughter with a speech about the site’s value as a teaching tool. Apparently they give quizzes, and they give you the answers if you get the questions wrong. So, for example, she learned from this site who Brad Pitt is married to. Although I am a big fan of learning tools, I’m not sure I’d characterize celebrity gossip as useful or necessary knowledge…

I know that some parents would (& do) forbid their kids from going to the site. My first reaction was to limit her time there as much as possible. But today I swallowed my prejudice and jumped in.

The truth is that if I did just dismiss this site altogether, she would still find ways to visit it. I would much rather that she not hide her activity, but instead have me to talk to about what she encounters on the site. So I told her about my experiences trying out chatrooms when I was younger and about what I’d read about this site. We talked in detail about the different features of her site. I offered to sit down with her any time she wanted to talk about things she saw. We talked about bullying, we talked about the possibility of people not being who they say they are, and we talked about making connections online. We talked about her favorite parts of the site and the parts that made her uncomfortable. She told me about the friends she made and what brought them together. And I pledged to talk to her about it again any time she wanted.

She was full of questions and of stories and examples, and I was really struck that I never would have heard any of it had I not gotten over my initial set of worries and discussed this with her. And what would that have meant? She wouldn’t have had a chance to vet her strategies for safety and bullying with me, and she wouldn’t feel comfortable sharing some of her stranger encounters. She would be left without my guidance when determining what was acceptable to her.

From time to time, we parents need a kick in the pants to remind us that raising kids isn’t about creating copies of ourselves, but about providing guidance and safety for them as they develop. She is a different person, growing among a different set of influences. And that’s okay with me.

I did, however, discuss all of this with her as we headed out to the woods to take a gadget free walk among the fall colors!

Posted in commentary, digital parenting, social media
Leave a comment

October 7, 2012

Getting to know your data

On Friday, I had the honor of participating in a microanalysis video discussion group with Fred Erickson. As he was introducing the process to the new attendees, he said something that really caught my attention. He said that videos and field notes are not data until someone decides to use them for research.

As someone with a background in survey research, the question of ‘what is data?’ was never really on my radar before graduate school. Although it’s always been good practice to know where your data comes from and what it represents in order to glean any kind of validity from your work, data was unquestioningly that which you see in a spreadsheet or delimited file, with cases going down and variables going across. If information could be formed like this, it was data. If not, it would need some manipulation. I remember discussing this with Anna Trester a couple of years ago. She found it hard to understand this limited framework, because, for her, the world was a potential data source. I’ve learned more about her perspective in the last couple of years, working with elements that I never before would have characterized as data, including pictures, websites, video footage of interactions, and now fieldwork as a participant observer.

Dr Erickson’s observation speaks to some frustration I’ve had lately, trying to understand the nature of “big data” sets. I’ve seen quite a bit of people looking for data, any data, to analyze. I could see the usefulness of this for corpus linguists, who use large bodies of textual data to study language use. A corpus linguist is able to use large bodies of text to see how we use words, which is a systematically patterned phenomena that goes much deeper than a dictionary definition could. I could also see the usefulness of large datasets in training programs to recognize genre, a really critical element in automated text analysis.

But beyond that, it is deeply important to understand the situated nature of language. People don’t produce text for the sake of producing text. Each textual element represents an intentioned social action on the part of the writer, and social goals are accomplished differently in different settings. In order for studies of textual data to produce valid conclusions with social commentary, contextual elements are extremely important.

Which leads me to ask if these agnostic datasets are being used solely as academic exercises by programmers and corpus linguists or if our hunger for data has led us to take any large body of information and declare it to be useful data from which to excise valid conclusions? Worse, are people using cookie cutter programs to investigate agnostic data sets like this without considering the wider validity?

I urge anyone looking to create insight from textual data to carefully get to know their data.

Posted in big data, commentary, data, ethnography of communication, research, research methodology, social media, sociolinguistics, survey methodology
Tagged commentary, data, data analysis, discourse analysis, linguistics, linguistics in action, MLC, Natural Language Processing, qualitative, quantitative, register, research methodology, sentiment analysis, social media analysis, text analysis, text analytics, twitter
Leave a comment

October 1, 2012

Notes on the Past, Present and Future of Survey Methodology from #dcaapor

I had wanted to write these notes up into paragraphs, but I think the notes will be more timely, relevant and readable if I share them as they are. This was a really great conference- very relevant and timely- based on a really great issue of Public Opinion Quarterly. As I was reminded at the DC African Festival (a great festival, lots of fun, highly recommended) on Saturday, “In order to understand the future you must embrace the past.”

DC AAPOR Annual Public Opinion Quarterly Special Issue Conference

75^th Anniversary Edition

The Past, Present and Future of Survey Methodology and Public Opinion Research

Look out for slides from the event here: http://www.dc-aapor.org/pastevents.php

Note: Of course, I took more notes in some sessions than others…

Peter Miller:

– Adaptive design- tracking changes in estimates across mailing waves and tracking response bias, is becoming standard practice at Census

– Check out Howard Schuman’s article tracking attitudes toward Christopher Columbus

Ended up doing some field research in the public library, reading children’s books

Stanley Presser:

– Findings have no meaning independent of the method with which they were collected

– Balance of substance and method make POQ unique (this was a repeated theme)

Robert Groves:

– The survey was the most important invention in Social Science in the 20^th century – quote credit?

– 3 era’s of Survey research (boundaries somewhat arbritrary)

1930-1960
- Foundation laid, practical development
1960-1990
- Founders pass on their survey endeavors to their protégés
- From face to face to phone and computer methods
- Emergence & Dominance of Dillman method
- Growth of methodological research
- Total Survey Error perspective dominates
- Big increase in federal surveys
- Expansion of survey centers & private sector organizations
- Some articles say survey method dying because of nonresponse and inflating costs. This is a perennial debate. Groves speculated that around every big election time, someone finds it in their interest to doubt the polls and assigns a jr reporter to write a piece calling the polls into question.
1990à
- Influence of other fields, such as social cognitive psychology
- Nonresponse up, costs up à volunteer panels
- Mobile phones decrease cost effectiveness of phone surveys
- Rise of internet only survey groups
- Increase in surveys
- Organizational/ business/ management skills more influential than science/ scientists
- Now: software platforms, culture clash with all sides saying “Who are these people? Why do they talk so funny? Why don’t they know what we know?”
- Future
  - Rise of organic data
  - Use of administrative data
  - Combining data sets
  - Proprietary data sets
  - Multi-mode
  - More statistical gymnastics

Mike Brick:

Society’s demand for information is Insatiable
Re: Heckathorn/ Respondent Driven samples
- Adaptive/ indirect sampling is better
- Model based methods
  - Missing data problem
  - Cost the main driver now
  - Estimation methods
  - Future
    - Rise of multi-frame surveys
    - Administrative records
    - Sampling theory w/nonsampling errors at design & data collection stages
      - Sample allocation
      - Responsive & adaptive design
      - Undercoverage bias can’t be fixed at the back end
        
        *Biggest problem we face*
        
        Worse than nonresponse
        
        Doug Rivers (2007)
        
        Math sampling
        
        Web & volunteer samples
        
        1^st shot at a theory of nonprobability sampling
        
        Quota sampling failed in 2 high profile examples
        
        Problem: sample from interviews/ biased
        
        But that’s FIXABLE
        
        Observational
        
        Case control & eval studies
        
        Focus on single treatment effect
        
        “tougher to measure everything than to measure one thing”

Mick Couper:

– Mode an outdated concept

Too much variety and complexity
Modes are multidimensional
- Degree of interviewer involvement
- Degree of contact
- Channels of communication
- Level of privacy
- Technology (used by whom?)
- Synchronous vs. asynchronous
More important to look at dimensions other than mode
Mode is an attribute of a respondent or item
Basic assumption of mixed mode is that there is no difference in responses by mode, but this is NOT true
- We know of many documented, nonignorable, nonexplainable mode differences
- Not “the emperor has no clothes” but “the emperor is wearing suggestive clothes”
- Dilemma: differences not Well understood
  - Sometimes theory comes after facts
  - That’s where we are now- waiting for the theory to catch up (like where we are on nonprobability sampling)

– So, the case for mixed mode collection so far is mixed

Mail w/web option has been shown to have a lower response rate than mail only across 24-26 studies, at least!!
- (including Dillman, JPSM, …)
- Why? What can we do to fix this?
- Sequential modes?
  - Evidence is really mixed
  - The impetus for this is more cost than response rate
  - No evidence that it brings in a better mix of people

– What about Organic data?

Cheap, easily available
But good?
Disadvantages:
- One var at a time
- No covariates
- Stability of estimates over time?
- Potential for mischief
  - E.g. open or call-in polls
  - My e.g. #muslimrage
Organic data wide, thin
Survey data narrow, deep

– Face to face

Benchmark, gold standard, increasingly rare

– Interviewers

Especially helpful in some cases
- Nonobservation
- Explaining, clarifying

– Future

Technical changes will drive dev’t
Modes and combinations of modes will proliferate
Selection bias The Biggest Threat
Further proliferation of surveys
- Difficult for us to distinguish our work from “any idiot out there doing them”

– Surveys are tools for democracy

Shouldn’t be restricted to tools for the elite
BUT
There have to be some minimum standards

– “Surveys are tools and methodologists are the toolmakers”

Nora Cate Schaeffer:

– Jen Dykema read & summarized 78 design papers- her summary is available in the appendix of the paper

– Dynamic interactive displays for respondent in order to help collect complex data

– Making decisions when writing questions

See flow chart in paper
- Some decisions are nested
Question characteristics
- E.g. presence or absence of a feature
  - E.g. response choices

Sunshine Hillygus:

– Political polling is “a bit of a bar trick”

The best value in polls is in understanding why the election went the way it did

– Final note: “The things we know as a field are going to be important going forward, even if it’s not in the way they’ve been used in the past”

Lori Young and Diana Mutz:

– Biggest issues:

Diversity
Selective exposure
Interpersonal communication

– 2 kinds of search, influence of each

Collaborative filter matching, like Amazon
- Political targeting
- Contentious issue: 80% of people said that if they knew a politician was targeting them they wouldn’t vote for that candidate
  - My note: interesting to think about peoples relationships with their superficial categories of identity- it’s taken for granted so much in social science research, yet not by the people within the categories

– Search engines: the new gatekeepers

Page rank & other algorithms
No one knows what influence personalization of search results will have
Study on search learning: gave systematically different input to train engines are (given same start point), results changes Fast and Substantively

Rob Santos:

– Necessity mother of invention

Economic pressure
Reduce costs
Entrepreneurial spirit
Profit
Societal changes
- Demographic diversification
  - Globalization
  - Multi-lingual
  - Multi-cultural
  - Privacy concerns
  - Declining participation

– Bottom line: we adapt. Our industry Always Evolves

– We’re “in the midst of a renaissance, reinventing ourselves”

Me: That’s framing for you! Wow!

– On the rise:

Big Data
Synthetic Data
- Transportation industry
- Census
- Simulation studies
  - E.g. How many people would pay x amount of income tax under y policy?
Bayesian Methods
- Apply to probability and nonprobability samples
New generation
- Accustomed to and EXPECT rapid technological turnover
- Fully enmeshed in social media

– 3 big changes:

Non-probability sampling
- “Train already left the station”
- Level of sophistication varies
- Model based inference
- Wide public acceptance
- Already a proliferation
Communication technology
- Passive data collection
  - Behaviors
    - E.g. pos (point of service) apps
    - Attitudes or opinions
  - Real time collection
    - Prompted recall (apps)
    - Burden reduction
      - Gamification
Big Data
- What is it?
- Data too big to store
  - (me: “think “firehoses”)
  - Volume, velocity, variety
  - Fuzzy inferences
  - Not necessarily statistical
  - Coursenes insights

– We need to ask tough questions

(theme of next AAPOR conference is just that)
We need to question probability samples, too
- Flawed designs abound
- High nonresponse & noncoverage
- Can’t just scrutinize nonprobability samples
Nonprobability designs
- Some good, well accepted methods
- Diagnostics for measurement
  - How to measure validity?
  - What are the clues?
  - How to create a research agenda to establish validity?
Expanding the players
- Multidisciplinary
  - Substantive scientists
  - Math stats
  - Modelers
  - Econometricians
We need
- Conversations with practitioners
- Better listening skills

– AAPOR’s role

Create forum for conversation
Encourage transparency
Engage in outreach
Understanding limitations but learning approaches

– We need to explore the utility of nonprobability samples

– Insight doesn’t have to be purely from statistical inferences

– The biggest players in big data to date include:

Computational scientists
Modelers/ synthetic data’ers

– We are not a “one size fits all” society, and our research tools should reflect that

My big questions:

– “What are the borders of our field?”

– “What makes us who we are, if we don’t do surveys even primarily?”

Linguistic notes:

– Use of we/who/us

– Metaphors: “harvest” “firehose”

– Use of specialized vocabulary

– Use of the word “comfortable”

– Interview as a service encounter?

Other notes:

– This reminds me of Colm O’Muircheartaigh- from that old JPSM distinguished lecture

Embracing diversity
Allowing noise
Encouraging mixed methods

I wish his voice was a part of this discussion…

Posted in commentary, data, event, research, research methodology, skills, survey methodology
Tagged AAPOR, data, data analysis, design, quantitative, research methodology
7 Comments

September 28, 2012

A brave new vision of the future of social science

I’ve been typing and organizing my notes from yesterday’s dc-aapor event on the past, present and future of survey research (which I still plan to share soon, after a little grooming). The process has been a meditative one.

I’ve been thinking about how I would characterize these same phases- the past, present and future… and then I had a vision of sorts on the way home today that I’d like to share. I’m going to take a minute to be a little post apocalyptic and let the future build itself. You can think of it as a daydream or thought experiment…

The past, I would characterize as the grand discovery of surveys as a tool for data collection; the honing and evolution of that tool in conjunction with its meticulous scientific development and the changing landscape around it; and the growth to dominance and proliferation of the method. The past was an era of measurement, of the total survey error model, of social Science.

The present I would characterize as a rapid coming together, or a perfect storm that is swirling data and ideas and disciplines of study and professions together in a grand sweeping wind. I see the survey folks trudging through the wind, waiting for the storm to pass, feet firmly anchored to solid ground.

The future is essentially the past, turned on its head. The pieces of the past are present, but mixed together and redistributed. Instead of examining the ways in which questions elicit usable data, we look at the data first and develop the questions from patterns in the data. In this era, data is everywhere, of various quality, character and genesis, and the skill is in the sense making.

This future is one of data driven analytic strategies, where research teams intrinsically need to be composed of a spectrum of different, specialized skills.

The kings of this future will be the experts in natural language processing, those with the skill of finding and using patterns in language. All language is patterned. Our job will be to find those patterns and then to discover their social meaning.

The computer scientists and coders will write the code to extract relevant subsets of data, and describe and learn patterns in the data. The natural language processing folks will hone the patterns by grammar and usage. The netnographers will describe and interpret the patterns, the data visualizers will make visual or interactive sense of the patterns, the sociologists will discover constructions of relative social groupings as they emerge and use those patterns. The discourse analysts will look across wider patterns of language and context dependency. The statisticians will make formulas to replicate, describe and evaluate the patterns, and models to predict future behaviors. Data science will be a crucial science built on the foundations of traditional and nontraditional academic disciplines.

How many people does it take to screw in this lightbulb? It depends on the skills of the people or person on the ladder.

Where do surveys fit in to this scheme? To be honest, I’m not sure. The success of surveys seems to rest in part on the failure of faster, cheaper methods with a great deal more inherent error.

This is not the only vision possible, but it’s a vision I saw while commuting home at the end of a damned long week… it’s a vision where naturalistic data is valued and experimentation is an extension of research, where diversity is a natural assumption of the model and not a superimposed dynamic, where the data itself and the patterns within it determine what is possible from it. It’s a vision where traditional academics fit only precariously; a future that could just as easily be ruled out by the constraints of the past as it could be adopted unintentionally, where meaning makers rush to be the rigs in the newest gold rush and theory is as desperately pursued as water sources in a drought.

Posted in commentary, data, ethnography of communication, event, research, research methodology, skills, social media, sociolinguistics, survey methodology
2 Comments

September 16, 2012

The Bones of Solid Research?

What are the elements that make research “research” and not just “observation?” Where are the bones of the beast, and do all strategies share the same skeleton?

Last Thursday, in my Ethnography of Communication class, we spent the first half hour of class time taking field notes in the library coffee shop. Two parts of the experience struck me the hardest.

1.) I was exhausted. Class came at the end of a long, full work day, toward the end of a week that was full of back to school nights, work, homework and board meetings. I began my observation by ordering a (badly needed) coffee. My goal as I ordered was to see how few words I had to utter in order to complete the transaction. (In my defense, I am usually relatively talkative and friendly…) The experience of observing and speaking as little as possible reminded me of one of the coolest things I’d come across in my degree study: Charlotte Linde, SocioRocketScientist at NASA

2.) Charlotte Linde, SocioRocketScientist at NASA. Dr Linde had come to speak with the GU Linguistics department early in my tenure as a grad student. She mentioned that her thesis had been about the geography of communication- specifically: How did the layout of an (her?) apartment building help shape communication within it?

This idea had struck me, and stayed with me, but it didn’t really make sense until I began to study Ethnography of Communication. In the coffee shop, I structured my fieldnotes like a map and investigated it in terms of zones of activities. Then I investigated expectations and conventions of communication in each zone. As a follow-up to this activity, I’ll either return to the same shop or head to another coffee shop to do some contrastive mapping.

The process of Ethnography embodies the dynamic between quantitative and qualitative methods for me. When I read ethnographic research, I really find myself obsessing over ‘what makes this research?’ and ‘how is each statement justified?’ Survey methodology, which I am still doing every day at work, is so deeply structured that less structured research is, by contrast, a bit bewildering or shocking. Reading about qualitative methodology makes it seem so much more dependable and structured than reading ethnographic research papers does.

Much of the process of learning ethnography is learning yourself; your priorities, your organization, … learning why you notice what you do and evaluate it the way you do… Conversely, much of the process of reading ethnographic research seems to involve evaluation or skepticism of the researcher, the researcher’s perspective and the researcher’s interpretation. As a reader, the places where the researcher’s perspective varies from mine is clear and easy to see, as much as my own perspective is invisible to me.

All of this leads me back to the big questions I’m grappling with. Is this structured observational method the basis for all research? And how much structure does observation need to have in order to qualify as research?

I’d be interested to hear what you think of these issues!

Posted in commentary, data, ethnography of communication, research, research methodology, sociolinguistics, Strategic Communications, survey methodology
Tagged commentary, data, data analysis, discourse analysis, intercultural, linguistics, linguistics in action, MLC, qualitative, quantitative, research methodology
Leave a comment

August 9, 2012

Could our attitude toward marketing determine our field’s future?

In our office, we call it the “cocktail party question:” What do you do for a living? For those of us who work in the area of survey research, this can be a particularly difficult question to answer. Not only do people rarely know much about our work, but they rarely have a great deal of interest in it. I like to think of myself as a survey methodologist, but it is easier in social situations to discuss the focus of my research than my passion for methodology. I work at the American Institute of Physics, so I describe my work as “studying people who study physics.” Usually this description is greeted with an uncomfortable laugh, and the conversation progresses elsewhere. Score!

But the wider lack of understanding of survey research can have larger implications than simply awkward social situations. It can also cause tension with clients who don’t understand our work, our process, or where and how we add expertise to the process. Toward this end, I once wrote a guide for working with clients that separated out each stage in the survey process and detailed what expertise the researcher brings to the stage and what expertise we need from the client. I hoped that it would be a way of both separating and affirming the roles of client and researcher and advertising our firm and our field. I have not ye had the opportunity to use this piece, because of the nature of my current projects, but I’d be happy to share it with anyone who is interested in using or adapting it.

I think about that piece often as I see more talk about big data and social media analysis. Data seems to be everywhere and free, and I wonder what affect this buzz will have on a body of research consumers who might not have respected the role of the researchers from the get-go. We worried when Survey Monkey and other automated survey tools came along, but the current bevvy of tools and attitudes could have an exponentially larger impact on our practice.

Survey researchers often thumb their nose at advertising, despite the heavy methodological overlap. Oftentimes there is a knee-jerk reaction against marketing speak. Not only do survey methodologists often thumb their/our noses at the goal and importance of advertising, but they/we often thumb their/our nose at what appears to be evidence of less rigorous methodology. This has led us to a ridiculous point where data and analyses have evolved quickly with the demand and heavy use of advertising and market researchers and evolved strikingly little in more traditional survey areas, like polling and educational research. Much of the rhetoric about social media analysis, text analysis, social network analysis and big data is directed at the marketing and advertising crowd. Translating it to a wider research context and communicating it to a field that is often not eager to adapt to it can be difficult. And yet the exchange of ideas between the sister fields has never been more crucial to our mutual survival and relevance.

One of the goals of this blog has been to approach the changing landscape of research from a methodologically sound, interdisciplinary perspective that doesn’t suffer from the artificial walls and divisions. As I’ve worked on the blog, my own research methodology has evolved considerably. I’m relying more heavily on mixed methods and trying to use and integrate different tools into my work. I’ve learned quite a bit from researchers with a wide variety of backgrounds, and I often feel like I’m belted into a car with the windows down, hurtling down the highways of progress at top speed and trying to control the airflow. And then I often glimpse other survey researchers out the window, driving slowly, sensibly along the access road alongside the highway. I wonder if my mentors feel the change of landscape as viscerally as I do. I wonder how to carry forward the anchors and quality controls that led to such high quality research in the survey realm. I wonder about the future. And the present. About who’s driving, and who in what car is talking to who? Using what gps?

Mostly I wonder: could our negative attitude toward advertising and market research drive us right into obscurity? Are we too quick to misjudge the magnitude of the changes afoot?

This post is meant to be provocative, and I hope it inspires some good conversation.

Posted in commentary, data, research, research methodology, social media, Strategic Communications, survey methodology
Tagged commentary, data, data analysis, Natural Language Processing, quantitative, research methodology, sentiment analysis, social media analysis, text analysis, text analytics, twitter
Leave a comment

Free Range Research

An aspiring postdisciplinarian surfs through the ebbs and flows of the changing research environment

Category Archives: commentary

Notes on the Past, Present and Future of Survey Methodology from #dcaapor

The Bones of Solid Research?