“Not everything that can be counted counts”

“Not everything that counts can be counted, and not everything that can be counted counts” – sign in Einstein’s Princeton office

This quote comes from one of my favorite survey reminder postcards of all time, along with an image from the Emilio Segre Visual Archives. The postcard layout was an easy and pleasant decision, made in association with a straightforward survey we have conducted for nearly a quarter century. …If only social media analysis could be so easy, pleasant or straightforward!

I am in the process of conducting an ethnography of DC taxi drivers. I was motivated to do this study by the persistent disconnect between the experiences and reports of the taxi drivers and riders I hear from regularly and the snarky (I know this term doesn't sound technical, but it is absolutely data-motivated!) riders who dominate participatory media online. My goal at this point in the project is to chase down the disconnect in media participation and see how it maps onto policy deliberations and offline experiences. This week I decided to explore ways of quantifying the disconnect.

Inspired by this article in JeDEM (the eJournal of eDemocracy and Open Government), I decided to start my search with a framework based on Social Network Analysis (SNA), in order to use elements of connectedness, authority and relevance as a base. Fortunately, SNA frameworks are widely available to analysts on a budget in the form of web search engines! I went through the first 22 search results for a particular area of interest to my study: the mandatory GPS policy. Of these 22 sites, only 11 had active web 2.0 components. Across all of these sites, there were just two comments from drivers. Three of the sites that didn't have any comments from drivers did have one post each that sympathized with or defended DC taxi drivers. The remaining three sites had no responses from taxi drivers and no sympathetic responses in defense of the drivers. Aside from a couple of comments that were difficult to interpret, the rest were negative comments about DC taxi drivers or the DC taxi industry. This matched my expectations and, predictably, didn't match any of my interviews or offline investigations.

The question at this point is one of denominator.

The easiest and least complicated denominator is the number of sites. By that measure, only one quarter of the sites had any representation from a DC taxi driver. This is significant, given that the discussions were about aspects of the drivers' livelihood, and the drivers will be the most closely affected by the regulatory changes. It is a good, solid statistic from which to investigate the influence of web 2.0 on local policy enactment. However, it doesn't begin to show the lack of representation the way a denominator such as the number of posts, number of posters, or number of opinions would. But each of these alternative denominators has its own set of headaches. Does it matter if one poster expresses an opinion once and another expresses a slightly different opinion more than once? If everyone agrees, what should the denominator be? What about responses that contain links that are now defunct, or insider references that aren't meaningful to me? Should I consider measures of social capital, endorsements, social connectedness, or the backgrounds of individual posters?
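To make the sensitivity to denominator choice concrete, here is a minimal sketch of how the same two driver comments look under three different denominators. The per-site breakdown and the total comment count are hypothetical stand-ins, not the actual tallies from this search.

```python
# Illustrative counts loosely based on the search described above; the
# number of sites with a driver comment and the total comment count are
# assumptions for the sake of the sketch.
sites_reviewed = 22
sites_with_web20 = 11
sites_with_driver_comment = 2   # assumed: one site per driver comment
driver_comments = 2
total_comments = 50             # assumed; the post gives no overall total

rates = {
    "share of all sites with a driver voice": sites_with_driver_comment / sites_reviewed,
    "share of active sites with a driver voice": sites_with_driver_comment / sites_with_web20,
    "share of comments written by drivers": driver_comments / total_comments,
}

# The same underlying facts yield figures ranging from 4% to 18%.
for label, value in rates.items():
    print(f"{label}: {value:.1%}")
```

The point is not which figure is right, but that each denominator answers a subtly different question about representation.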

The simplest figure also doesn't show one of the most striking aspects of this finding: the relative markedness of these posts. In the context of predominantly short, snarky and clever responses, one of the comments began with a formal "Dear DC city councilmembers and intelligent taxpayers," and the other spread over three dense, winding posts in large paragraph form.

This brings up an important aspect of social media: that of social action. If every comment is a social action with social intentions, what are the intentions of the posters, and how can these be identified? I don't believe that the majority of the posts were intended as a voice in local politics, but the comments from the drivers clearly were. Most posts represent attempts to accrue social capital through humor, not attempts to have a voice in local politics. And they repeated phrases that recur in web 2.0 discussions about the DC taxi situation but are rarely repeated elsewhere. This observation, of course, is pretty meaningless without being anchored to the data itself, both quantitatively and qualitatively. And it makes for some interesting 'next steps' in a project that is certainly not short of 'next steps.'

The main point I want to make here is about the nature of variables in social media research. In a survey, you ask a question determined in advance and have a fixed set of answers to work with in your analysis; in social media research, you are free to choose your own variables. Each choice brings with it a set of constraints and advantages, and some fit your data better than others. But this freer path to analysis can be a more difficult one to take, and it demands more justification of the choices you make. A quantitative analysis, which can involve arbitrary or less-than-clear choices, is therefore best supplemented with a qualitative analysis that delves into the answers themselves and why they fit the coding structure you have imposed.

In all of this, I have quite a bit of work ahead of me.


I think I’m using “big data” incorrectly

I think I’m using the term “big data” incorrectly. When I talk about big data, I’m referring to the massive amount of freely available information that researchers can collect from the internet. My expectation is that the researchers must choose which firehose best fits their research goals, collect and store the data, and groom it to the point of usability before using it to answer targeted questions or examining it for answers in need of a question.

The first element that makes it "big data" to me is that the data is freely available and not subject to any privacy violations. It can be difficult to collect and store because of its sheer size, but it is not password protected. For this reason, I would not consider Facebook a source of "big data." I believe that the overwhelming majority of Facebook users impose some privacy controls, and the freely available information that remains cannot be assigned any kind of validity. There are plenty of measures of inclusion for online research, and ignorance of privacy rules or sheer exhibitionism are not target qualities by any of these standards.

The second crucial element of my definition of "big data" is structure. It is in any researcher's interest to understand the genesis and structure of their data as much as possible, both for the sake of grooming and for the sake of assigning some sense of validity to their findings. Targeted information will be laid out and signaled very differently in different online environments, and the researcher must work to develop both working delimiters to find probable targets and a sense of context for the data.

The third crucial element is representativeness. What do these findings represent? Under what conditions? "Big data" has a wide array of answers to these questions. First, it is crucial to note that it is not representative of the general population. It represents only the networked members of a population who were actively engaging with an online interface within the captured window of time in a way that left a trace or produced data. Because of this, we look at individual people by their networks, and not by their representativeness. Who did they influence, and to what degree could they influence those people? And we look at other units of analysis, such as the website that the people were contributing on, the connectedness of that website, and the words themselves, and their degree of influence, both directly and indirectly.

Given those elements of understanding, we are able to provide a framework from which the analysis of the data itself is meaningful and useful.

I’m aware that my definition is not the generally accepted definition. But for the time being I will continue to use it for two reasons:

1. Because I haven’t seen any other terms that better fit
2. Because I think that it is critically important that any talk about data use is tied to measures that encourage the researcher to think about the meaning and value of their data

It’s my hope that this is a continuing discussion. In the meantime, I will trudge on in idealistic ignorance.

Adventures in Digital Puberty

My digital enthusiasm hit a roadblock lately. My oldest daughter discovered the addictive world of social gaming. What began with her checking out an ad on TV for a gaming website soon evolved into pops of smuggled light in a dark room after bedtime. I looked into this gaming website, and I was able to read all kinds of horror stories about it. Parents told tales of bullying, of graphic talk and advances in chatrooms, and of kids receiving points for dating.

Once you consider some features of this site (the chatrooms, the constant clothes changing into mostly skimpy outfits, the pursuit of cash and fame, and the encouragement to "date"), it's easy to see this place as a playground for the perverted. It didn't help that my first questions about the site were answered by my daughter with a speech about the site's value as a teaching tool. Apparently they give quizzes, and they give you the answers if you get the questions wrong. So, for example, she learned from this site who Brad Pitt is married to. Although I am a big fan of learning tools, I'm not sure I'd characterize celebrity gossip as useful or necessary knowledge…

I know that some parents would (& do) forbid their kids from going to the site. My first reaction was to limit her time there as much as possible. But today I swallowed my prejudice and jumped in.

The truth is that if I did just dismiss this site altogether, she would still find ways to visit it. I would much rather that she not hide her activity, but instead have me to talk to about what she encounters on the site. So I told her about my experiences trying out chatrooms when I was younger and about what I’d read about this site. We talked in detail about the different features of her site. I offered to sit down with her any time she wanted to talk about things she saw. We talked about bullying, we talked about the possibility of people not being who they say they are, and we talked about making connections online. We talked about her favorite parts of the site and the parts that made her uncomfortable. She told me about the friends she made and what brought them together. And I pledged to talk to her about it again any time she wanted.

She was full of questions and of stories and examples, and I was really struck that I never would have heard any of it had I not gotten over my initial set of worries and discussed this with her. And what would that have meant? She wouldn’t have had a chance to vet her strategies for safety and bullying with me, and she wouldn’t feel comfortable sharing some of her stranger encounters. She would be left without my guidance when determining what was acceptable to her.

From time to time, we parents need a kick in the pants to remind us that raising kids isn’t about creating copies of ourselves, but about providing guidance and safety for them as they develop. She is a different person, growing among a different set of influences. And that’s okay with me.

I did, however, discuss all of this with her as we headed out to the woods to take a gadget free walk among the fall colors!

I conducted my first diversity training today…

One of the perks of my grad program is learning how to conduct diversity training.

Today I was able to put that skill to use for the first time. I conducted a workshop for a local parents group about Talking with your Kids about Race and Diversity. I co-facilitated it with Elvira Magomedova, a recent graduate from the MLC program who has more experience and more of a focus in this area. It was a really interesting and rewarding experience.

We did 4 activities:

1. We introduced ourselves by telling our immigration stories. I saw this last week at an open house at my daughter's middle school, and it profoundly reminded me about the personal ways in which we all embody global history and the immigrant nature of the US. Between feuding clans in Ireland, narrow escapes from the Holocaust and traveling singers in Europe, this exercise is both powerful and fun. Characters and events really come alive, and everyone is left on a more equal footing.

2. For the 2nd activity, we explored the ways in which we identify ourselves. We each put a circle in the center of a sheet of paper, and then we added four bubble spokes with groups or cultures or ways in which we identify ourselves. The exercise came from Cultural Awareness Learning Module One. At the bottom of the page, we explored these relationships more deeply, e.g. "I'm a parent, but I'm not a stay at home parent" or "I'm Muslim, but I'm not practicing my religion." We spoke in depth about our pages in pairs and then shared some with the group.

3. This is a fun activity for parents and kids alike. We split into two groups, culture A and culture B. Each culture has a list of practices, e.g. standing close or far, making eye contact or not, extensive vs. minimal greetings or leave-taking, shaking or not shaking hands, … The groups learn, practice, and then mingle. This is a profoundly awkward activity!

After mingling, we get back into the group and discuss the experience. It soon becomes obvious that people take differences in "culture" personally. People complain that it seemed like their interlocutors were just trying to get away from them, or seemed overly interested in them, or… They also complain about how hard it is to adjust your practices to act in the prescribed way.

This exercise is a good way for people to understand the ways in which conflicting cultural norms play out, and it helps parents to understand how to work out misunderstandings with their kids.

4. Finally, my daughter made a slide show of people from all over the world. The people varied in countless physical ways from each other, and we used them to stimulate conversation about physical differences. As adults, we tend to ascribe a bevy of sociological baggage to these physical differences, but the reality is that, unless we're Stephen Colbert, there are striking physical differences between people. As parents, we are often taken aback when our kids speak openly about differences that we've grown accustomed to not talking about. It's natural and normal to wonder how to handle these observations.

The upshot of this conversation is that describing anyone by a single physical category doesn’t really make sense. If you’re talking about a physical description of someone, you have a number of physical features to comment on. Whereas referring to anyone by a single physical feature could be offensive, a more detailed description is simply a more accurate physical description. We don’t have to use judgmental words, like “good hair,” but that shouldn’t stop us from talking about curly, straight, wavy, thick or thin. We can talk about people in terms of their height or body shape, face shape, hair texture, color or style, eye shape or color, mouth shape, ear size, nose style, skin tone, and so much more. Artificial racial or ethnic groupings don’t *really* describe what someone looks like, talks like, or has experienced.

More than this, once we have seen people in any kind of action, we have their actions and our relationship with them to use as resources. Given all of those resources, choosing race or ethnicity as a first descriptive level with our kids, or even using that descriptor and stopping, sends the message to the kids that that is the only feature that matters. It draws boundaries before it begins conversations. It passes “us and them” along.

Race and ethnicity are one way to describe a person, but they are far from the only way. And they, more than any other way, carry the most baggage. Does that mean they should be avoided or declared taboo?

This week in my Ethnography of Communication class, we each went to Gallaudet, the deaf university in DC, and observed. One of my classmates commented about her discomfort with her lack of fluency in ASL, or American Sign Language. Her comment reminded me of my kids and their cousins. My kids speak English, and only a little bit of Amharic and Tigrinya. Some of their cousins only spoke Tigrinya when they met. Some only spoke Swedish. Some spoke English with very different accents. But the language barriers never stopped them from playing with each other.

In fact, we talk about teaching our kids about diversity, but our kids should be the ones to teach us!

Here are the main lessons I’ve learned from my kids:

1. Don’t cut yourself off from people because you don’t share a common language. Communication actually runs much deeper than language. I think, for example, of one of my sisters inlaw. When we first met, we didn’t have a common language. But the more I was able to get to know her over time, the more we share. I really cherish my relationship with her, and I wouldn’t have it if I had let my language concerns get in the way of communicating with her.

2. People vary a lot, strikingly, in physical ways. These are worthy of comment, okay to notice, and important parts of what make people unique.

3. If you cut yourself off from discomfort or potential differences, you draw a line between you and many of the people around you.

4. It is okay to be wrong, or to still be learning. Learning is a lifelong process. Just because we’re adults doesn’t mean we have to have it all down pat. Don’t be afraid to fail, to mess up. Your fear will get you nowhere. How could you have learned anything if you were afraid of messing up?

In sum, this experience was a powerful one and an interesting one. I sincerely hope that the conversations we began will continue.

* Edited to Add:

Thandie Newton TED talk, Embracing Otherness

Chimamanda Adichie TED talk: The danger of a single story

GREAT letter with loads of resources: http://goodmenproject.com/ethics-values/why-i-dont-want-to-talk-about-race/

an interesting article that we read in class: why white parents don’t talk about race

another interesting article: Lippi Green 1997 Teaching Children How to Discriminate


Repeating language: what do we repeat, and what does it signal?

Yesterday I attended a talk by Jon Kleinberg entitled “Status, Power & Incentives in Social Media” in Honor of the UMD Human-Computer Interaction Lab’s 30th Anniversary.


This talk was dense and full of methods unfamiliar to me. He first discussed logical representations of human relationships, including orientations of sentiment and status, and then he ventured into discursive evidence of these relationships. Finally, he introduced formulas for influence in social media and talked about ways to manipulate them by incentivizing desired behavior and disincentivizing less desired behavior.


In Linguistics, we talk a lot about linguistic accommodation. In any communicative event, it is normal for participants' speech patterns to converge in some ways, whether through repetition of words or of grammatical structures. Kleinberg presented research on the social meaning of linguistic accommodation, showing that participants with less power tend to accommodate participants with more power more than the reverse. This idea of quantifying social influence is a very powerful notion in online research, where social influence is a more practical and useful research goal than general representativeness.
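One simple way to put a number on accommodation, sketched below on invented data: compare how often a reply uses a word category when the message it answers used that category against the reply's overall baseline rate. This is a much-simplified cousin of the coordination measures used in this line of research; the message pairs and the function-word list here are made up for illustration.

```python
# A minimal accommodation sketch: P(reply uses category | message used it)
# minus P(reply uses category) overall. Positive values suggest convergence.
# FUNCTION_WORDS and the example pairs are invented, not from any corpus.

FUNCTION_WORDS = {"i", "you", "we", "the", "a", "of", "in", "that"}

def uses_category(text: str, category: set[str]) -> bool:
    """True if any whitespace-separated token of text is in the category."""
    return any(word in category for word in text.lower().split())

def accommodation(pairs: list[tuple[str, str]], category: set[str]) -> float:
    """pairs = (original message, reply). Returns the conditional rate of
    category use in replies to category-using messages, minus the baseline
    rate of category use across all replies."""
    replies_using = [uses_category(reply, category) for _, reply in pairs]
    triggered = [uses_category(reply, category)
                 for msg, reply in pairs if uses_category(msg, category)]
    if not triggered:
        return 0.0  # no message used the category; nothing to condition on
    baseline = sum(replies_using) / len(replies_using)
    conditional = sum(triggered) / len(triggered)
    return conditional - baseline

pairs = [
    ("Do you think the fare hike is fair?", "I think the hike hurts riders."),
    ("Cabs never stop here.", "Tried hailing one yesterday, no luck."),
    ("We should email the council.", "We should, and I will."),
]
print(accommodation(pairs, FUNCTION_WORDS))
```

A real measure would control for message length and topic, and would be computed per speaker pair, but even this toy version shows how repetition can be turned into a comparable statistic.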


I wonder what strategies we use, consciously and unconsciously, when we accommodate other speakers. I wonder whether different forms of repetition have different underlying social meanings.


At the end of the talk, there was some discussion about both the constitution of iconic speech (unmarked words assembled in marked ways) and the meaning of norm flouting.


These are very promising avenues for online text research, and it is exciting to see them play out.

Getting to know your data

On Friday, I had the honor of participating in a microanalysis video discussion group with Fred Erickson. As he was introducing the process to the new attendees, he said something that really caught my attention. He said that videos and field notes are not data until someone decides to use them for research.

As someone with a background in survey research, the question of ‘what is data?’ was never really on my radar before graduate school. Although it’s always been good practice to know where your data comes from and what it represents in order to glean any kind of validity from your work, data was unquestioningly that which you see in a spreadsheet or delimited file, with cases going down and variables going across. If information could be formed like this, it was data. If not, it would need some manipulation. I remember discussing this with Anna Trester a couple of years ago. She found it hard to understand this limited framework, because, for her, the world was a potential data source. I’ve learned more about her perspective in the last couple of years, working with elements that I never before would have characterized as data, including pictures, websites, video footage of interactions, and now fieldwork as a participant observer.

Dr Erickson's observation speaks to some frustration I've had lately, trying to understand the nature of "big data" sets. I've seen quite a few people looking for data, any data, to analyze. I could see the usefulness of this for corpus linguists, who use large bodies of textual data to study language use. A corpus linguist is able to use large bodies of text to see how we use words, a systematically patterned phenomenon that goes much deeper than any dictionary definition could. I could also see the usefulness of large datasets in training programs to recognize genre, a really critical element in automated text analysis.

But beyond that, it is deeply important to understand the situated nature of language. People don’t produce text for the sake of producing text. Each textual element represents an intentioned social action on the part of the writer, and social goals are accomplished differently in different settings. In order for studies of textual data to produce valid conclusions with social commentary, contextual elements are extremely important.

Which leads me to ask: are these agnostic datasets being used solely as academic exercises by programmers and corpus linguists, or has our hunger for data led us to take any large body of information and declare it to be useful data from which to extract valid conclusions? Worse, are people using cookie-cutter programs to investigate agnostic datasets like this without considering the wider validity?

I urge anyone looking to create insight from textual data to carefully get to know their data.

Notes on the Past, Present and Future of Survey Methodology from #dcaapor

I had wanted to write these notes up into paragraphs, but I think they will be more timely, relevant and readable if I share them as they are. This was a really great conference (very relevant and timely) based on a really great issue of Public Opinion Quarterly. As I was reminded at the DC African Festival (a great festival, lots of fun, highly recommended) on Saturday, "In order to understand the future you must embrace the past."

DC AAPOR Annual Public Opinion Quarterly Special Issue Conference

75th Anniversary Edition

The Past, Present and Future of Survey Methodology and Public Opinion Research

Look out for slides from the event here: http://www.dc-aapor.org/pastevents.php


Note: Of course, I took more notes in some sessions than others…

Peter Miller:

–       Adaptive design (tracking changes in estimates across mailing waves and tracking response bias) is becoming standard practice at Census

–       Check out Howard Schuman’s article tracking attitudes toward Christopher Columbus

  • Ended up doing some field research in the public library, reading children’s books

Stanley Presser:

–       Findings have no meaning independent of the method with which they were collected

–       Balance of substance and method make POQ unique (this was a repeated theme)

Robert Groves:

–       The survey was the most important invention in Social Science in the 20th century – quote credit?

–       3 eras of survey research (boundaries somewhat arbitrary)

  • 1930-1960
    • Foundation laid, practical development
  • 1960-1990
    • Founders pass on their survey endeavors to their protégés
    • From face to face to phone and computer methods
    • Emergence & Dominance of Dillman method
    • Growth of methodological research
    • Total Survey Error perspective dominates
    • Big increase in federal surveys
    • Expansion of survey centers & private sector organizations
    • Some articles say the survey method is dying because of nonresponse and rising costs. This is a perennial debate. Groves speculated that around every big election, someone finds it in their interest to doubt the polls and assigns a jr reporter to write a piece calling the polls into question.
  • 1990-present
    • Influence of other fields, such as social cognitive psychology
    • Nonresponse up, costs up → volunteer panels
    • Mobile phones decrease cost effectiveness of phone surveys
    • Rise of internet only survey groups
    • Increase in surveys
    • Organizational/ business/ management skills more influential than science/ scientists
    • Now: software platforms, culture clash with all sides saying “Who are these people? Why do they talk so funny? Why don’t they know what we know?”
    • Future
      • Rise of organic data
      • Use of administrative data
      • Combining data sets
      • Proprietary data sets
      • Multi-mode
      • More statistical gymnastics

Mike Brick:

  • Society’s demand for information is Insatiable
  • Re: Heckathorn/ Respondent Driven samples
    • Adaptive/ indirect sampling is better
    • Model based methods
      • Missing data problem
      • Cost the main driver now
      • Estimation methods
      • Future
        • Rise of multi-frame surveys
        • Administrative records
        • Sampling theory w/nonsampling errors at design & data collection stages
          • Sample allocation
          • Responsive & adaptive design
          • Undercoverage bias can’t be fixed at the back end
            • *Biggest problem we face*
            • Worse than nonresponse
            • Doug Rivers (2007)
              • Math sampling
              • Web & volunteer samples
              • 1st shot at a theory of nonprobability sampling
            • Quota sampling failed in 2 high profile examples
              • Problem: sample from interviews/ biased
              • But that’s FIXABLE
            • Observational
              • Case control & eval studies
              • Focus on single treatment effect
              • “tougher to measure everything than to measure one thing”

Mick Couper:

–       Mode an outdated concept

  • Too much variety and complexity
  • Modes are multidimensional
    • Degree of interviewer involvement
    • Degree of contact
    • Channels of communication
    • Level of privacy
    • Technology (used by whom?)
    • Synchronous vs. asynchronous
  • More important to look at dimensions other than mode
  • Mode is an attribute of a respondent or item
  • Basic assumption of mixed mode is that there is no difference in responses by mode, but this is NOT true
    • We know of many documented, nonignorable, nonexplainable mode differences
    • Not “the emperor has no clothes” but “the emperor is wearing suggestive clothes”
    • Dilemma: differences not Well understood
      • Sometimes theory comes after facts
      • That’s where we are now- waiting for the theory to catch up (like where we are on nonprobability sampling)

–       So, the case for mixed mode collection so far is mixed

  • Mail w/web option has been shown to have a lower response rate than mail only across 24-26 studies, at least!!
    • (including Dillman, JPSM, …)
    • Why? What can we do to fix this?
    • Sequential modes?
      • Evidence is really mixed
      • The impetus for this is more cost than response rate
      • No evidence that it brings in a better mix of people

–       What about Organic data?

  • Cheap, easily available
  • But good?
  • Disadvantages:
    • One var at a time
    • No covariates
    • Stability of estimates over time?
    • Potential for mischief
      • E.g. open or call-in polls
      • My e.g. #muslimrage
  • Organic data wide, thin
  • Survey data narrow, deep

–       Face to face

  • Benchmark, gold standard, increasingly rare

–       Interviewers

  • Especially helpful in some cases
    • Nonobservation
    • Explaining, clarifying

–       Future

  • Technical changes will drive dev’t
  • Modes and combinations of modes will proliferate
  • Selection bias The Biggest Threat
  • Further proliferation of surveys
    • Difficult for us to distinguish our work from “any idiot out there doing them”

–       Surveys are tools for democracy

  • Shouldn’t be restricted to tools for the elite
  • BUT
  • There have to be some minimum standards

–       “Surveys are tools and methodologists are the toolmakers”

Nora Cate Schaeffer:

–       Jen Dykema read & summarized 78 design papers- her summary is available in the appendix of the paper

–       Dynamic interactive displays for respondent in order to help collect complex data

–       Making decisions when writing questions

  • See flow chart in paper
    • Some decisions are nested
  • Question characteristics
    • E.g. presence or absence of a feature
      • E.g. response choices

Sunshine Hillygus:

–       Political polling is “a bit of a bar trick”

  • The best value in polls is in understanding why the election went the way it did

–       Final note: “The things we know as a field are going to be important going forward, even if it’s not in the way they’ve been used in the past”

Lori Young and Diana Mutz:

–       Biggest issues:

  • Diversity
  • Selective exposure
  • Interpersonal communication

–       2 kinds of search, influence of each

  • Collaborative filter matching, like Amazon
    • Political targeting
    • Contentious issue: 80% of people said that if they knew a politician was targeting them they wouldn’t vote for that candidate
      • My note: interesting to think about people's relationships with their superficial categories of identity; it's taken for granted so much in social science research, yet not by the people within the categories

–       Search engines: the new gatekeepers

  • Page rank & other algorithms
  • No one knows what influence personalization of search results will have
  • Study on search learning: engines given systematically different training input (from the same starting point) produced results that changed fast and substantively

Rob Santos:

–       Necessity mother of invention

  • Economic pressure
  • Reduce costs
  • Entrepreneurial spirit
  • Profit
  • Societal changes
    • Demographic diversification
      • Globalization
      • Multi-lingual
      • Multi-cultural
      • Privacy concerns
      • Declining participation

–       Bottom line: we adapt. Our industry Always Evolves

–       We’re “in the midst of a renaissance, reinventing ourselves”

  • Me: That’s framing for you! Wow!

–       On the rise:

  • Big Data
  • Synthetic Data
    • Transportation industry
    • Census
    • Simulation studies
      • E.g. How many people would pay x amount of income tax under y policy?
  • Bayesian Methods
    • Apply to probability and nonprobability samples
  • New generation
    • Accustomed to and EXPECT rapid technological turnover
    • Fully enmeshed in social media

–       3 big changes:

  • Non-probability sampling
    • “Train already left the station”
    • Level of sophistication varies
    • Model based inference
    • Wide public acceptance
    • Already a proliferation
  • Communication technology
    • Passive data collection
      • Behaviors
        • E.g. POS (point-of-sale) apps
        • Attitudes or opinions
      • Real time collection
        • Prompted recall (apps)
        • Burden reduction
          • Gamification
  • Big Data
    • What is it?
    • Data too big to store
      • (me: think "firehoses")
      • Volume, velocity, variety
      • Fuzzy inferences
      • Not necessarily statistical
      • Coarsens insights

–       We need to ask tough questions

  • (theme of next AAPOR conference is just that)
  • We need to question probability samples, too
    • Flawed designs abound
    • High nonresponse & noncoverage
    • Can’t just scrutinize nonprobability samples
  • Nonprobability designs
    • Some good, well accepted methods
    • Diagnostics for measurement
      • How to measure validity?
      • What are the clues?
      • How to create a research agenda to establish validity?
  • Expanding the players
    • Multidisciplinary
      • Substantive scientists
      • Math stats
      • Modelers
      • Econometricians
  • We need
    • Conversations with practitioners
    • Better listening skills

–       AAPOR’s role

  • Create forum for conversation
  • Encourage transparency
  • Engage in outreach
  • Understanding limitations but learning approaches

–       We need to explore the utility of nonprobability samples

–       Insight doesn’t have to be purely from statistical inferences

–       The biggest players in big data to date include:

  • Computational scientists
  • Modelers/ synthetic data’ers

–       We are not a “one size fits all” society, and our research tools should reflect that

My big questions:

–       “What are the borders of our field?”

–       “What makes us who we are, if we don’t do surveys even primarily?”

Linguistic notes:

–       Use of we/who/us

–       Metaphors: “harvest” “firehose”

–       Use of specialized vocabulary

–       Use of the word “comfortable”

–       Interview as a service encounter?

Other notes:

–       This reminds me of Colm O’Muircheartaigh- from that old JPSM distinguished lecture

  • Embracing diversity
  • Allowing noise
  • Encouraging mixed methods

I wish his voice were a part of this discussion…