A brave new vision of the future of social science

I’ve been typing and organizing my notes from yesterday’s dc-aapor event on the past, present and future of survey research (which I still plan to share soon, after a little grooming). The process has been a meditative one.

I’ve been thinking about how I would characterize these same phases: the past, present and future… and then I had a vision of sorts on the way home today that I’d like to share. I’m going to take a minute to be a little post-apocalyptic and let the future build itself. You can think of it as a daydream or thought experiment…

The past, I would characterize as the grand discovery of surveys as a tool for data collection; the honing and evolution of that tool in conjunction with its meticulous scientific development and the changing landscape around it; and the growth to dominance and proliferation of the method. The past was an era of measurement, of the total survey error model, of social science.

The present I would characterize as a rapid coming together, or a perfect storm that is swirling data and ideas and disciplines of study and professions together in a grand sweeping wind. I see the survey folks trudging through the wind, waiting for the storm to pass, feet firmly anchored to solid ground.

The future is essentially the past, turned on its head. The pieces of the past are present, but mixed together and redistributed. Instead of examining the ways in which questions elicit usable data, we look at the data first and develop the questions from patterns in the data. In this era, data is everywhere, of varying quality, character and genesis, and the skill is in the sense making.

This future is one of data-driven analytic strategies, where research teams intrinsically need to bring together a spectrum of different, specialized skills.

The kings of this future will be the experts in natural language processing, those with the skill of finding and using patterns in language. All language is patterned. Our job will be to find those patterns and then to discover their social meaning.

The computer scientists and coders will write the code to extract relevant subsets of data, and describe and learn patterns in the data. The natural language processing folks will hone the patterns by grammar and usage. The netnographers will describe and interpret the patterns, the data visualizers will make visual or interactive sense of the patterns, the sociologists will discover constructions of relative social groupings as they emerge and use those patterns. The discourse analysts will look across wider patterns of language and context dependency. The statisticians will make formulas to replicate, describe and evaluate the patterns, and models to predict future behaviors. Data science will be a crucial science built on the foundations of traditional and nontraditional academic disciplines.

How many people does it take to screw in this lightbulb? It depends on the skills of the people or person on the ladder.

Where do surveys fit into this scheme? To be honest, I’m not sure. The success of surveys seems to rest in part on the failure of faster, cheaper methods with a great deal more inherent error.

This is not the only vision possible, but it’s a vision I saw while commuting home at the end of a damned long week… it’s a vision where naturalistic data is valued and experimentation is an extension of research, where diversity is a natural assumption of the model and not a superimposed dynamic, where the data itself and the patterns within it determine what is possible from it. It’s a vision where traditional academics fit only precariously; a future that could just as easily be ruled out by the constraints of the past as it could be adopted unintentionally, where meaning makers rush to be the rigs in the newest gold rush and theory is as desperately pursued as water sources in a drought.


The Bones of Solid Research?

What are the elements that make research “research” and not just “observation?” Where are the bones of the beast, and do all strategies share the same skeleton?

Last Thursday, in my Ethnography of Communication class, we spent the first half hour of class time taking field notes in the library coffee shop. Two parts of the experience struck me the hardest.

1.) I was exhausted. Class came at the end of a long, full work day, toward the end of a week that was full of back-to-school nights, work, homework and board meetings. I began my observation by ordering a (badly needed) coffee. My goal as I ordered was to see how few words I had to utter in order to complete the transaction. (In my defense, I am usually relatively talkative and friendly…) The experience of observing and speaking as little as possible reminded me of one of the coolest things I’d come across in my degree study: Charlotte Linde, SocioRocketScientist at NASA.

2.) Charlotte Linde, SocioRocketScientist at NASA. Dr Linde had come to speak with the GU Linguistics department early in my tenure as a grad student. She mentioned that her thesis had been about the geography of communication, specifically: how did the layout of an (her?) apartment building help shape communication within it?

This idea struck me and stayed with me, but it didn’t really make sense until I began to study Ethnography of Communication. In the coffee shop, I structured my fieldnotes like a map and investigated the space in terms of zones of activity. Then I investigated the expectations and conventions of communication in each zone. As a follow-up to this activity, I’ll either return to the same shop or head to another coffee shop to do some contrastive mapping.

For me, the process of ethnography embodies the dynamic between quantitative and qualitative methods. When I read ethnographic research, I find myself obsessing over ‘what makes this research?’ and ‘how is each statement justified?’ Survey methodology, which I still do every day at work, is so deeply structured that less structured research is, by contrast, a bit bewildering or shocking. Reading about qualitative methodology makes it seem far more dependable and structured than reading ethnographic research papers does.

Much of the process of learning ethnography is learning yourself: your priorities, your organization, … learning why you notice what you do and evaluate it the way you do… Conversely, much of the process of reading ethnographic research seems to involve evaluation of, or skepticism toward, the researcher, the researcher’s perspective and the researcher’s interpretation. As a reader, I find the places where the researcher’s perspective differs from mine clear and easy to see, even as my own perspective remains invisible to me.

All of this leads me back to the big questions I’m grappling with. Is this structured observational method the basis for all research? And how much structure does observation need to have in order to qualify as research?

I’d be interested to hear what you think of these issues!

Unlocking patterns in language

In the study of linguistics, we quickly learn that all language is patterned. Although the actual words we produce vary widely, the process of production does not. The process of constructing baby talk was found to be consistent across kids from 15 different languages. When any two people who do not speak overlapping languages come together and try to speak, the process is the same. When we look at any large body of data, we quickly learn that just about any linguistic phenomenon is subject to statistical likelihood. Grammatical patterns govern the basic structure of what we see in the corpus. Variations in language use may tweak these patterns, but each variation is a patterned tweak with its own set of statistical likelihoods. Variations that people are quick to call bastardizations are actually patterned departures from what those people consider to be “standard” English. Understanding “differences not deficits” is a crucially important part of understanding and processing language, because any variation, even texting shorthand, “broken English,” or slang, can be better understood and used once its underlying structure is recognized.

The patterns in language extend beyond grammar to word usage. The most frequent words in a corpus are function words such as “a” and “the,” and the most frequent collocations are combinations like “and the” or “and then it.” These patterns shape the findings of many investigations into textual data. A certain phrase may show up frequently in a dataset simply because it is a common or lexicalized expression, while another combination may not appear because it is rarer. This can be particularly problematic, because what is rare is often more noticeable or important.
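To see how strongly function words dominate raw counts, here is a minimal Python sketch that tallies single words and two- and three-word sequences (n-grams) in a text. The sample sentences are invented for illustration, and the lowercase regex tokenizer is a deliberate simplification; a real corpus study would use a proper tokenizer and a much larger body of text.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude regex tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(tokens, n):
    """Count every contiguous n-word sequence (n-gram) in the token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Hypothetical toy text, invented for illustration only.
sample = (
    "We went to the store and then it rained. "
    "We waited at the door and then it stopped, "
    "and the rain left a puddle by the car."
)

tokens = tokenize(sample)
word_freq = Counter(tokens)           # single-word frequencies
bigrams = ngram_counts(tokens, 2)     # two-word sequences
trigrams = ngram_counts(tokens, 3)    # three-word sequences

print(word_freq.most_common(3))
print(bigrams.most_common(2))
print(trigrams.most_common(1))
```

Even in this tiny sample, “the” and “and” top the word list and “and then it” is the only three-word sequence that occurs twice, while content words like “puddle” appear just once. Filtering out function words with a stopword list is one common next step, but the rare items that remain are often the ones worth a closer look.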

Here are some good starter questions to ask to better understand your textual data:

1) Where did this data come from? What was its original purpose and context?

2) What did the speakers intend to accomplish by producing this text?

3) What type of data or text, or genre, does this represent?

4) How was this data collected? Where is it from?

5) Who are the speakers? What is their relationship to each other?

6) Is there any cohesion to the text?

7) What language is the text in? What is the linguistic background of the speakers?

8) Who is the intended audience?

9) What kind of repetition do you see in the text? What about repetition within the context of a conversation? What about repetition of outside elements?

10) What stands out as relatively unusual or rare within the body of text?

11) What is relatively common within the dataset?

12) What register is the text written in? Casual? Academic? Formal? Informal?

13) Pronoun use. Always look at pronoun use. It’s almost always enlightening.
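Questions 10, 11 and 13 lend themselves to a quick computational first pass. The Python sketch below, offered as an illustration rather than a standard method, counts words, pulls out the words used only once (hapax legomena) as a crude proxy for what is rare, and tallies pronouns. The pronoun set is a small hand-picked assumption, not a complete inventory, and the two-speaker exchange is invented.

```python
import re
from collections import Counter

# A small, hand-picked set of English personal pronouns: an assumption for
# illustration, not an exhaustive inventory.
PRONOUNS = {
    "i", "me", "my", "mine", "we", "us", "our", "ours",
    "you", "your", "yours", "he", "him", "his", "she", "her", "hers",
    "they", "them", "their", "theirs", "it", "its",
}


def profile(text):
    """Return word counts, pronoun counts, and once-only (hapax) words for a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # Keep only the tokens that appear in the hand-picked pronoun set.
    pronoun_counts = Counter({w: c for w, c in counts.items() if w in PRONOUNS})
    # Words that occur exactly once, sorted alphabetically.
    hapaxes = sorted(w for w, c in counts.items() if c == 1)
    return counts, pronoun_counts, hapaxes


# Hypothetical exchange between two speakers, invented for illustration only.
transcript = (
    "We should tell them before they hear it from you. "
    "I told you, they already know. "
    "Then we need to talk to them today."
)

counts, pronouns, rare = profile(transcript)
print("most common:", counts.most_common(3))
print("pronouns:", pronouns.most_common())
print("used once:", rare)
```

The counts only tell you where to look. In this made-up exchange, pronouns make up 10 of the 24 tokens, with “we,” “they,” “them” and “you” each used twice; working out what that distribution says about the speakers and their relationship is still interpretive work.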

These types of questions will take you much further into your dataset than the knee-jerk question “What is this text about?”

Now, go forth and research! …And be sure to report back!