Methodology will only get you so far

I’ve been working on a post about humility as an organizational strategy. This is not that post, but it is also about humility.

I like to think of myself as a research methodologist, because I’m more interested in research methods than any specific area of study. The versatility of methodology as a concentration is actually one of the biggest draws for me. I love that I’ve been able to study everything from fMRI subjects and brain surgery patients to physics majors and teachers, taxi drivers and internet activists. I’ve written a paper on Persepolis as an object of intercultural communication and a paper on natural language processing of survey responses, and I’m currently studying migration patterns and communication strategies.

But a little dose of humility is always a good thing.

Yesterday I hosted the second in a series of Online Research, Offline Lunches that I’ve been coordinating. The lunches are intended as a way to bring together people from different sectors and fields who are conducting research on the internet, so that they can talk about their work across the artificial boundaries of field and sector. These lunches change character as the field and attendees change.

I’ve been following the field of online research for many years now, and it has changed dramatically and continually before my eyes. Just a year ago Seth Grimes’ Sentiment Analysis Symposia were at the forefront of the field, and now I wonder if he is thinking of changing the title and focus of his events. Two years ago tagging text corpora with grammatical units was a standard midstep in text analysis, and now machine algorithms are far more common and often much more effective, demonstrating that grammar in use is far enough afield from grammar in theory to generate a good deal of error. Ten years ago qualitative research was often more focused on the description of platforms than the behaviors specific to them, and now the specific inner workings of platforms are much more of an aside to a behavioral focus.

The Association of Internet Researchers is currently holding its conference in Denver (#ir14), generating more than 1000 posts per day under the conference hashtag and probably moving the field far ahead of where it was earlier this week.

My interest and focus have been on the methodology of internet research. I’ve been learning everything from qualitative methods to natural language processing and social network analysis to Bayesian methods. I’ve been advocating for a world where different kinds of methodologists work together, where qualitative research informs algorithms and linguists learn from the differences between theoretical grammar and machine-learned grammar, a world where computer scientists work iteratively with qualitative researchers. But all of these methods fall short because there is an elephant in the methodological room. This elephant, ladies and gentlemen, is made of content. Is it enough to be a methodological specialist, swinging from project to project, grazing on the top layer of content knowledge without ever taking anything down to its root?

As a methodologist, I am free to travel from topic area to topic area, but I can’t reach the root of anything without digging deeper.

At yesterday’s lunch we spoke a lot about data. We spoke about how the notion of data means such different things to different researchers. We spoke about the form and type of data that different researchers expect to work with, how they groom data into the forms they are most comfortable with, how the analyses are shaped by the data type, how data science is an amazing term because just about anything could be data. And I was struck by the wide-openness of what I was trying to do. It is one thing to talk about methodology within the context of survey research or any other specific strategy, but what happens when you go wider? What happens when you bring a bunch of methodologists of all stripes together to discuss methodology? You lack the depth that content brings. You introduce a vast tundra of topical space to cover. But can you achieve anything that way? What holds together this wide realm of “research”?

We speak a lot about the lack of generalizable theories in internet research. Part of the hope for qualitative research is that it will create generalizable findings that can drive better theories and improve algorithmic efforts. But that partnership has been slow, and the theories have been sparse and lightweight. Is it possible that the internet is a space where theory alone just doesn’t cut it? Could it be that methodologists need to embrace content knowledge to a greater degree in order to make any of the headway we so desperately want to make?

Maybe the missing piece of the puzzle is actually the picture painted on the pieces?


The data Rorschach test, or what does your research say about you?

Sure, there is an abundance of personality tests that we could take: inkblot tests, standardized cognitive tests, magazine quizzes, etc. But researchers participate in Rorschach tests of our own every day. There is a series of questions we ask as part of the research process, like:

What data do we want to collect or use? (What information is valuable to us? What do we call data?)

What format are we most comfortable with it in? (How clean does it have to be? How much error are we comfortable with? Does it have to resemble a spreadsheet? How will we reflect sources and transformations? What can we equate?)

What kind of analyses do we want to conduct? (This is usually a great time for our preexisting assumptions about our data to rear their heads. How often do we start by wondering if we can confirm our biases with data?!)

What results do we choose to report? To whom? How will we frame them?

If nothing else, our choices regarding our data reflect many of our values as well as our professional and academic experiences. If you’ve ever sat in on a research meeting, you know that “you want to do WHAT with which data?!” feeling that comes when someone suggests something that you had never considered.

Our choices also speak to the research methods that we are most comfortable with. Last night I attended a meetup event about Natural Language Processing, and it quickly became clear that the mathematician felt most comfortable when the data was transformed into numbers, the linguist felt most comfortable when the data was transformed into words and lexical units, and the programmer was most comfortable focusing on the program used to analyze the data. These three researchers confronted similar tasks, but their three different methods will yield very different results.

As humans, we have a tendency to make assumptions about the people around us, either by assuming that they are very different or very much the same. Those of you who have seen or experienced a marriage or serious long-term partnership up close are probably familiar with the surprised feeling we get when we realize that one partner thinks differently about something that we had always assumed they would not differ on. I remember, for example, that small feeling that my world was upside down just a little bit when I opened a drawer in the kitchen and saw spoons and forks together in the utensil organizer. It had simply never occurred to me that anyone would mix the two, especially not my own husband!

My main point here is not about my husband’s organizational philosophy. It’s about the different perspectives inherently tied up in the research process. It can be hard to step outside our own perspective enough to see what pieces of ourselves we’ve imposed on our research. But that awareness is an important element in the quality control process. Once we can see what we’ve done, we can think much more carefully about the strengths and weaknesses of our process. If you believe there is only one way, it may be time to take a step back and gain a wider perspective.

Statistical Text Analysis for Social Science: Learning to Extract International Relations from the News

I attended another great CLIP event today, Statistical Text Analysis for Social Science: Learning to Extract International Relations from the News, by Brendan O’Connor, CMU. I’d love to write it up, but I decided instead to share my notes. I hope they’re easy to follow. Please feel free to ask any follow-up questions!

 

Computational Social Science

– Then: 1890 census tabulator (hand-cranked punch card tabulator)

– Now: automated text analysis

 

Goal: develop methods for predicting (etc.) conflicts

– events = data

– extracting events from news stories

– information extraction from large scale news data

– goal: time series of country-country interactions

– who did what to whom? in what order?

Long history of manual coding of this kind of data for this kind of purpose

– more recently: rule based pattern extraction, TABARI

– → developing event types (diplomatic events, aggressions, …) from verb patterns

– TABARI: 15,000 hand-engineered coding patterns over the course of 2 decades

– → very difficult, validity issues, changes over time

– all developed by political scientists (Schrodt 1994), in MUCK (sp?) days

– still a common poli sci methodology

– GDELT project: software, etc. w/ pre- & postprocessing

http://gdelt.utdallas.edu

– Sources: mainstream media news, English language, select sources

 

THIS research

– automatic learning of event types

– extract events/ political dynamics

→ use Bayesian probabilistic methods

– using social context to drive unsupervised learning about language

– data: Gigaword corpus (news articles) – a few extra sources (end result mostly AP articles)

– named entities- dictionary of country names

– news biases difficult to take into account (inherent complication of the dataset)(future research?)

– main verb based dependency path (so data is pos tagged & then sub/obj tagged)

– 3 components: source (acting country)/ recipient (recipient country)/ predicate (dependency path)

– loosely Dowty 1990

– International Relations (IR) is heavily concerned with reciprocity- that affects/shapes coding, goals, project dynamics (e.g. timing less important than order, frequency, symmetry)

– parsing- core NLP

– filters (e.g. Georgia country vs. Georgia state) (manual coding statements)

– analysis more focused on verb than object (e.g. text following “said that” excluded)

– 50% accuracy finding main verb (did I hear that right? ahhh pos taggers and their many joys…)

– verb: “reported that” – complicated: who is a valid source? reported events not necessarily verified events

– verb: “know that” another difficult verb
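To make the three-part event structure above concrete, here is a minimal sketch of how one extracted event might be represented and filtered. It is only my illustration of the notes, not the speaker’s code: the `Event` class, the `embedded_under_said` flag, the `keep_event` function, and the tiny country set are all hypothetical names, and the dependency path is simplified to a plain string.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    source: str        # acting country
    recipient: str     # country acted upon
    predicate: str     # verb-based dependency path, simplified here to a string
    day: date          # date of the article the event came from
    embedded_under_said: bool = False  # hypothetical flag: text followed "said that"


# Hypothetical, tiny stand-in for the dictionary of country names.
COUNTRIES = {"USA", "IRN", "GEO"}


def keep_event(event, countries):
    """Keep events between two distinct known countries that are not
    embedded under "said that" (assumed filtering rules, for illustration)."""
    return (
        event.source in countries
        and event.recipient in countries
        and event.source != event.recipient
        and not event.embedded_under_said
    )
```

A call like `keep_event(Event("USA", "IRN", "impose sanctions on", date(2013, 10, 1)), COUNTRIES)` would return `True`, while an event whose recipient is the US state of Georgia (and so is missing from the country dictionary) would be dropped.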

The models:

– dyads = country pairs

– each w/ timesteps

– for each country pair a time series

– deduping necessary for multiple news coverage (normalizing)

– more than one article can cover a single event

– effect of this mitigated because measurement in the model focuses on the timing of events more than the number of events
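The dyad and timestep bookkeeping above can be sketched in a few lines. This is an assumption-heavy illustration rather than the speaker’s pipeline: I treat dyads as directed (source, recipient) pairs, use ISO weeks as the timestep, and dedupe by collapsing repeated predicates within a dyad and week. The function name `dyad_time_series` is hypothetical.

```python
from collections import defaultdict
from datetime import date


def dyad_time_series(records):
    """Group (source, recipient, predicate, day) tuples into one time series
    per dyad: {(source, recipient): {(iso_year, iso_week): set of predicates}}.

    Using a set per timestep is a crude form of deduping: several articles
    reporting the same predicate for the same dyad in the same week count once.
    """
    series = defaultdict(lambda: defaultdict(set))
    for source, recipient, predicate, day in records:
        timestep = tuple(day.isocalendar())[:2]  # (ISO year, ISO week)
        series[(source, recipient)][timestep].add(predicate)
    return series
```

Because each timestep stores a set, heavy coverage of one event changes nothing about which predicates appear when, which fits the note that timing matters more than raw counts.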

1st model

– independent contexts

– time slices

– figure for expected frequency of events (talking most common, e.g.)

2nd model

– temporal smoothing: assumes a smoothness in event transitions

– possible to put coefficients that reflect common dynamics- what normally leads to what? (opportunity for more research)

– blocked Gibbs sampling

– learned event types

– positive valence

– negative valence

– “say” ← some noise

– clusters: verbal conflict, material conflict, war terms, …
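For intuition about what a smoothness assumption means, here is a small generative sketch. It is not the model from the talk and it does no inference (no Gibbs sampling): it simply assumes that each dyad’s event-type scores follow a Gaussian random walk and are turned into proportions with a softmax, so that neighboring timesteps look alike. The function names, the step size `step_sd`, and the number of event types are all my own assumptions.

```python
import math
import random


def softmax(scores):
    """Convert unnormalized scores into proportions that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def smooth_event_proportions(n_steps, n_types, step_sd=0.3, seed=0):
    """Sample event-type proportions over time from a random-walk prior.

    A smaller step_sd means each timestep stays closer to the previous one,
    which is the smoothness assumption in miniature.
    """
    rng = random.Random(seed)
    scores = [rng.gauss(0.0, 1.0) for _ in range(n_types)]
    proportions = []
    for _ in range(n_steps):
        proportions.append(softmax(scores))
        scores = [s + rng.gauss(0.0, step_sd) for s in scores]
    return proportions
```

Setting `step_sd` very high makes each timestep nearly unrelated to the last, which is roughly the situation in the first model with its independent time slices.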

How to evaluate?

– need more checks of reasonableness, more input from poli sci & international relations experts

– project end goal: do political sci

– one evaluative method: qualitative case study (face validity)

– used the most common dyad: Israeli-Palestinian

– event class over time

– e.g. diplomatic actions over time

– where are the spikes, what do they correspond with? (essentially precision & recall)

– another event class: police action & crime response

– Great point from audience on face validity: if my model says x and I then go to the data, I can’t develop labels from that same data; labels should come from training data, not testing data

– Now let’s look at a small subset of words to go deeper

– semantic coherence?

– does it correlate with conflict?

– quantitative

– lexical scale evaluation

– compare against TABARI (lucky to have that as a comparison!!)

– another element in TABARI: expert assigned scale scores – very high or very low

– validity debatable, but it’s a comparison of sorts

– granularity invariance

– lexical scale impurity
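One way to read “lexical scale impurity” is as a measure of how much the expert scale scores disagree within a learned cluster of verbs. The sketch below takes that reading, which is my assumption and may differ from the exact definition used in the talk: it averages the absolute difference in scale score over every pair of verbs that share a cluster. The function name and any scores passed in are hypothetical.

```python
from itertools import combinations


def scale_impurity(clusters, scale):
    """Mean absolute difference in scale score over all same-cluster verb pairs.

    clusters: list of lists of verbs (the learned event types)
    scale:    dict mapping verb -> expert-assigned scale score
    Verbs without a score are skipped. Lower values mean purer clusters.
    """
    diffs = []
    for verbs in clusters:
        scored = [scale[v] for v in verbs if v in scale]
        diffs.extend(abs(a - b) for a, b in combinations(scored, 2))
    return sum(diffs) / len(diffs) if diffs else float("nan")
```

Under this reading, a clustering that groups conflict verbs with conflict verbs should score lower than one that mixes conflict and cooperation verbs, and the same function could be run on WordNet-derived clusters for comparison.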

Comparison sets

– wordnet – has synsets – some verb clusters

– wordnet is low performing, generic

– wordnet is a better bar than beating random clusters

– this model should perform better because of topic specificity

 

“Gold standard” method: rarely a real gold standard; often gold standards themselves are problematic

– in this case: militarized interstate dispute dataset (wow, lucky to have that, too!)

Looking into semi-supervision, to create a better model

Speaker website:

http://brenocon.com

 

Q&A:

developing a user model

– user testing

– evaluation from users & not participants or collaborators

– terror & protest more difficult linguistic problems

 

more complications to this project:

– Taiwan, Palestine, Hezbollah: diplomatic actors, but not countries per se

Planning a second “Online Research, Offline Lunch”

In August we hosted the first Online Research, Offline Lunch for researchers involved in online research in any field, discipline or sector in the DC area. Although Washington DC is a great meeting place for specific areas of online research, there are few opportunities for interdisciplinary gatherings of professionals and academics. These lunches provide an informal opportunity for a diverse set of online researchers to listen and talk respectfully about our interests and our work and to see our endeavors from new, valuable perspectives. We kept the first gathering small. But the enthusiasm for this small event was quite large, and it was a great success! We had interesting conversations, learned a lot, made some valuable connections, and promised to meet again.

Many expressed interest in the lunches but weren’t able to attend. If you have any specific scheduling requests, please let me know now. Although I certainly can’t accommodate everyone’s preferences, I will do my best to take them into account.

Here is a form that can be used to add new people to the list. If you’re already on the list you do not need to sign up again. Please feel free to share the form with anyone else who may be interested: