The surprising unpredictability of language in use

This morning I recieved an e-mail from an international professional association that I belong to. The e-mail was in English, but it was not written by an American. As a linguist, I recognized the differences in formality and word use as signs that the person who wrote the e-mail is speaking from a set of experiences with English that differ from my own. Nothing in the e-mail was grammatically incorrect (although as a linguist I am hesitant to judge any linguistic differences as correct or incorrect, especially out of context).

Then later this afternoon I saw a tweet from Twitter on the correct use of Twitter abbreviations (RT, MT, etc.). If the growth of new Twitter users has indeed leveled off then Twitter is lucky, because the more Twitter grows the less they will be able to influence the language use of their base.

Language is a living entity that grows, evolves and takes shape based on individual experiences and individual perceptions of language use. If you think carefully about your experiences with language learning, you will quickly see that single exposures and dictionary definitions teach you little, but repeated viewings across contexts teach you much more about language.

Language use is patterned. Every word combination has a likelihood of appearing together, and that likelihood varies based on a host of contextual factors. Language use is complex. We use words in a variety of ways across a variety of contexts. These facts make language interesting, but they also obscure language use from casual understanding. The complicated nature of language in use interferes with analysts who build assumptions about language into their research strategies without realizing that their assumptions would not stand up to careful observation or study.

I would advise anyone involved in the study of language use (either as a primary or secondary aspect of their analysis) to take language use seriously. Fortunately, linguistics is fun and language is everywhere. So hop to it!

Advertisements

Reflections and Notes from the Sentiment Analysis Symposium #SAS14

The Sentiment Analysis Symposium took place in NY this week in the beautiful offices of the New York Academy of Sciences. The Symposium was framed as a transition into a new era of sentiment analysis, an era of human analytics or humetrics.

The view from the New York Academy of Sciences is really stunning!

The view from the New York Academy of Sciences is really stunning!

Two main points that struck me during the event. One is that context is extremely important for developing high quality analytics, but the actual shape that “context” takes varies greatly. The second is a seeming disconnect between the product developers, who are eagerly developing new and better measures, and the customers, who want better usability, more customer support, more customized metrics that fit their preexisting analytic frameworks and a better understanding of why social media analysis is worth their time, effort and money.

Below is a summary of some of the key points. My detailed notes from each of the speakers, can be viewed here. I attended both the more technical Technology and Innovation Session and the Symposium itself.

Context is in. But what is context?

The big takeaway from the Technology and Innovation session, which was then carried into the second day of the Sentiment Analysis Symposium was that context is important. But context was defined in a number of different ways.

 

New measures are coming, and old measures are improving.

The innovative new strategies presented at the Symposium made for really amazing presentations. New measures include voice intonation, facial expressions via remote video connections, measures of galvanic skin response, self tagged sentiment data from social media sharing sites, a variety of measures from people who have embraced the “quantified self” movement, metadata from cellphone connections (including location, etc.), behavioral patterning on the individual and group level, and quite a bit of network analysis. Some speakers showcased systems that involved a variety of linked data or highly visual analytic components. Each of these measures increase the accuracy of preexisting measures and complicate their implementation, bringing new sets of challenges to the industry.

Here is a networked representation of the emotion transition dynamics of 'Hopeful'

Here is a networked representation of the emotion transition dynamics of ‘Hopeful’

This software package is calculating emotional reactions to a Youtube video that is both funny and mean

This software package is calculating emotional reactions to a Youtube video that is both funny and mean

Meanwhile, traditional text-based sentiment analyses are also improving. Both the quality of machine learning algorithms and the quality of rule based systems are improving quickly. New strategies include looking at text data pragmatically (e.g. What are common linguistics patterns in specific goal directed behavior strategies?), gaining domain level specificity, adding steps for genre detection to increase accuracy and looking across languages. New analytic strategies are integrated into algorithms and complementary suites of algorithms are implemented as ensembles. Multilingual analysis is a particular challenge to ML techniques, but can be achieved with a high degree of accuracy using rule based techniques. The attendees appeared to agree that rule based systems are much more accurate that machine learning algorithms, but the time and expertise involved has caused them to come out of vogue.

 

“The industry as a whole needs to grow up”

I suspect that Chris Boudreaux of Accenture shocked the room when he said “the industry as a whole really needs to grow up.” Speaking off the cuff, without his slides after a mishap and adventure, Boudreaux gave the customer point of view toward social media analytics. He said said that social media analysis needs to be more reliable, accessible, actionable and dependable. Companies need to move past the startup phase to a new phase of accountability. Tools need to integrate into preexisting analytic structures and metrics, to be accessible to customers who are not experts, and to come better supported.

Boudreaux spoke of the need for social media companies to better understand their customers. Instead of marketing tools to their wider base of potential customers, the tools seem to be developed and marketed solely to market researchers. This has led to a more rapid adoption among the market research community and a general skepticism or ambivalence across other industries, who don’t see how using these tools would benefit them.

The companies who truly value and want to expand their customer base will focus on the usability of their dashboards. This is an area ripe for a growing legion of usability experts and usability testing. These dashboards cannot restrict API access and understanding to data scientist experts. They will develop, market and support these dashboards through productive partnerships with their customers, generating measures that are specifically relevant to them and personalized dashboards that fit into preexisting metrics and are easy for the customers to understand and react to in a very practical and personalized sense.

Some companies have already started to work with their customers in more productive ways. Crimson Hexagon, for example, employs people who specialize in using their dashboard. These employees work with customers to better understand and support their use of the platform and run studies of their own using the platform, becoming an internal element in the quality feedback loop.

 

Less Traditional fields for Social Media Analysis:

There was a wide spread of fields represented at the Symposium. I spoke with someone involved in text analysis for legal reasons, including jury analyses. I saw an NYPD name tag. Financial services were well represented. Publishing houses were present. Some health related organizations were present, including neuroscience specialists, medical practitioners interested in predicting early symptoms of diseases like Alzheimer’s, medical specialists interested in helping improve the lives of people with diseases like Autism (e.g. with facial emotion recognition devices), pharmaceutical companies interested in understanding medical literature on a massive scale as well as patient conversation about prescriptions and participation in medical trials. There were traditional market research firms, and many new startups with a wide variety of focuses and functions. There were also established technology companies (e.g. IBM and Dell) with innovation wings and many academic departments. I’m sure I’ve missed many of the entities present or following remotely.

The better research providers can understand the potential breadth of applications  of their research, the more they can improve the specific areas of interest to these communities.

 

Rethinking the Public Image of Sentiment Analysis:

There was some concern that “social” is beginning to have too much baggage to be an attractive label, causing people to think immediately of top platforms such as Facebook and Twitter and belying the true breadth of the industry. This prompted a movement toward other terms at the symposium, including human analytics, humetrics, and measures of human engagement.

 

Accuracy

Accuracy tops out at about 80%, because that’s the limit of inter-rater reliability in sentiment analysis. Understanding the more difficult data is an important challenge for social media analysts. It is important for there to be honesty with customers and with each other about the areas where automated tagging fails. This particular area was a kind of elephant in the room- always present, but rarely mentioned.

Although an 80% accuracy rate is really fantastic compared to no measure at all, and it is an amazing accomplishment given the financial constraints that analysts encounter, it is not an accuracy rate that works across industries and sectors. It is important to consider the “fitness for use” of an analysis. For some industries, an error is not a big deal. If a company is able to respond to 80% of the tweets directed at them in real-time, they are doing quite well, But when real people or weightier consequences are involved, this kind of error rate is blatantly unacceptable. These are the areas where human involvement in the analysis is absolutely critical. Where, honestly speaking, are algorithms performing fantastically, and where are they falling short? In the areas where they fall short, human experts should be deployed, adding behavioral and linguistic insight to the analysis.

One excellent example of Fitness for Use was the presentation by Capital Market Exchange. This company operationalizes sentiment as expert opinion. They mine a variety of sources for expert opinions about investing, and then format the commonalities in an actionable way, leading to a substantial improvement above market performance for their investors. They are able to gain a great deal of market traction that pure sentiment analysts have not by valuing the preexisting knowledge structures in their industry.

 

Targeting the weaknesses

It is important that the field look carefully at areas where algorithms do and do not work. The areas where they don’t represent whole fields of study, many of which have legions of social media analysts at the ready. This includes less traditional areas of linguistics, such as Sociolinguistics, Conversation Analysis (e.g. looking at expected pair parts) and Discourse Analysis (e.g. understanding identity construction), as well as Ethnography (with fast growing subfields, such as Netnography), Psychology and Behavioral Economics. Time to think strategically to better understand the data from new perspectives. Time to more seriously evaluate and invest in neutral responses.

 

Summing Up

Social media data analysis, large scale text analysis and sentiment analysis have enjoyed a kind of honeymoon period. With so many new and fast growing data sources, a plethora of growing needs and applications, and a competitive and fast growing set of analytic strategies, the field has been growing at an astronomical rate. But this excitement has to be balanced out with the practical needs of the marketplace. It is time for growing technologies to better listen to and accommodate the needs of the customer base. This shift will help ensure the viability of the field and free developers up to embrace the spirit of intellectual creativity.

This is an exciting time for a fast growing field!

Thank you to Seth Grimes for organizing such a great event.

 

Great readings that might shake you to your academic core? I’m compiling a list

In the spirit of research readings that might shake you to your academic core, I’m compiling a list. Please reply to this thread with any suggestions you have to add. They can be anything from short blog posts (microblog?) to research articles to books. What’s on your ‘must read’ list?

Here are a couple of mine to kick us off:

 

Charles Goodwin’s Professional Vision paper

I don’t think I’ve referred to any paper as much as this one. It’s about the way our professional training shapes the way we see the things around us. Shortly after reading this paper I was in the gym thinking about commonalities between the weight stacks and survey scales. I expect myself to be a certain relative strength, and when that doesn’t correlate with the place where I need to place my pin I’m a little thrown off.

It also has a deep analysis of the Rodney King verdict.

 

Revitalizing Chinatown Into a Heterotopia by Jia Lou

This article is based on a geosemiotic analysis of DC’s Chinatown. It is one of the articles that helped me to see that data really can come in all forms

 

After method: Mess in Social Science Research by John Law

This is the book that inspired this list. It also inspired this blog post.

 

On Postapocalyptic Research Methods and Failures, Honesty and Progress in Research

I’m reading a book that I like to call “post-apocalyptic research methodology.” It’s ‘After Method: Mess in Social Science Research’ by John Law. At this point the book reads like a novel. I can’t quite imagine where he’ll take his premise, but I’m searching for clues and turning pages. In the meantime, I’ve been thinking quite a bit about failure, honesty, uncertainty and humility in research.

How is the current research environment like a utopian society?

The research process is often idealized in public spaces. Whether the goal of the researcher is to publish a paper based on their research, present to an audience of colleagues or stakeholders about their research, or market the product of their research, all researchers have a vested interest in the smoothness of the research process. We expect to approach a topic, perform a series of time-tested methods or develop innovative new methods with strong historical traditions, apply these methods as neatly as possible, and end up with a series of strong themes that describe the majority of our data. However, in Law’s words “Parts of the world are caught in our ethnographies, our histories and our statistics. But other parts are not, and if they are then this is because they have been distorted into clarity.” (p. 2) We think of methods as a neutral middle step and not a political process, and this way of thinking allows us to focus on reliability and validity as surface measures and not inherent questions. “Method, as we usually imagine it, is a system for offering more or less bankable guarantees.” (p. 9)

Law points out that research methods are, in practice, very limited in the social sciences “talk of method still tends to summon up a relatively limited repertoire of responses.” (p. 3) Law also points out that every research method is inherently political. Every research method involves a way of seeing or a way of looking at the data, and that perspective maps onto the findings it yields. Different perspectives yield different findings, whether they are subtly or dramatically different. Law’s central assertion is that methods don’t just describe social realities but also help to create them. Recognizing the footprint of our own methods is a step toward better understanding our data and results.

In practice, the results that we focus on are largely true. They describe a large portion of the data, ascribing the rest of the data to noise or natural variation. When more of our data is described in our results, we feel more confident about our data and our analysis.

Law argues that this smoothed version of reality is far enough from the natural world that it should perk our ears. Research works to create a world that is simple and falls into place neatly and resembles nothing we know, “’research methods’ passed down to us after a century of social science tend to work on the assumption that the world is properly to be understood as a set of fairly specific, determinate, and more or less identifiable processes.” (p. 5) He suggests instead that we should recognize the parts that don’t fit, the areas of uncertainty or chaos, and the areas where our methods fail. “While standard methods are often extremely good at what they do, they are badly adapted to the study of the ephemeral, the indefinite and the irregular.” (p. 4). “Regularities and standardizations are incredibly powerful tools, but they set limits.” (p. 6)

Is the Utopia starting to fall apart?

The current research environment is a bit different from that of the past. More people are able to publish research at any stage without peer review using media like blogs. Researchers are able to discuss their research while it is in progress using social media like Twitter. There is more room to fail publicly than there ever has been before, and this allows for public acknowledgment of some of the difficulties and challenges that researcher’s face.

Building from ashes

Law briefly introduces his vision on p. 11 “My hope is that we can learn to live in a way that is less dependent on the automatic. To live more in and through slow method, or vulnerable method, or quiet method. Multiple method. Modest method. Uncertain method. Diverse method.”

Many modern discussions of about management talk about the value of failure as an innovative tool. Some of the newer quality control measures in aviation and medicine hinge on the recognition of failure and the retooling necessary to prevent or limit the recurrences of specific types of events. The theory behind these measures is that failure is normal and natural, and we could never predict the many ways in which failure could happen. So, instead of exclusively trying to predict or prohibit failure, failures should be embraced as opportunities to learn.

Here we can ask: what can researchers learn from the failures of the methods?

The first lesson to accompany any failure is humility. Recognizing our mistakes entails recognizing areas where we fell short, where our efforts were not enough. Acknowledging that our research training cannot be universal, that applying research methods isn’t always straightforward and simple, and that we cannot be everything to everyone could be an important stage of professional development.

How could research methodology develop differently if it were to embrace the uncertain, the chaotic and the places where we fall short?

Another question: What opportunities to researchers have to be publicly humble? How can those spaces become places to learn and to innovate?

Note: This blog post is dedicated to Dr Jeffrey Keefer @ NYU, who introduced me to this very cool book and has done some great work to bring researchers together

Methodology will only get you so far

I’ve been working on a post about humility as an organizational strategy. This is not that post, but it is also about humility.

I like to think of myself as a research methodologist, because I’m more interested in research methods than any specific area of study. The versatility of methodology as a concentration is actually one of the biggest draws for me. I love that I’ve been able to study everything from fMRI subjects and brain surgery patients to physics majors and teachers, taxi drivers and internet activists. I’ve written a paper on Persepolis as an object of intercultural communication and a paper on natural language processing of survey responses, and I’m currently studying migration patterns and communication strategies.

But a little dose of humility is always a good thing.

Yesterday I hosted the second in a series of online research, offline lunches that I’ve been coordinating. The lunches are intended as a way to get people from different sectors and fields who are conducting research on the internet together to talk about their work across the artificial boundaries of field and sector. These lunches change character as the field and attendees change.

I’ve been following the field of online research for many years now, and it has changed dramatically and continually before my eyes. Just a year ago Seth Grimes Sentiment Analysis Symposia were at the forefront of the field, and now I wonder if he is thinking of changing the title and focus of his events. Two years ago tagging text corpora with grammatical units was a standard midstep in text analysis, and now machine algorithms are far more common and often much more effective, demonstrating that grammar in use is far enough afield from grammar in theory to generate a good deal of error. Ten years ago qualitative research was often more focused on the description of platforms than the behaviors specific to them, and now the specific innerworkings of platform are much more of an aside to a behavioral focus.

The Association of Internet Researchers is currently having their conference in Denver (#ir14), generating more than 1000 posts per day under the conference hashtag and probably moving the field far ahead of where it was earlier this week.

My interest and focus has been on the methodology of internet research. I’ve been learning everything from qualitative methods to natural language processing and social network analysis to bayesian methods. I’ve been advocating for a world where different kinds of methodologists work together, where qualitative research informs algorithms and linguists learn from the differences between theoretical grammar and machine learned grammar, a world where computer scentists work iteratively with qualitative researchers. But all of these methods fall short because there is an elephant in the methodological room. This elephant, ladies and gentleman, is made of content. Is it enough to be a methodological specialist, swinging from project to project, grazing on the top layer of content knowledge without ever taking anything down to its root?

As a methodologist, I am free to travel from topic area to topic area, but I can’t reach the root of anything without digging deeper.

At yesterday’s lunch we spoke a lot about data. We spoke about how the notion of data means such different things to different researchers. We spoke about the form and type of data that different researchers expect to work with, how they groom data into the forms they are most comfortable with, how the analyses are shaped by the data type, how data science is an amazing term because just about anything could be data. And I was struck by the wide-openness of what I was trying to do. It is one thing to talk about methodology within the context of survey research or any other specific strategy, but what happens when you go wider? What happens when you bring a bunch of methodologists of all stripes together to discuss methodology? You lack the depth that content brings. You introduce a vast tundra of topical space to cover. But can you achieve anything that way? What holds together this wide realm of “research?”

We speak a lot about the lack of generalizable theories in internet research. Part of the hope for qualitative research is that it will create generalizable findings that can drive better theories and improve algorithmic efforts. But that partnership has been slow, and the theories have been sparse and lightweight. Is it possible that the internet is a space where theory alone just doesn’t cut it? Could it be that methodologists need to embrace content knowledge to a greater degree in order to make any of the headway we so desperately want to make?

Maybe the missing piece of the puzzle is actually the picture painted on the pieces?

comic

The data Rorschach test, or what does your research say about you?

Sure, there is a certain abundance of personality tests: inkblot tests, standardized cognitive tests, magazine quizzes, etc. that we could participate in. But researchers participate in Rorschach tests of our own every day. There are a series of questions we ask as part of the research process, like:

What data do we want to collect or use? (What information is valuable to us? What do we call data?)

What format are we most comfortable with it in? (How clean does it have to be? How much error are we comfortable with? Does it have to resemble a spreadsheet? How will we reflect sources and transformations? What can we equate?)

What kind of analyses do we want to conduct? (This is usually a great time for our preexisting assumptions about our data to rear their heads. How often do we start by wondering if we can confirm our biases with data?!)

What results do we choose to report? To whom? How will we frame them?

If nothing else, our choices regarding our data reflect many of our values as well as our professional and academic experiences. If you’ve ever sat in on a research meeting, you know that “you want to do WHAT with which data?!” feeling that comes when someone suggests something that you had never considered.

Our choices also speak to the research methods that we are most comfortable with. Last night I attended a meetup event about Natural Language Processing, and it quickly became clear that the mathematician felt most comfortable when the data was transformed into numbers, the linguist felt most comfortable when the data was transformed into words and lexical units, and the programmer was most comfortable focusing on the program used to analyze the data. These three researchers confronted similar tasks, but their three different methods that will yield very different results.

As humans, we have a tendency to make assumptions about the people around us, either by assuming that they are very different or very much the same. Those of you who have seen or experienced a marriage or serious long-term partnership up close are probably familiar with the surprised feeling we get when we realize that one partner thinks differently about something that we had always assumed they would not differ on. I remember, for example, that small feeling that my world was upside down just a little bit when I opened a drawer in the kitchen and saw spoons and forks together in the utensil organizer. It had simply never occurred to me that anyone would mix the two, especially not my own husband!

My main point here is not about my husband’s organizational philosophy. It’s about the different perspectives inherently tied up in the research process. It can be hard to step outside our own perspective enough to see what pieces of ourselves we’ve imposed on our research. But that awareness is an important element in the quality control process. Once we can see what we’ve done, we can think much more carefully about the strengths and weaknesses of our process. If you believe there is only one way, it may be time to take a step back and gain a wider perspective.

Statistical Text Analysis for Social Science: Learning to Extract International Relations from the News

I attended another great CLIP event today, Statistical Text Analysis for Social Science: Learning to Extract International Relations from the News, by Brendan O’Connor, CMU. I’d love to write it up, but I decided instead to share my notes. I hope they’re easy to follow. Please feel free to ask any follow-up questions!

 

Computational Social Science

– Then: 1890 census tabulator- hand cranked punch card tabulator

– Now: automated text analysis

 

Goal: develop methods of predicting, etc conflicts

– events = data

– extracting events from news stories

– information extraction from large scale news data

– goal: time series of country-country interactions

– who did what to whom? in what order?

Long history of manual coding of this kind of data for this kind of purpose

– more recently: rule based pattern extraction, TABARI

– —> developing event types (diplomatic events, aggressions, …) from verb patterns – TABARI hand engineered 15,000 coding patterns over the course of 2 decades —> very difficult, validity issues, changes over time- all developed by political scientists Schrodt 1994- in MUCK (sp?) days – still a common poli sci methodology- GDELT project- software, etc. w/pre & postprocessing

http://gdelt.utdallas.edu

– Sources: mainstream media news, English language, select sources

 

THIS research

– automatic learning of event types

– extract events/ political dynamics

→ use Bayesian probabilistic methods

– using social context to drive unsupervised learning about language

– data: Gigaword corpus (news articles) – a few extra sources (end result mostly AP articles)

– named entities- dictionary of country names

– news biases difficult to take into account (inherent complication of the dataset)(future research?)

– main verb based dependency path (so data is pos tagged & then sub/obj tagged)

– 3 components: source (acting country)/ recipient (recipient country)/ predicate (dependency path)

– loosely Dowty 1990

– International Relations (IR) is heavily concerned with reciprocity- that affects/shapes coding, goals, project dynamics (e.g. timing less important than order, frequency, symmetry)

– parsing- core NLP

– filters (e.g. Georgia country vs. Georgia state) (manual coding statements)

– analysis more focused on verb than object (e.g. text following “said that” excluded)

– 50% accuracy finding main verb (did I hear that right? ahhh pos taggers and their many joys…)

– verb: “reported that” – complicated: who is a valid source? reported events not necessarily verified events

– verb: “know that” another difficult verb

 The models:

– dyads = country pairs

– each w/ timesteps

– for each country pair a time series

– deduping necessary for multiple news coverage (normalizing)

– more than one article cover a single event

– effect of this mitigated because measurement in the model focuses on the timing of events more than the number of events

1st model

– independent contexts

– time slices

– figure for expected frequency of events (talking most common, e.g.)

2nd model

– temporal smoothing: assumes a smoothness in event transitions

– possible to put coefficients that reflect common dynamics- what normally leads to what? (opportunity for more research)

– blocked Gibbs sampling

– learned event types

– positive valence

– negative valence

– “say” ← some noise

– clusters: verbal conflict, material conflict, war terms, …

How to evaluate?

– need more checks of reasonableness, more input from poli sci & international relations experts

– project end goal: do political sci

– one evaluative method: qualitative case study (face validity)

– used most common dyad Israeli: Palestinian

– event class over time

– e.g. diplomatic actions over time

– where are the spikes, what do they correspond with? (essentially precision & recall)

– another event class: police action & crime response

– Great point from audience: face validity: my model says x, then go to data- can’t develop labels from the data- label should come from training data not testing data

– Now let’s look at a small subset of words to go deeper

– semantic coherence?

– does it correlate with conflict?

– quantitative

– lexical scale evaluation

– compare against TABARI (lucky to have that as a comparison!!)

– another element in TABARI: expert assigned scale scores – very high or very low

– validity debatable, but it’s a comparison of sorts

– granularity invariance

– lexical scale impurity

Comparison sets

– wordnet – has synsets – some verb clusters

– wordnet is low performing, generic

– wordnet is a better bar than beating random clusters

– this model should perform better because of topic specificity

 

“Gold standard” method- rarely a real gold standard- often gold standards themselves are problematic

– in this case: militarized interstate dispute dataset (wow, lucky to have that, too!)

Looking into semi-supervision, to create a better model

 speaker website:

http://brenocon.com

 

Q &A:

developing a user model

– user testing

– evaluation from users & not participants or collaborators

– terror & protest more difficult linguistic problems

 

more complications to this project:

– Taiwan, Palestine, Hezbollah- diplomatic actors, but not countries per se