Do you ever think about interfaces? Because I do. All the time.

Did you ever see the movie Singles? It came out in the early 90s, shortly before the alternative scene really blew up and I dyed [part of] my hair blue and thought seriously about piercings. Singles was a part of the growth of the alternative movement. In the movie, there is a moment when one character says to another “Do you ever think about traffic? Because I do. All the time.” I spent quite a bit of time obsessing over that line, about what it meant, and, more deeply, what it signaled.

I still think about that line. As I drove toward the turnoff to my mom’s street during our 4th of July vacation, I saw what looked like the turn lane for her street, but it was actually an intersection- less left- turning split immediately preceding the real left turn lane for her street. It threw me off every time, and I kept remembering that romantic moment in Singles when the two characters were getting to know each other’s quirks, and the man was talking about traffic. And it was okay, even cool, to be quirky and think or talk about traffic, even during a romantic moment.

I don’t think about traffic often. But I am no less quirky. Lately, I tend to think about interfaces. Before my first brush with NLP (Natural Language Processing), I thought quite a bit about alternatives to e-mail. Since I discovered the world of text analytics, I have been thinking quite a bit about ways to integrate the knowledge across different fields about methods for text analysis and the needs of quantitative and qualitative researchers. I want to think outside of the sentiment box, because I believe that sentiment analysis does not fully address the underlying richness of textual data. I want to find a way to give researchers what they need, not what they think they want. Recently, my thinking on this topic has flipped. Instead of thinking from the data end, or the analytic possibilities end, or about what programs already exist and what they do, I have started to think about interfaces. This feels like a real epiphany. Once we think about the problem from an interface, or user experience perspective, we can better utilize existing technology and harness user expectations.

Have you read the new Imagine book about how creativity works? I believe that this strategy is the natural step after spending time zoning out on the web, thinking, or not thinking, about research. The more time you cruise, the better feel you develop for what works and what doesn’t, the more you learn what to expect. Interfaces are simply the masks we put on datasets of all sorts. The data could be the world wide web as a whole, results from a site or time period, a database of merchandise, or even a set of open ended survey responses. The goal is to streamline the searching interface and then make it available for use on any number of datasets. We use NLP every day when we search the internet, or shop. We understand it intuitively. Why don’t we extend that understanding to text analysis?

I find myself thinking about what this interface should look like and what I want this program to do.

Not traffic, not as romantic. But still quirky and all-encompassing.

Question Writing is an Art

As a survey researcher, I like to participate in surveys with enough regularity to keep current on any trends in methodology. As a web designer, an aspect of successful design is a seamlessness with the visitor’s expectations. So if the survey design realm has moved toward submit buttons on the upper right hand corner of individual pages, your idea (no matter how clever) to put a submit button on the upper left can result in a disconnect on the part of the user that will effect their behavior on the page. In fact, the survey design world has evolved quite a bit in the last few years, and it is easy to design something that reflects poorly on the quality of your research endeavor. But these design concerns are less of an issue than they have been, because most researchers are using templates.

Yet there is still value in keeping current.

And sometimes we encounter questions that lend themselves to an explanation of the importance of question writing. These questions are a gift for a field that is so difficult to describe in terms of knowledge and skills!

Here is a question I encountered today (I won’t reveal the source):

How often do you purchase potato chips when you eat out at any quick service and fast food restaurants?

2x a week or more
1x a week
1x every 2-3 weeks
1x a month
1x every 2-3 months
Less than 1x every 3 months
Never

This is a prime example of a double barreled question, and it is also an especially difficult question to answer. In my care, I rarely eat at quick service restaurants, especially sandwich places, like this one, that offer potato chips. When I do eat at them, I am tempted to order chips. About half the time I will give in to the temptation with a bag of sunchips, which I’m pretty sure are not made of potato.

In bigger firms that have more time to work through, this information would come out in the process of a cognitive interview or think aloud during the pretesting phase. Many firms, however, have staunchly resisted these important steps in the surveying process, because of their time and expense. It is important to note that the time and expense involved with trying to make usable answers out of poorly written questions can be immense.

I have spent some time thinking about alternatives to cognitive testing, because I have some close experience with places that do not use this method. I suspect that this is a good place for text analytics, because of the power of reaching people quickly and potentially cheaply (depending on your embedded TA processes). Although oftentimes we are nervous about web analytics because of their representativeness, the bar for representativeness is significantly lower in the pretesting stage than in the analysis phase.

But, no matter what pretesting model you choose, it is important to look closely at the questions that you are asking. Are you asking a single question, or would these questions be better separated out into a series?

How often do you eat at quick service sandwich restaurants?

When you eat at quick service restaurants, do you order [potato] chips?

What kind of [potato] chips do you order?

The lesson of all of this is that question writing is important, and the questions we write in surveys will determine the kind of survey responses we receive and the usability of our answers.

To go big, first think small

We use language all of the time. Because of this, we are all experts in language use. As native speakers of a language, we are experts in the intricacies of that language.

Why, then, do people study linguistics? Aren’t we all linguists?

Absolutely not.

We are experts in *using* language, but we are not experts in the methods we employ. Believe it or not, much of the process of speaking and hearing is not conscious. If it was, we would be sensorally overwhelmed with the sheer volume of words around us. Instead, listening comprehension involves a process of merging what we expect to hear with what we gauge to be the most important elements of what we do hear. The process of speaking involves merging our estimates of what the people we communicate with know and expect to hear with our understanding of the social expectations surrounding our words and our relationships and distilling these sources into a workable expression. The hearer will reconstruct elements of this process using cues that are sometimes conscious and sometimes not.

We often think of language as simple and mechanistic, but it is not simple at all. As conversational analysts, our job is to study conversation that we have access to in an attempt to reconstruct the elements that constituted the interaction. Even small chunks of conversation encode quite a bit of information.

The process of conversation analysis is very much contrary to our sense of language as regular language users. This makes the process of explaining our research to people outside our field difficult. It is difficult to justify the research, and it is difficult to explain why such small pieces of data can be so useful, when most other fields of research rely on greater volumes of data.

In fact, a greater volume of data can be more harmful than helpful in conversation analysis. Conversation is heavily dependent on its context; on the people conversing, their relationship, their expectations, their experiences that day, the things on their mind, what they expect from each other and the situation, their understanding of language and expectations, and more. The same sentence can have greatly different meanings once those factors are taken into account.

At a time when there is so much talk of the glory of big data, it is especially important to keep in mind the contributions of small data. These contributions are the ones that jeopardize the utility and promise of big data, and if these contributions can be captured in creative ways, they will be the true promise of the field.

Not what language users expect to see, but rather what we use every day, more or less consciously.

Data Journalism, like photography, “involves selection, filtering, framing, composition and emphasis”

Beautiful:

“Creating a good piece of data journalism or a good data-driven app is often more like an art than a science. Like photography, it involves selection, filtering, framing, composition and emphasis. It involves making sources sing and pursuing truth – and truth often doesn’t come easily. ” -Jonathan Gray

Whole article:

http://www.guardian.co.uk/news/datablog/2012/may/31/data-journalism-focused-critical

Truly, at a time when the buzz about big data is at such a peak, it is nice to hear a voice of reason and temper! Folks: big data will not do all that it is talked up to do. It will, in fact, do something surprising and different. And that something will come from the interdisciplinary thought leaders in fields like natural language processing and linguistics. That *something,* not the data itself, will be the new oil.

Remotely following AAPOR conference #aapor

The AAPOR 2012 conference began today in sunny Orlando, Florida. This is my my favorite conference of the year, and I am sorry to miss it. Fortunately, the Twitter action is bringing a lot of the action to homeviewers like us!

https://twitter.com/#!/search/realtime/%23AAPOR

I will keep retweeting some of the action. For those of you who may be concerned that this represents a new era of heavy tweeting for me, rest assured- it wont!

And for anyone who has been wondering what happened to me and my blog, please stay tuned. I am working on an exciting new project that I will eagerly share about in due time.

Searching for Social Meanings in Social Media

This next CLIP event looks really fantastic!

 

Please join us on Wednesday at 11AM in AV Williams room 3258 for the University of Maryland Computational Linguistics and Information Processing (CLIP) colloquium!

 

May 2: Jacob Eisenstein: Searching for social meanings in social media

 

Social interaction is increasingly conducted through online platforms such as Facebook and Twitter, leaving a recorded trace of millions of individual interactions. While some have focused on the supposed deficiencies of social media with respect to more traditional communication channels, language in social media features the same rich connections with personal and group identity, style, and social context. However, social media’s unique set of linguistic affordances causes social meanings to be expressed in new and perhaps surprising ways. This talk will describe research that builds on large-scale social media corpora using analytic tools from statistical machine learning. I will focus on some of the ways in which social media data allow us to go beyond traditional sociolinguistic methods, but I will also discuss lessons from the sociolinguistics literature that the new generation of “big data” research might do well to heed.

 

This research includes collaborations with David Bamman, Brendan O’Connor, Tyler Schnoebelen, Noah A. Smith, and Eric P. Xing.

 

Bio: Jacob Eisenstein is an Assistant Professor in the School of Interactive Computing at Georgia Tech. He works on statistical natural language processing, focusing on social media analysis, discourse, and non-verbal communication. Jacob was a Postdoctoral researcher at Carnegie Mellon and the University of Illinois. He completed his Ph.D. at MIT in 2008, winning the George M. Sprowls dissertation award.

 

Location of AV Williams:

http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW

http://maps.google.com/maps?q=av+williams

 

Webpage for CLIP events:

https://wiki.umiacs.umd.edu/clip/index.php/Events#Colloquia

 

Facebook Measures Happiness in Status Updates?

From Flowing data:

http://flowingdata.com/2009/10/05/facebook-measures-happiness-in-status-updates/

Does anyone have a link to the original report?

I really wish I had more of a window into the methodology of this one!

A couple of questions:

What is happiness?

How can it be measured or signaled? What kinds of data are representing happiness? Is this just an expanded or open ended sentiment analysis? Is the technology such that this would be a valid study?

Are Facebook statuses a sensible place to investigate happiness?

What is this study representing? Constituting? Perpetuating?

 

Edited to Add: http://blog.facebook.com/blog.php?post=150162112130