Great readings that might shake you to your academic core? I’m compiling a list

In the spirit of research readings that might shake you to your academic core, I’m compiling a list. Please reply to this thread with any suggestions you have to add. They can be anything from short blog posts (microblog?) to research articles to books. What’s on your ‘must read’ list?

Here are a couple of mine to kick us off:

 

Charles Goodwin’s Professional Vision paper

I don’t think I’ve referred to any paper as much as this one. It’s about the way our professional training shapes how we see the things around us. Shortly after reading this paper I was in the gym, thinking about commonalities between the weight stacks and survey scales: I expect myself to be at a certain relative strength, and when that expectation doesn’t match where I need to place my pin, I’m a little thrown off.

It also includes a deep analysis of the Rodney King trial.

 

Revitalizing Chinatown Into a Heterotopia by Jia Lou

This article is based on a geosemiotic analysis of DC’s Chinatown. It is one of the articles that helped me to see that data really can come in all forms.

 

After Method: Mess in Social Science Research by John Law

This is the book that inspired this list. It also inspired this blog post.

 

On Postapocalyptic Research Methods and Failures, Honesty and Progress in Research

I’m reading a book that I like to call “post-apocalyptic research methodology.” It’s ‘After Method: Mess in Social Science Research’ by John Law. At this point the book reads like a novel. I can’t quite imagine where he’ll take his premise, but I’m searching for clues and turning pages. In the meantime, I’ve been thinking quite a bit about failure, honesty, uncertainty and humility in research.

How is the current research environment like a utopian society?

The research process is often idealized in public spaces. Whether the goal of the researcher is to publish a paper based on their research, present to an audience of colleagues or stakeholders, or market the product of their research, all researchers have a vested interest in the smoothness of the research process. We expect to approach a topic, perform a series of time-tested methods or develop innovative new methods with strong historical traditions, apply these methods as neatly as possible, and end up with a series of strong themes that describe the majority of our data. However, in Law’s words, “Parts of the world are caught in our ethnographies, our histories and our statistics. But other parts are not, and if they are then this is because they have been distorted into clarity.” (p. 2) We think of methods as a neutral middle step rather than a political process, and this way of thinking allows us to treat reliability and validity as surface measures rather than inherent questions. “Method, as we usually imagine it, is a system for offering more or less bankable guarantees.” (p. 9)

Law points out that research methods are, in practice, very limited in the social sciences: “talk of method still tends to summon up a relatively limited repertoire of responses.” (p. 3) He also points out that every research method is inherently political. Every method involves a way of seeing or a way of looking at the data, and that perspective maps onto the findings it yields. Different perspectives yield different findings, whether subtly or dramatically different. Law’s central assertion is that methods don’t just describe social realities but also help to create them. Recognizing the footprint of our own methods is a step toward better understanding our data and results.

In practice, the results that we focus on are largely true. They describe a large portion of the data, ascribing the rest of the data to noise or natural variation. When more of our data is described in our results, we feel more confident about our data and our analysis.

Law argues that this smoothed version of reality is far enough from the natural world that it should perk up our ears. Research works to create a world that is simple, falls into place neatly, and resembles nothing we know: “‘research methods’ passed down to us after a century of social science tend to work on the assumption that the world is properly to be understood as a set of fairly specific, determinate, and more or less identifiable processes.” (p. 5) He suggests instead that we should recognize the parts that don’t fit, the areas of uncertainty or chaos, and the areas where our methods fail. “While standard methods are often extremely good at what they do, they are badly adapted to the study of the ephemeral, the indefinite and the irregular.” (p. 4) “Regularities and standardizations are incredibly powerful tools, but they set limits.” (p. 6)

Is the Utopia starting to fall apart?

The current research environment is a bit different from that of the past. More people are able to publish research at any stage, without peer review, using media like blogs. Researchers are able to discuss their research while it is in progress using social media like Twitter. There is more room to fail publicly than there has ever been before, and this allows for public acknowledgment of some of the difficulties and challenges that researchers face.

Building from ashes

Law briefly introduces his vision on p. 11: “My hope is that we can learn to live in a way that is less dependent on the automatic. To live more in and through slow method, or vulnerable method, or quiet method. Multiple method. Modest method. Uncertain method. Diverse method.”

Many modern discussions about management point to the value of failure as a tool for innovation. Some of the newer quality control measures in aviation and medicine hinge on the recognition of failure and the retooling necessary to prevent or limit the recurrence of specific types of events. The theory behind these measures is that failure is normal and natural, and that we could never predict the many ways in which failure could happen. So, instead of exclusively trying to predict or prohibit failure, we should embrace failures as opportunities to learn.

Here we can ask: what can researchers learn from the failures of their methods?

The first lesson to accompany any failure is humility. Recognizing our mistakes entails recognizing areas where we fell short, where our efforts were not enough. Acknowledging that our research training cannot be universal, that applying research methods isn’t always straightforward and simple, and that we cannot be everything to everyone could be an important stage of professional development.

How could research methodology develop differently if it were to embrace the uncertain, the chaotic and the places where we fall short?

Another question: What opportunities do researchers have to be publicly humble? How can those spaces become places to learn and to innovate?

Note: This blog post is dedicated to Dr Jeffrey Keefer @ NYU, who introduced me to this very cool book and has done some great work to bring researchers together.

Methodology will only get you so far

I’ve been working on a post about humility as an organizational strategy. This is not that post, but it is also about humility.

I like to think of myself as a research methodologist, because I’m more interested in research methods than any specific area of study. The versatility of methodology as a concentration is actually one of the biggest draws for me. I love that I’ve been able to study everything from fMRI subjects and brain surgery patients to physics majors and teachers, taxi drivers and internet activists. I’ve written a paper on Persepolis as an object of intercultural communication and a paper on natural language processing of survey responses, and I’m currently studying migration patterns and communication strategies.

But a little dose of humility is always a good thing.

Yesterday I hosted the second in the series of Online Research, Offline Lunches that I’ve been coordinating. The lunches are intended as a way to bring together people from different sectors and fields who are conducting research on the internet, to talk about their work across the artificial boundaries of field and sector. These lunches change character as the field and the attendees change.

I’ve been following the field of online research for many years now, and it has changed dramatically and continually before my eyes. Just a year ago Seth Grimes’ Sentiment Analysis Symposia were at the forefront of the field, and now I wonder if he is thinking of changing the title and focus of his events. Two years ago tagging text corpora with grammatical units was a standard midstep in text analysis; now machine-learned algorithms are far more common and often much more effective, demonstrating that grammar in use is far enough afield from grammar in theory to generate a good deal of error. Ten years ago qualitative research was often more focused on the description of platforms than on the behaviors specific to them; now the specific inner workings of a platform are much more of an aside to a behavioral focus.

The Association of Internet Researchers is currently holding its conference in Denver (#ir14), generating more than 1000 posts per day under the conference hashtag and probably moving the field far ahead of where it was earlier this week.

My interest and focus has been on the methodology of internet research. I’ve been learning everything from qualitative methods to natural language processing, social network analysis, and Bayesian methods. I’ve been advocating for a world where different kinds of methodologists work together, where qualitative research informs algorithms and linguists learn from the differences between theoretical grammar and machine-learned grammar, a world where computer scientists work iteratively with qualitative researchers. But all of these methods fall short, because there is an elephant in the methodological room. This elephant, ladies and gentlemen, is made of content. Is it enough to be a methodological specialist, swinging from project to project, grazing on the top layer of content knowledge without ever taking anything down to its root?

As a methodologist, I am free to travel from topic area to topic area, but I can’t reach the root of anything without digging deeper.

At yesterday’s lunch we spoke a lot about data. We spoke about how the notion of data means such different things to different researchers. We spoke about the form and type of data that different researchers expect to work with, how they groom data into the forms they are most comfortable with, how the analyses are shaped by the data type, how data science is an amazing term because just about anything could be data. And I was struck by the wide-openness of what I was trying to do. It is one thing to talk about methodology within the context of survey research or any other specific strategy, but what happens when you go wider? What happens when you bring a bunch of methodologists of all stripes together to discuss methodology? You lack the depth that content brings. You introduce a vast tundra of topical space to cover. But can you achieve anything that way? What holds together this wide realm of “research?”

We speak a lot about the lack of generalizable theories in internet research. Part of the hope for qualitative research is that it will create generalizable findings that can drive better theories and improve algorithmic efforts. But that partnership has been slow, and the theories have been sparse and lightweight. Is it possible that the internet is a space where theory alone just doesn’t cut it? Could it be that methodologists need to embrace content knowledge to a greater degree in order to make any of the headway we so desperately want to make?

Maybe the missing piece of the puzzle is actually the picture painted on the pieces?


Planning a second “Online Research, Offline Lunch”

In August we hosted the first Online Research, Offline Lunch for researchers involved in online research in any field, discipline or sector in the DC area. Although Washington DC is a great meeting place for specific areas of online research, there are few opportunities for interdisciplinary gatherings of professionals and academics. These lunches provide an informal opportunity for a diverse set of online researchers to listen and talk respectfully about our interests and our work and to see our endeavors from new, valuable perspectives. We kept the first gathering small. But the enthusiasm for this small event was quite large, and it was a great success! We had interesting conversations, learned a lot, made some valuable connections, and promised to meet again.

Many expressed interest in the lunches but weren’t able to attend. If you have any specific scheduling requests, please let me know now. Although I certainly can’t accommodate everyone’s preferences, I will do my best to take them into account.

Here is a form that can be used to add new people to the list. If you’re already on the list you do not need to sign up again. Please feel free to share the form with anyone else who may be interested:

 

Data science can be pretty badass, but…

Every so often I’m reminded of the power of data science. Today I attended a talk entitled “Spatiotemporal Crime Prediction Using GPS & Time-tagged Tweets” by Matt Gerber of the UVA PTL. The talk was a UMD CLIP event (great events! Go if you can!).

Gerber began by introducing a few of the PTL projects, which include:

  • Developing automatic detection methods for extremist recruitment in the Dark Net
  • Turning medical knowledge from large bodies of unstructured texts into medical decision support models
  • Many other cool initiatives

He then introduced the research at hand: developing predictive models for criminal activity. The control model in this case used police report data from a given period of time to map incidents onto a map of Chicago using latitude and longitude. He then superimposed a grid on the map and collapsed incidents down into a binary presence vs. absence model: each square in the grid either had one or more crimes (1) or had no crimes (-1). This was his training data. He built a binary classifier, used logistic regression to compute probabilities, and layered a kernel density estimator on top. He used this control model for comparison with a model built from unstructured text. The unstructured text consisted of GPS-tagged Twitter data (roughly 3% of tweets) from the Chicago area. He drew the same grid using longitude and latitude coordinates and tossed all of the tweets from each “neighborhood” (during the same one-month training window) into the boxes. Then, treating each box as a single document for a document-based classifier, he subjected each document to topic modeling (using LDA & MALLET). He focused on crime-related words and topics to build models to compare against the control models. He found that the predictive value of both models was similar when compared against actual crime reports from days within the subsequent month.
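To make that pipeline easier to follow, here is a minimal sketch of the tweet-based variant as I understood it from the talk. Everything concrete in it is my own assumption for illustration: the grid resolution, the bounding box, the synthetic input data, and the use of scikit-learn and gensim (Gerber used MALLET for the LDA step). It is not his implementation, just a rough outline of the shape of the analysis.

```python
# Rough sketch (my reconstruction, not Gerber's code) of the tweet-based crime model.
# Inputs are synthetic: in practice `crimes` and `tweets` would hold real report
# and geotagged-tweet data with lat/lon columns (and tweet text).
import numpy as np
import pandas as pd
from gensim import corpora, models
from sklearn.linear_model import LogisticRegression

N = 50                                    # grid resolution (assumed)
N_TOPICS = 20                             # number of LDA topics (assumed)
BOUNDS = (41.64, 42.02, -87.94, -87.52)   # rough Chicago bounding box (assumed)

rng = np.random.default_rng(0)
crimes = pd.DataFrame({"lat": rng.uniform(41.64, 42.02, 500),
                       "lon": rng.uniform(-87.94, -87.52, 500)})
tweets = pd.DataFrame({"lat": rng.uniform(41.64, 42.02, 2000),
                       "lon": rng.uniform(-87.94, -87.52, 2000),
                       "text": rng.choice(["robbery downtown tonight", "great pizza here",
                                           "traffic on the loop", "stay safe out there"], 2000)})

def to_cell(lat, lon, bounds=BOUNDS, n=N):
    """Map latitude/longitude arrays to flat grid-cell indices."""
    lat_min, lat_max, lon_min, lon_max = bounds
    row = np.clip(((lat - lat_min) / (lat_max - lat_min) * n).astype(int), 0, n - 1)
    col = np.clip(((lon - lon_min) / (lon_max - lon_min) * n).astype(int), 0, n - 1)
    return row * n + col

# 1. Label each cell from the police reports: 1 if any crime occurred there
#    during the training window, -1 otherwise (the presence/absence model).
crime_cells = set(to_cell(crimes["lat"].values, crimes["lon"].values))
labels = np.array([1 if c in crime_cells else -1 for c in range(N * N)])

# 2. Pool every tweet in a cell into one "document" (crude whitespace tokenization)
#    and fit a topic model over those documents.
tweets["cell"] = to_cell(tweets["lat"].values, tweets["lon"].values)
docs = tweets.groupby("cell")["text"].apply(lambda s: " ".join(s).lower().split())
dictionary = corpora.Dictionary(docs)
bow = [dictionary.doc2bow(tokens) for tokens in docs]
lda = models.LdaModel(bow, num_topics=N_TOPICS, id2word=dictionary)

# 3. Use each cell's topic proportions as features for a logistic-regression classifier.
features = np.zeros((N * N, N_TOPICS))
for cell_id, doc in zip(docs.index, bow):
    for topic, weight in lda.get_document_topics(doc, minimum_probability=0.0):
        features[cell_id, topic] = weight

clf = LogisticRegression(max_iter=1000).fit(features, labels)
crime_prob = clf.predict_proba(features)[:, 1]   # per-cell probability of crime
# Gerber also layered a kernel density estimator on top and evaluated against
# crime reports from the following month; both steps are omitted here.
```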

This is a basic model. The layering can be further refined and better understood (there was some discussion about the word “turnup,” for example). Many more interesting layers can be built into it in order to improve its predictive power, including more geographic features, population densities, some temporal modeling to accommodate the periodic nature of some crimes (e.g. most robberies happen during the work week, while people are away from their homes), a better accommodation for different types of crime, and a host of potential demographic and other variables.

I would love to dig into this data to gain a better understanding of the conversation underlying the topic models. I imagine there is quite a wealth of information to be gained, as well as a clearer sense of what kind of work the models are doing. It strikes me that each assumption and calculation has a heavy social load attached to it. Each variable and each layer that is built into the model and roots out correlations may be working to reinforce certain stereotypes and anoint them with the power of massive data. Some questions need to be asked. Who has access to the internet? What type of access? How are they using the internet? Are there substantive differences between tweets with and without geotagging? What varieties of language are the tweeters using? Do classifiers take language variation into account? Are the researchers simply building a big data model around the old “bad neighborhood” notions?

Data is powerful, and the predictive power of data is fascinating. Calculations like these raise questions in new ways, remixing old assumptions into new correlations. Let’s not forget to question new methods, put them into their wider sociocultural contexts and delve qualitatively into the data behind the analyses. Data science can be incredibly powerful and interesting, but it needs a qualitative and theoretical perspective to keep it rooted. I hope to see more, deeper interdisciplinary partnerships soon, working together to build powerful, grounded, and really interesting research!

 

Rethinking Digital Democracy: More reflections from #SMSociety13

What does digital democracy mean to you?

I presented this poster: Rethinking Digital Democracy v4 at the Social Media and Society conference last weekend, and it demonstrated only one of many images of digital democracy.

Digital democracy was portrayed at this conference as:

  • having a voice in the local public square (Habermas)
  • making local leadership directly accountable to constituents
  • having a voice in an external public sphere via international media sources
  • coordinating or facilitating a large scale protest movement
  • the ability to generate observable political changes
  • political engagement and/or mobilization
  • a working partnership between citizenry, government and emergency responders in crisis situations
  • a systematic archival of government activity brought to the public eye. “Archives can shed light on the darker places of the national soul” (Wilson 2012)

One presenter had the most systematic representation of digital democracy. Regarding the recent elections in Nigeria, he summarized digital democracy this way: “social media brought socialization, mobilization, participation and legitimization to the Nigerian electoral process.”
Not surprisingly, different working definitions brought different measures. How do you know that you have achieved digital democracy? What constitutes effective or successful digital democracy? And what phenomena are worthy of study and emulation? The scope of this question and answer varies greatly among some of the examples raised during the conference, which included:

  • citizens in the recent Nigerian election
  • citizens who tweet during a natural disaster or active crisis situation
  • citizens who changed the international media narrative regarding the recent Kenyan elections and ICC indictment
  • Arab Spring actions, activities and discussions (“The power of the people is greater than the people in power,” a perfect quote related to the Arab revolutions, from a slide by Mona Kasra)
  • the recent Occupy movement in the US
  • tweets to, from and about the US Congress
  • and many more that I wasn’t able to catch or follow…

In the end, I don’t have a suggestion for a working definition or measures, and my coverage here really only scratches the surface of the topic. But I do think that it’s helpful for people working in the area to be aware of the variety of events, people, working definitions and measures at play in wider discussions of digital democracy. Here are a few questions for researchers like us to ask ourselves:

  • What phenomenon are we studying?
  • How are people acting to affect their representation or governance?
  • Why do we think of it as an instance of digital democracy?
  • Who are “the people” in this case, and who is in a position of power?
  • What is our working definition of digital democracy?
  • Under that definition, what would constitute effective or successful participation? Is this measurable, codeable or a good fit for our data?
  • Is this a case of internal or external influence?

And, for fun, a few interesting areas of research:

  • There is a clear tension between the ground-up perception of the democratic process and the degree of cohesion necessary to effect change (e.g. Occupy & the anarchist framework).
  • Erving Goffman’s participation framework offers further ground for research in digital democracy (author/animator/principal: think of online petition and e-mail drives, for example, and the relationship between reworded messages, perceived efficacy and the reception that the e-mails receive).

It is clear that social media helps people have a voice and connect in ways that they haven’t always been able to. But this influence has yet to take any firm shape either among researchers or among those who are practicing or interested in digital democracy.

I found this tweet particularly apt, so I’d like to end on this note:

“Direct democracy is not going to replace representative government, but supplement and extend representation” #YES #SMSociety13

— Ray MacLeod (@RayMacLeod) September 14, 2013

 

 

Reflections on Digital Dualism & Social Media Research from #SMSociety13

I am frustrated by both Digital Dualism and the fight against Digital Dualism.

Digital dualism is the belief that online and offline are different worlds. It shows up relatively harmlessly when someone calls a group of people who are on their devices “antisocial,” but it is much more harmful in the way it pervades the language we use about online communication (e.g. “real” vs. “virtual”).

Many researchers have done important work countering digital dualism. For example, at the recent Social Media & Society conference, Jeffrey Keefer briefly discussed his doctoral work in which he showed that the support that doctoral students offered each other online was both very real and very helpful. I think it’s a shame that anyone ever doubted the power of a social network during such a challenging time, and I’m happy to see that argument trounced! Wooooh, go Jeffrey! (now a well-deserved Dr Keefer!)

Digital dualism is a false distinction, but it is built in part on a distinction that is also very real and very important. Online space and offline space are different spaces. People can act in either to achieve their goals in very real ways, but, although both are real, they are very different. The set of qualities along which the two overlap, differ and even blur into each other changes every day. For example, “real name” branding online and GPS-enabled in-person gaming across college campuses continue to blur the boundaries.

But the private and segmented aspects of online communication are important as well. Sometimes criticism of online space is based on this segmentation, but communities of interest are a longstanding phenomenon. A book club is expected to be a club for people with a shared interest in books. A workplace is a place for people with shared professional interests. A swim team is for people who want to swim together. And none of these relationships would be confused with the longstanding close personal relationships we share with friends and family. When online activities are compared with offline ones, people often falsely compare interest-related activities online with the longstanding close personal ties we share with friends and family. In an effort to counter this, some have taken steps to make online communication more unified and holistic. But they do this at the expense of one of the greatest strengths of online communication.

Let’s discuss my recent trip to Halifax for this conference as an example.

My friends and family saw this picture:

Voila! Rethinking Digital Democracy! More of a “Hey mom, here’s my poster!” shot than a “Read and engage with my argument!” shot

My dad saw this one:

Not bad for airport fare, eh?

This picture showed up on Instagram:


It’s a glass wall, but it looks like water!

People on Spotify might have followed the music I listened to, and people on Goodreads may have followed my inflight reading.

My Twitter followers and those following the conference online saw this:

Talking about remix culture! Have I landed in heaven? #SMSociety13 #heaveninhalifax #niiice

— Casey Langer Tesfaye (@FreeRangeRsrch) September 15, 2013

And you have been presented with a different account altogether.

This fractioning makes sense to me, because I wouldn’t expect any one person to share this whole set of interests. I am able to freely discuss my area of interest with others who share the same interests.

Another presenter gave an example of LGBT youth on Facebook. The lack of anonymity can make it very hard for people who want to experiment or speak freely about a taboo topic to do so without it being taken out of context. Private and anonymous spaces that used to abound online are increasingly harder to find.

In my mind this harkens back a little to the early days of social media research, when research methods were deeply tied to descriptions of platforms and the online activity on them. As platforms rose and fell, this research became increasingly useless. Researchers had to move their focus to online actions without trying to root them in the platform or in offline activity. Is social media research being hindered in similar ways, by answering old criticisms instead of focusing on current and future potential? Social media research needs to move away from these artificial roots. Instead of countering silly claims about social media being antisocial or anything less than real communication, we should focus our research activities on the ways in which people communicate online and on the situated social actions and behaviors in online situations. This means: don’t try to ferret out people from usernames, or sort out who is behind a username. Don’t try to match across platforms. Don’t demand real names.

Honestly, anyone who is subjected to social feeds containing quite a few posts outside their area of interest should be grateful to refocus and move on! People following abstract shots on Instagram should be thrilled not to have seen a bowl of seafood chowder, and my family and friends should be thrilled not to have to hear me ramble on about digital dualism or context collapse!

I would love to discuss this further. If you’ve been waiting to post a comment on this blog, this is a great time for you to jump in and join the conversation!

Language use & gaps in STEM education

Today our microanalytic research group focused on videos of STEM education.

 

Watching STEM classes reminds me of a field trip a fellow researcher and I took to observe a physics class that used project-based learning. Project-based learning is a more hands-on and collaborative teaching approach that is gaining popularity among physics educators as an alternative to the traditional lecture. We observed an optics lab at a local university, and after the class we spoke about what we had observed. Whereas the other researcher had focused on the optics and the math, I had been captivated by the awkwardness of the class. I had never envisioned the PJBL process to be such an awkward one!

 

The first video that we watched today involved a student interchangeably using the terms chart and graph and softening their use with the term “thing.” There was some debate among the researchers as to whether the student had known the answer but chosen to distance himself from the response or whether the student was hedging because he was uncertain. The teacher responded by telling the student not to talk about things, but rather to talk to her in math terms.

 

What does it mean to understand something in math? The math educators in the room made it clear that a lack of the correct terminology signaled that the student didn’t necessarily understand the subject matter. There was no way for the teacher to know whether the student knew the difference between a chart and a graph from their use of the terms. The conversation on our end was not about the conceptual competence that the student showed. He was at the board, working through the problem, and he had begun his interaction with a winding description of the process necessary (as he imagined it) to solve the problem. It was clear that he did understand the problem and the necessary steps to solve it on some level (whether correct or not), but that level of understanding was not one that mattered.

 

I was surprised at the degree to which the use of mathematical language was framed as a choice on the part of the student. The teacher asked the student to use mathematical language with her. One math educator in our group spoke about students “getting away with fudging answers.” One researcher said that the correct terms “must be used,” and another commented that the lack of correct terms indicated that the student did “not have a proper understanding” of the material. All of this talk seems to belie the underlying truth that the student chose to use inexact language for a reason (whether social or epistemic).

 

The next video we watched showed a math teacher working through a problem. I was really struck by her lack of enthusiasm. I noticed her sighs, her lack of engagement with the students even when directly addressing them, and her tone when reading the problem from the textbook. Despite her apparent lack of enthusiasm, her mood appeared considerably brighter when she finished working through the problem. I found this interesting, because physics teachers usually report that their favorite part of their job is watching students’ “a-ha!” moments. Maybe the rewards of technical problem solving are a motivator for students and teachers alike? But the process of technical problem solving itself is rarely as motivating.

 

All of this leads me to one particularly interesting question: how do people in STEM learning settings distance themselves from the material? What discursive tools do they use? Who uses these discursive tools? And does the use of these tools change over time? I wonder in particular whether discursive distancing, which often parallels female discursive patterns, is more common among females than males as they progress through their education. Is there any kind of quantitative correlate to the use of discursive distancing? Is it more common among people who believe that they aren’t good at STEM? Is discursive distancing less common among people who pursue STEM careers? Is there a correlation between distancing and test scores?

 

Awkwardness in STEM education is fertile ground for qualitative researchers. To what extent is the learning or solving process emphasized and to what extent is the answer valued above all else? How is mathematical language socialized? The process of solving technical problems is a messy and uncomfortable one. It rarely goes smoothly, and in fact challenges often lead to more challenges. The process of trying and failing or trying and learning is not a sexy or attractive one, and there is rampant concern that focusing on the process of learning robs students of the ability to demonstrate their knowledge in a way that matters to people who speak the traditional languages of math and science.

 

We spoke a little about the phenomenon of connected math. It sounds to me like a very close parallel to project-based learning initiatives in physics. I was left wondering why such a similar teaching process could be valued as a teaching tool for all students in one field and relegated to a teaching tool for struggling students in a neighboring field. I wonder about the similarities and differences between the outcomes of these methods. Much of this may rest on politics, and I suspect that the politics are rooted in deeply held and less questioned beliefs about STEM education.

 

STEM education initiatives have grown quite a bit in recent years, and it’s clear that there is quite a bit of interesting research left to be done.

Upcoming DC Event: Online Research Offline Lunch

ETA: Registration for this event is now CLOSED. If you have already signed up, you will receive a confirmation e-mail shortly. Any sign-ups after this date will be stored as a contact list for any future events. Thank you for your interest! We’re excited to gather with such a diverse and interesting group.

-----

Are you in or near the DC area? Come join us!

Although DC is a great meeting place for specific areas of online research, there are few opportunities for interdisciplinary gatherings of professionals and academics. This lunch will provide an informal opportunity for a diverse set of online researchers to listen and talk respectfully about our interests and our work and to see our endeavors from new, valuable perspectives.

Date & Time: August 6, 2013, 12:30 p.m.

Location: Near Gallery Place or Metro Center. Once we have a rough headcount, we’ll choose an appropriate location. (Feel free to suggest a place!)

Please RSVP using this form:

Representativeness, qual & quant, and Big Data. Lost in translation?

My biggest challenge in coming from a quantitative background to a qualitative research program was representativeness. I came to class firmly rooted in the principle of Representativeness, and my classmates seemed not to have any idea why it mattered so much to me. Time after time I would get caught up in my data selection. I would pose the wider challenge of representativeness to a colleague, and they would ask “representative of what? why?”

 

In the survey research world, the researcher begins with a population of interest and finds a way to collect a representative sample of that population for study. In the qualitative world that accompanies survey research, units of analysis are generally people, and people are chosen for their representativeness. Representativeness is often constructed from demographic characteristics. If you’ve read this blog before, you know of my issues with demographics. Too often, demographic variables are used as knee-jerk variables instead of better-considered variables that are more relevant to the analysis at hand. (Maybe the census collects gender and not program availability, for example, but just because a variable is available and somewhat correlated doesn’t mean that it is in fact a relevant variable, especially when the focus of study is a population for whom gender is such an integral societal difference.)

 

And yet I spent a whole semester studying 5 minutes of conversation between 4 people. What was that representative of? Nothing but itself. It couldn’t have been exchanged for any other 5 minutes of conversation. It was simply a conversation that this group had and forgot. But over the course of the semester, this piece of conversation taught me countless aspects of conversation research. Every time I delved back into the data, it became richer. It was my first step into the world of microanalysis, where I discovered that just about anything can be a rich dataset if you use it carefully. A snapshot of people at a lecture? Well, how are their bodies oriented? A snapshot of video? A treasure trove of gestures and facial expressions. A piece of graffiti? Semiotic analysis! It goes on. The world of microanalysis is built on the practice of layered noticing. It goes deeper than wide.

 

But what is it representative of? How could a conversation be representative? Would I need to collect more conversations, but restrict the participants? Collect conversations with more participants, but in similar contexts? How much or how many would be enough?

 

In the world of microanalysis, people and objects constantly create and recreate themselves. You consistently create and recreate yourself, but your recreations generally fall into a similar range that makes you different from your neighbors. There are big themes in small moments. But what are the small moments representative of? Themselves. Simply, plainly, nothing more and nothing else. Does that mean that they don’t matter? I would argue that there is no better way to understand the world around us in deep detail than through microanalysis. I would also argue that macroanalysis is an important part of discovering the wider patterns in the world around us.

 

Recently a NY Times blog post by Quentin Hardy has garnered quite a bit of attention.

Why Big Data is Not Truth: http://bits.blogs.nytimes.com/2013/06/01/why-big-data-is-not-truth/

This post has really struck a chord with me, because I have had a hard time understanding Hardy’s complaint. Is big data truth? Is any data truth? All data is what it is: a collection of some sort, collected under a specific set of circumstances. Even data that we hope to be more representative has sampling and contextual limitations. Responsible analysts should always be upfront about what their data represents. Is big data less truthful than other kinds of data? It may be less representative than, say, a systematically collected political poll. But it is what it is: different data, collected under different circumstances in a different way. It shouldn’t be equated with other data that was collected differently. One true weakness of many large-scale analyses is blindness to the nature of the data, but that is a byproduct of the training algorithms that are used for much of the analysis. The algorithms need large training datasets, from anywhere. These sets are often developed through massive web crawlers. Here, context gets dicey. How does a researcher represent the data properly when they have no idea what it is? Hopefully researchers in this context will be wholly aware that, although their data has certain uses, it also has certain [huge] limitations.

 

I suspect that Hardy’s complaint is with the representation of massive datasets collected from web crawlers as a complete truth from which any analyses could be run and all of the greater truths of the world could be revealed. On this note, Hardy is exactly right. Data simply is what it is, nothing more and nothing less. And any analysis that focuses on an unknown dataset is just that: an analysis without context. Which is not to say that all analyses need to be representative, but rather that all responsible analyses of good quality need to be self-aware. If you do not know what the data represents and when and how it was collected, then you cannot begin to discuss the usefulness of any analysis of it.