World-building, big news and cherry blossoms

Something big has happened since my last blog post! In December 2025, I transitioned from the World Building phase of business development to launching my business. Welcome to the Community Stories and Conversation Project (TCSCP)! We host the TCSCP (pronounced “Talk Soup”) Network, which is both a supportive collaboration hub for independent researchers and a one-stop shop for people looking to engage research services. This network represents a shift in the accessibility of research services and a space with great potential to empower and support individual researchers and small businesses. I could not be more proud!

The network launched with the help of our twenty founding members, ensuring from the outset that some of the best minds in the business are coming together to collaborate in our changing field. We have been gaining members and collecting requests for services since our launch. We offer a wide range of services, including:

 • surveys • data analysis • focus groups • program evaluation •

 • data dashboards and strategy • project management •

• social media analysis • spreadsheet assistance • participant recruitment •

• proposal assistance • translation, adaptation, and translation evaluation •

• employee and customer satisfaction • market research and strategy •

• web usability UI/UX studies •

• multilingual, multicultural, and multinational •

 • leadership, training, mentorship, and coaching •

Please take some time to stop by our website, learn more about the network, join or request services, support us at Buy Me a Coffee, or have a conversation with me about the network.

We have survived so much in the last several years. This effort is part of a new era of growth, like cherry blossoms bursting forth after a cold winter. This is something to believe in. Together we will help shape the future of the research industry!

Endings, transitions and beginnings

This year has been one of heavy contradictions for me. It brought an end to 30 consecutive years of working in research in a structured 9-to-5 environment in offices or remotely for organizations, but it also brought so many unexpected opportunities and new beginnings. 

At the outset of the year, as my industry came under increasing threats of rapid cuts and dramatic changes, I was hungry to use my skills and life experiences in a different kind of way to affect those who were caught under the wheels of the rapid federal changes. A plan for a community conversation series seemed almost delivered to me through a series of flashbacks and revelations during an intense two-week period. Shortly afterward, I began developing partnerships and hosting these cathartic events. When I lost my job as a federal contractor amidst another flurry of cuts to contracts and personnel (the “April Fools” RIFs to HHS), I was able to devote more time to the series.

These Community Conversation events provided a space for difficult conversations around the impact of the cuts and changes, as well as a way to learn and practice grounding techniques for managing anxiety, hold empowering discussions reenvisioning the support landscape for those affected, and share in soothing meditations. We left these spaces feeling heard, better connected, and more relaxed and restored.

Early in this journey, I was interviewed by another community advocate.

https://www.youtube.com/embed/hJKpDYUbPt0?si=YZpizdPYm73jgwk-

By the end of November, I had conducted 15 Community Conversation events and three other career transition events, including a Careerchangeapalooza that proudly featured Career Change guru Rishan Mohammed of HiringCoach.ai. Some of the Community Conversations evolved into a youth-driven theme of Multigenerational Conversations about Mental Health and Wellness, and one event was more of a large-scale discussion forum with 10 breakout rooms. I led these events through partnerships with Transfiguration Parish, DC-AAPOR, AAPOR, and The Salt Sanctuary of MD, and as self-hosted events at local libraries and online. This work led to other opportunities I never would have imagined: co-leading a peer support group with a former FDA client, leading a weekly meditation series with the Salt Sanctuary and partnering with Brook Grove Retirement Community, where my daughter and I led weekly meditations, imagination sessions, and focus groups, and held countless conversations with residents and those in the rehab facility.

This was also a time for pro bono work, as I led and contributed to several qualitative studies in service of various partnerships and helped to prepare a statewide listening campaign on behalf of a consortium of local community advocate groups.

I felt deeply connected to my research and professional communities throughout this time. I joined MAFN, which turned out to be an amazingly supportive professional community, from monthly in-person networking events to online communities of practice. I joined the MRX PROs, with weekly sessions, discussion and camaraderie. I participated in AAPOR and DC-AAPOR events and attended the AAPOR conference with the help of colleagues. I learned more about the job-hunting landscape through the Insights Career Network. I met with countless peers in one-on-one networking sessions, learning about the passions and challenges of my colleagues and envisioning future collaborations. All of this happened against the backdrop of my unemployment. This morning marked the end of an era for me, as I attended my last mandatory unemployment session.

This period also led to something new and quite exciting! In September, I founded an LLC that is set to launch next week!

The coming year will be different. Some of these partnerships and Community Conversation events will continue, and a couple of new partnerships are on the horizon. The business will bloom and grow as a collective. I’ll grow as a business owner through a business incubator program called Founders Rising!, and I’ll trade pro bono work for paid consulting work. But this new year will be built on the foundation of a creative, supportive, challenging and transformational time unparalleled in my professional career. I’ve shared tears and laughs and intellectual excitement and so much more with my community members, colleagues, and friends and family this year, and more than anything, I feel so much gratitude to be at this particular point.

The COVID pandemic and lockdown brought another transformational period for so many of us, and we are still reckoning with its aftermath. The aftermath of this year will also linger. But may we continue to build on this new foundation to elevate each other through whatever challenges come our way in the future, stronger, as always, together!

Where the magic happens

The key to getting things done is doing them on your own terms.

This is a motto of mine, words I live by. A commute can be a pain, but what if I left a little earlier, took the scenic route, drank some good coffee and listened to a good book on the way? Dishes can be a pain, but with music? What if I focus on the bubbles?

As a researcher, I am often motivated by the power of noticing. As a moderator, this means providing space for the quietest voices to blossom. As an analyst, this means taking the time and care to represent all of the voices, not just the loudest or most eloquent.

I’ve often taken pride in my invisibility as a facilitator. I feel like I’ve done my job when I’m barely noticed, but the tone is set, the participants are at ease and the conversation stays on track through the subtlest of prompts and cues.

Today I’m kayaking. I enjoy racing through the water, but when I stop, I see birds hidden in tall grass, fish jumping, and the almost magical pops of light reflecting on water and trees.

My community work has also followed this model: amplifying quiet voices, endorsing those who seem tentative but whom I know to be insightful. Noticing.

This is my way of working, living and interacting in the world, and this is what drives me to do the work I’m doing now.

I have a voice that I have never hesitated to use. But I’ve learned that the world comes alive around me when I choose to observe. I trust that the same voice I use to advocate for others is well practiced and fully available when I need it, and with that trust I can fade back.

I’m in a transformative moment. I’m deciding what I want to build and that requires repeatedly doubling back to my principles. What do I stand for? What do I provide as naturally as I breathe or paddle? Who am I without institutional backing, when I’m free to create?

There is a large exodus in my field: people who have lost jobs and are beginning consultancies. For some, the path may feel clearer than for others. How can we support each other better? Connect more? Collaborate more? Grow stronger together? Are we all adrift? Could we paddle together?

When I finished my paddle today, I pulled onto shore and a park employee greeted me. I saw poop on both sides of the kayak and tried to point it out to her. She didn’t see me or hear me. She appeared to have already decided my words didn’t merit her attention. “Watch where you step!” I shouted, after a few attempts, and then watched her croc’d foot come down in a large pile of poop. In this world of paddlers, where we all sit under the same blanket of sky and listen to the sounds of birds that live freely amongst and between us, I choose to listen, to observe, to hear, to find pockets of magic and to step in poop as little as possible.

What is the real product here?

I was recently talking with a friend about my community conversation series. She told me that the real value in the sessions was in the data produced. I was shocked! Do community conversations produce data?

I have to say, this ruffled my feathers. The intention behind the sessions was always one of self-expression, forming or reinforcing connections between people, fostering healing and resilience, and building community. If they were intended for data collection, I would have instituted a consent process and considered inviting ethical review. Data collection has a very different connotation in my field, and this is community-based advocacy, not focus groups!

But I’ve been ruminating further on her words. Coming out of these sessions, there’s a clearer sense of what people are experiencing, how they are coping and what kinds of resources would be helpful to better support them at this time. And honestly, for any group that knows that some members are suffering, these are important outputs.

Are they data? No. Insights? No. Traditionally, they are none of these things. But they do provide valuable and necessary information that can be built upon to build better support systems and structures.

I’ve heard anecdotally from many groups of people affected by the sweeping government changes that they want to know what’s going on with their members and how to support them. I honestly believe that these community conversations are the answer to that, allowing both an opportunity to support people and an opportunity to explore a path forward through the chaos.

The value is in both one-off sessions and repeated sessions within the same community. My mission is to build them in such a way that groups and people can benefit. It’s a slow process, as I figure out how to meet people where they are, and I’m always open to advice or interest!

Interested in joining a session or getting involved?

Here is the mailing list:

https://docs.google.com/forms/d/e/1FAIpQLSfrIlrlCJzm5E4ahoR_JOh6E-KaB1nbGyJ2SmdQqKL99JHrOQ/viewform?usp=sharing&ouid=114493619372705360657

Here is the next online session:

https://www.eventbrite.co.uk/e/1382445414449?aff=oddtdtcreator

Navigating Career Changes from the Inside Out

Years ago on this blog, I wrote about approaching career changes from the inside out. I had accomplished the biggest career change of my life that way, by following my passions with my books, talks and extra research sessions and then blogging about them here.

Last week at the annual AAPOR conference in St Louis, an attendee in a session about Navigating Career Change asked about feeling unsatisfied with their work. This is a common motivation for switching jobs, but I chimed in from the audience as a voice of caution.

“I think about it like an unscratched itch,” I advised. “Maybe there is some part of you that your work life isn’t satisfying. But this is a horrible time to switch jobs so I advise you instead to find other ways to scratch that itch. You may still decide you’re ready for a change, but if, for example, you decide that you really do need your job to offer more space for creativity, you now have recent experiences to speak to as examples of you pursuing your creative endeavors.”

We expect our jobs to be our calling, our everything. And we give them everything. But we are so many things, and we need to exist beyond our work.

I’m at another point of career change. After nearly 30 years of working in my career, with the longest break being 3 weeks of maternity leave, I’ve lost my job as a government contractor as part of the deep federal cuts. I could look directly for another position, but I want to take my time with it. We only live once, and I want to take inventory of all of my itches before deciding how to scratch.

I want to build out this Community Conversations initiative, but I want to be thoughtful about it. I’m not trying to recreate what others have done, so much as build something new that fits our current needs. This requires intuition, reflection, patience, resilience, and determination. It means that some days are among the most fulfilling of my life, hosting cathartic community sessions or having really inspiring conversations with friends and colleagues, and some days I wonder why I’m adrift instead of staying on the career path.

This initiative was founded from the inside out, reflecting 

  1. my facilitation skills that I’ve learned through years of moderating, facilitating and community work, 
  2. my passion for building mental health that was cultivated through voices like Iyanla Vanzant, Pema Chodron, Rachel Cargle and the Nap Ministry,
  3. my profound interest in community based participatory research and the principles that guide it, and
  4. my love of strategic conversations, brainstorming and forming new ways to approach problems.

Forming an initiative from the inside out means that guiding our next steps is a continual process of self-reflection. This means that a day spent at one of my favorite art galleries, taking pictures that I may be able to use for an exhibit of my own one day, getting lost in the woods on its campus and finding new ways to engage with my surroundings is just as important as a day spent documenting the plan for the initiative, including the financial and communication aspects.

I always imagined my life in chapters, with a later chapter as a more wholeheartedly creative era. And I love the creativity I’m feeling now! But a change so dramatic as this requires some careful stewardship and navigation.

I’m not really sure where any of this is headed, but I’m confident that just as when I recreated my life before, these steps will lead me in the right direction. Because I’m scratching my itches!

Have you navigated big changes like this? Do you have unscratched itches? Do you have any advice or resources to offer? Please comment! Let’s continue the conversation.

Picture taken by me, in the grounds of the Glenstone Museum in Potomac, MD 5/22/2025

Something to believe in

At the beginning of Life of Pi, the main character says he has a story to make you believe in God. Does he? I suppose that depends on your belief system, but the story made for a great book and a gorgeous movie.

I am not someone who believed that things happen for a reason. It might have been true, but the potential truth of it offered no comfort to me. But this year has shaken my doubt. Let me tell you my story.

You know me as a researcher, fundamentally and to my core. I’ve been working in research since 1996, and it has been an adventure, a challenge, and a great love. I’ve had the pleasure of working on fMRI research in its early days, working in Neuropsychology departments at fantastic hospitals, getting to know the nonprofit space, doing research on and in the global and academic Physics and Astronomy communities, doing Usability studies in people’s homes and using eye tracking tools, working across a number of languages on study recruitment materials, working in HIV prevention and treatment, evaluating health communications materials and working with communities to cocreate research studies that serve them.

Oh, the places research can take you! The interesting work! The amazing people I’ve met along the way!

This year, things began to change pretty dramatically for the research community around me. We’ve been seeing respected professionals and institutional studies let go and dismissed on a massive scale.

We all want to help support each other and the field in some way, but we’re being stripped of our collective voice. I began to obsess over what I could uniquely offer to help. For a solid week, the topic was omnipresent for me. I thought immediately of the community conversations I’ve led occasionally with my church community. But what could these conversations look like? The answer came in flashes from every corner of my memory. Things strung together in a way I never could have imagined prior.

I’m hearing about the stress and distress we’re suffering through, and I thought of the grounding activities I’ve honed with someone very close to me who’s been battling severe anxiety and depression. I thought of the community gatherings that Iyanla Vanzant used to host on Saturday mornings and the grounding exercises she taught. I thought of my love of meditation and the methods that have been useful to me. I thought of the Nap Ministry and the idea of restoring people to their optimum humanity. I thought of Rachel Cargle and Adrienne Marie Brown and their teaching about the importance of imagination and play.

I wanted to use my qualitative research skills, experience with facilitation, and these principles to create something new, grounded in principles of Community Based Participatory Research.

The goal is to create a space for people to listen and be heard, to heal and to learn healing skills, to dream of a different future and to understand what the community needs and how the community can best support each other. The sessions can be singular for a group or they can become a community building and nurturing series for that group.

I began collecting resources and developed a resource sheet for participants, and I developed a discussion guide that asked few questions and allowed mostly for listening, discussion, and progressive relaxation. These resources are intended to be flexible enough to work with any group.

The next step was to find communities who were interested. To date, I have conducted groups with my church community and a local professional group. The groups blew my mind. People entered with strong emotions, listened and supported each other, relaxed to the point of smiling and laughing and spoke about supporting each other and building community. The groups were very different from each other, each becoming what it needed to be. One group opted to make this into a series, with a second session planned for later this month. The other left me with a full page of ideas that our professional councils can bring to fruition.

As the communities around me are increasingly affected, I’ve wanted to focus on expanding, but it’s been difficult to balance with a full-time, intense job. Well, dear readers, after seeing the last of my clients RIF’d last week, I was laid off this week. For me, this was a gift, because this community conversation series is my passion and my purpose.

In the coming weeks, I’m going to focus on ways to find more groups to facilitate, online and in-person. I’m working on designs as well, to raise funds for the project in some kind of way.

How can you help? If you’re interested, you’re welcome. I need support in locating and planning groups, developing a funding strategy and a plan for the merch. Let’s work together to build community, restore peace and purpose, and support and listen to each other.

Thank you for listening ❤️

“All that you touch
You Change.

All that you Change
Changes you.

The only lasting truth
is Change.

God
is Change.”


Octavia E. Butler

The Role of Gratitude in Research

Research, as most things in life, is best approached with gratitude. In this post, I’ll share a bit about what I’m grateful for, an exercise in gratitude, and some food for thought about the role of gratitude in research.

First, here is a window into what I’m feeling grateful for.

Grateful for the challenge of research

Research can provide a challenging career. While it is possible to find positions in research that are more repetitive, most positions afford many opportunities for learning about new subject matter and new methods. Each new research question provides fresh challenges to implement. And with the body of literature and informal sources available, there is always the ability to read more deeply about the work that others have done. I am grateful for the perpetual learning experiences that research has brought.

Grateful for the versatility of research

One of my favorite aspects of a career in research is the versatility. I’ve been able to work in neuropsychology, physics education, sociolinguistics, social media research, media measurement and in public health using a great variety of research methods.

Grateful for my colleagues

Over the years I’ve had the pleasure of working with people that I respect, learn from and genuinely enjoy. I’m grateful for their help, their wisdom, their curiosity, their enthusiasm, their support, their friendship, and their comforting awkwardness.

Gratitude for the research opportunities

I am grateful for the opportunity to study people. I am grateful for the people who agree to participate in research and who honestly share what is in their hearts or on their minds. Some opinions and experiences are easier to share than others. I am grateful for all of it. The qualitative work that I am currently involved with is often built on individual and group interviews that can be a powerful experience for the participant and the interviewer, and I am so grateful to the participants and the process for bringing this to fruition.

 

Now, let’s take a minute to Go Beyond the Gush. It is easy to get swept up in the everyday grind of research, whether because the research approval process seems unnecessarily repetitive or cumbersome, or data needs more wrangling than predicted, or the meetings seem endless and the emails, texts and phone calls seem constant, or the people working on a project are particularly difficult to corral, or the behavior that you need to observe in your research is particularly difficult to isolate, or… We can all get caught in the slog of research. But gratitude can help.

Here is an exercise:

Let’s take a minute to get very basic with this. First, think of the reasons why you enjoy your work. Then let’s take it back even further.

  • Be grateful to have a topic to research or to have the ability to find one. Be grateful for the ability to be curious and to find unanswered questions.
  • Be grateful to have the support to pursue this topic as a professional or as a student. Research costs time, money and many other resources.
  • Be grateful to have the skills to approach the topic. Think of all of the training that provided these skills. Think of the resources that are available to you to help you learn what you need.
  • Be grateful for your strength. You have the ability to tackle what comes your way.
  • Be grateful for the people who must come together to make this work happen. Sometimes we get stuck thinking of one person’s habits or quirks or in finding fault with the people around us. Some groups are more cohesive than others, and each person brings a different set of skills. Take a step back from that. Let go of it for a minute and take a fresh look. First see yourself as someone with strengths and weaknesses. Then see your colleagues in this light as well. Allow yourself to forgive yourself and others.
  • Be grateful for the challenges your work brings. Sometimes it seems to bring too many challenges. But those challenges are keeping you sharp. And in some way, they will offer you the opportunity to learn and grow.
  • Be grateful for research participants. These are the people who make our work possible by letting us into their world in some way. That is a privilege.

 

What do exercises like this gain you? A few things, really. Peace of mind. A break from the stress and an opportunity to just feel grateful. Perspective. A chance to put challenges that seem constant or insurmountable into a smaller box. The opportunity to see the people around us from a fresh perspective and hear them more clearly. A better insulation against the instability that affects us all. And an opportunity to see our research in context and think more broadly about the effect it has. The work we do affects people’s lives, but these basic mechanisms can become lost to us when we lose perspective. With fresh perspective and gratitude, we can better see these mechanisms in action and produce work that better respects all involved. No research exists in a vacuum, and the better we can understand the role our research plays in a wider context the better stewards we can be over this tremendous privilege we’ve been granted.

Thanks for listening.

The surprising unpredictability of language in use

This morning I received an e-mail from an international professional association that I belong to. The e-mail was in English, but it was not written by an American. As a linguist, I recognized the differences in formality and word use as signs that the person who wrote the e-mail is speaking from a set of experiences with English that differ from my own. Nothing in the e-mail was grammatically incorrect (although as a linguist I am hesitant to judge any linguistic differences as correct or incorrect, especially out of context).

Then later this afternoon I saw a tweet from Twitter on the correct use of Twitter abbreviations (RT, MT, etc.). If the growth of new Twitter users has indeed leveled off then Twitter is lucky, because the more Twitter grows the less they will be able to influence the language use of their base.

Language is a living entity that grows, evolves and takes shape based on individual experiences and individual perceptions of language use. If you think carefully about your experiences with language learning, you will quickly see that single exposures and dictionary definitions teach you little, but repeated viewings across contexts teach you much more about language.

Language use is patterned. Every word combination has a likelihood of appearing together, and that likelihood varies based on a host of contextual factors. Language use is complex. We use words in a variety of ways across a variety of contexts. These facts make language interesting, but they also obscure language use from casual understanding. The complicated nature of language in use interferes with analysts who build assumptions about language into their research strategies without realizing that their assumptions would not stand up to careful observation or study.
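For readers who like to see an idea in code, here is a minimal sketch of what "a likelihood of appearing together" can mean in practice. It counts adjacent word pairs in a tiny made-up sentence and estimates how likely each second word is to follow the first. The function name `bigram_probabilities` and the toy text are my own inventions for illustration. Real corpus work would use far more text and would also account for the contextual factors mentioned above, which this sketch ignores.

```python
from collections import Counter


def bigram_probabilities(text):
    """Estimate P(next word | current word) from adjacent word pairs in text."""
    words = text.lower().split()
    # How often each word appears in a position where another word follows it.
    first_counts = Counter(words[:-1])
    # How often each ordered pair of adjacent words appears.
    pair_counts = Counter(zip(words, words[1:]))
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


# Toy text, invented for illustration only.
toy_corpus = "thank you very much thank you so much thank goodness"
probs = bigram_probabilities(toy_corpus)

# Print the pairs from most to least likely.
for (first, second), p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{first} -> {second}: {p:.2f}")
```

Even in a text this small, "thank" is followed by "you" more often than by "goodness", which is the kind of pattern that single exposures and dictionary definitions cannot show you.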

I would advise anyone involved in the study of language use (either as a primary or secondary aspect of their analysis) to take language use seriously. Fortunately, linguistics is fun and language is everywhere. So hop to it!

Reporting on the AAPOR 69th national conference in Anaheim #aapor

Last week AAPOR held its 69th annual conference in sunny (and hot) Anaheim, California.

Palm Trees in the conference center area

My biggest takeaway from this year’s conference is that AAPOR is a very healthy organization. AAPOR attendees were genuinely happy to be at the conference, enthusiastic about AAPOR and excited about the conference material. Many participants consider AAPOR their intellectual and professional home base and really relished the opportunity to be around kindred spirits (often socially awkward professionals who are genuinely excited about our niche). All of the presentations I saw firsthand or heard about were solid and dense, and the presenters were excited about their work and their findings. Membership, conference attendance, journal and conference submissions and volunteer participation are all quite strong.

 

At this point in time, the field of survey research is encountering a set of challenges. Nonresponse is a growing challenge, and other forms of data and analysis are increasingly en vogue. I was really excited to see that AAPOR members are greeting these challenges and others head on. For this particular write-up, I will focus on these two challenges. I hope that others will address some of the other main conference themes and add their notes and resources to those I’ve gathered below.

 

As survey nonresponse becomes more of a challenge, survey researchers are moving from traditional measures of response quality (e.g. response rates) to newer measures (e.g. nonresponse bias). Researchers are increasingly anchoring their discussions about survey quality within the Total Survey Error framework, which offers a contextual basis for understanding the problem more deeply. Instead of focusing on an across-the-board rise in response rates, researchers are strategizing their resources with the goal of reducing nonresponse bias. This includes understanding response propensity (Who is likely not to respond to the survey? Who is most likely to drop out of a panel study? What are some of the barriers to survey participation?), looking for substantive measures that correlate with response propensity (e.g. Are small, rural private schools less likely to respond to a school survey? Are substance users less likely to respond to a survey about substance abuse?), and continuous monitoring of paradata during the collection period (e.g. developing differential strategies by disposition code, focusing the most successful interviewers on the most reluctant cases, or concentrating collection strategies where they are expected to be most effective). This area of strategizing emerged in AAPOR circles a few years ago with discussions of nonresponse propensity modeling, a process which is surely much more accessible than it sounds, but it has really evolved into a practical and useful tool that can help any size research shop increase survey quality and lower costs.
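To make the idea of response propensity a little more concrete, here is a minimal sketch that compares response rates across groups in a sample. Everything in it is hypothetical: the group labels, the handful of cases, and the function name `response_rates` are invented for illustration and do not come from any study presented at the conference. A real propensity model would typically use a regression on many case characteristics rather than a simple group-level rate.

```python
from collections import defaultdict

# Hypothetical sampled cases: (school type, whether the school responded).
# These records are made up purely to illustrate the calculation.
sample = [
    ("small rural private", False),
    ("small rural private", False),
    ("small rural private", True),
    ("small rural private", False),
    ("large urban public", True),
    ("large urban public", True),
    ("large urban public", False),
    ("large urban public", True),
    ("large urban public", True),
    ("large urban public", True),
]


def response_rates(records):
    """Return the share of sampled cases that responded, for each group."""
    sampled = defaultdict(int)
    responded = defaultdict(int)
    for group, did_respond in records:
        sampled[group] += 1
        if did_respond:
            responded[group] += 1
    return {group: responded[group] / sampled[group] for group in sampled}


rates = response_rates(sample)
overall = sum(1 for _, did_respond in sample if did_respond) / len(sample)

# List groups from lowest to highest response rate, next to the overall rate.
for group, rate in sorted(rates.items(), key=lambda kv: kv[1]):
    print(f"{group}: {rate:.0%} (overall {overall:.0%})")
```

A group that responds well below the overall rate is a candidate for targeted follow-up, which is the reasoning behind concentrating collection effort where it is expected to matter most.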

 

Another big takeaway for me was the volume of discussions and presentations that spoke to the fast-emerging world of data science and big data. Many people spoke of the importance of our voice in the realm of data science, particularly with our professional focus on understanding and mitigating errors in the research process. A few practitioners applied error frameworks to analyses of organic data, and some talks were based on analyses of organic data. This year AAPOR also sponsored a research hack to investigate the potential for Instagram as a research tool for Feed the Hungry. These discussions, presentations and activities made it clear that AAPOR will continue to have a strong voice in the changing research environment, and the task force reports and initiatives from both the membership and education committees reinforced AAPOR’s ability to be right on top of the many changes afoot. I’m eager to see AAPOR’s changing role take shape.

“If you had asked social scientists even 20 years ago what powers they dreamed of acquiring, they might have cited the capacity to track the behaviors, purchases, movements, interactions, and thoughts of whole cities of people, in real time.” – N.A.  Christakis. 24 June 2011. New York Times, via Craig Hill (RTI)

 

AAPOR is a very strong, well-loved organization, and it is building a very strong future from a very solid foundation.

 

 

MORE DETAILED NOTES:

This conference is huge, and I could not possibly cover all of it on my own, so I will try to share my own notes as well as the notes and resources I can collect from other attendees. If you have any materials to share, please send them to me! The more information I am able to collect here, the better a resource it will be for people interested in AAPOR or the conference.

 

Patrick Ruffini assembled the tweets from the conference into this Storify.

 

Annie, the blogger behind LoveStats, had quite a few posts from the conference. I sat on a panel with Annie on the role of blogs in public opinion research (organized by Joe Murphy for the 68th annual AAPOR conference), and Annie blew me away by live-blogging the event from the stage! Clearly, she is the fastest blogger in the West and the East! Her posts from Anaheim included:

Your Significance Test Proves Nothing

Do panel companies manage their panels?

Gender bias among AAPOR presenters

What I hate about you AAPOR

How to correct scale distribution errors

What I like about you AAPOR

I poo poo on your significance tests

When is survey burden the fault of the responders?

How many survey contacts is enough?

 

My full notes are available here (please excuse any formatting irregularities). Unfortunately, they are not as extensive as I would have liked, because wifi and power were in short supply. I also wish I had settled into a better seat and covered some of the talks in greater detail, including Don Dillman’s talk, which was a real highlight of the conference!

I believe Rob Santos’ professional address will be available for viewing or listening soon, if it is not already available. He is a very eloquent speaker, and he made some really great points, so this will be well worth your time.

 

Let’s talk about data cleaning

Data cleaning has a bad rep. In fact, it has long been considered the grunt work of the data analysis enterprise. I recently came across a piece of writing in the Harvard Business Review that lamented the amount of time data scientists spend cleaning their data. The author feared that data scientists’ skills were being wasted on the cleaning process when they could be using their time for the analyses we so desperately need them to do.

I’ll admit that I haven’t always loved the process of cleaning data. But my view of the process has evolved significantly over the last few years.

As a survey researcher, my cleaning process used to begin with a tall stack of paper forms. Answers that did not make logical sense during the checking process sparked a trip to the file folders to find the form in question. The forms often held physical evidence of indecision on the part of the respondent, such as eraser marks or an explanation in the margin, which could not have been reflected properly by the data entry person. We lost this part of the process when we moved to web surveys. It sometimes felt like a web survey left respondents no way to communicate with the researcher about their unique situations. Data cleaning lost its personalized feel and detective story luster and became routine and tedious.

Despite some of the affordances of the movement to web surveys, much of the cleaning process stayed rooted in the old techniques. Each form has its own id number, and the programmers would use those id numbers for corrections:

if id=1234567, set var1=5, set var7=62

At this point a “good programmer” would also document the changes for future collaborators:

*this person was not actually a forest ranger, and they were born in 1962
if id=1234567, set var1=5, set var7=62
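For readers who want to see the id-based approach as running code, here is a rough Python sketch. The pseudocode above is not tied to any particular package, so the data layout here is my own assumption: records are kept in a dictionary keyed by id, and the starting values for var1 and var7 are invented. Each correction carries its own explanation, and the function returns a change log so future collaborators can see what was altered and why.

```python
# Hypothetical records keyed by form id (starting values are invented).
records = {
    1234567: {"var1": 2, "var7": 26},
    7654321: {"var1": 3, "var7": 71},
}

# One entry per hand-checked form: (id, new values, documented reason).
corrections = [
    (
        1234567,
        {"var1": 5, "var7": 62},
        "this person was not actually a forest ranger, and they were born in 1962",
    ),
]


def apply_corrections(records, corrections):
    """Apply id-based corrections in place and return a log of every change."""
    log = []
    for record_id, changes, reason in corrections:
        record = records[record_id]
        for var, new_value in changes.items():
            # Keep the old value so the change can be audited or reversed.
            log.append((record_id, var, record[var], new_value, reason))
            record[var] = new_value
    return log


change_log = apply_corrections(records, corrections)
for entry in change_log:
    print(entry)
```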

Making these changes grew tedious very quickly, and the process seemed to drag on for ages. The researcher would check the data for potential errors, scour the records that could hold those errors for any kind of evidence of the respondent’s intentions, and then handle each form one at a time.

My techniques for cleaning data have changed dramatically since those days. My goal is to use id numbers as rarely as possible, and instead to ask myself questions like “how can I tell that these people are not forest rangers?” The answers to these questions evoke a subtly different technique:

* these people are not actually forest rangers
if var7=35 and var1=2 and var10 contains ‘fire fighter’, set var1=5
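The same rule can be sketched in Python. As before, the data layout is an assumption on my part: each record is a dictionary, var1 is treated as a coded field, and var10 is treated as a free-text field. I have also made the text match case-insensitive, which the pseudocode does not specify.

```python
# Hypothetical records (values are invented for illustration).
records = [
    {"id": 1, "var1": 2, "var7": 35, "var10": "Volunteer fire fighter"},
    {"id": 2, "var1": 2, "var7": 35, "var10": "park service"},
    {"id": 3, "var1": 2, "var7": 40, "var10": "fire fighter"},
    {"id": 4, "var1": 4, "var7": 35, "var10": "fire fighter"},
]


def matches_rule(record):
    """True when a record fits the 'not actually a forest ranger' pattern."""
    return (
        record["var7"] == 35
        and record["var1"] == 2
        and "fire fighter" in record["var10"].lower()
    )


def apply_rule(records):
    """Recode var1 to 5 for every matching record; return the ids changed."""
    changed_ids = []
    for record in records:
        if matches_rule(record):
            record["var1"] = 5
            changed_ids.append(record["id"])
    return changed_ids


changed_ids = apply_rule(records)
print("Recoded ids:", changed_ids)
```

Returning the list of changed ids makes it easy to spot-check a handful of the affected records by hand before trusting the rule.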

This technique requires honing and testing (adjusting the precision and recall), but I’ve found it to be far more efficient, faster, more comprehensive and, most of all, more fun (oh hallelujah!). It makes me wonder whether we have perpetually undercut the quality of the data cleaning we do simply because we hold the process in such low esteem.
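One way to do that honing, sketched here in Python, is to hand-check a sample of records and compare the ids a rule flags against the ids that truly need correcting. Precision is the share of flagged records that were real errors, and recall is the share of real errors the rule caught. The id lists below are invented for illustration.

```python
def precision_recall(flagged_ids, true_error_ids):
    """Compare a rule's flagged ids against hand-verified error ids."""
    flagged, truth = set(flagged_ids), set(true_error_ids)
    true_positives = len(flagged & truth)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical results from a hand-checked sample of forms.
flagged_by_rule = [101, 102, 103, 104]
hand_checked_errors = [101, 102, 103, 105, 106]

precision, recall = precision_recall(flagged_by_rule, hand_checked_errors)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

A rule with low precision is changing records it should leave alone, so its conditions need tightening. A rule with low recall is missing errors, so its conditions may be too narrow.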

So far I have not discussed data cleaning for other types of data. I’m currently working on a corpus of Twitter data, and I don’t see much of a difference in the cleaning process. The data types and programming statements I use are different, but the process is very close. It’s an interesting and challenging process that involves detective work, a better and growing understanding of the intricacies of the dataset, a growing set of programming skills, and a growing understanding of the natural language use in your dataset. The process mirrors the analysis to such a degree that I’m not really sure why it would be such a bad thing for analysts to be involved in data cleaning.

I’d be interested to hear what my readers have to say about this. Is our notion of the value and challenge of data cleaning antiquated? Is data cleaning a burden that an analyst should bear? And why is there so little talk about data cleaning, when we could all stand to learn so much from each other in the way of data structuring code and more?