Cleaning and Clarifying

August 11, 2016

This year’s project has been an exercise in exploring what “data” means in the humanities and thinking about research as data collection. It has left me pondering whether initial research, data collection, and data cleaning are best viewed as a single, messy procedure or as three separate efforts. Clearly, even thinking about data is a messy process, and I’ve learned to find some comfort in that concept, rather than impatience or shame. However, the heart of my project is learning to produce a quality, peer-reviewable, re-usable data set, and I am committed to learning best practices for that task, messy though it may be.

Data Cleaning Challenges

When I wrote my last two posts, I thought I had overcome perfectionist waffling and was just about ready to send my data set to Open Context for review. It was very exciting. However, after some very helpful exchanges with my faculty mentors, I realized there was (quite a bit) more data cleaning I needed to do.

Essentially, I had been conflating categories and columns in the spreadsheet and needed clearer indications of what information I was recording. For instance, instead of one column noting whether a monument is a church and filling it in with “yes” or “no,” I now use a controlled vocabulary labeling it “sacred” or “secular.” It’s a small change, but one that makes the monument data much clearer in answer to the research question of what the monument was used for. I’ve been plugging along in this manner, re-addressing the data in each column and making sure that it makes sense and is reasonably consistent. (For instance, I had used “null” in some empty cells, “?” in some, and “*” in others for things I either didn’t have info for or wasn’t sure about; these are now being streamlined.)
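To make that kind of cleanup concrete, here is a minimal sketch in Python of the two steps described above. It is not the actual workflow behind this data set: the column names (`is_church`, `function`), the sample rows, the single “unknown” marker, and the direct yes-to-sacred, no-to-secular mapping are all assumptions made for illustration.

```python
# Hypothetical column names and values; the real spreadsheet is not shown here.
PLACEHOLDERS = {"", "null", "?", "*"}
UNKNOWN = "unknown"  # assumed single marker for missing or uncertain values

# Assumed mapping from the old yes/no "church" column to the controlled vocabulary.
FUNCTION_VOCAB = {"yes": "sacred", "no": "secular"}


def clean_value(value):
    """Collapse the mixed placeholder markers into one consistent marker."""
    text = value.strip()
    return UNKNOWN if text.lower() in PLACEHOLDERS else text


def clean_row(row):
    """Return a cleaned copy of one record (a dict of column name -> string)."""
    cleaned = {column: clean_value(value) for column, value in row.items()}
    # Replace the yes/no church flag with a controlled-vocabulary "function" column.
    old_flag = cleaned.pop("is_church", UNKNOWN).lower()
    cleaned["function"] = FUNCTION_VOCAB.get(old_flag, UNKNOWN)
    return cleaned


# Made-up sample records, only to show the transformation.
rows = [
    {"name": "Monument A", "is_church": "yes", "date": "?"},
    {"name": "Monument B", "is_church": "No", "date": "null"},
    {"name": "Monument C", "is_church": "*", "date": "10th century"},
]
cleaned_rows = [clean_row(row) for row in rows]
for row in cleaned_rows:
    print(row)
```

One design question this sketch glosses over: it merges “no information” and “not sure” into a single marker. If that distinction matters for the research question, two separate markers (or a separate certainty column) may be the better choice.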

Emotional Labor in Digital Humanities

I spent much of the summer frustrated with how long my work is taking and how easily distracted I am in the wake of a long dissertation-writing process. Yet I am incredibly grateful for the faculty mentors and others in the Twitterverse who have provided the pep talks, explanations via email and the Commons, and answers to questions that keep me plugging along. A recent article that describes similar experiences is Paige Morgan’s “Not Your DH Teddy-Bear,” on emotional labor (what she defines as “managing people’s emotions so that they can make effective project decisions”) in digital humanities.* She discusses it from the perspective of someone doing that labor, the hand-holding and encouraging of researchers, whereas at the moment, I’m the lucky recipient. I suspect many of the MSUdai faculty members will be able to relate to Morgan’s experience of having to provide things that aren’t necessarily digital skills or instructions over the course of this year: advice, encouragement, very detailed emails, emoji tweets. We started a DAC group on digital labor a while back, and this article is an interesting take on that topic.

Learning Process Reflections

On that note, I had planned to title this post, “Coulda, Shoulda, Woulda,” and list everything I would do differently next time. But then I spent some time articulating what I actually did, reading my field notebooks, and skimming all the documentation I produced while I worked. The truth is, I think I did the best I could at the time. This is not to say I wouldn’t do anything differently (I’ve made a list!), but it was a moment of clarity to realize that gaining data knowledge is, in fact, a learning process, and I simply must grant to myself the same leeway and patience I extend to my students when they’re learning something new.

My extended data cleaning process means I will need to re-address what I’ll have ready at the institute. I will definitely have data to visualize, but the Open Context peer review for the data set will still take some time, as will working with KORA, so I’ll have to rethink my order of operations. I’m hoping to use some institute time to get advice from Dan, Eric, and Catherine about data cleaning and workflow.

Morgan’s article also conveys patience in the face of mistakes, saying “messiness can be synonymous with complexity and, in that regard, can be generative rather than unproductive, and generative of engagement, rather than just tidying.” So in the spirit of generative and productive chaos, my next post will detail some of the things I know now that will benefit my next data collection project.

Updates

Defended my dissertation. (Hooray!)
Crossed the hurdle of being afraid of messy data. (Bring on the chaos).
Had some productive emails and Commons conversations with my faculty mentors.
Explored other data sets in Open Context for comparison.
Got some great instructions for using and contributing to PeriodO for linked open data for time periods from Adam Rabinowitz on the DAC. (Read it here).
Applied for an ORCID number. (Required for PeriodO, and useful for all publications).
Cleaned the data. (Still cleaning, actually).
Started a 10-day mini refresher course on web design (HTML and tech terms) via Skillcrush.
Got a Twitter handle and Commons URL for a Cappadocia-related identity that are separate from my sorta-personal current accounts. (More on that at the institute!)

Goals for the Institute

Continue curating/collecting/cleaning the highest-quality data set that I can and make sure it’s worthy of peer review.
Start working on ways to visualize that data during the institute.
Clarify an online identity to disseminate Cappadocian data (this data set, as well as other images and open access resources) via the DAC and other venues.

* Reference: Paige Morgan, “Not Your DH Teddy-Bear; or, Emotional Labor is Not Going Away,” in Digital Humanities In the Library / Of the Library, dh+lib: Where the Digital Humanities and Librarianship Meet, ed. Caitlin Christian-Lamb, Zach Coble, Thomas Padilla, Caro Pinto, Sarah Potvin, John Russell, Roxanne Shirazi, and Patrick Williams (July 29, 2016). http://acrl.ala.org/dh/2016/07/29/not-your-dh-teddy-bear/

A.L. McMichael
@ByzCapp