What “R” qualitative research methods?

I recently stumbled upon a post on R-bloggers entitled “Qualitative Research in R.” The title caught my attention, since I’m generally excited about most things R and since I recently helped teach a qualitative methods course, which has had me thinking about adding more ethnographic and other qualitative elements to my work. I’d also heard of the RQDA package before but hadn’t had much luck getting it to work.

The article has some good advice on text mining and word clouds; however, I was disappointed to see those techniques described as qualitative research. This seems to me to be an example of the oversimplified assumption that qualitative = words and quantitative = numbers. Consider the following passages from the post, which illustrate some of the problems with assuming that work is qualitative just because it involves text:

[Text mining is a] method which enables us to highlight the most frequently used keywords in a paragraph of texts or compilation of several text documents.

There’s no denying that text mining (as described here) involves words, which makes it tempting to call it qualitative research. However, it’s clear from this description that the method is ultimately built on frequency counts. Even more advanced forms of text analysis (like topic modeling) also rest on underlying quantitative analysis that happens to manifest itself in terms of words.

However, it should be obvious that these text analysis techniques do not understand the meaning of the words that they manipulate, instead treating them as arbitrary values that happen to correlate, coincide, and so on. It’s similar to a phenomenon described by Stefan Fatsis in his wonderful book Word Freak, where highly competitive Scrabble players start to see words not as units of meaning but as combinations of game pieces, board spaces, and point values, a phenomenon that allowed a New Zealander who doesn’t speak a word of French to win the 2015 Francophone World Championship of Scrabble.

Another passage shows that the same is true of word clouds (which my colleague Josh Rosenberg has written further about here):

Finally, the frequency table of the words (document matrix) will be visualized graphically by plotting in a word cloud with the help of the following R code.

Again, the R function composing the word cloud doesn’t understand those words any more than our Kiwi friend understood French—even if both are able to do something pretty cool with language. I’d go so far as to argue that they aren’t really working with words at all; they’re just working with the numbers underneath.
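To make that point concrete, here’s a minimal sketch of the kind of frequency-count pipeline the post describes, using the tm and wordcloud packages (the sample documents are invented for illustration, and the original post’s code may well differ):

```r
# A minimal sketch of frequency-based "text mining": everything below is counting,
# not interpreting. (The sample documents are invented for illustration.)
library(tm)
library(wordcloud)

docs <- c("qualitative research is about meaning",
          "text mining is about counting words",
          "words get counted and plotted")

corpus <- Corpus(VectorSource(docs))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("en"))

tdm  <- TermDocumentMatrix(corpus)                        # rows = terms, columns = documents
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)  # plain frequency counts

head(freq)                                  # the "document matrix" is just numbers...
wordcloud(names(freq), freq, min.freq = 1)  # ...and the word cloud simply plots them
```

Everything after the corpus is built is arithmetic on a term-document matrix; the words are only there as row labels.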

This isn’t to say that automated text analysis is useless (on the contrary, I’m a pretty big fan) or that text analysis couldn’t inform qualitative research (on the contrary, I think digital humanists are doing a great job of exploring this). It’s simply to say that qualitative research is a rich enough tradition that it would be a tremendous disservice to describe text mining alone as a qualitative research method.

Where are participants in American and Canadian teacher hashtags?

My dissertation research is focused on Regional Educational Twitter Hashtags (RETHs), which are teacher-focused hashtags associated with particular geographic regions, such as American states or Canadian provinces and territories. This isn’t the first time that I’ve done a project on this phenomenon, and it’s rewarding to come back to RETHs to answer some questions that I wasn’t able to examine the last time around. There’s more data here than I can tackle in the dissertation alone, though. I recently defended my proposal (hooray!) and have been trying to work on other projects to let my head clear a bit before diving back into the dissertation.

So, I decided I would go back to something I’ve tried a few times in the past, mapping Twitter user locations. This is particularly interesting when it comes to RETHs, because the hashtags themselves suggest that participants are likely to be concentrated in a particular area: That is, we can guess that #miched users will be mostly/completely within Michigan, and #bced users ought to be mostly/completely within British Columbia. Is that really the case, though?

To take a look, I rounded up all of the ~1.6 million tweets composed during 2016 using 68 different RETHs (52 from the US and 16 from Canada) and identified all of the unique participants who had composed them. Using the Twitter API, I collected the locations listed in their Twitter profiles and used the Data Science Toolkit to convert those locations into longitude and latitude. Once I eliminated the participants who didn’t list a location in their profile (or whose listed location the Data Science Toolkit couldn’t interpret as coordinates), I wound up with 91,332 unique participants. Once I had coordinates for them all, I used Leaflet to map them. So, where are participants tweeting from? Take some time to explore the map below (or click here to open it in a separate tab). I’ll offer a few thoughts below the map.
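For anyone curious, here’s a rough sketch of what that collection pipeline might look like in code. This isn’t the exact code I used: rtweet stands in for the Twitter API step, and geocode_profile() is a made-up stub for the request that sends each location string to the Data Science Toolkit.

```r
# Rough sketch of the participant-location pipeline (not the exact code used).
library(rtweet)
library(dplyr)

# `reth_tweets` is assumed to be a data frame of the ~1.6 million RETH tweets,
# with a screen_name column identifying who composed each tweet.
participants <- unique(reth_tweets$screen_name)

# Look up each unique participant's profile via the Twitter API and keep
# whatever free-text location they list there.
profiles <- lookup_users(participants) %>%
  select(screen_name, location) %>%
  filter(location != "")

# Stub for the Data Science Toolkit step: in the real pipeline, each location
# string was converted into longitude/latitude (or dropped when it couldn't be).
geocode_profile <- function(locations) {
  data.frame(lng = rep(NA_real_, length(locations)),
             lat = rep(NA_real_, length(locations)))
}

mappable <- bind_cols(profiles, geocode_profile(profiles$location)) %>%
  filter(!is.na(lng), !is.na(lat))
```

In practice, the profile lookups also have to be batched to stay within Twitter’s API rate limits.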

First, while activity is definitely concentrated in the US and Canada (as expected), there is activity from all over the world. I know that some of the points outside North America result from errors and misinformation: a handful of jokers list themselves as tweeting from Antarctica, and the Data Science Toolkit can misinterpret location strings in unexpected ways. However, some of our forthcoming research suggests that if you’re only concerned with identifying the state that someone is tweeting from (and if you ignore the fact that profile locations aren’t necessarily truthful), computational interpretation of listed Twitter profile locations is actually pretty accurate. If you’re interested, you can click on any single point to see the actual location listed in that Twitter profile, which is a handy way to spot-check some of the more interesting ones.

Second, the clustering function of Leaflet (the feature that aggregates points at certain spots on the map rather than showing them all) is really, really handy here. The first time I built the map, I had all 90,000 points plotted individually, which made it a real pain to explore. Whenever I work with large numbers of points in the future, I’ll definitely be clustering them.
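For reference, turning that clustering on in the R leaflet package is just a matter of passing clusterOptions when adding markers. A minimal sketch, assuming a data frame like the mappable one above (with lng, lat, and location columns):

```r
library(leaflet)

# Plot every participant, letting Leaflet aggregate nearby points into clusters
# that expand as you zoom in; clicking an individual point shows the raw
# location string listed in that participant's profile.
leaflet(mappable) %>%
  addTiles() %>%
  addCircleMarkers(
    lng = ~lng, lat = ~lat,
    popup = ~location,                        # enables the spot-checking mentioned above
    clusterOptions = markerClusterOptions()   # this is what makes 90,000+ points explorable
  )
```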

Finally, the fact that there are over 90,000 unique accounts (definitely over 100,000 if you add back in those without clear geodata) associated with these 68 teacher-focused hashtags over 2016 is worth thinking about, too.

Most of all, playing with Leaflet reminds me of how simple, powerful, yet pretty it is. I’m hoping to continue to work with it as I keep working with digital data!

Copyright, Fair Use, and Creative Commons in 7 more videos

Last year, I wrote a post about a series of YouTube videos that I used to give a guest lecture on copyright, fair use, and Creative Commons. It went well enough that I was asked to come back this year and guest lecture again. I made some tweaks to the presentation this time around, switching the order of some of the videos and replacing a couple of them. I think the flow of the presentation and videos is stronger this time around, since it starts by laying a firmer foundation for why we need intellectual property law (instead of jumping straight to the right to remix and immediately criticizing the law) and then eases into a debate about fair use with an example that’s germane to teaching (provided you can get away with showing remixed clips from French Star Wars in class, as I may be able to).

Rather than write everything out here (since there is a lot of overlap with last year’s post), I’ll embed the slides for the new presentation here; you can also check out this link if you want to see my presentation notes, too.

Is there such a thing as “white hat” research ethics violations?

A couple of weeks ago, I posted about some of the ethical dilemmas involved in using public data for research, using the example of facial recognition researchers who used YouTube videos of people undergoing hormone replacement therapy to improve their algorithms’ ability to match faces across pre- and post-transition images.

Since reading that article, I’ve seen the occasional tweet about another facial recognition project, one that purports to be able to infer sexual orientation from facial analysis. Like the other project, this one also uses public data and also has some worrying implications. Over the weekend, though, I sat down to read an article about the project and learned a few things that I hadn’t picked up from skimming tweets. First, as with much research, the “success” of this project is much higher in a controlled lab setting than in the real world. Second, and more importantly, the researchers claim to have carried out the project precisely in order to sound the alarm that they did:

Dr Kosinski says he conducted the research as a demonstration, and to warn policymakers of the power of machine vision. It makes further erosion of privacy “inevitable”; the dangers must be understood, he adds. Spouses might seek to know what sexuality-inferring software says about their partner (the word “gay” is 10% more likely to complete searches that begin “Is my husband…” than the word “cheating”). In parts of the world where being gay is socially unacceptable, or illegal, such software could pose a serious threat to safety. Dr Kosinski is at pains to make clear that he has invented no new technology, merely bolted together software and data that are readily available to anyone with an internet connection. He has asked The Economist not to reveal the identity of the dating website he used, in order to discourage copycats.

This description reminds me of “white hat” hacking: trying to break a system’s security, but with the purpose of calling attention to the identified weaknesses so that the system can be strengthened. Dr. Kosinski seems to be doing something similar: carrying out a project with troubling implications, but with the purpose of calling our attention to those implications and helping us deal with them before someone else (perhaps a “black hat” with malicious intent) does the same thing.

When it comes to Twitter research, I’ll admit that I’ve done something similar on a much smaller scale. It started at a colleague’s practice job talk, when he mentioned the steps he had taken to anonymize the teenage tweeters he was studying. I took that as a challenge, and the next time he quoted a tweet on a slide, I opened up Twitter, typed in a distinctive phrase, and managed to identify at least one of the “anonymized” participants. I’ve kept up this habit when reading papers or watching presentations that quote “anonymized” tweets, and I’ve usually had some success identifying one or more of the participants (granted, in one case it was one of my own tweets that had been quoted by a fellow MSU researcher, so that one wasn’t terribly difficult). On one hand, doing this (i.e., intentionally breaking someone’s efforts at anonymizing research participants) kind of makes me a jerk. On the other hand, I’m trying, in part, to draw attention to the problems of truly anonymizing Twitter data.
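For what it’s worth, this doesn’t require any special tooling; Twitter’s own search box is enough. The same check can also be scripted, as in this hypothetical sketch using rtweet (the phrase is invented):

```r
library(rtweet)

# Search for a distinctive phrase from a quoted, "anonymized" tweet.
# The phrase below is invented; in practice, even a short excerpt is often
# unique enough to surface the original tweet and its author.
distinctive_phrase <- '"some distinctive phrase quoted in a paper"'

hits <- search_tweets(distinctive_phrase, n = 10, include_rts = FALSE)
hits  # each matching tweet comes back tied to the account that posted it
```

(The standard search API only reaches back about a week, so for older tweets the regular web search is actually the more powerful tool.)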

So, white hat research ethics violations… are they a thing? Do we need more of them? Are they really research ethics violations? Plenty of food for thought.

Some thoughts on starting year 5 (and French comics)

The image below, from the French comic book Carnets de thèse (“thesis notes”), has been on my mind as I begin my fifth year of grad school. I bought Carnets de thèse as a present for myself for my last birthday, expecting it to be an educational glimpse into the French grad school experience and a dose of humor to get me through the rest of my own studies. Jeanne Dargan, the protagonist, starts grad school excited about her studies, with a surefire plan to finish in three years and a clear idea of what she wants to write her thesis on. However, as the picture below shows, the next five (not three) years turn into a slow descent into grad school madness, punctuated by an annual tradition of changing her thesis topic. By the end of the book, it comes as no surprise that the author wrote and drew it after leaving her own graduate studies; if they were anything like Jeanne’s, I can’t blame her.

I’m happy to say that I’m excited to be going into my fifth year, and not just because it means the end is in sight. Even though my dissertation topic has also probably changed about once a year, I haven’t had to deal with the lack of support, the obstructionist secretaries, the unfairly revoked salary, or the disintegrating relationships that turn Carnets de thèse into a very dark comedy (if it can still be called a comedy). I feel privileged, fortunate, and blessed to be able to say that, and I feel a great obligation to make sure the grad students I mentor in the future have an experience more like mine and less like Jeanne’s.

Public data and digital research ethics

The Verge recently posted an article that highlights some of the ethical dilemmas involved in collecting publicly available data for research purposes. The article begins by describing the work of a researcher working on facial recognition of people before and after hormone replacement therapy:

On YouTube, he found a treasure trove. Individuals undergoing HRT often document their progress and post the results online, sometimes keeping regular diaries, and sometimes making time-lapse videos of the entire process. “I shared my videos because I wanted other trans people to see my transition,” says Danielle, who posted her transition video on YouTube years ago. “These types of transition montages were helpful to me, so I wanted to pay it forward,” she tells The Verge.

At first glance, YouTube videos seem like a perfect dataset for this sort of thing. They’re being freely provided and are generally available under a Creative Commons license. However, in the words of Dr. Ian Malcolm:

[Embedded GIF of Dr. Ian Malcolm, via GIPHY]

Again, from the article:

Danielle, who is featured in the dataset and whose transition pictures appear in scientific papers because of it, says she was never contacted about her inclusion. “I by no means ‘hide’ my identity,” she told The Verge using an online messaging service. “But this feels like a violation of privacy.” She said she was gratified to know that there are limits on the use of the dataset (especially that it wasn’t sold to companies), but said this sort of biometric collection had “all sorts of implications for the trans community.”

The idea of having one’s picture—especially a transition picture—appear in scientific papers without ever having consented to it seems highly problematic. And yet, I’m obviously in favor of using publicly available digital data for research; after all, I rely on public Twitter data for nearly all of my own work. So, how can I continue to use this data without crossing any lines? I don’t claim to do this perfectly, but here’s one way that Josh Rosenberg, Leigh Graves Wolf, and I described our efforts in a recent article:

Twitter and other Internet data provide new ethical challenges for educational (and other) researchers. Inspired by medical research, the concept of human subjects research has long been the distinguishing factor in whether researchers are required to submit their work to institutional review boards (IRBs) for ethical review (Markham and Buchanan, 2012). However, data such as the collection of tweets described above frequently do not qualify as human subjects research; indeed, this study did not require review by an IRB according to the definitions set out by Michigan State University. However, Internet researchers are increasingly vocal in their arguments that existing ethical frameworks are not well suited to digital data (Markham and Buchanan, 2012) and that the limits established by the law are also inadequate for determining what constitutes ethical Internet research (Eynon et al., 2008).

In response to the absence of universal, clear guidelines for Internet data, we have taken explicit steps of our own to report our findings ethically. Most notably, we have tried to avoid the use of direct quotation throughout the paper, even when referring to particular tweets. Twitter’s search function is powerful enough that even a small but distinct quotation may be sufficient for identifying a particular tweet, and while tweets can be considered public documents, we feel that it is important to acknowledge that notions of publicity and privacy on the Internet are mediated by varying expectations, intentions, and contexts (Eynon et al., 2008; Markham and Buchanan, 2012) and that no one has provided explicit consent for their tweets to appear in this paper. When we have chosen to quote from tweets, we have made modifications such as excerpting tweets and removing URLs to personal blog posts in order to preserve as much anonymity as possible.
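The URL-scrubbing step, at least, is easy to automate; here’s a minimal sketch in R (the example tweet is invented):

```r
# Strip URLs (e.g., links to personal blog posts) from tweet text before quoting.
strip_urls <- function(text) {
  gsub("https?://\\S+", "", text)
}

strip_urls("Loving this new post on my classroom blog! https://example.com/my-post #miched")
#> [1] "Loving this new post on my classroom blog!  #miched"
```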

There’s a lot more that could be said—and that I ought to write—on this subject, but I was glad I ran into this article to get my mind working on these questions again.

Using notebooks for beginning-of-semester planning

One of the first posts I published to this blog was a lament that I just couldn’t get notebooks to work for me. About a year ago, though, I finally found a routine that stuck. I started off working my way through two copies of a Self Journal before borrowing some of its best elements and adding some features I felt were missing, resulting in a homebrew, quasi-bullet-journal mashup. I’m still highly dependent on digital tools—mostly Things 3—for task management and reminders, but I’ve found a few areas where analog works better for me than digital:

  • Note taking: I have never been good at taking notes digitally because I feel like I have to have a comprehensive system. With a notebook, I can just give myself permission to scrawl a few things down on whatever page is handy and flip back to it when I need to, and that seems to do the trick.
  • Goal setting: Things 3 works great for task management, but I’ve been trying to work harder to actually set goals for myself. Writing goals down rather than entering them into an app feels like a better way of reflecting on what goals I want to set and getting them present in my mind.
  • Planning: Just as Things 3 works great for task management, digital calendars do the trick wonderfully when it comes to scheduling events. However, there are plenty of unscheduled parts of my day that I need to plan, and there are way too many office hours that I’ve been late to because an event being on my calendar doesn’t mean it’s on my mind. Taking the time to write out a plan does much more to keep those stretches of time on my radar.

With goal setting and planning in particular, I try to take the time to sketch out every day, every week, every month, and every semester. When those line up, it leads to a busy Sunday night, as it did this past weekend. This is a pretty big semester, though, since I have plans to defend my dissertation proposal and start applying for jobs, and it was great to think about how those two big goals and corresponding plans look at those four different scales. I’m really glad that I’ve found a way to fit notebooks into my routines!

A couple of podcasts on screencasting

I’ve posted before about teaching CEP 813, a class on electronic assessment that features a unit on game-based assessment in Minecraft. This unit is by far the most intense in terms of technical support, and we had a major hiccup earlier this month that caused some frustration for the whole class (and the instructional team). After some tinkering, we figured out how to make it possible for everyone to complete the assignment. While the solution wasn’t terribly complex, it was definitely easier to show than to tell, so I whipped up a couple of quick screencasts to walk people through what they needed to do.

This got me thinking about how I’d love to use more screencasts in my teaching. I’d like to think that producing these YouTube videos was more engaging, more helpful, and more personal than any text-based solutions I could have provided, and I imagine that’s true for a lot of online teaching, not just tech support. I still have a lot to learn about screencasting, though, so I’m glad there are some good resources out there. This includes a couple of podcast episodes that I’ve listened to recently and that I thought I’d share for anyone else interested in incorporating some more screencasting into their online (or other) teaching:

The first is an episode of the Teaching in Higher Ed podcast where Bonni Stachowiak talks to Brandy Dudas about pencasting and other uses of video in the classroom.

The second is an episode of Mac Power Users, with Katie Floyd and David Sparks talking to JF Brissette about screencasting on a Mac and iOS. It’s not directly related to teaching—and certainly less useful if you’re not in the Apple-sphere—but it does a great job of getting you excited about the power and potential of screencasting.

Remembering to be a regular person (and not just a grad student)

Last Monday night, I went with some friends from church to an “all-you-can-eat chicken wings” event at a local restaurant. The company and conversations were fantastic, but I still left after a couple of hours second-guessing my decision to go. The service had been slow, so I wound up staying longer than expected, which meant going to bed late and without my nightly routine of planning out the next day. Plus, spending $18 for the privilege of eating as much poultry as I could stomach probably wasn’t great for my bank account (which has been getting slimmer since my daughter was born), my waistline (which hasn’t followed the example of my bank account), or my growing conviction that I really need to be eating more tofu and less meat.

Despite all of that second-guessing, though, I’m very glad that I went. I really needed to take the time to be a “regular person” and not just a grad student, something that the tweet above shows I’ve been thinking about a lot lately. I do a fair number of things outside of grad school, but not enough of them are unstructured opportunities to just spend time with other people. I remember turning down an invitation to a Dungeons and Dragons group as a first-year grad student because I already knew I needed to set aside a lot of time for this exciting new adventure, and things have only gotten busier since. While it’s definitely important to know how to manage your time—and while I’ve found other, less time-intensive ways to get my D&D fix—I haven’t been great recently about being a regular person, and I’m glad I took a step in that direction.

Swiss accents and using the Internet as a French teacher

Last week, on August 1st, I popped over to Radio Télévision Suisse to spend a couple of minutes celebrating the Swiss national holiday. While I was there, I spotted an article containing five “spoken portraits” of Swiss Francophones from different regions. Each portrait highlighted a different accent (or two) from Francophone Switzerland, and it was a lot of fun to spend part of my morning listening to each of these different accents, some of which were familiar to me from my time in Switzerland.

I’ve written before about how wonderful it is that the Internet can make cultural artifacts easily available to language teachers, but this was an especially nice resource to find: I specifically remember trying to track down YouTube videos of Swiss accents as far back as five years ago (and arguing with classmates about whether the one video we could find was an authentic accent or just someone being goofy). One thing I always try to do when I’m in French teaching mode is to reinforce the idea that the Francophone world is more than just France (heck, more than just Paris). While the Internet—like any medium—tends to privilege dominant forms of expression, I think it also makes more peripheral linguistic cultures accessible in a way that would be much harder with textbooks alone. Representing the French language as a monolith, when it’s full of regional and national variants, does it a disservice, and I think the Internet can help with that—I recently discovered the Francophone African word essencerie (instead of station-service) thanks to Wikipedia, and now it’s the word I always want to use for gas station!

It’s been over a year now since I last taught French, but I still find myself bookmarking these resources just in case I get the chance to do it again!