My dissertation research is focused on Regional Educational Twitter Hashtags (RETHs), which are teacher-focused hashtags that are associated with particular geographic regions, such as American states or Canadian provinces or territories. This isn’t the first time that I’ve done a project on this phenomenon, and it’s rewarding to come back to RETHs to answer some questions that I wasn’t able to examine the last time around. However, there’s enough data here that I can’t even tackle it all in my dissertation. I recently successfully defended my proposal (hooray!), and have been trying to work on other projects to let my head clear some before diving back into the dissertation.
So, I decided I would go back to something I’ve tried a few times in the past, mapping Twitter user locations. This is particularly interesting when it comes to RETHs, because the hashtags themselves suggest that participants are likely to be concentrated in a particular area: That is, we can guess that #miched users will be mostly/completely within Michigan, and #bced users ought to be mostly/completely within British Columbia. Is that really the case, though?
So, to take a look at this, I rounded up all of the ~1.6 million tweets composed during 2016 that used any of 68 different RETHs (52 from the US and 16 from Canada). Then, I identified all of the unique participants who had composed these tweets. Using the Twitter API, I collected the locations listed in their Twitter profiles and used the Data Science Toolkit to convert those locations into longitude and latitude. After eliminating participants who didn't list a location, or whose listed location the Data Science Toolkit couldn't convert into longitude and latitude, I wound up with 91,332 unique participants. With coordinates for all of them, I used Leaflet to map them. So, where are participants tweeting from? Take some time to explore the map below; I'll offer a few thoughts after it.
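The reduction from tweets to mappable participants can be sketched in Python. This is a minimal, hypothetical illustration rather than the exact pipeline: the tweet dicts and their `user_id` key, the `profile_locations` mapping, and the `geocode` callable are all assumptions, with `geocode` standing in for the Data Science Toolkit lookup (no real API is called).

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Coord = Tuple[float, float]  # (longitude, latitude)


def geocoded_participants(
    tweets: Iterable[dict],
    profile_locations: Dict[str, str],
    geocode: Callable[[str], Optional[Coord]],
) -> List[dict]:
    """Return one mappable record per unique participant.

    tweets: hypothetical tweet records, each with a 'user_id' key.
    profile_locations: hypothetical user_id -> profile location string.
    geocode: hypothetical stand-in for a geocoder; returns (lon, lat) or None.
    """
    # Many tweets, fewer people: reduce to the set of unique authors.
    user_ids = {tweet["user_id"] for tweet in tweets}

    points = []
    for uid in sorted(user_ids):
        location = profile_locations.get(uid, "").strip()
        if not location:
            continue  # no location listed in the profile
        coord = geocode(location)
        if coord is None:
            continue  # geocoder could not turn the string into coordinates
        lon, lat = coord
        # Keep the raw string so a map popup can show what the profile said.
        points.append({"user_id": uid, "location": location, "lon": lon, "lat": lat})
    return points


# Toy usage: a dict lookup stands in for the geocoder (coordinates approximate).
toy_tweets = [{"user_id": "a"}, {"user_id": "a"}, {"user_id": "b"}, {"user_id": "c"}]
toy_profiles = {"a": "Lansing, MI", "b": "", "c": "Hogwarts"}
toy_lookup = {"Lansing, MI": (-84.55, 42.73)}

points = geocoded_participants(toy_tweets, toy_profiles, toy_lookup.get)
print(points)
# [{'user_id': 'a', 'location': 'Lansing, MI', 'lon': -84.55, 'lat': 42.73}]
```

Keeping the original location string alongside the coordinates is what makes it possible to show, for any point on the map, exactly what the participant typed into their profile.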
First, while activity is definitely concentrated in the US and Canada (as expected), there is activity from all over the world. I know that some of the points outside North America result from errors and misinformation: a handful of jokers list themselves as tweeting from Antarctica, and the Data Science Toolkit can misinterpret location strings in unexpected ways. However, some of our forthcoming research suggests that if you're only concerned with identifying the state that someone is tweeting from (and if you set aside the fact that profile locations aren't necessarily truthful), computationally interpreting the locations listed in Twitter profiles is actually pretty accurate. If you're interested, you can click on any single point to see the actual location listed in that Twitter profile, which makes it easy to spot-check some of the more interesting ones.
Second, Leaflet's clustering function (the feature that aggregates nearby points into a single marker rather than showing them all) is really, really handy here. The first time I built the Leaflet map, I had all 91,332 points mapped individually, which made the map a real pain to explore. Whenever I work with large numbers of points in the future, I'll definitely be clustering.
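Leaflet's clustering is typically provided by the Leaflet.markercluster plugin, which groups markers by on-screen pixel distance at each zoom level. The Python sketch below is a deliberately simplified stand-in, not that plugin's algorithm: it bins points into grid cells whose width halves with each zoom level and keeps a count per cell. The `grid_clusters` function and the sample points are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def grid_clusters(points: List[Point], zoom: int) -> List[Tuple[float, float, int]]:
    """Aggregate points into square grid cells for one zoom level.

    Cell width is 360 / 2**zoom degrees, so each zoom step halves the cell
    size. Returns (mean_lon, mean_lat, count) per non-empty cell, largest first.
    """
    cell = 360.0 / (2 ** zoom)
    bins: Dict[Tuple[int, int], List[Point]] = defaultdict(list)
    for lon, lat in points:
        # Shift to non-negative ranges before dividing into cells.
        key = (int((lon + 180.0) // cell), int((lat + 90.0) // cell))
        bins[key].append((lon, lat))

    clusters = []
    for members in bins.values():
        n = len(members)
        mean_lon = sum(p[0] for p in members) / n
        mean_lat = sum(p[1] for p in members) / n
        clusters.append((mean_lon, mean_lat, n))
    # Sort by count, largest cluster first.
    return sorted(clusters, key=lambda c: c[2], reverse=True)


# Toy points: three in Michigan, two in Vancouver, one "in Antarctica".
sample = [(-84.55, 42.73), (-84.40, 42.60), (-83.05, 42.33),
          (-123.12, 49.28), (-123.10, 49.25), (0.0, -75.0)]

for zoom in (0, 2, 20):
    print(zoom, [c[2] for c in grid_clusters(sample, zoom)])
# 0 [6]
# 2 [3, 2, 1]
# 20 [1, 1, 1, 1, 1, 1]
```

Zoomed all the way out, everything collapses into a single marker; zoomed in far enough, every point stands alone. That zoom-dependent aggregation is what keeps a map of tens of thousands of points explorable.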
Finally, the fact that over 90,000 unique accounts (definitely over 100,000 if you add back in those without clear geodata) were associated with these 68 teacher-focused hashtags during 2016 is worth thinking about, too.
Most of all, playing with Leaflet reminds me of how simple, powerful, and pretty it is. I'm hoping to keep using it as I continue working with digital data!