A couple of weeks ago, I spent some time discussing how to use R and Web scraping to retrieve information on Twitter users’ locations, as stored in their profiles.
I’ve since updated the code to scrape not only locations, but also names, descriptions, personal websites, join dates, number of tweets, number of accounts each user follows, number of followers, and number of favorited/liked tweets. I stopped short of tracking the number of lists each user has created, since I was having trouble with it. The little bits of code I wrote to clean white space out of the text have been commented out, since I can’t vouch for their accuracy (they were also pretty crude measures, so I need to come up with something better).
If this might be useful to you, feel free to check out the code here.
Also, Paul Pival at the University of Calgary recently took the #educattentats data that I was originally working with and tried a different approach for scraping profile information. If you’re interested in different methods for working with Web data, his post is worth a read!