Wednesday, 20 March 2013

Tapping the Twitter brain

So I've set up a Twitter Streaming API client on a server and started consuming the large amounts of data that come into it.

It feels a bit like that famous (to medics at least) Gary Larson cartoon with the mosquito who has hit an artery.

It is clearly possible to run some fancy types of analysis on Twitter data, such as collecting the top Twitter posters and URLs for a particular hashtag, e.g. GrabChat's #NICE2012, or the more recent text analysis of Diabetes UK's #dpc13.
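To make the "top posters and URLs" idea concrete, here is a minimal sketch. It is not how GrabChat or anyone else does it. It assumes each tweet is a dict shaped like the Twitter API v1.1 JSON (`user.screen_name`, `entities.urls[].expanded_url`), and `top_posters_and_urls` is a name made up for illustration.

```python
from collections import Counter


def top_posters_and_urls(tweets, n=10):
    """Return the n most active posters and the n most shared URLs.

    Assumes v1.1-style tweet dicts; the function name is hypothetical.
    """
    posters = Counter()
    urls = Counter()
    for tweet in tweets:
        posters[tweet["user"]["screen_name"]] += 1
        # A tweet may carry no links at all, so default to an empty list.
        for url in tweet.get("entities", {}).get("urls", []):
            # Prefer the expanded link; fall back to the shortened one.
            urls[url.get("expanded_url") or url["url"]] += 1
    return posters.most_common(n), urls.most_common(n)
```

Feed it the tweets collected for a hashtag and it hands back two ranked lists of `(name, count)` pairs.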

The output from the streaming API, though, is on another order of magnitude and looks promising for identifying, literally, 'trending' resources or individuals within a particular topic. I've just run 48 hours of the keyword 'diabetes' and got nearly 40,000 tweets.
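For scale, nearly 40,000 tweets in 48 hours works out at roughly 800 an hour, or about 14 a minute. One rough way to spot a 'trending' link in a stream like that is to compare how often it appears in a recent window against everything before it. The sketch below is only an illustration under assumptions: v1.1-style tweet dicts with a `created_at` string, a six-hour window picked arbitrarily, and a made-up function name `trending_urls`.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Timestamp layout used in v1.1 tweet JSON, e.g. "Wed Mar 20 10:00:00 +0000 2013".
TWITTER_TIME = "%a %b %d %H:%M:%S %z %Y"


def trending_urls(tweets, now, window=timedelta(hours=6), min_count=2):
    """List URLs shared more often in the recent window than before it.

    `now` must be timezone-aware. Window and threshold are arbitrary choices.
    """
    recent, earlier = Counter(), Counter()
    for tweet in tweets:
        created = datetime.strptime(tweet["created_at"], TWITTER_TIME)
        bucket = recent if now - created <= window else earlier
        for url in tweet.get("entities", {}).get("urls", []):
            bucket[url["expanded_url"]] += 1
    rising = [u for u, c in recent.items() if c >= min_count and c > earlier[u]]
    # Most shared in the recent window first.
    return sorted(rising, key=lambda u: -recent[u])


# Example reference point: midday UTC on the day of this post.
now = datetime(2013, 3, 20, 12, 0, tzinfo=timezone.utc)
```

The same counting works for individuals if you bucket `user.screen_name` instead of URLs.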

Tapping into the global discussion of diabetes is one thing, but picking out the good stuff is another challenge. I have a few competing algorithms running on the data to see which works best.

I've been filtering tweets for some time, sharing them with followers and relaying them to our diabetes diploma course, but this promises a whole new level: a more systematic approach. What's surprising is the enormous amount of spam and retweeting of low-level health and nutrition material that goes on. Thankfully, these are tweets you do not see through the normal Twitter interfaces. Hidden among them are useful ones, and the trick is to pick those out.
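As a flavour of what a first-pass filter might look like, here is a small sketch that drops retweets and near-identical repeats (the same text pushed out again with a different short link or hashtag). It is not one of the competing algorithms mentioned above, just a simple baseline. It assumes v1.1-style tweet dicts with a `text` field and an optional `retweeted_status`; `normalise` and `filter_tweets` are names invented for the example.

```python
import re


def normalise(text):
    """Reduce a tweet to lowercase words, ignoring links, mentions and hashtags."""
    text = re.sub(r"https?://\S+", "", text.lower())
    text = re.sub(r"[@#]\w+", "", text)
    return " ".join(re.findall(r"[a-z0-9']+", text))


def filter_tweets(tweets):
    """Keep the first copy of each distinct message and skip retweets."""
    seen = set()
    kept = []
    for tweet in tweets:
        text = tweet["text"]
        # Native retweets carry a retweeted_status; manual ones start "RT @".
        if "retweeted_status" in tweet or text.startswith("RT @"):
            continue
        key = normalise(text)
        if not key or key in seen:
            continue
        seen.add(key)
        kept.append(tweet)
    return kept
```

This only removes the obvious noise; deciding which of the survivors are actually useful is the harder part.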
