Fighting Twitter spam with Bayes

Image by pdeee454 via Flickr

For a few days now I've been researching the nature of spam on Twitter and looking for the best way to combat it in a tool I'm making. It's been an extremely interesting ride that has, surprise surprise, proven the initial assumptions I made several months ago while creating a different tool.

When I started looking into SpamBayes last week, I realised I needed a set of verified spam and ham to train the filters so they could do their work. So I made myself a tool that would present everything from the public timeline (20 tweets) and set about clicking