Prevention versus amelioration

So, ages back, when I started getting comment spam of all sorts, I set up a moderation queue and made a rule … basically, if you posted a comment with 2 or more links in it, it’d go into that queue and I’d moderate it. I get email whenever a comment is posted successfully and whenever one is waiting to be moderated, so turnaround times on that were generally fairly good.
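For illustration, a rule like that can be sketched in a few lines of Python. This is not the blog software's actual rule engine; the function name `needs_moderation`, the `link_threshold` parameter, and the choice to count links by looking for `http://` or `https://` prefixes are all assumptions made for the example.

```python
import re

# Assumption: a "link" is anything starting with http:// or https://.
URL_PATTERN = re.compile(r"https?://", re.IGNORECASE)


def needs_moderation(comment_text: str, link_threshold: int = 2) -> bool:
    """Return True if the comment contains at least `link_threshold` links."""
    return len(URL_PATTERN.findall(comment_text)) >= link_threshold
```

A comment with no links or a single link goes straight through; one with two or more is held for a human to look at.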

But that was not going to fly once I started getting 20-30 comment spams a day. So I turned to a filtering solution: Akismet. Akismet’s pretty decent: it does a good job catching about 99% of the spam that gets tossed at it, and leaving the moderation rules in place caught 90% of what made it past Akismet (leaving only about 1 in 1000 that got through unmoderated). That was all well and good, right … it worked, it was sane, it had a good catch rate and a very low false-positive rate (I’ve seen maybe 3 false positives between here and gotwoot, which is set up exactly the same as this blog).
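The 1-in-1000 figure is just the two miss rates multiplied together: 1% gets past the filter, and 10% of that gets past the moderation rule. A quick check in Python, using the rough percentages quoted above (the helper name `leak_rate` is made up for this example):

```python
from fractions import Fraction


def leak_rate(*catch_rates: Fraction) -> Fraction:
    """Fraction of spam that slips past every stage in turn."""
    leaked = Fraction(1)
    for rate in catch_rates:
        leaked *= 1 - rate  # each stage only sees what the previous one missed
    return leaked


filter_catch = Fraction(99, 100)      # roughly 99% caught by the spam filter
moderation_catch = Fraction(9, 10)    # roughly 90% of the remainder held for moderation

print(leak_rate(filter_catch, moderation_catch))  # prints 1/1000
```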

But spammers have been winning the fight. Lately, they’ve been designing more and more messages that make it through Akismet, and Akismet’s false-positive rate is getting worse, because catching all the spam means matching broader patterns, and those broader patterns also match legitimate comments.

Well, I wasn’t happy about that. I don’t get a whole lot of traffic over here, but I do get a fair lot on gotwoot, and losing 2% or so of the legitimate comment traffic to false positives on spam filtering meant having to pay too much attention to the filters too often.

So, I decided it was time to move away from that model. Akismet’s a post-comment filtering system. The moderation queue is another post-comment filter. They treat the symptoms, but not the disease. So I moved the whole mess forward a step.

Captcha! is a more direct approach. Rather than guessing whether a comment is legit or not, why not just block out the generally irritating spambots before they can comment at all?

Captcha’s author suggests that, for now, it’s better to use other tools … like checking that the user’s browser understands JavaScript … and that captchas will eventually be OCR-breakable (which is quite true). However, any technique can be automated given sufficient computing power. Breaking a captcha with OCR requires the comment spammer to spend computing resources on OCR in the first place. While that’s not terribly hard, it’s a hell of a lot harder than just churning away with POST requests (since it means establishing a session, getting the page, parsing it, OCR’ing the image, and then posting). It’s not a huge problem to do for a single page, but doing it at scale will certainly slow the spammer down a bit, hurting his efficiency and making it cost slightly more than the nothing it currently costs him.

They say “a stitch in time saves nine” and all that. The way I see it, captcha is the stitch before the tear gets worse; Akismet’s the ten late ones to fix it up.

That said, I use them both now. If someone does break captcha with OCR, they’re also going to have to get past Akismet :p.
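Roughly, the layering looks like the sketch below: the captcha check runs before anything else, and only comments that pass it ever reach the spam filter and the link rule. This is a simplified illustration, not the actual plugin code. `handle_comment`, `count_links`, the case-insensitive captcha comparison, and the `looks_like_spam` callable (a stand-in for whatever spam filter is plugged in) are all assumptions made for the example.

```python
from typing import Callable


def count_links(text: str) -> int:
    """Count links, assuming a link is any http:// or https:// prefix."""
    lowered = text.lower()
    return lowered.count("http://") + lowered.count("https://")


def handle_comment(
    text: str,
    captcha_answer: str,
    expected_answer: str,
    looks_like_spam: Callable[[str], bool],
) -> str:
    """Return 'rejected', 'spam', 'moderation', or 'published'."""
    # Prevention: a wrong captcha answer stops the comment before any filtering.
    if captcha_answer.strip().lower() != expected_answer.strip().lower():
        return "rejected"
    # Amelioration, stage one: the after-the-fact spam filter (hypothetical callable).
    if looks_like_spam(text):
        return "spam"
    # Amelioration, stage two: the old rule of holding comments with 2+ links.
    if count_links(text) >= 2:
        return "moderation"
    return "published"
```

A bot that just fires POST requests never gets past the first `if`, so the later stages only have to deal with whatever manages to solve the captcha.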

Incidentally, since installing captcha on gotwoot and here, I went from ~50 spams a day being caught by Akismet to none at all being either caught by Akismet or moderated, and none getting through to the front page. So far it’s been 100% effective, which makes me happy. And I’ve aimed for a generally readable captcha, which nobody’s complained about yet… so I guess everything’s cool!

One Response to “Prevention versus amelioration”

  1. Angel Says:

    Hey, that’s a pretty good way to deal with spammers :)