Thoughts on Comment Spam
I want to talk about comment spam on blogs. Although by the end of 2006 bloggers are well aware of comment spam, many popular blogs are still flooded with spammy comments, the del.icio.us blog for example. Several popular solutions have proved ineffective:
- Comment moderation - not scalable: nobody can moderate hundreds of comments per minute. Another con: the spammer still gets an HTTP 200 response, so the spam software believes it succeeded and keeps trying.
- Closing comments on older posts. This one is lame, imho.
- Content filtering - easy to work around, and it usually adds a lot of overhead.
- rel-nofollow - still returns HTTP 200 to the spammer, so it does not prevent flooding.
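To illustrate why content filtering is easy to work around, here is a minimal sketch of a naive keyword filter in Python. The blocklist and the function name `is_spam` are hypothetical, not taken from any particular blog engine. A single swapped character is enough to get past it.

```python
import re

# Hypothetical blocklist; real filters use much larger lists or statistical scoring.
BLOCKED_WORDS = {"viagra", "casino", "replica"}


def is_spam(comment: str) -> bool:
    """Naive content filter: flag a comment if any word is on the blocklist."""
    words = re.findall(r"[a-z]+", comment.lower())
    return any(word in BLOCKED_WORDS for word in words)


print(is_spam("Cheap viagra here"))   # True
print(is_spam("Cheap v1agra here"))   # False: one swapped character slips through
print(is_spam("Cheap v-i-a-g-r-a"))   # False: so does inserting separators
```

Every new spelling means another blocklist entry, and every entry is checked against every comment, which is where the overhead comes from.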
That leaves us with CAPTCHA, which has proved to be the most effective. It has some cons too, though:
- It dramatically reduces the number of legitimate comments, since users have to do more work.
- There are solutions that beat CAPTCHA. Even so, I'd say it is not worthwhile for a spammer to distribute such a solution as software, because it is not trivial to implement and customize for every CAPTCHA implementation. So denying access by IP address can usually solve the problem.
- Running CAPTCHA for a long time can lead to a DoS, because there can be a significant number of unsuccessful attempts. So I think it needs to be combined with IP blocking, maintaining some sort of spam IP blacklist. Alternatively, the CAPTCHA could be customized so that failed attempts do not get an HTTP 200 response, but that is difficult to get right, since the failure may come from a normal user who misread the CAPTCHA.
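The combination described above might look roughly like this minimal Python sketch. It assumes a hypothetical `CommentGate` class, a caller that has already checked the CAPTCHA answer, and an arbitrary threshold of failed attempts per IP. A few failures get a normal 200 response with the form re-displayed, since the visitor may be a human who misread the CAPTCHA. Once an IP crosses the threshold it is blacklisted and gets 403 instead of 200.

```python
from collections import defaultdict

MAX_FAILURES = 5  # hypothetical: failed CAPTCHAs tolerated before an IP is blacklisted


class CommentGate:
    def __init__(self, max_failures: int = MAX_FAILURES):
        self.max_failures = max_failures
        self.failures = defaultdict(int)  # ip -> consecutive failed CAPTCHA attempts
        self.blacklist = set()

    def handle(self, ip: str, captcha_ok: bool) -> int:
        """Return the HTTP status code to send for one comment submission."""
        if ip in self.blacklist:
            return 403  # refuse outright, so spam software never sees a 200
        if captcha_ok:
            self.failures[ip] = 0
            return 200  # comment accepted
        self.failures[ip] += 1
        if self.failures[ip] >= self.max_failures:
            self.blacklist.add(ip)
            return 403
        return 200  # re-show the form: this may be a human who misread the CAPTCHA


gate = CommentGate(max_failures=3)
print([gate.handle("203.0.113.7", False) for _ in range(4)])  # [200, 200, 403, 403]
print(gate.handle("198.51.100.2", True))                      # 200
```

Keeping the blacklist check first means a blocked IP costs only a set lookup, which matters when the goal is to survive thousands of attempts per day.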
Lately I have had as many as thousands of comment attempts per day on this blog, which nearly caused a DoS. So I:
- Started to monitor all attempts.
- Collected the IP addresses behind the most blatant waves of spam.
- Ranked the IP addresses by number of attempts.
- Banned, by IP, the spammers with the most attempts. BTW, looking at this list, I don't think most spammers post through public proxies (which is a common belief). Most of them simply spam from their own computer(s). Very few route through public proxy IPs, so I don't care about them right now (not to mention it will play against them in the long run).
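The monitor, collect, rank and ban steps above can be sketched in a few lines of Python. This assumes a hypothetical log format where each line starts with the client IP, and an arbitrary `min_attempts` threshold. How the resulting ban list is enforced (web server config, firewall, blog software) is left open.

```python
from collections import Counter


def rank_ips(log_lines):
    """Count comment attempts per source IP, busiest first.

    Assumes each (hypothetical) log line starts with the client IP,
    e.g. "203.0.113.7 POST /comment".
    """
    counts = Counter(line.split()[0] for line in log_lines if line.strip())
    return counts.most_common()


def pick_bans(ranked, min_attempts):
    """Select every IP whose attempt count reaches the threshold."""
    return {ip for ip, count in ranked if count >= min_attempts}


log = [
    "203.0.113.7 POST /comment",
    "203.0.113.7 POST /comment",
    "203.0.113.7 POST /comment",
    "198.51.100.2 POST /comment",
    "192.0.2.44 POST /comment",
    "192.0.2.44 POST /comment",
]
ranked = rank_ips(log)
print(ranked)  # [('203.0.113.7', 3), ('192.0.2.44', 2), ('198.51.100.2', 1)]
print(sorted(pick_bans(ranked, min_attempts=2)))  # ['192.0.2.44', '203.0.113.7']
```

Looking at the ranked list rather than banning blindly is also what makes it possible to notice whether the top offenders are public proxies or ordinary machines.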
Saturday, December 16, 2006 6:26 PM