Comment Spammers Have Blogs of Their Own
In the threadlink above, Jeremy Zawodny of Yahoo is talking about solutions to the ever-increasing blog spam problem. Recently SixApart, makers of MovableType, have been experiencing server load problems due to the voracious appetite of the spambots employed by hardcore search marketers (spammers) trying to get top rankings in competitive areas.
Jeremy's solution is this: assuming that 80% of bloggers use the same major blog software and that 80% don't change the default templates, just have search engine spiders look at the code, tell the original post apart from the comments, and not count comment links at all.
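To make Jeremy's proposal concrete: a spider that recognised a blog template would have to sort each page's links into post links and comment links, and only count the former. Here's a minimal Python sketch using the standard library's `html.parser`, assuming (hypothetically) that the template wraps every comment inside `<div class="comments">`. That class name is an illustration, not MovableType's actual markup, and a real spider would need a rule like this for every template it wanted to recognise.

```python
from html.parser import HTMLParser


class PostLinkExtractor(HTMLParser):
    """Split a page's links into post links and comment links.

    Assumes (hypothetically) that the template wraps comments in
    <div class="comments">; any link inside that div counts as a comment link.
    """

    def __init__(self, comment_class="comments"):
        super().__init__()
        self.comment_class = comment_class
        self.div_depth = 0         # how many <div> elements are currently open
        self.comment_depth = None  # div depth at which the comments block opened
        self.post_links = []
        self.comment_links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            self.div_depth += 1
            classes = (attrs.get("class") or "").split()
            if self.comment_depth is None and self.comment_class in classes:
                self.comment_depth = self.div_depth
        elif tag == "a" and attrs.get("href"):
            if self.comment_depth is None:
                self.post_links.append(attrs["href"])     # would pass link credit
            else:
                self.comment_links.append(attrs["href"])  # would be ignored

    def handle_endtag(self, tag):
        if tag == "div":
            if self.comment_depth == self.div_depth:
                self.comment_depth = None  # leaving the comments block
            self.div_depth -= 1


page = (
    '<div class="entry"><a href="http://a.example/">post link</a></div>'
    '<div class="comments"><div><a href="http://spam.example/">v|agra guy</a></div></div>'
)
parser = PostLinkExtractor()
parser.feed(page)
print(parser.post_links)     # ['http://a.example/']
print(parser.comment_links)  # ['http://spam.example/']
```

Note that the good links people leave in comments land in `comment_links` right alongside the spam, which is exactly the objection below.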
Why that isn't a great idea...
I think Jeremy's solution is a poor one for a few reasons:
- It will kill a good many great links - Comments are used for discussion of the original post, and as here on Threadwatch, the discussion that follows often produces some outstanding links to great resources that the original poster never knew about. I'd hate to see those sites not get the full benefit of a link from us.
- Computational overhead - I'm not a search engineer, but I'm reasonably certain that comparing the code on pages to look for MT (or other blog software) footprints and then weeding out the comment links would require a fair bit of extra computation, and this may not be doable from an SE standpoint.
- I'm not convinced that the search engines should be responsible for finding a solution - I'm not saying that it's SixApart's fault, just that they are in a better position to find one.
So, What's on the Table?
I think the solution lies with the software producers, and that the company that comes up with the best solution and can show figures proving that it works will have an excellent selling point for their product. As it stands, MT's MT-Blacklist is crap: it's a constant "bucket and bail" effort that's reactive rather than proactive and falls way short of being labeled a "solution". Other blog tools, such as bBlog, have implemented Captchas, where you have to enter the digits shown in a graphic to comment - this is better, but it's not unbreakable.
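For anyone who hasn't looked inside one, a comment blacklist boils down to a list of patterns checked against each new comment. A minimal Python sketch follows; the patterns and comment fields are hypothetical, and this is not MT-Blacklist's actual format or code.

```python
import re

# Hypothetical blacklist entries, grouped by the part of the comment they apply to.
BLACKLIST = {
    "url": [r"casino-example\.com"],
    "ip": [r"^203\.0\.113\."],
    "text": [r"v[i1|]agra"],
}

# Compile once, case-insensitively.
COMPILED = {
    field: [re.compile(p, re.IGNORECASE) for p in patterns]
    for field, patterns in BLACKLIST.items()
}


def is_blacklisted(comment):
    """comment is a dict with optional 'url', 'ip' and 'text' keys."""
    return any(
        rx.search(comment.get(field, ""))
        for field, regexes in COMPILED.items()
        for rx in regexes
    )
```

`is_blacklisted({"text": "cheap v|agra"})` returns True, but `is_blacklisted({"text": "cheap vi@gra"})` returns False. That is the "bucket and bail" problem in miniature: a spammer who tweaks the spelling sails straight through until someone adds another pattern.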
Here are a few of the current solutions being used or suggested:
- Captchas - Enter the digits in an image to comment
- Registration - To comment, you must sign up
- Pre-moderation - The blog owner approves or disapproves each comment before publication
- Blacklists - Lists of known spammers by site, IP or keyword regular expressions
- Search engine filtering - As described by Jeremy in the threadlink above
- Non pagerank passing links - links from comments go through a non-spiderable jump script.
- Bayesian filters - looking at the comments and determining if they are spammy or not
Let's have a look at each:
Captcha: To me this seems like a good solution, or part of one. Captchas can be broken, but it's far from easy to do, so the bar for comment spamming would be set very high.
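On the server side, a digit captcha needs very little code; the hard part is drawing digits that a bot can't read, which isn't shown here. A minimal Python sketch, assuming a hypothetical `SECRET_KEY` and a five-digit answer: the digits are signed with an HMAC plus a timestamp, the signature goes into a hidden form field, and the typed answer is checked against it, so the server doesn't have to store anything per form.

```python
import hashlib
import hmac
import secrets
import time

SECRET_KEY = b"change-me"  # hypothetical; a real site would use a long random secret


def _sign(answer, issued_at):
    msg = ("%s:%d" % (answer, issued_at)).encode()
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()


def new_challenge(now=None, digits=5):
    """Return (answer, token): draw answer into the image, put token in a hidden field."""
    issued_at = int(time.time() if now is None else now)
    answer = "".join(secrets.choice("0123456789") for _ in range(digits))
    return answer, "%d:%s" % (issued_at, _sign(answer, issued_at))


def check_answer(typed, token, now=None, max_age=600):
    """True if typed matches the digits behind token and the token isn't too old."""
    now = int(time.time() if now is None else now)
    try:
        issued_str, sig = token.split(":", 1)
        issued_at = int(issued_str)
    except ValueError:
        return False  # malformed token
    if now - issued_at > max_age:
        return False  # expired
    expected = _sign(typed.strip(), issued_at)
    return hmac.compare_digest(expected.encode(), sig.encode())
```

As written, a solved answer/token pair can be replayed until it expires (ten minutes here); a real system would probably also remember tokens that have already been used.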
Registration: Provided registration requires email verification, this would undoubtedly be a good solution, or part of one. Again, it can be broken, but not easily. The major problem is that many bloggers feel the quick and easy way of commenting is part of blogs' appeal, and registration presents an obvious barrier to participation. I'm of the opinion that too many bloggers have a rose-tinted view of the internet and how it works - they should wake up to reality and realize that it has never been, and never will be, an ideal world.
Pre-moderation: This sucks. It's worse than useless unless every blog out there (or at least 80%) has it built in by default. The server load would not ease for some considerable time, and the burden of blog spam would simply be moved somewhere else, i.e. rather than removing spam, bloggers would spend their time pre-moderating it. Not a viable solution in my opinion.
Search Engine Filtering: Well, as I said at the beginning of this post, I really don't feel that's the right way to go. It would kill a lot of good links and stop them from passing well-deserved algorithmic recognition to worthy sites being discussed in context. Sorry Jeremy, that sucks :)
Non PageRank Passing JumpLinks: This sucks for the same reason as search engine filtering above.
Bayesian Filters: It's my understanding that these are piss easy to break. Please correct me if I'm wrong.
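For anyone who hasn't seen one, a Bayesian filter scores a comment by how often its words have turned up in comments already marked as spam versus legitimate ones. Below is a toy naive Bayes sketch in Python (add-one smoothing, word tokens only); the training comments are hypothetical, and real filters are trained on far more data. It also shows one commonly cited weak spot: the score is driven by word counts, so padding a spammy comment with innocent words drags it down.

```python
import math
import re
from collections import Counter


def tokens(text):
    # Lowercase words; '|' and '$' are kept so tricks like "v|agra" stay one token.
    return re.findall(r"[a-z0-9|$]+", text.lower())


class NaiveBayesFilter:
    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}  # word counts per class
        self.docs = Counter()                                 # comments seen per class

    def train(self, text, label):
        self.counts[label].update(tokens(text))
        self.docs[label] += 1

    def spam_probability(self, text):
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        total_docs = sum(self.docs.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Smoothed prior plus smoothed per-word likelihoods, in log space.
            score = math.log((self.docs[label] + 1) / (total_docs + 2))
            total = sum(self.counts[label].values())
            for tok in tokens(text):
                score += math.log((self.counts[label][tok] + 1) / (total + len(vocab)))
            log_scores[label] = score
        # Turn the two log scores into a probability without overflowing.
        top = max(log_scores.values())
        exp = {k: math.exp(v - top) for k, v in log_scores.items()}
        return exp["spam"] / (exp["spam"] + exp["ham"])
```

Trained on two spammy and two legitimate comments, a sketch like this scores "cheap pills casino" as spam, but appending the words of the two legitimate training comments to it is enough to tip the score back below 0.5.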
So, What is the Solution to Comment Spam?
To solve the blog spam problem you first have to understand why blog spam occurs. It amazes me that the majority of bloggers who complain loudly about spam seem to have no idea why they're being targeted. I wrote yesterday on the subject of why bloggers think they invented the internet, and the points in that post seem to apply here as well. Bloggers need to take a look at the world outside of the blogosphere.
Understand your Enemy
The blog spammers I know have an intimate understanding of all the major blog software, of the blog boom itself, and of how it can be used to their own sites' advantage in the search engines. Bloggers need to take this approach too.
Let me help you get up to speed...
It's really rather simple: a link to a website with the right anchor text is valuable for search placement. Simple as that. By allowing users to comment and having their "name" as the link text that goes to their website, you are enabling them to sign in as "v|agra guy" or whatever - the keyword in the link text is what's important.
Contrary to popular belief, it doesn't really matter whether your blog has a high PageRank or not; the anchor text is what matters.
Now you're up to speed, let's look at what I think is a good solution and ask the good boys and girls at Threadwatch to dissect it, poo poo it, agree with it or present better solutions:
And the Winner is....
It's not an ideal world. For those bloggers reading this who think that if they wish hard enough people will stop spamming their blogs, or that whining about it is a solution rather than changing the way comments work: wake up! It's not a pink fluffy internet out there, and if you want to do something permanent about this then you're going to have to change a few things, okay?
- Change the way comments work - Instead of having the "name" part of the comment form as the anchor text that links back to the commenter's website, have that link go through a non-spiderable jump script. (Yes, I know this goes against what I said earlier, but bear with me...) This will allow users to click the link and go to the commenter's site, but give spammers no benefit.
- Allow HTML/BBcode in your actual comments - Yep, you heard it right. This will allow users to link to on-topic material and add value to the post. Yes, it would also allow spammers to insert HTML links into their comments, but again, bear with me...
- Use a Captcha system for those that comment OR require registration. Arguably the first option is best as it presents the smallest barrier to participation. This will stop the vast majority of automated spam - period. And Enable it by Default!
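The first two changes in the list above are mostly template and plumbing work. A rough Python sketch of both, under stated assumptions: the `/jump` path is a hypothetical name for the jump script, and one common way to make it non-spiderable is to disallow that path in `robots.txt`, which well-behaved spiders such as the major engines honour. The BBCode side supports only `[url=...]`, `[b]` and `[i]`; allowing raw HTML instead would need a proper whitelist sanitizer, which is more than a sketch should attempt.

```python
import re
from html import escape
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical jump-script path, kept away from spiders by robots.txt.
JUMP_PATH = "/jump"
ROBOTS_TXT = "User-agent: *\nDisallow: /jump\n"


def comment_author_link(name, url):
    """Render the commenter's name as a link routed through the jump script."""
    href = JUMP_PATH + "?" + urlencode({"url": url})
    return '<a href="%s">%s</a>' % (escape(href), escape(name))


def jump_target(request_path):
    """Return (status, location) the jump script should answer with."""
    query = parse_qs(urlparse(request_path).query)
    target = query.get("url", [""])[0]
    if urlparse(target).scheme not in ("http", "https"):
        return 400, None  # refuse javascript:, data: and other schemes
    return 302, target


# Live links inside the comment body, via a tiny BBCode subset.
URL_TAG = re.compile(r"\[url=(https?://[^\]\s]+)\](.*?)\[/url\]", re.IGNORECASE | re.DOTALL)
SIMPLE_TAGS = {"b": "strong", "i": "em"}


def bbcode_to_html(text):
    # Escape first, so raw HTML typed by the commenter is shown as text.
    html = escape(text)
    # Only http/https [url] tags become real anchors.
    html = URL_TAG.sub(lambda m: '<a href="%s">%s</a>' % (m.group(1), m.group(2)), html)
    for bb, tag in SIMPLE_TAGS.items():
        pattern = r"\[%s\](.*?)\[/%s\]" % (bb, bb)
        html = re.sub(pattern, r"<%s>\1</%s>" % (tag, tag), html, flags=re.IGNORECASE | re.DOTALL)
    return html
```

For example, `comment_author_link("v|agra guy", "http://spam.example/")` produces `<a href="/jump?url=http%3A%2F%2Fspam.example%2F">v|agra guy</a>`, so the keyword in the name never sits on a followable link to the spammer's site. Links written with `[url]` in the comment body, on the other hand, are live, followed links; that is the point of the second change, and it's why the captcha (or registration) step is what actually keeps the automated spam out. As written, the jump script will redirect to any http/https URL, so a real deployment would probably also want to stop it being used as an open redirector.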
There, that wasn't so hard, now was it?
What Problems will this Solution Present?
Not too many, I think. The main issue is getting bloggers past the idea that it's an ideal world out there. You WILL have to change some stuff if you want this to work, just accept it. You will also get some very clever bots that can break your captchas, but having spoken to some tech guys about this, I think their number would be very small.
The worst thing would be that you would have to police your comments. You will still get people submitting comments by hand with links to their website embedded in the comment field. Some will be clever enough that it's hard to tell a spam attempt from a genuine comment; well, that goes with the territory, I'm afraid. Forum owners have to deal with promo posts on a daily basis, I have to here at Threadwatch, and so will you - you just can't do this without a little work.
Until search engines work differently (i.e. they don't place as much importance on link text, or they find a way to tell spam links from genuine ones), automated link spam will not go away. There is already talk of Wiki spam, and blog software developers will need to look hard at their systems for anywhere else spammers might find a way to put live links in. As I've said, this won't go away without work.
Blog Software is Where the Ultimate Responsibility Lies
It will be down to companies like SixApart to build changes into their software to thwart spamming - they're already working on it. Solving the problem for yourself by hacking your own blog script won't help much on the whole, and neither will plugins or tutorials - this stuff needs to be factored into the core scripts and Enabled by Default.
Go on, Tear it Apart...
OK, so pull my post to pieces please. Let's have some thoughts on my suggestions, some pointing out of anything I've missed, or why the whole thing is a big bag of shite....
It was pointed out to me that this might be seen as an attack on or dig at Jeremy personally - it's not. I don't think his solution was a good one, and that's where it ends :) The remarks about bloggers "waking up" etc. were aimed at bloggers in general, not at any specific individual - thanks...
Added: John Battelle has just written about Jeremy's post too, worth a look...