Evidently, Blog Bitch is tough to kill


Flashback to a March '05 TW thread, Button Pushing goes Mainstream with Fake Blog Creation. Now this from VeriSign, about buying Winer's ping service (emphasis mine):

"I noted this morning searching for something on Technorati that they are telling us that we can search more than 18 million blogs now. I believe that’s true, but only if we’re fairly charitable in what we’d call a blog.

We’ve just begun doing some analysis on just how many blogs out there are real – the work of real humans crafting posts – rather than simply splogs – web pages that are generated automatically by scripts and programs to look just like (or much like) real blogs, but serve only as a place to park keywords that will hopefully be found in a search, and advertisements that hopefully will be clicked on by humans who happen to somehow land on that page. In talking to Google, they can confirm what our initial scan tells us: there are an enormous number of splogs out there, and the number is growing faster than the number of real blogs. By a good margin."

With all the back-slapping and suspender-snapping I've seen in the 'sphere lately, I haven't seen any workable solutions. Blind eye, I guess.



I think, with a fair amount of data, I could write a filter that would catch a good deal of spam blogs. There will always be a way to distinguish between man and machine.

That said though, Technorati are useless at it, but then the technical expertise of Technorati has always been in question.

You mean the Turing Test?

Turing held that computers would in time be programmed to acquire abilities rivalling human intelligence.

As part of his argument Turing put forward the idea of an 'imitation game', in which a human being and a computer would be interrogated under conditions where the interrogator would not know which was which, the communication being entirely by textual messages. Turing argued that if the interrogator could not distinguish them by questioning, then it would be unreasonable not to call the computer intelligent.

From turing.org.uk


The Turing test has been passed many times in real life already. I've personally been involved in a few things that did so (not stuff I programmed though :))

What makes it even easier when it comes to blogging is the fact that a lot of the "real" blogs are of such poor (writing) quality. All your program has to do is make the output just a little bit better than that.

And this brings up an interesting question: Why be so obsessed with who wrote a piece of text, and how? If my computer program can produce better pages than an average human, then what's the problem? What is so bad about machine-generated content? Some of my favorite sites are machine generated (search engines, news portals etc).

"Some of my favourite sites are machine generated "

I understand that Nick Wilson does not really exist...

..and that Threadwatch is written, maintained and run by a large computer somewhere in Denmark..

...whilst Mr Wilson himself lives on a private island somewhere in the Caribbean.


taht's crrocet cwronall but im hvinag smome tourleb tihs mroning as the mgiac fiaires heav not been piad tihs week...


>Some of my favorite sites are machine generated (search engines, news portals etc).

Damn, what a great comment, Mikkel. I hadn't thought of it that way, but a SERP or news page is indeed machine "assembled." There's very little distance up the supply chain separating machine-assembled from machine-created.

another emergency flare from the trenches:

The software that's generating these things is pretty sophisticated, you might think they were real at first glance.

Tim Bray

ummm, Tim, these guys are just getting warmed up.
