Then the software mentioned in this post might be interesting to test and write up for TW?

It's too big for me to download on my horrible dialup tonight (it needs the .NET Framework, at 24 MB), and it's already late. If it takes your fancy, post here so we know who (if anyone) wants to do it, and we'll discuss it in the resulting thread...

I'll give it a run through this evening.


Wonderful! Thanks DG...

me too

That is... I'm going to take it for a spin, but unfortunately I'm not sure when, and I can't really promise to do a writeup... I'm pretty busy these days, so there's very little time for play.

There's a forum for it as well, although with only two posts it's not an active one. Here's the Help/FAQ

Very interesting thingy. I was actually planning to write something like that myself, but as I'm always busy it's nice to see others have done so already ;-)


Working on this. There was a conflict with some apps I run; I should have it sorted in a bit.

possible extension

Someone should put this on a proxy and sell the results back to google.

got filter - need spam

So I've got it running on IE; now I only need some suggestions for keywords that will produce:

a) highly spammy Google.com SERPs
b) highly commercial Google.com SERPs
c) highly informative Google.com SERPs

...for filter training. I know a few, but I really need a lot, so fire away.

I guess it's best to PM me, so as not to out anyone. I'm not going to publish any terms anywhere.

seems to work

- my first round was to enter a few of the "various medication names" and mark some sites as ham and others as spam. After entering just a few examples, this thing can flag spam on searches you haven't done before (still in the medical area, though).

Now I'm off to try some adult terms.

Added, a bit later:

The first adult search actually identified a spam listing, although the filter had only been trained with medical terms. So that seemed okay, and I entered a few other terms and marked some ham and some spam.

The filter learns pretty quickly, although it tends to be too aggressive at the start, marking far too many sites in one category or another.

After looking at several different pr0n SERPs I turned toward the commercial ones: entered a few well-known brand names, identified places where I could buy them and marked these as "commercial" - also marked some as "informational" and a few as "spam" (not that many, though).

Overall, the filter seems to do what it promises to, and it's easy to use. It just takes some more training as it's still too aggressive, but then again, I've only played with it for a few minutes.

Added, later still:

It collects the search terms and the scores in an XML file in "C:\program files\somewhere\..." - nice. That way you can examine the individual tokens it uses to classify as "spam", "ham", and "commercial".

As an example of "tokens", it seems "!" and "-" are pretty frequent on spam pages :-)

(note: my personal "spam" classification will probably not be the same as yours and it will probably not be the same as that of any search engine either)
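For anyone curious how token-based training like this can work, here is a minimal sketch in Python. It assumes a naive Bayes style scorer with add-one smoothing; nothing in this thread confirms that the program actually works this way, and the class name `TokenFilter`, the tokenizer, and the sample training texts are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical tokenizer: words and digits, plus "!" and "-" as their own
# tokens, since those two showed up as frequent tokens on spam pages.
TOKEN_RE = re.compile(r"[a-z0-9]+|[!\-]")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


class TokenFilter:
    """Illustrative naive Bayes classifier; not the program's real algorithm."""

    def __init__(self):
        self.token_counts = defaultdict(Counter)  # label -> token -> count
        self.doc_counts = Counter()               # label -> examples seen

    def train(self, text, label):
        """Record one hand-marked example, e.g. label='spam' or 'ham'."""
        self.doc_counts[label] += 1
        self.token_counts[label].update(tokenize(text))

    def scores(self, text):
        """Return a log-probability score per trained label."""
        vocab = set()
        for counts in self.token_counts.values():
            vocab.update(counts)
        total_docs = sum(self.doc_counts.values())
        tokens = tokenize(text)
        result = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.token_counts[label]
            total = sum(counts.values())
            score = math.log(n_docs / total_docs)  # prior from marked examples
            for tok in tokens:
                # Add-one smoothing, with one extra slot for unseen tokens.
                score += math.log((counts[tok] + 1) / (total + len(vocab) + 1))
            result[label] = score
        return result

    def classify(self, text):
        s = self.scores(text)
        return max(s, key=s.get)


if __name__ == "__main__":
    f = TokenFilter()
    f.train("cheap pills!!! buy now - best price - no prescription!", "spam")
    f.train("buy brand shoes online, free shipping, add to cart", "commercial")
    f.train("history and side effects of the medication explained", "ham")
    print(f.classify("best price!!! buy pills - now"))
```

With only a handful of marked examples per category, a few tokens dominate the scores, which would fit the "too aggressive at the start" behaviour described above, assuming the real filter is something similar.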

Added, once again:

The program runs as a proxy server on your machine. I don't really know why, but it seems to slow general surfing down on my test configuration (Win XP + IE6). That is, also on sites that are not Google.

That's rather odd, as I have 1 GB of RAM on this machine, so that little bit of extra processing shouldn't really be a bottleneck in any way. Also, I'm on pretty fast broadband, so I'm used to pages being there instantly, even with graphics.

Anyway, when I say "slow down" I don't mean minutes; it's just a few seconds (max), and it doesn't always happen.

no more examples needed

No need to PM me keywords - I knew far more than I thought I knew *lol*

Not For Me

As you've already been provided with a detailed analysis, I'll just offer up this:

I don't want to constantly train an application in order for it to perform. Change categories? Training starts all over again. No thanks. Casual surfers will never use this. Power surfers will find something else.

Currently, Yahoo Mindset is looking pretty good.

Y! mindset

Great find DG :)

Where did that one come from? I think I've heard people asking for such a thing for as long as I've been reading forums.

Apparently, Yahoo Reps Read Fora

The commercial vs. informational SERP separation is definitely something I've seen people write about and ask for. Y! is tackling the challenge. Seems a bit odd, Y! rising to tough search challenges and G creating a portal interface. ;)

