New Duplicate Content Filter at Google?

Source Title:
My site has now vanished from Google
Story Text:

This is one of those long, long threads at WMW. Now 400 posts long, it started a week ago and is still going strong; it got a renewed impetus on Friday, when a new Google filter kicked in. You can see the effect if you use the “&filter=0” parameter to switch the filter off (e.g. search for “London hotels” as normal, and then for “London hotels” with &filter=0 appended).
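For anyone who wants to do the comparison themselves, here is a minimal sketch of building the two search URLs, assuming Google's classic /search?q=… URL format; the filter parameter name is taken from the quote discussed in this thread:

```python
from urllib.parse import urlencode

def google_search_url(query, unfiltered=False):
    """Build a Google web-search URL; when unfiltered is True,
    append filter=0 to disable the duplicate-results filter."""
    params = {"q": query}
    if unfiltered:
        params["filter"] = "0"
    return "https://www.google.com/search?" + urlencode(params)

# Compare the two result sets side by side:
print(google_search_url("London hotels"))
print(google_search_url("London hotels", unfiltered=True))
```

Diffing the two result pages shows which listings the filter is suppressing.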

In message 188, Reseller quotes an old GG post as saying:

“The &filter=0 just shows all pages, even pages that appear to be duplicates. Normally if there are pages that appear to be duplicates, we try to pick the best one to show (but no heuristic can be perfect -- it's like having two people claiming to own the same essay).”

Caveman in post 171 says

The dup issue that steveb outlined so well explains some of what I see, but not all of it.
There are cases where established site homepages and subpages are holding their ranking for one phrase, but dropping out of the SERP's for another closely related phrase (when the site previously ranked for both) ... and where there is no evidence of dup content filters playing a role where pages dropped out.
They've tweaked something else IMO. Possibly related to linking/anchor text/kw patterns.

Something was tweaked by Google last week. It has not affected all sites, but it has certainly knocked some of my sites that were hit by the last “canonical pages” and “duplicate content” issues at Google. It seems that their latest attempt to identify the “canonical page” is as flawed as the last one. If you have, like me, sites that have been badly scraped by others, then this “new” filter does as bad a job as the last one at identifying the canonical page.

I might add that Mr Cutts has not weighed in on the WMW thread - wait till it blows over, Matt!


Tell me, does setting filter

Tell me, does setting the filter to 0 show search results without any filtering at all, or just without the duplicate-content penalty?


'Tis a nightmare indeed for my "artistic writing" site. It's been around for over five years so scrapers have had lots of time to use bits from it... kinda strange Google decides it's MY problem now and knocks the site back to page five from one. I miss those 1,000+ extra Google visitors a day!

setting filter to 0

It certainly takes out the effect of their latest filter.

But nobody really seems to know whether the filter is only duplicate copy (as referenced by that old GG quote)...

.. or whether it is related to linking/anchor text/kw patterns (or whatever)

I was running it past TW readers to see if anyone had a more authoritative take than me. My feeling is that it is another off-target bash at duplicate copy, as they continue to grapple with the problem.

Blah 2

One of my oldest sites has been affected by this latest filter thing. I've never really bothered looking at who's copying my content, as I'd always assumed that since it was up first it'd all be OK.

I'm not sure if it's related, but I only protected the index page from the www / non-www issue, not all my pages (my mistake), so that's something I'm sorting out at the moment, in case it makes a difference.
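The www / non-www fix is usually a sitewide 301 redirect so every page has exactly one canonical URL; it is normally done in the web server configuration, but the idea can be sketched in Python (the host name here is a placeholder, and canonicalizing onto www rather than the bare domain is just one of the two valid choices):

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.example.com"  # placeholder: pick ONE host for the whole site

def canonical_redirect(url):
    """Return (301, canonical_url) if the URL is on the wrong host,
    or None if it is already on the canonical host."""
    parts = urlsplit(url)
    if parts.netloc == CANONICAL_HOST:
        return None
    return (301, urlunsplit(parts._replace(netloc=CANONICAL_HOST)))

print(canonical_redirect("http://example.com/page.html"))
```

Applying this to every page, not just the index, is exactly the gap the poster describes.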

Having spent a couple of hours looking into it, I'm impressed by how far and wide my content has been distributed. There's the eBook author who's using a load of the reviews submitted on one of my topics to promote his book, and who didn't even bother to change the table colors to fit his site - just a plain HTML cut 'n' paste (I'm tempted to buy the book to see how much more of my site I'll find in it). There's the guy who nicked some other content and then syndicated it out as his news. And of course the usual suspects: the Ctrl+C/Ctrl+V-happy MSN Groups people, forum posters, those great .ru scrapers that don't just give you nice 302'd links but slap a full copy of your home page at the bottom of a few "directory categories" to stop themselves looking too much like scrapers, not to mention the very helpful "archiving" services out there as well. Fantastic :P

Added: I'm of the same opinion that it's a hit at duplicate content that has caught a few innocent sites as well. Once it's settled down, if I'm still on the wrong side of it, I might just try re-writing a few of my most-copied pages to see if that puts it right.

A new thread at WMW

The original thread that I quoted has climbed to over 800 posts at WMW.

Now caveman has put up a new thread, having distilled the problem down.

The fact that even within a single site, when pages are deemed too similar, G is not throwing out the dups - they're throwing out ALL the similar pages.

The result of this miscalculation is that high quality pages from leading/authoritative sites, some that also act as hubs, are lost in the SERP's. In most cases, these pages are not actually penalized or pushed into the Supplemental index. They are simply dampened so badly that they no longer appear anywhere in the SERP's.

and the conclusion

What is an Affected Site To Do?
One option, presumably, would be to stop allowing the robots to index the lesser pages that are 'causing' the SE's to drop ALL the related pages. But this is a disservice to the user, especially in an era when GG has gone on record as taking pride in delivering especially relevant results, and especially for longer tail terms.

Should we noindex all the bee subpages, so that at least searchers can find SOME page on bees from this site? (I'm assuming that noindexing or nofollowing the 'dup' pages that are not really 'dup' pages at all would nonetheless free the one remaining page on the topic to resurface; perhaps a bad assumption.)

In any case, I refuse. Talk about rigging sites simply for the purpose of ranking. That's exactly what we're NOT supposed to be doing.

G needs to sort this out. ;-)

Google have not managed to solve the problem of dup content before, and unfortunately I cannot see them doing it now.

Dupe content is a tricky

Dupe content is a tricky one.... no matter what approach you take, there's an exploit. Somehow, all SEs, not just G, have to come up with a more reliable "quality metric", link analysis is creaking these days, IMO...

The first one to get user-data integration right will probably win.

Expand the scope

I don't buy that it's a bug. There is a lot of writing about bees out there. Reworded snippets and quotes may have triggered a dupe filter that compares blocks of the pages with information provided on other sites which have the source bonus. *If* that's the case, no amount of fiddling with robots meta tags and robots.txt will help.
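Nobody outside Google knows how its filter actually compares blocks of pages, but a common textbook approach to near-duplicate detection is w-shingling with Jaccard similarity; here is a toy sketch of that idea (not Google's method):

```python
def shingles(text, w=3):
    """Set of w-word shingles from the text (lowercased)."""
    words = text.lower().split()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}

def jaccard(a, b):
    """Jaccard similarity of the two texts' shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a), shingles(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

original = "bees make honey from the nectar of flowers"
scraped  = "bees make honey from the nectar of plants"
print(round(jaccard(original, scraped), 2))  # 0.71 - near-duplicates score high
```

A filter built on something like this would flag reworded copies and lightly edited scrapes alike, which would explain why pages with borrowed snippets get caught regardless of their robots settings.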
