Google's Matt Cutts' Blog GoogleWashed

71 comments

UPDATE: As Danny pointed out in this thread, what's happened to Matt Cutts' blog is not really "hijacking", though it does bear similarities. What has happened is that the source of a story, in this case Matt's blog, has been washed out of the results pages by duplicates on other sites. It's part of a noticeable problem in the way that search engines handle duplicate content, and the trouble they have determining the original. In particular, Google seems very prone to this - and as you can read below, no one is exempt...

Well, strictly speaking it may not actually be what we've come to think of as the classic Google 302 Hijack, where one site can take over the rank of another by 302'ing to the victim site for Google only, but DarkSEOTeam are certainly having some fun with him.

If you check out this Google search (here's a screenshot) for a specific phrase found on this post on his blog, you'll see that he's nowhere to be seen - but the darkseo listings are.

Dayo UK discovered this in a thread about Google's dupe content filters, and member reseller sums up what is most likely going on:

What you are seeing, my UK friend, is content being stolen in broad daylight, and Google isn't able, as usual, to differentiate between the original content and the duplicates. Many fellow members, whose sites either dropped out of the index entirely or lost much of their rankings because of the same problem, have reported this on this thread. It is a real disaster that both Google and the webmaster community are facing.

We've talked about this before, and I understood that new filters in place were doing a better job, but when a Google employee's site can be "hijacked" in this way, because Google cannot distinguish between original and stolen/copied content, it does rather highlight the problem, doesn't it?

Comments

Looks like Matt saw this

Looks like Matt saw this story before I did. :)

EDIT: Oops - depends upon the datacenter returning results...

Oops

It does highlight the problem - but unlike everyone else, Matt can get this one resolved in a matter of minutes ;)

I'm surprised it's still

I'm surprised it's still there, I actually emailed him before posting - I must be getting soft in my old age hehe..

too funny

gotta love the tall foreheads at the plex.

It might bring the dup content problem home to Mr Cutts' ...

Google, and their spokespersons, have studiously avoided making any comment on their continuing dup content problems.

I can see a reason not to solve this particular instance

I can see the letters to Google now

"... My site has been hijacked... my competition has done... I need this fixed now because (insert whiny statement)... And when this happened to Matt's site it was corrected right away... and as a (insert how much business they think they give google), I expect the same treatment. I need this fixed ASAP because I'm losing (insert sum of money here)."

** edited: well rested... and in a better mood -lol **.

Natasha "That Girl From Marketing" Robinson

Hijack is definitely wrong

There's no hijacking here, Nick. Yeah, nice way to illustrate the duplicate content issue that's rapidly seeming to outpace hijacking concerns, but I think it's bad to confuse the two.

I mean, looking at the results, webcottagedesigns.com and webrankinfo.com alone have managed to hijack Matt without any superhuman talents of the DarkSEOTeam. Why? They just carried the first lines of his post on that subject, and Google's letting them outrank Matt for it.

Matt, of course, does show up in the top results for this search. So hijack? No. Should he have been ranked higher? As the source material, yeah. Should the exact page have been listed, rather than his home page that contains that phrase? Sure -- certainly an indented listing, at least. That's the issue here -- not that other pages are showing up, but that Google failed to list the very best page from his site, because the dupe content filter seems to favor his home page instead.

Removing the filter brings the page out to be visible. But no one should have to do that.

Salt for the wound, Yahoo

Also fails to rank Matt above the others, but at least it gets the best page out there.

The Search is better without the quotation marks

The original search that alerted me was this

I am sure that Matt said something about using sitemaps to help with canonical URLs. However, looking at that SERP, Matt's blog does not appear at all (unlike the above posted example), although you can bring it back by adding filter=0 :P
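For anyone who wants to compare the filtered and unfiltered results side by side, here is a minimal Python sketch that builds both search URLs. The function name `google_search_url` is made up for illustration; the only things taken from this thread are the `http://www.google.com/search?q=` URL form and the `filter=0` parameter.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def google_search_url(query, unfiltered=False):
    """Build a Google search URL; unfiltered=True appends filter=0."""
    params = {"q": query}
    if unfiltered:
        # filter=0 asks for the results the duplicate filter would omit
        params["filter"] = "0"
    return "http://www.google.com/search?" + urlencode(params)

phrase = '"I never would have discovered the joy of bacon polenta"'
filtered_url = google_search_url(phrase)
unfiltered_url = google_search_url(phrase, unfiltered=True)
```

Opening both URLs and comparing which listings only show up in the second one is a quick way to see what the dupe filter removed.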

Slightly different issue (some very strange matches at the bottom of those serps)

In an age of distributed

In an age of distributed content, rss/atom et al, this is bad news. You know, just about everyone and his dog has a scraped copy of TW out there - and no small amount of legitimate reprints using the rss - if your blog doesn't have enough google juice, regardless of you being the original author, it seems you're fucked...

thinking further

The entire thing needs a better name, and we've got two different issues as well.

Hijacking -- another site gets their URL listed even though clicking takes you to your domain. Easy to understand the concept, and nice name.

DupeWash -- the duplicate content filter causes the best pages from your OWN site to get lost. Usage: "I got dupewashed with that update." I'm not wedded to that term, just think we could use something less cryptic than "duplicate content filter."

SourceWash -- Related to DupeWash. It means the source material is washed out by all the duplicate content from across the web. You post, then 8 billion blogs repost what you had, then the source material gets wiped out.

Orlowski called the above GoogleWashed way back in 2003, http://www.theregister.co.uk/2003/04/03/antiwar_slogan_coined_repurposed/, and when I did my write-up on the miserable failure search back then, http://searchenginewatch.com/sereport/article.php/3296101, I felt like he nailed a real issue that continued to grow. Clearly it has kept on.

We could go with GoogleWash, but SourceWashing isn't just a Google thing. And I don't even necessarily like that name, either. Hmm, maybe I'll do a poll.

GoogleWash

people will "get it"

hmmm

yeah, for that query dayo you just posted, that feels weird. I just did this [canonical url sitemap matt cutts site:mattcutts.com]. That got me only one page from his site, the bacon polenta page coming up. But the content is coming out of the comments. I'm wondering if Google is recognizing the comments at Matt's blog and somehow discounting them a bit, while dark hat may have grabbed the raw text without any type of comment cues. Would love to have seen the cache!

Am I watching an SEO term

Am I watching an SEO term being coined here?

Not hijacking?

Not strictly hijacking--more like a different page shows up for a random phrase I used in a blog post? I'll investigate, just to see if the other folks have done full-fledged SEO efforts for those phrases and to see whether the algorithms can be improved, but as Natasha suggests, I'm not going to do anything special for my blog. Heck, part of the idea of doing the site and having things like both www and non-www versions and stuff like that was to see how different engines handled it.

Just checked and right now I'm number 1 in the world for [bacon polenta] (woohoo!). So I'll take this as a report that my site doesn't rank #1 for every phrase from my blog, not as a hijacking or googlewashing (I'll have to go back and read Orlowski to remember his definition). I'm sure that there are lots of things like that, but thanks for mentioning it. Again, I'll check into it but won't be doing anything special for my site.

Hmmmmmm

Why on this search does Matt's site come up?

Here

Are they now somehow linked together ???

Ooops - answered my own question - I guess it's because the site is linking back to Matt (although obviously we can't see the cache)

Relevance?

Better get the slide rules out if your page is the best/most relevant return for Bacon Polenta. Seems to me there may be other sites that would be of far greater relevance than Matt Cutts' blog for this search.

Emeril should put a Bacon Polenta recipe on his site. I wonder if Martha Stewart ever featured Bacon Polenta...

Matt - I don't understand

Your site has clearly been filtered out as another site has taken your content.

These are not random words they are words that should pull up your site - and do - but you are filtered out as another site has got your exact copy.

BTW - Only one search engine seems to have problems with non-www and www :/

Best wishes - I am glad that you are happy it was raised - I did not want to upset the apple cart but my friend from Denmark persuaded me :)

The Great Bacon Polenta Conspiracy

That's really quite amazing...

Matt - I don't

Quote:
Matt - I don't understand

Your site has clearly been filtered out as another site has taken your content.

These are not random words they are words that should pull up your site - and do - but you are filtered out as another site has got your exact copy.

You don't?

"These are not the results you're looking for, you may go about your business..."

heheh

Relevance

AmericanBulldog is 100% spot on, there's no way your blog is the most relevant result for [bacon polenta] however yahoo [bacon polenta] and MSN [bacon polenta] have it wrong as well. I could see this SERP become a litmus test for relevancy.

slow on the trigger

damn, nick beat me to it.

I have that customizegoogle

I have that customizegoogle thing on FF, I just clicked all the "try your search on" links :)

Please, just say no to "-Wash"

What gooky terms. "SourceWash"? Sorry. This isn't a laundry issue.

If you have to come up with a cutesy name for it, call it DupRanking or something meaningful and relevant. It's a quality name, the best since Florida, and surely beats out "Hilltop" for a new concept in authority definitions.

Duplisting may be better.

Or, how about, Floridup?

Get your Dupes up, dudes, and be resourceful.

No more of this "-Wash" stuff. Ick.

"bacon polenta"

..shows the problem. Matt's blog is top for "bacon polenta" but not for any longer phrase from that sentence.

Take
"I never would have discovered the joy of bacon polenta"
and Matt Cutts' blog is not ranked

Reduce your searches one word at a time till you get to
"of bacon polenta"

And each search still ignores the real McCoy.

Only when you search for the two word phrase do you get McCutts.
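If you want to repeat this test on your own content, a small Python sketch like the one below generates the list of quoted searches, dropping the leading word each time. The function name `shrinking_phrases` and the `min_words` parameter are my own invention for illustration; it only builds the query strings, it does not run any searches.

```python
def shrinking_phrases(sentence, min_words=2):
    """Yield quoted exact-match queries, dropping the first word each time."""
    words = sentence.split()
    # Stop once only min_words words are left
    for start in range(len(words) - min_words + 1):
        yield '"' + " ".join(words[start:]) + '"'

queries = list(
    shrinking_phrases("I never would have discovered the joy of bacon polenta")
)
for q in queries:
    print(q)
```

The list starts with the full sentence and ends with the two word phrase, so you can paste each line into the engine and note the first one where the original page shows up.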

Matt needs

men in black neuralizer

I can see the headlines

In an unofficial relevancy test at threadwatch.org leading search industry experts determined that altavista had the most relevant and spam free results, beating out Google, Yahoo, and MSN.

Bacon Polenta

graywolf, AmericanBulldog, if you read the post, you'll see how pleased I was that I thought I'd invented bacon polenta, because some bacon fell in my polenta. I'm sure I didn't, but ["bacon polenta"] only has ~300 results, so it's not something that a lot of people talk about. :)

Amazing?

Amazing? I don't think so as there are very few pages in the various indexes with an exact match ["Bacon Polenta"]

Google & Yahoo have about 300 results & MSN has a whopping 9.

oops - as he said :)

oops - as he said :)

Yes but

there are only 2 unfiltered results for :-

"I never would have discovered the joy of bacon polenta"

and you are not 1 of them :/

Matt we're just teasin' you,

Matt we're just teasin' you, but I think you'd have to agree this SERP is wonky everywhere. Even though Google has ~300 results the SERP should still be accurate and relevant, so let's not go down that road.

For the record here are some actual bacon polenta recipes from as far back as 1992, before regular people started using the internet.

Oops 2

Without the quotes you still rank #1 and G has Results 1 - 10 of about 280,000. Evil.

Alta Vista are the only ones that do it right, and theirs is bigger, hehe. AltaVista found 328,000 results

And, you'd think out of only a few hundred G would get it right, imagine how far off the mark they may be on bigger searches...

Matt´s Bacon polenta ;-)

Hi Folks

My first post.

Sometimes you talk about things without knowing that it is gonna be THE NEWS next day.

Yesterday I exchanged posts with my WMW friend Dayo_UK, and I actually mentioned Matt´s Bacon Polenta (Copyright Matt Cutts 2005, All Rights Reserved ;-) ) in connection with Google´s canonical issue msg #:841

on this thread

I guess that Matt won't touch either the bacon or the polenta for the rest of 2005, or at least not before the next major update ;-)

Dayo_UK - well, exactly... I

Dayo_UK - well, exactly... I think others are missing the point... Why should the original Matt Cutts article be omitted while the other sites that are using his content are ahead and showing when searching for "I never would have discovered the joy of bacon polenta"?

(obviously forgetting the fact it's a daft sentence to search for...)

In order to show you the most relevant results, we have omitted some entries very similar to the 2 already displayed. If you like, you can repeat the search with the omitted results included.

Yes, I am more worried that

Matt is missing the point though :/

Although I think he is probably seeing the point but not wanting to comment ;)

sub plot

Clearly something is amiss with the ranking algos if Matt's site is the top SERP for Bacon Polenta. No disrespect intended, as Matt's post is original and unique, but it's clearly not really even about bacon polenta. This is a flaw in ranking parameters, with too much value given to internal page rank, trust rank, authority score or whatever other internal ranking parameters the search engines are using to determine a site's "importantness", and how this "important rank" trumps a whole host of other ranking factors.

Get out the paper and pencils, and order lunch in; there are some lessons to be learned here.

Well, the post at Matt's blog

Well, the post at Matt's blog is clearly optimized for "Bacon Polenta" and not for all those other phrases. So, it's really no wonder he ranks for "Bacon Polenta"

As those of us who have been hit by this Google bug are aware..

this is a deep problem with Google serps.

However it has not affected that many people. I ran this thread at TW a few days ago and got very few takers. And it has managed to avoid any real discussion either here or at WMW.

The dup filter at G is seriously flawed, whether they have now taken it aboard remains to be seen.

Yes, I noticed that post here Cornwall

I am more than willing to discuss anything about canonical URLs - and it has definitely affected lots of people.

But it is all one way discussion - everyone seems to agree there is a problem - but the solution of doing a 301 does not seem to work anymore as Gbot is so inactive for sites that have had the problem for a while.
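For anyone unsure what "doing a 301" means in practice for the www / non-www problem, here is a rough Python sketch of the decision a server would make. Everything in it is an assumption for illustration: the hostname `www.example.com`, the function name `canonical_redirect`, and the idea that the server only receives requests for its own hostnames. It says nothing about how quickly Gbot picks the redirect up.

```python
from urllib.parse import urlsplit, urlunsplit

def canonical_redirect(url, canonical_host="www.example.com"):
    """Return (status, location) for a requested URL.

    (301, canonical URL) when the host is not the canonical one,
    otherwise (200, url), meaning the page is served as-is.
    """
    parts = urlsplit(url)
    if parts.netloc.lower() == canonical_host:
        return 200, url
    # Same scheme, path and query; only the hostname changes
    target = urlunsplit(
        (parts.scheme, canonical_host, parts.path or "/", parts.query, parts.fragment)
    )
    return 301, target

status, location = canonical_redirect("http://example.com/blog?p=1")
```

The point of the permanent (301) status is to tell the crawler that only one of the two hostnames should be indexed, so the two copies stop competing as duplicates.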

Hang on I will just search for something encouraging from Matts Blog ;)

"As we work down the list of canonicalization issues that people run into and cross them off the list, I wouldn't be surprised if this issue + 301s taking longer than before is the next thing on the list." - as quoted on Matt's Blog

Oops hang on a minute - it appears not to be in the results ;)

It is on the Bacon page :)

Cheers

Dayo

ownership in a web 2.0 world

So who wants to help Matt Cutts do a link campaign to help boost the authority of his site? That also might help him overtake Jeremy Zawodny's blog in a Google search for his own name. http://www.google.com/search?q=matt+cutts

Until there is an online copyright office that keeps a record of online content ownership (and how would that work on the global web?), the only thing that search engines have to go on for determining who is the proper source for content is power through authority, trust and links. Anyone have any better ideas of how to return true content sources in a Web 2.0 world?

Also, (playing devil's advocate) is it really a bad user experience to have a website that has aggregated a blogger's RSS content with other sources appear above the blog that originally posted the content? Isn't the aggregator adding value to the user experience? And if the content is interesting, won't the person click through to the original blog anyway?

I think it should be called

I think it should be called the "Bacon Polenta" problem in honour of Matt's invention of a meal and it being used as a perfect example of the issue. Plus, the trad media boys n gals would love to hear the reason behind the name. :)

subplot

I agree with claus about the subplot; I actually put it in the title and talked about bacon polenta in the text a fair bit, while most of the other ~300 posts are talking about bacon and polenta in passing (e.g. mention it adjacent just b/c two ingredients are next to each other). There are bacon polenta recipes, but I mention it more on-page (e.g. in the title) and I'm sure that people link to me as well. I didn't check what the bacon polenta results looked like before I posted that. It's an interesting exercise to take a page and turn every word into "blah" except for the query words. It's fun to see humans trying to pick which page is more relevant, and harder than you'd think.
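The "blah" exercise is easy to try at home. Below is a minimal Python sketch of it; the function name `blah_out` and the sample sentence are made up for illustration, and the word matching is deliberately crude (letters and apostrophes only, case-insensitive).

```python
import re

def blah_out(text, query):
    """Replace every word that is not a query word with 'blah'."""
    keep = {w.lower() for w in query.split()}

    def swap(match):
        word = match.group(0)
        return word if word.lower() in keep else "blah"

    # Punctuation and spacing are left alone; only words are swapped
    return re.sub(r"[A-Za-z']+", swap, text)

sample = "Some bacon fell in my polenta, and bacon polenta was born."
masked = blah_out(sample, "bacon polenta")
print(masked)  # blah bacon blah blah blah polenta, blah bacon polenta blah blah.
```

Run it over two competing pages for the same query and then try to pick the more relevant one from the masked text alone.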

Regarding the larger question, I asked someone to check it out, but then I had to go to a meeting. My ears are always open for bugs or issues to triage, so I view this as a good report of something to look at more. Not to do anything special on of course, but to check out in good detail.

Heh, nice one Jason

Help! I have been "Bacon Palenta'd"

Help! I have been "Bacon

Help! I have been "Bacon Palenta'd"

Can we have a vegetarian alternative, please?

;)

Okay

Sounds good Matt.. :)

bob and weeve

Matt, how about the SERPs for a search on:
the joy of bacon polenta
are those more or less satisfactory than the SERPs for bacon polenta?

A competitor can ruin a site's ranking!!

Now we have seen what happened to Matt´s blog and the delicious Bacon Polenta, there is a remaining question to be answered:

Is it fiction or fact that a competitor can ruin your site´s ranking?

From Google webmaster guidelines :

-----------------------------------

Fiction: A competitor can ruin a site's ranking somehow or have another site removed from Google's index.

Fact:
There's almost nothing a competitor can do to harm your ranking or have your site removed from our index. Your rank and your inclusion are dependent on factors under your control as a webmaster, including content choices and site design.

------------------------------------------

Hurting Competition

I think most of us have known for awhile that hurting your competition is not only plausible, but fairly easy to do. Whether it be sending them a few ROS links from adult sites under dirty words or copying their text on other pages throughout the web, it can be done.

No problem with the site ranking for Bacon thingy..........

It's just that it is missing for the phrases that are taken from the site, which you should be able to find the site with.

Matt - If it follows the pattern of what happens to other sites you may also lose the ranking for Bacon Thingy too :(. Especially when your non-www issue kicks in (unless it does not affect newer sites and it is just sites launched in a certain period that are in the black hole)

PS - Any news on canonical URLs ;) - don't distract the engineers long from looking at that problem.

Aging Delay

Isn't this just an obvious example of the aging delay at work?

Re: Aging Delay

Hi Jill

Did you mean: Google has sandboxed Matt´s blog ;-)

Jill I like your vocab

'aging delay' is more specific

and I think you're right.

Aging Delay

I think you may be right as well. God knows that every site made in the past year is spam. And after that year, they turn to non-spam. :-)

yeah

into bacon clearly....

Not really a mystery, imho

The other 300 sites were less optimised for "bacon polenta" than Matt's post is (which is on a well linked, albeit fairly new site). The darkseoteam website is clearly running some experiment, so who knows what's going on with the cloaked pages (on that pretty well linked, and perhaps older site - it could very well, eg. have been used in some "seraphim proudleduck" competition that apparently ended 01.01.2005). IOW they're likely to rank for inferior phrases on whatever web page they choose to copy and link up.

Plus, the URL you see in the SERPS (Matt's front page) ...well, that's one of the things about blogs. Click the link in the SERPS, scroll down Matt's page, and discover for yourself. It was clearly the most "relevant" page on that domain at that time (one of the other things about blogs is that they duplicate or triplicate, well "multiplicate"... everything, so when there are a few pages with that post on to choose from, the front page is a pretty safe bet for a winner, imho)

Bottom line, they just rank better for those inferior phrases than Matt does. And Matt's site isn't strong enough yet to rank for them all. (But it sure looks like it will be, someday, so don't use "coca-cola" in your headlines *lol*)

I might be wrong, but I think this is a marketing stunt from the darkseo website. Matt's blog seems like an odd choice of target from any other point of view than publicity.

Bacon

Bacon.... yummmmm.... drool

Is there anything it can't do?

---

Quote:
Did you mean: Google has sandboxed Matt´s blog ;-)

I mean that every single new site is subjected to a 6 - 12 month (or so) aging delay where it can't rank for anything other than silly words like "bacon polenta" etc.

Why would Matt's site be any different than any other?

some sites can rank quicker for some terms

I mean that every single new site is subjected to a 6 - 12 month (or so) aging delay where it can't rank for anything other than silly words like "bacon polenta" etc.

not sure if this counts, but my MyriadSearch.com ranks #11 or #12 for the word Myriad out of 28,000,000 results.

The domain is about 2-3 weeks old & I have not bought any links for it.

People are still missing the point IMO

OK, Ignore Bacon thingy - that is not the point.

The point is that someone has taken Matt's content and this has therefore totally removed him from the serps on many search phrases. Yes, fine, they may have higher PR and a more established domain - however it just shows that any site which is more established or higher PR can steal content from new sites and rank for that content.

If people think this is expected and desired behaviour then OK - but I personally don't think this is the way Google should treat these situations.

Sites CAN rank quickly

I've seen a site go from nowhere (new domain) to top 5 in Google for reasonably competitive (travel industry) terms in under a week. The sandbox CAN be broken, but you do need some pull to do it

The point

Dayo_UK I'm not missing your point, don't know about others.

Of course it's unfair. It's the same thing that happens when some major newspaper writes an article without crediting the source properly and then gets perceived as being the source by the general public.

Search engines don't really have a reliable way of establishing who got there first, it seems. At least that's not what their SERPS show. I've tried doing blog searches and news searches sorted by date, but even here that dimension isn't really covered, as it's always latest first (and those that aren't shown aren't shown here either).

One should think that quotation analysis could help solve this, but if people just quote without saying who they quote from, then it's probably hard.
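To make the "who got there first" idea concrete, here is a toy Python sketch: group pages that share most of their word sequences (shingles) and pick the one with the earliest first-seen date. This is only an illustration of the idea, not how any search engine actually does it. The URLs, dates, sample texts, the 0.5 threshold and all function names are hypothetical, and it assumes a trustworthy first-seen date exists, which is exactly the part that seems to be missing in practice.

```python
def shingles(text, size=4):
    """Set of overlapping word sequences of the given size."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(max(len(words) - size + 1, 1))}

def jaccard(a, b):
    """Overlap between two shingle sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def likely_source(pages, threshold=0.5):
    """Among near-duplicates of the first page, return the earliest-seen URL.

    pages: list of (url, first_seen as ISO date string, text).
    """
    reference = shingles(pages[0][2])
    dupes = [p for p in pages if jaccard(reference, shingles(p[2])) >= threshold]
    return min(dupes, key=lambda p: p[1])[0]

# Hypothetical sample data
pages = [
    ("http://scraper.example/copy", "2005-10-05",
     "I never would have discovered the joy of bacon polenta"),
    ("http://original.example/post", "2005-10-01",
     "I never would have discovered the joy of bacon polenta otherwise"),
    ("http://unrelated.example/recipe", "2005-09-01",
     "a recipe for polenta with bacon and cheese"),
]
source = likely_source(pages)
```

Note that the unrelated recipe page is older than both copies but is ignored, because it shares no four-word sequence with them.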

Still missing the point

I agree. When you look at the dup copy situation:-

"I never would have discovered the joy of bacon polenta" does not get Matt Cutts blog

"would have never discovered the joy of bacon polenta" does not get Matt Cutts blog

...and so on down to

"joy of bacon polenta" does not get Matt Cutts blog

It is only on the two word phrase
" bacon polenta" that you get a Google return in serps of the Cutts blog

This is IMO counterintuitive. In other words you would expect the real canonical page to rank for the longer quote rather than the shorter one. It shows a serious flaw in their filtering.

matt ..

just put a link from here http://www.cs.unc.edu/~cutts/

--

Quote:
If people think this is expected and desired behaviour then OK - but I personally don't think this is the way Google should treat these situations.

Which is why the whole aging delay thing is ridiculous and Google needs to get rid of it.

But..

Quote:
Which is why the whole aging delay thing is ridiculous and Google needs to get rid of it.

But that will hurt Adwords revenue.

But that will hurt Adwords

But that will hurt Adwords revenue

I bet they would make it back with adsense

Guess who ranks for Bacon Polenta now..

..TW comes top of the three results Google returns for..

"I never would have discovered the joy of bacon polenta"

Matt still does not feature for the longer phrases, but I put this in by mistake with two full stops (periods to the unwashed)

"I never would have discovered the joy of bacon polenta"..

And his blog now does come up top of three results (darkseoteam.com disappears from this now). Weird and mysterious are the ways of the Google dup copy filter / canonical page algo.

Open Letter to Matt from darkseoteam

Though I don't endorse what the folks at Dark Seo Team have done, I think their open letter to Matt is worth adding to this thread, for the record at least:

http://www.darkseoteam.com/index.php/2005/10/06/23-open-letter-to-matt-cutts

OLD THREAD

jezzzzzz

EDUs Can Steal Your Content and Your Traffic

This is a slightly modified message I just sent to a uni ..

Webmaster,

Please forward a copy of the email to your administrative executives.

Let me preface what follows with a *theory* that is supported by many webmasters. Search engines, especially Google, confer special status on any pages on an .edu server. When duplicate content is encountered on a .com and an .edu, Google often penalizes the .com site in favor of the .edu.

So if a sanctioned or non-sanctioned page has images or content taken from a .com site without permission, the .com may well be penalized for showing THEIR OWN CONTENT.

One of my web sites was recently penalized as a result of similar actions taken by a webmaster with a page on your server. It even appears that the person responsible for taking the copyrighted image is not faculty or staff, but a relative of a faculty member.

While the university may not be responsible for the content of non-sanctioned pages, you are ultimately responsible for copyrighted material on your server.

But more important is stopping these types of abuses. I recommend that XXXXX University pay more attention to what is on your server .. and put policies, checks and balances in place to stop copyright violations.

FWIW, I plan on sending a version of this email to all major search engines.
