Millions of Pages Google Hijacked via Open Directory feed


Like many webmasters I was a bit too relaxed about 302 hijacks, until I started digging into who was hijacking my pages and how I could remove them. I have now discovered that this problem is numerically out of control, with millions of 302s out there. And boy, am I angry. If you have a Dmoz link, check for 302s - obviously Google have to have indexed the page that's 302ing you, for it to be a problem. Maybe it's my fault for having lots of Dmoz links!

Surely not millions, you say? But there are a number of scraper directories taking the DMOZ feed of 4 million sites and 302ing the lot. I have explored the depths of two such sites in my blog. I have no idea how many of these Dmoz-fed, 302 scraper directories there are in the wild, but there have to be more than the two I detail. At 4 million 302s each, that is one hell of a lot of 302s.

I have now had 40 of these 302s, on sites that were pointing to my sites, removed. Yet according to Google’s own fact & fiction page:

There is almost nothing a competitor can do to harm your ranking or have your site removed from our index

If I may say so, this is Utter Bollocks. Their advice to me is unlikely to get me anywhere, so I have acted myself and removed the pages 302ing me. They were real 302s, or I would not have been able to remove the pages pointing to my sites.

we suggest that you directly address the webmaster of the page in question

Thanks for that advice guys. Bit like asking a mugger if he would mind awfully stopping robbing you.

And if you are unsure what a 302 page hijack is, then perhaps the best explanation is the one given by Claus Schmitt:

An explanation of the page hijack exploit using 302 server redirects. This exploit allows any webmaster to have his own "virtual pages" rank for terms that pages belonging to another webmaster used to rank for. Successfully employed, this technique will allow the offending webmaster ("the hijacker") to displace the pages of the "target" in the Search Engine Results Pages ("SERPS"), and hence (a) cause search engine traffic to the target website to vanish, and/or (b) further redirect traffic to any other page of choice.

Comments

here was my rundown on the problem

called Google and the Mysterious case of the 1969 PageJackers...

i mean, it seems like it's been years. they really need to address this rather than giving us diversions like Google X. heck, i'd even give up their 'customized google news' to get this fixed ;)

Cloaking

Cornwall, on your blog you point to a number of examples, but i don't see any cloaking on those 302s - they're just normal 302s, which is a pretty standard way for directory sites to transfer visitors to the site listed in the directory, right?

Can you explain what the issue is with those sites, what harm is being done please?

Have a look at this then

cloaked 302

I don't agree, by the way, that the example I gave first is a standard way of transferring visitors. I can eliminate that "pointing" page by putting "no robots" on my own page, and getting G to zap their page. G check that my page has no robots before they zap. The 302 in that case is harming my ranking in my opinion. If I am incorrect, then someone here will no doubt put me right.

In addition, if it were standard practice I would assume massive numbers of such sites would occur when I did the "allinurl" test.
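For anyone wanting to try the "no robots" step described above, here is a rough Python sketch of the check a removal tool might make before zapping a page. It assumes "no robots" means a robots meta tag carrying `noindex` (or `none`) on the target page; the class and function names are made up for illustration, and this is not a description of what Google actually runs.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names; values may be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for part in attr_map.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def blocks_indexing(html_text):
    """Return True if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    # Assumption: "noindex" or the shorthand "none" counts as "no robots".
    return "noindex" in parser.directives or "none" in parser.directives
```

Feeding it your own page source tells you whether the tag is really in place before you ask for the removal. Remember to take the tag off again afterwards.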

issue main point

The main point of the issue -- as in "the point that affects most webmasters" -- is that pages that are linked to with a 302 redirect simply disappear from the Google result pages. In some cases the whole "target" website disappears, not to be found at all (some have reported that not even a "site:my-domain.com" search yields anything at all.)

So, it's a "competitor zapper", as in "delete target website from index". It can effectively kill other web sites. The SE traffic stops, and there's no proven, safe way to get back in.

The offensive exploits -- as in "offensive to the general public" -- redirecting visitors to spoof sites, spyware sites and such are fortunately rare.

well it was only time..

I mentioned this would happen to a Google engineer in the citie of york pub about a year ago..

DaveN

Google....

They've fully jumped the shark... It's going to be a hard road from here on in unless they do something drastic...

Problem is..

the same as anything that goes mainstream: every idiot starts doing it...

DaveN

flogging the horse

..while still alive, here's a Slashdot thread for you DaveN (if you didn't notice it) - linking to a nice problem description and recipe, even... so, it's very much out in the open now.

"every idiot starts doing it"...including WMW ???

Perhaps somebody more accomplished in these things than I could say if this is indeed WMW using 302s (it seems to be, but one can never be certain these days), and perhaps conjecture as to why if it is. My understanding of the situation is that this will lead eventually to duplicate content penalties from Google for my sites, true or false?

I came across this odd serp in Google when doing an “allinurl” check for www.mysite.com, and it was to:-

http://www.webmasterworld.com/ra.cgi?f=19&d=114&url=http://www.mysite.com

Google had certainly indexed that link, and a header check shows a 302.

I then checked and found similar links to many sites, for example:

http://www.webmasterworld.com/ra.cgi?f=19&d=114&url=http://www.about.com

http://www.webmasterworld.com/ra.cgi?f=19&d=114&url=http://www.bbc.co.uk

A typical header is

#1 Server Response: http://www.webmasterworld.com/ra.cgi?f=19&d=114&url=http://www.about.com
HTTP Status Code: HTTP/1.1 302 Found
Location: http://www.about.com
Redirect Target: http://www.about.com

Can anyone explain what is going on here? If this really is WMW, then it is certainly not a directory, and one wonders why it is using 302s in this way.
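The header check described above can be sketched in a few lines of Python. This is only an illustration that works on a saved response head like the one pasted in this comment; the function names are invented here, and the `url` query parameter is simply what the example links above happen to use.

```python
from urllib.parse import urlsplit, parse_qs


def parse_response_head(raw_head):
    """Parse a raw HTTP response head into (status_code, headers)."""
    lines = [line.strip() for line in raw_head.strip().splitlines() if line.strip()]
    # The status line looks like: HTTP/1.1 302 Found
    status_code = int(lines[0].split()[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers


def redirect_target(raw_head):
    """Return the Location of a 302 response, or None for anything else."""
    status_code, headers = parse_response_head(raw_head)
    if status_code == 302:
        return headers.get("location")
    return None


def url_parameter(redirect_script_url):
    """Extract the `url` query parameter from a redirect-script link."""
    query = urlsplit(redirect_script_url).query
    values = parse_qs(query).get("url")
    return values[0] if values else None
```

Comparing `url_parameter(...)` for the indexed link with `redirect_target(...)` for its response shows whether the script simply 302s to whatever site is named in the link. For a live check you would need to fetch the head without following the redirect, for example with a HEAD request through the standard `http.client` module.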

I love your sensationalistic headlines and stories, NickW

First you had that big story about "Google removed Greg Duffy from their index" at http://www.threadwatch.org/node/1822 which turned out not to be true. At all. Now it's claiming that some scraper sites doing millions of 302s = millions of hijackings, which isn't true either. I'll do most of my posting on Slashdot to debunk this.

An explanation

Would be nice then GG

Let's hear what Google intend to do about this problem. You've called for examples of the issue yourself, and many webmasters have experienced this.

The fact that 302 hijackings happen is not at issue - they are clear for anyone that can check a page header and a google result to see....

Let's finally have some clarification as to what google plan to do about this please....

i second the motion...

please?

GG, please...

If you have specific differences with cornwall's info, that's one thing, but just getting shirty with Nick about the fact that he put it on the front page or didn't delete it to avoid "sensationalistic" headlines is a bit of a straw man argument.

I have politely and patiently sent

umpteen requests to investigate this issue; in most of them I have shown a specific example where Google's indexing was undeniably broken for a site. Every email that dealt with the specifics of the 302 hijacking was ignored. Every email that dealt with my sites' indexing in more general terms received one version or another of your form letters. My tentative conclusion on that is that Google does not wish to respond to the 302 issue.

Are you upset because this stuff has escaped the "SEO corral" and is being discussed at /. ?

RE: GG

Are you upset because this stuff has escaped the "SEO corral" and is being discussed at /. ?

DING, point for cchance.

How 'bout it GG, a little communication instead of finger wagging, would go a long way toward 'mending some fences'.

How do I report 302 issues to Google?

Googleguy: In your slashdot article, you stated:

Quote:
But even though I suspected that this issue affected very few sites, we still wanted to collect feedback to see how big of a problem it was, and to see if we could improve our url canonicalization. So starting a while ago, we offered a way to report "302 hijacking" to Google; I mentioned the method on several webmaster forums. You contact user support and use the keyword "canonicalpage" in your report.

I'd like some clarification. Perhaps people were not including enough information in their reports to assist your engineers in the matter.

Is there a particular email address you would like these reports sent to?

And, what format does the report need to be in? Exactly what content do you need to help people?

I posted on /.

I posted my comments on Slashdot.

Comments on /.

GoogleGuy I actually can't believe your post. As far as I can see from the above Cornwall posted something and Nick asked him to explain (presumably because he thought it was all a bit sensational). Nick didn't post again and next thing you're making some very interesting comments over there.

Nick may well be a maniac self-promoting egotistical spammer who should write for the Enquirer (and your point is??) but show a bit of a sense of proportion here. Wow - stories critical of Google huh? That is SO out of order. Definitely worth a public spanking on slashdot, especially as Nick's only actual 'crime' was to allow the thread to exist.

So, er, any comments on Autolink then? Shall I start a new thread with a suitable sensationalist headline? Clearly that's where we've been going wrong....

Not exactly a Friendly Bunch over there

Quote:
This is a HUGELY serious problem - and it's getting worse all the time as more and more people deliberately try to exploit the 302 bug. I've been hit by this bug myself, and let me tell you that unless you know EXACTLY what to look for you'd be stuffed - all you'd see is your traffic flatlining.

/.

oh, a new slashdot thread...

- didn't see it until now: link

Btw, GoogleGuy is right above, it's not like each and every site that's in DMOZ is being hijacked (and it's not every dmoz mirror that uses 302s either). At least i know of a few that haven't been hit yet - i will not rule out that it has been attempted, though.

Still, i would not be surprised at all if there was a seven digit number of pages with wrong canonicalization (sp?) / wrong URL.

Not GoogleGuy

This is not the IP we normally associate with the usually friendly and courteous GoogleGuy.

I'm a little fed up with this. If Google wish to talk to TW then please use a real name or one nickname per person like everyone else.

I've been wondering at the change of style and attitude and have received many, many complaints about it - that now comes to an end.

The person who usually posts under that nickname is welcome here, as is any Google employee - but now i think we need to know who you are, no more silly games and Jekyll and Hyde personalities.

I'm talking to the Google employee who normally posts under the GG nickname now, and trying to come to some terms on this.

We'll see how that goes, then decide how to deal with it.

Thanks.

what gives GG..

I have known you for quite a while, and this just does not sound like the GG i know.. and the slashdot shit !! That looks like a forum spammer !

DaveN

Hmm

Gurtie, the headline is "Millions of Pages Google Hijacked via Open Directory feed". That's an interesting claim for TW to make, and yes, it struck me as sensationalistic. As far as I can tell, the basis for that is that cornwall found two sites that do a lot of 302s. That fact does not automatically lead to the assertion in the headline.

uh, yeah...

If that's not the GoogleGuy we know and love, somebody needs to put the gimp mask back on the new one. I was just thinking that GoogleGuy went a long way today toward ruining their rep with the SEO community. I hope the real googleguy manages to slip out of the ropes and break out of that broom closet.

Do you think it's Mark Jen :)

is this the real GG..

gg said I posted on /.

I posted my comments on Slashdot.

http://slashdot.org/comments.pl?sid=97291&cid=8316487
a google employee posting a tool that scrapes google

http://slashdot.org/comments.pl?sid=98160&cid=8385317 and another drop to the same site

I guess this must be an official product then ???

http://slashdot.org/comments.pl?sid=111072&cid=9428336 now, i could not see him saying wtf in a public forum

DaveN

it struck me as sensationalistic too

but what a very strange way to deal with it.

FYI

Whoever is posting under GoogleGuy here has had posting rights turned off. We don't play those kinds of games, period.

I'm talking to the normal GG and really hope to work something out with them about how Google can continue to communicate with TW and other places without the "multiple personality" problem.

It's a bit sad to have to do, but then unlike some other places, i value this place and its members far too highly to have anyone, even a google rep(s), cause trouble - on purpose or unintentionally.

Multiple personalities under the same nick, of a mega-corp we all have an interest in, cause more trouble than they're worth (and i might add that that trouble is reflected directly on Google more than it is here...) so until we sort this out don't expect a GG answer to questions..

Thanks all

If you RTFA

"If you RTFA, the author "

/. Googleguy is mean and uses rude language. I want the old friendly Googleguy back, he was much nicer...

who?

haha, there is no way this is the real GG, I mean, GG has annoyed me before, but the guy posting on this thread is a huge d***.

Ok what can we agree on?

1. A number of scraper sites are using 302s. Those sites I cited have over a million 302s in each. Hence it is justifiable to say millions. Further it is extremely unlikely, don't you think, that I came across the only two sites employing this technique?

2. My basis for saying it was from the DMOZ feed is that a check showed that only sites in Dmoz were on those sites. Actually it is neither here nor there whether Dmoz is the source or not, but it gives an idea of the scale of the thing.

3. I did say "obviously Google have to have indexed the page that's 302ing you, for it to be a problem."

4. So if we agree that there are a lot of 302s out there, then is there a problem with the "target" site losing ranking?

GG makes these points on Slashdot:

Quote:
...doing 302 redirects to sites in an attempt to hijack them. Note that this does not mean that lots of pages were hijacked at all.

Which I think we would all agree with. He then goes on to say:

Quote:
PageRank is a pretty good proxy for reputation, and incorporating PageRank into the decision for the canonical url helps to choose the right url.

Which is what I think we have all understood. However the logical conclusion of this presumably is that if a scraper directory with a high PageRank (easily done with a bit of money) puts a 302 on your site, then you lose PageRank.

That is what I am upset about. The fact that I have tried to bring this to Google's attention, only to be told to email the offending web site and ask them to remove the 302, sort of made it worse.
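To make the worry above concrete, here is a deliberately crude toy model in Python. It is emphatically not Google's algorithm, which nobody outside Google knows. It just assumes, for the sake of argument, that when several URLs resolve to the same content the one with the highest reputation score is kept as canonical. The function name, the example URLs and the scores are all hypothetical.

```python
def pick_canonical(candidates):
    """Toy rule: of several URLs that resolve to the same content,
    keep the one with the highest reputation score.

    `candidates` is a list of (url, score) pairs. The scores are
    hypothetical stand-ins for PageRank, not real values.
    """
    best_url, _ = max(candidates, key=lambda pair: pair[1])
    return best_url


# Hypothetical example: the target page and a scraper URL that 302s to it.
candidates = [
    ("http://www.mysite.com/", 3.0),
    ("http://scraper-directory.example/go?id=1", 5.0),
]
print(pick_canonical(candidates))
```

Under that assumption the scraper's URL wins whenever its score is higher than the target's, which is exactly the situation a well-funded scraper directory could engineer. If the real decision weighs other signals too, the outcome may well differ.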

I lost over 30 sites in the last 2 months

they had been hit by 302's

DaveN

So on Slashdot:

Either:

1) Googleguy is swearing at slashdotters or

2) Their Googleguy is not for real ?

Either way seems pretty appalling.

This is funny

There was a thread here a bit back about the trouble with 'anon' search engine reps in forums & such and the issue of identity.

Maybe the "new" GG used a 302 hijack to remove the old GG? ;)

 

Googleguy - It might be fun confusing people who are trying to make a living on the Internet but it's not right. Can we have one person per nick and agree that statements from a Googleguy identity are official Google policy?

We are all professionals. Let's get the conversation back to a civil manner where everyone is treated with respect. I think we can agree that the 302 issue is not good for Google or webmasters. Let's not complain about sensationalism or exaggerations of the facts. Let's focus on hard facts and suggestions to overcome or fix the hijacking.

not that it's important, but...

there's a kpaul on /. who isn't me...

i'm not sure how to solve the anon SE rep problem, though, unless all site owners filtered any attempts to use GoogleGuy as a user account or something.

that, or GoogleGuy 'unofficially' lets us know exactly what forums he posts on.

I'm Sorry

Hey we all make mistakes ... just got an email telling me what's what from Google..

I did say i would say sorry !

DaveN

what did you ...

make a mistake about? now i'm really confused...

dang those /. threads go on forever...

Just finished page one because i'm reading at "-1"... yeah, i know, i could set it higher, but sometimes i like details... like, say, that the /. GG nick supposedly was acquired by Google at some point from another /. member that did not have any relation to Google and did not mind too much letting go of it...

Anyway, RTFA and TFA are just normal Slashdot jargon - i wouldn't put too much into that specific issue. It may sound bad to some of you, but it's just a figure of speech over there.

Added:
All of the above does not in any way justify that we should ever have to be in doubt. A company the size of Google should be able to afford 100% official representatives and even to put up a page somewhere on a google domain stating which forums they post to, and what nicknames they use there.

RTFA

isn't the issue. A big and previously so well-presented company like Google losing all comprehension of public relations is the issue.

It's not big and it's not clever and it makes them look like dicks.

And, nice as it is if they've e-mailed you to explain Dave, it still leaves the rest of us (or at least the rest of us who aren't important to Google and therefore didn't get nice personal e-mails) thinking GG has a bad case of split personality. Are you allowed to share it? I'm guessing not.

We all are Gurtie

Quote:
(or at least the rest of us who aren't important to Google and therefore didn't get nice personal e-mails)

important to them, once /. pick it up it only takes a mainstream reporter to zero in on it and produce a nice article...public perception changes....share prices drop...someone gets a kicking.

we are important

public perception changes....share prices drop

What I really do not understand, is why Google have not grasped this.

If this side of the great divide is to be treated like something the cat's brought home, and they do not consider it worth even debating the issue, then the boil will continue to fester.

(that's a bag of mixed metaphors if ever I saw one)

This is insane

Whoever heard of a public company with a $50 billion market cap using anonymous, "plausible deniability" spokespersons to explain important policy issues to the public? Even the CIA has real public relations people with real names. Google, Inc. should issue a public apology for everything GG has ever posted on any forum, and announce that henceforth anyone who claims to be speaking for Google, Inc. anonymously will be fired just as soon as they can be identified. That way we can all safely assume that anyone making such claims is bogus.

Sorry for what, DaveN...?

Was it:

  • that you didn't lose 30 sites to 302 redirects but to some other site disease
  • that the GoogleGuy that you (and possibly we) know and love does say WTF in public places as well as liking weedy fantasy films
  • that you now regret that you harboured impure thoughts about Marissa and you wish to make it clear that your entire life up until this point has been spent in the moral dark compared to the light which is the pure and vital vision of Larry and Sergey

I think we should be told...

Sorry - because I implied

it was not GG,,

but stever, marissa is a hottie :) and GG does like the Princess Bride which I like as well..

DaveN
