Google Hijacking? Nay, It's the law of this land!


I am seeing discussion all over the net about 302 hijackings and how Google is evil.
But no one is discussing the actual cause. The actual cause is the HTTP protocol, which says explicitly: "10.3.3 302 Found

The requested resource resides temporarily under a different URI. Since the redirection might be altered on occasion, the client SHOULD continue to use the Request-URI for future requests. This response is only cacheable if indicated by a Cache-Control or Expires header field."

Emphasis not mine.

You can read it for yourself in RFC 2616.

Now, we all know the importance of a protocol: it is a shared language. This particular protocol was developed when the web was pure and unadulterated, when people expected others to follow it, not misuse it.

But with money always comes greed and dishonesty. The web was not originally built with business in mind; it was built for free information interchange. But it has since evolved to a state where the web can be harnessed (exploited, whatever) for its commercial potential.

So any search engine that follows the protocol to the letter is in effect aiding the hijacking. But is that the fault of the search engine, or of the protocol? Unlike human languages, protocols don't evolve uninhibited. If they did, very soon no browser could understand all the servers, and vice versa; you might need 10 kinds of browsers to access 10 different websites, because those 10 websites would each speak a different language. (Come to think of it, isn't this what is happening in the DRM world? You download music from one site and you can't play it on another without a hack.)
That is why there is a standard, and why it gets revised every so often to keep up with the times.

So some of the suggestions, like "throw the redirecting page into the bin and keep the target page," will have web-wide repercussions for people who use redirects with the standard in mind and for a legitimate purpose. So who uses it, and for what purpose?
Let me give an example.
Ever tried buying from Amazon.com?
Okay, how do you reach the homepage?
Well, I type amazon.com into my browser and I get the page. BUT the URL at which I get the page is actually http://www.amazon.com/exec/obidos/subst/home/home.html/103-7996157-2162261

Use this server header tool to see what happens:
http://www.webrankinfo.com/english/tools/server-header.php

1) Enter www.amazon.com. It says:

HTTP/1.1 301 Moved Permanently
Date: Thu, 24 Mar 2005 14:38:22 GMT
Server: Stronghold/2.4.2 Apache/1.3.6 C2NetEU/2412 (Unix) amarewrite/0.1 mod_fastcgi/2.2.12
Set-Cookie: skin=; domain=.amazon.com; path=/; expires=Wed, 01-Aug-01 12:00:00 GMT
Location: http://www.amazon.com:80/exec/obidos/subst/home/home.html
Connection: close
Content-Type: text/plain

So amazon.com doesn't exist (don't mistake me, the page amazon.com); what exists is http://www.amazon.com:80/exec/obidos/subst/home/home.html.

2) Now enter http://www.amazon.com:80/exec/obidos/subst/home/home.html in the box. It says:

HTTP/1.1 302
Date: Thu, 24 Mar 2005 14:40:48 GMT
Server: Stronghold/2.4.2 Apache/1.3.6 C2NetEU/2412 (Unix) amarewrite/0.1 mod_fastcgi/2.2.12
Set-Cookie: session-id-time=1112256000; path=/; domain=.amazon.com; expires=Thursday, 31-Mar-2005 08:00:00 GMT
Set-Cookie: session-id=002-8272699-5270422; path=/; domain=.amazon.com; expires=Thursday, 31-Mar-2005 08:00:00 GMT
Location: http://www.amazon.com/exec/obidos/subst/home/home.html/002-8272699-5270422
Connection: close
Content-Type: text/html

So now the home page is temporarily at http://www.amazon.com/exec/obidos/subst/home/home.html/002-8272699-5270422
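
If you would rather not use a web tool, you can fetch the headers yourself. Here is a minimal sketch in Python (my illustration, not the tool above); it deliberately does not follow the redirect, so the status line and Location header stay visible:

# Minimal sketch: fetch a URL's response headers without following
# redirects, so any 301/302 status and Location header are visible.
import http.client
from urllib.parse import urlsplit

def fetch_headers(url):
    parts = urlsplit(url if "://" in url else "http://" + url)
    conn = http.client.HTTPConnection(parts.netloc)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn.request("GET", path)
    resp = conn.getresponse()          # http.client never auto-follows redirects
    print("HTTP/1.1", resp.status, resp.reason)
    for name, value in resp.getheaders():
        print(f"{name}: {value}")
    conn.close()

fetch_headers("www.amazon.com")        # expect a 301 with a Location header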

If Google were to follow your advice, the homepage of Amazon would be http://www.amazon.com/exec/obidos/subst/home/home.html/002-8272699-5270422
But this URL is so temporary that if you try to access the homepage even seconds later, you get redirected to a new URL with some other number at the end.

To this day, searching for amazon gives the Amazon homepage as www.amazon.com/exec/obidos/subst/home/home.html and not some other URL. Personally, a thumbs up from my side.

What is the purpose of the Amazon URL dance? It's for tracking. Do you want Amazon to stop tracking on their websites because other sites are hijacked?
(I am not saying that your hijacked website is of lesser worth than the big websites, but the thing is, your website was defrauded using a perfectly legitimate method, because the method has no failsafe built into it.)
Amazon isn't the only one. Lots of sites do it. And it is even more common in the education and open source worlds, where people redirect to other domains because things are shared even more.

So, coming back to one more comment; it was something akin to:
"When I get mugged and complain to the cops, the cops tell me that I am supposed to ask the mugger to stop it!"
So in one stroke the commenter has said that Google is the cop of the net, and that it is useless as a cop. Let me just tell you: even in real life, the cops are only as useful as the laws that back them up. And in this case, the laws for handling this kind of mugging have not yet been written.

And do you really want any one company to change the protocol on its own, in its own way? You know where that leads: three different search engines reading the protocol in three different ways. And that would only be the start. You know what a mess it would be trying to work with those engines.

This is a problem that has to be nipped at the source, i.e. the protocols. Let's stop laying the blame on the first step we come across.

One possible solution is adding a meta-tag to the protocol:

Meta-tag = 'Redirect, source url'
Value = 'accept'

And for a redirect inside the same domain:

Meta-tag = 'Redirect, yourdomain.com'
Value = 'accept'

(There will be better methods, but that is for the W3 Consortium to decide.)

The protocol should specify that if the meta-tag is not there, its default value is 'not accept'. And the standards committee must ensure that the whole world wide web speaks the same language.
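
To make the idea concrete, here is a rough sketch of how a crawler could enforce such an opt-in rule. The tag name and attribute format below are made up for illustration; this is the default-deny logic I am proposing, not any real standard:

# Hypothetical sketch: a crawler honours a 302 only if the target page
# opts in with a made-up meta tag such as:
#   <meta name="redirect-accept" content="sourcedomain.com">
# Default is 'not accept': without the tag, the redirect is ignored.
from html.parser import HTMLParser
from urllib.parse import urlsplit

class RedirectAcceptParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.accepted_sources = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "redirect-accept":
            self.accepted_sources.add(attrs.get("content", "").lower())

def redirect_is_accepted(target_html, source_url):
    parser = RedirectAcceptParser()
    parser.feed(target_html)
    source_domain = urlsplit(source_url).netloc.lower()
    return source_domain in parser.accepted_sources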

AND THEN, if Google lets your page be hijacked, we can blame Google. Not now, when they are doing their work AS prescribed by the laws.

And in the interim, what's the solution?

I think Google's algorithm is pretty robust at weeding out MOST of the hijackings.
How do I know? My experience alone. The thing is, what you are all seeing is just the tip of the iceberg.
Pages still do get hijacked, but they are minuscule in number compared to the pages that don't get hijacked even though they are targeted with 302-redirected links.

I hope I have clearly explained where the problem lies. In the web hysteria surrounding this discussion, the majority are not being given the facts. They are all being given the impression that it is Google's mistake alone.

And no, I am not a paid or unpaid spokesman for Google.
I like to call a dog 'a dog', immaterial of whether he bites me or is faithful to me.

Yogi

Comments

 

Hmmm... yes, it may say the client (the crawler) should reference the URI next time, but it doesn't say anything about what it should store in its databases. It's up to the search engines to implement something sensible for their purposes (I see nothing wrong with the protocol nor its intended purpose).

The fix lies in search engines keeping historic data, which would allow them to make a far better determination of the validity of a 302 redirect (for example: how can a page be temporarily redirected if it never existed in the first place?)

>hysteria

Hey, thanks, yogi. To tell the truth, I've not done more than crack most threads on this since (A) I still rank and (B) I'm not keen on hysteria.

That said, I do have many friends in the biz who are widely known as, uummm, authorities on 302s and they say that Yahoo does a better job. If so, how?

great post.

will try to digest it all later... i don't want to lay the blame on anyone (except maybe the spammers), but if all involved parties could sit down and rationally come up with a solution, that would be cool.

thanks.

Good point

I was going to ask the same thing rc; it's my understanding that Yahoo doesn't have this problem at all - if that's true, how are they doing it, and why is Google not following suit?

Yogi, that's a wonderful post: welcome to Threadwatch, and please do introduce yourself.

What Chris said is important though: if I run a website, I don't blame protocols for showing the wrong content on that website; I move to fix any problems with my website and show what people expect, right?

 

Yogi. Top marks for an excellent post.

Thank you!

Why blame Google...

..because they make the call on using PR to sort out duplicate pages.

Googleguy (if it is he) put this in his Slashdot post yesterday:

Quote:
When you see two copies of a url or site (or you see redirects from one site to another), you have to choose a canonical url. There are lots of ways to make that choice, but it often boils down to wanting to choose the url with the most reputation. PageRank is a pretty good proxy for reputation, and incorporating PageRank into the decision for the canonical url helps to choose the right url.

If a spammer buys PageRank, then they gain from 302 duplication penalties.

PageRank is a Google call, and they appear to admit to using it to determine which of two duplicate pages to penalise.

QED??

Yahoo solves it?

Hi Nick. Thanks for your welcome.
I will surely introduce myself.

Now, it's the opinion of most that Yahoo has handled it well.

Search for amazon on Yahoo. It lists the URL as www.amazon.com. Now, that URL is dead.

What Yahoo has done (methinks) is just throw the baby out with the bathwater.
Search for a single redirecting link on Yahoo. You won't find many, because Yahoo just takes the targeted page and stores it in the database.

You might think that makes sense. But you must understand that redirection is mostly done with bad intentions on commercial sites, and mostly with good and legitimate intentions on non-commercial sites.

And Google has a much bigger percentage of non-commercial searches. It just can't afford to throw the baby away.

Technically speaking, it is not that Google is less competent than Yahoo and can't make a change to disregard the redirecting page and keep the targeted page. But as I said, in a field where servers are shared, data is shared, and knowledge is shared, doing so would make finding information related to that field harder.

And it is very difficult to distinguish a purely commercial site redirecting with foul intent from a non-commercial site just redirecting to whatever server its content is currently stored on.

Quote:
The fix lies in search engines keeping historic data, which would allow them to make a far better determination of the validity of a 302 redirect (for example: how can a page be temporarily redirected if it never existed in the first place?)

It happens all the time.

You must understand that at any given point in time there are pages that Googlebot has not visited, or does not even know about until almost six months after they were created. So how is it supposed to know whether a page is new and redirecting, or has been there for ages and is now temporarily placed somewhere else?

Say I 302-link from my site to this article, which came into existence just now. That script's URL is boredguru.com/visit.php?url=http://www.threadwatch.org/node/2032
Google does not know that threadwatch.org/node/2032 already exists, right?
It comes to my site, fetches the script, follows it here, and then indexes this page thinking it is a new page belonging to my site. And I won't be redirecting to this page alone; I will be redirecting to a thousand (if not ten thousand) pages. What's the probability that at least one page in that list gets hijacked? A high percentage. And you know what I saw from my own site's experience? Out of the hundred or so sites I had redirected to (I use an open source PHP-based CMS, and I did not even know what redirecting was when I started out), around five were hijacked, including two of my own sites. The thing is, those five sites were new and in fact not fully indexed. I even linked to winamp.com. Nothing happened to winamp.com.
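
For anyone who hasn't seen one, a link-tracking script like my visit.php is tiny. Here is a rough sketch of the idea in Python (the real script is PHP; this handler is purely illustrative):

# Minimal sketch of a link-tracking redirect endpoint, the kind of
# script (visit.php?url=...) that CMSes use to count outbound clicks.
# It answers every request with a 302 pointing at the tracked URL,
# which is exactly what a crawler following the protocol will see.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit, parse_qs

class TrackingRedirect(BaseHTTPRequestHandler):
    def do_GET(self):
        query = parse_qs(urlsplit(self.path).query)
        target = query.get("url", ["/"])[0]
        # ... a real script would log the click for stats here ...
        self.send_response(302)             # "Found": a temporary redirect
        self.send_header("Location", target)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), TrackingRedirect).serve_forever()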

And basically, Yahoo does not have as many pages in its index as Google. The resources needed to check whether a page is doing a legitimate redirect or trying to game the SERPs are not small, and they grow with the number of such pages you encounter.

And it's not as if Yahoo efficiently sorts the valid redirects from the illegitimate ones. It's more like: you redirect, you're dropped.

As I said earlier, GOOGLE CAN'T AFFORD TO DO THAT.
The amount of backlash it would get from its core community would be much more than the flak it is facing now. I am giving a personal example from just one field; there are tons of them.

My mom does not know how to move a mouse, but she constantly comes to me asking me to search for info on some medical term, to research it, and she tells me to use Google. (She is a doctor, and I usually reply, "It's Google even if you say otherwise.") Where did she hear about Google? I don't know. But you know, this is what it does best.

As Googleguy stated, it's not as if they don't do anything. They do have a pretty robust system in place, but when you are playing the numbers game, some will slip through.

Quote:
What Chris said is important though: if I run a website, I don't blame protocols for showing the wrong content on that website; I move to fix any problems with my website and show what people expect, right?

Please take the next sentences lightly. I can't convey my expression when I say this, so just know that it is not meant to be rude.

{{{Not Rude Sentences}}}You don't complain about protocols because you don't care how they work, as long as they work. But then, when you don't know how something works, it is much easier to just blame whoever handles it for you{{{/Not Rude Sentences}}} really!

See Nick, people in your position, who have the ears and respect of thousands, must care. It's people like you who built this good place. It needs people like you to keep it one.

You must surely know some people, directly or indirectly, who can reach the right ears.
With a name for the problem like "Google Jacking", the people who could really put an end to it will not bother.

The solution is not in the hands of any one company or one person.
I want even browsers (forget about search engines) to know whether the targeted page accepts the redirect or not.

As a searcher, I use all three engines (and sometimes the meta engines, which are lovely). But I use these three engines for different fields.

And by the way, why isn't it called MSNJacking? Because MSN has the same problem.
You know the answer to that already, right?

 

(Well, this now seems pretty redundant after Yogi's explanation. I have to type faster the next time.)

Pure supposition, but maybe Yahoo handles 302s better by using some of the same techniques it uses to resolve domain aliases. I'm no techie, but it seems it might be related.

I think it was Tim Mayer who explained that Y tried to resolve domain aliases to one "correct" domain. He used "coke" and "coca-cola" as an example.

On Y, whichever term you search for, the only result returned is coca-cola.com.

On G, depending on which term you search for, you get cocacola.com, coca-cola.com and coke.com -- which all resolve to the same site.

Would this be related?

302ing me

I'll read your comment fully soon, BG; in the meantime, would you mind *not* 302ing to this site?

Thanks.

sorry, can someone explain...

Quote:
Search for amazon on Yahoo. It lists the URL as www.amazon.com. Now, that URL is dead.

what's the problem with the yahoo results? It returns the right result and the link takes me where I want to be.... I'd rather send people to a default page than have sites not appearing at all if I were controlling the algo.

Hi Gurtie, here is the explanation

Quote:
what's the problem with the yahoo results? It returns the right result and the link takes me where I want to be.... I'd rather send people to a default page than have sites not appearing at all if I were controlling the algo.

If I permanently redirect my pages to new URLs, for whatever reason, I would expect the SE to pick up the new URLs and display them, not the old URLs.

For example, instead of www.mydomain.com/forum/viewpost.php?topic=33&post=78&viewmode=flat
I rewrite it to www.mydomain.com/forum/viewpost/33/78/flat

This would apply to all the forum posts. Which one would you want displayed?

Anyway, here Yahoo is keeping the redirecting link rather than the redirected link.

Of course not, Nick

I don't like stealing.

 

I thought Yahoo treated 302 redirects the correct way if the redirect is from the same domain, and used a non-standard implementation for 302s from domains different from the target domain. Can anyone confirm this? Is this why hijacking is not a problem for Yahoo?

ok - I get that now, thanks :)

However, although as a searcher I would be irritated by clicking a link and getting a 404 because the page is no longer there, that is (a) within the power of the website owner to address and (b) possible for the searcher to deal with by starting from the homepage.

It would be a lot more irritating to know that the page or whole site which was the best destination for me may not actually be reachable because someone else has hijacked it. At the moment searchers aren't getting annoyed because they don't know about the problem, so while I can see Google's POV, and it's offending fewer searchers at the moment, I think it's only sustainable in the short term.

As a website owner, I would also rather accept that I may lose the occasional visitor through 404-page-rage (which I can at least anticipate and deal with) than lose all of them through sitenapping.

Yahoo's treatment of redirects

If you go to the Yahoo Search blog, there is a presentation I gave at PubCon on how we handle redirects:
http://www.ysearchblog.com/files/wmw2004/search-engines-and-webmasters.ppt

Slides 14 & 15 give all the info.

The key is handling 302s within a domain differently than 302s between different domains.

We released this in November 2004
Tim

Yahoo's treatment of redirects

Hi Tim,
I am on a Mac. I can't open PPT files. I have downloaded it and will be watching it on my PC later.

I me mine

Quote:
If I permanently redirect my pages to new URLs, for whatever reason, I would expect the SE to pick up the new URLs and display them, not the old URLs.

The 302 problem arises when outside parties set up redirects to someone else's content.

I'm having trouble understanding how the protocols apply to that.

 

Problem solved... maybe Tim can email that file to Google :) Honestly, I think this is more an issue of webmaster friendliness and attitude than an issue that's really technically complicated. {boredguru - many SEOs = programmers} I don't think G sees it as an issue.

Google is Clueless

Yahoo's approach makes much more sense. True 302s are generally going to be on the same domain. And regardless of protocols, everyone should know that the 302 is the most commonly used redirect (it is the default on Apache). The vast majority of 302s on the web are not done to hijack; they are put in place for permanent redirection.

Both should be followed, and the destination URL should be indexed. The only difference between the two should be whether or not credit for pre-existing links is given.

If you are smart enough to follow the protocol and tell bots that the page being requested has been permanently moved, they should be smart enough to figure out that all the links on the web that point to the old URL now belong to the new one.

If you are dumb enough to use 302s for permanently moved content, then you shouldn't get credit for any old backlinks. You will have to start from scratch.

Simple solution. Problem solved.

{boredguru - many SEOs = programmers}

I did not get that. I am weak at maths.

-

Yahoo's setup is almost blindingly simple. It treats a 302 differently depending on whether it stays within the same domain or crosses between domains. For example:

a.com/page.html 302s to b.com/page.html: Yahoo indexes the target (b.com). a.com/page.html 302s to a.com/page2.html: Yahoo indexes a.com. Similar for meta refreshes. Makes hijacking much harder now, doesn't it?
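
That two-branch rule fits in a few lines. Here is a minimal sketch of the decision as I read it from the slides (an illustration, not Yahoo's actual code):

# Minimal sketch of the same-domain rule described above: which URL
# gets indexed when a crawler finds a redirect. Illustrative only.
from urllib.parse import urlsplit

def host_of(url):
    return urlsplit(url).netloc.lower().removeprefix("www.")

def url_to_index(source_url, status, target_url):
    same_domain = host_of(source_url) == host_of(target_url)
    if status == 301:
        return target_url    # permanent move: index the target
    if status == 302 and same_domain:
        return source_url    # temporary, same site: keep the source URL
    if status == 302:
        return target_url    # cross-domain 302: index the target, so an
                             # outside redirect cannot claim your page
    return source_url

print(url_to_index("http://a.com/page.html", 302, "http://b.com/page.html"))
# -> http://b.com/page.html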

If Google is really so keen on following standards, then why don't they follow web standards (HTML/CSS) for their own pages, and exclude all pages which don't validate? Obviously because if they did, their index would be much poorer; so they ignore specs to get better data. They should do the same with 302s.

(BTW the Yahoo PPT opens perfectly in OpenOffice.)

 

>If you are dumb enough to use 302s for permanently moved content, then you shouldn't get credit for any old backlinks. You will have to start from scratch. Simple solution. Problem solved.

There you go, it really [really is] that easy.

 

>>Yahoo's setup is almost blindingly simple.

Too simple. There's apparently not enough algorithmic magic and comparison between different indices in it for the PhDs at G.

Just a couple of quick yes or no decisions. Even I can understand it.

Emphasis not mine.

>> they are doing their work AS prescribed by the laws.

A nice Slashdot reader made me aware of this:

RFC 2119 (Key words for use in RFCs to Indicate Requirement Levels) defines "SHOULD" as follows:

3. SHOULD This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.

So, "laws" would be to stretch it - by far.

Here's how Yahoo fixed it btw. (PDF)

Added: Sorry, I didn't read the thread before I posted. I just saw Tim posted this already - well, the above is in PDF and readable on a Mac :-)

Yahoo's treatment of redirects

Hi Tim,
I saw the PPT. I can't say anything; it works. But can you clarify a few things?
When you say that a 301 redirect from site A/page1 to site A/page2 will actually be responded to by keeping site A/page1, do you mean the page will be kept in the index, but every time you crawl, you will request siteA/page1? Or siteA/page2?

Because in Amazon's case, I see both pages in the index, but the long page does not appear for an "amazon" search. Just curious: how do you handle redundancy?
Because if I mod_rewrite my URLs (let's say 10,000) to include keywords in them, and then 301 the old URLs to the new ones, only the old ones will be kept. No, actually, only the old ones will be shown, but both of them will be in the index, right?

Hi Claus,
I hope it's the one I already know. If not, hi to the new Claus.

Hi encyclo,
OpenOffice! I was under the impression it was for Linux. I will definitely look for it now.

Sorry for the late reply; travelling + hangover = a pretty incoherent me.

Love & Regards
Yogi

301

>> a 301 redirect from site A/page1 to site A/page2 will actually be responded to by keeping site A/page1

With deep pages, Y! treats a 301 as it should, by keeping the target page - in this case site A/page2 (not page1).

The only thing in the Y! approach I personally disagree with is that it treats 301s as if they were 302s in one particular case (a redirect from the root of the domain to a deeper page on the same domain).

As for the word "keeps", I personally interpret that as "the URL that is kept in the index" (i.e. the other one is not displayed) - I think they probably crawl both.

(Yes, I'm the claus you know - at least I think so :-)

302 hell

I would like to preface this message by saying I am not an SEO aficionado or expert; just a photographer/victim trying to find a way out of this 302 hell without using black hat measures.

I get the impression from all the forums, and from my SEO consultant, that 302 redirects aimed at bleeding my site are part of the Google environment; a wild card, if you will, and if anyone chooses to target me, they can do so without repercussions. Further, there is no way of finding out which of my competitors is bleeding my site: http://www.carreonphotography.com

Is there any way out of this predicament without going black hat, once someone has done this to me?

 

sanpanza, lately it's been obvious that Google did not fix this mess.

The only thing they did was make it impossible for webmasters who were hit to identify the sites that hijacked them. Of course, if you can't see who's done you in, you can't really do much.

Google's only advice so far is to submit a reinclusion request.

 

Thanks for your response, claus. I did a search in Google for allinurl:www.carreonphotography.com, which came up with nothing. Then I did a search in Google for +www.carreonphotography.+com to find backlinks, and many of those that showed up seemed to have 302 redirects.

Would these be examples of malicious 302 redirects? And what is a reinclusion request?

 

> Of course, if you can't see who's done you in, you can't really do much.

Yes, there are other ways, but it's painful and time consuming I admit :)

 

OK, let's keep things as general as possible, thanks guys; this isn't the best place to discuss individual situations - for that kind of thing, hire an SEO :)

reinclusion request and hijacking

The method I have seen mentioned is to write an email to webmaster-[at]-google.com with the subject "reinclusion request", detailing everything you can think of regarding your situation.

That method may be the old one, however, as I have also heard that they now prefer people to use the contact form here instead of writing an email.

---

To recognize a 302 hijack, check this: in the result page, is the headline equal to the headline of your page, does the text excerpt (snippet) resemble one from your page, and is the cache equal to your page, but - the green URL below the listing is not on your domain?

If all of this is true, you have most likely been hit by a 302 hijack. To make sure, enter the green URL from the SE result into a server header checker and look for the status code "302" - if you see it, along with a "Location" field containing your URL, then that's a hijack.
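
If you want to script that last check, here is a small sketch (illustrative only, not an official tool) that flags a suspect URL as a likely hijack of your domain:

# Minimal sketch: given a suspect URL from the SERP and your own domain,
# flag a likely 302 hijack - a 302 response whose Location points at you.
import http.client
from urllib.parse import urlsplit

def looks_like_302_hijack(suspect_url, your_domain):
    parts = urlsplit(suspect_url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn = http.client.HTTPConnection(parts.netloc)
    conn.request("GET", path)
    resp = conn.getresponse()
    location = resp.getheader("Location", "")
    conn.close()
    return resp.status == 302 and your_domain in urlsplit(location).netloc

# e.g. looks_like_302_hijack("http://example.com/visit.php?url=...",
#                            "carreonphotography.com")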

You will find more on 302s and hijacking in this thread over at WebmasterWorld: Solutions for 302 Redirects and META Refreshes in Google.

reinclusion

Thank you, Claus, for your helpful information.

Google Cares

Yeah, right.

As has been pointed out in this thread and others, Google just does not want to fix the problem.

The Googlies know how to fix it (see the Yahoo example), and the Googlies have the ability to fix it (all those PhDs must be good for something).

So the FACT that Google has not fixed this can only mean one thing: the Googlies don't want to fix it.

The Googlies believe that this is one of those things that doesn't need to be fixed because it is not broken. They are wrong... again.
