Google & Static IP Addresses

25 comments
Source Title:
Why Using A Static IP Address is Beneficial... Google Engineer Explains
Story Text:

Ben of Rank Smart, who along with Barry is providing some superb coverage of SES, lets us know about a conversation he had with a Google engineer about IP addresses.

It's very interesting, gives some insight into Google's spidering architecture and makes damn fine sense from both a business and an SEO point of view, but it does have some subtle inaccuracies I want to cover to save confusion.

The article talks about how and why a dedicated IP address for a website is beneficial to SEO compared to what Barry calls a dynamic IP address.

Sorry to pull you up on this Barry, but I think you mean a dedicated IP address per hostname compared to a shared IP address per hostname. A dynamic IP address, one that changes regularly, relates more to the way an individual connects to their ISP than to how you host a website.

It is possible to host a website/hostname on a dynamic IP address though, using either your own DNS with a low TTL or commercial services (I am pretty sure there are some free ones as well) such as www.dynip.com, no-ip.com etc., but this is unusual and unlikely to be what is meant here, if I understand the context correctly.

I hope this doesn't become too boring, but I'll try to explain what HTTP/1.0 and HTTP/1.1 mean in relation to IP addresses from a technical standpoint, and then hopefully the article will make a lot more sense.

Many moons ago every hostname (think domain name for now) directly mapped to a single IP address. As more and more services, people and websites started appearing on the new commercial internet it became apparent that the limited resource of IP addresses could run out, and ways of preserving them were devised.

IPv6 became the longer-term answer, but in the short to medium term the HTTP/1.1 specification added something that basically said: lots of websites can share a single IP address, and the only change needed is that the browser sends the specific hostname it wants (a Host header) along with the request, rather than relying on the IP address alone, and the server understands this new style of request. This HTTP/1.1 "name-based hosting" became the norm (sensibly, from an IP addressing point of view) for hosting websites, and you will now see most websites sharing their IP with many, many, MANY other websites.
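To make the difference concrete, here is a minimal sketch (mine, not from the article) of what the two styles of request look like on the wire, written in Python; the IP address and hostname are placeholders, so as written it will not reach a real server:

-------------------------------------
import socket

def fetch_status(ip, raw_request):
    """Send a raw HTTP request to port 80 at the given IP and return the status line."""
    sock = socket.create_connection((ip, 80), timeout=10)
    sock.sendall(raw_request.encode("ascii"))
    status_line = sock.recv(256).split(b"\r\n", 1)[0]
    sock.close()
    return status_line.decode("ascii", "replace")

ip = "203.0.113.10"  # placeholder address; imagine it hosts hundreds of sites

# HTTP/1.0: no Host header, so the server can only hand back whatever the
# "default" site for this IP address happens to be.
print(fetch_status(ip, "GET / HTTP/1.0\r\n\r\n"))

# HTTP/1.1: the Host header names the site we actually want, which is what
# lets many hostnames share one IP address (name-based hosting).
print(fetch_status(
    ip, "GET / HTTP/1.1\r\nHost: www.example.com\r\nConnection: close\r\n\r\n"))
-------------------------------------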

Fast forward to now :)

Now, IP addresses themselves don't directly cost ISPs a penny. They apply for them to RIPE (or ARIN etc., depending on the region), the organisations that administer who holds which IP addresses, and as long as you fulfil their criteria you get given a new batch. The downside is that the paperwork and hassle factor DOES have a cost for an ISP, and many will simply not allow you to have more IP addresses for that reason.

On top of this, RIPE's / ARIN's etc. policy runs along the lines that HTTP/1.1 name-based hosting is the desired and sensible way to use IP addresses, so requesting IP addresses solely for websites is not looked upon favourably when name-based hosting is the "correct route". I do not believe SEO reasons will persuade RIPE (and therefore your ISP) to give you an allocation.

What will help, though, is if you decide that every one of your websites should run in a secure SSL (https://) manner as well as the insecure (http://) way. The reason is that every SSL certificate needs its own dedicated IP address, because HTTP/1.1 name-based hosting will not work with it. Don't worry about the costs, as they are minimal in the larger picture (from about £20 per annum at the moment), and you can reduce that to nothing if you self-sign a certificate (too techy for here, but manageable for most system admins).
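For the system admins Jason mentions, here is a minimal sketch of self-signing a certificate. It is mine rather than his, it uses Python's third-party "cryptography" package (the same job is commonly done with the openssl command line instead), and the hostname and output file names are placeholders:

-------------------------------------
from datetime import datetime, timedelta

from cryptography import x509
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID

# Generate a private key and a certificate that signs itself (issuer == subject).
key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, u"www.example.com")])
cert = (
    x509.CertificateBuilder()
    .subject_name(name)
    .issuer_name(name)
    .public_key(key.public_key())
    .serial_number(x509.random_serial_number())
    .not_valid_before(datetime.utcnow())
    .not_valid_after(datetime.utcnow() + timedelta(days=365))
    .sign(key, hashes.SHA256())
)

# Write the key and certificate out in PEM format for the web server to use.
with open("selfsigned.key", "wb") as f:
    f.write(key.private_bytes(
        serialization.Encoding.PEM,
        serialization.PrivateFormat.TraditionalOpenSSL,
        serialization.NoEncryption()))
with open("selfsigned.crt", "wb") as f:
    f.write(cert.public_bytes(serialization.Encoding.PEM))
-------------------------------------

The obvious trade-off is that browsers will warn visitors about a certificate no recognised authority has vouched for, which matters little for the IP-allocation argument above but does matter on anything customer-facing.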

Now, with all that aside, back to Barry's piece and the way that G's spiders work. It confirms what many of us have thought for years but, as far as I can recall, has never been publicly confirmed.

The spiders find the links to your site and keep trying to get the content using a gradually increasing, technologically aware spidering system.

They start at HTTP/1.0 with basic parsing ability and gradually climb the list of HTTP and HTML specifications until they understand the content inside. I think it is fair to assume (although the article doesn't say this) that eventually the MozGoogleBot thingymajig will come along and parse all the content on the page, including JS (though probably not VBScript), Flash etc. etc.

Interesting, and I for one wanna thank Barry for this info that seems to have missed the mainstream (here at TW at least), and also Nick for chucking in the delicious thing on the right, and whoever tagged the story :)

Comments

nice catch :-)

... apart from the HTTP / HTML confusion, there's a "sandbox gem" in that last line.

Claus, there is also an

Claus, there is also an undetectable hijack and/or googlebowl in that same sentence :D

Phoenix = Ben Pfeiffer from

Phoenix = Ben Pfeiffer from RankSmart. Often people don't see the name at the bottom, I guess. But he had the scoop, not me. But thanks for noting. It was a great catch by Ben.

Fixed - off to scrutinize

Fixed - off to scrutinize that last line!

I had put this on the delicious list hoping someone would pick up on it, and had read it, but I confess to having not reached the end :(

Thanks and sorry Ben for not

Thanks and sorry Ben for not giving you the credit. Great piece Ben, thanks for the time and energy in posting it :)

You think that explains

You think that explains sandbox Jason?

It might make sense - some get sandboxed, some don't. Some are on shared IPs, some aren't...

You think that explains

Quote:
You think that explains sandbox Jason?

No :)

So.....

what was your point?

It was Claus's point not

It was Claus's point not mine, and I can't take the credit cos I never saw it till I saw his post, went back and read it and it struck me, hard in the face, like your wife would if she caught you with your trousers around your ankles with the local bike!!

But enough of that romantic analogy (corr I am starting to sound like Eugene!)

Imagine what would happen if you changed the default site on an already known and existing IP address to one that is actually new?

Following on from that, imagine what would happen if that new default site had some redirection happening?

No more for today. Sue has just drawn her arm back and it looks like there is a frying pan in her hand!!!!!!!!!!!!!

Dunno what to say

It was just a thought, really. Hit me like, well, light summer rain on a hot day :-)

Actually I'm a bit sceptical about the whole "sandbox" thing, always have been. But this is a thing that will certainly cause delays in ranking new sites. If the links don't count, the ranking doesn't come - simple issue, really.

I still think that some of the horror stories that have been reported under the headline "sandbox" are really something completely different. Those stories could be all kinds of things, but this particular issue could well have some real substance to it*

---
* If interpreted and reported correctly. Could be the engineer in question was just pulling a joke, although the post doesn't read like that.

Many IP addresses are being recycled these days

So I wouldn't be so quick to discount the 3-month delay as a possible explanation for some of the sandboxing. I am usually careful, when asking for new IP addresses, to check them out in advance before accepting them. Many have been blacklisted because of past abuses by email spammers. Clearly, there is a chance that some or all of those burned IP addresses have been crawled by Google in the past.
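For what it's worth, one common way to "check them out in advance" is to look each address up on a public DNS blacklist such as Spamhaus before accepting it. A minimal sketch in Python (mine, not anything from the thread; the blacklist zone and test address are the standard public ones):

-------------------------------------
import socket

def is_blacklisted(ip, dnsbl="zen.spamhaus.org"):
    """Return True if the IP address is listed on the given DNS blacklist."""
    # DNSBLs are queried by reversing the octets and appending the list's zone,
    # e.g. 1.2.3.4 becomes 4.3.2.1.zen.spamhaus.org
    query = ".".join(reversed(ip.split("."))) + "." + dnsbl
    try:
        socket.gethostbyname(query)   # any answer at all means "listed"
        return True
    except socket.gaierror:           # NXDOMAIN means "not listed"
        return False

print(is_blacklisted("127.0.0.2"))    # test entry that most DNSBLs always list
-------------------------------------

(Some blacklists refuse queries arriving via the big public resolvers, so results can vary with the DNS server you use.)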

I still think that some of

Quote:
I still think that some of the horror stories that have been reported under the headline "sandbox" are really something completely different

Yep, a lot of what are "reported" as sandboxings could well be this shared IP thing, I think.

Thanks for the explanation Jason, I'm with you now - I'm just not devious enough to have spotted that, hehe!

Never believed the sandbox theory

I've tried to recreate the 'sandbox effect' numerous times, all without success.

We always use dedicated IP addresses by default.

This could very well explain why we could never recreate or see the 'sandbox'.

side note: Today my time at TW has been well spent. Learned some new stuff... :-)

not all shared IP's

Hate to put a fly in the ointment, but a load of our clients have shared IPs (i.e. if they don't ask for a dedicated one or need SSL they get shared) and I still don't see a sandbox on them.

But they also mostly don't have huge linkbuilding campaigns etc etc etc so perhaps a combination of things...

(and I also don't think we have any 'bad' IP's)

I'm not Jason

Not that I dislike the name Jason. It's a very fine name. Nothing wrong with it, it's just not mine :-)

------------------------
Added:
Just discovered that the link I posted above was to the WMW Supporters Forum, which requires a login, sorry about that. Basically, the post says a Google engineer had recently mentioned something about a "probation category" of sorts for new sites. No more than that, no details or anything.

Explains..

A very funky behaviour I was seeing 6 months or so ago where the root domain of a major telecom was ranking for completely unrelated terms.

Totally Makes Sense

I don't understand the disagreements with this article (as few as they may be). It makes total sense that a site with a unique IP gets better positioning than a site with a shared IP. I think the general idea is that 'shared IP' = 'shared server' & 'unique IP' = 'dedicated server'.

Yes, there are a lot of good quality sites on shared servers, but the truth is that sites that can afford a dedicated server are generally (and I stress "generally") of higher quality than a site that shares a box with 100 other sites.

...except in the case of about.com ;-)

In other words, in a search for "walt disney tickets", a site with a dedicated server (like disney.go.com) would be a better search result than a site on a shared server (like discount-disney-tickets.us)

I doubt that the IP plays a huge part in the algorithm, but I would be very surprised if it doesn't play some part.

thinking about this....

in relation to the so-called sandbox.

Maybe I am missing something, but why would a site come in strong and then drop out a few weeks later?

Don't get me wrong, I am sure this is a lights-on moment and would account for some of the things we see, but there is more to it than IPs and recursive crawls, IMHO.

Cheers Jason for doing the more precise explanation :-)

Thanks Jason for the

Thanks Jason for the mention. It was interesting information to find out, and I'm glad you pointed out what I really meant. I was running to a session and had like 1.2 minutes to post this info, so I was in a hurry.

On the notion of the sandbox relating to this: as I gathered, it is a different thing entirely, as both shared and dedicated IPs can undergo some delay. However, this shared IP effect can have an impact on the overall problem, I think. I will also mention that I learned while I was there that not all sites undergo the "sandbox", so to speak; it's just those that fall into certain criteria.

tease

oh come on - and then you asked 'what criteria?' and they said......

What can they say?

What can they say?

implication for templated networks

The implication for templated site networks is that, by first coming in on HTTP/1.0, depending on your particular setup, the spider may notice (for later use) that as it moves to HTTP/1.1 all the sites on that IP transform in a similar fashion.

Sorry for the convoluted sentence construction, but that's as clear as I can say it. Think of it as the spider thinking: "these sites seem to have the same DNA, hmm..."

All modern browsers can do HTTP/1.1, so there is no reason to fall back to HTTP/1.0.

So, when running a templated network of sites, the thing to do is avoid answering HTTP/1.0 requests without a hostname header. 403 the request until they come back with a hostname header.

On IIS systems that means allowing a blank hostname header, so that the base system will not intercept the request with the default page, but examining the hostname header in the page code and returning a 403 when it is missing. My memory says that 403 is "Forbidden". This will keep all sites separate. Since templated systems imply dynamic pages, this should not be any hardship at all. Similar behaviour is possible on LAMP systems; I just don't know how to do it :)
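I don't know what the poster's actual IIS page code looks like, but as a rough illustration of the same "no hostname header -> 403" check done in page code rather than in the web server itself, here is a minimal Python WSGI sketch of my own (the port and response text are placeholders):

-------------------------------------
from wsgiref.simple_server import make_server

def application(environ, start_response):
    """Serve the page only if the request carried a Host header."""
    host = environ.get("HTTP_HOST", "").strip()
    if not host:
        # HTTP/1.0-style request with no hostname: refuse it.
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"403 Forbidden: a Host header is required.\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [("Hello from " + host + "\n").encode("ascii")]

if __name__ == "__main__":
    make_server("", 8080, application).serve_forever()
-------------------------------------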

The side effect is that it is also a good way to avoid poorly written scrapers. Yeah!

Apache

On Apache it requires two lines in the ".htaccess" file, preferably before any other rewrite conditions (put it highest in the ".htaccess" file, as the very first ruleset). They go like this:

-------------------------------------
# If no host name -> Forbidden
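# (the "!." pattern matches only when %{HTTP_HOST} is empty, i.e. the request arrived without a Host header)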
RewriteCond %{HTTP_HOST} !.
RewriteRule .* - [F]
-------------------------------------

Or, if you have a custom 403 error page that you would like to serve them, modify the last line like this:

-------------------------------------
# If no host name -> Forbidden
RewriteCond %{HTTP_HOST} !.
RewriteRule !errorpage403\.html - [F]
-------------------------------------

"errorpage403.html" being the file name of the custom error page.

(This will only work 100% as intended if you don't have other files with a name that includes "errorpage403.html").

---
Note: In some cases it might be a bad thing to refuse requests without a hostname, so think one extra time before applying this rule. If you need requests like that for some reason, don't apply it.

some posts about the sandbox from wmw

Because domains are cheap and clever spammers can create very convincing copy which looks unique. However, these types of sites are unlikely to fool a human and will not get linked to in a 'natural' and convincing way. Therefore the sandbox is there to await scrutiny of inbound links, and only when Google is satisfied that real people like the site (links) will it rank well.

Last night at the Google Dance at SES, I listened for some time to one of the Google engineers expounding on all things search at Google. He said that internally they do not refer to the probationary period as the sandbox. They've been amused by the term, and have affectionately turned to calling the sand-covered volleyball court in their quadrangle "the sandbox".
He did, however, openly acknowledge that they place new sites, regardless of their merit, or lack thereof, in a sort of probationary category. The purpose is, as MHes mentioned earlier, to allow time to determine how users react to a new site, who links to it, etc. He dismissed the notion that it's related to AdWords or any financial considerations favoring Google.
The probationary period can vary anywhere from six months to a year.
That said, there are exceptions where sites quickly stimulate an obvious quantum leap in user popularity measured by quality IBLs, etc.

I think the overall consensus was that the sandbox does not affect everyone and everything; it is just part of the algo and many sites fall foul of it. But it does not have to happen.

DougS

now we are complete

Quote:
In some cases it might be a bad thing to refuse requests without a hostname, so think one extra time before applying this rule. If you need requests like that for some reason, don't apply it.

How very true.

Thanks Claus. I was sort of counting on you to step in with the correct .htaccess equivalent.
