Big Daddy - Google's Crawling Sandbox

I was just writing a post in part to quote Graywolf's comment that he equates the new BigDaddy crawling priorities with a sandbox for crawling, when I noticed he wrote a great post on the topic.

What I am here to say is I think we are hitting the leading edge of a new force to be reckoned with, which for lack of a better term I’m calling ‘sandbox crawling’. Here’s the way I see it: if your website is missing the right ‘quality indicators’, what you’ll start to see is superficial crawling and indexing of your website.

A lot of this article has old, overlooked techniques. Things I've forgotten about. Great article!


I'm glad my sites weren't hurt by this last update, but I still wonder how well my new pages will be indexed.


One has to wonder

Why rock the boat?

Just a year or so ago the attitude was "read and work hard and ask questions and learn SEO", and nowadays it's "out your competitor" and "influence the masses". It was, and still is, a competition, and it will always hurt the lesser sites in the SERPs. Define "lesser" via SEO, not logic or reason.


Yeah, it's a nice post

I was trying to tell some folk over at SEW something similar to this, but nobody likes being told their site lacks 'signals of quality', and you just get into arguments over meaningless crap.
http://forums.searchenginewatch.com/showthread.php?t=11608&page=1&pp=20

Another one here: http://forums.searchenginewatch.com/showthread.php?p=80618#post80618


I still think we're looking at technical problems with Google. I believe Google has previously stated that its aim is to crawl and index most of the web, and organise the world's information. A sudden sea change to "we will only index a fraction of the web, and only organise a little bit of information" doesn't strike me as in keeping with that policy.

At the end of the day, there's the presumption that websites need links to be crawled - directory listings and articles, etc, are a way for newer websites at least to get some indexing attention.

If Google is going to cripple these formats, it simply cripples the ability for new and useful sites to get any attention whatsoever.

And let me underline something - the old claim that if you write great content you'll get lots of natural links to it is *BULLSHIT*.

Sure, if you create a huge information portal then some people will link to it, but for most small sites and businesses, you just *aren't* going to get natural links, unless you generate a very significant amount of interest online.

Now, maybe someone would counter that this separates the wheat from the chaff - but all we're really talking about is Grehan's "rich getting richer" again - if new and useful websites can't get any presence on search engines, it's a lot harder to get any attention.

Rebukes about using Digg and the like just don't wash - it's very difficult to become a central focus of conversation, let alone get links from it. Yes it can be done, but it's not easy to be outstanding, and certainly not for small businesses and start-ups.

Which is my point - if any search engine is more concerned with only returning "outstanding" results, then it's forgotten relevancy: "Want information on an historical Jesus and his familial relationships? Try the Da Vinci Code. It may be full of historical pseudo-shite but it's popular and everyone's talking about it, so it must be what you need."

Hell - forget popularity and relevancy - try the new Google Cockney Rhyming Slang UK results! Want internet news? Try Gina's Shoes!

If Google really are going to wipe out a significant fraction of their index as not worth crawling, and simply become a guide to the top 10,000 Alexa sites, that's not a real snapshot of the web and simply loses relevancy.

Google has a load of tools in its armoury for devaluing links, and that's all fair in love and war - but trying to kill its own index? That's truly cutting off your nose to spite your face and it only hurts Google's users.

2c and rant.


Oh God

Although I agree that Google is using this as a way of filtering out sites they deem useless, I don't think it's all content based. I still think the biggest factor is what neighborhood your site falls in based on incoming links.

Quote:
At the end of the day, there's the presumption that websites need links to be crawled - directory listings and articles, etc, are a way for newer websites at least to get some indexing attention.

It's a way to get cheap links and an advantage. Most directories hold no value to users and thus should hold no value in their link strength.

Quote:
If Google really are going to wipe out a significant fraction of their index as not worth crawling, and simply become a guide to the top 10,000 Alexa sites, that's not a real snapshot of the web and simply loses relevancy.

This isn't a technical mistake. You can believe that, just as many believed the Sandbox was a big technical mistake 2 years ago. They are simply killing the link strength of sites that really hold no value or credentials. What is so wrong with them ranking a site based on real websites with real webmasters linking to them naturally instead of ranking the guy who went out and bought the most directory listings? Be honest, when you run a search for information on lung cancer, do you want to come across the site that has been linked from medical journals, doctors' websites, and major news outlets, or the guy who went and bought 300 directory listings? Which do you think is a better option for the user?


one two three, test and see

I have a 10k page shopping cart site. It is not indexed beyond a few pages. They are the "chapter" pages (details of a category or sub-index of a category). Google doesn't seem to like the product pages or brand pages, although they get crawled regularly.

I add an inbound PR5 to a specific product page. BING. That page is indexed.
I add an inbound PR5 to another specific product. BING. Indexed.
I add an inbound PR5 to a brand page. BING. Indexed on the contents of the brand. The individual product pages (linked from that brand page, with tons of "content" on that particular product and brand) remain unindexed. That brand page starts getting traffic for the individual products...not the product pages.

What's the mystery here? Useless content publishers made this happen, and those same publishers, now crafting new "seo methods" for their copies of DMOZ and their "national real estate directories", will just create a new footprint for Google to squash in the index (taking down whoever else in the process).

If you look like a duck, you're gonna get shot at.


Quote:
They are simply killing the link strength of sites that really hold no value or credentials.

Google already have plenty of patents that deal with devaluing links - the concern is if they intentionally stopped indexing sites. That's not the Google we'd expect under any normal circumstances.

Quote:
What is so wrong with them ranking a site based on real websites with real webmasters linking to them naturally instead of ranking the guy who went out and bought the most directory listings? Be honest, when you run a search for information on lung cancer, do you want to come across the site that has been linked from medical journals, doctors' websites, and major news outlets, or the guy who went and bought 300 directory listings?

Real webmasters create real links - it's just that there are many types of links.

As for citations - I really don't care whether a webpage is linked to by medical journals, forums, or directories - as a search user I simply want pages relevant to my query.

As for ranking from directory submissions - Google seems to have already taken care of this issue for *ranking* purposes. The idea that this would kill *indexing* as well would be pretty nasty if intentional.

Newer and developing sites need to develop some kind of visibility - they don't necessarily deserve to rank for major keywords - that's not my argument - but at least indexed they can look to capture longtail while they try to develop an increased presence.