This post is on the main page of Digg, about Googlebot hitting external CSS files to look for hidden text.


I figured that could happen, as I'd seen (what appeared to be) AltaVista download a CSS file way back when.

My only concern is with respect to dropdown/flyout menus, which use visibility:hidden in the CSS and JavaScript to make the dropdown or flyout happen. If Google doesn't look that far, then that's a problem, because it *could* look like a pile of hidden links.

I hate it when search engines' problems with abuse end up meaning that you can't use elegant design solutions. Fingers crossed.

Didn't Google a while back

Didn't Google a while back release statistics on what people were putting in their CSS files?

Don't most CSS spammers just block their CSS files in robots.txt? I think Google's got a problem in that regard. What are they going to do, start ignoring robots.txt? I suspect that's their problem; I can't imagine why else they wouldn't have been doing this years ago.

Ummmm @wheel

Even if you block *indexing* of your .css using robots.txt, that doesn't mean that googlebot (or the engineer pulling gbot's strings) can't *read* the file.

PS: I have seen my robots.txt-blocked files INDEXED on many occasions too, BTW. I haven't checked lately, but some time ago gbot used to spider everything within reach just before a major update was made public...

Looking for Hidden Text

Hmmm, I can't find any specific mention of someone confirming that Googlebot was looking for hidden text. Google has been requesting CSS files for quite some time (as far back as 2002). Not regularly, but every now and then. My guess? It's another part of their detection methods. Maybe something from the regular bot raised a flag, and that triggered the CSS bot. I really don't think the CSS bot is intelligent enough to determine whether text is being hidden. So the CSS bot comes back and says okay, I found this, this and this. Match that with the regular bot data and we have a pattern; now it gets flagged for manual review, as I don't think the algo is ready to start filtering based on CSS files. ;)
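To make the "I found this, this and this" step concrete, here is a minimal Python sketch of what such a CSS pass might report. Everything in it is an assumption for illustration: the list of "hiding" declarations, the sample stylesheet, and the function name `find_hiding_rules` are made up, and nothing here reflects how Google actually does it.

```python
import re

# Declarations that *can* hide content. This list is an assumption for the
# sketch, not a known search engine rule set.
HIDING_PATTERNS = {
    "display:none": re.compile(r"display\s*:\s*none", re.I),
    "visibility:hidden": re.compile(r"visibility\s*:\s*hidden", re.I),
    "negative text-indent": re.compile(r"text-indent\s*:\s*-\d", re.I),
}

# Naive "selector { declarations }" matcher; it ignores nested blocks
# such as @media.
RULE_RE = re.compile(r"([^{}]+)\{([^{}]*)\}")


def find_hiding_rules(css):
    """Return (selector, label) pairs for rules that may hide content."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)  # strip CSS comments
    hits = []
    for selector, body in RULE_RE.findall(css):
        for label, pattern in HIDING_PATTERNS.items():
            if pattern.search(body):
                hits.append((selector.strip(), label))
    return hits


# Hypothetical stylesheet: one legitimate flyout menu, two suspicious rules.
SAMPLE_CSS = """
#nav ul ul { visibility: hidden; position: absolute; }  /* flyout menu */
.keywords { display: none; }
h1 { text-indent: -9999px; }
p { color: #333; }
"""

for selector, label in find_hiding_rules(SAMPLE_CSS):
    print(selector, "->", label)
```

Note that the flyout menu rule is reported right alongside the suspicious ones. A pass like this can only say that something is hidden, not why, which is why matching it against other data and sending it to manual review seems more plausible than filtering on it directly.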

CSS before JavaScript?

Call me crazy but if they really wanted to fight spam I'd think they would tackle JS long before CSS. Hidden text is used only by the most basic blackhats that have no idea what they're doing.

The report was updated. JS was spidered/checked as well.

I wonder what made google pick for close examination. Sure, it has tools on it, but it's not icky.


>>>"I wonder what made google pick for close examination."

A lot of the JavaScript spam out there comes from exploits found in common CMSs (content management systems).

In other words, good sites get hacked and display spam... Soooo I expect the good sites are gonna get checked right along with the not so good sites...

If Google is really checking JavaScript for redirects... all I can say is...
It's about fucking time!!!!

@wit: Google will not fetch


Google will not fetch files blocked by robots.txt, period. However they may still index that they exist and may well rank them if they have sufficient links with relevant anchor text.

However! Blocking your CSS files with robots.txt is a massive "hand-review me" flag, and hand reviewers will download your CSS, as they are not robots.
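For anyone unsure what "blocked by robots.txt" means in practice, here is a rough Python sketch of the check a compliant crawler makes before fetching a URL, using the standard library's `urllib.robotparser`. The robots.txt content, the `ExampleBot` user agent, the example.com URLs, and the helper `may_fetch` are all made up for illustration. It says nothing about what Googlebot really does, and it only covers fetching; whether a blocked URL can still show up in the index is a separate matter.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that hides the stylesheet directory from all bots.
ROBOTS_TXT = """\
User-agent: *
Disallow: /css/
"""


def may_fetch(robots_txt, user_agent, url):
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The stylesheet is off limits to a compliant bot; the page itself is not.
print(may_fetch(ROBOTS_TXT, "ExampleBot", "http://example.com/css/site.css"))
print(may_fetch(ROBOTS_TXT, "ExampleBot", "http://example.com/index.html"))
```

A human reviewer opening the same stylesheet in a browser never runs this check, which is the point made above.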


Because verifying CSS on the world's websites takes a lot, lot less computation than executing JavaScript.


>>Because verifying CSS on the world's websites takes a lot, lot less computation than executing JavaScript.

I've got stylesheets on the same simple site in the open, in IE conditional comments, called in JavaScript and imported. I defy any search engine to make coherent algorithmic rules based on that selection. Far simpler for them to spread FUD.
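To illustrate how scattered those references are, here is a small Python sketch that tries to find all four kinds in one page with the standard library's `html.parser`. The sample page, the file names, and the `StylesheetFinder` / `find_stylesheets` names are invented for the example, and the regexes are deliberately naive. Only the plain link tag comes for free from the parser; each of the other three needs its own special case, and the script case only works here because the URL sits in a literal string.

```python
import re
from html.parser import HTMLParser

# Naive patterns, good enough for the sample below only.
IMPORT_RE = re.compile(r"""@import\s+(?:url\()?\s*['"]?([^'")\s;]+)""", re.I)
LINK_RE = re.compile(r"""<link\b[^>]*\bhref\s*=\s*['"]([^'"]+)['"]""", re.I)


class StylesheetFinder(HTMLParser):
    """Collect (how, url) pairs for stylesheet references in a page."""

    def __init__(self):
        super().__init__()
        self.found = []
        self._inside = None  # "style" or "script" while inside that element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower()
        if tag == "link" and "stylesheet" in rel and attrs.get("href"):
            self.found.append(("link", attrs["href"]))
        elif tag in ("style", "script"):
            self._inside = tag

    def handle_endtag(self, tag):
        if tag == self._inside:
            self._inside = None

    def handle_comment(self, data):
        # IE conditional comments look like ordinary comments to a parser.
        if data.lstrip().lower().startswith("[if"):
            for url in LINK_RE.findall(data):
                self.found.append(("conditional comment", url))

    def handle_data(self, data):
        if self._inside == "style":
            for url in IMPORT_RE.findall(data):
                self.found.append(("@import", url))
        elif self._inside == "script":
            # Only catches link tags written out as literal strings.
            for url in LINK_RE.findall(data):
                self.found.append(("script string", url))


def find_stylesheets(html):
    finder = StylesheetFinder()
    finder.feed(html)
    finder.close()
    return finder.found


# Hypothetical page using all four techniques.
SAMPLE_HTML = """<html><head>
<link rel="stylesheet" href="open.css">
<!--[if IE]><link rel="stylesheet" href="ie.css"><![endif]-->
<style>@import url("imported.css");</style>
<script>document.write('<link rel="stylesheet" href="scripted.css">');</script>
</head><body></body></html>"""

for how, url in find_stylesheets(SAMPLE_HTML):
    print(how, "->", url)
```

If the script built the file name at run time instead of writing it out, the last case would fail without actually executing the JavaScript.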

(And no, nothing funny is going on with any of those stylesheets.)

Re: robots.txt blocking gbot

Googlebot may well obey the robots.txt file IF it decides to read it. Both Google's webmaster guidelines and my past experiences have made it clear to me that gbot does not examine the file prior to every spidering session. I know that it's just a bot, but it can be very creative with its timing nonetheless.

I know that the url-only results are pages that are indexed but (probably) not read. Also, I DO agree that (especially nowadays) Googlebot is very well-behaved. Almost civilised. I bet it slurps up our pages with its pinky in the air too.

CSS file indexed in Yahoo! siteexplorer & serps

I actually found a CSS file in Yahoo! siteexplorer today. Comes up in an inurl: search in the Yahoo! serps too. Never seen that before...

