This post, currently on the main page of Digg, is about Googlebot hitting external CSS files looking for hidden text.
I figured that could happen, as I'd seen (what appeared to be) AltaVista download a CSS file way back when.
I hate it when search engines' problems with abuse end up meaning that you can't use elegant design solutions. Fingers crossed.
...is at Googlebot Requested a CSS File
Pierre (of eKstreme.com)
Didn't Google release statistics a while back on what people were putting in their CSS files?
Don't most CSS spammers just block their CSS files in robots.txt? I think Google's got a problem in that regard. What are they going to do, start ignoring robots.txt? I suspect that's their problem; I can't imagine why else they wouldn't have been doing this years ago.
Show Google the CSS you want them to see, not what you really use.
Sheesh, now that was easy wasn't it?
Or cloak your robots.txt and the CSS: avoid Google the easy way and stop snoopers.
Even if you block *indexing* of your .css using robots.txt, that doesn't mean that googlebot (or the engineer pulling gbot's strings) can't *read* the file.
PS: I have seen my robots.txt-blocked files INDEXED on many occasions too BTW. I haven't checked lately, but some time ago, gbot used to spider everything within reach, just before a major update was made public.....
Hmmm, I can't find any specific mention of someone confirming that Googlebot was looking for hidden text. Google has been requesting CSS files for quite some time (as far back as 2002). Not regularly, but every now and then. My guess? It's another part of their detection methods. Maybe something from the regular bot raised a flag and then deployed the CSS bot. I really don't think the CSS bot is intelligent enough to determine if text is being hidden. So, the CSS bot comes back and says okay, I found this, this and this. Match that with the regular bot data, we have a pattern, now it gets flagged for manual review as I don't think the algo is ready to start filtering based on CSS files. ;)
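To make the "I found this, this and this" step concrete, here is a rough Python sketch of a scanner that only lists CSS rules containing declarations often associated with hidden text. It is a guess at the general idea, not anything Google has confirmed: the pattern list, the function name `flag_hiding_rules` and the sample stylesheet are all made up for illustration. It cannot tell spam from a legitimate image-replacement technique, which is why its output would only be useful as a flag for manual review.

```python
import re

# Hypothetical list of declarations often associated with hidden text.
# A hit is a reason to look closer, not proof of abuse.
SUSPECT_PATTERNS = {
    "display:none": re.compile(r"display\s*:\s*none", re.I),
    "visibility:hidden": re.compile(r"visibility\s*:\s*hidden", re.I),
    "large negative text-indent": re.compile(r"text-indent\s*:\s*-\d{3,}", re.I),
    "font-size:0": re.compile(r"font-size\s*:\s*0(?:px|em|pt|%)?\s*(?:;|$)", re.I),
}

# Matches simple, non-nested rules of the form "selector { declarations }".
RULE_RE = re.compile(r"([^{}]+)\{([^{}]*)\}")


def flag_hiding_rules(css):
    """Return (selector, reason) pairs for rules that could hide text."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)  # strip comments first
    hits = []
    for selector, body in RULE_RE.findall(css):
        for reason, pattern in SUSPECT_PATTERNS.items():
            if pattern.search(body):
                hits.append((selector.strip(), reason))
    return hits


# Made-up stylesheet for the demo.
sample = """
.nav li { display: inline; }
.keywords { display: none; }
h1 a { text-indent: -9999px; }
"""

for selector, reason in flag_hiding_rules(sample):
    print(selector, "->", reason)
```

Note that the second hit in the sample (`h1 a`) is a common, harmless image-replacement pattern, so a scanner like this would need page data from the regular crawl before anyone could draw a conclusion.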
Call me crazy but if they really wanted to fight spam I'd think they would tackle JS long before CSS. Hidden text is used only by the most basic blackhats that have no idea what they're doing.
The report was updated. JS was spidered/checked as well.
I wonder what made Google pick ekstreme.com for close examination. Sure, it has tools on it, but it's not icky.
>>>"I wonder what made google pick ekstreme.com for close examination."
In other words, good sites get hacked and display spam... Soooo I expect the good sites are gonna get checked right along with the not so good sites...
It's about fucking time!!!!
Google will not fetch files blocked by robots.txt, period. However they may still index that they exist and may well rank them if they have sufficient links with relevant anchor text.
However! Blocking your CSS files with robots.txt is massive "hand-review me", and hand reviewers will download your CSS as they are not robots.
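For anyone who wants to see what "will not fetch" means mechanically, here is a small Python sketch of the check a well-behaved crawler makes before requesting a URL. It uses the standard library's `urllib.robotparser`; the robots.txt contents, the example.com URLs and the helper name `may_fetch` are invented for the example, and none of this says anything about how Googlebot is actually built.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks a stylesheet directory for all bots.
ROBOTS_TXT = """\
User-agent: *
Disallow: /css/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from text, no network access


def may_fetch(url, agent="Googlebot"):
    """True if a robots.txt-compliant crawler would request this URL."""
    return parser.can_fetch(agent, url)


print(may_fetch("http://example.com/css/site.css"))  # False: blocked
print(may_fetch("http://example.com/index.html"))    # True: not blocked
```

The check only governs fetching. A blocked URL can still be known to the engine through links pointing at it, which is how URL-only listings come about.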
(And no, nothing funny is going on with any of those stylesheets.)
Googlebot may well obey the robots.txt file IF it decides to read it. Both Google's webmaster guidelines and my past experiences have made it clear to me that gbot does not examine the file prior to every spidering session. I know that it's just a bot, but it can be very creative with its timing nonetheless.
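One plausible way to picture that timing issue is a crawler that caches robots.txt and only re-reads it once the copy is old enough. The Python sketch below is purely illustrative: the class name `CachedRobots`, the one-day default and the injected `fetch_text` and `clock` callables are assumptions for the demo, not a description of Googlebot's real behaviour.

```python
import time
from urllib.robotparser import RobotFileParser


class CachedRobots:
    """Re-read robots.txt only when the cached copy is older than ttl seconds."""

    def __init__(self, fetch_text, ttl=86400, clock=time.time):
        self.fetch_text = fetch_text  # callable returning robots.txt as a string
        self.ttl = ttl                # assumed refresh interval, in seconds
        self.clock = clock            # injectable so the demo can fake time
        self.parser = None
        self.fetched_at = None

    def can_fetch(self, agent, url):
        now = self.clock()
        if self.parser is None or now - self.fetched_at >= self.ttl:
            self.parser = RobotFileParser()
            self.parser.parse(self.fetch_text().splitlines())
            self.fetched_at = now
        return self.parser.can_fetch(agent, url)


# Demo with a fake clock: the site starts out allowing everything.
site = {"robots": "User-agent: *\nDisallow:\n"}
now = [0]
cache = CachedRobots(lambda: site["robots"], ttl=100, clock=lambda: now[0])
css_url = "http://example.com/css/site.css"

print(cache.can_fetch("Googlebot", css_url))  # True: nothing is blocked yet

# The webmaster blocks /css/, but the crawler's copy is still fresh.
site["robots"] = "User-agent: *\nDisallow: /css/\n"
now[0] = 50
print(cache.can_fetch("Googlebot", css_url))  # True: stale rules still in use

now[0] = 100
print(cache.can_fetch("Googlebot", css_url))  # False: cache refreshed
```

Under this model, a file blocked today can still be fetched until the cached rules expire, which would look exactly like the bot ignoring robots.txt.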
I know that the url-only results are pages that are indexed but (probably) not read. Also, I DO agree that (especially nowadays) Googlebot is very well-behaved. Almost civilised. I bet it slurps up our pages with its pinky in the air too.
I actually found a CSS file in Yahoo! Site Explorer today. Comes up in an inurl: search in the Yahoo! SERPs too. Never seen that before.....