Google Paid Inclusion (Beta)

29 comments

Well, some doubted Google would ever do it, but Greg announced that Google's paid inclusion program is now live.

For inclusion, instead of cash, they want screen real estate on your site. Jen has images of cache copies of pages showing the AdSense mediabot adding pages to the Google index.

Comments

Talk about sensationalism

You had me surprised for a minute. This is NOT paid inclusion in any way. It could just be that the AdSense bot, while it is around anyway, might as well cache the page.

Um, actually, if you read

Um, actually, if you read what I wrote, it had nothing to do with getting new pages in the index. It was updating pages that were already IN the Google index. I went hunting through multiple sites I had, and I could not find a single case of a new page or site being added to the index from a mediabot visit, and Greg's example was of an existing page as well.

So drawing a parallel to paid inclusion is not really accurate IMO.

still a big deal, IMHO

One of the biggest benefits of paid inclusion is a more frequent crawl, so if AdSense leads to a more frequent crawl and cache, it allows for

  • quicker testing of tweaking pages to match algorithms
  • quicker ranking of pages that publish new products and service offers

So even if it is not a way into the index, but just a way to get crawled more frequently, that is still a huge issue. If that is the case, I don't think it is that inaccurate to compare this with a paid inclusion program.

Most likely, though, Google will not state how its AdSense program affects crawl / cache frequency, or state any official relationship between the two bots.

Why is this such a big deal?

Why is this such a big deal? I've noticed it before, but I just assumed everyone knew about it. All we're talking about here is the fact that the AdSense crawler is acting as an input stream for the search engine's indexing algorithm. It's no different than Sitemaps. Yeah, yeah, I know the wording that Google puts out about "this crawl is not associated with our main index crawl", but if you look at that statement closely it's not actually saying much. Google is often ambiguous in what it says.

Maybe I'm off, but I'm just thinking that Sitemaps, AdSense, Google Base, the Google URL submission page, these are all just inputs to the algorithm. If they're using the results of the AdSense crawl for the FIRST input into the system, that's just them making good use of the crawling. The question is, do they use subsequent mediabot crawls?

Ah, it's late and I'm cranky. This should be a good topic for my blog tomorrow... this makes my Google AdSense Tip #1 even more important...

It's Important

It's important because Google has been steadfast in keeping Adsense and their search results separate. They have always maintained that both were completely different areas that did not work together at all.

Crawling aside, it will bring up conspiracy theories on Adsense sites getting a boost in the SERPs and so on. Will we see a day when the Adsense quality score plays a role in the SERPs? I mean if it is a site that provides value to Google advertisers, is it not in the best interest to have it ranking higher?

Again, mostly conspiracy theories, but food for thought. Although the more frequent crawling of pages would be nice, I've always felt you need to make clear lines between your organic results and your advertising program.

crawling

I'm seeing lots of crawling weirdness, like pages that were in the index in February

h**p://thegraywolf.googlepages.com/ (not linked on purpose)

now not in the index

h**p://www.google.com/search?q=thegraywolf.googlepages.com&btnG=Search

Another example: pages that were 301 redirected to a whole new domain are still listed under the old domain, even though the cache shows them on the new domain.

I'm seeing lots of crawling

I'm seeing lots of crawling weirdness, like pages that were in the index in February

h**p://thegraywolf.googlepages.com/ (not linked on purpose)

now not in the index

h**p://www.google.com/search?q=thegraywolf.googlepages.com&btnG=Search

Another example: pages that were 301 redirected to a whole new domain are still listed under the old domain, even though the cache shows them on the new domain.

Yes, I've been seeing issues with Google indexing since February. I wonder if this is related to a change in indexing that uses the mediabot to supplement the spidering process, with those changes then being reversed?

I reported on my blog a few weeks back that I'd seen indexing from mediabot only on a new development project, but since then I've seen a lot of content completely disappear again.

crawling weirdness

Yet another reason why the inconsistencies between their actions and official policies may make being in the AdSense program a big deal.

Sure, it is just one input stream, but how would you feel as a business knowing they were giving a competing site more oars in the water because they sell ads through the network? Sure, that may just be conspiracy theory this or that, but this is a step down the slippery slope that makes me debate the real value or reliability of Google's SERPs.

The biggest business advantage Google has is trust - not algorithms. And this type of activity is selling it wholesale.

webmaster identity

It also plays on the Google trust issue. Sitemaps requires a name/email/IP be assigned to a domain. Google has WHOIS access as well, and via AdSense Google can assign a pub-id to a domain.

Known pages have more trust?

Might be something to

Might be something to query Mr. Cutts about at the organic listings session at Pubcon

I guess I'm being obstinate,

I guess I'm being obstinate, but I still don't get it. The only thing we know is that pages using AdSense can appear in the index. There's no implication that the indexing process itself is affected, only the input to the process.

I've always been under the assumption that any URL I provide to Google in any way -- whether it be through AdSense, or through AdWords (the landing pages), through Sitemaps, through Blogger, through Google Base, through Google Pages, etc. -- is fair game for Google to crawl. One of Google's stated goals is to categorize all the world's information. This is completely in line with that goal...

Easy proof possible

Some people might want to see a proof without checking logfiles against cache dates. It's quite easily doable:
http://www.google.com/search?q=user-agent+mediapartners-google+%22build+date%22

More details at http://www.thomasbindl.com/blog/
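
For anyone who does want to go the logfile route mentioned above, here is a minimal sketch of the comparison. It assumes an Apache-style combined log format and that you have read the cache date off Google's cached copy by hand; the function names `bot_visits` and `likely_cache_source` are made up for illustration, and matching on the user-agent string alone cannot rule out a spoofed bot.

```python
import re
from datetime import datetime

# Assumed format: Apache "combined" log lines, e.g.
# ip - - [10/Apr/2006:06:15:02 -0700] "GET /page.html HTTP/1.1" 200 5120 "-" "agent"
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Substrings of the two crawlers' user-agent strings, mapped to short labels.
BOT_TOKENS = {"Mediapartners-Google": "mediabot", "Googlebot": "googlebot"}


def bot_visits(lines, path):
    """Return a list of (datetime, bot_label) for Google crawler hits on `path`."""
    visits = []
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or match.group("path") != path:
            continue
        for token, label in BOT_TOKENS.items():
            if token in match.group("agent"):
                when = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
                visits.append((when, label))
    return visits


def likely_cache_source(visits, cache_time):
    """Return the label of the bot whose fetch most closely precedes `cache_time`."""
    earlier = [(when, label) for when, label in visits if when <= cache_time]
    if not earlier:
        return None
    return max(earlier)[1]
```

If the most recent fetch before the cache date came from the mediabot and the regular Googlebot had not visited for days, that is at least suggestive, though not proof, that the cached copy came from the AdSense crawl.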

Caching is not the same as indexing

Caching - storing the HTML so that you can present it to a user at a later date

Indexing - dissecting the web page and storing the word occurrences in the search engine's index

Greg Boser has presented evidence that the Adsense bot is updating cached pages*, but I haven't seen any evidence yet that the pages fetched by the Adsense bot are being indexed.

* yes, I suppose it could be the regular Googlebot disguised as the mediabot, but that's less likely
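
To make the distinction concrete, here is a toy sketch of the two operations. It is purely illustrative and says nothing about how Google actually stores anything; `page_cache`, `inverted_index`, and the three functions are hypothetical names.

```python
import re
from collections import defaultdict

page_cache = {}                    # url -> raw HTML, kept so it can be shown later
inverted_index = defaultdict(set)  # word -> set of urls whose text contains it


def cache_page(url, html):
    """Caching: store the fetched HTML as-is."""
    page_cache[url] = html


def index_page(url, html):
    """Indexing: strip the tags, split into words, record each word's occurrence."""
    text = re.sub(r"<[^>]+>", " ", html)
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        inverted_index[word].add(url)


def search(word):
    """Look a single word up in the index; the cache is never consulted."""
    return sorted(inverted_index.get(word.lower(), set()))
```

A page that has only been passed to `cache_page` can be displayed as a cached copy but will never come back from `search`. That is the gap between "the mediabot updated the cache" and "the mediabot's fetch was indexed".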

Pubcon Update

Matt confirmed the mediabot is indexing content; it's a bandwidth-saving issue. He said serving different content to the googlebot and mediabot "would be bad".

Matt ALWAYS says

serving different content in any way "would be bad" --G expects the search world to be flat.

Dan, look at the search

Dan, look at the search query I showed. There it's clearly visible that you can find content that is spidered _and_ indexed by Mediapartners-Bot.

As already pointed out Google just acknowledged it and it's called "crawl cache"

thanks

Greg Boser also showed me what's being done, and Matt Cutts has apparently confirmed it: Google Is Doing Something Really Dumb. So much for the great wall between the organic and payola side of the organization.

What they should really be doing is feeding pages from the mediabot to something that looks for robospam. But of course they won't do that... doing anything to reduce Adsense spam would seriously diminish shareholder value.

Can't wait for the annual report that lists Matt and the search quality team as a potential threat to future profits. Not gonna happen.

I still don't get what

I still don't get what you're complaining about. No one's saying that the page ranking algorithms have changed. The input to the algorithm is just coming from a different source, that's all.

Eric...perhaps historical

Eric...perhaps historical perspective would be of use...

Back in the olden days (where they had to type uphill in snow) many people manipulated search indexes by quickly copying others' code and submitting it right away. Those programs died away because they served little use except for spamming.

There also used to be some rather well known pay for inclusion programs that existed which allowed you to get crawled more frequently. Yahoo! still does it, and they have been given A LOT OF SHIT for having that program, which they say constitutes less than 1% of their index.

Getting crawled more frequently IS AN ALGORITHMIC ADVANTAGE. Why?

  • New content indexed quicker
  • Links from said pages seen (and possibly indexed) quicker
  • More frequent algorithmic testing ability

For one to think that the subset of the web using AdSense is not in some ways biased as compared to the web as a whole is short sighted IMHO. It is not a good thing for those serving direct ads to be given preferential crawling behavior.

bit more background

I reverted the title...because now with more corroborating data I don't think my claim was off on any level at all.

Greg Boser posted some good information here and Shoemoney posted that Matt confirmed this information today.

Is it a big deal? Mikkel notes that Google's AdSense policies state:

Adding the Google AdSense ad code or AdSense for search code to your site will not queue your pages for crawling by our main index bots. While our bot (starting with 'Mediapartners-Google') does crawl content pages for the purpose of targeting ads, participation in AdSense does not increase the number of pages from a site in our main index.

Google AdSense policy statement updated in 5...4...3...2....

Can't you just smell the internal emails...

If it's more frequent crawls...

... then yes, I can see why that would be an issue if the more frequent crawls are happening only on AdSense pages. But it's not clear to me if this is what's happening or not. I assume certain sites get crawled more frequently already for various reasons. So really, we need Matt or someone from the AdSense team to clarify what exactly is happening. Yes, I suppose the emails must be flying around the Googleplex now, I bet they didn't expect this kind of fuss.

Optimism vs. pessimism

What I've been assuming all along is that whenever Google determines that a page needs to be recrawled, it first looks through the Mediabot's cache and if the page was crawled recently enough just grabs that copy instead of queuing a crawl request for the Googlebot. By this interpretation the page won't get crawled by the search engine any more frequently than before, it's strictly a bandwidth-saving measure on Google's part. The search algorithm just pulls its data from a different source. Given the distributed nature of the indexing process, this is exactly how I'd do it and I think it would work quite well.
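
Here is a rough sketch of that optimistic interpretation, just to pin down what I mean. Everything in it is an assumption on my part: the class name `CrawlCache`, the `max_age_seconds` freshness window, and the idea that both bots go through a single `get` call are hypothetical, not anything Google has described.

```python
import time


class CrawlCache:
    """A fetch cache shared by several crawlers (hypothetical sketch)."""

    def __init__(self, fetch, max_age_seconds):
        self.fetch = fetch              # function that actually downloads a URL
        self.max_age = max_age_seconds  # how old a stored copy may be and still count as fresh
        self.store = {}                 # url -> (fetched_at, content)

    def get(self, url, now=None):
        """Return a fresh-enough stored copy, or fetch the page and store it."""
        now = time.time() if now is None else now
        entry = self.store.get(url)
        if entry is not None and now - entry[0] <= self.max_age:
            return entry[1]             # reuse the other bot's fetch, no bandwidth spent
        content = self.fetch(url)
        self.store[url] = (now, content)
        return content
```

Under this model the search side only asks for a page when its own schedule says a recrawl is due, so a mediabot visit saves a download but does not make the indexer look at the page any more often. The pessimistic view corresponds to every mediabot fetch triggering an index update on its own.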

The pessimistic view is that each AdSense crawl pushes new data into the search algorithm, which I guess is what everyone else is thinking. I agree that's cause for concern.

I left a note for Matt on his blog asking for clarification. Or maybe Jen will get the official scoop...

clearly BS

"bandwidth-saving"

I guess a cover story couldn't get much weaker.

G$ threw in the towel on any pretense of SERP "democracy" and did what any standard company would do: milk that cow for all it's worth!

Go Google!

Not BS

I get crawled all day every day by both of their bots and if one of them can update the other it's OK in my books.

You forget it's also double the computing horsepower for Google unless they merge the two crawlers, makes perfect sense to me.

The only thing I don't like

The only thing I don't like about this is that it screws up some of the work I did to optimize one of my sites for AdSense. I have a phpBB forum that was constantly showing ads based on the phpBB stuff on the site, so I hid the phpBB stuff from the mediabot so it only saw what was being said in a threaded discussion. Guess it's time to try that section targeting stuff they put out.

Section targeting

Section targeting only fixes how AdSense sees the page, which is a patch IMO and doesn't fix how the search engine perceives the page. If you get AdSense right without section targeting, it usually improves how Google views the page across the board.

I know where you're going but

I know where you're going, but that doesn't apply to phpBB & blogs. I don't know if it's still a problem, but it used to be that people targeted certain things that were in every phpBB or WordPress install via AdWords. Since many of my forums were nightlife related, it really caused some confusion for the visitors.
