Rumor: Google May Buy a Print to XML Company


SiliconBeat published an unconfirmed rumor that Google is looking to pay about $75 million to buy Olive Software, a company that specializes in converting PDFs, microfilm, and other files to XML while keeping the original document format intact for online browsing.

Many traditional publishing companies are too lazy or shortsighted to see the value of the web. They are also afraid of losing their ad dollars. Many web advertisers are too cheap / thrifty to do much magazine advertising. The average page of web content is likely of far lower quality than the average newspaper or magazine article.

What if Google found a way to bridge the gap? What if they had almost all published information in a large XML database? Google Base on steroids? In that scenario, will outlier non-interactive, non-community content-based websites have much value?


What if Google found a way to bridge the gap?

I think copyright issues will cause resistance to the gap being bridged.

In that scenario will outlier non interactive non community content based websites have much value?

IMO they can but the content will have to be personalized.

Google really has a problem paying for content. An offline scraper for 75 mil? Now that's innovation!

This would make a nice gift to corporations looking to purchase Google's search appliances.

I think kidmercury is correct about copyright issues; although it may be used for public domain stuff I'm willing to bet it'll mainly be used in a package deal to sell their other technologies to corporations.


would be lining up for a content box like that, forget corporate. The better to spam you with my darling.

Well the big issue is control of content ... seems Google wants more and more of that (perhaps because then they can integrate offline stuff into the general structure of the web, allow people to link to offline content, and then verify that with usage data and whatnot).

If they get enough big media to bite, I don't think this stuff will just go in the onebox area... I think it will become a very large part of the regular SERPs.

Stuff like this and perhaps sites accepted in other Google verticals may cause some small traditional media companies to get bought out and leveraged by some cash flush SEOs.

My concern isn't whether or not one could find access to this potential new publishing distribution channel, but more so how its higher content quality and vast volume will flood the SERPs and kill off the margins for many targeted but cheesy content sites.

Who will guard the guardians?

Or in this case who will guard the gatekeeper? I don't think big media will bite, it seems to me that they are already worried about Google as the gatekeeper to all their content.

Instead big media will probably come up with their own solutions in house so as to retain control.

Instead big media will probably come up with their own solutions in house so as to retain control.

I am not so sure. When you think of some things in terms of SEO value some of the stuff seems to be on the market for way less than it could bring in.

For example, The Well ~ $200,000ish

A few media companies will bite then others will follow.

google and fair play, an oxymoron

Nice that this thread happened along so that I could tag along with something that I read last night in their TOS covering services other than AdWords and AdSense.

The Google Services are made available for your personal, non-commercial use only. You may not use the Google Services to sell a product or service, or to increase traffic to your Web site for commercial reasons, such as advertising sales.

You may not take the results from a Google search and reformat and display them, or mirror the Google home page or results pages on your Web site. You may not "meta-search" Google. If you want to make commercial use of the Google Services, you must enter into an agreement with Google to do so in advance. Please contact us for more information.

Apparently, it is *not* ok to:

1. serve up *their* pages or an algo-transformed version of *their* pages along with PPC ads. just like they do to everyone else.

2. serve up a cache of *their* pages. just like they do to everyone else.

3. serve up an analysed and indexed version of *their* pages according to your own algorithm. just like they do to everyone else.

4. apparently, an advance agreement is required. not exactly how they behave themselves.

No Automated Querying

You may not send automated queries of any sort to Google's system without express permission in advance from Google. Note that "sending automated queries" includes, among other things:

* using any software which sends queries to Google to determine how a website or webpage "ranks" on Google for various queries;
* "meta-searching" Google; and
* performing "offline" searches on Google.

Please do not write to Google to request permission to "meta-search" Google for a research project, as such requests will not be granted.

Apparently, scraping just like googlebot is a no-no.

Google stores many web pages in its cache to retrieve for users as a back-up in case the page's server temporarily fails.

The word weasel comes to mind here.

The very first reaction I had on reading this was to put the same terms on my sites. And the fact that googlebot ignores it might be just too effing bad at some point in the future.

Something about the goose and the gander comes to mind here.

The first reaction is usually the correct one.

The big commercial publishers ought to catch on to this and use strictly equivalent terms on their own sites. Google would then have to argue to invalidate their own TOS in order to defend themselves in court.

Their mandated regulatory filings ought to include in the business risks section:

we could get cut off from our gravy train at any point in time

That ought to wake up the Wall Street analysts!

Sorry about the choice of colors, they were the only ones I could remember off the top of my head. Which is very pointy right now. :)

Their recourse likely isn't legal

The only recourse they have is likely to be to block your bot if you don't follow the TOS. Just like the rest of us; the only way to stop the scrapers and SE spiders is to proactively block their bots.

I don't think that 'TOS' posted on websites count for much of anything. And I doubt they're legally binding to visitors.
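To make the "proactively block their bots" idea concrete, here is a minimal sketch in Python of blocking by User-Agent. The token list, the `is_blocked` and `respond` names, and the 403/200 choice are all illustrative assumptions, not anything Google or any site in this thread actually uses. A bot can also fake its User-Agent, so treat this as a first filter at best.

```python
# Hypothetical blocklist: these User-Agent tokens are made up for illustration.
BLOCKED_AGENT_TOKENS = ("examplescraper", "badbot")


def is_blocked(user_agent, blocked_tokens=BLOCKED_AGENT_TOKENS):
    """Return True when the User-Agent string contains a blocked token."""
    agent = (user_agent or "").lower()  # tolerate a missing header
    return any(token in agent for token in blocked_tokens)


def respond(user_agent):
    """Return the HTTP status a server might send: 403 if blocked, else 200."""
    return 403 if is_blocked(user_agent) else 200
```

A request handler would call `respond()` with the incoming User-Agent header and refuse the page on a 403. Blocking by IP range is the usual next step when a bot lies about who it is.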

A TOS may not be binding, but copyright is. Google will once again push the limit, daring someone to push back.

Seen the Craigslist TOS lately?

costs assigned to everything.
