Google's War on SEO - Documented

44 comments
Source Title:
Google's look into document scoring by historical data

UPDATE: After following the comments below, and taking WG's advice and reading the whole document, a little title change is necessary. As always, the meat is in the comments, read on....

Brian sent me a note pointing to this thread at SEW by MSGraph that points to this patent filed by Google. It would appear that this document confirms what many have suspected (known..) for some time regarding Google's infamous Sandbox.

The interesting bit, apart from those individuals credited with inventing the system, is this:

Consider the example of a document with an inception date of yesterday that is referenced by 10 back links. This document may be scored higher by search engine 125 than a document with an inception date of 10 years ago that is referenced by 100 back links because the rate of link growth for the former is relatively higher than the latter. While a spiky rate of growth in the number of back links may be a factor used by search engine 125 to score documents, it may also signal an attempt to spam search engine 125. Accordingly, in this situation, search engine 125 may actually lower the score of a document(s) to reduce the effect of spamming.
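The arithmetic behind that passage can be sketched roughly as follows. This is only an illustration of the quoted example, not anything the patent spells out: the per-day rate, the `SPIKE_THRESHOLD` value and the function names are all assumptions made up for the sketch.

```python
def link_growth_rate(backlinks: int, age_days: float) -> float:
    """Average number of back links gained per day since the inception date."""
    if age_days <= 0:
        raise ValueError("age_days must be positive")
    return backlinks / age_days


# Hypothetical cutoff (links per day); the patent gives no number.
SPIKE_THRESHOLD = 5.0


def growth_signal(backlinks: int, age_days: float,
                  spike_threshold: float = SPIKE_THRESHOLD) -> str:
    """Label a document's link growth as 'suspect' (spiky) or 'normal'."""
    rate = link_growth_rate(backlinks, age_days)
    return "suspect" if rate > spike_threshold else "normal"


# The two documents from the quoted example.
new_doc_rate = link_growth_rate(10, 1)          # inception date: yesterday
old_doc_rate = link_growth_rate(100, 10 * 365)  # inception date: 10 years ago
```

Under these assumptions the new document gains links far faster than the old one, which is exactly why the same number can read either as a freshness boost or as a spam flag.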

Don't gain links too fast!

Well, you knew that, right? But this doesn't go all the way to explaining the Sandbox. There was a little workaround not so long ago, until Google nuked it, allegedly taking 1.2 million small businesses out as collateral damage in its attempt to thwart a handful of spammers. It's an interesting read nonetheless (yes, I *did* only read the interesting bit :-) and will no doubt go some way to helping all those poor MFs mired in the dreaded Sandbox.

Feel free to correct me if I got any of that wrong; I should be banned from writing techy stuff....

Comments

SEOs are not friends

Reading through it now, it's pretty obvious they are focusing on reducing the effectiveness of SEO techniques.

The most important SEO related document in the last 5 years.

Take the time to read it from beginning to end. It incorporates every single technique ever mentioned by those of us who are constantly accused of being cynical towards Google.

Quote:
[0114] According to an implementation consistent with the principles of the invention, user maintained or generated data may be used to generate (or alter) a score associated with a document. For example, search engine 125 may monitor data maintained or generated by a user, such as "bookmarks," "favorites," or other types of data that may provide some indication of documents favored by, or of interest to, the user. Search engine 125 may obtain this data either directly (e.g., via a browser assistant) or indirectly (e.g., via a browser). Search engine 125 may then analyze over time a number of bookmarks/favorites to which a document is associated to determine the importance of the document.

[0115] Search engine 125 may also analyze upward and downward trends to add or remove the document (or more specifically, a path to the document) from the bookmarks/favorites lists, the rate at which the document is added to or removed from the bookmarks/favorites lists, and/or whether the document is added to, deleted from, or accessed through the bookmarks/favorites lists. If a number of users are adding a particular document to their bookmarks/favorites lists or often accessing the document through such lists over time, this may be considered an indication that the document is relatively important. On the other hand, if a number of users are decreasingly accessing a document indicated in their bookmarks/favorites list or are increasingly deleting/replacing the path to such document from their lists, this may be taken as an indication that the document is outdated, unpopular, etc. Search engine 125 may then score the documents accordingly.

[0116] In an alternative implementation, other types of user data that may indicate an increase or decrease in user interest in a particular document over time may be used by search engine 125 to score the document. For example, the "temp" or cache files associated with users could be monitored by search engine 125 to identify whether there is an increase or decrease in a document being added over time. Similarly, cookies associated with a particular document might be monitored by search engine 125 to determine whether there is an upward or downward trend in interest in the document.
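For anyone wondering what "analyze upward and downward trends" might look like in practice, here is a minimal sketch. It assumes you already have a count of bookmarks per time period (how Google would get that is the open question in this thread), and it uses a plain least-squares slope, which is my choice for illustration and not something the patent specifies.

```python
def trend_slope(counts):
    """Least-squares slope of a series of per-period bookmark counts."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two periods to measure a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(counts) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(counts))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


def interest_trend(counts):
    """Classify interest in a document as rising, falling or flat."""
    slope = trend_slope(counts)
    if slope > 0:
        return "rising"
    if slope < 0:
        return "falling"
    return "flat"


# Hypothetical bookmark counts over four periods.
gaining = [10, 20, 30, 40]
fading = [40, 30, 20, 10]
```

A rising series would correspond to the "relatively important" case in [0115], a falling one to "outdated, unpopular, etc."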

Toolbar and desktop search data. Nice.

Suspiciously so

It incorporates every single technique ever mentioned by those of us who are constantly accused of being cynical towards Google.

Yes they are

What a broad-based and generally inaccurate comment by Graywolf. They are focusing on spammers like they always have. Unfortunately some SEOs are spammers.

 

Looks like "fingerprinting" to me, too, WG.

Bookmarks

Can someone tell me how they can determine what I may be bookmarking? Can they get that through the toolbar?

Any in favor of renaming this thread "Google's war on SEO Documented" - seems it's much larger than just Sandbox...

 

Incredihelp, I really hate ethical debates over who is or isn't a spammer, and where the line is. Once you've made a decision to take actions to manipulate a search engine's results for any reason, whether they be noble aspirations, profit-oriented, egocentric or for comedic value, in the general public's eyes, and now it seems Google's, you're a spammer.

Whether you choose to mentally reside in that river in Egypt about it or not is up to you.

And there it ends.....

OK guys, let's stay on course - you know I won't hesitate to butcher this thread if it needs it.

This thread is NOT about ethics, let's leave that for another time please.

Thanks

I think I'm gonna...

...suggest installing the G desktop to all my sites' regulars (i.e. the people who bookmark it).

Think about adding a link to G desktop above your [bookmark this] button :]

 

You would think they would have learned the lesson of the alexa troolbar (hmmm maybe I shouldn't edit that!!!)

identical anchors

Is this also confirmation of what many theorized when some SEO sites dropped from the top spots recently?

Quote:
[0119]... search engine 125 may monitor web (or link) graphs and their behavior over time and use this information for scoring, spam detection, or other purposes. Naturally developed web graphs typically involve independent decisions. Synthetically generated web graphs, which are usually indicative of an intent to spam, are based on coordinated decisions, causing the profile of growth in anchor words/bigrams/phrases to likely be relatively spiky.

[0120] One reason for such spikiness may be the addition of a large number of identical anchors from many documents. Another possibility may be the addition of deliberately different anchors from a lot of documents. Search engine 125 may monitor the anchors and factor them into scoring a document to which their associated links point. For example, search engine 125 may cap the impact of suspect anchors on the score of the associated document. Alternatively, search engine 125 may use a continuous scale for the likelihood of synthetic generation and derive a multiplicative factor to scale the score for the document.
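The two options in [0120], a cap versus a continuous multiplicative factor, can be sketched like this. Everything here is hypothetical: the function names, the "share of the most common anchor" measure, and the `1 - likelihood` factor are assumptions for illustration, since the patent gives no formulas.

```python
from collections import Counter


def top_anchor_share(anchors):
    """Fraction of inbound links that use the single most common anchor text."""
    if not anchors:
        return 0.0
    counts = Counter(a.strip().lower() for a in anchors)
    return counts.most_common(1)[0][1] / len(anchors)


def capped_contribution(anchor_score, cap):
    """Option 1: cap how much suspect anchors can add to a document's score."""
    return min(anchor_score, cap)


def scaled_score(base_score, synthetic_likelihood):
    """Option 2: scale the score by a factor derived from the likelihood
    (0.0 to 1.0) that the link graph was synthetically generated."""
    if not 0.0 <= synthetic_likelihood <= 1.0:
        raise ValueError("synthetic_likelihood must be between 0 and 1")
    return base_score * (1.0 - synthetic_likelihood)


# Hypothetical anchor texts pointing at one document.
anchors = ["cheap widgets", "Cheap Widgets", "cheap widgets", "home"]
share = top_anchor_share(anchors)
```

A high share of identical anchors is the sort of thing that could feed the likelihood estimate, though how the two would really be tied together is anyone's guess.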

 

They can get all sorts of stuff once you have that toolbar and desktop search installed. Bookmarks, cookies, whatever - they have full access.

that's a load of info

I just skimmed the initial points and I haven't even got to the examples yet but already I think it could explain the Urchin purchase, and it answers so many questions I've had it must be a dream for you pros :)

 

I think the document certainly validates what a lot of SEOs have been saying - that the Sandbox does exist. The only error I see made is that blame was usually placed on the domain being new, rather than the individual documents of a domain.

However, as Nick rightly points out, there is more to Sandboxing in general - the sandboxing effect changes across keyword use - a site can rank well for some keywords, but be completely absent for others.

Also, it's worth referencing the golden mean topic, as this is such a brilliantly illustrative concept for showing that network (i.e., backlink) growth can be given "natural" and "unnatural" parameters - and that if a document falls outside of these parameters, it will automatically "sandbox" itself. No manual penalty required - all fully automated.

The suggested use of tracking stats via toolbar, etc., is an insidious addition to the mix, though it's not really that surprising that if Google has data to use, it may try to use that data.

Of course, as with any Google patent, there's no clear indication that this particular patent application is applied - but it can serve as an illustration of key points of interest that Google have been developing.

2c.

Bookmarks

Desktop search like WG said.

Also, remember that G bought Outride, developer of a bookmarking software package, a few years ago so maybe something else is in the works as well. Or they are applying that to data collected from the desktop search.

Further reading: http://www.webmasterworld.com/forum34/289-1-10.htm Msg#4

Yes they can

Quote:
Can someone tell me how they can determine what I may be bookmarking? Can they get that through the toolbar?

Absolutely.

Now read this section:

Quote:
[0090] Additionally, or alternatively, search engine 125 may monitor time-varying characteristics relating to "advertising traffic" for a particular document. For example, search engine 125 may monitor one or a combination of the following factors: (1) the extent to and rate at which advertisements are presented or updated by a given document over time; (2) the quality of the advertisers (e.g., a document whose advertisements refer/link to documents known to search engine 125 over time to have relatively high traffic and trust, such as amazon.com, may be given relatively more weight than those documents whose advertisements refer to low traffic/untrustworthy documents, such as a pornographic site); and (3) the extent to which the advertisements generate user traffic to the documents to which they relate (e.g., their click-through rate). Search engine 125 may use these time-varying characteristics relating to advertising traffic to score the document.

Let me summarize: We will favor pages that contain AdSense, and we will penalize pages that contain ads from our competitors. Don't even think you can cloak in order to prevent us from seeing our competitor's code on your site. That won't work because we will be using our millions of toolbars to track what your visitors are clicking on.

Great Info here!

This is the BEST !

Confirmation of lots of stuff including those over 100 points of Google algo :-)

BTW, does "document" there refer to the site, page or domain?

Don't forget...

to add, "and our mates" to that.

>Amazon

Mainly Interesting Academically....

For those who've been paying attention and run or watch enough sites, this is hardly big news, though it does provide some interesting insight into certain details.

Since mid '04 I've been preaching (elsewhere) that the real issue re freezer/sandbox is avoiding the appearance of 'unnatural growth' or overt manipulation.
--Too many links or too many new pages (of certain kinds), too fast
--Too many site wide changes relating to known SEO-related elements
--Too much repetition of important kw's

For a year, the trick has not been knowing that this was happening...but what was needed to get sites around this newest version of their algo. As always, god is in the details. But hey, that's been true of SEO since SEO existed.

Their biggest problem now is how to avoid equating size with value, and to try and remember that semantic evaluation is a tool with limits, not a goal in and of itself. They are currently discounting the appearance of the actual words people search on to an absurd degree.

BTW, who wants to bet on how many will still insist that the sandbox is a capacity/storage issue? ;-)

Now even Sitepoint will have to admit there's something like a (deliberate) sandbox. Sorry to BBAB* here.

* be bursting a bubble

yes, Nick, you can :)

> Can someone tell me how they can determine what I may be bookmarking?

IE users, or users of any other browser that supports favicon.ico, request the favicon.ico file every time they make a bookmark. Try for yourself (once you get that XP machine up) - go and bookmark one of your pages and then check the log. There should be a request for favicon.ico there - even if you don't have the file on your server.
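If you'd rather not eyeball the log, a quick sketch of that check might look like the following. It assumes an Apache-style Common Log Format access log; the regex, the function name and the sample lines (with documentation-range IPs) are all made up for illustration. A favicon.ico hit is only a rough hint of a bookmark, not proof.

```python
import re
from collections import Counter

# Assumes Common Log Format: ip ident user [time] "METHOD path PROTO" status size
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)


def favicon_requests(lines):
    """Count favicon.ico requests per client IP in an iterable of log lines."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("path").split("?")[0] == "/favicon.ico":
            hits[match.group("ip")] += 1
    return hits


# Hypothetical log lines; the 404 shows the request arrives even without the file.
sample = [
    '203.0.113.5 - - [31/Mar/2005:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 5120',
    '203.0.113.5 - - [31/Mar/2005:10:00:05 +0000] "GET /favicon.ico HTTP/1.1" 404 209',
    '198.51.100.7 - - [31/Mar/2005:10:01:00 +0000] "GET /about.html HTTP/1.1" 200 2048',
]
hits = favicon_requests(sample)
```

To run it against a real log you would pass an open file object instead of `sample`, since the function only needs something it can iterate over line by line.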

Another way, as you suggest, is through the toolbar. As far as I can tell the toolbar could really gain access to anything the programmers want, as it has none of the limitations set by web standards - it's a desktop app.

Just a thought....

Thanks Mikkel, that's obvious once I hear it, but I needed to hear it :) It would have to be combined with the toolbar though, right?

What if...

Eeeeek! You know, they could have planted that doc, just to scare all the naughty SEOs heh...... It's not like they're any strangers to planting rumours and spreading paranoia....

 

The legal firm Harrity & Snyder LLP is registered thru 2010, coincidence?

 

I do think there's an opportunity for "extra padding" put in some patents, for reasons ranging from promoting a run-up in your company's stock, to deterrence, to just staking out your territory early.

That said, a whole lot of the points here are just common-sense fingerprinting, stuff we'd all said we'd do if we ran a search engine. I know of one clandestine "fingerprints" list that was done over a year ago that is encouragingly (and scarily) close to the numbered points here. (Some who participated in making that list are in shock, hhh!)

About the only thing I see missing in the G patent is multiple hyphens.

A few more details ...

> It would have to be combined with the toolbar though right?

No, the favicon.ico is requested by all browsers that support it (I am not actually sure whether browsers other than IE do, but that is still most of the users - far more than enough for analysis purposes).

The toolbar would just be another way to access such actions. In fact, I don't see anything that could hold the toolbar back from any action, even from grabbing all keystrokes and returning them to Google - not that I think they do that today, and I am pretty sure it would make a killer headline if they did :)

I stumbled over a funny fact in this document. They mention "spam" I don't know how many times, as if it was a well-defined term. It's not. In fact, I question the whole concept of "search engine spam" - I just don't think it exists. There are good engines and bad engines but not spam. I promise I will go into much more detail about why I think so, but for now the interesting part is that they don't seem to care to define exactly what they think the word "spam" means.

I find it very strange to use a word so much in a document like this without defining its meaning. I mean, they use it to argue the case but totally neglect to explain it. At least, that's how I read it. Funny!

Always makes me feel better...

Knowing there are others as paranoid as me is reassuring. :)

But in this case, the docs pass both the 'common sense' and 'smell' tests, so I for one will take them at face value and assume that much of it is already in place, or on its way to being there...as evidence of their existence is all around us...

Quote:
No, the favicon.ico is requested by all browsers that support it (I am not actually sure whether browsers other than IE do, but that is still most of the users - far more than enough for analysis purposes).

One could say that favicon.ico is better supported by other browsers than by IE. I'm pretty sure that in IE, the file is only requested when the user bookmarks the page. In Firefox, however, it's requested every time a page loads, whether you're bookmarking it or not, whether it's in your bookmarks or not.

So I don't think G could track bookmarking on non-IE browsers by that method.

Favicon

No, I just meant that for Google to use favicon data they would have to go through the toolbar - they can't get at my favicons by me just visiting Google; those favicon requests go to all kinds of other sites, not through Google :)

Ahh yes :)

Get you. I agree, it is most likely that they would use the toolbar to transmit the data back to Google in any case. So I guess, why not use it for data collection as well :)

Keep your secrets a secret.

We need some strict SEO practice rules. Something like the Magicians' Guild, where if you show how your tricks are done someone shows up on your doorstep and makes you eat your shoes. Hehe. I know every noob with a cool new technique has to go blab in the forum, but every time they do so it's just a shot in the foot for the industry.

I've already got a plan on bookmark spam :) but I'll keep that sucka to myself :)

stealth leads to wealth

stealth => wealth

re: mentioning SPAM at every opportunity

I am great, and I am good. I will prevail and improve the world. I will also perform for my stockholders. However, those spammers might stop me. If I don't perform, it will probably be because of those spammers. Spammers are a threat to my business model. It is hard to stop spammers, because even trying to stop them takes energy away from my business model. I didn't fail, I just got abused by the evil spammers.

Classic plausible deniability seeding?

Hmmm

I just did some random bookmarking with a packet sniffer running. It'll show the http://www.somesite.com/favicon.ico request, but nothing gets sent by the toolbar to any Google IP or URL. I tried it in both IE and FF.

Maybe ... maybe not

> nothing gets sent by the toolbar to any Google IP or URL. I tried it in both IE and FF.

I think it will be hard to determine when, how and if they send it and how they package the data. It could be embedded in other data feedback, as binary files, encoded or whatever.

 

Quote:
The legal firm Harrity & Snyder LLP is registered thru 2010, coincidence?

Helluva SE-friendly design on that one too :-)

I find all those areas where they're using advertisement criteria to affect ranking to be somewhat ridiculous.

Red Herring

This smacks of an attempt to obfuscate the truth. The mention of magicians by seomike got me to thinking that while they are waving the left hand around, the right is digging into my back pocket.

Otherwise, this is just too simple.

Bookmarks

Google so needs a browser. If they get enough users, their data would be able to tell them everything they need. Toolbars need to be downloaded and are unlikely to be corporate mandated; browser upgrades, on the other hand, are often rolled out by IT departments, OEMs and ISPs.

Yeah

Looks like by buying up the developers they have effectively bought Firefox, didn't think of that..

I said it before

and I'll say it again.

Google is taking a big dump on the same people that made it great to begin with. The webmaster. AutoLink, pre-fetching, sandboxing, etc. This kind of stuff can and will bite them in the arse in the long run.

However, there is good news for all of the old-time die-hard SEOs. Now that they have released what they are doing, you can work around what they have put in place and start spamming the index again!

I think it's high time that we all sit down and ask the real Google to please stand up.

I bet you no one even will.

Where Google Is Heading – the Final Road Map?

Google's filing a new search patent: if you read the fine print you'll find hard confirmation of what Google critics have surmised all along....

Here's the truth

HA!

 

lol

I'm sure there are lots of great ways to Spam this technology

You just need to think out of the box a bit :)
