Search Engine Filters and Penalties

34 comments
Thread Title:
Google, filters and penalties
Thread Description:

This question gets asked a lot, but there is some decent advice and comments in this one.

Marcia

The discussion about whether or not there is such a thing as "filters" being applied by Google has come up several times, so it seems it would be a good idea to examine the issue and get a clear picture of how filters operate.

Good idea Marcia :)

DaveN

some google penalties

Banned

“Slow Death”

-30

PR0

Guestbook / Links Pages

Site Has PageRank but will Not Pass it On

Prime Keyword Penalties ( oop )

Other Factors
Redirects, Duplicate Pages or very similar Pages

I wonder how many of these Mr DaveN has encountered personally? hehehe

All in all a top thread, one for the bookmarks to show people. Knowing the faces present here, would anyone care to add? Perhaps even suggest some known Yahoo ones?

Comments

Value to Experience

That thread certainly shows how valuable experience and the willingness to learn are. ;)

It would have been a good thread

It would have been a good thread if Dave Hawley and Hugo Guzman hadn't taken it off course!

Still there are some good bits in there.

Dave Hawley and Hugo guzman

Don't even go there; I wonder how many threads those two kill...

DaveN

Too bad

Too bad the thread got personal = SAD

Well then...

Why don't we have a go at getting it back on topic?

What are the key points here with regard to Google and Yahoo - the key filters/penalties that apply to each engine?

Anyone?

Nick

Because they will just argue that Google does not have filters!! Look how many times I tried to get it back on track.

Give him NEG rep!! Make him go red :)

Neg rep..

He got that from me on his first post [DaveH], just plain rude.

Anyway this is worth a read re: filters http://www.sethf.com/anticensorware/general/google-spam.php

Potentials?

Here are a few of my thoughts. They may not directly be ways of getting a penalty, but they come into the bigger picture of "Are these ways a search engine, now or in the future, could add or take away points from your overall SERP score?"

Some may see them as a penalty, others may not, but ultimately I feel it is a semantic argument (or even a linguistic debate), as all that matters is the SERPs and your rankings in them!

You may or may not agree with my points of view, and if not, that's fine - so please take my thoughts with a very large pinch of white crystalline sodium chloride. :)

Readability Score:

Messrs Flesch, Kincaid and their friends are wonderful chaps IMHO, and they led me to understand (via Jerseygirl) the importance of something being easy to read. Put yourself in the place of an SE techie. I believe they would think it a fair assumption that human-edited spam is less of a problem than machine-generated spam, so as well as the standard fingerprint-analysis techniques, a score reflecting readability would be sensible.

After all, a human-written page *should* (chavs excluded) read better than the majority of machine-generated pages. If you're unsure how a page reads, feel free to leech my bandwidth and server-functionality reducer by using the bookmarklet at message 9.
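To make that concrete, here's a rough Python sketch of the published Flesch Reading Ease formula. It's only a sketch: the vowel-group syllable counter is a naive stand-in for what real tools do, and the sample sentence is just an example.

```python
# Rough sketch of the Flesch Reading Ease score.
# Caveat: syllables are estimated by counting vowel groups, which is crude.
import re

def count_syllables(word: str) -> int:
    # Treat each run of consecutive vowels as one syllable (minimum 1).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

print(flesch_reading_ease("The cat sat on the mat. It was happy."))
```

Higher scores mean easier reading; the speculation here is simply that pages scoring way outside the normal human range might earn themselves a closer look.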

Personally, I want to thank Markov for his work on machine-generated SE spam :D
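For anyone who hasn't met one, here's a toy word-level Markov chain generator - the sort of thing that churns out "readable-ish" filler from a seed corpus. Purely illustrative; the corpus is nonsense and so is the output.

```python
# Toy bigram Markov chain text generator (illustrative only).
import random
from collections import defaultdict

def build_chain(text):
    words = text.split()
    chain = defaultdict(list)
    for a, b in zip(words, words[1:]):
        chain[a].append(b)   # record which words follow which
    return chain

def generate(chain, start, length=20):
    out = [start]
    for _ in range(length):
        followers = chain.get(out[-1])
        if not followers:
            break
        out.append(random.choice(followers))
    return " ".join(out)

corpus = "the quick brown fox jumps over the lazy dog and the quick dog sleeps"
print(generate(build_chain(corpus), "the"))
```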

Knoll Peaks or Mountainous Pinnacles?

I've said it lots and lots of times, but you can't forget the incestuous nature of the web and the relationships within it.

As an absolute bare minimum, look at uniqueness of content, IP address information, hostname and WHOIS information. I've said it for ages, and you can see my thoughts here and there.
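As a rough illustration of the hosting side of that (the hostnames below are placeholders, not real sites), a few lines of Python will resolve a list of domains and group them by C block to spot shared hosting:

```python
# Group a list of hostnames by their /24 ("C block") to spot shared hosting.
# Hostnames are placeholders for illustration.
import socket
from collections import defaultdict

sites = ["example.com", "example.net", "example.org"]

by_block = defaultdict(list)
for host in sites:
    try:
        ip = socket.gethostbyname(host)
    except socket.gaierror:
        continue                              # skip hosts that don't resolve
    c_block = ".".join(ip.split(".")[:3])     # first three octets
    by_block[c_block].append((host, ip))

for block, hosts in by_block.items():
    if len(hosts) > 1:
        print(f"{block}.0/24 shared by: {hosts}")
```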

I may be wrong (it won't be the first or last time) but I may be right. I know which side of the fence I sit on; do you have a point of view on this subject?

Fraud Score:

How could you take into account the likelihood of a site being fraudulent? I am sure the major search engines have put a lot of thought into this, if only to have a defence when the inevitable court case starts where someone was duped by a page they found in Google. (I know there are precedents in place already, but due diligence on the SEs' behalf must take place.)

Think along the lines of location. This worst-case scenario of a site that wants to sell "419 shirts" (BTW, I hear The Register does an excellent range of these) should help explain it.

They target consumers in Nigeria.
They host in the US.
The domain is in the .za zone.
The payment processor is in Malaysia.
The contact info shows an address in China.
The contact phone number is a UK non-geographic 0871 number.
The WHOIS info shows it registered in Australia.
The DNS service is provided by a common no charge DNS company.
The mx records show a server in Belarus handling the email.

Let me ask you, "Would YOU buy from them?"
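If you wanted to turn that gut reaction into a number, a crude sketch might simply count how many distinct countries the site's signals point at. The field names and scoring below are invented for illustration, not anything any engine is known to use:

```python
# Crude "fraud score" sketch: the more countries a site's signals span,
# the higher the suspicion. Field names and weighting are invented.
signals = {
    "target_market": "NG",      # Nigeria
    "hosting": "US",
    "domain_zone": "ZA",
    "payment_processor": "MY",
    "contact_address": "CN",
    "contact_phone": "GB",      # non-geographic 0871
    "whois_registrant": "AU",
    "mx_host": "BY",            # Belarus
}

distinct_countries = len(set(signals.values()))
fraud_score = (distinct_countries - 1) / (len(signals) - 1)   # 0.0 to 1.0
print(f"{distinct_countries} countries involved, fraud score {fraud_score:.2f}")
```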

There are at least a thousand more areas that could be thought of as potential spam cleaners / penalty-point additions, but hopefully you're now in a paranoid mindset and thinking about what these could be :)

P.S. Nick, I get no benefit from any of the links in this message and tried to send most of them to G cache so hope it's OK to include them.

P.P.S. BUPA is excellent. You even get net connectivity :D

No mate..

I mean here..

If they want to take it off course at SEW, let's get back on track here at Threadwatch.

So, we have the main Google penalties outlined at the start of this thread:

Quote:
Banned

“Slow Death”

-30

PR0

Guestbook / Links Pages

Site Has PageRank but will Not Pass it On

Prime Keyword Penalties ( oop )

Other Factors
Redirects, Duplicate Pages or very similar Pages

Now, let me see if I have this correct:

Slow death = ? Not sure on this one...

-30 = Does this mean a drop of 30 places?

PR0 = Okay, I'm clear on that - you get your PageRank zeroed for being a bad boy, like Bob Massa was for selling PageRank.

Guestbook / Links pages = Hmmm... well I know of the 'filter' but what are the exact consequences?

PageRank not passed = OK, I'm aware of that. What can trigger it?

KW penalties = Yep, this is sandbox related?

Dupe pages = filter, not a penalty; Google just shows what it thinks is the more valuable of the dupes, right?

So, tell me everything :-)

So Tell you Everything?

Thanks but No,

but hopefully I told you enough to get the grey matter working and the paranoia setting in about the myriad of other potentials we should all be looking out for :D

yeah

we posted at the same time :-)
check your PMs, Jason...

Quote:
P.S. Nick, I get no benefit from any of the links in this message and tried to send most of them to G cache so hope it's OK to include them.

Links are fine, Jason; policy runs along the lines of "if it's highly relevant, I couldn't care less who owns them". It's only when people wait around for an excuse to post their links and don't contribute in any other way that I start to get a little narky :-)

They're good links, thanks for posting them!

Shame I don't post more on

Shame I don't post more on SEW, then I could pass on more negative rep.

Having been penalised in the past, it would be nice to know what other penalties there are and how to stay clear of them. I would like to know more about them; Nick has asked pretty much what I was wondering.
Slow death = ?
-30 ?

neg rep

You can't affect someone negatively with neg rep at SEW anymore, not for a few days now. They just have pos rep or nothing - wussies...

-30 is a really old penalty a

-30 is a really old penalty, and yes, it moved you down the SERPs 30 places lol

Slow death: it's very much like a dupe-content filter, but tends to target datafeeds (Amazon, eBay feeds, stuff like that).

So a slow death is what exactly?

So, a slow death is what exactly?

Does the PR get continually reduced until it reaches 0? Would a site being given a "slow death" affect sites that link to it?

Could an Amazon Web Services site be given a "slow death"?

I am not certain that I know

I am not certain that I know exactly what slow death is (being that I have not yet adequately dabbled in the XML-feed type arts), but I think fewer and fewer pages get indexed and the whole site's rankings begin to drop.

I have experienced it quite a bit

If you do a site: search, some results (more and more over time) show URL-only listings, and traffic decreases slowly (roughly 10-20% per week). After a while all results are shown as "supplemental results" and traffic is way lower. PR gets lost at some point during the death.

Keith, normal sites using only AWS most probably get caught by slow death. Ideally, all do.

Additionally, I don't agree that dupe content is the only reason for slow death. Tests take a bit longer than usual, unfortunately. But there's still enough out there to spam, so who cares? ;)

AWS

Some AWS stuff I put up (maybe as long as) two years ago and cloaked very badly has certainly seen a drop in traffic since I actively stopped doing any work at all on it.

The interesting/nice thing is that despite Google knowing that they cloak and despite the fact that they are in poor repair, they are still making money :)

BTW, welcome to Threadwatch, Keith! Do please introduce yourself.

Slow Death

PR0 homepage - then watch the other pages go to URL-only in the index.

OK So What About Yahoo?

Would anyone care to give an idea of what (if any) penalties one can find at Yahoo?

I've seen a lot of what appear to be C-Block bans but have no idea what triggered them...

Would be awesome if someone e

It would be awesome if someone explained the scheduling and/or the page vs. site-wide implications.

I'm sure DaveN and the rest have this in the back of their minds, but I have yet to see a document/thread that goes something like:

(I don't know.. just speculating)
Penalty: Dupe Content
Triggers: Merchant datafeed
Schedule: Filter runs in batch mode every 3-6 months
Scope: Page
Remedies: Content shaker or independent offsite links or whatever

SEOBook, if you're reading this... it would be a great addition to your book.

Slow death example

Ditto on the slow death penalty, except I didn't see any drop in toolbar PR at the time. More and more pages were being listed as URL-only in the SERPs, with loss of rankings for any pages affected. I believe on this occasion it was due to inadvertent dupe content: two URLs returning 200s for the same site. Up until the slow death, the majority of the site was spidered under URL1, with a few duplicate pages under URL2. URL1 was the subject of the penalty.

The solution was to leave URL2, the unaffected domain, as the primary domain, and set up URL1 as a pointer. Within a fortnight the site was back up to speed, with no need to abandon any domains, and no loss of backlinks.
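For anyone cleaning up a similar mess, a quick sanity check is to confirm the penalised hostname now answers with a single permanent redirect rather than a second 200. A minimal sketch using the Python requests library (the URL is a placeholder):

```python
# Check that the old hostname 301s to the primary instead of serving dupes.
import requests

resp = requests.get("http://url1.example.com/", allow_redirects=False, timeout=10)
print(resp.status_code)              # expect 301 (permanent redirect)
print(resp.headers.get("Location"))  # expect the primary domain's URL
```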

Yahoo penalties

Too many affiliate links is one I've heard bandied about a lot.

Slow Death

I've seen Slow Death triggered on Google by classic over-optimization. Methods like:

title = only the keyword phrase
meta name description = keyword phrase is dominant
meta name keywords = keyword phrase is dominant
page header = only the keyword phrase
IBLs = only the keyword phrase used in anchor text
page name = only the keyword phrase, delimited or not
high keyword density on keyword phrase

That kind of machine-generated content.
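A crude sketch of spotting that pattern programmatically might look like the below. The thresholds, field names, and sample page are all invented for illustration - nothing here is claimed to be what Google actually runs:

```python
# Flag the "only the keyword phrase everywhere" over-optimization pattern.
# Thresholds and the sample page are invented for illustration.
import re

def keyword_density(text, phrase):
    words = re.findall(r"\w+", text.lower())
    hits = len(re.findall(re.escape(phrase.lower()), text.lower()))
    return (hits * len(phrase.split())) / max(1, len(words))

page = {
    "title": "blue widgets",
    "h1": "blue widgets",
    "body": "blue widgets blue widgets buy blue widgets cheap blue widgets",
    "anchors_in": ["blue widgets", "blue widgets", "blue widgets"],
}
phrase = "blue widgets"

flags = []
if page["title"].strip().lower() == phrase:
    flags.append("title is only the keyword phrase")
if page["h1"].strip().lower() == phrase:
    flags.append("page header is only the keyword phrase")
if page["anchors_in"] and all(a.lower() == phrase for a in page["anchors_in"]):
    flags.append("all inbound anchor text is the keyword phrase")
if keyword_density(page["body"], phrase) > 0.4:
    flags.append("very high keyword density")

print(flags)
```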

Hmmm

I did see more than one person claiming exclusion from Y! with some really stuffed alt text, spacer GIFs, etc...

At the time I told him it was unlikely - but maybe it's unlikely with Google yet likely with Yahoo?

Nick, or anyone, how sophisti

Nick, or anyone: how sophisticated do you really think Google's duplicate-content detection ability is? At one time, a year or two ago, Nick, I remember you telling me you didn't think they would be able to easily detect duplicates - it would just be too hard to compare every page to every other page.

Or what percent do you think needs to be duplicated for them to be able to detect or penalize it? And what about the difference between duplicate content within a site/domain compared to across domains?

A lot of people swear they've been penalized for duplicate content from datafeeds or whatever - but how can they really be sure that is what is causing it?

Duplicate Content

Is anyone out there a math whiz? If you are, please let us know your estimate of what it would take for Google to compare its current 8,058,044,651 web pages one with another to detect duplicate content.

Seems absolutely impossible to me.

That said, if I were publishing duplicate content, I would never place it on the same domain, nor would I cross-link domains with duplicate content.

I wonder if they would first

I wonder if they would first rank pages normally, then take the top whatever and check for duplicates within that group? Then, based on PR, age, or something else, decide which one will rank well (or at all).
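For what it's worth, near-duplicate detection doesn't have to mean comparing every page against every other page. One well-known trick is shingling: hash overlapping word windows and group documents that share fingerprints, so you only ever compare documents that already collide. A toy sketch with made-up documents:

```python
# Toy shingling sketch: hash 5-word windows and count shared fingerprints
# per document pair, instead of comparing full texts pairwise.
from collections import defaultdict

def shingles(text, k=5):
    words = text.lower().split()
    return {hash(" ".join(words[i:i + k])) for i in range(max(1, len(words) - k + 1))}

docs = {
    "page_a": "the quick brown fox jumps over the lazy dog near the river bank",
    "page_b": "the quick brown fox jumps over the lazy dog near the old mill",
    "page_c": "completely unrelated text about datafeeds and affiliate widgets",
}

index = defaultdict(set)               # fingerprint -> doc ids containing it
for doc_id, text in docs.items():
    for fp in shingles(text):
        index[fp].add(doc_id)

overlap = defaultdict(int)             # (doc, doc) -> shared fingerprints
for doc_ids in index.values():
    ids = sorted(doc_ids)
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            overlap[(ids[i], ids[j])] += 1

print(dict(overlap))   # page_a and page_b share many shingles; page_c shares none
```

Whether Google does anything like this I obviously can't say, but it does suggest the "too hard to compare every page" objection isn't fatal.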

My thoughts exactly kirk

Yep,

And I *think* that's the gist of what we discussed a year or so ago, isn't it Trisha? I don't think my opinion has changed much, but there are far more qualified people in here than me, so it's a good question - let's hope for some more input from the likes of kirk.

Here's my thinking:

  • Google can't police/filter dupe content net-wide - period.
  • They can, however (and maybe do), do it 'network-wide' - meaning that in some cases a trigger is set (aff. links?) that causes them to take a look at who you're linking to and check for dupes. That theory is out there, but not too far-fetched, I think.
  • They do look at it site-wide, and they do look at it in terms of direct links (as opposed to network-wide, though I don't think they look much further than the page being linked to) - you can see this in effect with the recent 302 hijackings of pages that certain directories with high(ish) PR have caused.
  • Beware the rubber stamp - when datafeeding, munge that data! I certainly don't have proof, but there's enough smoke out there to suggest a fire with datafeeds and Google...

The 'network wide' thing coul

The 'network wide' thing could be a problem unless a merchant uses different information on their site than what is in the feed, if I understand you right. I don't know if hiding the affiliate links in some way would give any protection or not.

I really don't even know how to munge the data, except for doing a search and replace now and then, but I don't think that would be enough. Unless some sort of script is used, but other than adding gibberish or just changing around what is already there, I don't see what that would do. I'm a bit naive with this kind of stuff though.

Here ya go..

Some suggestions in this thread on duplicate content, Trisha.

'Munge that grabbed data into

'Munge that grabbed data into your rubber stamped spamazon stuff'

Are you saying 1) add that data onto the same page as the AWS data, or 2) don't just add it to the page, but mix it up together?

Wouldn't 2) cause it not to read too sensibly for users? Or is that not a problem?

Depends...

I was talking about 2. Munge it into the datafeed.

I would split the datafeed by paragraphs and munge "see also" type stuff in between the paras - the kind of "see the top 10 results for" etc. stuff. I'd also want to do a search-and-replace on certain terms, replacing them with modified terms that link to key pages in the site, for example.

In terms of readability, it depends on how you want to do it. If you are truly trying to provide a great experience for the user, then readability is a must, right? If you are going to cloak the site and send all incoming traffic straight to the merchant's product pages, then who cares?

You just gotta be creative :)
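To make the "munge it" idea concrete, here's a rough sketch: split the feed description into paragraphs, drop "see also" blurbs in between, and link a key term to an internal page. The feed text, blurbs, and URLs are all invented for illustration, not a description of any real feed format:

```python
# Rough datafeed "munging" sketch: interleave see-also blurbs between
# paragraphs and link a key term to an internal page. All values invented.
import re

feed_description = (
    "This widget is the best widget on the market.\n\n"
    "It comes in three colours and ships worldwide."
)
see_also = ["See also: our top 10 widgets", "See also: the widget buying guide"]
internal_links = {"widget": '<a href="/widgets/">widget</a>'}

paragraphs = feed_description.split("\n\n")
munged = []
for i, para in enumerate(paragraphs):
    for term, link in internal_links.items():
        # Replace the first occurrence of the term with an internal link.
        para = re.sub(rf"\b{re.escape(term)}\b", link, para, count=1)
    munged.append(para)
    if i < len(see_also):
        munged.append(see_also[i])

print("\n\n".join(munged))
```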

Thanks! I understand now.

Thanks! I understand now.

I've added paragraphs to the end of datafeed data, occasionally put stuff in the middle manually - never did anything completely automated, or on a large scale though.
