Again, Google & Directories?

Source Title:
Does Google Ban or Filter Web Directories?
Story Text:

A thread over at WMW, started 2 days ago and already spanning 17 pages, asks the question "Does Google Ban or Filter Web Directories?"

There seem to be quite a few people experiencing the same issue; has Google finally started to penalise those DMOZ rip-offs? Judging by the number of replies, a large proportion of 'webmasters' are still slapping together DMOZ clones. They have to be penalised sooner or later, surely?

Comments

There's also been talk of

There's also been talk of Google purging scraper sites - anyone noticed any real life examples of that?

If G really nullified the value of dmoz clones...

...my "prime" website would suffer tremendously :o

AIH, it's still doing fine.

I don’t normally pay much

I don't normally pay much attention to most of these threads; there's never enough detail and never any agreement. But when someone who monitors as many results as DaveN thinks something might be up, it's worth taking a look.

Quote:
I have been collecting quite a bit of data just recently and noticed that some WH sites got canned the only thing I can find is a large % of their IBL's are from directories.. is this the end of the road for smaller directories

SEW

anyone noticed any real life

anyone noticed any real life examples of that?

My sites have been hit pretty hard, i.e. removed.

In typical Google fashion I think they have been running some tests over the last few months on 3 or 4 different algos and then released them all at once on the 28th to reduce the chance of reverse engineering.

I can say I've had sites killed which fall within the three most referenced reasons for being removed on the 28th,
1) Large directory
2) Dupe content
3) Large increase in pages

I've mentioned (3) here before and I'm sticking to it as the main culprit.
The only problem is that it may take me months to prove this theory right or wrong.

dupe content

as in, empty cats do you no favours

Yes, but the dilemma is:

- should you have the empty cats making it easy for people to fill them?
- if not, will you leave them out and get loads of misplaced stuff?
- should you scrape something to "pre-fill" the cats (and isn't that a bit risky too)?

Hmmm, that was actually a trilemma... (?)

Nah... Just have general

Nah...

Just have general cats, then split them up when they get too big, easy.

If I were Google......

I'd be looking for sites without a focus and reducing their potential to rank for specific yet obscure keywords across multiple categories.

>Yes, but the dilemma

>Yes, but the dilemma is:

some people have the sub categories in place ready for submissions, but do not link to them until people submit to those categories or they add a few sites.

What I've Seen...

I started a handful of directories as test sites, each using one of the most popular directory scripts.

I haven't gotten too forensic as to what has settled out after the Google update, but at first blush, the ones that had "information pages" (hell, some look like pre-sell pages) for each listing still stand.

I was convinced prior to this that a format that was more mature than a simple link would stand the test of time AND be more useful to the traffic it garnered.

So, I guess if you're GoogleGuy reading this, you're feeling pretty good about the changes. If you're Get-Rich-Quick Directory Pimp, you're looking for a new cash cow strategy.

YMMV,
Brian

i do that

I have list items that link to subcategories "ONLY" if there are listings in that category. If there are none, it is just text. People submitting entries can see all the cats though.

> do not link to them Or

> do not link to them

Or robots 'noindex' empty or near-empty categories

So what do we think the

So what do we think the problem here was then?

Follow the links...

for that Bluefind search. The top 3 results I saw were for empty cats....

*cough* dupe content *cough*

How different do you think the code / text looked to a robot?

Ahh.. well, there ya go

Ahh..

well, there ya go then...

*cough* dupe content *cough*

do you seriously think so, TallTroll?

Sure, the first few pages are like that, but there are still plenty of other pages.

Besides this site has not been banned or else the 'site:' command would not work.

Sites hit by the scraper filter show zero results for 'site:'

>> Sites hit by the scraper filter show zero results for 'site:'

I mean within the site, as ukgimp pointed out

Quote:
as in, empty cats do you no favours

Lots of pages with very few differences showing there, and I bet there are a TON more not showing any more with few differences too...

Also, check out a lot of those "URL as title" entries. A lot of them are from subcats of the Shopping main cat. Wanna bet that has been penalised too, hence no deep spider?

but surely the same logic

but surely the same logic would apply for sites like Blogger with 1000s of opened accounts with no content added?

Nick, what is the story behind bluefind.com?
Did it lose rankings, pages other things.....

I'm not sure how many pages

I'm not sure how many pages it used to have, but it was *thousands* rather than 500 or so

we talked about it ages ago, and I seem to remember it having over 10k pages indexed, I'm sure...

>> but surely the same logic

>> but surely the same logic would apply for sites like Blogger with 1000s of opened accounts with no content added?

Not really. Some blogger pages actually have backlinks.... even some from other domains! A quick check for Bluefind shows links to the homepage only, pretty much

A quick check for Bluefind shows links to the homepage only

great point, that seems like a more plausible rationale for Google to apply (to me anyway).

the test could be something like:
if the ratio of deep links to pages on a site is lower than 1/2000, then kick their spammy butt!

A low ratio would indicate that the deep content was of little value while a high ratio would indicate quality content that was worth receiving links.
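
Just to make the idea concrete, here's a rough sketch of that kind of test in Python (totally hypothetical - the function name and the 1/2000 cutoff are just the numbers thrown around above, nobody outside Google knows what they actually check):

def looks_spammy(total_pages, deep_inbound_links, threshold=1.0 / 2000):
    # Hypothetical check: flag a site whose deep inbound links are
    # tiny relative to the number of pages it carries.
    if total_pages == 0:
        return False
    deep_link_ratio = deep_inbound_links / total_pages
    return deep_link_ratio < threshold

# Made-up example: 19,200 pages but only 5 deep inbound links
print(looks_spammy(19200, 5))   # True -> "kick their spammy butt"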

bluefind last week was 19,200

I monitor a large handful of directories on a daily/weekly basis and Bluefind had 19,200 pages listed for most of this year until last Thursday/Friday, when it dropped to 534 pages

thanks for the info Mick

further to my last post
in Yahoo
link:http://www.bluefind.com 291,000
linkdomain:www.bluefind.com 294,000

so with 99% of the inbound links pointing at the root we might be learning something here

compare this with

link:http://www.dmoz.org 1,660,000
linkdomain:www.dmoz.org 5,110,000
where only 32% of the inbounds are directed at the root

The sites I have seen go under have a deep-to-root link ratio of under 2% too

anyone else?
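
If anyone wants to run the same numbers, the arithmetic is just the root-link count (link:) over the whole-domain count (linkdomain:); here's a quick sketch in Python using the figures quoted above:

def root_link_share(root_links, domain_links):
    # Share of all inbound links that point at the root URL
    return root_links / domain_links

print(root_link_share(291000, 294000))       # ~0.99 for bluefind.com
print(root_link_share(1660000, 5110000))     # ~0.32 for dmoz.org
print(1 - root_link_share(291000, 294000))   # ~0.01, i.e. about 1% deep links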

fwiw

The cloning issue aside, I think directories are being negatively impacted by the algos moving to better semantic analysis. Directory scripts (and the resulting themed pyramids in my case) have served me well over the last 7 years, but I've already substantially migrated to pages that are more autonomous from the rigid structure and weak/empty cats that plague directories.

Directories?

I don't think it's directories as such that was the target, only some directories fitted the description of whatever it was they were after.

Isn't it true...

...that if a directory were somehow (but beyond doubt) linked to someone who's known for doing evil* things, the directory in question may be labelled "evil" as well????

All hypothetical stuff of course.

* in Google's view: say - sell links for PR

I have one directory that

I have one directory that has 99% of the 'known' links pointing toward the root; Google just indexed another 3500 pages last week, bringing the total pages indexed to well over 100,000. So I don't think this problem has to do with linking to the root.

I believe all this has to do with the google duplicate filter and some of the directory scripts out there. Some of those cheap directory scripts just love to create duplicate pages.

I don't think it's

I don't think it's directories as such that was the target, only some directories fitted the description of whatever it was they were after.

That's kind of my feeling also. At just about the same time directories started to disappear there was another thread on WW with a comment from GG that G had implemented something to go after scraper sites.

Both threads coming on the heels of one another seems to be a bit too much of a coincidence. I'm leaning toward the theory that G's scraper profile is a bit wide and that the algo's net also caught some directories.

My directories are doing fine - they did not ban directories

It's difficult to read between the lines in all those messages at WMW where people are weeping, gnashing their teeth, and sprinkling ashes on their foreheads, but I have a sneaking suspicion that what linked them to the DMOZ clones would have been their use of Javascript ad servers -- like Google AdSense.

All the Spamad sites that used to dominate the first pages of search results in my daily research have magically vanished. Suddenly, Google is serving good content again.

The people whose hand-edited directories have also vanished don't say (in the pages I scanned) whether they carry Google Ads, but at least one person mentioned complaining to AdSense about the disruption in listing.

If everyone who lost a site has been following the same format as the SpamAd sites -- forcing the user to scroll past intrusive Google Ads -- maybe Google simply delisted those kinds of sites based on placement of the ads.

I still see content with ads being served through Javascript, but they are nowhere near as intrusive or dominant as the SpamAd sites have been. You see real content first.

Just a hypothesis. I don't know how to prove or disprove it without everyone who has been delisted honestly fessing up to shoving Google Ads down their visitors' throats (or showing that they haven't been doing so). It's a pity so many of the SEO forums still forbid people to post URLs. It makes it really hard to see what the truth of these various complaints is.

>>>...there was another

>>>...there was another thread on WW with comment by GG that G had implemented something to go after scraper sites.

And what is the most logical way to go after scraper pages?

My 2 cents says it's changes to the duplicate filter.

Well, GoogleGuy has sort of helped clarify things

See this discussion at WMW

Quote:
...from listening to feedback that the search engineers heard at the last WebmasterWorld pubconference, I have a strong hunch that we're going to be taking a closer look at sites that are just scraper sites, or throwing up a copy of the ODP with no value added. So I wouldn't be surprised to see (for example) sites that are just scraping Google (or possibly other sites) not doing as well over time.

Hope I did the link right.

He doesn't say how they are doing it. I don't favor the duplicate content filter hypothesis at all. Google has been handling duplicate content for a long time, now. I have looked at a couple of sites mentioned in very obscure forum complaints since July 25 (when this update began rolling out) and neither site is actually delisted as its owner believes. Google brings up both for various test searches I performed.

Neither site carried Google Ads, though.

I think they are looking at the ad placements and prominence, for lack of anything more likely to point my finger at. They have had the ability to crawl Javascript for quite some time now. I wonder, if that is what they are doing, what other Javascript thingees they will go after?

I am sure I will get corrected

if I am wrong...

I believe that the only way to detect scraper pages (duplicate/ripped-off content) across a reverse index is by using a duplicate check/filter. There is no other way, at least that I can think of.

Dupe filters seem more than

Dupe filters seem more than likely. It's clear to me, admittedly only from reading the anecdotes of others, that directories were NOT targeted specifically.

Dupe content was, and directories fell afoul of that in many cases. Scraper sites could also be trimmed in a similar fashion, and I wouldn't be too surprised to find out that the deep link ratio was also a factor.

We all seem to be focussing on one thing, be it ads, dupes, or links - we should be looking at combinations, though I think ads is unlikely...

totally agree Nick

I was talking with TallTroll today and we threw out the idea that maybe scraper sites have been providing many directories with the only deep links and varied anchor text they have been getting.
So when the scrapers were hit over the weekend, the directories took a correlated dive too, as they now fell short of an older filter.

Duplicate content is still being served in the results, hence

An adjustment to Google's duplicate filters is the least likely explanation. They have found a way to target three kinds of sites (based on what I am no longer seeing in the search results).

1) DMOZ clones (everyone seems to agree on this)

2) Scraper sites (everyone seems to agree on this)

3) RSS-feed driven sites (haven't seen any independent references to these, but I have been tracking several over the past few months and they are gone now)

All these sites had one feature in common: they were serving ads as their primary content (first thing the visitors saw). Not all were using Google AdSense.

I would have to look at several of the delisted sites, but so far none of the URLs I have investigated (and there are darned few of them being linked to in the forums) are actually delisted. They come up just fine if you search for the contents in their TITLE tags.

If anyone has a list of URLs to share, please do so.

Debra at Jill Whalen's High Rankings forum has commented on a similar pattern in what she saw (I am not claiming confirmation).

Quote:
...Of the handful of Directories I'm watching fall out of Google's index, all but one have been hard at work adding footer/site wide links instead of content. Others fill pages with Adwords and offer only a handful of "real" sites and some never build out cats. Search engine bots take note of changes/updates/additions and most importantly, inbound links. Directory owners need to cultivate a link and search marketing plan just like any other site.

It is too soon to draw firm conclusions. But, based on the discussions I've been reading in many fora over the past few days, I just don't see any indication of a duplicate content filter adjustment.

>>>...based on the

>>>...based on the discussions I've been reading in many fora over the past few days...

Come on Michael, not a very sound foundation to base your conclusions on. ;-)

Regarding "based on the..."

Quote:
>>>...based on the discussions I've been reading in many fora over the past few days...

Come on Michael, not a very sound foundation to base your conclusions on. ;-)

What would you have me look at? All I can do is scour the forums for threads where people complain that their sites have dropped out of Google and look at what evidence (if any) they present.

Most of the conclusions people share are based on considerably less, from what I see.

dupes

- now, where's Marcia? She pointed me to a paper some time ago about how to identify duplicate content when the dupe content is not full pages (ie. copies), but instead smaller chunks, like paragraphs and phrases. I'm not saying this is it, only that it's perfectly possible to identify duplicate content, even if that content is just a small part of a page.

However, special linking patterns should be something that the Google engineers are experts in. I personally like the theory about homepage-only links, as it makes good sense to me. That is: The reasoning that if you link deep the site's worth it. Don't know if it makes sense to Google, and I can see that for some types of sites (news sites, mainly) it's misleading as most people will link to the front/news page, not to individual articles.
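
For what it's worth, the usual technique for catching that kind of chunk-level duplication is shingling: hash overlapping runs of words from each page and compare the sets. A minimal sketch of the general idea (not a claim about what Google actually runs):

def shingles(text, k=5):
    # Hashes of every overlapping k-word chunk ("shingle") in the text
    words = text.lower().split()
    return {hash(" ".join(words[i:i + k])) for i in range(len(words) - k + 1)}

def resemblance(a, b, k=5):
    # Jaccard similarity of the two shingle sets: near 1.0 means near-duplicate
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

Two pages sharing lots of 5-word chunks score close to 1.0 even when neither is a full copy of the other, which is exactly the paragraph/phrase case.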

Duplicates

Quote:
now, where's Marcia? She pointed me to a paper some time ago about how to identify duplicate content when the dupe content is not full pages (ie. copies), but instead smaller chunks, like paragraphs and phrases.

Google was doing that prior to the July 25 update. What they have done now is considerably different.

Quote:
However, special linking patterns should be something that the Google engineers are experts in. I personally like the theory about homepage-only links, as it makes good sense to me. That is: The reasoning that if you link deep the site's worth it. Don't know if it makes sense to Google, and I can see that for some types of sites (news sites, mainly) it's misleading as most people will link to the front/news page, not to individual articles.

No, the RSS-feed driven SpamAd sites that are no longer showing were definitely deep-linking to inside content. So were some of the DMOZ clones, as well as the scraper sites.

now, where's Marcia? She

now, where's Marcia? She pointed me to a paper some time ago about how to identify duplicate content when the dupe content is not full pages (ie. copies), but instead smaller chunks, like paragraphs and phrases.

If Marcia pops up and can point us in the right direction (and I'm not at all doubting that she can), I'd be more likely to consider the dupe content theory. But from what I've read on some of the threads around the boards, quite a few hand-rolled directories also got the ax.

Of course it's possible that some of these hand-rolled directories were scraped and then subject to a dupe content penalty, but until something pops up about having the computing power necessary to find a few lines of dupe content among xxxx number of sites -- and then penalizing the entire site -- well, I think I still have to go with an algorithmic profile of some sort. Maybe - just maybe - some dupe content is a part (yeah, I'm backsliding a bit due to overwhelming opinion), but I'm still going to think that some other factors weigh more in the "scraper" profile.

"Overwhelming opinion"...

Should anyone wonder why I keep reminding people to disbelieve 95% of what they read in the SEO community discussions?

It's highly doubtful Marcia's theory has any relevance to the current situation (particularly given that she was discussing it prior to the rollout of this update).

I'm still looking for evidence of a duplicate content penalty or filter and have found none.

In fact, I'm still looking for the URL of a site that has been knocked out of the SERPs by Google (other than one of the SpamAd sites). So far, all the pages I have looked at still come up for their title tags (although one recent site owner indicated the page should be coming up for sub-phrases of the title tag -- the page was laden with many outdated meta tags).

whilst not evidence

Placement of ads seems unlikely to me. The AdSense team themselves provide a nice little guide to ad placement:

Adsense Placement Tips

I go with the most plausible to me which is empty cats (dupe) and sites losing deep links due to scraper site removal.

The SpamAd sites were not following those guidelines

They all characteristically force the user to scroll past 2-3 screenfuls of ads. Based on the total lack of evidence regarding any changes to the way Google is handling duplicate content, it is hardly "plausible".

Right now, the only hypothesis which fits the available facts is the one I have proposed, simply because the ads are a common factor among all the missing sites.

If Google had really gone after duplicate content, then there should be other examples of duplicate content sites missing. So far, no one has mentioned any (that I can confirm by looking at URLs) and I have probably scoured the 15 or so most popular SEO forums so far (by no means have I visited them all).

Again, if anyone will share URLs, we could probably get a much clearer picture of what is going on very quickly. As it is, there is simply no evidence whatsoever that this is all due to a duplicate content filter. Blind opinions don't constitute evidence.

....not following those guidelines

>>Neither site carried Google Ads, though.

So a sample of two out of two did not have ads? :) I know you like to deal with facts and statistics, so this does not help the ads theory.

Tackling some types of scraper sites could be quite easy; they have a definite footprint. Now I have many sites that show backlinks that are from scraper sites. Now if those scrapers are/were taken out I am likely to lose some of what I had.

Those same scraper sites will have footprints similar to a lot of directories:

Similar page sizes due to similar (if not exact!) number of links out. Similar snippet sizes (as they are robbed from google/yahoo/the site) and made using a program.

As well, some directory software only allows entries of a certain size, which may put them into the path of the collateral damage that occurred during the proposed cull of scraper sites.
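
Purely as speculation about what such a footprint check might look like, here's a sketch that flags a site whose pages are suspiciously uniform in size and outbound-link count (the 5% cutoff is invented for illustration):

from statistics import mean, pstdev

def uniform_footprint(pages, max_cv=0.05):
    # pages: list of (page_size_bytes, outbound_link_count) tuples
    sizes = [size for size, _ in pages]
    links = [count for _, count in pages]
    # Coefficient of variation: spread relative to the average
    size_cv = pstdev(sizes) / mean(sizes)
    link_cv = pstdev(links) / mean(links)
    return size_cv < max_cv and link_cv < max_cv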

A new direction

Quote:
>>Neither site carried Google Ads, though.

So a sample of two out of two did not have ads? :) I know you like to deal with facts and statistics, so this does not help the ads theory.

Doesn't help anyone's theory, since the sites were not actually delisted (as their operators believed).

Quote:
Tackling some types of scraper sites could be quite easy; they have a definite footprint.

Google didn't just tackle "some types of scraper sites", though. They tackled three very different kinds of sites. But a new hypothesis occurred to me while I was on my way in to work this morning.

Since Google has been (prior to last week) substituting paragraphs for Web sites (that is, the paragraphs on the SpamAd sites were sufficient to get those sites into the search results in place of the Web sites they were linking to), all it might really have required would be a simple reversal of the algorithm factoring for those listings.

That is, Google has been (for several months now) equating small paragraphs with whole Web sites as duplicate content. What if they just flipped the greater than sign? (Metaphorically speaking)

It's as likely as anything, based on what little we know, and it fits with everyone's ideas to a certain extent. The chief problem with the SpamAd sites was that they were superseding the sites they linked to. In my work, I have to research dozens of companies every week, and I often had to click on the SpamAd sites in order to get to the real Web sites they were linking to.

Google just didn't want to show the real content. All that has changed, and now Google is showing the real content instead of the SpamAd sites.

Of course, some other things have changed, and they must all be interconnected in some way. But I seriously doubt that the deep-links from the SpamAd sites had much to do with anything.

Links are not THAT important to Google (the SEO community may eventually catch onto this fact, but I have serious doubts). Linkage is too easy to manipulate, and Google has been looking at many other factors for a long time.

For example, a badly formed title tag can kill you. Since last week, a number of people have been posting complaints like this in several forums:

"My Web site ranks well on Yahoo! and MSN for 'Gobbledy Gook' but it doesn't show up on Google. I have optimized my site for 'Gobbledy Gook' so I am wondering if I have been penalized in some way. I don't use any black hat SEO techniques. Please help."

Naturally, when you look at the site, the title tag says something like "Gobbledy Gook Itsy Bitsy Spider Whoop Whoop!" and if you search for that expression in Google, the so-called optimized-but-penalized site comes up first.

Title tags are important to Google. Some people get that. Many still don't.

I don't build links, but I rank well on content for some highly competitive expressions. Everyone's mileage may vary.

Well, we are going to have

Well, we are going to have to settle for differences of opinion here.

I go back to my previous statement in this thread;

"I believe that the only way to detect scraper pages (duplicate/ripped off content) across a reverse index is by using a duplicate check/filter. There is no other way, at least that I can think of."

If anyone can tell me how google (or any SE) is going to reduce scraper pages without using a duplicate check, I would LOVE to hear it.

This has been pointed out, but I thought I would emphasize it a little. The missing links from those scraper pages that have been removed from the index (those caught in a duplicate check) will hurt some sites and lower their ranking.

To some people this lowering of rank might appear to be a penalty.

Links are not THAT important to Google (the SEO community may eventually catch onto this fact, but I have serious doubts). Linkage is too easy to manipulate, and Google has been looking at many other factors for a long time.

If you keep talking like that Michael, I am going to have a hard time giving any credibility at all to your statements.

Credibility

Quote:
Michael: Links are not THAT important to Google (the SEO community may eventually catch onto this fact, but I have serious doubts). Linkage is too easy to manipulate, and Google has been looking at many other factors for a long time.

lots0: If you keep talking like that Michael, I am going to have a hard time giving any credibility at all to your statements.

No need to struggle with it. I base my comments on substantiable facts (I have linked to Google's Web site and the technical papers many times), not on SEO forum chats and opinions.

Google plainly states that it's not all about links. It's the SEO community stalwarts who have yet to catch up to Google.

In any event, until someone shows some evidence that Google is filtering on duplicate content (and, as usual, no one has), that's just another shot in the dark.

What we know for sure is that they have removed at least three types of sites which were all created for one purpose: to dominate a random selection of rankings for the sake of putting advertising in front of surfers.

As far as finding paragraph

As far as finding paragraph or snippet duplicate content, these guys have the edge on that.

http://www.turnitin.com/static/home.html
or
http://www.ithenticate.com

If they can do it google can do it.

>>>Google plainly states

>>>Google plainly states that it's not all about links.

google plainly states a lot of things, some of which are just not true.

>>>In any event, until someone shows some evidence that Google is filtering on duplicate content (and, as usual, no one has), that's just another shot in the dark.

I have a lot of evidence, you can get your own, just throw up some duplicate pages (or parts thereof) on different domains and watch what happens, really very simple.

>>>What we know for sure is that...

No, you're still assuming; for every example you can show, I can show one to debunk it.

Like I said, we are going to have to leave it at a difference of opinion.

Talk about credibility issues

Quote:
>>>Google plainly states that it's not all about links.

google plainly states a lot of things, some of which are just not true.

Friend, it's very difficult to take anything like that seriously. Google may occasionally have some outdated information on their site, but catching them in a lie is not easy.

Again, I don't subscribe to all the conspiracy theories that are so popular on the SEO forums (someone has already suggested that Google is trying to do in the competition at WMW, for example).

So, no, I'm not the one making assumptions here, lots0. You are. I'll be glad to look at any actual facts you may care to produce.

So far, your opinion and the facts are not converging.

Now, getting back to the lack of URLs. I found ONE SITE in someone's profile that is truly no longer listed in Google. The front page seemed clean enough and I was ready to concede it didn't fit my adscript hypothesis. But then I looked at a secondary page and, yup! there were "Ads by Goooooogle" in two sections on the page, forcing the visitor to hunt for the real content.

While that remains nothing more than a documentable coincidence, it is nonetheless more substantial than anything else I have yet seen.

>>> I found ONE SITE in

>>> I found ONE SITE in someone's profile that is truly no longer listed in Google.

How about a URL?

>>>So far, your opinion and the facts are not converging.

I can safely say the same... ;-)

The removal of scraper sites

The removal of scraper sites and the loss of deep links, I feel, is the main cause. If it's the position of AdSense, why would the AdSense team advise the best position and ad format to use?

this problem was discussed about 4 weeks ago but has only just hit the mainstream last week; in fact 4 weeks ago the thought of this happening to directory sites was being played down by the naysayers

http://forums.searchenginewatch.com/showthread.php?t=6642

It's a UK Web site

lots0:

I am reluctant to spotlight someone without their permission, but they host PC reviews. You may be able to find who I am referring to with a little creativity. I debated whether I should include the URL or not, but I know that even Threadwatch is a little sensitive to third-party linkage.

mick g:

Quote:
The removal of scraper sites and the loss of deep links I feel is the main cause, if its the position of adsense why would the adsense team advise the best position and ad format to use ??

As I said above, the SpamAd sites were not following the guidelines. I actually came across a number of discussions today, from the past few months (none really current), where people have discussed whether to use the embedded link for reporting abuse.

Maybe all the sites that disappeared were sites that people had reported to AdSense as operating outside the guidelines. The most obvious and likely violation for the vast majority of the now-missing sites (in my opinion) would be that the sites appeared to exist solely for the purpose of displaying ads.

While the UK site strikes me as being a legitimate site in intention, its articles don't seem to be very long (there are too many for me to have looked at all of them). But it displays the ads, then the brief article text, and then the ads. So, clearly, the site operator is attempting to maximize his cash flow from the articles.

Would Google have declared that a violation? I don't know. The ads are still displaying. So, if the site was banned for violating AdSense policy, it probably wasn't banned by the AdSense side.

Remember that Google's left-hand sometimes doesn't know what the right-hand is doing.

This is only a hypothesis, based strictly on my examination of one site, and any number of conclusions could be drawn on the basis of what I found there.

For example, on the front page, there was a headline which read something like "Tiny users won't get their money back" (I don't think that is the exact wording, but I believe I linked to the site on another forum and I'll come back here with a correct citation later if that is wrong).

I found that headline on ANOTHER site in Google, associated with an article which was substantively different. If Google decided these two sites were displaying duplicate content, then why did it drop the UK site and leave in something like 8-10 pages on the remaining site which all displayed the exact same content?

This is one of the many reasons for why I don't accept the duplicate content filter hypothesis. Everything I have seen so far just contradicts that explanation.

Not being funny but....

I am not being funny Michael but you are talking about one review directory site which is not looking as healthy as it was; try checking this list of directory sites

http://www.strongestlinks.com/directories.php?sortcolumn=sat

other than the obvious directories like Yahoo and business.com and a few others, most of them are not looking in as strong a position as they used to be in, especially with the number of pages they have had removed in the last few weeks

likewise I also don't want to out any individual sites, but the above list is a good example of across-the-board directories that are now showing URL only on a lot of pages and have had many pages removed; now that can either be a Google glitch or it's a penalty, because these same pages did once have a full title and description

as with any update you will always see losers and gainers, but this time there are a lot more losers, and not only that, they have had tens of thousands of pages removed or have been hand-removed from the index altogether. That is the point I am trying to make: not all of these sites would have stepped outside of the Google guidelines, but they are looking like they have picked up a penalty. Which gets back to my comment above: if these same directory sites had been listed on tens of thousands of scraper sites, which does seem to be the case, you remove those backlinks and you are left with a massive black hole.

I know this is true because I have just had 420,000 backlinks removed from one site and it is currently on a life support machine; even after 2 emails to Google explaining the situation it has made no difference.

Directory coverage

mick g:

Quote:
I am not being funny Michael but you are talking about one review directory site which is not looking as healthy as it was; try checking this list of directory sites

http://www.strongestlinks.com/directories.php?sortcolumn=sat

other than the obvious directories like Yahoo and business.com and a few others, most of them are not looking in as strong a position as they used to be in, especially with the number of pages they have had removed in the last few weeks

I'm not sure what you think the relevance of that report is to Google's update. Whether any of those directories have been affected or not isn't the issue. The issue is, does this update target directories. The clear and obvious answer is NO, it does not, because many directories remain unaffected.

Once again, people have taken up an unsupportable sweeping generalization and concluded that it must be the right answer even though the facts clearly contradict it. The SEO community just continues to amaze me.

My point is that, if Google were targeting directories, we shouldn't be able to find directory listings in Google. It's not the fact that a site is a directory that got it dropped from Google's index (or that saw its rankings performance drop). There are other factors involved, and we don't have to know what they are to see that the change is not based on who has a directory site and who doesn't.

I suggested that the AdSense guidelines may provide some insight into what Google is doing. Whether Google got rid of only the sites it intended to get rid of, or whether they have made another mistake, is an entirely different matter.

Michael Martinez

Michael Martinez said : Should anyone wonder why I keep reminding people to disbelieve 95% of what they read in the SEO community discussions?

a couple more posts and you will have 95% coverage in this thread and then I will agree with you.

DaveN

but catching them in a lie

Quote:
but catching them in a lie is not easy.

Guffaw, Guffaw.

I expect that you don't count their vague and misleading FUD statements as a 'lie', so let's remove any semantic confusion and state for the record:

"Google use FUD to mislead and confuse the SEO community"

Surely you don't find that hard to believe.

Google and the SEO community

Quote:
I expect that you don't count their vague and misleading FUD statements as a 'lie', so let's remove any semantic confusion and state for the record:

"Google use FUD to mislead and confuse the SEO community"

Well, seeing as how the SEO community specializes in misleading itself with a great deal of nonsense, I'm not really concerned about how many people in that community want to call Google a group of liars.

They have done things I don't approve of (such as pre-fetching content). But outright lying? Sorry. That dog won't hunt.

I'll take what Google says about its operations over the Phil Cravens of the SEO community any day of the week. At least Google can be held accountable for what it says. The SEO community often behaves like a gang of wild west outlaws.

And the reason I mention Mr. Craven is that he was recently taken to task by GoogleGuy for claiming that Google holds all SEO to be spam. GoogleGuy claims that is not so, and I believe him. Mr. Craven also continues to tell people that Google begins its PageRank calculations by assigning a value of 1 to all pages, which is utter nonsense. The various technical papers which discuss the PageRank algorithm (and modifications to it) in detail stipulate that initial PageRank for each document is set to 1/n (where n = the total number of documents in the collection).

PageRank is a probability measurement, and probabilities can only exist in the range of 0..1. The ToolBar PageRank is not produced by the PageRank algorithm.
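
For anyone who wants to see it rather than take my word for it, here is a toy power-iteration sketch of the published algorithm - every page starts at 1/n and the scores stay a probability distribution (the 0.85 damping factor is the value from the original paper; this is an illustration, not Google's production code):

def pagerank(links, iterations=50, d=0.85):
    # links: dict mapping every page to the list of pages it links to
    pages = list(links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}            # initial PageRank = 1/n
    for _ in range(iterations):
        new = {p: (1 - d) / n for p in pages}   # random-jump share
        for p, outs in links.items():
            for q in outs:
                new[q] += d * pr[p] / len(outs) # split p's score over its outlinks
        pr = new
    return pr   # each value sits in 0..1; with no dangling pages they sum to ~1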

As long as people in the SEO community continue to get their "facts" from experts like Mr. Craven, I'll stand by what Google says. They are considerably more reliable in that respect.

Dave N

Quote:
Michael Martinez said : Should anyone wonder why I keep reminding people to disbelieve 95% of what they read in the SEO community discussions?

a couple more posts and you will have 95% coverage in this thread and then I will agree with you.

Then I'll be catching up to plenty of other people, won't I? But at least I make an effort to include actual information in my posts.

Speaking of which, I have noticed a significant decrease in the number of matching results reported by Google across a broad selection of queries. They have not updated the reported size of their index on the main page, but queries which used to report over 1 million hits now show several hundred thousand fewer hits. And queries which used to report several million hits may now report as many as 1 million less.

I am inclined to agree that a substantial number of pages are being filtered out. Far too many to be coming solely from directories.

collateral damage

large drops could be due to the culling of scraper sites. Some directories are very like scraper sites and as such may be affected also. Collateral damage has happened before and will happen again. If your site comes close to a "new" breach in guidelines or is similar to whatever their "most wanted" is, then expect some pain.

Misinformation is a great weapon

G are at war with the SEO community; they don't like us.

Misinformation as a tool has been used for a long long time and not just in the SEO business.

Take for example that patent they did. You can't seriously think that they ARE implementing all they wrote about (length of registration details etc).

Classic misinformation, keep the animals guessing. It's not like they are going to admit it though :-)

The promised URL and followup to ukgimp

The site I was speaking of earlier is www .pcreview .co .uk and as of yesterday they were not appearing in Google. I don't know when the last changes were made. I have not contacted the site owner.

ukgimp:

Quote:
large drops could be due to the culling of scraper sites. Some directories are very like scraper sites and as such may be affected also. Collateral damage has happened before and will happen again. If your site comes close to a "new" breach in guidelines or is similar to whatever their "most wanted" is, then expect some pain.

I think you have raised some important points. How much impact did the SpamAd sites have on other sites? I know that some of my pages were knocked out of the rankings by SpamAd sites that had links to my sites. All those sites have since been restored, now that the SpamAd sites have been removed. I am beginning to wonder if Google wasn't simply relying on a partially completed algorithm, since they apparently rolled out the full update over an extended period.

Quote:
G are at war with the SEO community; they don't like us.

Misinformation as a tool has been used for a long long time and not just in the SEO business.

Take for example that patent they did. You can't seriously think that they ARE implementing all they wrote about (length of registration details etc).

I don't fault Google for being vague about trade secrets. I only fault the SEO community for coming up with hare-brained explanations for what Google does and then treating those explanations as established fact despite very reputable sources of information which contradict the wild ideas.

I also fault the SEO forum operators for encouraging the madness through their policies of forbidding the posting of URLs and their toleration of the kind of conspiracy theories that permeate the post-update discussions.

Every time Google changes something, there are a handful of people who loudly proclaim that Google has gone after the eCommerce sites and anyone else who got in their way. The hysterical postings (not to mention the condescension and outright flames by moderators) do as much to harm the SEO community as the pseudo-factual explanations.

First Michael- Phil Craven

Fist Michael- Phil Craven has not joined in this thread - your bringing him up like you did is a major breach of manners to say the least.

Phil knows more about search engines than matt cutts and certainly more than you. (BTW I agree with Phil - google does look at all SEO as SPAM - if matt cutts is saying different he is a bold faced liar.)

Michael if you have such a low opinion of SEO and the people involved with it why come here and the other SEO forums and 'hang out' and post your 'theories'? Why is it so important to you to post your theories and then defend them to a bunch of dirty low life SEOs, like us?

Michael, I'll tell you what I think, I think you're a wannabe SEO that just can't quite make it in the real world as an SEO... So you scour the SEO forums looking for the golden grail... Good luck my boy!

Too bad TW does not have an ignore feature, I know who I would use it on from now on...

Fist Michael? Mis-spelling?

You're right, Phil is not participating in this thread. But then, he does continue to post misinformation regarding PageRank and how Google views SEO in a very popular SEO forum (SEW), so his comments are quite relevant to the points I have made here.

Would you like me to single out other purveyors of misinformation? They don't all post at SEW, but then, they seem to get around. Marcia is another good example.

The list is long, and it doesn't take much effort to show that they are advocating fallacious points of view on various topics.

So, get all riled up because I don't buy the nonsense these people post across the Web every day. That doesn't matter to me.

At least I understand how probabilities work, and I'm not out there telling people "It's all about links".

Quote:
Michael if you have such a low opinion of SEO and the people involved with it why come here and the other SEO forums and 'hang out' and post your 'theories'?

There are people who appreciate what I say. And I scour the SEO forums, as I have pointed out before, for specific types of threads -- the ones where people report sudden changes in rankings.

Instead of lobbing standard replies at them without checking the facts (which many of the SEO forum regulars do every day), I investigate the sites and look at their situations.

Learning from what is actually happening in the world of search engines, rather than rehashing the same tired old theories and tossing around silly update names, is more important to me than to most of the people in the SEO world.

I understand that. Cannot change it. But I've been doing this longer than most of you, so your opinions of me don't matter to me.

And Googleguy can be economical with the truth...

We (well me anyway) had a spat with G about 302's and duplicate copies. Himself (Googleguy) rushed into print at Slashdot with

Quote:
Here's the skinny on "302 hijacking" from my point of view, and why you pretty much only hear about it on search engine optimizer sites and webmaster forums. When you see two copies of a url or site (or you see redirects from one site to another), you have to choose a canonical url. There are lots of ways to make that choice, but it often boils down to wanting to choose the url with the most reputation. PageRank is a pretty good proxy for reputation, and incorporating PageRank into the decision for the canonical url helps to choose the right url.

Indicates that GG is not too keen on forums, and is prepared to offer disinformation (in that case about PR being the key to sorting duplicate copy)

Michael if you have such a

Quote:
Michael if you have such a low opinion of SEO and the people involved with it why come here and the other SEO forums and 'hang out' and post your 'theories'? Why is it so important to you to post your theories and then defend them to a bunch of dirty low life SEOs, like us?

In light of the continued insults to TW members and the SEO community at large, I think that's a valid question?

Can someone please ping Phil? If he doesn't get the chance to defend himself, I don't think it very fair...

About GoogleGuy

Cornwall:

Quote:
Indicates that GG is not too keen on forums, and is prepared to offer disinformation (in that case about PR being the key to sorting duplicate copy)

Or maybe he just doesn't see things the way other people do. As a past victim of 302-hijacking, I take the subject very seriously, and I, like so many other people, continue to fault Google for not addressing the problem as other services have.

But I'm hardly Google's biggest booster. I have always maintained, for example, that PageRank (link popularity in general) is one of the dumbest ideas to ever hit the search engine industry. It continues to remain popular with the technologists. So what? Microsoft continues to remain popular with the business community despite the fact they have produced the most unstable operating systems in history.

That's life. We adjust to it.

If its not all about links...

then why are these two searches almost identical?

You can see where some of the switching is coming from on-page stuff, but apart from some shuffling the top 10 are identical...

Identical?

Quote:
then why are these two searches almost identical?

You can see where some of the switching is coming from on-page stuff, but apart from some shuffling the top 10 are identical...

You're judging two queries by the first page of results?

What about this query? How does linking affect its results?

What about this query?

Yes, ping Phil

I would like to see him explain why, when so many technical papers say that PageRank starts with 1/n, he continues to tell people it starts out at 1.

I would like to see him explain why, when PageRank is defined to be a probability distribution, he continues to tell people they can increase their PageRank scores past 1.

I would like to see if he now accepts GoogleGuy's assertion that Google doesn't hold all SEOs to be spammers.

Indeed, ping Phil.

>>>...so your opinions of me

>>>...so your opinions of me don't matter to me.

Ya right. So you always write ten paragraphs to people whose opinion does not matter to you...

I am done with this thread, before it gets nasty.

LOL, wrong end of stick firmly grasped

Michael, of my examples, one is for a competitive search term, the other is using the "allinanchor" switch, which is essentially a query on G for the top sites strictly according to anchor text data. Since the 2 lists are identical in content, and VERY similar in ordering, it is reasonable to conclude that that term is being decided almost exclusively on linking data.

The slight variations in positioning represent the entire effect of on-page factors

>> What about this query? How does linking affect its results?

Since you've provided no comparison, it is a meaningless question

>> You're judging two queries by the first page of results?

Doing so puts me in the company of around 80% of surfers, and a fair number of search engineers....
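
To put the comparison in concrete terms, this is all it takes to score two result lists against each other - you'd paste in the top 10 URLs from the plain query and from the allinanchor: version (the fetching is left out; this is just the bookkeeping):

def top10_overlap(plain_results, anchor_results):
    plain, anchor = plain_results[:10], anchor_results[:10]
    # How much of the plain top 10 also shows up under allinanchor:
    shared = len(set(plain) & set(anchor)) / float(len(plain))
    # How many URLs sit in exactly the same position in both lists
    same_slot = sum(1 for i, url in enumerate(plain)
                    if i < len(anchor) and anchor[i] == url)
    return shared, same_slot

A shared value near 1.0 with most slots matching is what TallTroll is pointing at: the anchor-text data alone reproduces the ranking almost exactly.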

TallTroll, about your links

Quote:
Michael, of my examples, one is for a competitive search term, the other is using the "allinanchor" switch, which is essentially a query on G for the top sites strictly according to anchor text data. Since the 2 lists are identical in content, and VERY similar in ordering, it is reasonable to conclude that that term is being decided almost exclusively on linking data.

The slight variations in positioning represent the entire effect of on-page factors

Well, unfortunately, the one search I really wanted to show you was edited for what I agree were reasonable concerns.

My point, however, is that anyone can trot out a couple of searches that appear to demonstrate some massively important principle that someone else supposedly doesn't get. A small comparison like that is not statistically useful.

Quote:
>> You're judging two queries by the first page of results?

Doing so puts me in the company of around 80% of surfers, and a fair number of search engineers....

I'm one of those people who looks at larger samples, thank you. I'm not interested in staying back with the pack.

You just don't get it, do you

Quote:
But outright lying? Sorry. That dog won't hunt.

It's not a dog, and it doesn't need to hunt.

'Outright lying' would be stoopid. They don't do stoopid things. They mislead in a way that can be denied/defended later - in the same way that Bill Clinton did over Monica.

You know that, and you know what the real score is.
There are logic gaps a mile wide in your reasoning.

I haven't got time enough to argue with you as I suspect that no amount of reasoning would get past the 'I am never wrong barrier'.

I know Nick doesn't allow personal insults and attacks on this forum, which is a healthy policy normally. However, there are times when it would be nice to tell someone that they are out of their depth and point out how idiotic they are being in the most insulting terms possible. This is one such instance.

well

>Marcia is another good example.

Marcia can well look after herself but I would like to say that comment is uncalled for if not downright cowardly. If you are looking to goad people into picking fights with you try me, I enjoy it.

NFFC

Quote:
Marcia can well look after herself but I would like to say that comment is uncalled for if not downright cowardly. If you are looking to goad people into picking fights with you try me, I enjoy it.

The problem here, gentlemen, is that I'm not looking for a fight. I'm just pointing out unpopular facts, and several of you are rolling onto the balls of your feet and curling your hands into fists.

The SEO community has generally proven unwilling to accept criticism and different perspectives. The responses here today are typical: rather than provide clear citations of facts that refute some point you disagree with, you get all testy and defensive.

I've been accused now of trolling, lying, picking fights, insulting people, etc., etc.

Tsk, tsk. Is that the best you have to offer your fellow SEO professionals? What about the questions I have asked?

For example, what evidence does anyone have that a duplicate filter is targeting DMOZ directory clones?

The fact that DMOZ clones have vanished isn't evidence of such a filter. It's only indicative of a change in Google's behavior, and a filter is one possible explanation. But given the lack of evidence for it, it's no more likely than anything else.

On the other hand, since other types of sites have also been delisted en masse, clearly Google didn't just single out DMOZ clones.

So, roll up your sleeves, get to work, and do some research. Post some URLs.

Do something other than get all riled up just because I'm willing to ask hard questions and require evidence.

wow..

What happened here? Been away for a day or so and the thread exploded...

I just want to add that I only mentioned Marcia because I remembered that she had pointed me to a paper about discovering certain types of duplicates. Marcia wasn't the author of the paper, it wasn't her theories. All she did was make me aware of it. I'm not even sure if she thought any of it was being applied anywhere, but lots0 gave a few examples above that showed that this could be done, I believe copyscape is another example.

The only thread I found was this one from February 2005, which mentions Monika Henzinger (Head of Research @ Google) mentioning to a German newspaper that they would throw some effort at duplicate detection. In this thread, vitaplease mentions this thread from January 2004, which is most likely dealing with the exact same paper that Marcia pointed me to on another occasion.

By the way, the "paper" seems to be a patent application by Google, Inc. and Monika Henzinger is one of two inventors.

---
Anyway, you all seem to agree with my first post that it's not directories as such, but something else that apparently dragged some down.

I'm not very sure about what happened myself, and while the above might sound like a defense for the dupe theory it wasn't really intended as such. Actually I haven't read all the threads, so I'm not familiar with the theories, but duplicate detection seems like the only intuitive way of discovering scraped content.

So, if scrapers were the focus of this, duplicate filters would be my personal primary choice of tool. Feel free to disagree and all, and of course: YMMV, AFAIK, FWIW...

Marcia was just an example, like Phil

Claus wrote:

Quote:
I just want to add that I only mentioned Marcia because I remembered that she had pointed me to a paper about discovering certain types of duplicates. Marcia wasn't the author of the paper, it wasn't her theories. All she did was make me aware of it. I'm not even sure if she thought any of it was being applied anywhere, but lots0 gave a few examples above that showed that this could be done, I believe copyscape is another example.

The discussion has drifted off topic, but my point in naming names is that some of today's popular icons of "conventional SEO wisdom" (as someone put it to me a few months ago, or something like that) are people who, in fact, don't do a very good job of evaluating or interpreting or conveying the available information out there.

To be blunt, I run into these argumentative positions all the time, where people who spend a great deal of time in the most popular SEO forums don't appreciate it when someone doesn't toe the line. If all you guys want is to see everyone parrot the nonsense that people like Marcia, Phil, and others spew at a constant rate, then I suggest you NOT follow up on anything I post.

That ain't gonna happen with me, I assure you. (Marcia is, btw, the person who asserted in a most authoritative fashion that "Yes, it IS all about links" in a discussion I participated in.)

As far as your report of Marcia's comment about a paper, I understood clearly what you were referring to. As best I can determine, her comments are in a private forum on Webmasterworld. She was proposing a theory (and she labeled it a theory) and made reference to the paper. Maybe you had something else in mind.

Quote:
I'm not very sure about what happened myself and while the above might sound as a defense for the dupe theory it wasn't really intended as such. Actually I haven't read all the threads, so I'm not familiar with the theories but duplicate detection seems like the only intuitive way of discovering scraped content.

No one outside of Google knows what happened. But this discussion got derailed because once again I saw people drumming up an unsupported hypothesis as an authoritative explanation for the phenomenon.

Most of the SEO forums have yet to acknowledge that a major update rolled out last week. But they all have threads where people are reporting the same kind of changes: their sites dropped out of the index for no apparent reason.

Taking these people at face value, I would say that Google accidentally zinged a few thousand legitimate Webmasters. They've done worse in the past, so I'm sure most of these site operators will eventually get something to come back into the rankings. It may take months for that to happen, and I feel for them.

Those of my pages which were replaced by SpamAd content were down in the rankings from about March until last week. I've learned to live with these ups-and-downs. And, yes, I had plenty of content that continued to rank well. Diversification pays off in the long run, but not everyone is in a position to diversify like me.

Anyway, all we have are hypotheses, and I have not presented mine as if they were any more than that. But I have, at least, made a very dedicated effort over the past few days to find out information about what has happened.

I seem to be the only person who has actually gone to anything like that kind of trouble -- at least among those of us discussing the issue.

'Nuff said, unless Phil wants to answer my questions.

name 10 directories that have not been affected

Whether any of those directories have been affected or not isn't the issue. The issue is, does this update target directories. The clear and obvious answer is NO, it does not, because many directories remain unaffected

Michael, name 10 directories that have not suffered at all from the last update, oh and for the record I don't mean a review-type site that does not accept directory submissions, because in my neck of the woods that is not, repeat NOT, a directory...

......rolls onto balls of feet and curls hands, and.......

Quote:
The problem here, gentlemen, is that I'm not looking for a fight. I'm just pointing out unpopular facts, and several of you are rolling onto the balls of your feet and curling your hands into fists.

Look Michael, enough is enough.

Let me make it as clear as I can.

I disagree with your logic - not your right to post nonsense (OK, that's a little irksome, but no more than that).

I'd back my ability to understand stats, probability, scientific method and logic against those of the best here.

What you are doing is 'rhetoric', not logic.

There are a bunch of people here who disagree with your logic - learn to deal with it gracefully old chap.

Thanks everyone, great

Thanks everyone, great thread!

Now i think we can move on :)
