Google Duplicate Content

25 comments

Joeychgo offers an in-depth post questioning whether duplicate content really matters. He uses vBulletin as his example.

Comments

Interesting post - but the

Interesting post - but the conclusions are totally wrong in my experience. Duplicate content can very well become a problem and the last thing on earth you want to do is let the engines decide what to do about it. In my experience, the engines far too often make the wrong decision.

This is not at all about a "penalty" - whoever said that? It is about building a good and relevant index, and you just don't do that by filling it up with duplicate content. None of the engines currently do perfect duplicate filtering, but they are getting better, and one day you may lose everything because you let the engines decide how to deal with your crap. That's not a good long-term strategy!

Another problem is link structures. If there are multiple URLs for the same content, some people will link to one version and other people to another version, so the links to that content end up split between the URLs.
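Here is a toy Python count that makes the link-splitting point concrete. The URLs, the link data and the choice of www.example.com as the preferred host are all made up for illustration. The count shows that links pointing at two addresses for the same page look like two smaller totals until the variants are grouped.

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit

# Hypothetical preferred hostname; the non-www variant is folded into it.
PREFERRED_HOST = "www.example.com"

# Made-up inbound links to one article, as different sites happened to write them.
inbound_links = [
    "http://www.example.com/article",
    "http://example.com/article",
    "http://www.example.com/article",
    "http://example.com/article",
    "http://www.example.com/article",
]


def normalize(url: str) -> str:
    """Rewrite the bare example.com host to the preferred www host."""
    parts = urlsplit(url)
    host = PREFERRED_HOST if parts.netloc == "example.com" else parts.netloc
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


# Links counted per literal URL: the total is split across two addresses.
per_url = Counter(inbound_links)
# Links counted after grouping the variants under one URL.
consolidated = Counter(normalize(u) for u in inbound_links)
```

Here `per_url` shows 3 and 2 links on two separate addresses, while `consolidated` shows all 5 on one. The sketch does not model whether or how a given engine merges such variants.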

Been working fine for me for

Been working fine for me for 3 years. My oldest forum has not had any of the stock content removed, has top rankings for very tough keyword phrases, and has hundreds of thousands of pages indexed.

How long-term are we talking? After 3 years, I'm pretty secure. :)

So, joey

Do you want us to take a good deep look at your forum? *giggle* (just kidding)

Or would it be, say, an "acceptable statement" that one case can't really be taken as evidence of anything when we are dealing with a database of 8 billion pages? Or, say, that a forum does not share a lot of characteristics with your average shopping cart software, online pharmacy, or corporate "informational" site? Perhaps even that a forum is a special kind of website in terms of interlinking, structure, and neighbourhood patterns altogether?

Or, in other words - do you have proof that your case

(a) Is exactly as you think it is?
(b) Looks like any other case than yours?

Please don't think this is "an attack" on you. It's just that people on boards and everywhere post a lot of these "my site does so-and-so" claims, and then they think all sites do "so-and-so". After a while you tend to get a bit cynical about it.

That said, you have some good points about the special case of "BBS software" (to some degree even vBulletin-specific), but I'm not sure they're exactly as you think they are. In other words, how you phrase your findings is very important here. For example, you can't say you are examining duplicate pages if they're not even duplicates. Say instead, "The standard forum pages are very different from the archive pages."

You *almost* hit the nail on the head with this one:

duplicate content is not really duplicate content at all, at least as far as the search engines see it.

Now, the case for/against duplicate content will always revolve around the latter part, "as SEs see it", not around the perception of the individual webmaster.

In order to examine the phenomenon of "duplicate pages", you must of course examine pages that are very similar. You can't conclude that duplicates are no concern just because two pages that don't have much in common are not treated as duplicates. That's like saying that bird flu doesn't exist because fish don't get it.
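One common way to put a number on "very similar" is word shingling with Jaccard similarity. The Python sketch below uses that textbook technique. It is not a description of how Google or any other engine actually filters duplicates, and the shingle size k=3 is an arbitrary assumption.

```python
from __future__ import annotations


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word windows ("shingles") in text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    if not a and not b:
        return 1.0  # two empty shingle sets: treat as identical
    return len(a & b) / len(a | b)


def similarity(doc_a: str, doc_b: str, k: int = 3) -> float:
    """Jaccard similarity of the two documents' k-word shingle sets."""
    return jaccard(shingles(doc_a, k), shingles(doc_b, k))
```

Two copies of the same text score 1.0, and pages with no three-word run in common score 0.0. Where to draw the "duplicate" line is a judgement call that the sketch leaves open.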

I could go on nitpicking, but essentially you are describing an isolated case and you are not always identifying the things you think you are identifying. IOW, in some cases you may be right that you "get wet" but it's not "rain". So, you're 100% right in some of your findings but your thoughts about how these findings come about aren't always right.

Bonus tip:
As for your archive pages, try comparing how much link power they've got and how frequently they change, as compared to your individual threads, and you will know why those are indexed faster. It's not a page size issue.

right claus

Nice post, joeychgo, but Claus is correct. Just because it's working for you doesn't mean it will work for everyone, or even continue working for you. I worry that your post could be setting some people up for a hard fall.

The article was written for vBulletin forum owners specifically

The article I wrote was in response to many questions in the vBulletin community about whether the vBulletin archive would cause a site to be penalized for duplicate content. Some are advocating turning off the archive to avoid this. The title is "Duplicate Content and vBulletin Forums". Please look at the article in that context, and my arguments may make a little more sense.

My article showed specifically that Google routinely indexes both the archive and the main forum threads. Again, this pertains to the vBulletin community. I also use other sites, not my own, to demonstrate this. I show several examples from different sites where threads and the corresponding archive pages are both indexed and appear in the Google index, and in some cases are well ranked.

I am not saying in my article that there are no duplicate content issues in general. I am only talking about vBulletin forums specifically, and even more specifically, I am referring to the vBulletin forum software archive vs. a normal vBulletin forum thread.

I would ask that you read the article again, and look at it as it applies to a vBulletin forum specifically. vBulletin FAQ

It appears that at the

It appears that at the present time, there is no reason to concern yourself with duplicate content as far as how vBulletin is constructed and presents information and content.

It's not at all uncommon for Google to prefer the archive pages in the SERPs. Unless you've paid attention to modifying your static archive, that means visitors are directed to a lifeless, functionless page that they can't really do much with and that doesn't entice participation.

Certainly there's no direct "penalty", but duplicate content issues in vbulletin *are* a problem for vb owners. Whether it's a problem they wish to address or leave as is, is entirely up to the forum admin.

Certainly there's no direct

Quote:
Certainly there's no direct "penalty", but duplicate content issues in vbulletin *are* a problem for vb owners. Whether it's a problem they wish to address or leave as is, is entirely up to the forum admin.

Please explain... What problems do you see?

brian pointed out one

Brian pointed out one problem, which I agree with:

It's not at all uncommon for Google to prefer the archive pages in the SERPs. Unless you've paid attention to modifying your static archive, that means visitors are directed to a lifeless, functionless page that they can't really do much with and that doesn't entice participation.

Not really, Brian pointed

Not really. Brian pointed out that you also need to do something to make the archive more attractive, which is the subject of an article I am working on.

Quote:
which unless you've paid attention to modifying your static archive, means that visitors are directed to a lifeless, functionless page, that they can't really do much with, and doesn't entice participation.

But what he pointed out is not a problem in regards to duplicate content.

But what he pointed out is

But what he pointed out is not a problem in regards to duplicate content.

Yes it is a problem, though - and while I may have pointed out the archive issue, let's point out just a few of the duplicate pages you can get listed in SERPs for *the same page content* of your main thread page:

- showthread (normal)
- archive
- printthread
- &mode=threaded
- &mode=linear
- mode=hybrid&t=
- &goto=nextoldest
- &goto=nextnewest
- goto=newpost&t=
- goto=lastpost&t=

Let me underline it: vbulletin does have a problem with duplicating content.
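Admins who would rather not leave the choice to the engines can map each variant back to a single showthread URL. The Python sketch below assumes vBulletin 3 style URLs: `showthread.php?t=<id>`, `printthread.php?t=<id>` and `archive/index.php/t-<id>.html`. The function name `canonical_thread_url` is hypothetical. The sketch deliberately returns None for `goto=` links rather than guessing where they lead.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Assumed vBulletin 3 archive URL shape, e.g. /forum/archive/index.php/t-42.html
ARCHIVE_RE = re.compile(r"^(?P<prefix>.*)/archive/index\.php/t-(?P<tid>\d+)\.html$")
# Scripts that (per the list above) render the same thread content.
THREAD_SCRIPTS = ("showthread.php", "printthread.php")


def canonical_thread_url(url: str) -> Optional[str]:
    """Map a thread URL variant to <forum>/showthread.php?t=<id>.

    Returns None for goto= navigation links and for anything this
    sketch does not recognise as a thread view.
    """
    parts = urlsplit(url)
    origin = f"{parts.scheme}://{parts.netloc}"

    archive = ARCHIVE_RE.match(parts.path)
    if archive:
        return f"{origin}{archive.group('prefix')}/showthread.php?t={archive.group('tid')}"

    prefix, _, script = parts.path.rpartition("/")
    query = parse_qs(parts.query)
    if script not in THREAD_SCRIPTS or "t" not in query:
        return None
    if "goto" in query:
        # goto=newpost/lastpost/nextoldest/nextnewest are navigation jumps;
        # the sketch does not resolve where they land.
        return None
    # Keeping only t= drops mode=threaded/linear/hybrid and similar parameters.
    return f"{origin}{prefix}/showthread.php?t={query['t'][0]}"
```

The sketch leaves open what to do with the result: a 301 redirect, a robots.txt rule, or nothing at all. It only decides which URL should be treated as the main one.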

Now, while you can sit back and allow the search engines to try and work out which is the most important page, the fact remains that Google especially can have real problems with it and end up listing your less favourable URLs. I've seen this happen especially with the BigDaddy migrations.

Added to that, the way the threaded options are set up means that even if you disable the DHTML menu for that option in the vb admin panel, these options were (until very recently at least) still displayed to browsers without JavaScript enabled, such as Googlebot.

Overall, vbulletin has bad issues with duplicated content, and this duplicating of content can cause problems because at the end of the day, you want visitors clicking through from the SERPs to find your main thread directly, instead of being shunted off into a different page format for the same content.

Simply redesigning the archive may go some way to addressing this. But in doing so you are simply dressing up duplicated content so that it doesn't look like duplicated content, which is sort of missing the point.

And as far as I'm aware, you can't use a different template for the last/next/previous/threaded post views, because they all run from the main template.

I'm not trying to knock you, joeychgo, but I am trying to point out that duplicated content has been an issue since the first vb3 beta release and continues to be one. This doesn't seem to be something you're taking into consideration.

No, there isn't a direct penalty, and most webmasters couldn't care less. A vb forum bulked up with duplicate content can even stroke the webmaster's ego when a site: command shows tens of thousands of pages indexed.

But the duplicate content issue isn't an optimal position for any webmaster who wants to get more aggressive on SEO/traffic targeting.

Wordpress Duplicate Content

This is something very similar to a potential problem I considered.
On my wordpress blog I want to use the 'read rest of entry feature' in my posts.

What concerned me, though, is that I will end up with 2 URLs:

www.mysite.com/latest-post
www.mysite.com/latest-post#more-21

Both will have exactly the same content. Surely that's a dupe issue?
Any feedback or opinions would be greatly appreciated.

#blah is an internal

#blah is an internal anchor... that really shouldn't trigger a dup content penalty. For example,
http://www.threadwatch.org/node/6076#comment-36976
http://www.threadwatch.org/node/6076#comment-36983
http://www.threadwatch.org/node/6076
all the same.
no biggie.
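Fragments are harmless here because the part after # is never sent to the web server. The browser only uses it to jump to a spot within the page. The short Python illustration below uses the standard library's `urldefrag` and the URLs from this thread. The `mysite.com` address is the hypothetical example from the WordPress question above.

```python
from urllib.parse import urldefrag

urls = [
    "http://www.threadwatch.org/node/6076#comment-36976",
    "http://www.threadwatch.org/node/6076#comment-36983",
    "http://www.threadwatch.org/node/6076",
]

# Strip the fragment: what remains is what the browser actually requests.
resources = {urldefrag(u).url for u in urls}

# The WordPress "more" link case works the same way (hypothetical site).
wordpress_post = urldefrag("http://www.mysite.com/latest-post#more-21").url
```

All three Threadwatch URLs collapse to the single resource `http://www.threadwatch.org/node/6076`.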

Exactly.

Exactly.

Wordpress Duplicate Content

Hi Guys
thanks for the input, reassurance and replies to my first post over here. My concern was based on the fact that the same content could be accessed on two URLs independent of each other.
It's all part of being Google paranoid; I remember thinking that
http://mysite.com was just the same as http://www.mysite.com!

When the site first entered the index last week, the first page was exactly the same as the March page they indexed!
Then they indexed, and ranked, the feed, which was the same again.
It all seems like a potential nightmare.
Regards from England

People tend to forget -

People tend to forget: create content for users, not search bots. With all the vBulletin forums and WordPress blogs out there, as long as you're not making the scripts do something unusual that creates duplicate content, you should be fine. Google is smarter than people sometimes think. MSN, I'm not so sure, but Google is. :)

While I understand what

While I understand what you're saying, re-read the article I wrote. The article provides evidence that much of what you say, while logical, doesn't actually appear to be a problem.

See, people have been talking about the 'Duplicate Content' problem in vBulletin, but they don't provide any proof. So I set out to disprove the theory, and to provide some evidence in doing so.

Indeed, I read it - you

Indeed, I read it. You focused on the fact that people were worried that duplicated content within vbulletin might trigger a penalty, and you're right: there is no outright penalty of sites being delisted for duplication by vbulletin.

However, what I took you to task over is this statement in your conclusion:

there is no reason to concern yourself with duplicate content

As I've tried to illustrate, duplicate content in vbulletin can and does cause problems.

Mikkel's point is very well made - it really isn't wise to hope the search engines will figure out the most important page - because their algorithmic opinion won't necessarily agree with yours, and often doesn't.

Additionally, the frequent listing of archive, printthread, and various nasty duplicated URLs, can especially fail to invite users to join the forum if they find them via the SERPs.

Nothing you've said disproves any of what I'm saying. I've been administering vb3's since the first beta releases in 2003, and currently run 8 vb3 licences. From monitoring their positioning in the SERPs, duplicated content with vbulletin *can* be a problem.

It depends upon the webmaster's priorities, though - as a SEO I have to look at how to aggressively deal with such issues, but that doesn't mean to say any and every vb admin should take on board SEO issues. Frankly, I'd prefer them not to. ;)

If you're going to quote me,

If you're going to quote me, quote me completely so the quote is in its proper context.

Quote:
It appears that at the present time, there is no reason to concern yourself with duplicate content as far as how vBulletin is constructed and presents information and content.

Mikkel's point might be logical, but many people have turned off their archive on the advice of others, and LOST rankings, only to eventually turn them back on. I have been administering vBulletin forums for just as long as you, and own just as many licenses as you. My forums routinely rank page 1 for HARD keywords with no problem. I have noticed no problem with duplicate content.

As I demonstrate in my article, Google will index both the original thread and the archive of that thread. The "My Wish" thread I outline demonstrates this. Both pages even carry PageRank. That's what you call evidence. It's only one example. Do some research and disprove my theory; I honestly would be very interested to see some actual evidence instead of assumption and conjecture. I'm not trying to sound harsh, but while the arguments seem logical, the actual evidence I see doesn't fit the logic.

> Mikkel's point might be

> Mikkel's point might be logical

You got it totally wrong if you think my point is based only on logic. It is based on years of experience with a large number of different dynamic web publishing systems.

Duplicate content remains a problem; that is a fact. If you don't want to deal with it, that's your choice. Brian has pointed out very good reasons for this being a problem even within vBulletin. Leaving the search engines to prioritize your pages is just not very smart SEO. You can do that if you like, and probably do OK, but the smarter SEO will outrank you any day with a more focused strategy. Those are the facts, like it or not :)

Mikkel Any thoughts on Wordpress

I heard you on strikepoint talking about your WordPress testing. I just wondered what you thought of my concern about duplicate content mentioned in the posts above, i.e.:
www.mysite.com/latest-post
www.mysite.com/latest-post#more-21

I would hope that Google is aware of this and has it in hand, but I am loath to put my faith in something that screams dup content.
Regards from England

are you reading the thread

Are you reading the thread, phantombookman?
http://www.threadwatch.org/node/6076#comment-36985

Yes I am indeed reading

the thread and enjoying the site as a whole. I was appreciative of the reply and indeed said so in the 2nd post after your reply.

that really shouldn't trigger a dup content penalty

I agree entirely with what you said above, but just because it shouldn't doesn't mean it won't, now or in the future. It seems a reasonable concern when I see 2 pages in G's index with exactly the same content but different URLs.
All the best from England

WP

Internal anchors shouldn't trigger dupe filters, but category pages, alternative archive navigation and such can do.
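One quick way to spot category/archive overlap of this kind is to group listing pages by the exact posts they display. In this Python sketch the blog URLs and post IDs are hypothetical. They are loosely modelled on the front page, March archive and feed case mentioned earlier in the thread. For real pages, the post lists would have to be scraped or pulled from the blog's database.

```python
from collections import defaultdict

# Hypothetical listing pages on a small blog, mapped to the post IDs each one shows.
listing_pages = {
    "/": (21, 20, 19),
    "/2006/03/": (21, 20, 19),
    "/category/seo/": (21, 20, 19),
    "/category/travel/": (18,),
    "/feed/": (21, 20, 19),
}


def duplicate_listings(pages):
    """Group URLs that list exactly the same posts; return groups of 2+ URLs."""
    groups = defaultdict(list)
    for url, post_ids in pages.items():
        groups[post_ids].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]
```

With the sample data, the front page, the monthly archive, the SEO category and the feed come back as one group of identical listings. Listings that merely overlap, rather than match exactly, would need a fuzzier comparison.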

wtf

Is this the OUT webmasters board?

Oehlala

dude that was wrong, play nice or don't play at all.

DaveN
