Lockergnome & Co Scraping Wikipedia for Profit - Wrong?

12 comments
Story Text:

I really value Wikipedia as a resource, both for the wealth of information and for the fact that it isn't covered with adverts. Whilst googling recently I stumbled across the following URL: encyclopedia.lockergnome.com

According to Google, this subdomain has over 10k pages indexed.

All the content there comes directly from Wikipedia. OK, they don't try to disguise the fact, as the articles do say at the top "From Wikipedia, the free encyclopedia", but I notice this text isn't hyperlinked back.

Compare some pages:
http://encyclopedia.lockergnome.com/s/b/Entertainment
http://en.wikipedia.org/wiki/Entertainment

The only substantial difference I can see between these two is the inclusion of advertising, namely two Google AdSense blocks (and some ad links in the left-hand navigation).

A quick google returned this:

"I've seen this technique used by both Chris Pirillo (http://encyclopedia.lockergnome.com/) and Joel Comm (http://encyclopedia.worldvillage.com/). It's interesting, but I don't see much point. From a user standpoint, I would much rather use the genuine article rather than a copy which is almost certainly out-of-date the day after it's published. As a website publisher, I'd rather not publish something that I myself wouldn't want to use."

The second site mentioned, http://encyclopedia.worldvillage.com, has over 400,000 pages indexed in Google.

From what I gather this is OK according to the Wikipedia terms (or maybe not?). I'm not contesting legalities, more asking the question "Is this really ethically right?" Joel Comm runs an AdSense ebook ( http://www.adsense-secrets.com/ ) and apparently earned $25,000 in April 2005 (screenshot). I wonder what proportion of this came from a wiki scraper?

People put a stack of time into contributing freely to Wikipedia. I don't see a problem with mirroring it, but inserting adverts to benefit your own pocket? Part of the motivation to contribute to Wikipedia is that it will be a free resource that should rank well on the search engines and save people the hassle of having to wade through lots of affiliate sites.

I found a similar comment on a Waxy thread from a user who obviously feels the same:

"I think the same thing as andy and leonard here everytime I see a wikipedia mirror covered in Google ads.

Sure it's legal (The FDL allows people to make commercial copies available), but it's entirely lame. You're just trying to eek out a few bucks off content that you didn't write, in order to trick people searching for real information into finding yours.

It's pretty simple. Are you trying to make money by tricking random people into finding your site? If so, you're not helping make the web a better place."

This comment sums it up nicely. Is this actually allowed, and should it be? Should 'famous' bloggers be setting a better example? What's your opinion?

Comments

I think the Waxy comments a

I think the Waxy comment's a good one. It's not as if they're hiding this, or doing anything illegal, or really trying to do anything abhorrent, but it is a shitty thing to do.

Particularly in light of who they are. If it was Joe Schmo spammer, I'd not have any issue. But high-profile public figures should have more sense.

I'll say it before someone else does

Quote:
You're just trying to eek out a few bucks off content that you didn't write, in order to trick people searching for real information into finding yours.

It's pretty simple. Are you trying to make money by tricking random people into finding your site? If so, you're not helping make the web a better place."

Well, that's not limited to Wikipedia scrapers, is it? I tend towards feeling that sites which do this are a waste of time (mine) and space (on the www), but the very fact that they do make profits must mean a lot of people don't agree with me.

I don't see the

I don't see the problem.

Wikipedia released everything under the GNU Free Documentation License. Take them at their word that they allow copying under the terms of the license. That is all Lockergnome is doing. Just like there are DMOZ clones, there are Wikipedia clones, and they help spread the fame of Wikipedia far and wide.

How many book companies

How many book companies publish "The Complete Works of William Shakespeare" every year? Of course if you were to republish them with interpretations, notes or otherwise "add value" along the way, that would be a good thing.

If you had 400,000 pages and each of them only got one click a month with the minimum payout of $0.03 that would be $12,000 a month.
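That back-of-the-envelope figure can be checked with a quick sketch. The function and parameter names here are made up for illustration, and the inputs are simply the numbers quoted above (400,000 pages, one click per page per month, a $0.03 payout), not measured data. The payout is kept in whole cents to avoid floating-point rounding.

```python
def monthly_revenue(pages, clicks_per_page, payout_cents):
    """Rough monthly ad revenue in dollars.

    All three inputs are assumptions taken from the comment above,
    not real traffic or payout figures.
    """
    total_cents = pages * clicks_per_page * payout_cents
    return total_cents / 100


estimate = monthly_revenue(pages=400_000, clicks_per_page=1, payout_cents=3)
print(f"${estimate:,.0f} a month")
```

The estimate scales linearly with every input, so halving the number of pages that actually get indexed, or the clicks per page, halves the total.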

Oof, here comes that word

Oof, here comes that word "ethics" again.
I'd agree with Nick, it is pretty shitty, but is part of the reason people moan about it because someone else is making money from content they didn't write?

How many of us would happily add Wikipedia to our sites if we could integrate it and earn money from advertising? There's a shedload of great content you can legally use.

How many people display RSS feeds on their site? I've got BBC Sports Headlines feeds displaying on a site I run. It may not result in a huge number of extra hits, but it gets visitors from people searching for phrases picked up from those feeds. Is that "unethical" too?

Wikipedia clones have been

Wikipedia clones have been around for years and were trading at their height about a year and a half ago as almost turnkey sites, for anything between $300 and $900 apiece if you could not be bothered to roll your own.

Many clones fell out of the SERPs for duplicate-content-related issues. Those attached to higher-profile sites with good link pop, and those that contain "evolved" text, appear to have prospered :)

What are they good for? Wikipedia clones make great MFA (Made For Adsense) sites and these days also serve as great MFDP (Made For DigitalPoint) sites... if you can get the pages indexed ;)

"should 'famous' bloggers be setting a better example?"
I don't see why they should. I don't even read the guy's blog, so even if he was "setting a better example" I wouldn't be any the wiser and he certainly would not be any richer. Your question seems to imply that using Wikipedia as Made For Adsense content is setting a bad example? Setting a bad example to whom? Those people who are going to spam are going to spam regardless of what some blogger does.

Ethics? What is ethics? Society's lame attempt at self-regulation... I'll leave that discussion for you guys :)

License

Brad hit it on the head: the GNU Free Documentation License allows for this, so ethics are irrelevant here in my book, because we are not talking about content theft but a legal re-use of content compliant with a license. Good on Pirillo and the other guy for creative use of content done in a legal way.

Ethical Analysis requires looking at the actors

Ethical Analysis requires looking at the actors.

The actors in this are wikipedia, the public, and the republisher.

Wikipedia:
Wiki doesn't get harmed by them making money via the site. They explicitly permit republishing so there is no harm from dilution of content. There is some minor good done for them as their name gets further pushed in front of others.

The Public:
The public finds the information they want so that is good. There is no harm done to them in any way that I can perceive.

The RePublisher:
The republisher makes money, so that is good. The republisher adds more content and value to her/his site, and that is good. The downside is that the republisher may get dropped out of the SERPs for duplicate content; that is a risk, and neither good nor bad.

So at the end of it there is no harm done to any party that I can see. Given the lack of harm, and the benefit to most parties, I don't see how you could consider it unethical.

If wishes were horses,

If wishes were horses, beggars would ride. ;)

>>If you had 400,000 pages and each of them only got one click a month with the minimum payout of $0.03 that would be $12,000 a month.

There are a few problems here:
1. Getting a wiki clone script.
2. Getting 400K pages indexed; you need to have huge link popularity.
3. If you are getting that many pages indexed, you are surely going to fall out of the SERPs for duplicate content.
4. Getting even a single visitor on each page each month won't be easy.

I do agree with Brad: if the GNU Free Documentation License allows for this, then where is the question of ethics?

I think the wiki clone idea has outlived its utility. Now people are going further in their endeavours to milk AdSense, Yahoo Publishers, or COOP and similar link networks.

PsychCentral.com scrapes it, too

What used to be a good mental health site, PsychCentral.com, with a lot of great content, has stooped to the depths of scraping the Wikipedia as well. It's not just for the little bios of different people from psychotherapy, which they advertise on their front page, but for the whole encyclopedia. Their strategy of diluting previously focused content with a little something about everything seems to have worked: you can now find people linking to the site for all kinds of ridiculous, off-topic subjects that have nothing to do with mental health, and their Alexa traffic shot up around the time they started scraping.

On the other hand, their PR also DROPPED from 7 to 6 around the same time. Coincidence? Here's hoping they drop even more as payback for exploiting their visitors by drowning them in cheap copies of content made by someone else.

mauvy

Like it or not, the internet is the freest market in the world

and if the market says that it is economically viable to pull free content and monetise it, that's what people will do. You can generate a 2 million (ish) page site with a free template and a single line of SSI that pastes the Spamazon feed in.

If you add some value (content wrapper), get some links in, why shouldn't you make money from it?

Google will shortly penalize these "clones"

I read this on WebProNews:

Several copies of the same content in a search engine does not really do any good and so Google apparently decided to weed out some of this duplicate content to be able to deliver cleaner and better search results.
