Wikipedia Guilty of Plagiarism

16 comments

While doing a bit of research, Daniel Brandt discovered that some of the articles in Wikipedia were copied directly from other websites (via Yahoo News):

Daniel Brandt found the examples of suspected plagiarism at Wikipedia using a program he created to run a few sentences from about 12,000 articles against Google Inc.'s search engine. He removed matches in which another site appeared to be copying from Wikipedia, rather than the other way around, and examples in which material is in the public domain and was properly attributed.

Brandt ended with a list of 142 articles, which he brought to Wikipedia's attention.

The site's founder, Jimmy Wales, acknowledged that plagiarized passages do occasionally slip in but he dismissed Brandt's findings as exaggerated.
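For anyone curious what a check like that might look like, here is a rough sketch in Python. It is not Brandt's program, just an illustration of the steps as described above: sample a few sentences per article, search for them, then drop mirrors and sources that are already credited. The `search` and `is_mirror` callables are hypothetical stand-ins (no real search engine API is assumed), and the sentence-length cutoff is an arbitrary choice.

```python
import re
from typing import Callable, Iterable


def sample_sentences(text: str, count: int = 3, min_words: int = 8) -> list[str]:
    """Pick the first few reasonably long sentences from an article."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    long_enough = [s for s in sentences if len(s.split()) >= min_words]
    return long_enough[:count]


def suspect_sources(
    article_text: str,
    search: Callable[[str], Iterable[str]],  # hypothetical: phrase -> matching URLs
    is_mirror: Callable[[str], bool],  # hypothetical: True if the site copies Wikipedia
    attributed_urls: frozenset[str] = frozenset(),  # sources the article already credits
) -> set[str]:
    """Return URLs that match sampled sentences and survive both filters."""
    hits: set[str] = set()
    for sentence in sample_sentences(article_text):
        for url in search(sentence):
            # Skip sites copying from Wikipedia and properly credited sources.
            if is_mirror(url) or url in attributed_urls:
                continue
            hits.add(url)
    return hits
```

Whatever comes out of something like this is only a list of suspects; a person still has to work out who copied whom.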

Does this expose an Achilles heel for Wikipedia? Could some less than ethical people intentionally flood Wikipedia with copyrighted material and then file numerous DMCA violations ...

Comments

Is this a surprise?

I can't believe this is surprising. Over the last few years there have been many published studies showing how high school and college papers are plagiarized in record numbers. Why wouldn't this dishonesty extend to other areas of people's lives?

Daniel Brandt has a virulent anti-Google, anti-Wikipedia agenda

Daniel Brandt has a virulent anti-Google, anti-Wikipedia agenda. I wouldn't take anything he publishes as fact. Check him out yourself, and read what Aaron has written here.

Taking things people say as "fact"

Jonathan Hochman wrote:

Daniel Brandt has a virulent anti-Google, anti-Wikipedia agenda. I wouldn't take anything he publishes as fact....

Sounds like you have an anti-Daniel Brandt agenda, Jonathan, so why should anyone agree with your words of caution? Yes, I have read Aaron's blog about Daniel -- more than once, as many paths lead back to that post.

But considering that Wikipedia editors confirmed many of his findings of plagiarism, while correcting several errors of judgement on his part, he is pretty much in the clear on this issue, and it's unfair to Threadwatch's new readers to continue an obvious poison pen campaign.

Nothing new

This is nothing new to me. Wikipedia "Contributors" repeatedly copied content from my sites, and I stopped counting those articles that were minimally reworded. Of course without giving a link or acknowledge. I therefore appreciate every attempt of suing this "encyclopedia", which is in reality a heap of stolen stuff manually put in the top 10, and sometimes on #1, by our beloved search engine Google.

vindication of wikipedia

Although Brandt obviously had other motivations, this experiment in exposing plagiarism on Wikipedia is exactly what the free encyclopedia is all about. It's clear that as time passes this stuff gets weeded out by contributors. I acknowledge that it's not 100 per cent satisfactory that this stuff doesn't get caught immediately but as the number of contributors continues to grow it will get caught sooner and sooner.

Quote:
Of course without giving a link or acknowledge.

Wikipedia are tightening up on proper citation of source articles these days.

And bull, just remember that although some misguided contributors may be copying your material, they're not doing it for money but for the benefit of humanity (or their standing in the Wikipedia community with humanity as a side-effect). Isn't imitation the sincerest form of flattery? Why single out Wikipedia (which you can ask for the material to be taken down from) rather than the thousands of scraper sites out there to which you have no recourse?

Quote:
Does this expose an Achilles heel for wikipedia?

It only exposes the same Achilles heel that's always been known, i.e. that unscrupulous people can change articles in whatever way they like. It, like vandalism, can be coped with through positive contributions (like Brandt's study) from the mass of users. (It is harder to spot than vandalism, granted.)

pretty easy to automate a check for

We make our content writers put their copy into a panel which checks for indexed duplicate content. If it's dupe or nearly dupe we don't allow it.

Doesn't seem very difficult to eliminate to me.
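To give a rough idea of the kind of check I mean (this is a simplified sketch, not our actual panel): break each text into overlapping five-word "shingles" and compare the sets. The function names, the shingle size and the 0.5 threshold are all assumptions for illustration, and `indexed_pages` stands in for whatever already-indexed text you compare against.

```python
import re


def shingles(text: str, size: int = 5) -> set[tuple[str, ...]]:
    """Split text into overlapping word n-grams, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a: str, b: str, size: int = 5) -> float:
    """Jaccard similarity of the two shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def is_duplicate(draft: str, indexed_pages: list[str], threshold: float = 0.5) -> bool:
    """Flag the draft if it is a dupe or near-dupe of any already-indexed page."""
    return any(similarity(draft, page) >= threshold for page in indexed_pages)
```

A lower threshold catches more minimally reworded copy at the cost of more false alarms, so the cutoff needs tuning against real submissions.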

Aha. I hope, mm1220, that

Aha. I hope, mm1220, that you're not seriously suggesting that anyone's written works may be copied if it's for the benefit of humanity, for anyone's standing, or for free, without attribution and the permission of the copyright owner.

That would seem to fly in the face of many countries' copyright laws which govern the right to copy.

Or was it tongue in cheek? :)

benefit of humanity

Of course I'm not condoning copying or justifying it in any way. All I'm saying is that the efforts of some misguided Wikipedians are not as bad as the activities of scraper sites on a moral basis although both may be equally culpable under law.

I suppose I was trying to push bull away from this mindset:

Quote:
I therefore appreciate every attempt of suing this "encyclopedia"

If their mistakes aren't malicious then maybe suing them is not the best way to get them to stop or reform their ways.

Hehehe...

...maybe they're just retaliating cuz people are scraping THEM all the time.

(oh no, did I really say that out loud?)

oh yeah

copy your sites up to wikipedia and then sue them for copyright infringement, that will teach them to steal your site...

When did logic stop getting taught in school?

Here's the link to the study

http://www.wikipedia-watch.org/psamples.html

Threadwatch users in particular should be interested in the massive level of noise that I had to sift through in order to find the examples of plagiarism. You can blame Google's AdSense for that. Ask Aaron -- he knows all about this.

What happens if ...

Wikipedia plagiarises something from person X. Person Y uses the Wikipedia content on his own site as per the GNU Free Documentation License. Can person X sue person Y for copyright infringement?

Would you rather live in a world without Wikipedia?

I have to say, I honestly cannot understand the argument that companies like Wikimedia, Google, or YouTube should be held responsible for the dissemination of information.

Every item on Wikipedia is contributed by an essentially anonymous volunteer (although many of the site's contributors consider their work on the site part of their personal life and choose to let people know their real identities). While the Wikimedia Foundation is certainly responsible for connecting a community of people who would like to make all public human knowledge accessible to all mankind, they're hardly liable for the way that information is published; the responsibility to represent the information legally falls to the author. If anything, they're responsible for ascertaining the legitimacy of their content and remedying any infractions of the site's policy they find with due haste. Which is what they're already doing.

So I don't agree that just because someone publishes a stolen article on Wikipedia, the site is to blame. When people used free GeoCities accounts five years ago to host illegal warez or plans to make home explosives, should GeoCities have been held responsible for offering free space on the web? Or perhaps we should hold the Internet responsible for allowing people to communicate potentially dangerous information to each other?

As long as the Foundation makes an effort to control publication by the unsavory types (which they do quite well, for the most part), they don't deserve to be held liable. Having written some content for the site, I can tell you that the rules for allowing content to stay there (even if it is able to show up on the Internet for a short period of time) are quite extensive and stringent. In the long run, Wikipedia is accurate about a great many subjects, even if it does have a few errors, and in the long run, all of the content that stays on the site is legitimately owned by Wikimedia, or properly licensed.

And for what it's worth, jehochman is right... Brandt is one of the most technophobic (or maybe dataphobic is a better word) people I've ever seen online. He persecutes Google for providing services like Personalized Search, and it doesn't surprise me to see him persecuting Wikimedia. What does he stand to gain by writing a program to compare Wikipedia's content against that from other sites on the 'net? On a personal level, nothing, though we can all appreciate that Wikipedia will be even more legit when all's said and done, since they're using his findings to clean the infractions up. But if you ask me, his constant battle against all of these major information juggernauts boils down to nothing more than your standard, garden-variety sour grapes.

Mike McD, as much as I agree

Mike McD, as much as I agree with your conclusions about Brandt I think that the rest of what you say is wrong. The thrust of it is the old argument that Wikipedia cannot be to blame because some of its users are more blameworthy (I may have shot him in the shoulder but Bert's guilty, not I, because he shot him in the head). If something is reasonably foreseeable then why should you be allowed to say I'm not to blame because somebody else is more guilty than I? It's a reasonable assertion to think that Wikipedia would have plagiarised articles in its pages given the way it has been set up and it's reasonable, therefore, to expect them to do more about it.

> in the long run, all of the content that stays on the site is legitimately owned by Wikimedia, or properly licensed.

I think the problem with this might be best explained thus: would you be happy for me to use your copyrighted content for my own purposes as long as in the long run I take it off?

I would be inclined to agree that Daniel has an anti-Wikipedia profile. I certainly mistrust much of what he says; I think it's often exaggerated. However, that doesn't preclude him from having a good point and I think in this case he does.

The Wikipedia/GeoCities comparison interests me. As a user placing content on GeoCities I would have had control of the content to such a degree that it is clearly mine and I retain rights in it. In this respect I am a consumer consuming a service. There is an obvious and clear two-way transfer of benefits. If I place content on Wikipedia I am restricted in what I do and it becomes Wikipedia's content and rights. The element of control has shifted so much that perhaps it is not possible to say that I am consuming a service but in fact acting on Wikipedia's behalf as an agent of theirs. Just thinking aloud on that one though.

whose fault?

First of all, Mike - are you a Rounders fan? :)

Secondly.

"I have to say, I honestly cannot understand the argument that companies like Wikimedia, Google, or YouTube should be held responsible for the dissemination of information." - Mike McD

Unfortunately, who else is responsible? It almost reminds me of the file sharing argument. Should Napster have been sued because there was copyrighted music on there, when they didn't put it there? Should a business be sued when it was only a single employee that committed an action that was the basis for a claim?

In an age of blameism, hey, if the info is on your website then you should be held responsible.

Personally, I could give a rat's ass, and unless there was some top secret information on Wikipedia or full chapters of books, I don't see the point of lawsuits to remove it.

RE:Wikipedia Guilty of Plagiarism

Laziness on the part of writers is probably the commonest cause of plagiarism, as they do not want to spend much time doing research. However, they must be aware that it takes no time to check for duplicate content, as a variety of plagiarism checker tools are available on the market.
