Internet Archive Sued over Copyright

Source Title:
Keeper of Expired Web Pages Is Sued Because Archive Was Used in Another Suit
Story Text:

This could set the cat amongst the pigeons. The Internet Archive is being sued for copyright infringement.

Last week Healthcare Advocates sued both the Harding Earley firm and the Internet Archive, saying the access to its old Web pages, stored in the Internet Archive's database, was unauthorized and illegal.

The lawsuit, filed in Federal District Court in Philadelphia, seeks unspecified damages for copyright infringement and violations of two federal laws: the Digital Millennium Copyright Act and the Computer Fraud and Abuse Act.


those lawyers have no credibility IMHO

they sound sorta like tools to me. this is a great article quote as well:

"The firm at issue professes to be expert in Internet law and intellectual property law," said Scott S. Christie, a lawyer at the Newark firm of McCarter & English, which is representing Healthcare Advocates. "You would think, of anyone, they would know better."

It's always been a matter of

It's always been a matter of time before the DMCA was tested in this way. Definitely an interesting case to watch.

The Complaint

Here's a link to the complaint (PDF) in the case. It sets out a number of alternative theories of liability and harm for the Court to consider, and if the Court agrees with them, a few might have implications for other sites that cache files the way the Internet Archive does.

Some interesting commentary on the subject at Corante: Opening Up the Wayback Can of Worms.

Well, it was good while it lasted

The Internet Archive has proven to be a life-saver on many occasions. If they are forced to shut down, we'll lose one of the most important resources on the Internet.

For those that are

For those who are interested, Gary Price tracked down the complaint PDF.

I don't think they have a leg to stand on.

There has *always* been a way to stop the Internet Archive from archiving a site. So what's the problem?


What is the difference between wayback and public libraries that carry all magazines and newspapers etc.? Permanent records are kept on file for public use. Publications change, but we can always look at their shady past.

I'm with Michael Martinez

The Internet Archive has been a good source of third party proof about who really wrote what when. A nice thing when someone copies your site.

Copyright infringement

"What is the difference between wayback and public libraries that carry all magazines and newspapers etc.?"

Maybe that libraries pay the publishers for magazines and newspapers (or at least have permission to have them for free). Just because the archive is 'useful' doesn't mean it is not copyright infringement. I think parts of this complaint make sense. Yes, the robots.txt file is voluntary, but it is a convenient way to tell web crawlers NOT to cache things.

In the US, it is the legal responsibility of the (alleged) infringer to make sure that they are not infringing. It is not the original copyright holder's job to individually notify infringers. In this case, the robots.txt file informed the Archive that they did not have permission to cache certain pages. It was their job as a service that makes copies of things to make sure they were not infringing. Sure it may be voluntary to follow robots.txt, but that just means if you ignore it you are voluntarily infringing someone else's copyright.

Library of Congress

"Maybe that libraries pay the publishers for magazines and newspapers (or at least have permission to have them for free)."


"All works under copyright protection that are published in the United States are subject to the mandatory deposit provision of the copyright law. This law requires that 2 copies of the best edition of every copyrightable work published in the United States be sent to the Copyright Office within 3 months of publication." --

Also, books are often donated. The publisher may not want the book donated, but I haven't heard of anyone suing the library system.

In my local library, 56 people (as of a week ago) are on the waiting list for the new Harry Potter book. Multiply that across the nation and you have many thousands of people who are planning on reading the book for free.

IMO this case is all about money. I know of a site that asked to be removed from Wayback and was granted removal. The entire history of the site was purged. That would seem to be easier than a lawsuit, unless you have hungry lawyers hanging around...


I didn't read the complaint, but seriously. You don't want your site archived, the procedure for having it removed is spelled out in no uncertain terms on the site. The moment you put that in your robots.txt, all existing backups of the site will be removed. Just like that.
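For anyone who hasn't seen one, here is a rough sketch of what such an exclusion looks like and how a well-behaved crawler would read it. The user agent name `ia_archiver` is the one mentioned elsewhere in this thread; the `example.com` URL and the helper name `is_allowed` are made up for illustration. It uses Python's standard `urllib.robotparser`, and it only shows how the exclusion rule is interpreted, not what the Archive actually does with copies it already holds.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt contents: block only the Archive's crawler
# (user agent "ia_archiver", as named in this thread) from the whole site.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Hypothetical helper: True if user_agent may fetch url under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/page.html"  # placeholder URL
    print(is_allowed(ROBOTS_TXT, "ia_archiver", page))   # False
    print(is_allowed(ROBOTS_TXT, "SomeOtherBot", page))  # True
```

Other crawlers are unaffected by this rule, since it names only one user agent.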

If a serious site has webmasters who do NOT know about the Wayback Machine, then said webmasters aren't exercising due diligence. They don't know what they're doing.

Wayback never actually removes anything

"The moment you put that in your robots.txt, all existing backups of the site will be removed."

Not true. I had a domain blocked for years with an ia_archiver exclusion in robots.txt. Then I sold the domain. The new owner doesn't use robots.txt. All my old pages suddenly appeared at

The crawling for the Archive comes from Alexa, about six months later. You need to do a route-table block on Alexa, because they don't honor any exclusion protocol that I'm aware of. Brewster Kahle founded Alexa, and then sold it to Amazon, but he retains some influence based on the terms of the sale. His Archive is a nonprofit spin-off from Alexa. The funding is complex, but you basically have web services from Alexa going to the Archive, in addition to the crawling after six months.

In my opinion, from looking at the Archive's 990 form, Kahle's nonprofit is a sneaky nonprofit. Good luck tracing the funds.

When you request a page at the Archive, it does a real-time check for a robots.txt and provides the page if it doesn't find an exclusion. It also provides the page after a 20-second timeout if it cannot connect. When you think about it, that's just about the only way it could work, because the Archive is in the business of stashing a lot of obsolete pages that will never connect.
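The behaviour described above can be sketched roughly as follows. This is only a model of the logic as this comment describes it, not the Archive's actual code: the function names `fetch_robots` and `should_serve`, the plain-HTTP robots.txt URL, and the `ia_archiver` user agent are all assumptions; the 20-second timeout is the figure given in the comment.

```python
import urllib.request
from typing import Callable, Optional
from urllib.robotparser import RobotFileParser

ARCHIVE_AGENT = "ia_archiver"  # assumed user agent, as named in this thread
TIMEOUT_SECONDS = 20           # timeout figure quoted in the comment above


def fetch_robots(site: str, timeout: float = TIMEOUT_SECONDS) -> Optional[str]:
    """Return the live site's robots.txt text, or None if it cannot be fetched."""
    try:
        url = f"http://{site}/robots.txt"
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except OSError:
        # Covers connection failures, timeouts, and HTTP errors such as 404.
        return None


def should_serve(site: str, path: str,
                 fetch: Callable[[str], Optional[str]] = fetch_robots) -> bool:
    """Decide whether an archived copy of site+path would be shown."""
    robots_txt = fetch(site)
    if robots_txt is None:
        # Site unreachable or no robots.txt: nothing excludes the page.
        return True
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(ARCHIVE_AGENT, f"http://{site}{path}")
```

Under this model the exclusion lives only on the live site, so the stored pages become visible again as soon as the robots.txt disappears, for example when the domain changes hands. Passing a stand-in for `fetch` makes it easy to try the cases without touching the network: `should_serve("example.com", "/", fetch=lambda site: None)` returns `True`.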

It's a privacy violation, in my opinion. I own the domain, but haven't built a site yet.

You can have it purged. had their history wiped out without robots.txt.

Well, a lot of people don't understand the underlying technology

The Internet Archive isn't doing anything that every caching server in the world isn't doing. Caching servers, proxy servers, corporate networks all take content from Web sites and redistribute it.

That said, this goes back to the old "I don't want people downloading pictures from my site" controversy. Since every Web browser in the world does just that, you cannot prevent people from downloading your content. That is the only way that people can browse it.

All Web sites are copied every time someone browses them. The crux of the copyright issue comes down to redistribution, and if this stupid, short-sighted complaint is carried through, it has the potential to bring down a huge portion of the Internet.

That is putting it mildly. Seeing that principle (no redistribution of content without permission) through to the bloody end will result in a huge amount of litigation and panic among corporations that operate extensive networks.
