Internet Archive Sued for Spidering

The Internet Archive is being sued by a woman for archiving her site despite a notice posted on her site (no, there was no robots.txt file). She sued on a variety of grounds: civil theft, RICO, and the Colorado Organized Crime Act. These were dismissed, though her claim of breach of contract made it through.

There are still a few more steps before this gets to trial, but if the woman wins it could have a drastic effect on web spidering activities.

--------------------------------
An Additional Note From Natasha Robinson:

The Website in question states,

"IF YOU COPY OR DISTRIBUTE ANYTHING ON THIS WEB SITE, YOU ARE ENTERING INTO A CONTRACT," at the bottom of the main page, and refers readers to a more detailed copyright notice and agreement.

What might the implications be?

If a notice such as Shell's is ultimately construed to represent just such a "meaningful opportunity" to an illiterate computer, the opt-out era on the Net may have to change. Sites that rely on automated content gathering like the Internet Archive, not to mention Google, will have to convince publishers to opt in before indexing or otherwise capturing their content. Either that or they'll have to teach their Web spiders how to read contracts.

See also: Can A Spider Enter Into A Binding Contract?--Internet Archive v. Shell


It's about time

Time to switch the 'net to an OPT-IN!

Let her pave the way and watch Google shit bricks.

A lack of robots.txt obviously says you don't know shit about the internet or spiders and shouldn't be spidered whatsoever. That would mean she wouldn't get any traffic whatsoever, the IA wouldn't have found her either, this wouldn't be an issue ;)

I'm OK with an OPT-IN web as the flood of shit spidering sites in some cases generates more page views than actual visitors, it's crazy...

P.S. The IA picks up a copy of your site without permission and redisplays it, which is a blatant copyright violation. Call it whatever you want, just because you claim to be an internet archive doesn't give you permission to steal, they are fucked IMO.

P.P.S. Why didn't she just file a DMCA complaint? Stupid.


I like archive.org

I must be just one of those who like the Internet Archive's archive.org. It's a great place to find out whether a domain name you're about to purchase has been used for "interesting" purposes, or to show to people who have copied your content.


I agree DianeV

I love IA as well. I've used it for a lot of research as well as to save my "Doh, I don't have an old version of this test site on back-up" arse LOL. I actually just found out that their offices are located in the same complex as me; and like a search nerd, I got all excited. However, this is an interesting case and I would love to see where it goes.


There are two points to watch out for

There is a difference between the IA and someone like a search engine. While search engines do cache pages, they don't make an entire copy of a site, at least not in the way that IA does it.

There is also a difference between IA and a lot of scraper sites out there - the IA is a public service with a long history and a very sound purpose.

I would expect this to go to trial, mainly because the question of whether or not a contract was in place should really be placed in a jury's hands, as there is very little precedent for click-through contracts that I'm aware of.

If anything, perhaps defined and simple opt-out procedures would be put in place. It is ground that needs to be trodden lightly, though, if any hard rules come out of this trial. The IA has a unique business model, and applying similar rules across the board would have serious consequences and generate an untold number of man-hours of work.


Public Service?

You can wrap yourself in any flag you want to try to make it appear you aren't violating copyrights but if I don't want your 'public service' I shouldn't have to defend myself to get rid of it.

We have opt-out procedures, that's called ROBOTS.TXT.

What we need are OPT-IN procedures so you can control what's being done with your content.

Copyright has never been OPT-OUT, it's always OPT-IN, but the search engines have been trying to claim the opposite for all these years just to cover their ass.
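For anyone who hasn't looked at how the opt-out actually behaves, here is a minimal sketch using Python's standard `urllib.robotparser`. The bot name `ExampleBot` and the example.com URL are made up for illustration; the point is only that a site with no rules at all is treated as crawlable, and the publisher has to add rules to say no.

```python
from urllib.robotparser import RobotFileParser


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical site with no robots.txt rules at all.
NO_RULES = ""

# The standard way to opt out of everything, for every crawler.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

URL = "http://example.com/page.html"  # placeholder URL

print("no rules :", may_fetch(NO_RULES, "ExampleBot", URL))
print("block all:", may_fetch(BLOCK_ALL, "ExampleBot", URL))
```

With no rules the crawler is let in; only the explicit `Disallow: /` keeps it out. That default is what "opt-out" means here.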


I'm not arguing that, Bill

In fact, I agree that it should be opt-in.


Good Point

You make a good point, but I believe that snippets would fall under the term fair use - although the caching that is done probably doesn't (from an SE perspective). A change to opt-in would bring about havoc for a few weeks, but it would be adopted well by the majority of webmasters. The SEs would have a quality drop instantly, though, as many older unmanaged sites that have high quality content wouldn't opt in within a short time frame, if at all.

I have only heard of a few rare instances when people actually complained about search engine traffic, though. I had a site up for my daughter that got indexed against the robots.txt and against my repeated removal requests. Eventually, I'll put it back up as a secure site with a password, but it's been down for two months because I was getting way too much random traffic. The other kinds of complaints I've seen are sites that get linked to the holiday-ish logos Google does and the utube problem - where sites get bombarded with no notice.

This lady's complaint is about having the entire site archived and not just a page or a snippet showing up elsewhere. It would be interesting to hear the arguments on both sides of the street and to find out if she's just fishing for a buck or if she is honestly fed up with IA. It could be that she couldn't get them to remove her site from the archive. You would think they would have been able to avoid a lawsuit pretty easily with a bit of personal attention.

I wonder where the EFF stands on this one.

With a breach of contract, I'm not sure if she'll have to prove actual damages or if there is a statutory standard or minimum/maximum. I imagine that varies state-to-state. Still, there is the question of the value of the violation, if nothing else. The question of value, or lack thereof, may be enough to get this thing snipped at summary judgment.


well

Am I the only one that sees this as the internet equivalent of the trailer trash that appears on Jerry Springer?

Either:

A) the person is suffering from mental health problems, or
B) they are one of the scum similar to the ones that run these fake YP directory scams that try to con people out of cash.

A+B is a possibility as well.

IncrediBILL

With robots.txt being around for a decade, one should be able to argue that custom and practice means that that is the way to exclude content.


Sue big fish.

Fish big Sue.

Either way: ew.

People could be sued for repeating a single word from that website. Oh I hope I didn't use any one of her exact words.

Nice effort. Better luck next time.


robots.txt

The robots.txt file should be inclusionary and not exclusionary

I have been arguing this for years... making it inclusionary would nullify any and all arguments regarding the 'theft' of content and so forth.

It would also give the search engines a license to use that content to plaster their ads on the SERPs. This would take out all these "why are you spidering my content" lawsuits... and clean up the mess overall.

Give people a year's heads-up on the changeover... so that they could do their robots.txt right.
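To make "inclusionary" concrete, here is a rough sketch of what a crawler might do under that reading. This is a hypothetical policy, not how robots.txt is defined or how any real search engine behaves: the `opted_in` function, its longest-match tie-break, and the bot names are all assumptions for illustration. A path is crawlable only when an `Allow:` line in a group addressed to the crawler names it.

```python
def opted_in(robots_txt: str, user_agent: str, path: str) -> bool:
    """Hypothetical opt-in reading: crawl only what an Allow line names."""
    agents = []            # user agents named by the group being read
    in_rules = False       # True once the current group has rule lines
    best_allow = -1        # length of the longest matching Allow prefix
    best_disallow = -1     # length of the longest matching Disallow prefix
    ua = user_agent.lower()

    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:                      # a new group starts here
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            applies = "*" in agents or any(a and a in ua for a in agents)
            if applies and value and path.startswith(value):
                if field == "allow":
                    best_allow = max(best_allow, len(value))
                else:
                    best_disallow = max(best_disallow, len(value))

    # No matching Allow means no permission: silence is a "no".
    return best_allow >= 0 and best_allow >= best_disallow


OPT_IN = "User-agent: *\nAllow: /\nDisallow: /private/\n"

print(opted_in("", "ExampleBot", "/page.html"))        # site said nothing
print(opted_in(OPT_IN, "ExampleBot", "/page.html"))    # site opted in
print(opted_in(OPT_IN, "ExampleBot", "/private/x"))    # carved-out section
```

Under this reading a site with no robots.txt is never crawled, which is exactly the changeover problem: every site that wants traffic would have to publish an `Allow` line first.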


redefining "opt in"

Why isn't publishing a public website a form of "opt in"?

Lemme see if I can find a suitable analogy... By reading these words you agree that this is a CONTRACT, and by reading these words you agree to be bound by that contract. By reading the next two words these ones you agree to pay me $25 if you do read them. Send all payments to paypal@johnon.com.

Not a good analogy, but this case will just tear apart a small piece of the EULA concept (to be promptly replaced by another, functionally equivalent piece of doubtful contract language no doubt). I don't think it's any better than the class action lawsuit against Worldcom that earned me a free (small) latte from Starbucks, if I bothered to complete the paperwork. BFD.


Great for SEOs that's for sure...

Quote:
... making it inclusionary would nullify any and all arguments regarding the 'theft' of content and so forth.

Yeah, and would be great for those of us in the know.

Sucks for the poor slobs without a clue though.

It's perfectly fine the way they do it now. People WANT their content indexed, despite what a few crackpots say...


Could be the new submission

Could be the new submission service.

Allow your website to be crawled by 3,000 search engines for only $35


..

I have been blocking IA for years with robots.txt. I never saw any benefit to me for letting IA run rampant through my sites. Besides looking at all those terrible old page designs gives me creative block... I don't want to ever create anything again that looks as bad as some of those old designs of mine.
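For reference, a block like this is a two-line robots.txt group. The sketch below assumes the Internet Archive's crawler is addressed by the user-agent token `ia_archiver`, which is not stated anywhere in this thread, and uses Python's standard `urllib.robotparser` to check that the rule shuts out that one crawler while leaving others alone. `SomeSearchBot` and the example.com URL are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Assumption: "ia_archiver" is the token the Internet Archive's crawler answers to.
BLOCK_IA = "User-agent: ia_archiver\nDisallow: /\n"

parser = RobotFileParser()
parser.parse(BLOCK_IA.splitlines())

page = "http://example.com/old-design.html"  # placeholder URL
ia_allowed = parser.can_fetch("ia_archiver", page)
other_allowed = parser.can_fetch("SomeSearchBot", page)

print("ia_archiver  :", ia_allowed)
print("SomeSearchBot:", other_allowed)
```

This only works as long as the crawler chooses to honor robots.txt; nothing in the file enforces it.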

***

I think opt-in would be good in the long run for webmasters, but as far as I can see, the SEs are never going to go for it unless they are forced to.

There are no incentives for the SEs to go to an opt-in model, and as far as I can see there are many incentives for the SEs to keep things the way they currently are.


Internet Archive v. Shell settled

Press release located here: http://KnowYourCOURTS.com/news.htm