Posted to legal-discuss@apache.org by Grant Ingersoll <gs...@apache.org> on 2009/11/06 02:06:13 UTC

Redistributing Data

Hi legal-discuss,

I'm involved with a Lucene project called the Open Relevance Project  
(ORP).  As part of our project, we would like to be able to  
redistribute collections of data that we have gathered for use in  
evaluating search.  This could be things like partial crawls of the  
web or other types of data.  Perhaps I'm being paranoid given the  
number of people who crawl the web and "cache" what is otherwise  
copyrighted data, but I would like to get some more insight into what  
it would mean for us (i.e. the ASF) to put up a tarball (or similar)  
containing something like a crawl of the web.

Other parts that might be of interest:
1. We could perhaps identify Creative Commons data and use only that
2. We could simply honor no-cache and robots.txt and treat everything
else as redistributable
3. We could (maybe) do what the Internet Archive does and even apply
robots.txt retroactively
4. We may be in a position to get a donation of a crawled data set
from a company.  It doesn't really sound like a "software grant", but
is there anything we should do?

Basically, I want to err on the side of caution while still providing
our community with valuable data.

Thanks,
Grant

---------------------------------------------------------------------
To unsubscribe, e-mail: legal-discuss-unsubscribe@apache.org
For additional commands, e-mail: legal-discuss-help@apache.org

Re: Redistributing Data

Posted by Grant Ingersoll <gs...@apache.org>.

On Nov 8, 2009, at 11:14 PM, Lawrence Rosen wrote:

>> I figure we could just put up a page saying,
>> "If you don't wish to have your content in this archive, please send
>> an email to our user list and it will be removed"
>
> While the statement is probably true, I wouldn't have ASF commit to it.
> Let us leave no illusions that we're administratively prepared to
> respond to requests to remove content from our archives or to monitor
> the content of our archives for infringing content. Only if someone can
> present a compelling case that we've done something wrong will we
> respond to such requests.

Sorry, I think we lost a bit of context in that statement.  The
archive in question is a specific tarball of a web crawl or other
dataset that the Open Relevance Project would be maintaining, as
opposed to the general ASF mail archives or other things.  The burden
for removal is much lower (untar, find the offending item, and remove it).


RE: Redistributing Data

Posted by Lawrence Rosen <lr...@rosenlaw.com>.

> I figure we could just put up a page saying,
> "If you don't wish to have your content in this archive, please send
> an email to our user list and it will be removed"

While the statement is probably true, I wouldn't have ASF commit to it. Let
us leave no illusions that we're administratively prepared to respond to
requests to remove content from our archives or to monitor the content of
our archives for infringing content. Only if someone can present a
compelling case that we've done something wrong will we respond to such
requests. 

/Larry



> -----Original Message-----
> From: Grant Ingersoll [mailto:gsingers@apache.org]
> Sent: Sunday, November 08, 2009 7:02 PM
> To: legal-discuss@apache.org
> Subject: Re: Redistributing Data
> 
> > [LR: ] First, we apologize profusely and promise never to do it
> > again. Then we remind everyone that we have no budget to pay damages.
> > Then we cry. Don't you have kids who make mistakes?
> >
> > Please don't ask me to help write/review another "policy" document
> > right now.
> 
> That sounds good to me.  I figure we could just put up a page saying,
> "If you don't wish to have your content in this archive, please send
> an email to our user list and it will be removed"
> 
> -Grant





Re: Redistributing Data

Posted by Grant Ingersoll <gs...@apache.org>.

> [LR: ] First, we apologize profusely and promise never to do it again.
> Then we remind everyone that we have no budget to pay damages. Then we
> cry. Don't you have kids who make mistakes?
>
> Please don't ask me to help write/review another "policy" document
> right now.

That sounds good to me.  I figure we could just put up a page saying,  
"If you don't wish to have your content in this archive, please send  
an email to our user list and it will be removed"

-Grant


RE: Redistributing Data

Posted by Lawrence Rosen <lr...@rosenlaw.com>.

Grant, answers below. /Larry


> -----Original Message-----
> From: Grant Ingersoll [mailto:gsingers@apache.org]
> Sent: Friday, November 06, 2009 10:22 AM
> To: legal-discuss@apache.org
> Subject: Re: Redistributing Data
> 
> 
> On Nov 5, 2009, at 7:04 PM, Lawrence Rosen wrote:
> 
> > Grant, since it is an *Open* Relevance Project, it would be great to
> > have the sample crawl be for Creative Commons content. That has the
> > advantage of avoiding any copyright licensing issues.
> 
> This is likely where we will start, but it isn't always trivial to
> identify either.

[LR: ] You forgot to ask for a trivial solution to your problem. :-)


> > Google and other crawlers have teams of copyright lawyers to analyze
> > and fight disputes over the re-use of retrieved data under "fair use"
> > guidelines. I'd much prefer if we can avoid such political and legal
> > hot potatoes. It would make my life much easier.
> 
> Yeah, I thought so.
> 
> How do we handle "mistakes"?  That is, are we OK as long as we make
> reasonable efforts upfront and then remove offending content when
> notified by the author?  Anything else we should do to make sure we
> are on the right side of the law here?  Should we publish policy and
> notification steps up front?

[LR: ] First, we apologize profusely and promise never to do it again. Then
we remind everyone that we have no budget to pay damages. Then we cry. Don't
you have kids who make mistakes?

Please don't ask me to help write/review another "policy" document right
now.




Re: Redistributing Data

Posted by Grant Ingersoll <gs...@apache.org>.

On Nov 5, 2009, at 7:04 PM, Lawrence Rosen wrote:

> Grant, since it is an *Open* Relevance Project, it would be great to
> have the sample crawl be for Creative Commons content. That has the
> advantage of avoiding any copyright licensing issues.

This is likely where we will start, but it isn't always trivial to  
identify either.

>
> Google and other crawlers have teams of copyright lawyers to analyze
> and fight disputes over the re-use of retrieved data under "fair use"
> guidelines. I'd much prefer if we can avoid such political and legal
> hot potatoes. It would make my life much easier.

Yeah, I thought so.

How do we handle "mistakes"?  That is, are we OK as long as we make  
reasonable efforts upfront and then remove offending content when  
notified by the author?  Anything else we should do to make sure we  
are on the right side of the law here?  Should we publish policy and  
notification steps up front?

Thanks,
Grant


RE: Redistributing Data

Posted by Lawrence Rosen <lr...@rosenlaw.com>.

Grant, since it is an *Open* Relevance Project, it would be great to have the
sample crawl be for Creative Commons content. That has the advantage of
avoiding any copyright licensing issues.

Google and other crawlers have teams of copyright lawyers to analyze and
fight disputes over the re-use of retrieved data under "fair use"
guidelines. I'd much prefer if we can avoid such political and legal hot
potatoes. It would make my life much easier.

/Larry


> -----Original Message-----
> From: Grant Ingersoll [mailto:gsingers@apache.org]
> Sent: Thursday, November 05, 2009 5:06 PM
> To: legal-discuss@apache.org
> Subject: Redistributing Data
> 
> Hi legal-discuss,
> 
> I'm involved with a Lucene project called the Open Relevance Project
> (ORP).  As part of our project, we would like to be able to
> redistribute collections of data that we have gathered for use in
> evaluating search.  This could be things like partial crawls of the
> web or other types of data.  Perhaps I'm being paranoid given the
> number of people who crawl the web and "cache" what is otherwise
> copyrighted data, but I would like to get some more insight into what
> it would mean for us (i.e. the ASF) to put up a tarball (or similar)
> containing something like a crawl of the web.
> 
> Other parts that might be of interest:
> 1. We could perhaps identify Creative Commons data and use only that
> 2. We could simply honor no-cache and robots.txt and treat everything
> else as redistributable
> 3. We could (maybe) do what the Internet Archive does and even apply
> robots.txt retroactively
> 4. We may be in a position to get a donation of a crawled data set
> from a company.  It doesn't really sound like a "software grant", but
> is there anything we should do?
> 
> Basically, I want to err on the side of caution while still providing
> our community with valuable data.
> 
> Thanks,
> Grant



