Posted to legal-discuss@apache.org by "Steven Rowe (JIRA)" <ji...@apache.org> on 2011/05/18 23:28:47 UTC

[jira] [Created] (LEGAL-90) What are the licensing implications for statistical information drawn from non-ASL2-licensed data, e.g. word frequency lists from Wikipedia dumps?

What are the licensing implications for statistical information drawn from non-ASL2-licensed data, e.g. word frequency lists from Wikipedia dumps?
--------------------------------------------------------------------------------------------------------------------------------------------------

                 Key: LEGAL-90
                 URL: https://issues.apache.org/jira/browse/LEGAL-90
             Project: Legal Discuss
          Issue Type: Question
            Reporter: Steven Rowe


I have generated word frequency lists from full Wikipedia dumps in several languages.  For the purposes of inclusion in ASL2-licensed products, do I need to care about the license(s) covering the original text?

My interpretation (IANAL) of the [Creative Commons Attribution-ShareAlike 3.0 Unported license|http://creativecommons.org/licenses/by-sa/3.0/legalcode], under which [Wikipedia text is licensed|http://wikimediafoundation.org/wiki/Terms_of_Use], is that the license applies only to the Covered Works, Adaptations, and Collections, and that a word frequency list qualifies as none of these: Adaptations are "recognizably derived from the original", and Collections are those in which "the Work is included in its entirety in unmodified form along with one or more other contributions".

My interpretation of the answer to the resolved question ["Can Apache projects include Creative Commons Attribution-Share Alike works?"|http://www.apache.org/legal/resolved.html#cc-sa] is that even if the CC-SA license applies to my word frequency lists, I can still include them in an ASL2-licensed product, as long as attribution is provided.

I'm also interested in the more general question, as posed in the issue summary: do the licenses covering arbitrary data, text or otherwise, have any bearing on statistical products created over the data?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: legal-discuss-unsubscribe@apache.org
For additional commands, e-mail: legal-discuss-help@apache.org


Re: [jira] [Created] (LEGAL-90) What are the licensing implications for statistical information drawn from non-ASL2-licensed data, e.g. word frequency lists from Wikipedia dumps?

Posted by Ted Dunning <te...@gmail.com>.
Yeah... there *was* a copyright case involving this guy (and two others)
regarding the scrolls, but I kind of got the situations gemischt.  The
copyright case involved the publication of photos and an article, not the
reverse engineered text.

On Wed, May 18, 2011 at 8:04 PM, Lawrence Rosen <lr...@rosenlaw.com> wrote:

> Ted Dunning wrote:
>
> I think that somebody published a concordance and got in Dutch with the
> guys controlling access to the scrolls.  Not sure if it was actually a
> copyright issue (the scrolls themselves are just a bit out of copyright) or
> a contract issue or something else.
>
> You aroused my curiosity. Below is what Wikipedia tells us. This confirms
> again the ultimate futility of trying to restrict access to important
> copyrighted (or in this case long-since-not-copyrighted!) works. Thanks be
> to reverse engineering and the diligence of scholars.
>
> You suggest an even more complicated question: If, as I argued before, a
> concordance isn't a derivative work, then what do you call it when "an
> approximate reconstruction of the original text" is recreated from the
> concordance? Fortunately for me, I refuse to answer hypothetical legal
> questions on this list. :-)
>
> /Larry
>
> Inverting a concordance
>
> A famous use of a concordance involved the reconstruction of the text of
> some of the Dead Sea Scrolls <http://en.wikipedia.org/wiki/Dead_Sea_Scrolls>
> from a concordance.
>
> Access to some of the scrolls was governed by a "secrecy rule" that allowed
> only the original International Team or their designates to view the
> original materials. After the death of Roland de Vaux
> <http://en.wikipedia.org/wiki/Roland_de_Vaux> in 1971, his successors
> repeatedly refused to even allow the publication of photographs to other
> scholars. This restriction was circumvented by Martin Abegg in 1991, who
> used a computer to "invert" a concordance of the missing documents made in
> the 1950s which had come into the hands of scholars outside of the
> International Team, to obtain an approximate reconstruction of the original
> text of 17 of the documents. This was soon followed by the release of the
> original text of the scrolls.
>
> http://en.wikipedia.org/wiki/Concordance_(publishing)
>
> *From:* Ted Dunning [mailto:ted.dunning@gmail.com]
> *Sent:* Wednesday, May 18, 2011 4:39 PM
> *To:* legal-discuss@apache.org
> *Subject:* Re: [jira] [Created] (LEGAL-90) What are the licensing
> implications for statistical information drawn from non-ASL2-licensed data,
> e.g. word frequency lists from Wikipedia dumps?
>
> Wasn't there some case law on this with respect to published editions of
> the Dead Sea scrolls?
>
> I think that somebody published a concordance and got in Dutch with the
> guys controlling access to the scrolls.  Not sure if it was actually a
> copyright issue (the scrolls themselves are just a bit out of copyright) or
> a contract issue or something else.
>
> On Wed, May 18, 2011 at 4:24 PM, Lawrence Rosen <lr...@rosenlaw.com>
> wrote:
>
> You might also argue that a statistical transformation of a work doesn't
> create a copyrightable work, hence it is not even a derivative work. I'm not
> sure what it is.... Perhaps just a set of numbers that means something only
> to a statistician? Is the reduced data an "expressive work"?
>
>
>

RE: [jira] [Created] (LEGAL-90) What are the licensing implications for statistical information drawn from non-ASL2-licensed data, e.g. word frequency lists from Wikipedia dumps?

Posted by Lawrence Rosen <lr...@rosenlaw.com>.
Ted Dunning wrote:

I think that somebody published a concordance and got in Dutch with the guys controlling access to the scrolls.  Not sure if it was actually a copyright issue (the scrolls themselves are just a bit out of copyright) or a contract issue or something else.

You aroused my curiosity. Below is what Wikipedia tells us. This confirms again the ultimate futility of trying to restrict access to important copyrighted (or in this case long-since-not-copyrighted!) works. Thanks be to reverse engineering and the diligence of scholars.

You suggest an even more complicated question: If, as I argued before, a concordance isn't a derivative work, then what do you call it when "an approximate reconstruction of the original text" is recreated from the concordance? Fortunately for me, I refuse to answer hypothetical legal questions on this list. :-)

/Larry

Inverting a concordance

A famous use of a concordance involved the reconstruction of the text of some of the Dead Sea Scrolls <http://en.wikipedia.org/wiki/Dead_Sea_Scrolls> from a concordance.

Access to some of the scrolls was governed by a "secrecy rule" that allowed only the original International Team or their designates to view the original materials. After the death of Roland de Vaux <http://en.wikipedia.org/wiki/Roland_de_Vaux> in 1971, his successors repeatedly refused to even allow the publication of photographs to other scholars. This restriction was circumvented by Martin Abegg in 1991, who used a computer to "invert" a concordance of the missing documents made in the 1950s which had come into the hands of scholars outside of the International Team, to obtain an approximate reconstruction of the original text of 17 of the documents. This was soon followed by the release of the original text of the scrolls.

http://en.wikipedia.org/wiki/Concordance_(publishing)

From: Ted Dunning [mailto:ted.dunning@gmail.com]
Sent: Wednesday, May 18, 2011 4:39 PM
To: legal-discuss@apache.org
Subject: Re: [jira] [Created] (LEGAL-90) What are the licensing implications for statistical information drawn from non-ASL2-licensed data, e.g. word frequency lists from Wikipedia dumps?

 

Wasn't there some case law on this with respect to published editions of the Dead Sea scrolls?

I think that somebody published a concordance and got in Dutch with the guys controlling access to the scrolls.  Not sure if it was actually a copyright issue (the scrolls themselves are just a bit out of copyright) or a contract issue or something else.

On Wed, May 18, 2011 at 4:24 PM, Lawrence Rosen <lr...@rosenlaw.com> wrote:

You might also argue that a statistical transformation of a work doesn't create a copyrightable work, hence it is not even a derivative work. I'm not sure what it is.... Perhaps just a set of numbers that means something only to a statistician? Is the reduced data an "expressive work"?

 


Re: [jira] [Created] (LEGAL-90) What are the licensing implications for statistical information drawn from non-ASL2-licensed data, e.g. word frequency lists from Wikipedia dumps?

Posted by Ted Dunning <te...@gmail.com>.
Wasn't there some case law on this with respect to published editions of the
Dead Sea scrolls?

I think that somebody published a concordance and got in Dutch with the guys
controlling access to the scrolls.  Not sure if it was actually a copyright
issue (the scrolls themselves are just a bit out of copyright) or a contract
issue or something else.

On Wed, May 18, 2011 at 4:24 PM, Lawrence Rosen <lr...@rosenlaw.com> wrote:

> You might also argue that a statistical transformation of a work doesn't
> create a copyrightable work, hence it is not even a derivative work. I'm not
> sure what it is.... Perhaps just a set of numbers that means something only
> to a statistician? Is the reduced data an "expressive work"?
>

Re: [jira] [Created] (LEGAL-90) What are the licensing implications for statistical information drawn from non-ASL2-licensed data, e.g. word frequency lists from Wikipedia dumps?

Posted by Ralph Goers <ra...@dslextreme.com>.
I'm not sure if you are aware that you are probably not answering the author in a manner that is visible to him.  He asked his question in Jira - which automatically sends an email here. He may not be subscribed to this list and your answer won't automatically be forwarded to Jira.

Ralph

On May 18, 2011, at 4:24 PM, Lawrence Rosen wrote:

> Steven Rowe asked:
>> What are the licensing implications for statistical information drawn
>> from non-ASL2-licensed data, e.g. word frequency lists from Wikipedia
>> dumps?
> and 
>> I'm also interested in the more general question, as posed in the issue
>> summary: do the licenses covering arbitrary data, text or otherwise,
>> have any bearing on statistical products created over the data?
> 
> Interesting questions. 
> 
> Perhaps you could argue the fair use factors in 17 USC 107 to conclude that your transformations of those copyrighted works are fair use for scholarship or research purposes? For example, building a word index and word count for Shakespeare's plays used to be an important way to analyze whether the same person wrote all the works. Of course Shakespeare is public domain nowadays, so the example isn't precisely on point.
> 
> These are the fair use factors:
> 
> (1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
> 
> (2) the nature of the copyrighted work;
> 
> (3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
> 
> (4) the effect of the use upon the potential market for or value of the copyrighted work.
> 
> You might also argue that a statistical transformation of a work doesn't create a copyrightable work, hence it is not even a derivative work. I'm not sure what it is.... Perhaps just a set of numbers that means something only to a statistician? Is the reduced data an "expressive work"?
> 
> /Larry
> 
> 


RE: [jira] [Created] (LEGAL-90) What are the licensing implications for statistical information drawn from non-ASL2-licensed data, e.g. word frequency lists from Wikipedia dumps?

Posted by Lawrence Rosen <lr...@rosenlaw.com>.
Steven Rowe asked:
> What are the licensing implications for statistical information drawn
> from non-ASL2-licensed data, e.g. word frequency lists from Wikipedia
> dumps?
and 
> I'm also interested in the more general question, as posed in the issue
> summary: do the licenses covering arbitrary data, text or otherwise,
> have any bearing on statistical products created over the data?

Interesting questions. 

Perhaps you could argue the fair use factors in 17 USC 107 to conclude that your transformations of those copyrighted works are fair use for scholarship or research purposes? For example, building a word index and word count for Shakespeare's plays used to be an important way to analyze whether the same person wrote all the works. Of course Shakespeare is public domain nowadays, so the example isn't precisely on point.

These are the fair use factors:

(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;

(2) the nature of the copyrighted work;

(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and

(4) the effect of the use upon the potential market for or value of the copyrighted work.

You might also argue that a statistical transformation of a work doesn't create a copyrightable work, hence it is not even a derivative work. I'm not sure what it is.... Perhaps just a set of numbers that means something only to a statistician? Is the reduced data an "expressive work"?

/Larry




[jira] [Commented] (LEGAL-90) What are the licensing implications for statistical information drawn from non-ASL2-licensed data, e.g. word frequency lists from Wikipedia dumps?

Posted by "Benson Margulies (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LEGAL-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035831#comment-13035831 ] 

Benson Margulies commented on LEGAL-90:
---------------------------------------

FWIW, I have received qualified legal advice that statistical models of this kind are not derived works. However, this leaves another problem not precisely asked by Mr. Rowe: stuff incorporated into an Apache product is supposed to be *open source*, and the sources in this case are not open. The SpamAssassin project has worked out a compromise approach that might be apropos.
