Posted to dev@arrow.apache.org by Clark Fitzgerald <cl...@gmail.com> on 2017/08/02 22:11:11 UTC

R arrow dependency on Rcpp?

Rcpp provides a friendlier interface between R and C++, similar to Cython
for Python.

Do people have opinions on whether R / Arrow should depend on Rcpp? Alternatively,
the bindings can be written directly against R's C API.

Here's a JIRA for work on R / Arrow:
https://issues.apache.org/jira/browse/ARROW-1325

Re: R arrow dependency on Rcpp?

Posted by Felix Cheung <fe...@hotmail.com>.
I see your view but I don't know if it is a fair assessment to judge two packages based on the set of packages they depend on ;)

In all fairness SparkR integrates nicely with the magrittr package for example and conceivably even the dplyr package if there is ever anyone interested in contributing that ;)

So one does not need to be GPL to work with GPL stuff; it all depends on how the interface is defined. I do agree that occasionally (frequently?) a project is at somewhat of a disadvantage when there isn't any non-GPL alternative available (as with Rcpp). As an ASF contributor working in the R ecosystem, I do find it fairly challenging, in many different ways, to make progress in this space. But I also don't think this is insurmountable: I would reasonably expect that, given some amount of work, a late binding layer can be built to defer to alternative, perhaps not as optimal, behavior when a particular dependency is not available. So perhaps this is a people problem and not a technical one.
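
As a rough illustration of the late-binding idea described above, the sketch below (in Python, purely for illustration; it is not code from SparkR or Arrow) tries to load an optional dependency at call time and falls back to a slower standard-library path when it is missing. The names load_optional and average, and the choice of numpy as the optional backend, are assumptions made for this example.

```python
import importlib
import statistics


def load_optional(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # The dependency is optional, so a missing module is not an error.
        return None


def average(values, backend_name="numpy"):
    """Average a sequence, preferring an optional backend when present."""
    backend = load_optional(backend_name)
    if backend is not None and hasattr(backend, "mean"):
        # Preferred path: delegate to the optional dependency.
        return float(backend.mean(values))
    # Fallback path: standard library only, possibly slower.
    return float(statistics.mean(values))


# Hypothetical module name, so this call exercises the fallback path.
print(average([1, 2, 3], backend_name="no_such_backend_module"))  # 2.0
```

In R the same pattern is usually written by guarding the optional call with requireNamespace().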

Going back to Arrow, I think it is perfectly valid to release the R code as-is as source, not published to CRAN. We could likely benefit from having such R source code still validated in CI and before releases, to help ease the pain of users manually picking up something that is potentially dated and untested (and I would be happy to contribute to that sort of infra work).


_____________________________
From: Wes McKinney <we...@gmail.com>
Sent: Friday, August 11, 2017 1:04 PM
Subject: Re: R arrow dependency on Rcpp?
To: Felix Cheung <fe...@hotmail.com>
Cc: <de...@arrow.apache.org>


hi Felix,

Thanks for this context.

From

> If the goal is to make such component a package released to CRAN however, then my take is this becomes a release by itself and what is required for the package to function becomes the area for discussion, as per my understanding.

My take is that releasing an Arrow source release or binary release to
CRAN, because of the extent of the R world's copyleft leanings, is
probably not the business of the Arrow PMC, as it is too fraught with
licensing concerns. It would be better to leave CRAN deployment to
downstream members of the R community. So the R community would be
free to release ArrowR to CRAN, but we the PMC would not vote on the
package artifacts.

This seems consistent with the way that software vendors are already
releasing Apache Spark and components of the Hadoop ecosystem in
downstream software distributions. I see this as preferable (compared
with avoiding all copyleft R software) as it enables a community of R
developers to thrive within the Apache Arrow project while avoiding
the redistribution questions.

The other side of this coin is that, IMHO, to eschew R's GPL ecosystem
(which describes, AFAICT, the vast majority of the R ecosystem; it would
be nice to see some more detailed data) would amount to developing
R-Arrow integration with one hand tied behind our back. I would argue
that SparkR may be worse off because it has (by appearances, at
least) isolated itself from the rest of the R ecosystem. Contrast
SparkR's import list [1] with sparklyr's [2], for example.

This is not an ideological viewpoint, merely a practical one. sparklyr
uses the Apache 2.0 license but depends on many GPLv2/3 packages at
runtime, and there is no conflict with this. The conflict is with the
ASF's position on releasing official software artifacts on behalf of
the project PMC, which we should absolutely respect.

I have no particular horse in this race as I do not do much R work
myself, except that I want to do what is best to grow the Arrow
community and enable the R world to benefit from our collective
efforts to the maximum extent possible.

Thanks,
Wes

[1]: https://github.com/apache/spark/blob/master/R/pkg/DESCRIPTION
[2]: https://github.com/rstudio/sparklyr/blob/master/DESCRIPTION

On Fri, Aug 11, 2017 at 11:57 AM, Felix Cheung
<fe...@hotmail.com> wrote:
> Thanks Wes. I think the discussion is in line with my understanding of the
> release of optional component as well.
>
> Spark, which is often used as an example in various discussions, actually
> does not have GPL dependencies that are required to function at runtime. We
> have build and test dependencies (that's very hard to avoid) but they are
> not needed to install and run the package (besides R itself). We have some
> native C code in the source, but it is not required and not built (and
> not tested with, and to be honest, it has been more than 2 years, so it is
> not likely to work at all).
>
> So going back to the context of Arrow and optional components: if the goal
> is to have R source that is released with Arrow as source, and that a
> user will need to make a choice to manually extract the R pieces and
> build/install manually, my interpretation is that will be ok.
>
> If the goal is to make such component a package released to CRAN however,
> then my take is this becomes a release by itself and what is required for
> the package to function becomes the area for discussion, as per my
> understanding.
>
>
> ________________________________
> From: Wes McKinney <we...@gmail.com>
> Sent: Friday, August 11, 2017 7:29:29 AM
> To: dev@arrow.apache.org
> Cc: Felix Cheung
> Subject: Re: R arrow dependency on Rcpp?
>
> It seems that using Rcpp is fine because an R library for Arrow is an
> optional component of the project, but will await more opinions on
> LEGAL-324.
>
> + Felix Cheung -- I wonder if you could comment further on your
> concerns about licensing of R build dependencies, which were mentioned
> elsewhere.
>
> Thanks
>
> On Thu, Aug 10, 2017 at 2:12 PM, Wes McKinney <we...@gmail.com> wrote:
>> I started a discussion explaining the issue here:
>>
>> https://issues.apache.org/jira/browse/LEGAL-324
>>
>> On Thu, Aug 3, 2017 at 5:50 PM, Wes McKinney <we...@gmail.com> wrote:
>>> Thanks for weighing in on this, Hadley.
>>>
>>> To your point
>>>
>>>> You can distribute the package code according to its
>>>> license, but whenever you bundle it with R (i.e. to actually use it)
>>>> the GPL will apply to the whole conglomerate.
>>>
>>> If someone wanted to create an all-GPLv2 software distribution
>>> containing R and a bunch of libraries, then including the R Arrow
>>> library would be problematic as Apache 2.0 is not compatible
>>> (https://www.apache.org/licenses/GPL-compatibility.html). I don't
>>> think this is really a problem since R users generally just install
>>> things from CRAN.
>>>
>>> My understanding is that ASF legal has taken issue when an Apache
>>> project _cannot be used at all_ without a hard GPL dependency (outside
>>> certain exceptions, e.g. generated build files by GPL tools). This
>>> makes it impossible to create a self-contained software distribution
>>> of the project whose code and all dependencies are Apache 2.0
>>> compatible. There was the recent BSD+Patents discussion on LEGAL where
>>> projects were disallowed from using projects under that license as a
>>> hard dependency.
>>>
>>> I will open a LEGAL issue on the JIRA to discuss, but since the R
>>> portion of Arrow is an _optional_ part of the project, I am hopeful
>>> this will be deemed OK.
>>>
>>> - Wes
>>>
>>> On Thu, Aug 3, 2017 at 5:39 PM, Hadley Wickham <h....@gmail.com>
>>> wrote:
>>>> On Thu, Aug 3, 2017 at 8:15 AM, Wes McKinney <we...@gmail.com>
>>>> wrote:
>>>>> I can open a ticket to get a definitive answer to these questions.
>>>>>
>>>>> From http://www.apache.org/legal/resolved.html#platform and the
>>>>> subsequent questions there, I view the R language and build tools like
>>>>> Rcpp as part of the "R platform", which is, for the most part, all
>>>>> GPL. SparkR depends on R, but only has testthat (MIT) as a dependency
>>>>> beyond the R runtime. I think it is challenging to build high quality
>>>>> software for the R platform relying only on the main R runtime and the
>>>>> limited third party components which happen to be released under
>>>>> non-CategoryX licenses.
>>>>
>>>> Some legal advice is probably needed, but do also see this statement
>>>> from the R Foundation about package licenses:
>>>> https://stat.ethz.ch/pipermail/r-devel/2009-May/053248.html
>>>>
>>>> In general, the R community has taken the opinion that it is ok to
>>>> license code that links to R with non-GPL (but GPL-compatible)
>>>> licenses. You can distribute the package code according to its
>>>> license, but whenever you bundle it with R (i.e. to actually use it)
>>>> the GPL will apply to the whole conglomerate.
>>>>
>>>> So including an R arrow package would be fine according to the general
>>>> standards of the R community. The Apache legal counsel may of course
>>>> disagree.
>>>>
>>>> Hadley
>>>>
>>>> --
>>>> http://hadley.nz



Re: R arrow dependency on Rcpp?

Posted by Wes McKinney <we...@gmail.com>.
hi Felix,

Thanks for this context.

From

> If the goal is to make such component a package released to CRAN however, then my take is this becomes a release by itself and what is required for the package to function becomes the area for discussion, as per my understanding.

My take is that releasing an Arrow source release or binary release to
CRAN, because of the extent of the R world's copyleft leanings, is
probably not the business of the Arrow PMC, as it is too fraught with
licensing concerns. It would be better to leave CRAN deployment to
downstream members of the R community. So the R community would be
free to release ArrowR to CRAN, but we the PMC would not vote on the
package artifacts.

This seems consistent with the way that software vendors are already
releasing Apache Spark and components of the Hadoop ecosystem in
downstream software distributions. I see this as preferable (compared
with avoiding all copyleft R software) as it enables a community of R
developers to thrive within the Apache Arrow project while avoiding
the redistribution questions.

The other side of this coin is that, IMHO, to eschew R's GPL ecosystem
(which describes, AFAICT, the vast majority of the R ecosystem; it would
be nice to see some more detailed data) would amount to developing
R-Arrow integration with one hand tied behind our back. I would argue
that SparkR may be worse off because it has (by appearances, at
least) isolated itself from the rest of the R ecosystem. Contrast
SparkR's import list [1] with sparklyr's [2], for example.
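
One way to gather the kind of data mentioned above is to read the hard dependencies out of each package's DESCRIPTION file. The following is a minimal Python sketch under stated assumptions: parse_dcf, package_names, and hard_dependencies are hypothetical helper names, and EXAMPLE is made-up DESCRIPTION text, not the contents of SparkR's or sparklyr's file.

```python
import re

# Made-up DESCRIPTION text for illustration only.
EXAMPLE = """Package: examplepkg
License: Apache License (== 2.0)
Depends: R (>= 3.1)
Imports: methods,
    Rcpp (>= 0.12.0)
LinkingTo: Rcpp
Suggests: testthat
"""


def parse_dcf(text):
    """Parse 'Field: value' records, joining indented continuation lines."""
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and current is not None:
            # Continuation of the previous field.
            fields[current] += " " + line.strip()
        elif ":" in line:
            name, _, value = line.partition(":")
            current = name.strip()
            fields[current] = value.strip()
    return fields


def package_names(field_value):
    """Split a comma-separated field and drop version constraints."""
    names = []
    for item in field_value.split(","):
        name = re.sub(r"\(.*?\)", "", item).strip()
        if name:
            names.append(name)
    return names


def hard_dependencies(text):
    """List packages needed to install and run (not Suggests), in order."""
    fields = parse_dcf(text)
    deps = []
    for key in ("Depends", "Imports", "LinkingTo"):
        for name in package_names(fields.get(key, "")):
            if name not in deps:
                deps.append(name)
    return deps


print(hard_dependencies(EXAMPLE))  # ['R', 'methods', 'Rcpp']
```

Running this over two real DESCRIPTION files and looking up each listed package's license would give a first, rough picture of how much of a dependency set is GPL-licensed.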

This is not an ideological viewpoint, merely a practical one. sparklyr
uses the Apache 2.0 license but depends on many GPLv2/3 packages at
runtime, and there is no conflict with this. The conflict is with the
ASF's position on releasing official software artifacts on behalf of
the project PMC, which we should absolutely respect.

I have no particular horse in this race as I do not do much R work
myself, except that I want to do what is best to grow the Arrow
community and enable the R world to benefit from our collective
efforts to the maximum extent possible.

Thanks,
Wes

[1]: https://github.com/apache/spark/blob/master/R/pkg/DESCRIPTION
[2]: https://github.com/rstudio/sparklyr/blob/master/DESCRIPTION

On Fri, Aug 11, 2017 at 11:57 AM, Felix Cheung
<fe...@hotmail.com> wrote:
> Thanks Wes. I think the discussion is in line with my understanding of the
> release of optional component as well.
>
> Spark, which is often used as an example in various discussions, actually
> does not have GPL dependencies that are required to function at runtime. We
> have build and test dependencies (that's very hard to avoid) but they are
> not needed to install and run the package (besides R itself). We have some
> native C code in the source, but it is not required and not built (and
> not tested with, and to be honest, it has been more than 2 years, so it is
> not likely to work at all).
>
> So going back to the context of Arrow and optional components: if the goal
> is to have R source that is released with Arrow as source, and that a
> user will need to make a choice to manually extract the R pieces and
> build/install manually, my interpretation is that will be ok.
>
> If the goal is to make such component a package released to CRAN however,
> then my take is this becomes a release by itself and what is required for
> the package to function becomes the area for discussion, as per my
> understanding.
>
>
> ________________________________
> From: Wes McKinney <we...@gmail.com>
> Sent: Friday, August 11, 2017 7:29:29 AM
> To: dev@arrow.apache.org
> Cc: Felix Cheung
> Subject: Re: R arrow dependency on Rcpp?
>
> It seems that using Rcpp is fine because an R library for Arrow is an
> optional component of the project, but will await more opinions on
> LEGAL-324.
>
> + Felix Cheung -- I wonder if you could comment further on your
> concerns about licensing of R build dependencies, which were mentioned
> elsewhere.
>
> Thanks
>
> On Thu, Aug 10, 2017 at 2:12 PM, Wes McKinney <we...@gmail.com> wrote:
>> I started a discussion explaining the issue here:
>>
>> https://issues.apache.org/jira/browse/LEGAL-324
>>
>> On Thu, Aug 3, 2017 at 5:50 PM, Wes McKinney <we...@gmail.com> wrote:
>>> Thanks for weighing in on this, Hadley.
>>>
>>> To your point
>>>
>>>> You can distribute the package code according to its
>>>> license, but whenever you bundle it with R (i.e. to actually use it)
>>>> the GPL will apply to the whole conglomerate.
>>>
>>> If someone wanted to create an all-GPLv2 software distribution
>>> containing R and a bunch of libraries, then including the R Arrow
>>> library would be problematic as Apache 2.0 is not compatible
>>> (https://www.apache.org/licenses/GPL-compatibility.html). I don't
>>> think this is really a problem since R users generally just install
>>> things from CRAN.
>>>
>>> My understanding is that ASF legal has taken issue when an Apache
>>> project _cannot be used at all_ without a hard GPL dependency (outside
>>> certain exceptions, e.g. generated build files by GPL tools). This
>>> makes it impossible to create a self-contained software distribution
>>> of the project whose code and all dependencies are Apache 2.0
>>> compatible. There was the recent BSD+Patents discussion on LEGAL where
>>> projects were disallowed from using projects under that license as a
>>> hard dependency.
>>>
>>> I will open a LEGAL issue on the JIRA to discuss, but since the R
>>> portion of Arrow is an _optional_ part of the project, I am hopeful
>>> this will be deemed OK.
>>>
>>> - Wes
>>>
>>> On Thu, Aug 3, 2017 at 5:39 PM, Hadley Wickham <h....@gmail.com>
>>> wrote:
>>>> On Thu, Aug 3, 2017 at 8:15 AM, Wes McKinney <we...@gmail.com>
>>>> wrote:
>>>>> I can open a ticket to get a definitive answer to these questions.
>>>>>
>>>>> From http://www.apache.org/legal/resolved.html#platform and the
>>>>> subsequent questions there, I view the R language and build tools like
>>>>> Rcpp as part of the "R platform", which is, for the most part, all
>>>>> GPL. SparkR depends on R, but only has testthat (MIT) as a dependency
>>>>> beyond the R runtime. I think it is challenging to build high quality
>>>>> software for the R platform relying only on the main R runtime and the
>>>>> limited third party components which happen to be released under
>>>>> non-CategoryX licenses.
>>>>
>>>> Some legal advice is probably needed, but do also see this statement
>>>> from the R Foundation about package licenses:
>>>> https://stat.ethz.ch/pipermail/r-devel/2009-May/053248.html
>>>>
>>>> In general, the R community has taken the opinion that it is ok to
>>>> license code that links to R with non-GPL (but GPL-compatible)
>>>> licenses. You can distribute the package code according to its
>>>> license, but whenever you bundle it with R (i.e. to actually use it)
>>>> the GPL will apply to the whole conglomerate.
>>>>
>>>> So including an R arrow package would be fine according to the general
>>>> standards of the R community. The Apache legal counsel may of course
>>>> disagree.
>>>>
>>>> Hadley
>>>>
>>>> --
>>>> http://hadley.nz

Re: R arrow dependency on Rcpp?

Posted by Felix Cheung <fe...@hotmail.com>.
Thanks Wes. I think the discussion is in line with my understanding of the release of optional component as well.

Spark, which is often used as an example in various discussions, actually does not have GPL dependencies that are required to function at runtime. We have build and test dependencies (that's very hard to avoid) but they are not needed to install and run the package (besides R itself). We have some native C code in the source, but it is not required and not built (and not tested with, and to be honest, it has been more than 2 years, so it is not likely to work at all).

So going back to the context of Arrow and optional components: if the goal is to have R source that is released with Arrow as source, and that a user will need to make a choice to manually extract the R pieces and build/install manually, my interpretation is that will be ok.

If the goal is to make such component a package released to CRAN however, then my take is this becomes a release by itself and what is required for the package to function becomes the area for discussion, as per my understanding.


________________________________
From: Wes McKinney <we...@gmail.com>
Sent: Friday, August 11, 2017 7:29:29 AM
To: dev@arrow.apache.org
Cc: Felix Cheung
Subject: Re: R arrow dependency on Rcpp?

It seems that using Rcpp is fine because an R library for Arrow is an
optional component of the project, but will await more opinions on
LEGAL-324.

+ Felix Cheung -- I wonder if you could comment further on your
concerns about licensing of R build dependencies, which were mentioned
elsewhere.

Thanks

On Thu, Aug 10, 2017 at 2:12 PM, Wes McKinney <we...@gmail.com> wrote:
> I started a discussion explaining the issue here:
>
> https://issues.apache.org/jira/browse/LEGAL-324
>
> On Thu, Aug 3, 2017 at 5:50 PM, Wes McKinney <we...@gmail.com> wrote:
>> Thanks for weighing in on this, Hadley.
>>
>> To your point
>>
>>> You can distribute the package code according to its
>>> license, but whenever you bundle it with R (i.e. to actually use it)
>>> the GPL will apply to the whole conglomerate.
>>
>> If someone wanted to create an all-GPLv2 software distribution
>> containing R and a bunch of libraries, then including the R Arrow
>> library would be problematic as Apache 2.0 is not compatible
>> (https://www.apache.org/licenses/GPL-compatibility.html). I don't
>> think this is really a problem since R users generally just install
>> things from CRAN.
>>
>> My understanding is that ASF legal has taken issue when an Apache
>> project _cannot be used at all_ without a hard GPL dependency (outside
>> certain exceptions, e.g. generated build files by GPL tools). This
>> makes it impossible to create a self-contained software distribution
>> of the project whose code and all dependencies are Apache 2.0
>> compatible. There was the recent BSD+Patents discussion on LEGAL where
>> projects were disallowed from using projects under that license as a
>> hard dependency.
>>
>> I will open a LEGAL issue on the JIRA to discuss, but since the R
>> portion of Arrow is an _optional_ part of the project, I am hopeful
>> this will be deemed OK.
>>
>> - Wes
>>
>> On Thu, Aug 3, 2017 at 5:39 PM, Hadley Wickham <h....@gmail.com> wrote:
>>> On Thu, Aug 3, 2017 at 8:15 AM, Wes McKinney <we...@gmail.com> wrote:
>>>> I can open a ticket to get a definitive answer to these questions.
>>>>
>>>> From http://www.apache.org/legal/resolved.html#platform and the
>>>> subsequent questions there, I view the R language and build tools like
>>>> Rcpp as part of the "R platform", which is, for the most part, all
>>>> GPL. SparkR depends on R, but only has testthat (MIT) as a dependency
>>>> beyond the R runtime. I think it is challenging to build high quality
>>>> software for the R platform relying only on the main R runtime and the
>>>> limited third party components which happen to be released under
>>>> non-CategoryX licenses.
>>>
>>> Some legal advice is probably needed, but do also see this statement
>>> from the R Foundation about package licenses:
>>> https://stat.ethz.ch/pipermail/r-devel/2009-May/053248.html
>>>
>>> In general, the R community has taken the opinion that it is ok to
>>> license code that links to R with non-GPL (but GPL-compatible)
>>> licenses. You can distribute the package code according to its
>>> license, but whenever you bundle it with R (i.e. to actually use it)
>>> the GPL will apply to the whole conglomerate.
>>>
>>> So including an R arrow package would be fine according to the general
>>> standards of the R community. The Apache legal counsel may of course
>>> disagree.
>>>
>>> Hadley
>>>
>>> --
>>> http://hadley.nz

Re: R arrow dependency on Rcpp?

Posted by Wes McKinney <we...@gmail.com>.
It seems that using Rcpp is fine because an R library for Arrow is an
optional component of the project, but will await more opinions on
LEGAL-324.

+ Felix Cheung -- I wonder if you could comment further on your
concerns about licensing of R build dependencies, which were mentioned
elsewhere.

Thanks

On Thu, Aug 10, 2017 at 2:12 PM, Wes McKinney <we...@gmail.com> wrote:
> I started a discussion explaining the issue here:
>
> https://issues.apache.org/jira/browse/LEGAL-324
>
> On Thu, Aug 3, 2017 at 5:50 PM, Wes McKinney <we...@gmail.com> wrote:
>> Thanks for weighing in on this, Hadley.
>>
>> To your point
>>
>>> You can distribute the package code according to its
>>> license, but whenever you bundle it with R (i.e. to actually use it)
>>> the GPL will apply to the whole conglomerate.
>>
>> If someone wanted to create an all-GPLv2 software distribution
>> containing R and a bunch of libraries, then including the R Arrow
>> library would be problematic as Apache 2.0 is not compatible
>> (https://www.apache.org/licenses/GPL-compatibility.html). I don't
>> think this is really a problem since R users generally just install
>> things from CRAN.
>>
>> My understanding is that ASF legal has taken issue when an Apache
>> project _cannot be used at all_ without a hard GPL dependency (outside
>> certain exceptions, e.g. generated build files by GPL tools). This
>> makes it impossible to create a self-contained software distribution
>> of the project whose code and all dependencies are Apache 2.0
>> compatible. There was the recent BSD+Patents discussion on LEGAL where
>> projects were disallowed from using projects under that license as a
>> hard dependency.
>>
>> I will open a LEGAL issue on the JIRA to discuss, but since the R
>> portion of Arrow is an _optional_ part of the project, I am hopeful
>> this will be deemed OK.
>>
>> - Wes
>>
>> On Thu, Aug 3, 2017 at 5:39 PM, Hadley Wickham <h....@gmail.com> wrote:
>>> On Thu, Aug 3, 2017 at 8:15 AM, Wes McKinney <we...@gmail.com> wrote:
>>>> I can open a ticket to get a definitive answer to these questions.
>>>>
>>>> From http://www.apache.org/legal/resolved.html#platform and the
>>>> subsequent questions there, I view the R language and build tools like
>>>> Rcpp as part of the "R platform", which is, for the most part, all
>>>> GPL. SparkR depends on R, but only has testthat (MIT) as a dependency
>>>> beyond the R runtime. I think it is challenging to build high quality
>>>> software for the R platform relying only on the main R runtime and the
>>>> limited third party components which happen to be released under
>>>> non-CategoryX licenses.
>>>
>>> Some legal advice is probably needed, but do also see this statement
>>> from the R Foundation about package licenses:
>>> https://stat.ethz.ch/pipermail/r-devel/2009-May/053248.html
>>>
>>> In general, the R community has taken the opinion that it is ok to
>>> license code that links to R with non-GPL (but GPL-compatible)
>>> licenses. You can distribute the package code according to its
>>> license, but whenever you bundle it with R (i.e. to actually use it)
>>> the GPL will apply to the whole conglomerate.
>>>
>>> So including an R arrow package would be fine according to the general
>>> standards of the R community. The Apache legal counsel may of course
>>> disagree.
>>>
>>> Hadley
>>>
>>> --
>>> http://hadley.nz

Re: R arrow dependency on Rcpp?

Posted by Wes McKinney <we...@gmail.com>.
I started a discussion explaining the issue here:

https://issues.apache.org/jira/browse/LEGAL-324

On Thu, Aug 3, 2017 at 5:50 PM, Wes McKinney <we...@gmail.com> wrote:
> Thanks for weighing in on this, Hadley.
>
> To your point
>
>> You can distribute the package code according to its
>> license, but whenever you bundle it with R (i.e. to actually use it)
>> the GPL will apply to the whole conglomerate.
>
> If someone wanted to create an all-GPLv2 software distribution
> containing R and a bunch of libraries, then including the R Arrow
> library would be problematic as Apache 2.0 is not compatible
> (https://www.apache.org/licenses/GPL-compatibility.html). I don't
> think this is really a problem since R users generally just install
> things from CRAN.
>
> My understanding is that ASF legal has taken issue when an Apache
> project _cannot be used at all_ without a hard GPL dependency (outside
> certain exceptions, e.g. generated build files by GPL tools). This
> makes it impossible to create a self-contained software distribution
> of the project whose code and all dependencies are Apache 2.0
> compatible. There was the recent BSD+Patents discussion on LEGAL where
> projects were disallowed from using projects under that license as a
> hard dependency.
>
> I will open a LEGAL issue on the JIRA to discuss, but since the R
> portion of Arrow is an _optional_ part of the project, I am hopeful
> this will be deemed OK.
>
> - Wes
>
> On Thu, Aug 3, 2017 at 5:39 PM, Hadley Wickham <h....@gmail.com> wrote:
>> On Thu, Aug 3, 2017 at 8:15 AM, Wes McKinney <we...@gmail.com> wrote:
>>> I can open a ticket to get a definitive answer to these questions.
>>>
>>> From http://www.apache.org/legal/resolved.html#platform and the
>>> subsequent questions there, I view the R language and build tools like
>>> Rcpp as part of the "R platform", which is, for the most part, all
>>> GPL. SparkR depends on R, but only has testthat (MIT) as a dependency
>>> beyond the R runtime. I think it is challenging to build high quality
>>> software for the R platform relying only on the main R runtime and the
>>> limited third party components which happen to be released under
>>> non-CategoryX licenses.
>>
>> Some legal advice is probably needed, but do also see this statement
>> from the R Foundation about package licenses:
>> https://stat.ethz.ch/pipermail/r-devel/2009-May/053248.html
>>
>> In general, the R community has taken the opinion that it is ok to
>> license code that links to R with non-GPL (but GPL-compatible)
>> licenses. You can distribute the package code according to its
>> license, but whenever you bundle it with R (i.e. to actually use it)
>> the GPL will apply to the whole conglomerate.
>>
>> So including an R arrow package would be fine according to the general
>> standards of the R community. The Apache legal counsel may of course
>> disagree.
>>
>> Hadley
>>
>> --
>> http://hadley.nz

Re: R arrow dependency on Rcpp?

Posted by Wes McKinney <we...@gmail.com>.
Thanks for weighing in on this, Hadley.

To your point

> You can distribute the package code according to its
> license, but whenever you bundle it with R (i.e. to actually use it)
> the GPL will apply to the whole conglomerate.

If someone wanted to create an all-GPLv2 software distribution
containing R and a bunch of libraries, then including the R Arrow
library would be problematic as Apache 2.0 is not compatible
(https://www.apache.org/licenses/GPL-compatibility.html). I don't
think this is really a problem since R users generally just install
things from CRAN.

My understanding is that ASF legal has taken issue when an Apache
project _cannot be used at all_ without a hard GPL dependency (outside
certain exceptions, e.g. generated build files by GPL tools). This
makes it impossible to create a self-contained software distribution
of the project whose code and all dependencies are Apache 2.0
compatible. There was the recent BSD+Patents discussion on LEGAL where
projects were disallowed from using projects under that license as a
hard dependency.

I will open a LEGAL issue on the JIRA to discuss, but since the R
portion of Arrow is an _optional_ part of the project, I am hopeful
this will be deemed OK.

- Wes

On Thu, Aug 3, 2017 at 5:39 PM, Hadley Wickham <h....@gmail.com> wrote:
> On Thu, Aug 3, 2017 at 8:15 AM, Wes McKinney <we...@gmail.com> wrote:
>> I can open a ticket to get a definitive answer to these questions.
>>
>> From http://www.apache.org/legal/resolved.html#platform and the
>> subsequent questions there, I view the R language and build tools like
>> Rcpp as part of the "R platform", which is, for the most part, all
>> GPL. SparkR depends on R, but only has testthat (MIT) as a dependency
>> beyond the R runtime. I think it is challenging to build high quality
>> software for the R platform relying only on the main R runtime and the
>> limited third party components which happen to be released under
>> non-CategoryX licenses.
>
> Some legal advice is probably needed, but do also see this statement
> from the R Foundation about package licenses:
> https://stat.ethz.ch/pipermail/r-devel/2009-May/053248.html
>
> In general, the R community has taken the opinion that it is ok to
> license code that links to R with non-GPL (but GPL-compatible)
> licenses. You can distribute the package code according to its
> license, but whenever you bundle it with R (i.e. to actually use it)
> the GPL will apply to the whole conglomerate.
>
> So including an R arrow package would be fine according to the general
> standards of the R community. The Apache legal counsel may of course
> disagree.
>
> Hadley
>
> --
> http://hadley.nz

Re: R arrow dependency on Rcpp?

Posted by Hadley Wickham <h....@gmail.com>.
On Thu, Aug 3, 2017 at 8:15 AM, Wes McKinney <we...@gmail.com> wrote:
> I can open a ticket to get a definitive answer to these questions.
>
> From http://www.apache.org/legal/resolved.html#platform and the
> subsequent questions there, I view the R language and build tools like
> Rcpp as part of the "R platform", which is, for the most part, all
> GPL. SparkR depends on R, but only has testthat (MIT) as a dependency
> beyond the R runtime. I think it is challenging to build high quality
> software for the R platform relying only on the main R runtime and the
> limited third party components which happen to be released under
> non-CategoryX licenses.

Some legal advice is probably needed, but do also see this statement
from the R Foundation about package licenses:
https://stat.ethz.ch/pipermail/r-devel/2009-May/053248.html

In general, the R community has taken the opinion that it is ok to
license code that links to R with non-GPL (but GPL-compatible)
licenses. You can distribute the package code according to its
license, but whenever you bundle it with R (i.e. to actually use it)
the GPL will apply to the whole conglomerate.

So including an R arrow package would be fine according to the general
standards of the R community. The Apache legal counsel may of course
disagree.

Hadley

-- 
http://hadley.nz

Re: R arrow dependency on Rcpp?

Posted by Wes McKinney <we...@gmail.com>.
I can open a ticket to get a definitive answer to these questions.

From http://www.apache.org/legal/resolved.html#platform and the
subsequent questions there, I view the R language and build tools like
Rcpp as part of the "R platform", which is, for the most part, all
GPL. SparkR depends on R, but only has testthat (MIT) as a dependency
beyond the R runtime. I think it is challenging to build high-quality
software for the R platform relying only on the main R runtime and the
limited third-party components that happen to be released under
non-Category X licenses.

On Thu, Aug 3, 2017 at 4:02 AM, Uwe L. Korn <uw...@xhochy.com> wrote:
> For the license issues, I would either look at how Spark handled these
> licensing questions in SparkR or file a ticket in the LEGAL tracker in
> JIRA to get help from the ASF. Having a GPL runtime dependency looks
> very problematic to me in an Apache project.
>
> Uwe
>
>
> On Thu, Aug 3, 2017, at 12:19 AM, Wes McKinney wrote:
>> As some context: the R language runtime and a significant portion of
>> R's third party libraries are released under GPLv2 and/or GPLv3.
>>
>> Rcpp, a toolkit for developing C++ extensions for R, is dual-licensed
>> under GPLv2/GPLv3. It is a build and runtime dependency of many
>> packages (https://cran.r-project.org/web/packages/Rcpp/index.html). It
>> does not need to be vendored with a project or redistributed in any
>> form with Apache Arrow.
>>
>> Since R bindings would be an optional component of Arrow, this seems
>> less concerning from a licensing point of view, but I am not an expert
>> on the policies for project component dependencies.
>>
>> Thanks
>> Wes
>>
>> On Wed, Aug 2, 2017 at 6:11 PM, Clark Fitzgerald <cl...@gmail.com>
>> wrote:
>> > Rcpp provides a friendlier interface between R and C++, similar to Cython
>> > for Python.
>> >
>> > Do people have opinions on whether R / Arrow depends on Rcpp? Alternatively
>> > the bindings can be written directly using R's C++ API.
>> >
>> > Here's a JIRA for work on R / Arrow:
>> > https://issues.apache.org/jira/browse/ARROW-1325

Re: R arrow dependency on Rcpp?

Posted by "Uwe L. Korn" <uw...@xhochy.com>.
For the license issues, I would either look at how Spark handled these
licensing questions in SparkR or file a ticket in the LEGAL tracker in
JIRA to get help from the ASF. Having a GPL runtime dependency looks
very problematic to me in an Apache project.

Uwe


On Thu, Aug 3, 2017, at 12:19 AM, Wes McKinney wrote:
> As some context: the R language runtime and a significant portion of
> R's third party libraries are released under GPLv2 and/or GPLv3.
> 
> Rcpp, a toolkit for developing C++ extensions for R, is dual-licensed
> under GPLv2/GPLv3. It is a build and runtime dependency of many
> packages (https://cran.r-project.org/web/packages/Rcpp/index.html). It
> does not need to be vendored with a project or redistributed in any
> form with Apache Arrow.
> 
> Since R bindings would be an optional component of Arrow, this seems
> less concerning from a licensing point of view, but I am not an expert
> on the policies for project component dependencies.
> 
> Thanks
> Wes
> 
> On Wed, Aug 2, 2017 at 6:11 PM, Clark Fitzgerald <cl...@gmail.com>
> wrote:
> > Rcpp provides a friendlier interface between R and C++, similar to Cython
> > for Python.
> >
> > Do people have opinions on whether R / Arrow depends on Rcpp? Alternatively
> > the bindings can be written directly using R's C++ API.
> >
> > Here's a JIRA for work on R / Arrow:
> > https://issues.apache.org/jira/browse/ARROW-1325

Re: R arrow dependency on Rcpp?

Posted by Wes McKinney <we...@gmail.com>.
As some context: the R language runtime and a significant portion of
R's third party libraries are released under GPLv2 and/or GPLv3.

Rcpp, a toolkit for developing C++ extensions for R, is dual-licensed
under GPLv2/GPLv3. It is a build and runtime dependency of many
packages (https://cran.r-project.org/web/packages/Rcpp/index.html). It
does not need to be vendored with a project or redistributed in any
form with Apache Arrow.

Since R bindings would be an optional component of Arrow, this seems
less concerning from a licensing point of view, but I am not an expert
on the policies for project component dependencies.

Thanks
Wes

On Wed, Aug 2, 2017 at 6:11 PM, Clark Fitzgerald <cl...@gmail.com> wrote:
> Rcpp provides a friendlier interface between R and C++, similar to Cython
> for Python.
>
> Do people have opinions on whether R / Arrow depends on Rcpp? Alternatively
> the bindings can be written directly using R's C++ API.
>
> Here's a JIRA for work on R / Arrow:
> https://issues.apache.org/jira/browse/ARROW-1325

Re: R arrow dependency on Rcpp?

Posted by Hadley Wickham <h....@gmail.com>.
I think it's important to use Rcpp. Not using Rcpp will require a
bunch of extra work for little gain.

Hadley

On Wed, Aug 2, 2017 at 5:11 PM, Clark Fitzgerald <cl...@gmail.com> wrote:
> Rcpp provides a friendlier interface between R and C++, similar to Cython
> for Python.
>
> Do people have opinions on whether R / Arrow depends on Rcpp? Alternatively
> the bindings can be written directly using R's C++ API.
>
> Here's a JIRA for work on R / Arrow:
> https://issues.apache.org/jira/browse/ARROW-1325



-- 
http://hadley.nz