Posted to dev@arrow.apache.org by "Neal Richardson (Jira)" <ji...@apache.org> on 2019/10/04 18:07:00 UTC

[jira] [Created] (ARROW-6793) [R] Arrow C++ binary packaging for Linux

Neal Richardson created ARROW-6793:
--------------------------------------

             Summary: [R] Arrow C++ binary packaging for Linux
                 Key: ARROW-6793
                 URL: https://issues.apache.org/jira/browse/ARROW-6793
             Project: Apache Arrow
          Issue Type: Improvement
          Components: R
            Reporter: Neal Richardson
            Assignee: Neal Richardson
             Fix For: 1.0.0


Our current installation experience on Linux isn't ideal. Unless you've already installed the Arrow C++ library, installing the R package gives you a shell of a package that only tells you to install the C++ library. That approach was useful for getting the package on CRAN, which makes installation easy for macOS and Windows users, but it does nothing to improve the installation experience for Linux users. This is an impediment to adoption of arrow not only by users but also by package maintainers who might want to depend on arrow.

macOS and Windows have a better experience because at installation time, the configure scripts download and statically link a prebuilt C++ library. CRAN bundles the whole thing up and delivers that as a binary R package. 

Python wheels do a similar thing: they're binaries that contain all external dependencies. And there are pyarrow wheels for Linux. This suggests that we could do something similar for R: build a generic Linux binary of the C++ library and download it in the R package configure script at install time.

I experimented with using the Arrow C++ binaries included in the Python wheels in R. See discussion at the end of ARROW-5956. This worked on macOS (not useful for R, but it proved the concept) and almost worked on Linux, but it turned out that the "manylinux2010" standard is too archaic to work with contemporary Rcpp. 

Proposal: do a similar workflow to what the manylinux2010 pyarrow build does, just with slightly more modern compiler/settings. Publish that C++ binary package to bintray. Then download it in the R configure script if a local/system package isn't found.
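
A rough sketch of the decision order proposed above, written in Python for illustration only (an actual R configure script would be shell). The function names, the return labels, and the use of a pkg-config module called "arrow" to detect a system install are assumptions, not existing arrow code.

```python
import shutil
import subprocess
from typing import Optional


def system_arrow_available() -> bool:
    """Best-effort check for a local/system Arrow C++ install.

    Assumes the C++ library registers a pkg-config module named "arrow".
    """
    if shutil.which("pkg-config") is None:
        return False
    result = subprocess.run(["pkg-config", "--exists", "arrow"], check=False)
    return result.returncode == 0


def choose_libarrow_source(system_lib_found: bool,
                           binary_url: Optional[str]) -> str:
    """Pick where the C++ library comes from at install time.

    Order follows the proposal: a local/system package wins, then a
    prebuilt binary download, then the current shell-package fallback.
    """
    if system_lib_found:
        return "system"
    if binary_url is not None:
        return "download"
    return "shell-package"
```

Calling choose_libarrow_source(system_arrow_available(), url) would then drive the rest of the install, with url set to None whenever no compatible binary has been published.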

Once we have a basic version working, test against various distros on R-hub (https://builder.r-hub.io/advanced) to make sure we're solid everywhere, and/or make sure we keep the current fallback behavior when we encounter a distro that this doesn't work for. If necessary, we can make multiple flavors of this C++ binary for Debian, CentOS, etc.
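
If multiple flavors do turn out to be necessary, choosing one could look roughly like the sketch below. It is Python for illustration only and assumes the distro is identified from the ID and ID_LIKE fields of /etc/os-release. The flavor names in KNOWN_FLAVORS are hypothetical placeholders, and returning None stands for "use the current fallback behavior".

```python
from typing import Dict, Optional

# Hypothetical set of published binary flavors; nothing like this exists yet.
KNOWN_FLAVORS = ("debian", "centos")


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse KEY=value lines in the style of /etc/os-release."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in single or double quotes.
        fields[key] = value.strip().strip("\"'")
    return fields


def pick_flavor(fields: Dict[str, str]) -> Optional[str]:
    """Return a matching binary flavor, or None to signal the fallback."""
    candidates = [fields.get("ID", "")] + fields.get("ID_LIKE", "").split()
    for name in candidates:
        if name in KNOWN_FLAVORS:
            return name
    return None
```

Checking ID before ID_LIKE means an exact match is preferred, while a derivative such as Ubuntu (ID_LIKE=debian) would still pick up the Debian flavor.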



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Re: [jira] [Created] (ARROW-6793) [R] Arrow C++ binary packaging for Linux

Posted by Wes McKinney <we...@gmail.com>.
hi Thomas -- can you reply on the JIRA (ARROW-6793) or start a new
thread? Thanks

On Fri, Oct 4, 2019 at 4:53 PM Thomas S <th...@gmail.com> wrote:
>
> Very recently I had the pleasure of installing arrow on Linux. At this stage
> let me first remark that without the help of @xhochy and @kou I certainly
> would have failed. I have now managed to install it (though still with quite
> a lot of warning messages) in a rocker container. I have published the
> Docker image here:
>
> https://hub.docker.com/r/tschm/rocker-arrow
>
> Maybe one of the experts could fix and/or improve it? Many thanks
>
> Thomas

Re: [jira] [Created] (ARROW-6793) [R] Arrow C++ binary packaging for Linux

Posted by Thomas S <th...@gmail.com>.
Very recently I had the pleasure of installing arrow on Linux. At this stage
let me first remark that without the help of @xhochy and @kou I certainly
would have failed. I have now managed to install it (though still with quite
a lot of warning messages) in a rocker container. I have published the
Docker image here:

https://hub.docker.com/r/tschm/rocker-arrow

Maybe one of the experts could fix and/or improve it? Many thanks

Thomas





-- 
Dr. Thomas Schmelzer
*post: *Rue Louis-de-Savoie 60, 1110 Morges, Switzerland
*mobile:* +41 786 928 942
*skype: *thomas.schmelzer