Posted to jira@arrow.apache.org by "Dewey Dunnington (Jira)" <ji...@apache.org> on 2022/12/12 14:50:00 UTC

[jira] [Commented] (ARROW-17652) [R] R Arrow install fails at Thrift build step

    [ https://issues.apache.org/jira/browse/ARROW-17652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17646132#comment-17646132 ] 

Dewey Dunnington commented on ARROW-17652:
------------------------------------------

ARROW-17967 also reported problems with Thrift. Would it be possible to check whether this issue still exists with Arrow 10.0.0?

> [R] R Arrow install fails at Thrift build step
> ----------------------------------------------
>
>                 Key: ARROW-17652
>                 URL: https://issues.apache.org/jira/browse/ARROW-17652
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 9.0.0
>         Environment: Amazon Linux 2, R 4.1, Arrow 9.0.0
>            Reporter: Adam Giles
>            Priority: Major
>         Attachments: error.log
>
>
> We use EC2 machines to read Parquet datasets from S3, and one user cannot get this to work. Initially, reading Parquet files from S3 failed with what looked like an AWS credentials error:
> {code:R}
> Error: IOError: When getting information for key 'X' in bucket 'Y': AWS Error [code 15]: No response body.
> {code}
> None of the AWS CLI, the `pyarrow` package in Python, or other R packages (e.g. `paws`) have a permissions problem. I assumed the cause was an Arrow installation built without S3 support. Several attempts to reinstall from source all failed while building Thrift, which in turn seemed to be caused by the Boost libraries not being found (see the attached error.log).
> Our usual approach, which worked on other machines:
> {code:R}
> Sys.setenv("LIBARROW_MINIMAL" = FALSE)
> install.packages("arrow")
> {code}
> We also tried to use a prebuilt Arrow binary, probably not in the right way, and the install appears to fall back to building from source:
> {code:R}
> Sys.setenv("LIBARROW_BINARY" = TRUE) # I'm not sure this is actually the correct use
> Sys.setenv("LIBARROW_MINIMAL" = FALSE)
> install.packages("arrow")
> {code}
> Following advice from another issue that I can no longer find, I tried specifying the compilers explicitly:
> {code:R}
> Sys.setenv(CC="/usr/bin/gcc")
> Sys.setenv(CXX="/usr/bin/g++")
> Sys.setenv(LIBARROW_MINIMAL="FALSE")
> Sys.setenv(LIBARROW_BINARY="FALSE")
> Sys.setenv(ARROW_R_DEV="TRUE")
> install.packages("arrow")
> {code}
> We've also tried installing the missing dependencies with `yum`:
> {code:sh}
> yum install -y libcurl-devel
> yum install -y openssl-devel
> yum install -y thrift
> yum install -y boost boost-thread boost-devel
> {code}
> The build still fails at the same point.
> If we install the CentOS 7 binary from the RSPM repository (we are running Amazon Linux 2), the package appears to install successfully, but we still get the same AWS error when reading from an S3 bucket.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)