Posted to issues@arrow.apache.org by "Neal Richardson (Jira)" <ji...@apache.org> on 2019/09/19 16:57:00 UTC

[jira] [Commented] (ARROW-5956) [R] Ability for R to link to C++ libraries from pyarrow Wheel

    [ https://issues.apache.org/jira/browse/ARROW-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933587#comment-16933587 ] 

Neal Richardson commented on ARROW-5956:
----------------------------------------

It turns out that you can't (safely) use the pyarrow wheel in R on Linux, though you can on macOS. See the discussion starting here: https://github.com/apache/arrow/pull/5408#issuecomment-532438681

You can fix the dyn.load error you originally reported by setting {{ARROW_USE_OLD_CXXABI=1}} (https://github.com/apache/arrow/blob/master/r/configure#L99-L102). That lets the package install and load, but then any C++ error status that leads to {{Rcpp::stop()}} being called causes a core dump. Our analysis led us to conclude that the problem is a mismatch between the C++ standard library version used in the (dated) manylinux2010 wheel build and the more recent one used by the host OS and by Rcpp. Apparently Rcpp relies on more modern C++ conventions than the pyarrow wheel build uses.

So, it seems that right now, if you want to use the same .so for Python and R on Linux, your options are:

1. Install the C++ library system packages and install pyarrow and the R package from source locally, linking to that;
2. Build everything locally;
3. Use conda.

Once manylinux2014 happens, maybe the wheels will be more suitable and we can try again.

> [R] Ability for R to link to C++ libraries from pyarrow Wheel
> -------------------------------------------------------------
>
>                 Key: ARROW-5956
>                 URL: https://issues.apache.org/jira/browse/ARROW-5956
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: R
>         Environment: Ubuntu 16.04, R 3.4.4, python 3.6.5
>            Reporter: Jeffrey Wong
>            Priority: Major
>
> I have installed pyarrow 0.14.0 and would also like to use R arrow. In my work I use rpy2 a lot to exchange Python data structures with R data structures, so I would like R arrow to link against the exact same .so files found in pyarrow.
>  
>  
> When I pass INCLUDE_DIR and LIB_DIR to R's configure, pointing to pyarrow's include directory and pyarrow's root directory respectively, I am able to compile R's arrow.so file. However, I am unable to load it in an R session, getting the error:
>  
> {code:java}
> > dyn.load('arrow.so')
> Error in dyn.load("arrow.so") :
>  unable to load shared object '/tmp/arrow2/r/src/arrow.so':
>  /tmp/arrow2/r/src/arrow.so: undefined symbol: _ZNK5arrow11StructArray14GetFieldByNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE{code}
>  
>  
> Steps to reproduce:
>  
> Install pyarrow, which also ships libarrow and libparquet as versioned .so.14 files, and add unversioned libarrow.so and libparquet.so symlinks:
>  
> {code:java}
> pip3 install pyarrow --upgrade --user
> PY_ARROW_PATH=$(python3 -c "import pyarrow, os; print(os.path.dirname(pyarrow.__file__))")
> PY_ARROW_VERSION=$(python3 -c "import pyarrow; print(pyarrow.__version__)")
> ln -s $PY_ARROW_PATH/libarrow.so.14 $PY_ARROW_PATH/libarrow.so
> ln -s $PY_ARROW_PATH/libparquet.so.14 $PY_ARROW_PATH/libparquet.so
> {code}
>  
>  
> Add the pyarrow directory to LD_LIBRARY_PATH for R, RStudio Server, and the current shell:
>  
> {code:java}
> sudo tee -a /usr/lib/R/etc/ldpaths <<LINES
> LD_LIBRARY_PATH="\${LD_LIBRARY_PATH}:$PY_ARROW_PATH"
> export LD_LIBRARY_PATH
> LINES
> sudo tee -a /usr/lib/rstudio-server/bin/r-ldpath <<LINES
> LD_LIBRARY_PATH="\${LD_LIBRARY_PATH}:$PY_ARROW_PATH"
> export LD_LIBRARY_PATH
> LINES
> export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:$PY_ARROW_PATH"
> {code}
>  
>  
> Install R arrow from source:
> {code:java}
> git clone https://github.com/apache/arrow.git /tmp/arrow2
> cd /tmp/arrow2/r
> git checkout tags/apache-arrow-0.14.0
> R CMD INSTALL ./ --configure-vars="INCLUDE_DIR=$PY_ARROW_PATH/include LIB_DIR=$PY_ARROW_PATH"{code}
>  
> I have noticed that the R package for arrow no longer has an RcppExports file but instead has an arrowExports file. Could it be that the lack of RcppExports has made it difficult to find GetFieldByName?
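To check which {{std::string}} ABI a given pyarrow wheel's libarrow was built with, you can list the symbols it exports and look for {{GetFieldByName}}. This is a minimal sketch assuming GNU binutils ({{nm}}, {{c++filt}}) and a pip-installed pyarrow. It uses the unversioned libarrow.so symlink from the reproduction steps if present and otherwise falls back to a versioned libarrow.so.* file. If the demangled signature shows {{std::basic_string}} rather than {{std::__cxx11::basic_string}}, the library uses the old ABI, and an R package compiled with the new one cannot resolve that symbol.

```shell
#!/bin/sh
# Locate the installed pyarrow package, as in the reproduction steps.
PY_ARROW_PATH=$(python3 -c "import pyarrow, os; print(os.path.dirname(pyarrow.__file__))")

# Prefer the unversioned symlink; otherwise take the first versioned libarrow.so.*.
LIB="$PY_ARROW_PATH/libarrow.so"
if [ ! -e "$LIB" ]; then
  LIB=$(ls "$PY_ARROW_PATH"/libarrow.so.* | head -n 1)
fi

# List the dynamic symbols the library defines, demangle them,
# and keep only the GetFieldByName overloads.
nm -D --defined-only "$LIB" | c++filt | grep 'GetFieldByName'
```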



--
This message was sent by Atlassian Jira
(v8.3.4#803005)