Posted to issues@arrow.apache.org by "Jeffrey Wong (JIRA)" <ji...@apache.org> on 2019/01/22 05:50:00 UTC
[jira] [Commented] (ARROW-4316) Reusing arrow.so for both Python and R
[ https://issues.apache.org/jira/browse/ARROW-4316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16748403#comment-16748403 ]
Jeffrey Wong commented on ARROW-4316:
-------------------------------------
I just built arrow 0.12.0 from github using the tag apache-arrow-0.12.0. I compiled arrow with the flags
cmake .. -DCMAKE_BUILD_TYPE=Release -DARROW_PARQUET=ON -DARROW_PYTHON=ON
make install
which produces the .so files in /usr/local/lib. I noticed that this libarrow.so is 8.3 MB, but the one shipped with pyarrow is 9.7 MB. What else is inside pyarrow's .so files? When I link against the .so files built from the GitHub source (and remove the command
sed -i "s/PKG_CXXFLAGS=/PKG_CXXFLAGS=-D_GLIBCXX_USE_CXX11_ABI=0 /g" src/Makevars.in
from my script), the R package builds successfully. To debug further, I think I need to know the difference between the .so files in pyarrow and the .so files built from source.
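One possible starting point for that comparison is to check which libstdc++ ABI each library was compiled against, since the sed command forces -D_GLIBCXX_USE_CXX11_ABI=0. The following is only a sketch under stated assumptions, not a definitive diagnosis: abi_from_symbols and check_lib are hypothetical helper names, the library paths in the comments are guesses, and the check assumes that code built with the new ABI exports or imports mangled names containing __cxx11. A library that never touches std::string or std::list in its dynamic symbols would report "no-cxx11-symbols" however it was built, so that result should be read as inconclusive.

```shell
# Hypothetical helper: reads symbol names on stdin and reports whether any
# of them carry the __cxx11 tag used by libstdc++'s new (C++11) ABI.
abi_from_symbols() {
  if grep -q '__cxx11'; then
    echo "cxx11-abi"
  else
    echo "no-cxx11-symbols"
  fi
}

# Print the ABI guess for one shared library (requires nm from binutils).
# Both defined and undefined dynamic symbols are scanned, because a library
# built with the new ABI typically imports __cxx11 symbols from libstdc++.
check_lib() {
  printf '%s: ' "$1"
  nm -D "$1" | abi_from_symbols
}

# Example invocations (paths are assumptions; adjust to your install):
#   check_lib /usr/local/lib/libarrow.so
#   for f in "$PY_ARROW_PATH"/libarrow.so*; do check_lib "$f"; done
```

Running readelf -d on each file and comparing the NEEDED entries may additionally show whether the two builds depend on different shared libraries, which could be one contributor to the size difference.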
> Reusing arrow.so for both Python and R
> --------------------------------------
>
> Key: ARROW-4316
> URL: https://issues.apache.org/jira/browse/ARROW-4316
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python, R
> Affects Versions: 0.12.0
> Environment: Ubuntu 16.04, R 3.4.4, pyarrow 0.12, cmake 3.12
> Reporter: Jeffrey Wong
> Priority: Major
>
> My team uses both pyarrow and R arrow, and we'd like both libraries to link to the same arrow.so file for consistency. pyarrow ships both arrow.so and parquet.so; if I can reuse those .so files to link R, that would guarantee consistency.
> Under arrow v0.11.1 I was able to link R against libarrow.so found under pyarrow by passing LIB_DIR to the R [configure file|https://github.com/apache/arrow/blob/master/r/configure]. However, in v0.12.0 I am no longer able to do that. Here is a reproducible example on Ubuntu 16.04 which produces the error:
>
> {code:java}
> sh: line 1: 5404 Segmentation fault (core dumped) '/usr/lib/R/bin/R' --no-save --slave 2>&1 < '/tmp/RtmpyOuz4g/file14716feda8fc'
> *** caught segfault ***
> address 0x7f160f026250, cause 'invalid permissions'
> An irrecoverable exception occurred. R is aborting now ...
> {code}
>
> Reproducible example:
> {code:java}
> # get the parquet headers which are not shipped with pyarrow
>
> tee /etc/apt/sources.list.d/apache-arrow.list <<APT_LINE
> deb [arch=amd64] https://dl.bintray.com/apache/arrow/$(lsb_release --id --short | tr 'A-Z' 'a-z')/ $(lsb_release --codename --short) main
> deb-src [] https://dl.bintray.com/apache/arrow/$(lsb_release --id --short | tr 'A-Z' 'a-z')/ $(lsb_release --codename --short) main
> APT_LINE
> apt-get update
> mkdir /tmp/arrow_headers; cd /tmp/arrow_headers
> apt-get download --allow-unauthenticated libparquet-dev
> ar -x libparquet-dev_0.12.0-1_amd64.deb
> tar -xJvf data.tar.xz
>
> #get pyarrow v0.12
>
> pip3 install pyarrow --upgrade
> #figure out where pyarrow is
> PY_ARROW_PATH=$(python3 -c "import pyarrow, os; print(os.path.dirname(pyarrow.__file__))")
> PY_ARROW_VERSION=$(python3 -c "import pyarrow; print(pyarrow.__version__)")
> PYTHON_LIBDIR=$(python3 -c "import sysconfig; print(sysconfig.get_config_var('LIBDIR'))")
>
> # pyarrow doesn't ship parquet headers. Copy the ones from apt into the pyarrow dir
> mkdir $PY_ARROW_PATH/include/parquet
> cp -r /tmp/arrow_headers/usr/include/parquet/* $PY_ARROW_PATH/include/parquet/
>
> #install R arrow
> echo "export LD_LIBRARY_PATH=\"\${LD_LIBRARY_PATH}:${PYTHON_LIBDIR}:${PY_ARROW_PATH}\"" | tee -a /usr/lib/R/etc/ldpaths
> git clone https://github.com/apache/arrow.git /tmp/arrow
> cd /tmp/arrow/r
> git checkout "apache-arrow-${PY_ARROW_VERSION}"
> sed -i "/Depends: R/c\Depends: R (>= 3.4)" DESCRIPTION
> sed -i "s/PKG_CXXFLAGS=/PKG_CXXFLAGS=-D_GLIBCXX_USE_CXX11_ABI=0 /g" src/Makevars.in
> R CMD INSTALL ./ --configure-vars="INCLUDE_DIR=$PY_ARROW_PATH/include LIB_DIR=$PY_ARROW_PATH"
> {code}