Posted to issues@arrow.apache.org by "Jeffrey Wong (JIRA)" <ji...@apache.org> on 2019/01/22 04:36:00 UTC

[jira] [Updated] (ARROW-4316) Reusing arrow.so for both Python and R

     [ https://issues.apache.org/jira/browse/ARROW-4316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeffrey Wong updated ARROW-4316:
--------------------------------
    Description: 
My team uses both pyarrow and R arrow, and we'd like both libraries to link to the same arrow.so file for consistency. pyarrow ships both arrow.so and parquet.so; if I could reuse those .so files to link R, that would guarantee consistency.
 Under arrow v0.11.1 I was able to link R against the libarrow.so found under pyarrow by passing LIB_DIR to the R [configure file|https://github.com/apache/arrow/blob/master/r/configure]. However, in v0.12.0 I am no longer able to do that.
 Reproducible example on Ubuntu 16.04 which produces the error:
{code:java}
 # get the parquet headers which are not shipped with pyarrow
  
 tee /etc/apt/sources.list.d/apache-arrow.list <<APT_LINE
 deb [arch=amd64] https://dl.bintray.com/apache/arrow/$(lsb_release --id --short | tr 'A-Z' 'a-z')/ $(lsb_release --codename --short) main
 deb-src [] https://dl.bintray.com/apache/arrow/$(lsb_release --id --short | tr 'A-Z' 'a-z')/ $(lsb_release --codename --short) main
 APT_LINE
 apt-get update
 mkdir -p /tmp/arrow_headers && cd /tmp/arrow_headers
 apt-get download --allow-unauthenticated libparquet-dev
 ar -x libparquet-dev_0.12.0-1_amd64.deb
 tar -xJvf data.tar.xz
  
 #get pyarrow v0.12
  
 pip3 install pyarrow --upgrade
 #figure out where pyarrow is
 PY_ARROW_PATH=$(python3 -c "import pyarrow, os; print(os.path.dirname(pyarrow.__file__))")
 PY_ARROW_VERSION=$(python3 -c "import pyarrow; print(pyarrow.__version__)")
 PYTHON_LIBDIR=$(python3 -c "import sysconfig; print(sysconfig.get_config_var('LIBDIR'))")
  

 # pyarrow doesn't ship parquet headers. Copy the ones from apt into the pyarrow dir
 mkdir $PY_ARROW_PATH/include/parquet
 cp -r /tmp/arrow_headers/usr/include/parquet/* $PY_ARROW_PATH/include/parquet/
  
 #install R arrow
 echo "export LD_LIBRARY_PATH=\"\${LD_LIBRARY_PATH}:${PYTHON_LIBDIR}:${PY_ARROW_PATH}\"" | tee -a /usr/lib/R/etc/ldpaths
 git clone https://github.com/apache/arrow.git /tmp/arrow
 cd /tmp/arrow/r
 git checkout "apache-arrow-${PY_ARROW_VERSION}"
 sed -i "/Depends: R/c\Depends: R (>= 3.4)" DESCRIPTION
 sed -i "s/PKG_CXXFLAGS=/PKG_CXXFLAGS=-D_GLIBCXX_USE_CXX11_ABI=0 /g" src/Makevars.in
 R CMD INSTALL ./ --configure-vars="INCLUDE_DIR=$PY_ARROW_PATH/include LIB_DIR=$PY_ARROW_PATH" {code}
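
Once an install does succeed, one way to confirm that R and pyarrow really share the same library is to inspect what the R package's shared object resolves to at load time. The following is only a rough sketch: the helper name resolved_libs and the R_ARROW_SO path are hypothetical and not part of the steps above, and it assumes a Linux system where ldd is available.

```shell
#!/bin/sh
# Filter `ldd` output (read from stdin) down to the resolved paths of the
# libraries whose name starts with the given prefix.
# Libraries the loader cannot find are reported as "not found".
resolved_libs() {
  awk -v prefix="$1" '
    index($1, prefix) == 1 && $2 == "=>" {
      print ($3 == "not" ? "not found" : $3)
    }'
}

# Hypothetical usage. R_ARROW_SO is an assumed location; adjust it to
# wherever your R library actually installs the arrow package.
# R_ARROW_SO=/usr/local/lib/R/site-library/arrow/libs/arrow.so
# ldd "$R_ARROW_SO" | resolved_libs libarrow
# ldd "$R_ARROW_SO" | resolved_libs libparquet
```

If the printed paths sit under $PY_ARROW_PATH, R is loading the copies that pyarrow ships. Note that the libarrow prefix also matches libarrow_python, should that library be linked.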


> Reusing arrow.so for both Python and R
> --------------------------------------
>
>                 Key: ARROW-4316
>                 URL: https://issues.apache.org/jira/browse/ARROW-4316
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python, R
>    Affects Versions: 0.12.0
>         Environment: Ubuntu 16.04, R 3.4.4, pyarrow 0.12, cmake 3.12
>            Reporter: Jeffrey Wong
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)