Posted to jira@arrow.apache.org by "Rodrigo Tobar (Jira)" <ji...@apache.org> on 2021/04/28 13:42:00 UTC

[jira] [Created] (ARROW-12585) Published apt packages incompatible with pip binary wheels

Rodrigo Tobar created ARROW-12585:
-------------------------------------

             Summary: Published apt packages incompatible with pip binary wheels
                 Key: ARROW-12585
                 URL: https://issues.apache.org/jira/browse/ARROW-12585
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++, Packaging, Python
    Affects Versions: 4.0.0
            Reporter: Rodrigo Tobar


We have a shared library that uses the shared {{libarrow}} and {{libplasma}} libraries. Our shared library is eventually loaded by a Python process in which we also use {{pyarrow}}. To avoid compiling arrow/plasma ourselves, we install the {{libarrow-dev}} and {{libplasma-dev}} apt packages (as per the official [instructions|https://arrow.apache.org/install/]) and the binary wheel of {{pyarrow}}.

Each installation method brings its own copy of {{libarrow.so.400}}, and it turns out the two copies are not equivalent: the one bundled with {{pyarrow}} was most probably compiled with an older gcc version than the one installed via apt, which is compiled using the newer CXX11 ABI from libstdc++. This wouldn't have any visible effect, except that {{std::string}} (and maybe other affected types) appears in some arrow API functions. Because mangled symbol names depend on the ABI in use, the two copies of {{libarrow.so.400}} end up exporting differently named symbols for those functions.
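To illustrate the naming difference, here is a minimal Python sketch that classifies mangled C++ symbol names by ABI. It assumes the usual libstdc++ behaviour that names built with the CXX11 ABI carry the {{__cxx11}} inline namespace or the {{cxx11}} abi tag; the function {{f}} and the helper names are hypothetical and are not part of arrow.

```python
# Substrings that show up in Itanium-mangled names when the libstdc++
# CXX11 ABI is in use: the std::__cxx11 inline namespace and the abi tag.
CXX11_MARKERS = ("__cxx11", "B5cxx11")


def uses_cxx11_abi(mangled_name):
    """Return True if the mangled name refers to a CXX11-ABI type."""
    return any(marker in mangled_name for marker in CXX11_MARKERS)


def split_by_abi(mangled_names):
    """Split symbol names into (cxx11_abi, other) lists, keeping order."""
    cxx11 = [n for n in mangled_names if uses_cxx11_abi(n)]
    other = [n for n in mangled_names if not uses_cxx11_abi(n)]
    return cxx11, other


# Hypothetical function `void f(const std::string&)` mangled under each ABI.
OLD_ABI_SYMBOL = "_Z1fRKSs"
NEW_ABI_SYMBOL = "_Z1fRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE"
```

Feeding the helper the symbol names listed by a tool such as {{nm -D}} for each copy of {{libarrow.so.400}} should show whether one copy exports {{__cxx11}} names and the other does not.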

Back to our shared library, which we load in a Python process. If {{pyarrow}} has already been imported at that point, then *its* copy of {{libarrow.so.400}} is already in memory, and loading our shared library doesn't load the "apt" copy of {{libarrow.so.400}}. In other words, our library never triggers the loading of the copy of {{libarrow.so.400}} that it was compiled against, and if it refers to one of the symbols whose name differs, it fails to load because of the missing symbol.
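One way to check which copy of {{libarrow.so.400}} a process has actually mapped is to inspect {{/proc/self/maps}}. The following is a rough, Linux-only sketch; the helper names and the sample paths are made up for illustration.

```python
def mapped_libraries(maps_text, name_prefix="libarrow.so"):
    """Return the sorted, distinct file paths in a /proc/<pid>/maps dump
    whose file name starts with name_prefix."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no pathname
        path = fields[5].strip()
        if path.rsplit("/", 1)[-1].startswith(name_prefix):
            paths.add(path)
    return sorted(paths)


def mapped_in_this_process(name_prefix="libarrow.so"):
    """Same as mapped_libraries, but for the current process (Linux only)."""
    with open("/proc/self/maps") as maps_file:
        return mapped_libraries(maps_file.read(), name_prefix)


# Illustrative dump with hypothetical addresses and paths.
SAMPLE_MAPS = """\
7f1c00000000-7f1c00400000 r-xp 00000000 08:01 123456 /usr/local/lib/python3.8/site-packages/pyarrow/libarrow.so.400
7f1c00400000-7f1c00500000 rw-p 00400000 08:01 123456 /usr/local/lib/python3.8/site-packages/pyarrow/libarrow.so.400
7f1c10000000-7f1c10021000 rw-p 00000000 00:00 0 [heap]
7f1c20000000-7f1c20021000 rw-p 00000000 00:00 0
"""
```

Calling {{mapped_in_this_process()}} right after {{import pyarrow}} would be expected to list only the copy shipped inside the wheel.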

I've attached a fairly minimal example: a Dockerfile prepares a system with libarrow-dev from apt and a binary pyarrow wheel from PyPI. It then compiles a shared library against libarrow-dev. The container's default command is a small test that runs python and loads the example shared library, both with and without importing pyarrow first. When pyarrow is imported first, a missing symbol error occurs and the shared library fails to load.
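A test along these lines could look roughly like the sketch below. It is not the attached script; {{try_load}} and the library path in the usage note are hypothetical, and it only assumes that {{ctypes.CDLL}} raises {{OSError}} when the dynamic loader cannot resolve a library or one of its symbols.

```python
import ctypes


def try_load(path):
    """Try to load a shared library.

    Returns (True, None) on success, or (False, message) when the dynamic
    loader rejects it, e.g. because of an undefined symbol.
    """
    try:
        ctypes.CDLL(path)
    except OSError as exc:
        return False, str(exc)
    return True, None
```

To compare both cases, one would call {{try_load("./libexample.so")}} in a fresh Python process, and then again in another fresh process after {{import pyarrow}}. Separate processes matter here, since a library stays mapped once it has been loaded.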

I've experienced this on an Ubuntu-based Linux distro with Arrow 4.0.0, but I'd assume it happens with other distros and versions too.

The workaround we are using at the moment is simple: we install a pyarrow version that is different from the arrow version installed via apt. We are lucky that we can run in this mixed-version, multiple-libraries-loaded scenario, but this might not work for everyone.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)