Posted to jira@arrow.apache.org by "Kouhei Sutou (Jira)" <ji...@apache.org> on 2021/05/01 21:53:00 UTC

[jira] [Closed] (ARROW-12585) [Packaging][C++][Python] Published apt packages incompatible with pip binary wheels

     [ https://issues.apache.org/jira/browse/ARROW-12585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kouhei Sutou closed ARROW-12585.
--------------------------------
    Resolution: Not A Problem

Thanks for sharing what you tried.
I'm closing this.

> [Packaging][C++][Python] Published apt packages incompatible with pip binary wheels
> -----------------------------------------------------------------------------------
>
>                 Key: ARROW-12585
>                 URL: https://issues.apache.org/jira/browse/ARROW-12585
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Packaging, Python
>    Affects Versions: 4.0.0
>            Reporter: Rodrigo Tobar
>            Priority: Major
>         Attachments: example.tar.gz
>
>
> We have a shared library that uses the shared {{libarrow}} and {{libplasma}} libraries. Our shared library is eventually loaded by a Python process that also uses {{pyarrow}}. To avoid compiling Arrow/Plasma ourselves, we install the {{libarrow-dev}} and {{libplasma-dev}} apt packages (as per the official [instructions|https://arrow.apache.org/install/]) and the {{pyarrow}} binary wheel.
> Each method brings its own copy of {{libarrow.so.400}}, and it turns out the two libraries are not equal: the copy bundled with {{pyarrow}} was most probably compiled with an older gcc version, while the apt copy is compiled with the newer CXX11 ABI of libstdc++. This wouldn't have any visible effect, except that {{std::string}} (and maybe other affected types) is used in some Arrow API entry points. Because of this ABI difference, the two builds of {{libarrow.so.400}} end up containing differently named (mangled) symbols.
> Back to our shared library: we load it in a Python process. If {{pyarrow}} has already been imported at that point, *its* copy of {{libarrow.so.400}} is already in memory, and loading our shared library doesn't load the apt copy of {{libarrow.so.400}}. In other words, our library doesn't trigger loading of the copy of {{libarrow.so.400}} it was compiled against, and if it refers to one of the symbols whose name changed, it fails to load because that symbol is missing.
> I've attached a fairly minimal example: a Dockerfile prepares a system with {{libarrow-dev}} from apt and a binary {{pyarrow}} wheel from PyPI, then compiles a shared library against {{libarrow-dev}}. The command run by default by the container is a small test that runs Python and loads the example shared library, both with and without importing {{pyarrow}} first. When {{pyarrow}} is imported first, a missing-symbol error occurs and the shared library fails to load.
> I've experienced this on an Ubuntu-based Linux distro with Arrow 4.0.0, but I'd assume it also happens on other distros and versions.
> The workaround we are using at the moment is simple: we install a {{pyarrow}} version that differs from the Arrow version installed via apt. We are lucky that we can run in this mixed-version, multiple-libraries-loaded scenario, but that might not be possible for everyone.
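
The report attributes the failure to the two {{libarrow.so.400}} builds exporting differently named symbols. A minimal sketch (not part of the original report) of how one might check which libstdc++ ABI a given copy was built with follows. It is only a heuristic, assuming that under the CXX11 ABI {{std::string}} parameters mangle as {{std::__cxx11::basic_string}} and functions returning {{std::string}} carry the {{[abi:cxx11]}} tag ({{B5cxx11}} when mangled), while the old ABI uses the {{Ss}} abbreviation. It also assumes GNU binutils {{nm}} is on the PATH; the helper names are hypothetical.

```python
import subprocess
from typing import Iterable, List

# Heuristic marker for libstdc++'s CXX11 ABI: std::string parameters mangle as
# std::__cxx11::basic_string, and functions returning std::string carry the
# [abi:cxx11] tag ("B5cxx11"). Old-ABI builds use the "Ss" abbreviation instead.
CXX11_MARKER = "cxx11"


def uses_cxx11_abi(symbols: Iterable[str]) -> bool:
    """Heuristic: True if any exported symbol mentions the CXX11 ABI."""
    return any(CXX11_MARKER in name for name in symbols)


def exported_symbols(lib_path: str) -> List[str]:
    """List the dynamic symbols defined by lib_path, using GNU binutils' nm."""
    out = subprocess.run(
        ["nm", "-D", "--defined-only", lib_path],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    # Each line looks like "<address> <type> <name>"; keep the name column.
    return [line.split()[-1] for line in out.splitlines() if line.strip()]
```

For example, {{uses_cxx11_abi(exported_symbols(path))}} could be run once on the apt copy (typically under {{/usr/lib/x86_64-linux-gnu/}} on amd64 Debian/Ubuntu) and once on the copy bundled in the installed {{pyarrow}} package; if the report's diagnosis holds, the two results should differ. A library whose exported interface never mentions {{std::string}} would report False regardless of ABI, which is why this is only a heuristic.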

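The report explains that once {{pyarrow}} is imported, its copy of {{libarrow.so.400}} is the one in memory. On Linux, one way to confirm which copy a process actually has mapped is to inspect {{/proc/self/maps}}. The sketch below is an illustration, not part of the original report; {{loaded_copies}} is a hypothetical helper that optionally takes the maps text as a parameter so it can be exercised without a live process.

```python
import os
from typing import Optional, Set


def loaded_copies(prefix: str, maps_text: Optional[str] = None) -> Set[str]:
    """Return paths of mapped files whose basename starts with prefix.

    Reads /proc/self/maps (Linux only) unless maps_text is supplied.
    """
    if maps_text is None:
        with open("/proc/self/maps") as f:
            maps_text = f.read()
    paths = set()
    for line in maps_text.splitlines():
        # Columns: address perms offset dev inode [pathname];
        # anonymous mappings have no pathname column.
        fields = line.split(maxsplit=5)
        if len(fields) == 6 and os.path.basename(fields[5]).startswith(prefix):
            paths.add(fields[5])
    return paths
```

Calling {{loaded_copies("libarrow.so")}} right after {{import pyarrow}} should list only the path inside the {{pyarrow}} package. Because the dynamic linker matches a library's {{DT_NEEDED}} entries against the sonames of objects already loaded, loading the example library afterwards would not be expected to add the apt copy to that set.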

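The attached {{example.tar.gz}} is not reproduced here. As a rough, hypothetical stand-in for its test command, the helper below loads a shared library in a fresh Python interpreter, optionally importing {{pyarrow}} first. A separate process per case matters because, once {{pyarrow}} is imported, its bundled {{libarrow.so.400}} stays loaded for the life of the process. The use of {{ctypes.CDLL}} is an assumption, not necessarily what the original example does.

```python
import subprocess
import sys
from typing import Tuple


def load_in_fresh_interpreter(lib_path: str, import_pyarrow_first: bool) -> Tuple[bool, str]:
    """Load lib_path with ctypes in a new Python process.

    Returns (success, stderr). A new process is used for each case because
    pyarrow's bundled libarrow stays loaded once pyarrow has been imported.
    """
    script = []
    if import_pyarrow_first:
        script.append("import pyarrow")
    script.append("import ctypes")
    script.append(f"ctypes.CDLL({lib_path!r})")
    result = subprocess.run(
        [sys.executable, "-c", "\n".join(script)],
        capture_output=True,
        text=True,
    )
    # dlopen failures (e.g. "undefined symbol: ...") surface as an OSError traceback.
    return result.returncode == 0, result.stderr.strip()
```

With a library built against the apt {{libarrow-dev}} (path hypothetical, e.g. {{./libexample.so}}), the report's observation would correspond to {{load_in_fresh_interpreter("./libexample.so", False)}} succeeding while {{load_in_fresh_interpreter("./libexample.so", True)}} fails with an "undefined symbol" message in the returned stderr.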
