Posted to issues@arrow.apache.org by "Michael Marino (Jira)" <ji...@apache.org> on 2020/02/06 12:33:00 UTC

[jira] [Commented] (ARROW-5158) [Packaging][Wheel] Symlink libraries in wheels

    [ https://issues.apache.org/jira/browse/ARROW-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17031544#comment-17031544 ] 

Michael Marino commented on ARROW-5158:
---------------------------------------

Hi Wes, thanks for the response. I understand the issue, and that it isn't a critical part of the immediate timeline. We currently work around this, so it is not yet critical for us. But especially with AWS pushing serverless for handling data workflows, I do expect this to become an issue for us and for others sometime soon.

 

I have started looking at some possible solutions and will try to submit a PR here, but I would need some guidance on the external requirements of the package. Given the conversation about this [here|https://discuss.python.org/t/symbolic-links-in-wheels/1945/5], it sounds like the libraries are packaged so that they are usable by other tools (e.g. pyspark?). If this is *not* the case, then I would focus on updating how the library is loaded from within pyarrow itself, to handle the case where the library comes from within the wheel.
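
To make the second option a bit more concrete, here is a minimal sketch of the kind of lookup I have in mind. It assumes only one copy of each shared library is shipped in the package directory (either `libarrow.so` or a versioned name such as `libarrow.so.14`) and that the loader has to find whichever one is present. The function name `find_bundled_library` and its arguments are hypothetical and are not part of pyarrow.

```python
import re
from pathlib import Path
from typing import Optional


def find_bundled_library(package_dir: Path, name: str) -> Optional[Path]:
    """Locate lib<name>.so or lib<name>.so.<version> inside package_dir.

    Hypothetical helper: returns the first match in sorted order, or None
    if the package directory ships no copy of the library at all.
    """
    # Match e.g. libarrow.so, libarrow.so.14, libarrow.so.1.66.0,
    # but not libarrow_python.so.14 when name == "arrow".
    pattern = re.compile(r"^lib" + re.escape(name) + r"\.so(\.\d+)*$")
    candidates = sorted(
        path for path in package_dir.iterdir()
        if path.is_file() and pattern.match(path.name)
    )
    return candidates[0] if candidates else None
```

The returned path could then be handed to whatever mechanism actually loads the library; that part is left out here because it depends on the external requirements asked about above.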

 

 

> [Packaging][Wheel] Symlink libraries in wheels
> ----------------------------------------------
>
>                 Key: ARROW-5158
>                 URL: https://issues.apache.org/jira/browse/ARROW-5158
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Packaging, Python
>            Reporter: Krisztian Szucs
>            Priority: Major
>              Labels: wheel
>
> Libraries are copied instead of symlinked in Linux and macOS wheels, which results in quite big binaries:
>  
> This is what the wheel contains before running auditwheel:
>  
> {code}
> -rwxr-xr-x  1 root root 128K Apr  3 09:02 libarrow_boost_filesystem.so
> -rwxr-xr-x  1 root root 128K Apr  3 09:02 libarrow_boost_filesystem.so.1.66.0
> -rwxr-xr-x  1 root root 1.2M Apr  3 09:02 libarrow_boost_regex.so
> -rwxr-xr-x  1 root root 1.2M Apr  3 09:02 libarrow_boost_regex.so.1.66.0
> -rwxr-xr-x  1 root root  30K Apr  3 09:02 libarrow_boost_system.so
> -rwxr-xr-x  1 root root  30K Apr  3 09:02 libarrow_boost_system.so.1.66.0
> -rwxr-xr-x  1 root root 1.4M Apr  3 09:02 libarrow_python.so
> -rwxr-xr-x  1 root root 1.4M Apr  3 09:02 libarrow_python.so.14
> -rwxr-xr-x  1 root root  12M Apr  3 09:02 libarrow.so
> -rwxr-xr-x  1 root root  12M Apr  3 09:02 libarrow.so.14
> -rw-r--r--  1 root root 6.1M Apr  3 09:02 lib.cpp
> -rwxr-xr-x  1 root root 2.4M Apr  3 09:02 lib.cpython-36m-x86_64-linux-gnu.so
> -rwxr-xr-x  1 root root  55M Apr  3 09:02 libgandiva.so
> -rwxr-xr-x  1 root root  55M Apr  3 09:02 libgandiva.so.14
> -rwxr-xr-x  1 root root 2.9M Apr  3 09:02 libparquet.so
> -rwxr-xr-x  1 root root 2.9M Apr  3 09:02 libparquet.so.14
> -rwxr-xr-x  1 root root 309K Apr  3 09:02 libplasma.so
> -rwxr-xr-x  1 root root 309K Apr  3 09:02 libplasma.so.14
>  {code}
> After running auditwheel, the repaired wheel contains:
>  
> {code}
> -rwxr-xr-x  1 root root 128K Apr  3 09:02 libarrow_boost_filesystem.so
> -rwxr-xr-x  1 root root 128K Apr  3 09:02 libarrow_boost_filesystem.so.1.66.0
> -rwxr-xr-x  1 root root 1.2M Apr  3 09:02 libarrow_boost_regex.so
> -rwxr-xr-x  1 root root 1.2M Apr  3 09:02 libarrow_boost_regex.so.1.66.0
> -rwxr-xr-x  1 root root  30K Apr  3 09:02 libarrow_boost_system.so
> -rwxr-xr-x  1 root root  30K Apr  3 09:02 libarrow_boost_system.so.1.66.0
> -rwxr-xr-x  1 root root 1.6M Apr  3 09:55 libarrow_python.so
> -rwxr-xr-x  1 root root 1.4M Apr  3 09:02 libarrow_python.so.14
> -rwxr-xr-x  1 root root  12M Apr  3 09:55 libarrow.so
> -rwxr-xr-x  1 root root  12M Apr  3 09:02 libarrow.so.14
> -rw-r--r--  1 root root 6.1M Apr  3 09:02 lib.cpp
> -rwxr-xr-x  1 root root 2.5M Apr  3 09:55 lib.cpython-36m-x86_64-linux-gnu.so
> -rwxr-xr-x  1 root root  59M Apr  3 09:55 libgandiva.so
> -rwxr-xr-x  1 root root  55M Apr  3 09:02 libgandiva.so.14
> -rwxr-xr-x  1 root root 3.5M Apr  3 09:55 libparquet.so
> -rwxr-xr-x  1 root root 2.9M Apr  3 09:02 libparquet.so.14
> -rwxr-xr-x  1 root root 345K Apr  3 09:55 libplasma.so
> -rwxr-xr-x  1 root root 309K Apr  3 09:02 libplasma.so.14
> {code}
>  
> Here is the output of auditwheel [https://travis-ci.org/kszucs/crossbow/builds/514605723#L3340]
> They should be symlinks; we have special code for this: https://github.com/apache/arrow/blob/4495305092411e8551c60341e273c8aa3c14b282/python/setup.py#L489-L499 The symlinks are probably not making it into the wheel because wheels are zip files, and those don't support symlinks by default. So we probably need to pass the `--symlinks` parameter to the wheel code.
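
For anyone who wants to measure the duplication described above on an unpacked wheel, the following is a rough sketch that groups regular files in a directory by content hash and estimates the space that deduplication would save. The function names `find_duplicate_files` and `wasted_bytes` are hypothetical. It only detects byte-identical copies: that may be the case for the equal-sized pairs in the first listing, but in the second listing several pairs differ in size after auditwheel, so those would not be flagged.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def find_duplicate_files(directory: Path) -> Dict[str, List[Path]]:
    """Group regular (non-symlink) files in directory by SHA-256 digest.

    Returns only the groups that contain more than one file.
    """
    by_digest: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(directory.iterdir()):
        # Real symlinks cost no extra space, so skip them.
        if path.is_file() and not path.is_symlink():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    return {d: paths for d, paths in by_digest.items() if len(paths) > 1}


def wasted_bytes(duplicates: Dict[str, List[Path]]) -> int:
    """Bytes that would be saved if each group kept a single copy."""
    return sum(
        paths[0].stat().st_size * (len(paths) - 1)
        for paths in duplicates.values()
    )
```

Usage would look like `wasted_bytes(find_duplicate_files(Path("unpacked_wheel/pyarrow")))`, where the directory path is a placeholder for wherever the wheel was extracted.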


