Posted to issues@arrow.apache.org by "raulcd (via GitHub)" <gi...@apache.org> on 2024/03/22 16:35:11 UTC

[I] [Python][Packaging] Strip unnecessary symbols from libarrow.so to reduce wheel package size [arrow]

raulcd opened a new issue, #40749:
URL: https://github.com/apache/arrow/issues/40749

   ### Describe the enhancement requested
   
   There has been some effort to reduce the size of pyarrow, and there are open issues about splitting the pyarrow wheels into `pyarrow-core`, `pyarrow-all`, etc.
   
   It also seems possible to strip unnecessary symbols via `strip --discard-all`. Doing so appears to reduce the `libarrow.so` file from 61 MB to 45 MB.
   
   Another example:
   ```
   -rwxrwxr-x 1 user group  49M Feb 22 18:55 libarrow.so.1600.0.0
   
   $ strip --strip-unneeded libarrow.so.1600.0.0
   
   -rwxrwxr-x 1 user group  33M Mar 12 15:27 libarrow.so.1600.0.0
   ```
   This issue is to investigate the possibility of using `strip` and its possible disadvantages.
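
A minimal sketch of how the size comparison could be scripted, assuming GNU `strip` is on the `PATH`. The helper names (`saved_percent`, `stripped_size`, `compare`) are hypothetical and not part of any Arrow tooling; the strip options are the ones mentioned in this thread. Each option is applied to a temporary copy, so the original library is left untouched.

```python
import os
import shutil
import subprocess
import sys
import tempfile

# GNU strip options mentioned in this thread (no ordering implied).
STRIP_MODES = ["--strip-debug", "--strip-unneeded", "--discard-all"]


def saved_percent(before, after):
    """Return the size reduction as a percentage of the original size."""
    if before <= 0:
        raise ValueError("before must be positive")
    return 100.0 * (before - after) / before


def stripped_size(library, mode):
    """Strip a temporary copy of `library` with `mode`; return its size in bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        copy = os.path.join(tmp, os.path.basename(library))
        shutil.copy2(library, copy)
        subprocess.run(["strip", mode, copy], check=True)
        return os.path.getsize(copy)


def compare(library):
    """Map each strip option to the percentage it saves on `library`."""
    before = os.path.getsize(library)
    return {
        mode: saved_percent(before, stripped_size(library, mode))
        for mode in STRIP_MODES
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the path of a shared library, e.g. a local libarrow.so build.
    for mode, pct in compare(sys.argv[1]).items():
        print(f"{mode}: {pct:.1f}% smaller")
```

Applied to the figures quoted above, `saved_percent(61, 45)` is roughly 26% and `saved_percent(49, 33)` is roughly 33%.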
   
   ### Component(s)
   
   Packaging, Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] [Python][Packaging] Strip unnecessary symbols from libarrow.so to reduce wheel package size [arrow]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on issue #40749:
URL: https://github.com/apache/arrow/issues/40749#issuecomment-2045843515

   A user-reported stack trace would still be possible with `--strip-debug`.




Re: [I] [Python][Packaging] Strip unnecessary symbols from libarrow.so to reduce wheel package size [arrow]

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot commented on issue #40749:
URL: https://github.com/apache/arrow/issues/40749#issuecomment-2045699345

   For pyarrow, there are also nightly wheel builds, correct? Might it be an option to keep the debug symbols in those (so that there is a route to getting a user-reported stack trace) but strip them from the wheel most users get with a default `pip`?




Re: [I] [Python][Packaging] Strip unnecessary symbols from libarrow.so to reduce wheel package size [arrow]

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on issue #40749:
URL: https://github.com/apache/arrow/issues/40749#issuecomment-2017118850

   Can we check whether a backtrace on crash is still available with `strip --discard-all`/`--strip-unneeded`? (We may need to use `gdb` for it.)
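
One way such a check could be run non-interactively is sketched below, assuming `gdb` is installed and a crashing reproducer script is available. The script name `reproducer.py` and the helper names are hypothetical. The sketch relies on gdb's batch mode (`-batch`), which runs the given `-ex` commands and then exits.

```python
import subprocess
import sys


def gdb_backtrace_command(script, python=sys.executable):
    """Build a gdb command line that runs `script` and prints a backtrace."""
    return [
        "gdb", "-batch",
        "-ex", "run",   # start the program and wait for the crash
        "-ex", "bt",    # print the backtrace of the crashed thread
        "--args", python, script,
    ]


def run_backtrace(script):
    """Run the reproducer under gdb and return gdb's standard output."""
    result = subprocess.run(
        gdb_backtrace_command(script),
        capture_output=True,
        text=True,
    )
    return result.stdout


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python this_file.py reproducer.py  (hypothetical script name)
    print(run_backtrace(sys.argv[1]))
```

Running this once per strip option, against an environment whose `libarrow.so` was stripped with that option, would give the backtraces to compare.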




Re: [I] [Python][Packaging] Strip unnecessary symbols from libarrow.so to reduce wheel package size [arrow]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on issue #40749:
URL: https://github.com/apache/arrow/issues/40749#issuecomment-2045586715

   I've tried on the reproducer in https://github.com/apache/arrow/issues/38770 and only `--strip-debug` preserves the full backtrace including non-public functions:
   ```
   [...]
   #7  0x00007ffff310b277 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
   #8  0x00007ffff310b4d8 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
   #9  0x00007ffff373737b in std::__throw_bad_variant_access(char const*) () from /home/antoine/t/venv-3.10/lib/python3.10/site-packages/pyarrow/libarrow.so.1500
   #10 0x00007ffff3737399 in std::__throw_bad_variant_access(bool) () from /home/antoine/t/venv-3.10/lib/python3.10/site-packages/pyarrow/libarrow.so.1500
   #11 0x00007ffff37960e7 in arrow::compute::internal::(anonymous namespace)::FilterMetaFunction::ExecuteImpl(std::vector<arrow::Datum, std::allocator<arrow::Datum> > const&, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) const [clone .cold] ()
      from /home/antoine/t/venv-3.10/lib/python3.10/site-packages/pyarrow/libarrow.so.1500
   #12 0x00007ffff42cc3e0 in arrow::compute::MetaFunction::Execute(std::vector<arrow::Datum, std::allocator<arrow::Datum> > const&, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) const () from /home/antoine/t/venv-3.10/lib/python3.10/site-packages/pyarrow/libarrow.so.1500
   [...]
   ```
   
   More aggressive strip options make the backtrace less useful because the symbols of non-public functions are removed:
   ```
   [...]
   #7  0x00007ffff310b277 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
   #8  0x00007ffff310b4d8 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
   #9  0x00007ffff373737b in ?? () from /home/antoine/t/venv-3.10/lib/python3.10/site-packages/pyarrow/libarrow.so.1500
   #10 0x00007ffff3737399 in ?? () from /home/antoine/t/venv-3.10/lib/python3.10/site-packages/pyarrow/libarrow.so.1500
   #11 0x00007ffff37960e7 in ?? () from /home/antoine/t/venv-3.10/lib/python3.10/site-packages/pyarrow/libarrow.so.1500
   #12 0x00007ffff42cc3e0 in arrow::compute::MetaFunction::Execute(std::vector<arrow::Datum, std::allocator<arrow::Datum> > const&, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) const () from /home/antoine/t/venv-3.10/lib/python3.10/site-packages/pyarrow/libarrow.so.1500
   [...]
   ```
   
   This is on the PyArrow 15.0.2 wheels. The savings are still significant: from 61MB (original) to 54MB (stripped) for `libarrow.so`.
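
The difference between the two backtraces above can be quantified by counting the frames that gdb could not resolve to a symbol. The following is a small sketch, assuming the gdb output format shown in this message; the helper name `unresolved_frames` is hypothetical.

```python
import re

# Matches a gdb frame line such as "#9  0x00007ffff373737b in ?? () from ...".
# The address part is optional because gdb omits it for some frames.
FRAME_RE = re.compile(r"^\s*#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(.*)$")


def unresolved_frames(backtrace):
    """Return the frame numbers that gdb printed as '?? ()'."""
    frames = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line)
        if match and match.group(2).startswith("?? ()"):
            frames.append(int(match.group(1)))
    return frames


if __name__ == "__main__":
    sample = """\
#8  0x00007ffff310b4d8 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00007ffff373737b in ?? () from /path/to/libarrow.so.1500
#10 0x00007ffff3737399 in ?? () from /path/to/libarrow.so.1500
"""
    print(unresolved_frames(sample))
```

For the second backtrace quoted above, frames 9, 10, and 11 are the unresolved ones; for the `--strip-debug` backtrace there are none.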




Re: [I] [Python][Packaging] Strip unnecessary symbols from libarrow.so to reduce wheel package size [arrow]

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on issue #40749:
URL: https://github.com/apache/arrow/issues/40749#issuecomment-2045545534

   Potentially related issue with some useful content (but specifically for the wheels, so maybe mostly relevant for our cython code?): https://github.com/pypa/cibuildwheel/issues/331

