Posted to jira@arrow.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/10/08 02:21:00 UTC

[jira] [Updated] (ARROW-8518) [Python] Create tools to enable optional components (like Gandiva, Flight) to be built and deployed as separate Python packages

     [ https://issues.apache.org/jira/browse/ARROW-8518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-8518:
----------------------------------
    Labels: pull-request-available  (was: )

> [Python] Create tools to enable optional components (like Gandiva, Flight) to be built and deployed as separate Python packages
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-8518
>                 URL: https://issues.apache.org/jira/browse/ARROW-8518
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Packaging, Python
>            Reporter: Wes McKinney
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.0.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Our current monolithic approach to Python packaging isn't likely to be sustainable long-term.
> At a high level, I would propose a structure like this:
> {code}
> pip install pyarrow  # core package containing libarrow, libarrow_python, and any other common bundled C++ library dependencies
> pip install pyarrow-flight  # installs pyarrow, pyarrow_flight
> pip install pyarrow-gandiva # installs pyarrow, pyarrow_gandiva
> {code}
> We can maintain the semantic appearance of a single {{pyarrow}} package by having thin API modules that would look like
> {code}
> # Contents of pyarrow/flight.py
> from pyarrow_flight import *
> {code}
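A minimal sketch of how such a thin API module might behave when the child package is missing. The helper names `load_optional_component` and `reexport_public` are hypothetical and not part of pyarrow; the sketch only assumes that the child package is importable as `pyarrow_flight` and installable as `pyarrow-flight`, as proposed above.

```python
import importlib


def load_optional_component(module_name, pip_name):
    """Import a separately packaged component or raise a helpful ImportError.

    Both arguments are illustrative, e.g. ("pyarrow_flight", "pyarrow-flight").
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # Only translate the error when the component itself is missing,
        # not when one of its own dependencies fails to import.
        if exc.name != module_name:
            raise
        raise ImportError(
            f"{module_name} is not installed; try `pip install {pip_name}`"
        ) from exc


def reexport_public(module, namespace):
    """Copy a module's public names into namespace, like `from module import *`."""
    names = getattr(module, "__all__", None)
    if names is None:
        # Without __all__, a star import takes every name lacking a leading underscore.
        names = [name for name in vars(module) if not name.startswith("_")]
    for name in names:
        namespace[name] = getattr(module, name)


# A hypothetical pyarrow/flight.py could then consist of:
#   _flight = load_optional_component("pyarrow_flight", "pyarrow-flight")
#   reexport_public(_flight, globals())
```

Compared with a bare star import, this variant would let `import pyarrow.flight` fail with an installation hint rather than a generic missing-module error.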
> Obviously, this is more difficult to build and package:
> * CMake and setup.py files must be refactored a bit so that we can reuse code between the parent and child packages
> * Separate conda and wheel packages must be produced. With conda this seems more straightforward, but because the child wheels depend on the parent core wheel, the wheel build process seems more complicated
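One way the dependency of a child wheel on the parent core wheel could be expressed is through shared packaging metadata. The sketch below is an assumption, not the project's actual build code: `child_package_metadata` is a hypothetical helper, and pinning the child to the exact parent version is an illustrative choice rather than something the issue prescribes.

```python
def child_package_metadata(component, version):
    """Build setup() keyword arguments for a hypothetical child package.

    component: short name such as "flight" or "gandiva"
    version:   release version shared with the parent pyarrow package
    """
    return {
        "name": f"pyarrow-{component}",
        "version": version,
        "packages": [f"pyarrow_{component}"],
        # Assumption: pin exactly, so parent and child always come
        # from the same Arrow release.
        "install_requires": [f"pyarrow=={version}"],
    }


# A child package's setup.py might then call (not executed here):
#   import setuptools
#   setuptools.setup(**child_package_metadata("flight", "3.0.0"))
```

Keeping the metadata in one function is one possible way to reuse code between the parent and child `setup.py` files.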
> In any case, I don't think these challenges are insurmountable. This will have several benefits:
> * Smaller installation footprint for simple use cases (though note we are STILL duplicating shared libraries in the wheels, which is quite bad)
> * Less developer anxiety about expanding the scope of what Python code is shipped from apache/arrow. If in 5 years we are shipping 5 different Python wheels with each Apache Arrow release, that sounds completely fine to me. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)