Posted to jira@arrow.apache.org by "Yue Ni (Jira)" <ji...@apache.org> on 2022/10/05 14:53:00 UTC

[jira] [Comment Edited] (ARROW-16340) [C++][Python] Move all Python related code into PyArrow

    [ https://issues.apache.org/jira/browse/ARROW-16340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17613028#comment-17613028 ] 

Yue Ni edited comment on ARROW-16340 at 10/5/22 2:52 PM:
---------------------------------------------------------

[~alenka]  [~jorisvandenbossche] Thanks for the help.

> it will still be installed by the python package, only not in the standard "include" directory

Where is the header expected to be installed? Does pyarrow's Python wheel install it somewhere, so that I have to add that path to my compiler's include path?

In my C++ project, I use vcpkg to manage dependencies. One of the modules is a Python binding for the C++ library, where I use pyarrow's C++ API (for example `arrow::py::wrap_table`) together with pybind11 to create the binding. Since I use vcpkg to manage dependencies, I expect all C++ dependencies to be available via vcpkg.

1) Previously, the pyarrow C++ code was part of the vcpkg arrow port, and I could use CMake's `find_library(arrow_python)` to find the library

2) and use `find_path(arrow/python/pyarrow.h)` to find the include directory

What I find with the latest arrow version:
1) `libarrow_python.a` is not built, even if I set the ARROW_PYTHON CMake option to `ON`.

2) `arrow/python/pyarrow.h` cannot be found in the `include` directory after building the C++ library (at least with the vcpkg arrow port)

I went through most of the comments in the PR for this issue [1] and read the ARROW_PYTHON option issue [2] as well, and the current behavior seems to be expected: the `python` directory is NOT built even if ARROW_PYTHON=ON.

I am not sure what the recommended approach is for using pyarrow from C++. According to the documentation [3], these seem to be the steps for an automated build:

1) install the pyarrow package

2) launch python and run `pyarrow.get_include()` to get the `include` directory

3) add the `include` directory to the compiler's include search path (probably via CMake)

4) where is `libarrow_python` expected to be found, so that CMake can find it and link against it?

5) build

Is this the recommended approach? I am not sure about step 4; any comments on it?
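
To make steps 2 to 4 concrete, here is a rough sketch of what I have in mind. It assumes that the `pyarrow.get_library_dirs()` and `pyarrow.get_libraries()` helpers described in [3] point at the directory that contains `libarrow_python`, which I have not verified for the latest version. The `PYARROW_*` CMake variable names and the `cmake_args` / `pyarrow_cmake_args` helpers are made up for illustration.

```python
from typing import List, Sequence


def cmake_args(include_dir: str,
               library_dirs: Sequence[str],
               libraries: Sequence[str]) -> List[str]:
    """Turn pyarrow's build locations into -D flags for a CMake configure step.

    The PYARROW_* variable names are hypothetical; the consuming
    CMakeLists.txt would have to read them itself.
    """
    return [
        f"-DPYARROW_INCLUDE_DIR={include_dir}",
        # CMake lists are semicolon-separated.
        f"-DPYARROW_LIBRARY_DIRS={';'.join(library_dirs)}",
        f"-DPYARROW_LIBRARIES={';'.join(libraries)}",
    ]


def pyarrow_cmake_args() -> List[str]:
    """Query the installed pyarrow package (step 1) for its locations."""
    # Imported lazily so that cmake_args() can be used without pyarrow.
    import pyarrow as pa

    return cmake_args(pa.get_include(),
                      pa.get_library_dirs(),
                      pa.get_libraries())
```

The flags returned by `pyarrow_cmake_args()` could then be appended to the `cmake` configure command, and the project's CMakeLists.txt could pass those variables to `target_include_directories` and `target_link_libraries`. Whether this is the intended way to locate `libarrow_python` is exactly what I am asking about in step 4.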

I can think of another approach: creating a separate vcpkg port such as `arrow_python` that builds the `arrow_python` library explicitly, so that projects can depend on that port for this purpose. Is that a recommended approach after this change? Thanks.

 

[1] https://github.com/apache/arrow/pull/13311

[2] https://issues.apache.org/jira/browse/ARROW-17868

[3] Using pyarrow from C++ and Cython Code, https://arrow.apache.org/docs/dev/python/integration/extending.html#c-api

 



> [C++][Python] Move all Python related code into PyArrow
> -------------------------------------------------------
>
>                 Key: ARROW-16340
>                 URL: https://issues.apache.org/jira/browse/ARROW-16340
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python
>            Reporter: Alenka Frim
>            Assignee: Alenka Frim
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 10.0.0
>
>          Time Spent: 33h 10m
>  Remaining Estimate: 0h
>
> Move {{src/arrow/python}} directory into {{pyarrow}} and arrange PyArrow to build it.
> More details can be found on this thread:
> https://lists.apache.org/thread/jbxyldhqff4p9z53whhs95y4jcomdgd2



--
This message was sent by Atlassian Jira
(v8.20.10#820010)