Posted to user@arrow.apache.org by Raúl Bocanegra Algarra <ra...@shapelets.io> on 2020/01/05 03:45:21 UTC

Using Pyarrow and C++ API

Hi!

I am trying to use pyarrow together with the Arrow C++ API in an application that embeds a Python 3 interpreter and loads an extension module built with pybind11. The documentation says the C++ headers and libraries are bundled with pyarrow, but I get segfaults when calling some API functions, such as the wrap/unwrap ones. I call import_pyarrow and also import_numpy, but the segfaults still happen. I suspect the cause is that I compile and link against my own arrow and arrow_python libraries built with vcpkg, so my application links with those, while the extension module imported by the embedded Python interpreter loads the arrow_python from the site-packages folder where pip installed pyarrow, and that mismatch causes the segfault. So I was wondering whether the correct approach for a setup like this (an embedded interpreter plus an extension module that imports pyarrow) is to use the headers and libraries from the pyarrow installation and drop the ones from vcpkg, or whether there is another option I haven't considered yet.
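One way to test the mismatch hypothesis above is to list, from inside the embedded interpreter, which libarrow, libarrow_python and libpython files are actually mapped into the process. The following is a minimal Python sketch, assuming Linux (it reads /proc/self/maps); the function names are hypothetical helpers, not part of pyarrow.

```python
import os
from collections import defaultdict


def loaded_copies(maps_text, prefixes=("libarrow", "libpython")):
    """Map each library name (e.g. "libarrow_python") to the set of distinct
    file paths mapped for it, given the contents of /proc/<pid>/maps."""
    copies = defaultdict(set)
    for line in maps_text.splitlines():
        # Format: address perms offset dev inode [pathname]
        fields = line.split(maxsplit=5)
        if len(fields) < 6:
            continue  # anonymous mapping without a backing file
        path = fields[5]
        base = os.path.basename(path)
        if ".so" not in base:
            continue
        name = base.split(".so", 1)[0]  # "libarrow.so.15" -> "libarrow"
        if name.startswith(prefixes):
            copies[name].add(path)
    return dict(copies)


def duplicated(copies):
    """Keep only the libraries that are mapped from more than one file."""
    return {name: paths for name, paths in copies.items() if len(paths) > 1}


def current_process_duplicates():
    """Check the running process (Linux only). Import pyarrow before calling
    this so that its libraries, bundled or not, are already loaded."""
    with open("/proc/self/maps") as maps:
        return duplicated(loaded_copies(maps.read()))
```

If current_process_duplicates(), called right after import pyarrow in the embedded script, reports both a vcpkg path and a site-packages path under libarrow, that would support the suspicion. An empty result is weaker evidence: the dynamic loader can reuse an already-loaded library with the same SONAME, so a version mismatch does not necessarily show up as two files.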

Thanks for your work.

Best regards,

Raúl Bocanegra Algarra. C++ Software Engineer.


Re: Using Pyarrow and C++ API

Posted by Wes McKinney <we...@gmail.com>.
I missed that you have an application that embeds the Python
interpreter. That definitely makes things more complicated, and I don't
know enough to be able to help with this further.

On Wed, Jan 8, 2020 at 10:01 AM Raúl Bocanegra Algarra
<ra...@shapelets.io> wrote:
>
> Hi Wes,
>
> I check out tag 0.15.1 from the arrow repo and then build my own pyarrow to get the C++11 ABI, as you recommended here: http://mail-archives.apache.org/mod_mbox/arrow-dev/201808.mbox/%3cCAJPUwMA2rF4ZqbLnRD0-bmfGf6ybj8n3Xi=HNKfEXs3DVc5PMw@mail.gmail.com%3e.
> So I do:
> $ export PYARROW_CMAKE_OPTIONS=-DCMAKE_TOOLCHAIN_FILE=/home/raul.bocanegra/Code/vcpkg/scripts/buildsystems/vcpkg.cmake
> $ export LD_LIBRARY_PATH=$VCPKG_ROOT/installed/x64-linux/lib
> $ python setup.py install
>
> In arrow/cpp/CMakeLists.txt I changed the line "find_package(Arrow REQUIRED)" to "find_package(arrow CONFIG REQUIRED)" so that CMake finds the config scripts from vcpkg.
> This way, as you say, I don't have the bundled arrow libraries but the vcpkg ones. If I run ldd on, for example, the _fs Python module inside the pyarrow folder, it links against the vcpkg arrow and arrow_python libraries. But at runtime it throws the fatal Python error.
>
> Anyway, I am going to try building Arrow without vcpkg. I assume the correct way to use the Arrow C++ and Arrow Python C++ APIs is to tell CMake that the Arrow libraries are in the pip installation of pyarrow, right? I mean, we must avoid linking libarrow.so from the vcpkg folder (or any other install folder) while linking libarrow_python.so from the pip-installed pyarrow; we must link both from the pyarrow installation. Is that correct?
>
> Thanks for your support!
>
> Regards,
>
> Raúl Bocanegra Algarra
>
> Software Engineer
>
> M: + 34 617 83 64 45 -  E: raul.bocanegra@shapelets.io

Re: Using Pyarrow and C++ API

Posted by Raúl Bocanegra Algarra <ra...@shapelets.io>.
Hi Wes,

I check out tag 0.15.1 from the arrow repo and then build my own pyarrow to get the C++11 ABI, as you recommended here: http://mail-archives.apache.org/mod_mbox/arrow-dev/201808.mbox/%3cCAJPUwMA2rF4ZqbLnRD0-bmfGf6ybj8n3Xi=HNKfEXs3DVc5PMw@mail.gmail.com%3e.
So I do:
$ export PYARROW_CMAKE_OPTIONS=-DCMAKE_TOOLCHAIN_FILE=/home/raul.bocanegra/Code/vcpkg/scripts/buildsystems/vcpkg.cmake
$ export LD_LIBRARY_PATH=$VCPKG_ROOT/installed/x64-linux/lib
$ python setup.py install

In arrow/cpp/CMakeLists.txt I changed the line "find_package(Arrow REQUIRED)" to "find_package(arrow CONFIG REQUIRED)" so that CMake finds the config scripts from vcpkg.
This way, as you say, I don't have the bundled arrow libraries but the vcpkg ones. If I run ldd on, for example, the _fs Python module inside the pyarrow folder, it links against the vcpkg arrow and arrow_python libraries. But at runtime it throws the fatal Python error.
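Because ldd reports what the dynamic loader would resolve for a file on its own, it is a useful first check but not the whole story in an embedding application. The following is a minimal sketch, assuming Linux and Python 3.6 (hence subprocess.run with universal_newlines rather than the newer text/capture_output arguments); parse_ldd, arrow_only and arrow_deps_of_extensions are hypothetical helper names. It runs ldd over every extension module in a package directory and keeps only the Arrow dependencies, so a module that resolves libarrow from an unexpected directory stands out.

```python
import glob
import importlib.machinery
import os
import re
import subprocess

# Matches "libarrow.so.15 => /path/libarrow.so.15 (0x...)" and "libfoo.so.1 => not found".
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(not found|\S+)")


def parse_ldd(output):
    """Return {soname: resolved path or "not found"} from ldd output."""
    deps = {}
    for line in output.splitlines():
        match = _LDD_LINE.match(line)
        if match:
            deps[match.group(1)] = match.group(2)
    return deps


def arrow_only(deps):
    """Keep only the Arrow libraries (libarrow, libarrow_python, ...)."""
    return {name: path for name, path in deps.items() if name.startswith("libarrow")}


def arrow_deps_of_extensions(package_dir):
    """Run ldd on every extension module in package_dir (e.g. pyarrow's)."""
    # Most specific suffix first, e.g. ".cpython-36m-x86_64-linux-gnu.so".
    suffix = importlib.machinery.EXTENSION_SUFFIXES[0]
    report = {}
    for module in sorted(glob.glob(os.path.join(package_dir, "*" + suffix))):
        result = subprocess.run(["ldd", module], stdout=subprocess.PIPE,
                                universal_newlines=True, check=True)
        report[os.path.basename(module)] = arrow_only(parse_ldd(result.stdout))
    return report
```

Calling arrow_deps_of_extensions(os.path.dirname(pyarrow.__file__)) should show every module pointing at the same directory. A clean result here is still consistent with a runtime failure like the one described above, because a library that the host application has already loaded can be reused instead of the one ldd resolves.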

Anyway, I am going to try building Arrow without vcpkg. I assume the correct way to use the Arrow C++ and Arrow Python C++ APIs is to tell CMake that the Arrow libraries are in the pip installation of pyarrow, right? I mean, we must avoid linking libarrow.so from the vcpkg folder (or any other install folder) while linking libarrow_python.so from the pip-installed pyarrow; we must link both from the pyarrow installation. Is that correct?
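If the extension is built against the pip-installed pyarrow, pyarrow itself can report where its headers and libraries live through pyarrow.get_include(), pyarrow.get_library_dirs() and pyarrow.get_libraries(). The following is a minimal sketch that turns those into flags for a GCC/Clang-style toolchain; arrow_build_flags and pyarrow_build_flags are hypothetical helper names, and the rpath flag is an assumption about how the extension should find the libraries at runtime.

```python
def arrow_build_flags(include_dir, library_dirs, libraries):
    """Build GCC/Clang-style flags from an include dir, library dirs and names."""
    flags = ["-I" + include_dir]
    flags += ["-L" + d for d in library_dirs]
    # Embed the same directories as an rpath so the runtime loader looks there too.
    flags += ["-Wl,-rpath," + d for d in library_dirs]
    flags += ["-l" + lib for lib in libraries]
    return flags


def pyarrow_build_flags():
    """Flags pointing at the headers and libraries of the installed pyarrow."""
    import pyarrow  # imported lazily so arrow_build_flags works without pyarrow

    return arrow_build_flags(pyarrow.get_include(),
                             pyarrow.get_library_dirs(),
                             pyarrow.get_libraries())
```

Running " ".join(pyarrow_build_flags()) with the same Python environment that the application embeds gives the flags to pass to the compiler. With binary wheels the bundled libraries carry a version suffix (libarrow.so.15 and so on), so -larrow may not resolve until an unversioned symlink exists; recent pyarrow releases document pyarrow.create_library_symlinks() for this. The extension's C++ ABI setting (_GLIBCXX_USE_CXX11_ABI) also has to match how that pyarrow was built, which is why the message above builds pyarrow from source.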

Thanks for your support!

Regards,


Raúl Bocanegra Algarra

Software Engineer

M: + 34 617 83 64 45 -  E: raul.bocanegra@shapelets.io


________________________________
From: Wes McKinney <we...@gmail.com>
Sent: Tuesday, January 7, 2020 6:28 PM
To: user@arrow.apache.org <us...@arrow.apache.org>
Cc: Sutou Kouhei <ko...@clear-code.com>
Subject: Re: Using Pyarrow and C++ API

How did you install pyarrow, with pip? It has its own C++ libraries
bundled inside and so will surely conflict with the vcpkg libraries.

Re: Using Pyarrow and C++ API

Posted by Wes McKinney <we...@gmail.com>.
How did you install pyarrow, with pip? It has its own C++ libraries
bundled inside and so will surely conflict with the vcpkg libraries.

On Tue, Jan 7, 2020 at 11:18 AM Raúl Bocanegra Algarra
<ra...@shapelets.io> wrote:
>
> Hi Wes,
>
> I just added "-DARROW_PYTHON=ON" to the "vcpkg_configure_cmake" function call in vcpkg/ports/arrow/portfile.cmake.
> I also opened an improvement request for this in the vcpkg repo: https://github.com/microsoft/vcpkg/issues/9350
> I also edited vcpkg/triplets/x64-linux.cmake, as recommended in the vcpkg docs: https://vcpkg.readthedocs.io/en/latest/users/triplets/#per-port-customization
> I only changed this:
> set(VCPKG_LIBRARY_LINKAGE static)
> if(PORT MATCHES "arrow")
>     set(VCPKG_LIBRARY_LINKAGE dynamic)
> endif()
>
> Thanks for your work.
>
> Raúl.

Re: Using Pyarrow and C++ API

Posted by Raúl Bocanegra Algarra <ra...@shapelets.io>.
Hi Wes,

I just added "-DARROW_PYTHON=ON" to the "vcpkg_configure_cmake" function call in vcpkg/ports/arrow/portfile.cmake.
I also opened an improvement request for this in the vcpkg repo: https://github.com/microsoft/vcpkg/issues/9350
I also edited vcpkg/triplets/x64-linux.cmake, as recommended in the vcpkg docs: https://vcpkg.readthedocs.io/en/latest/users/triplets/#per-port-customization
I only changed this:
set(VCPKG_LIBRARY_LINKAGE static)
if(PORT MATCHES "arrow")
    set(VCPKG_LIBRARY_LINKAGE dynamic)
endif()

Thanks for your work.

Raúl.
________________________________
From: Wes McKinney <we...@gmail.com>
Sent: Tuesday, January 7, 2020 6:06:21 PM
To: user@arrow.apache.org <us...@arrow.apache.org>
Cc: Sutou Kouhei <ko...@clear-code.com>
Subject: Re: Using Pyarrow and C++ API

It doesn't seem like vcpkg should have libarrow_python available based on

https://github.com/microsoft/vcpkg/tree/master/ports/arrow

How are you installing pyarrow?

Re: Using Pyarrow and C++ API

Posted by Wes McKinney <we...@gmail.com>.
It doesn't seem like vcpkg should have libarrow_python available based on

https://github.com/microsoft/vcpkg/tree/master/ports/arrow

How are you installing pyarrow?

On Tue, Jan 7, 2020 at 10:14 AM Raúl Bocanegra Algarra
<ra...@shapelets.io> wrote:
>
> Hi Sutou,
>
> Thanks for your help. I tried that option, but unfortunately the Arrow build scripts can't find the libarrow installation from vcpkg; they seem to use a custom FindArrow.cmake script. Anyway, I hacked Arrow's CMakeLists to find vcpkg's Arrow libraries and it worked: I ran ldd and the libraries are the ones from vcpkg. So now both my Python extension and my application are linked against vcpkg's arrow and arrow_python. But it still doesn't work, and I am observing a couple of weird issues:
> 1.- In the Python script that my embedded Python interpreter runs, I do
> "from statsmodels.tsa.stattools import adfuller as adf", but it gets stuck there, so I removed that import. Then:
> 2.- After removing the import, I get a SIGABRT and the following message when my extension calls "arrow::py::wrap_array":
> "Fatal Python error: PyThreadState_Get: no current thread".
> I did some googling and only found a similar issue on Macs with several Python interpreters installed, but I am on Ubuntu 18.04 with only Python 3.6.9 installed, and I am using a venv for pyarrow.
> I don't know what to do next. Have any of you experienced something similar?
>
> Thanks!
>
>
> Raúl Bocanegra Algarra
>
> Software Engineer
>
> M: + 34 617 83 64 45 -  E: raul.bocanegra@shapelets.io

Re: Using Pyarrow and C++ API

Posted by Raúl Bocanegra Algarra <ra...@shapelets.io>.
Hi Sutou,

Thanks for your help. I tried that option, but unfortunately the Arrow build scripts can't find the libarrow installation from vcpkg; they seem to use a custom FindArrow.cmake script. Anyway, I hacked Arrow's CMakeLists to find vcpkg's Arrow libraries and it worked: I ran ldd and the libraries are the ones from vcpkg. So now both my Python extension and my application are linked against vcpkg's arrow and arrow_python. But it still doesn't work, and I am observing a couple of weird issues:
1.- In the Python script that my embedded Python interpreter runs, I do
"from statsmodels.tsa.stattools import adfuller as adf", but it gets stuck there, so I removed that import. Then:
2.- After removing the import, I get a SIGABRT and the following message when my extension calls "arrow::py::wrap_array":
"Fatal Python error: PyThreadState_Get: no current thread".
I did some googling and only found a similar issue on Macs with several Python interpreters installed, but I am on Ubuntu 18.04 with only Python 3.6.9 installed, and I am using a venv for pyarrow.
I don't know what to do next. Have any of you experienced something similar?
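Both symptoms above can be narrowed down from the Python side with the standard library's faulthandler module, without touching the C++ code. The following is a minimal sketch, assuming the embedded script can write to a real file (faulthandler needs a file descriptor, which an embedded interpreter's sys.stderr may not provide); import_with_watchdog is a hypothetical helper name. It dumps the stack of every Python thread if an import hangs, and faulthandler.enable() additionally prints Python tracebacks on SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL.

```python
import faulthandler
import importlib
import sys


def import_with_watchdog(module_name, timeout_s=60.0, log=None):
    """Import module_name; if that takes longer than timeout_s seconds, dump the
    Python stack of every thread to log (the process keeps running)."""
    log = sys.stderr if log is None else log  # must have a working fileno()
    # Print Python tracebacks on fatal signals such as SIGSEGV and SIGABRT.
    faulthandler.enable(file=log, all_threads=True)
    # One-shot watchdog for a hanging import (e.g. the statsmodels one).
    faulthandler.dump_traceback_later(timeout_s, repeat=False, file=log, exit=False)
    try:
        return importlib.import_module(module_name)
    finally:
        faulthandler.cancel_dump_traceback_later()
```

For example, passing module_name="statsmodels.tsa.stattools" and a log file opened on a hypothetical path such as /tmp/faulthandler.log would show which thread is blocked and where. The log file must stay open for as long as faulthandler is enabled. These tracebacks only cover Python frames; the C++ side of the SIGABRT still needs a native debugger backtrace.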

Thanks!



Raúl Bocanegra Algarra

Software Engineer

M: + 34 617 83 64 45 -  E: raul.bocanegra@shapelets.io


________________________________
From: Sutou Kouhei <ko...@clear-code.com>
Sent: Sunday, January 5, 2020 10:29 PM
To: user@arrow.apache.org <us...@arrow.apache.org>
Subject: Re: Using Pyarrow and C++ API

Hi,

How about installing pyarrow with "pip install --no-binary :all: pyarrow"?
Then you will be able to build your pyarrow with your
libarrow.so and libarrow_python.so.

Thanks,
--
kou

Re: Using Pyarrow and C++ API

Posted by Sutou Kouhei <ko...@clear-code.com>.
Hi,

How about installing pyarrow with "pip install --no-binary :all: pyarrow"?
Then you will be able to build your pyarrow with your
libarrow.so and libarrow_python.so.
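Whether an installed pyarrow carries its own copy of the C++ libraries can be checked by looking for libarrow* files inside the package directory. The following is a minimal sketch, assuming Linux naming (.so files); bundled_arrow_libs is a hypothetical helper name. A binary wheel from pip is expected to list files here, while a --no-binary build against existing libarrow.so and libarrow_python.so, as suggested above, should list none unless the build was told to bundle them.

```python
import glob
import os


def bundled_arrow_libs(package_dir):
    """Return the names of libarrow* shared objects shipped inside package_dir."""
    pattern = os.path.join(package_dir, "libarrow*.so*")
    return sorted(os.path.basename(path) for path in glob.glob(pattern))
```

Typical usage would be bundled_arrow_libs(os.path.dirname(pyarrow.__file__)), run with the same interpreter that the application embeds.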

Thanks,
--
kou
