Posted to dev@arrow.apache.org by Yaron Gvili <rt...@hotmail.com> on 2022/05/10 17:06:55 UTC

PyArrow builds but fails to load pyarrow._dataset

Hello,

I ran into a problem running a locally built PyArrow. The build appeared to succeed, but the test run then had a failure because pyarrow._dataset could not be loaded, which I confirmed manually. I'd appreciate any guidance on how to fix this error.

Below are the commands I used to build and test, along with the console output of the failure (output of the successful commands is omitted), followed by my manual confirmation:

$ conda activate pyarrow-dev
$ mkdir -p arrow/cpp/build/pyarrow-release
$ pushd arrow/cpp/build/pyarrow-release
$ cmake -GNinja -DCMAKE_INSTALL_PREFIX=$ARROW_HOME -DCMAKE_INSTALL_LIBDIR=lib $(for a in COMPUTE DATASET ENGINE FILESYSTEM IPC PARQUET PYTHON WITH_BZ2 WITH_ZLIB WITH_ZSTD WITH_LZ4 WITH_SNAPPY WITH_BROTLI BUILD_TESTS; do echo "-DARROW_${a}=ON"; done) -DPARQUET_REQUIRE_ENCRYPTION=ON ../..
$ ninja -j 6
$ cmake --build . --target install
$ popd
$ pushd arrow/python
$ export PYARROW_WITH_PARQUET=1
$ export PYARROW_WITH_PARQUET_ENCRYPTION=1
$ python setup.py build_ext --inplace
$ python -m pytest pyarrow/
...
FAILED pyarrow/tests/parquet/test_dataset.py::test_partitioned_dataset[True] - ModuleNotFoundError: No module named 'pyarrow._dataset'
...
$ python
Python 3.9.12 | packaged by conda-forge | (main, Mar 24 2022, 23:25:59)
[GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow._dataset
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pyarrow._dataset'


Cheers,
Yaron.
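
For anyone reproducing this, a minimal sketch of the manual confirmation above, extended to probe several modules in one go. Only pyarrow._dataset is named in this thread; the pyarrow._parquet entry and the helper name probe are assumptions added for illustration.

```python
import importlib

# Modules to probe. Only pyarrow._dataset comes from the thread; the
# pyarrow._parquet entry is an assumption added for comparison.
CANDIDATES = ["pyarrow", "pyarrow._dataset", "pyarrow._parquet"]


def probe(module_names):
    """Map each module name to None if it imports, else to the error text."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:  # ModuleNotFoundError is a subclass
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    for name, error in probe(CANDIDATES).items():
        status = "ok" if error is None else f"MISSING ({error})"
        print(f"{name}: {status}")
```

A module reported as missing here was most likely not built as an extension, which is a different situation from a broken `import pyarrow`.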

Re: PyArrow builds but fails to load pyarrow._dataset

Posted by Yaron Gvili <rt...@hotmail.com>.
> I think you need to add:
>
>      export PYARROW_WITH_DATASET=1

This worked, thanks. I think the documentation [1] may need to be fixed to clarify that DATASET is also an optional component.

[1] https://arrow.apache.org/docs/developers/python.html#build-and-test


Yaron.
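
The fix above can be sketched as a small helper that derives the PYARROW_WITH_* variables from a list of optional components. This is only an illustration: the helper name pyarrow_build_env is hypothetical, and the thread confirms the naming pattern only for PARQUET, PARQUET_ENCRYPTION and DATASET, so other component names are not assumed here.

```python
import os


def pyarrow_build_env(components, base_env=None):
    """Return a copy of the environment with PYARROW_WITH_<NAME>=1 set.

    Hypothetical helper: only the three component names used below are
    confirmed by the thread.
    """
    env = dict(os.environ if base_env is None else base_env)
    for component in components:
        env[f"PYARROW_WITH_{component.upper()}"] = "1"
    return env


# base_env={} keeps the demonstration independent of the caller's shell.
env = pyarrow_build_env(["PARQUET", "PARQUET_ENCRYPTION", "DATASET"], base_env={})
for key in sorted(env):
    print(f"export {key}={env[key]}")
```

With the default base_env, the returned mapping could be passed as env= to subprocess.run when invoking `python setup.py build_ext --inplace` from arrow/python, so the DATASET flag cannot be forgotten when the Parquet flags are set.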

Re: PyArrow builds but fails to load pyarrow._dataset

Posted by Yaron Gvili <rt...@hotmail.com>.
> Does `import pyarrow` work?

Yes. Also, all but one unit test succeeded:

========================================================================================= short test summary info ==========================================================================================
FAILED pyarrow/tests/parquet/test_dataset.py::test_partitioned_dataset[True] - ModuleNotFoundError: No module named 'pyarrow._dataset'
============================================================= 1 failed, 3382 passed, 834 skipped, 17 xfailed, 2 xpassed, 14 warnings in 44.92s =============================================================


Yaron.

Re: PyArrow builds but fails to load pyarrow._dataset

Posted by Antoine Pitrou <an...@python.org>.

On 10/05/2022 at 19:16, Antoine Pitrou wrote:
> 
> That said, tests which require should be skipped gracefully instead of
> failing.

Oops... some words got swallowed:

tests which require *the dataset module* should be skipped gracefully 
instead of failing.
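
A minimal sketch of what "skipped gracefully" could look like, using only the standard library's unittest. This is not how the PyArrow test suite is actually organised; the helper names module_available and requires_module are hypothetical. In a pytest-based suite, pytest.importorskip("pyarrow._dataset") would serve a similar purpose.

```python
import importlib
import unittest


def module_available(name):
    """Return True if the named module can be imported."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


def requires_module(name):
    """Hypothetical decorator: skip the test unless `name` is importable."""
    return unittest.skipUnless(module_available(name), f"requires {name}")


class DatasetTests(unittest.TestCase):
    @requires_module("pyarrow._dataset")
    def test_dataset_module_loads(self):
        # Only reached when the dataset extension was built.
        import pyarrow._dataset
        self.assertTrue(hasattr(pyarrow._dataset, "__name__"))


def run(case):
    """Run one TestCase class and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    result = run(DatasetTests)
    print(f"failures={len(result.failures)} errors={len(result.errors)} "
          f"skipped={len(result.skipped)}")
```

On a build without PYARROW_WITH_DATASET, the test is reported as skipped rather than failing with ModuleNotFoundError.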


Re: PyArrow builds but fails to load pyarrow._dataset

Posted by Antoine Pitrou <an...@python.org>.
That said, tests which require should be skipped gracefully instead of 
failing.


On 10/05/2022 at 19:13, Weston Pace wrote:
> I think you need to add:
> 
>      export PYARROW_WITH_DATASET=1

Re: PyArrow builds but fails to load pyarrow._dataset

Posted by Weston Pace <we...@gmail.com>.
I think you need to add:

    export PYARROW_WITH_DATASET=1


Re: PyArrow builds but fails to load pyarrow._dataset

Posted by Niranda Perera <ni...@gmail.com>.
Hi Yaron,

Does `import pyarrow` work?


-- 
Niranda Perera
https://niranda.dev/
@n1r44 <https://twitter.com/N1R44>