Posted to jira@arrow.apache.org by "Dustin Moriarty (Jira)" <ji...@apache.org> on 2022/03/17 21:52:00 UTC

[jira] (ARROW-15966) pytz required by pyarrow but not included in package metadata requirements.

    [ https://issues.apache.org/jira/browse/ARROW-15966 ]


    Dustin Moriarty deleted comment on ARROW-15966:
    -----------------------------------------

was (Author: JIRAUSER286757):
I also noticed that the exception is a "ModuleNotFoundError" rather than a plain "ImportError". I was never able to track down the import statement. It looks like pyarrow is pulling in this dependency in some unconventional way. Maybe someone who knows the code better can see how pyarrow manages this. I imagine it is part of how pyarrow handles other optional packages such as pandas. I am surprised not to see pandas in the extras as well, because it would cause the same problem.
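
For context on the exception type: in Python 3.6 and later, ModuleNotFoundError is a subclass of ImportError, so code that catches ImportError also catches a missing-module failure. The sketch below only illustrates that relationship. The helper name try_import and the placeholder module name are hypothetical, and nothing here reflects how pyarrow itself performs the import.

```python
import importlib


def try_import(name):
    """Return the imported module, or None if it cannot be imported.

    Hypothetical helper for illustration; not part of pyarrow.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so a missing
        # module is caught by this handler as well.
        return None


# True on Python 3.6+, where ModuleNotFoundError was introduced.
is_subclass = issubclass(ModuleNotFoundError, ImportError)

# A deliberately nonexistent module name (an assumption for the demo).
missing = try_import("surely_not_an_installed_module_xyz")
```

A caller could use the same pattern with "pytz" to decide whether timezone-aware values are safe to pass to pyarrow in a given environment.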

> pytz required by pyarrow but not included in package metadata requirements.
> ---------------------------------------------------------------------------
>
>                 Key: ARROW-15966
>                 URL: https://issues.apache.org/jira/browse/ARROW-15966
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Dustin Moriarty
>            Priority: Major
>
> Pyarrow raises a ModuleNotFoundError for pytz when a timezone-aware timestamp is used. However, pytz is not listed in the pyarrow package metadata as either a standard requirement or an extra.
> Pyarrow Version: 7.0.0
> Python Version: 3.10.2
> OS Version: macOS 12.3
> How to reproduce.
> 1. Create a clean environment. I use pyenv, but there are many ways to get the same result, as long as the tool is deterministic (that is, not something like conda).
> {code:java}
> pyenv virtualenv 3.10.2 pyarrow_test_env 
> pyenv activate pyarrow_test_env{code}
> 2. Install pyarrow.
> {code:java}
> pip install pyarrow{code}
> 3. Create a table with a datetime with a timezone.
> {code:java}
> >>> import pyarrow
> >>> from datetime import datetime
> >>> from datetime import timezone 
> >>> pyarrow.table({"my_time": [datetime(2022,1,1, tzinfo=timezone.utc)]})
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "pyarrow/table.pxi", line 2577, in pyarrow.lib.table
>   File "pyarrow/table.pxi", line 1868, in pyarrow.lib.Table.from_pydict
>   File "pyarrow/table.pxi", line 2658, in pyarrow.lib._from_pydict
>   File "pyarrow/array.pxi", line 342, in pyarrow.lib.asarray
>   File "pyarrow/array.pxi", line 316, in pyarrow.lib.array
>   File "pyarrow/array.pxi", line 39, in pyarrow.lib._sequence_to_array
>   File "pyarrow/error.pxi", line 143, in pyarrow.lib.pyarrow_internal_check_status
> ModuleNotFoundError: No module named 'pytz'{code}
> The only package that pyarrow declares as required is numpy, and no extras are defined. If there are optional dependencies, they should be declared in the package metadata (e.g. setup.py extras_require).
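
One way to check what a distribution declares is to read its installed metadata with the standard-library importlib.metadata module (Python 3.8+). This is a minimal sketch, not the procedure used in the report. The helper names declared_requirements, requirement_name, and declares are hypothetical, and the name parsing is a simplification of the full requirement syntax.

```python
import re
from importlib import metadata


def declared_requirements(dist_name):
    """Return the requirement strings a distribution declares.

    Returns None if the distribution is not installed, and an empty
    list if it is installed but declares no requirements.
    """
    try:
        return metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return None


def requirement_name(requirement):
    """Extract the project name from a requirement string (simplified)."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    return match.group(1).lower() if match else None


def declares(dist_name, package):
    """Report whether dist_name lists package as a requirement or extra."""
    requirements = declared_requirements(dist_name)
    if requirements is None:
        return False
    return package.lower() in {requirement_name(r) for r in requirements}


if __name__ == "__main__":
    # Prints whatever the local environment declares; no output is assumed.
    print(declared_requirements("pyarrow"))
    print(declares("pyarrow", "pytz"))
```

Requirements that belong to an extra carry an environment marker such as `extra == "..."` in the returned strings, so the same listing shows both standard and optional dependencies.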



--
This message was sent by Atlassian Jira
(v8.20.1#820001)