Posted to jira@arrow.apache.org by "Matteo Santamaria (Jira)" <ji...@apache.org> on 2022/09/30 15:12:00 UTC

[jira] [Comment Edited] (ARROW-17901) [Python] pyarrow missing py.typed marker file

    [ https://issues.apache.org/jira/browse/ARROW-17901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17611641#comment-17611641 ] 

Matteo Santamaria edited comment on ARROW-17901 at 9/30/22 3:11 PM:
--------------------------------------------------------------------

[~alenka] the proposed changes are so minimal that it might be easiest just to describe them, although I am happy to open a branch if that would help.

Basically, we just need to add an (empty) file called {{py.typed}} in {{arrow/python/pyarrow}} and then make sure it gets included in the distribution. [PEP-561|https://peps.python.org/pep-0561/#packaging-type-information] suggests this might be as easy as adding it to the package data [here|https://github.com/apache/arrow/blob/7b44e6140bb35f3fb45fb32d439f0fff360509d6/python/setup.py#L734].
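
To make the two steps concrete, here is a rough sketch rather than the actual patch. It assumes the marker only needs to be listed under the {{pyarrow}} key of setuptools' {{package_data}}; the names {{PACKAGE_DATA}}, {{add_marker}} and {{ships_marker}} are hypothetical helpers for illustration, and the real {{setup.py}} may already list other patterns that would have to be kept.

```python
import importlib.util
from pathlib import Path

MARKER = "py.typed"

# Hypothetical setuptools fragment: the entry that would be added to the
# existing package_data list (other patterns in the real setup.py omitted).
PACKAGE_DATA = {"pyarrow": [MARKER]}


def add_marker(package_dir):
    """Create an empty PEP 561 marker file inside a package directory."""
    marker = Path(package_dir) / MARKER
    marker.touch(exist_ok=True)
    return marker


def ships_marker(package_name):
    """Return True if the importable package contains a py.typed file."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        return False
    # A package can span several directories; any one holding the marker counts.
    return any(
        (Path(location) / MARKER).is_file()
        for location in spec.submodule_search_locations
    )
```

After building and installing a wheel, something like {{ships_marker("pyarrow")}} could be used to confirm that the file actually made it into the distribution.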



> [Python] pyarrow missing py.typed marker file
> ---------------------------------------------
>
>                 Key: ARROW-17901
>                 URL: https://issues.apache.org/jira/browse/ARROW-17901
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Matteo Santamaria
>            Priority: Minor
>
> I understand that, in general, {{pyarrow}} does not support type hints. However, I think it is still sensible to add a {{py.typed}} marker file to the library. Let me demonstrate why:
> {code:java}
> $ pip install mypy pyarrow {code}
> {code:java}
> # test.py
> import pyarrow as pa
>  
> table = pa.Table()
>  
> reveal_type(table) {code}
> {code:java}
> $ mypy test.py
> test.py:1: error: Skipping analyzing "pyarrow": module is installed, but missing library stubs or py.typed marker
> test.py:1: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
> test.py:5: note: Revealed type is "Any"
> Found 1 error in 1 file (checked 1 source file) {code}
> Note that {{mypy}} identifies {{table}} as being an {{Any}} type, when obviously it is a {{Table}}. If we include a {{py.typed}} file, {{mypy}} will be able to make these trivial inferences. The motivating example is this:
> {code:java}
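> from typing import overload
>
> import pyarrow as pa
>
> # DataFrame and Series are assumed to be defined elsewhere.
> @overload
> def from_arrow(a: pa.Table) -> DataFrame:
>     ...
> @overload
> def from_arrow(a: pa.Array | pa.ChunkedArray) -> Series:
>     ...
> def from_arrow(a: pa.Table | pa.Array | pa.ChunkedArray) -> DataFrame | Series:
>     pass {code}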
> The problem is that since all of {{pa.Table}}, {{pa.Array}}, and {{pa.ChunkedArray}} are determined to be {{Any}}, the overloads effectively become
> {code:java}
> @overload
> def from_arrow(a: Any) -> DataFrame:
>     ...
> @overload
> def from_arrow(a: Any) -> Series:
>     ... {code}
> and {{mypy}} complains that overload 2 is covered entirely by overload 1.
>  
> I tried to test what adding a {{py.typed}} file would do, but I ran into compilation issues. I was hoping someone with a little more experience here could quickly test this out for me :)


