Posted to issues@arrow.apache.org by "Matteo Santamaria (Jira)" <ji...@apache.org> on 2022/09/29 23:54:00 UTC

[jira] [Created] (ARROW-17901) `pyarrow` missing `py.typed` marker

Matteo Santamaria created ARROW-17901:
-----------------------------------------

             Summary: `pyarrow` missing `py.typed` marker
                 Key: ARROW-17901
                 URL: https://issues.apache.org/jira/browse/ARROW-17901
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
            Reporter: Matteo Santamaria


I understand that, in general, `pyarrow` does not support type hints. However, I think it is still sensible to add a `py.typed` marker file to the library. Let me demonstrate why:

 

```
$ pip install mypy pyarrow
```

 

```python
# test.py
import pyarrow as pa

table = pa.Table()

reveal_type(table)
```

 

```
$ mypy test.py
test.py:1: error: Skipping analyzing "pyarrow": module is installed, but missing library stubs or py.typed marker
test.py:1: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
test.py:5: note: Revealed type is "Any"
Found 1 error in 1 file (checked 1 source file)
```

 

Note that `mypy` infers `table` as `Any`, when it is obviously a `Table`. If we include a `py.typed` file, `mypy` will be able to make these trivial inferences.

 

The motivating example is this:

 

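```python
from __future__ import annotations

from typing import overload

import pyarrow as pa

# DataFrame and Series are defined elsewhere in the calling code.


@overload
def from_arrow(a: pa.Table) -> DataFrame:
    ...


@overload
def from_arrow(a: pa.Array | pa.ChunkedArray) -> Series:
    ...


def from_arrow(a: pa.Table | pa.Array | pa.ChunkedArray) -> DataFrame | Series:
    pass
```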

 

The problem is that all of `pa.Table`, `pa.Array`, and `pa.ChunkedArray` are determined to be `Any`, so the overloads effectively become:

 

```python

@overload
def from_arrow(a: Any) -> DataFrame:
    ...


@overload
def from_arrow(a: Any) -> Series:
    ...

```

 

and `mypy` complains that overload 2 is covered entirely by overload 1.
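
For contrast, here is a minimal sketch of how the same overloads behave when the parameter types are real classes rather than `Any`. The classes `Table`, `Array`, `ChunkedArray`, `DataFrame`, and `Series` below are hypothetical stand-ins defined locally, not the actual `pyarrow` classes, so this only illustrates the behaviour I would expect once the types resolve.

```python
from __future__ import annotations

from typing import overload


# Hypothetical stand-ins for pa.Table, pa.Array, pa.ChunkedArray and for the
# DataFrame/Series return types; none of these are the real library classes.
class Table: ...
class Array: ...
class ChunkedArray: ...
class DataFrame: ...
class Series: ...


@overload
def from_arrow(a: Table) -> DataFrame: ...
@overload
def from_arrow(a: Array | ChunkedArray) -> Series: ...


def from_arrow(a: Table | Array | ChunkedArray) -> DataFrame | Series:
    # Runtime dispatch mirrors the two overload signatures.
    if isinstance(a, Table):
        return DataFrame()
    return Series()


# The parameter types are distinct concrete classes, so a type checker can
# tell the two overloads apart instead of seeing `Any` in both.
df = from_arrow(Table())
s = from_arrow(ChunkedArray())
```

With distinct classes in the signatures, neither overload shadows the other, which is the situation I am hoping a `py.typed` marker would get us closer to for `pyarrow`.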

 

I tried to test what adding a `py.typed` file would do, but I ran into compilation issues. I was hoping someone with a little more experience could quickly test this out for me :)
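
One way to try this without compiling anything might be to drop an empty `py.typed` file into an already installed copy of the package and re-run `mypy`. The helper below is a rough sketch of that idea: `add_py_typed_marker` is a name made up for illustration, and it assumes the package is installed as a regular directory that you can write to. How much `mypy` can then infer will depend on how much of the package is plain Python as opposed to compiled extension modules.

```python
from __future__ import annotations

import importlib.util
from pathlib import Path


def add_py_typed_marker(package_name: str) -> Path:
    """Create an empty py.typed file inside an installed package's directory.

    Hypothetical helper for local experiments only; it modifies the
    installed package in place.
    """
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        raise ModuleNotFoundError(f"{package_name!r} is not an importable package")
    # The first search location is the package directory,
    # e.g. .../site-packages/pyarrow
    package_dir = Path(next(iter(spec.submodule_search_locations)))
    marker = package_dir / "py.typed"
    marker.touch(exist_ok=True)  # a PEP 561 marker file may be empty
    return marker


# Example (assumes pyarrow is installed in the active environment):
# add_py_typed_marker("pyarrow")
```

After calling it for `pyarrow`, running `mypy test.py` again should show whether the "missing library stubs or py.typed marker" error goes away.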


