Posted to jira@arrow.apache.org by "Jorrick Sleijster (Jira)" <ji...@apache.org> on 2022/08/08 05:36:00 UTC

[jira] [Created] (ARROW-17335) [Python] Type checking support

Jorrick Sleijster created ARROW-17335:
-----------------------------------------

             Summary: [Python] Type checking support
                 Key: ARROW-17335
                 URL: https://issues.apache.org/jira/browse/ARROW-17335
             Project: Apache Arrow
          Issue Type: New Feature
          Components: Python
            Reporter: Jorrick Sleijster


Since Python 3.5 (with variable annotations following in 3.6), it has been possible to add typing information to code. This became immensely popular in a short period of time. The tool `mypy` has become the industry standard for static type checking in Python. It checks for invalid types quickly enough to run as a pre-commit hook. It has caught many bugs that I did not spot myself and has been a very valuable tool.

Now what does this mean for PyArrow?

When you run mypy on code that uses PyArrow, you get the following error messages:

```
some_util_using_pyarrow/hdfs_utils.py:5: error: Skipping analyzing "pyarrow": module is installed, but missing library stubs or py.typed marker
some_util_using_pyarrow/hdfs_utils.py:9: error: Skipping analyzing "pyarrow": module is installed, but missing library stubs or py.typed marker
some_util_using_pyarrow/hdfs_utils.py:11: error: Skipping analyzing "pyarrow.fs": module is installed, but missing library stubs or py.typed marker
```

More information is available here: [https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-library-stubs-or-py-typed-marker]
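
The messages above mean that mypy found neither stubs nor a `py.typed` marker for the package. As a rough way to check the marker part yourself, here is a minimal sketch; the helper name `ships_py_typed` is hypothetical, and it only looks for the marker file inside an installed top-level package, not for separately distributed stub packages.

```python
import importlib.util
from pathlib import Path


def ships_py_typed(package: str) -> bool:
    """Return True if the installed top-level package contains a py.typed file."""
    spec = importlib.util.find_spec(package)
    # Not installed, or a single-file module rather than a package directory.
    if spec is None or not spec.submodule_search_locations:
        return False
    return any(
        (Path(location) / "py.typed").is_file()
        for location in spec.submodule_search_locations
    )
```

Calling `ships_py_typed("pyarrow")` in an environment where PyArrow is installed shows whether that installation includes the marker.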

You can solve this in three ways:
 # Ignore the message. This, however, treats every PyArrow type as `Any`, so mypy cannot find errors in how users call the PyArrow library.
 # Create Python stub files. This used to be the standard approach, but it is no longer a popular option: stubs are extra files that live next to the source code, whereas type hints can also be written inline in the code itself, which brings me to the third option.
 # Create a `py.typed` file and use inline type hints. This is the most popular option today because it requires no extra files (except for the `py.typed` file), keeps the type hints with the code (where today they live in the documentation), and gives type hints (and issue hints in the IDE) not only to your users but also to the developers of the library themselves.
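
To make the difference between the first and third option concrete, here is a minimal sketch. `FakeTable`, `count_rows_untyped` and `count_rows_typed` are hypothetical stand-ins, not PyArrow APIs, and the comments describe what a checker such as mypy would be expected to do rather than quoting its output.

```python
from typing import Any


class FakeTable:
    """Hypothetical stand-in for a library class that has inline type hints."""

    def __init__(self, num_rows: int) -> None:
        self.num_rows: int = num_rows


def count_rows_untyped(table: Any) -> int:
    # Option 1: the library's types are treated as `Any`, so a checker accepts
    # any attribute access and the typo below only fails at runtime.
    return table.num_rowz


def count_rows_typed(table: FakeTable) -> int:
    # Option 3: with inline hints the checker knows which attributes exist
    # and would be expected to flag a misspelled one before the code runs.
    return table.num_rows
```

At runtime `count_rows_untyped(FakeTable(3))` raises an `AttributeError`, which is exactly the kind of mistake a type checker could have reported earlier if the parameter had been typed.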

 

My personal preference already shines through in the options: it is option 3, as this has quickly become the industry standard since its introduction.

I'd very much like to work on this; however, I don't want to waste time. Therefore, I am raising this ticket to see whether this has been considered before or whether we just haven't gotten to it yet.

I'd like to open the discussion here:
 # Do you agree with option 3 (inline type hints) as the approach?
 # Should we remove the type annotations from the documentation, given that they will be in the function signatures? Or should we keep them and also specify them in the code, which would duplicate them?
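
To illustrate the second question, here is a sketch of the two variants, assuming numpydoc-style docstrings. Both functions are hypothetical examples and not taken from the PyArrow code base.

```python
from typing import List, Optional


def select_both(names: List[str], prefix: Optional[str] = None) -> List[str]:
    """Select names, with types stated in the signature and the docstring.

    Parameters
    ----------
    names : list of str
        Candidate names.
    prefix : str, optional
        Keep only names starting with this prefix.
    """
    return [n for n in names if prefix is None or n.startswith(prefix)]


def select_hints_only(names: List[str], prefix: Optional[str] = None) -> List[str]:
    """Select names, with types stated only in the signature.

    Parameters
    ----------
    names
        Candidate names.
    prefix
        Keep only names starting with this prefix.
    """
    return [n for n in names if prefix is None or n.startswith(prefix)]
```

The first variant keeps the rendered documentation unchanged but has two places that can drift apart; the second has a single source of truth but relies on the documentation tooling to display the annotations.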

 


