Posted to issues@spark.apache.org by "Maciej Szymkiewicz (Jira)" <ji...@apache.org> on 2021/10/12 19:25:00 UTC
[jira] [Updated] (SPARK-36989) Migrate type hint data tests

     [ https://issues.apache.org/jira/browse/SPARK-36989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maciej Szymkiewicz updated SPARK-36989:
---------------------------------------
    Description: 
Before the migration, {{pyspark-stubs}} contained a set of [data tests|https://github.com/zero323/pyspark-stubs/tree/branch-3.0/test-data/unit], modeled after mypy's own data tests and built on mypy's internal test utilities.

These were omitted during the migration for a few reasons:
 * Simplicity.
 * Relative slowness.
 * Dependence on a non-public API.

Data tests are useful for a number of reasons:
 * Improving test coverage for type hints.
 * Checking whether type checkers infer the expected types.
 * Checking whether type checkers reject incorrect code.
 * Detecting unusual errors in code that otherwise type checks.

The last two, in particular, are not covered by simply validating the existing codebase.

Data tests are not required for all annotations and can be restricted to code with a high likelihood of failure:
 * Complex overloaded signatures.
 * Complex generics.
 * Generic {{self}} annotations.
 * Code containing {{type: ignore}}.
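
The following is a hypothetical sketch of the kind of code a data test would target. It is not taken from {{pyspark-stubs}} or PySpark, and {{to_list}} is an invented name. The sketch has two parts. The first is a function with an overloaded signature. The second is a set of expectations written as inline {{# N:}} (note) and {{# E:}} (error) comments, following the convention of mypy-style data tests. Treat the message strings as illustrative only. Their exact wording, and any extra notes mypy emits (such as the list of possible overload variants), depend on the mypy version.

```python
from typing import TYPE_CHECKING, List, Union, overload


# Hypothetical function with an overloaded signature: the return type
# depends on the argument type, which is easy to annotate subtly wrong.
@overload
def to_list(value: int) -> List[int]: ...
@overload
def to_list(value: str) -> List[str]: ...
def to_list(value: Union[int, str]) -> Union[List[int], List[str]]:
    if isinstance(value, int):
        return [value]
    return [value]


if TYPE_CHECKING:
    # Only evaluated by the type checker, never at runtime.
    # Check that the expected type is inferred for each overload ...
    reveal_type(to_list(1))  # N: Revealed type is "builtins.list[builtins.int]"
    reveal_type(to_list("a"))  # N: Revealed type is "builtins.list[builtins.str]"
    # ... and that incorrect code is rejected.
    to_list(1.0)  # E: No overload variant of "to_list" matches argument type "float"
```

At runtime only the implementation executes. The {{if TYPE_CHECKING:}} block exists purely for the type checker, which is why this kind of check needs a data test rather than an ordinary unit test.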

The biggest risk is that output matchers have to be updated whenever a signature changes and/or mypy's output changes.
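
This maintenance cost arises because expectations are compared against mypy's textual output. Below is a minimal, hypothetical matcher; it is not the mypy or {{pyspark-stubs}} implementation. It collects inline {{# E:}} / {{# N:}} comments and compares them with mypy's {{file:line: severity: message}} lines. The helper names are invented for illustration.

The matcher makes two assumptions:
 * mypy prints its default output, without column numbers.
 * A trailing error code such as {{[list-item]}} may appear; recent mypy versions append one by default, so the matcher strips it.

Under these assumptions, a reworded message or any other formatting change shows up as a mismatch.

```python
import re
from typing import Set, Tuple

# Hypothetical helpers for illustration. A diagnostic is (line, severity, message).
Diagnostic = Tuple[int, str, str]

# Inline expectations: "# E: <message>" for errors, "# N: <message>" for notes.
EXPECT_RE = re.compile(r"#\s*([EN]):\s*(.+)$")
# Assumed mypy diagnostic format: "<file>:<line>: <severity>: <message>".
OUTPUT_RE = re.compile(r"^[^:]+:(\d+): (error|note): (.+)$")
# Optional trailing error code, e.g. "  [list-item]".
CODE_SUFFIX_RE = re.compile(r"\s+\[[a-z0-9-]+\]$")
SEVERITY = {"E": "error", "N": "note"}


def expected_diagnostics(source: str) -> Set[Diagnostic]:
    """Collect expectations written as inline comments in the test source."""
    expected: Set[Diagnostic] = set()
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = EXPECT_RE.search(line)
        if match:
            code, message = match.groups()
            expected.add((lineno, SEVERITY[code], message.strip()))
    return expected


def actual_diagnostics(output: str) -> Set[Diagnostic]:
    """Parse mypy-style output lines, dropping any trailing error code."""
    actual: Set[Diagnostic] = set()
    for line in output.splitlines():
        match = OUTPUT_RE.match(line.strip())
        if match:  # summary lines such as "Found 1 error ..." are skipped
            lineno, severity, message = match.groups()
            message = CODE_SUFFIX_RE.sub("", message.strip())
            actual.add((int(lineno), severity, message))
    return actual


def mismatches(source: str, output: str) -> Tuple[Set[Diagnostic], Set[Diagnostic]]:
    """Return (missing, unexpected); both empty means the test passes."""
    expected = expected_diagnostics(source)
    actual = actual_diagnostics(output)
    return expected - actual, actual - expected


if __name__ == "__main__":
    source = "\n".join([
        "from typing import List",
        'x: List[int] = ["a"]  # E: List item 0 has incompatible type "str"; expected "int"',
        'reveal_type(x)  # N: Revealed type is "builtins.list[builtins.int]"',
    ])
    # Illustrative output, not captured from a real mypy run.
    output = "\n".join([
        'main.py:2: error: List item 0 has incompatible type "str"; expected "int"  [list-item]',
        'main.py:3: note: Revealed type is "builtins.list[builtins.int]"',
    ])
    print(mismatches(source, output))  # (set(), set())
```

Comparing sets of diagnostics keeps the sketch short. However, it hides duplicate messages on the same line, which a real harness might need to account for.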

An example of a problem detected with data tests can be found in the SPARK-36894 PR ([https://github.com/apache/spark/pull/34146]).


> Migrate type hint data tests
> ----------------------------
>
>                 Key: SPARK-36989
>                 URL: https://issues.apache.org/jira/browse/SPARK-36989
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.3.0
>            Reporter: Maciej Szymkiewicz
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)