Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/07/21 21:04:59 UTC

[GitHub] [spark] Fokko opened a new pull request #29180: [SPARK-17333][PYSPARK] Enable mypy on the repository

Fokko opened a new pull request #29180:
URL: https://github.com/apache/spark/pull/29180


   
   ### What changes were proposed in this pull request?
   
   Enable [mypy](http://mypy-lang.org/) type checking on the Python code. 
   
   #### Annotation troubles
   
   Unfortunately, mypy has trouble with properties decorated with Spark's `@since` decorator, so these are silenced with:
   ```
   @property  # type: ignore
   ```
   https://github.com/python/mypy/issues/1362
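   For context, a minimal sketch of the pattern that trips mypy up (the `since` stand-in below only mimics the real decorator, which patches the docstring and returns the function unchanged):
   
   ```python
   def since(version):
       """Simplified stand-in for pyspark's @since decorator."""
       def deco(f):
           f.__doc__ = (f.__doc__ or "") + "\n.. versionadded:: %s" % version
           return f
       return deco
   
   
   class Example:
       @property  # type: ignore  # mypy cannot type-check decorated properties
       @since("1.3")
       def value(self):
           return 42
   ```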
   
   #### Abstractproperty
   
   Replaced [abstractproperty](https://docs.python.org/3/library/abc.html#abc.abstractproperty), since it has been deprecated since Python 3.3.
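   A sketch of the replacement pattern, as recommended by the `abc` documentation:
   
   ```python
   from abc import ABC, abstractmethod
   
   
   class NewStyle(ABC):
       @property           # stacking @property on @abstractmethod is the
       @abstractmethod     # modern equivalent of @abstractproperty
       def params(self):
           ...
   ```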
   
   
   
   
   
   
   ### Why are the changes needed?
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   ### How was this patch tested?
   




[GitHub] [spark] kyprifog edited a comment on pull request #29180: [SPARK-17333][PYSPARK] Enable mypy on the repository

Posted by GitBox <gi...@apache.org>.
kyprifog edited a comment on pull request #29180:
URL: https://github.com/apache/spark/pull/29180#issuecomment-711348733


   Could we have a better explanation for why this is being closed? I agree with @holdenk and others that this is worth doing; I've been watching mypy closely, and it has made definite strides, especially in the last year. I'm happy to pick up the mantle on this if needed. It won't reach the level of Scala or the frameless library, but I still think the effort is better than nothing.




[GitHub] [spark] zero323 commented on pull request #29180: [SPARK-17333][PYSPARK] Enable mypy on the repository

Posted by GitBox <gi...@apache.org>.
zero323 commented on pull request #29180:
URL: https://github.com/apache/spark/pull/29180#issuecomment-663564404


   > Sounds good to me. I'm in favor of having it inline, but having the annotation in the pyi is a good first step :)
   
   There are trade-offs to both strategies, and even within a single project there can be modules that are a better fit for one or the other. My personal objection to adding a few hints here and there is not about a preference for stubs, but about avoiding breakage. Since hints already exist and are moderately popular, taking a step back to inferior coverage seems like a rather counter-intuitive approach that leaves the community worse off, at least for the time being.




[GitHub] [spark] kyprifog commented on a change in pull request #29180: [SPARK-17333][PYSPARK] Enable mypy on the repository

Posted by GitBox <gi...@apache.org>.
kyprifog commented on a change in pull request #29180:
URL: https://github.com/apache/spark/pull/29180#discussion_r504330439



##########
File path: dev/lint-python
##########
@@ -122,6 +123,31 @@ function pycodestyle_test {
     fi
 }
 
+function mypy_test {
+    local MYPY_REPORT=
+    local MYPY_STATUS=
+
+    if ! hash "$MYPY_BUILD" 2> /dev/null; then
+        echo "The mypy command was not found."
+        echo "mypy checks failed."
+        exit 1
+    fi
+
+    echo "starting $MYPY_BUILD test..."
+    MYPY_REPORT=$( ($MYPY_BUILD --ignore-missing-imports --config-file python/mypy.ini python/) 2>&1)

Review comment:
       Second this. I think it should live in mypy.ini under specific per-module sections, as suggested here: http://calpaterson.com/mypy-hints.html
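       A hedged sketch of what that per-module configuration could look like in `python/mypy.ini` (the third-party module name below is only illustrative):
   
    ```ini
    [mypy]
    python_version = 3.6

    # Scope the ignore to a specific untyped third-party dependency instead
    # of passing --ignore-missing-imports globally on the command line.
    [mypy-py4j.*]
    ignore_missing_imports = True
    ```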






[GitHub] [spark] kyprifog commented on pull request #29180: [SPARK-17333][PYSPARK] Enable mypy on the repository

Posted by GitBox <gi...@apache.org>.
kyprifog commented on pull request #29180:
URL: https://github.com/apache/spark/pull/29180#issuecomment-711406788


   Thank you! 






[GitHub] [spark] HyukjinKwon edited a comment on pull request #29180: [SPARK-17333][PYSPARK] Enable mypy on the repository

Posted by GitBox <gi...@apache.org>.
HyukjinKwon edited a comment on pull request #29180:
URL: https://github.com/apache/spark/pull/29180#issuecomment-663311159


   @holdenk, I think there are a few differences between Scala/Java and Python. For example, Python type hinting is optional and still premature.
   
   We could think about adding the type hints into the code, but maybe we should port the stubs first (or keep a separate repo) and see how they work out for end users. Also, porting and managing the stub files separately is much easier than fixing the hints in the code in place: it carries low risk with the same output, whereas editing the code in place will likely introduce some mistakes. As a bonus, it will likely keep @zero323 as the author, since the stubs would ideally be ported as-is in one PR.
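   For context on the two strategies discussed above (a hypothetical `module.py`, not a real PySpark file):
   
   ```python
   # Strategy 1 -- inline hints, written directly in module.py:
   def add(a: int, b: int) -> int:
       return a + b
   
   # Strategy 2 -- a sibling stub file, module.pyi; module.py itself stays
   # unannotated and type checkers read the signatures from the stub:
   #
   #     def add(a: int, b: int) -> int: ...
   ```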




[GitHub] [spark] zero323 commented on a change in pull request #29180: [SPARK-17333][PYSPARK] Enable mypy on the repository

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #29180:
URL: https://github.com/apache/spark/pull/29180#discussion_r495487007



##########
File path: dev/lint-python
##########
@@ -122,6 +123,31 @@ function pycodestyle_test {
     fi
 }
 
+function mypy_test {
+    local MYPY_REPORT=
+    local MYPY_STATUS=
+
+    if ! hash "$MYPY_BUILD" 2> /dev/null; then
+        echo "The mypy command was not found."
+        echo "mypy checks failed."
+        exit 1
+    fi

Review comment:
       I think we should make this optional for now, i.e.
   
   ```bash
       if ! hash "$MYPY_BUILD" 2> /dev/null; then
           echo "The $MYPY_BUILD command was not found. Skipping MyPy test check build for now."
           echo
           return
       fi
   ```

##########
File path: python/pyspark/serializers.pyi
##########
@@ -0,0 +1,128 @@
+#

Review comment:
       This has already been ported.

##########
File path: dev/lint-python
##########
@@ -122,6 +123,31 @@ function pycodestyle_test {
     fi
 }
 
+function mypy_test {
+    local MYPY_REPORT=
+    local MYPY_STATUS=
+
+    if ! hash "$MYPY_BUILD" 2> /dev/null; then
+        echo "The mypy command was not found."
+        echo "mypy checks failed."
+        exit 1
+    fi
+
+    echo "starting $MYPY_BUILD test..."
+    MYPY_REPORT=$( ($MYPY_BUILD --ignore-missing-imports --config-file python/mypy.ini python/) 2>&1)

Review comment:
       I wouldn't use `--ignore-missing-imports` here; it is an all-or-nothing option.
   
       Instead, I would use either specific excludes (as we have right now in mypy.ini) or a targeted `type: ignore` (as in some test modules and such).
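       For reference, a targeted ignore looks like this (a sketch; `py4j` stands in for any dependency without stubs):
   
    ```python
    # Silences only the missing-stub error for this one import,
    # unlike the global --ignore-missing-imports flag:
    import py4j  # type: ignore[import]
    ```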

##########
File path: python/pyspark/accumulators.pyi
##########
@@ -0,0 +1,76 @@
+#

Review comment:
       This has already been ported.

##########
File path: python/pyspark/broadcast.pyi
##########
@@ -0,0 +1,51 @@
+#

Review comment:
       This has already been ported.

##########
File path: python/mypy.ini
##########
@@ -0,0 +1,20 @@
+[mypy]
+python_version = 3.6
+
+[mypy-pyspark.cloudpickle.*]
+ignore_errors = True
+
+[mypy-pyspark.streaming.tests.*]
+ignore_errors = True
+
+[mypy-pyspark.sql.tests.*]
+ignore_errors = True
+
+[mypy-pyspark.testing.*]
+ignore_errors = True
+
+[mypy-pyspark.tests.*]
+ignore_errors = True

Review comment:
       I am strongly against excluding the tests and `pyspark.testing` here.
   
       These already type check with only a handful of `ignore`s, and they provide additional coverage.

##########
File path: dev/lint-python
##########
@@ -246,6 +272,7 @@ PYTHON_SOURCE="$(find . -name "*.py")"
 compile_python_test "$PYTHON_SOURCE"
 pycodestyle_test "$PYTHON_SOURCE"
 flake8_test
+mypy_test

Review comment:
       Personally, I'd consider having the type checks in a separate script, as we might want to extend things over time (I plan to add tests on the examples, same as in [`pyspark-stubs`](https://github.com/zero323/pyspark-stubs/blob/d317c27083549b6727542d825e642ad91f275f9d/.travis.yml#L18), and I am thinking about migrating the original data-driven tests).
   
   But it is something we can discuss later.

##########
File path: python/pyspark/mllib/linalg/__init__.py
##########
@@ -469,7 +469,7 @@ def __getattr__(self, item):
     def __neg__(self):
         return DenseVector(-self.array)
 
-    def _delegate(op):
+    def _delegate(op: str):  # type: ignore

Review comment:
       I believe we can skip this one.

##########
File path: python/pyspark/ml/base.py
##########
@@ -309,7 +309,8 @@ def setPredictionCol(self, value):
         """
         return self._set(predictionCol=value)
 
-    @abstractproperty
+    @property  # type: ignore

Review comment:
       These ignores are not strictly required now, but I think it is OK to keep them: there is no real harm here, and they are likely to prove necessary in the future.

##########
File path: python/pyspark/sql/session.py
##########
@@ -76,7 +76,7 @@ class Builder(object):
         """
 
         _lock = RLock()
-        _options = {}
+        _options = {}  # type: ignore

Review comment:
       I believe we can skip this one.
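       For reference, the annotation that mypy's hint asks for would make the ignore unnecessary (a sketch, simplified from the real `Builder` class):
   
    ```python
    from typing import Any, Dict


    class Builder:
        # Annotating the empty dict resolves mypy's
        # "Need type annotation for '_options'" error without an ignore.
        _options: Dict[str, Any] = {}
    ```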

##########
File path: python/pyspark/sql/types.pyi
##########
@@ -0,0 +1,238 @@
+#

Review comment:
       This has already been ported.

##########
File path: python/setup.py
##########
@@ -28,7 +28,8 @@
     print("Failed to load PySpark version file for packaging. You must be in Spark's python dir.",
           file=sys.stderr)
     sys.exit(-1)
-VERSION = __version__  # noqa
+
+VERSION = __version__  # type: ignore  # noqa

Review comment:
       I don't think it is required.

##########
File path: python/pyspark/util.pyi
##########
@@ -0,0 +1,33 @@
+#

Review comment:
       This has already been ported, though I am not sure if we're going to consider it a part of the API.

##########
File path: python/pyspark/storagelevel.py
##########
@@ -19,6 +19,14 @@
 
 
 class StorageLevel(object):
+    DISK_ONLY: 'StorageLevel'

Review comment:
       This is already covered by the stubs, and I think it is better to avoid annotations in `.py` files unless they're strictly needed.
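       For illustration, the stub keeps the `.py` file annotation-free (an abbreviated sketch; the real `storagelevel.pyi` in pyspark-stubs carries the full list):
   
    ```python
    # python/pyspark/storagelevel.pyi
    class StorageLevel:
        DISK_ONLY: StorageLevel
        MEMORY_ONLY: StorageLevel
        MEMORY_AND_DISK: StorageLevel
        OFF_HEAP: StorageLevel
        def __init__(
            self,
            useDisk: bool,
            useMemory: bool,
            useOffHeap: bool,
            deserialized: bool,
            replication: int = ...,
        ) -> None: ...
    ```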






[GitHub] [spark] AmplabJenkins commented on pull request #29180: [SPARK-17333][PYSPARK] Enable mypy on the repository

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29180:
URL: https://github.com/apache/spark/pull/29180#issuecomment-662106048


   Can one of the admins verify this patch?




[GitHub] [spark] HyukjinKwon commented on pull request #29180: [SPARK-17333][PYSPARK] Enable mypy on the repository

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #29180:
URL: https://github.com/apache/spark/pull/29180#issuecomment-663316562


   It is getting more popular and more stable, but type hinting is still evolving, even between minor versions; see https://docs.python.org/3/library/typing.html. One example is [PEP 544](https://www.python.org/dev/peps/pep-0544/), which allows structural subtyping (Python 3.8). It could change the fundamental way we write types.
   
   I don't strongly object to having the hints inlined, but I think it's safer and easier to port the stub files first.
   
   > How should we best keep the type hints up to date if it's in a separate file?
   > Could we add something to the PR template for Python PRs?
   > As for a separate repo what do you envision?
   
   I am sure we will figure a way out. For example:
   - We could take the NumPy approach, e.g. [numpy/__init__.pyi](https://github.com/numpy/numpy/blob/master/numpy/__init__.pyi).
   - Or we could port his repo to the ASF (is that feasible?).
   
   I think IDEs can detect stub files, and if I am not mistaken, mypy can too, as a linter. If devs find it confusing, we should document it in the PR template, as you suggest, or in the contribution guide.
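   To illustrate the PEP 544 feature mentioned above (a minimal sketch, not tied to any PySpark API):
   
   ```python
   from typing import Protocol  # Python 3.8+; typing_extensions on 3.6/3.7
   
   
   class SupportsClose(Protocol):
       def close(self) -> None: ...
   
   
   def shutdown(resource: SupportsClose) -> None:
       resource.close()
   
   
   class Connection:  # no explicit inheritance from SupportsClose
       def close(self) -> None:
           print("closed")
   
   
   shutdown(Connection())  # type checks structurally
   ```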






[GitHub] [spark] HyukjinKwon edited a comment on pull request #29180: [SPARK-17333][PYSPARK] Enable mypy on the repository

Posted by GitBox <gi...@apache.org>.
HyukjinKwon edited a comment on pull request #29180:
URL: https://github.com/apache/spark/pull/29180#issuecomment-663320589


   > That's from three years ago. Scala type system has certainly evolved more than once every three years.
   
   For Python, we should support 3.6+, right? PEP 544 was only accepted as of the latest Python version, 3.8. If the fundamental way of typing is still changing in the latest Python version, I would say typing is still premature and evolving. We certainly don't want to manage multiple versions of stub files, or gate typing in the code behind if-else checks on the Python version.
   
   Yes, so I meant to port the stub files into `python/pyspark`. Maybe we can discuss how to port them in the ongoing dev mailing-list thread.




[GitHub] [spark] zero323 commented on pull request #29180: [SPARK-17333][PYSPARK] Enable mypy on the repository

Posted by GitBox <gi...@apache.org>.
zero323 commented on pull request #29180:
URL: https://github.com/apache/spark/pull/29180#issuecomment-662291534


   I don't really have standing here, but I still think that putting random annotations here and there only makes things worse.
   
   Other than that ‒ I'd strongly advise against using an unqualified `# type: ignore` anywhere. It tends to escalate and render large chunks of annotations useless.
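   For reference, a qualified ignore is scoped to a single mypy error code (supported since mypy 0.740; a sketch):
   
   ```python
   count: int = "many"  # type: ignore[assignment]  # silences only this error code
   
   # A bare "# type: ignore" would also hide any other error mypy later
   # finds on the same line, which is how unqualified ignores escalate.
   ```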




[GitHub] [spark] holdenk commented on pull request #29180: [SPARK-17333][PYSPARK] Enable mypy on the repository

Posted by GitBox <gi...@apache.org>.
holdenk commented on pull request #29180:
URL: https://github.com/apache/spark/pull/29180#issuecomment-663313791


   > @holdenk, I think there is a bit of differences between Scala/Java and Python. For example, Python type hinting is optional and still premature.
   
   Optional, yes. Why do you believe it is premature? The PEP is 6 years old (https://www.python.org/dev/peps/pep-0484/). Once we decided to drop old versions of Python, I ran an informal survey, and support for typing seems fairly popular (https://twitter.com/holdenkarau/status/1282486803249762304). We could run a survey with users@ if you don't think this feature would be impactful.
   
   > We could think about adding the type hints into the code but maybe we should port it first (or have a separate repo) and see how it goes to the end-users. Also, porting/managing stub files separately is much easier than fixing them in the codes in place. It will take a low risk and same output. For example, fixing the codes will likely bring some mistakes. As a bonus, that will likely keep the author as @zero323 easily as it will be ported as is ideally in one PR.
   
   If folks are against in-line types in the Python code, I suppose porting the hints file is an OK compromise. I'd rather have them in-line as a developer, but if our options are no types or types living in a separate file, I'll take the second one. How should we best keep the type hints up to date if they're in a separate file? Could we add something to the PR template for Python PRs?
   
   As for a separate repo, what do you envision?




[GitHub] [spark] Fokko commented on pull request #29180: [SPARK-17333][PYSPARK] Enable mypy on the repository

Posted by GitBox <gi...@apache.org>.
Fokko commented on pull request #29180:
URL: https://github.com/apache/spark/pull/29180#issuecomment-711309815


   Closing this one for now.




[GitHub] [spark] HyukjinKwon commented on a change in pull request #29180: [SPARK-17333][PYSPARK] Enable mypy on the repository

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #29180:
URL: https://github.com/apache/spark/pull/29180#discussion_r481926390



##########
File path: dev/lint-python
##########
@@ -122,6 +123,32 @@ function pycodestyle_test {
     fi
 }
 
+function mypy_test {
+    local MYPY_REPORT=
+    local MYPY_STATUS=
+
+    if ! hash "$MYPY_BUILD" 2> /dev/null; then
+        echo "The mypy command was not found."
+        echo "mypy checks failed."
+        # exit 1
+        pip3 install mypy

Review comment:
       Oh, let's not do this.






[GitHub] [spark] HyukjinKwon commented on pull request #29180: [SPARK-17333][PYSPARK] Enable mypy on the repository

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #29180:
URL: https://github.com/apache/spark/pull/29180#issuecomment-662334741


   I also tend to think it's not a good idea to put the types into the code only partially; it would make things more difficult to manage. Let's avoid doing that for now.






[GitHub] [spark] Fokko edited a comment on pull request #29180: [SPARK-17333][PYSPARK] Enable mypy on the repository

Posted by GitBox <gi...@apache.org>.
Fokko edited a comment on pull request #29180:
URL: https://github.com/apache/spark/pull/29180#issuecomment-711392076


   I force-pushed, so I recreated the PR as https://github.com/apache/spark/pull/30088.






[GitHub] [spark] holdenk commented on pull request #29180: [SPARK-17333][PYSPARK] Enable mypy on the repository

Posted by GitBox <gi...@apache.org>.
holdenk commented on pull request #29180:
URL: https://github.com/apache/spark/pull/29180#issuecomment-663251077


   Jenkins ok to test
   @HyukjinKwon: we have the types alongside the code in Scala and Java. I think having the types alongside the code makes it easier to remember to update them, and it gives hints while someone is reading the code.
   Jenkins add to whitelist








[GitHub] [spark] HyukjinKwon commented on a change in pull request #29180: [SPARK-17333][PYSPARK] Enable mypy on the repository

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #29180:
URL: https://github.com/apache/spark/pull/29180#discussion_r481926616



##########
File path: dev/lint-python
##########
@@ -122,6 +123,32 @@ function pycodestyle_test {
     fi
 }
 
+function mypy_test {
+    local MYPY_REPORT=
+    local MYPY_STATUS=
+
+    if ! hash "$MYPY_BUILD" 2> /dev/null; then
+        echo "The mypy command was not found."
+        echo "mypy checks failed."
+        # exit 1
+        pip3 install mypy

Review comment:
       We should open a separate ticket to do this in Jenkins and ask @shaneknapp to install it.












[GitHub] [spark] Fokko commented on pull request #29180: [SPARK-17333][PYSPARK] Enable mypy on the repository

Posted by GitBox <gi...@apache.org>.
Fokko commented on pull request #29180:
URL: https://github.com/apache/spark/pull/29180#issuecomment-662383147


   @zero323 The annotations aren't random at all. This PR fixes all the outstanding violations on the current master:
   ```
   fokkodriesprong@Fan python % mypy --ignore-missing-imports .       
   pyspark/storagelevel.py:52: error: "Type[StorageLevel]" has no attribute "DISK_ONLY"
   pyspark/storagelevel.py:53: error: "Type[StorageLevel]" has no attribute "DISK_ONLY_2"
   pyspark/storagelevel.py:54: error: "Type[StorageLevel]" has no attribute "MEMORY_ONLY"
   pyspark/storagelevel.py:55: error: "Type[StorageLevel]" has no attribute "MEMORY_ONLY_2"
   pyspark/storagelevel.py:56: error: "Type[StorageLevel]" has no attribute "MEMORY_AND_DISK"
   pyspark/storagelevel.py:57: error: "Type[StorageLevel]" has no attribute "MEMORY_AND_DISK_2"
   pyspark/storagelevel.py:58: error: "Type[StorageLevel]" has no attribute "OFF_HEAP"
   setup.py:31: error: Name '__version__' is not defined
   setup.py:52: error: Incompatible types in assignment (expression has type "str", variable has type "List[str]")
   setup.py:55: error: Incompatible types in assignment (expression has type "str", variable has type "List[str]")
   setup.py:120: error: Argument 1 to "symlink" has incompatible type "List[str]"; expected "Union[bytes, str, _PathLike[Any]]"
   setup.py:128: error: Argument 1 to "copytree" has incompatible type "List[str]"; expected "Union[str, _PathLike[str]]"
   pyspark/cloudpickle/cloudpickle.py:59: error: Module 'pickle' has no attribute '_getattribute'
   pyspark/cloudpickle/cloudpickle.py:66: error: Incompatible types in assignment (expression has type "None", variable has type Module)
   pyspark/cloudpickle/cloudpickle.py:66: error: Incompatible types in assignment (expression has type "None", variable has type "_SpecialForm")
   pyspark/cloudpickle/cloudpickle.py:71: error: Incompatible types in assignment (expression has type "None", variable has type "_SpecialForm")
   pyspark/cloudpickle/cloudpickle.py:74: error: Module 'types' has no attribute 'CellType'
   pyspark/cloudpickle/cloudpickle.py:93: error: Need type annotation for '_DYNAMIC_CLASS_TRACKER_BY_CLASS'
   pyspark/cloudpickle/cloudpickle.py:94: error: Need type annotation for '_DYNAMIC_CLASS_TRACKER_BY_ID'
   pyspark/cloudpickle/cloudpickle.py:104: error: Need type annotation for '_extract_code_globals_cache'
   pyspark/cloudpickle/cloudpickle_fast.py:77: error: All conditional function variants must have identical signatures
   pyspark/cloudpickle/cloudpickle_fast.py:89: error: All conditional function variants must have identical signatures
   pyspark/cloudpickle/cloudpickle_fast.py:462: error: Invalid index type "Type[TextIOWrapper]" for "Dict[Type[classmethod], Callable[[Any], Any]]"; expected type "Type[classmethod]"
   pyspark/cloudpickle/cloudpickle_fast.py:463: error: Invalid index type "Type[Logger]" for "Dict[Type[classmethod], Callable[[Any], Any]]"; expected type "Type[classmethod]"
   pyspark/cloudpickle/cloudpickle_fast.py:464: error: Invalid index type "Type[RootLogger]" for "Dict[Type[classmethod], Callable[[Any], Any]]"; expected type "Type[classmethod]"
   pyspark/cloudpickle/cloudpickle_fast.py:465: error: Invalid index type "Type[memoryview]" for "Dict[Type[classmethod], Callable[[Any], Any]]"; expected type "Type[classmethod]"
   pyspark/cloudpickle/cloudpickle_fast.py:466: error: Invalid index type "Type[property]" for "Dict[Type[classmethod], Callable[[Any], Any]]"; expected type "Type[classmethod]"
   pyspark/cloudpickle/cloudpickle_fast.py:467: error: Invalid index type "Type[staticmethod]" for "Dict[Type[classmethod], Callable[[Any], Any]]"; expected type "Type[classmethod]"
   pyspark/cloudpickle/cloudpickle_fast.py:469: error: Invalid index type "Type[CodeType]" for "Dict[Type[classmethod], Callable[[Any], Any]]"; expected type "Type[classmethod]"
   pyspark/cloudpickle/cloudpickle_fast.py:470: error: Invalid index type "Type[GetSetDescriptorType]" for "Dict[Type[classmethod], Callable[[Any], Any]]"; expected type "Type[classmethod]"
   pyspark/cloudpickle/cloudpickle_fast.py:471: error: Invalid index type "Type[Module]" for "Dict[Type[classmethod], Callable[[Any], Any]]"; expected type "Type[classmethod]"
   pyspark/cloudpickle/cloudpickle_fast.py:472: error: Invalid index type "Type[MethodType]" for "Dict[Type[classmethod], Callable[[Any], Any]]"; expected type "Type[classmethod]"
   pyspark/cloudpickle/cloudpickle_fast.py:473: error: Invalid index type "Type[MappingProxyType[Any, Any]]" for "Dict[Type[classmethod], Callable[[Any], Any]]"; expected type "Type[classmethod]"
   pyspark/cloudpickle/cloudpickle_fast.py:474: error: Invalid index type "Type[WeakSet[Any]]" for "Dict[Type[classmethod], Callable[[Any], Any]]"; expected type "Type[classmethod]"
   pyspark/cloudpickle/cloudpickle_fast.py:475: error: Invalid index type "object" for "Dict[Type[classmethod], Callable[[Any], Any]]"; expected type "Type[classmethod]"
   pyspark/cloudpickle/cloudpickle_fast.py:477: error: Module has no attribute "dispatch_table"
   pyspark/cloudpickle/cloudpickle_fast.py:643: error: All conditional function variants must have identical signatures
   pyspark/util.py:23: error: Need type annotation for '__all__' (hint: "__all__: List[<type>] = ...")
   pyspark/heapq3.py:872: error: Module '_heapq' has no attribute '_heapreplace_max'; maybe "heapreplace"?
   pyspark/heapq3.py:872: error: Name '_heapreplace_max' already defined on line 470
   pyspark/heapq3.py:876: error: Module '_heapq' has no attribute '_heapify_max'
   pyspark/heapq3.py:876: error: Name '_heapify_max' already defined on line 477
   pyspark/heapq3.py:880: error: Module '_heapq' has no attribute '_heappop_max'
   pyspark/heapq3.py:880: error: Name '_heappop_max' already defined on line 460
   pyspark/serializers.py:345: error: Need type annotation for '__cls' (hint: "__cls: Dict[<type>, <type>] = ...")
   pyspark/accumulators.py:104: error: Need type annotation for '_accumulatorRegistry' (hint: "_accumulatorRegistry: Dict[<type>, <type>] = ...")
   pyspark/broadcast.py:34: error: Need type annotation for '_broadcastRegistry' (hint: "_broadcastRegistry: Dict[<type>, <type>] = ...")
   pyspark/sql/types.py:97: error: Need type annotation for '_instances' (hint: "_instances: Dict[<type>, <type>] = ...")
   pyspark/sql/types.py:742: error: "type" has no attribute "typeName"
   pyspark/sql/types.py:743: error: "type" has no attribute "typeName"
   pyspark/sql/types.py:971: error: Incompatible types in assignment (expression has type "Type[StringType]", target has type "Type[FractionalType]")
   pyspark/sql/streaming.py:45: error: Decorated property not supported
   pyspark/sql/streaming.py:56: error: Decorated property not supported
   pyspark/sql/streaming.py:64: error: Decorated property not supported
   pyspark/sql/streaming.py:74: error: Decorated property not supported
   pyspark/sql/streaming.py:101: error: Decorated property not supported
   pyspark/sql/streaming.py:109: error: Decorated property not supported
   pyspark/sql/streaming.py:118: error: Decorated property not supported
   pyspark/sql/streaming.py:202: error: Decorated property not supported
   pyspark/sql/session.py:79: error: Need type annotation for '_options' (hint: "_options: Dict[<type>, <type>] = ...")
   pyspark/sql/session.py:280: error: Decorated property not supported
   pyspark/sql/session.py:286: error: Decorated property not supported
   pyspark/sql/session.py:292: error: Decorated property not supported
   pyspark/sql/session.py:305: error: Decorated property not supported
   pyspark/sql/session.py:318: error: Decorated property not supported
   pyspark/sql/session.py:647: error: Decorated property not supported
   pyspark/sql/session.py:658: error: Decorated property not supported
   pyspark/sql/session.py:671: error: Decorated property not supported
   pyspark/sql/dataframe.py:79: error: Decorated property not supported
   pyspark/sql/dataframe.py:89: error: Decorated property not supported
   pyspark/sql/dataframe.py:96: error: Decorated property not supported
   pyspark/sql/dataframe.py:212: error: Decorated property not supported
   pyspark/sql/dataframe.py:223: error: Decorated property not supported
   pyspark/sql/dataframe.py:236: error: Decorated property not supported
   pyspark/sql/dataframe.py:380: error: Decorated property not supported
   pyspark/sql/dataframe.py:694: error: Decorated property not supported
   pyspark/sql/dataframe.py:1002: error: Decorated property not supported
   pyspark/sql/dataframe.py:1012: error: Decorated property not supported
   pyspark/sql/context.py:157: error: Decorated property not supported
   pyspark/sql/context.py:434: error: Decorated property not supported
   pyspark/sql/context.py:445: error: Decorated property not supported
   pyspark/sql/context.py:462: error: Decorated property not supported
   pyspark/ml/linalg/__init__.py:434: error: Argument 1 to "_delegate" has incompatible type "str"; expected "DenseVector"
   pyspark/ml/linalg/__init__.py:435: error: Argument 1 to "_delegate" has incompatible type "str"; expected "DenseVector"
   pyspark/ml/linalg/__init__.py:436: error: Argument 1 to "_delegate" has incompatible type "str"; expected "DenseVector"
   pyspark/ml/linalg/__init__.py:437: error: Argument 1 to "_delegate" has incompatible type "str"; expected "DenseVector"
   pyspark/ml/linalg/__init__.py:438: error: Argument 1 to "_delegate" has incompatible type "str"; expected "DenseVector"
   pyspark/ml/linalg/__init__.py:439: error: Argument 1 to "_delegate" has incompatible type "str"; expected "DenseVector"
   pyspark/ml/linalg/__init__.py:440: error: Argument 1 to "_delegate" has incompatible type "str"; expected "DenseVector"
   pyspark/ml/linalg/__init__.py:441: error: Argument 1 to "_delegate" has incompatible type "str"; expected "DenseVector"
   pyspark/ml/linalg/__init__.py:442: error: Argument 1 to "_delegate" has incompatible type "str"; expected "DenseVector"
   pyspark/ml/linalg/__init__.py:443: error: Argument 1 to "_delegate" has incompatible type "str"; expected "DenseVector"
   pyspark/ml/linalg/__init__.py:444: error: Argument 1 to "_delegate" has incompatible type "str"; expected "DenseVector"
   pyspark/ml/linalg/__init__.py:445: error: Argument 1 to "_delegate" has incompatible type "str"; expected "DenseVector"
   pyspark/testing/sqlutils.py:150: error: Incompatible types in assignment (expression has type "PythonOnlyUDT", base class "ExamplePoint" defined the type as "ExamplePointUDT")
   pyspark/mllib/evaluation.py:69: error: Decorated property not supported
   pyspark/mllib/evaluation.py:78: error: Decorated property not supported
   pyspark/mllib/evaluation.py:136: error: Decorated property not supported
   pyspark/mllib/evaluation.py:145: error: Decorated property not supported
   pyspark/mllib/evaluation.py:154: error: Decorated property not supported
   pyspark/mllib/evaluation.py:163: error: Decorated property not supported
   pyspark/mllib/evaluation.py:172: error: Decorated property not supported
   pyspark/mllib/evaluation.py:315: error: Decorated property not supported
   pyspark/mllib/evaluation.py:324: error: Decorated property not supported
   pyspark/mllib/evaluation.py:333: error: Decorated property not supported
   pyspark/mllib/evaluation.py:341: error: Decorated property not supported
   pyspark/mllib/evaluation.py:350: error: Decorated property not supported
   pyspark/mllib/evaluation.py:436: error: Decorated property not supported
   pyspark/mllib/evaluation.py:562: error: Decorated property not supported
   pyspark/mllib/evaluation.py:571: error: Decorated property not supported
   pyspark/mllib/evaluation.py:580: error: Decorated property not supported
   pyspark/mllib/evaluation.py:589: error: Decorated property not supported
   pyspark/mllib/evaluation.py:597: error: Decorated property not supported
   pyspark/mllib/evaluation.py:606: error: Decorated property not supported
   pyspark/ml/util.py:564: error: Decorated property not supported
   pyspark/ml/util.py:573: error: Decorated property not supported
   pyspark/ml/image.py:212: error: Cannot assign to a method
   pyspark/ml/base.py:311: error: Decorated property not supported
   pyspark/ml/wrapper.py:391: error: Decorated property not supported
   pyspark/ml/tuning.py:29: error: Module 'pyspark.sql.functions' has no attribute 'col'
   pyspark/ml/tuning.py:29: error: Module 'pyspark.sql.functions' has no attribute 'lit'
   pyspark/ml/tree.py:32: error: Decorated property not supported
   pyspark/ml/tree.py:38: error: Decorated property not supported
   pyspark/ml/tree.py:44: error: Decorated property not supported
   pyspark/ml/tree.py:169: error: Decorated property not supported
   pyspark/ml/tree.py:175: error: Decorated property not supported
   pyspark/ml/tree.py:181: error: Decorated property not supported
   pyspark/ml/tree.py:187: error: Decorated property not supported
   pyspark/ml/tree.py:193: error: Decorated property not supported
   pyspark/ml/stat.py:25: error: Module 'pyspark.sql.functions' has no attribute 'lit'
   pyspark/ml/recommendation.py:499: error: Decorated property not supported
   pyspark/ml/recommendation.py:505: error: Decorated property not supported
   pyspark/ml/recommendation.py:514: error: Decorated property not supported
   pyspark/ml/fpm.py:116: error: Decorated property not supported
   pyspark/ml/fpm.py:126: error: Decorated property not supported
   pyspark/ml/feature.py:870: error: Decorated property not supported
   pyspark/ml/feature.py:1406: error: Decorated property not supported
   pyspark/ml/feature.py:1414: error: Decorated property not supported
   pyspark/ml/feature.py:1422: error: Decorated property not supported
   pyspark/ml/feature.py:1689: error: Decorated property not supported
   pyspark/ml/feature.py:1876: error: Decorated property not supported
   pyspark/ml/feature.py:2158: error: Decorated property not supported
   pyspark/ml/feature.py:2166: error: Decorated property not supported
   pyspark/ml/feature.py:2550: error: Decorated property not supported
   pyspark/ml/feature.py:3061: error: Decorated property not supported
   pyspark/ml/feature.py:3069: error: Decorated property not supported
   pyspark/ml/feature.py:3425: error: Decorated property not supported
   pyspark/ml/feature.py:3433: error: Decorated property not supported
   pyspark/ml/feature.py:3702: error: Decorated property not supported
   pyspark/ml/feature.py:4286: error: Decorated property not supported
   pyspark/ml/feature.py:4294: error: Decorated property not supported
   pyspark/ml/feature.py:4801: error: Decorated property not supported
   pyspark/ml/feature.py:4810: error: Decorated property not supported
   pyspark/ml/feature.py:5194: error: Decorated property not supported
   pyspark/ml/feature.py:5728: error: Decorated property not supported
   pyspark/ml/regression.py:319: error: Decorated property not supported
   pyspark/ml/regression.py:327: error: Decorated property not supported
   pyspark/ml/regression.py:335: error: Decorated property not supported
   pyspark/ml/regression.py:343: error: Decorated property not supported
   pyspark/ml/regression.py:379: error: Decorated property not supported
   pyspark/ml/regression.py:387: error: Decorated property not supported
   pyspark/ml/regression.py:396: error: Decorated property not supported
   pyspark/ml/regression.py:405: error: Decorated property not supported
   pyspark/ml/regression.py:414: error: Decorated property not supported
   pyspark/ml/regression.py:430: error: Decorated property not supported
   pyspark/ml/regression.py:444: error: Decorated property not supported
   pyspark/ml/regression.py:458: error: Decorated property not supported
   pyspark/ml/regression.py:471: error: Decorated property not supported
   pyspark/ml/regression.py:486: error: Decorated property not supported
   pyspark/ml/regression.py:500: error: Decorated property not supported
   pyspark/ml/regression.py:508: error: Decorated property not supported
   pyspark/ml/regression.py:516: error: Decorated property not supported
   pyspark/ml/regression.py:524: error: Decorated property not supported
   pyspark/ml/regression.py:533: error: Decorated property not supported
   pyspark/ml/regression.py:547: error: Decorated property not supported
   pyspark/ml/regression.py:561: error: Decorated property not supported
   pyspark/ml/regression.py:585: error: Decorated property not supported
   pyspark/ml/regression.py:597: error: Decorated property not supported
   pyspark/ml/regression.py:777: error: Decorated property not supported
   pyspark/ml/regression.py:785: error: Decorated property not supported
   pyspark/ml/regression.py:794: error: Decorated property not supported
   pyspark/ml/regression.py:1034: error: Decorated property not supported
   pyspark/ml/regression.py:1286: error: Decorated property not supported
   pyspark/ml/regression.py:1292: error: Decorated property not supported
   pyspark/ml/regression.py:1585: error: Decorated property not supported
   pyspark/ml/regression.py:1600: error: Decorated property not supported
   pyspark/ml/regression.py:1843: error: Decorated property not supported
   pyspark/ml/regression.py:1851: error: Decorated property not supported
   pyspark/ml/regression.py:1859: error: Decorated property not supported
   pyspark/ml/regression.py:2176: error: Decorated property not supported
   pyspark/ml/regression.py:2184: error: Decorated property not supported
   pyspark/ml/regression.py:2192: error: Decorated property not supported
   pyspark/ml/regression.py:2229: error: Decorated property not supported
   pyspark/ml/regression.py:2237: error: Decorated property not supported
   pyspark/ml/regression.py:2246: error: Decorated property not supported
   pyspark/ml/regression.py:2254: error: Decorated property not supported
   pyspark/ml/regression.py:2262: error: Decorated property not supported
   pyspark/ml/regression.py:2270: error: Decorated property not supported
   pyspark/ml/regression.py:2278: error: Decorated property not supported
   pyspark/ml/regression.py:2296: error: Decorated property not supported
   pyspark/ml/regression.py:2304: error: Decorated property not supported
   pyspark/ml/regression.py:2312: error: Decorated property not supported
   pyspark/ml/regression.py:2323: error: Decorated property not supported
   pyspark/ml/regression.py:2340: error: Decorated property not supported
   pyspark/ml/regression.py:2348: error: Decorated property not supported
   pyspark/ml/regression.py:2356: error: Decorated property not supported
   pyspark/ml/regression.py:2367: error: Decorated property not supported
   pyspark/ml/regression.py:2378: error: Decorated property not supported
   pyspark/ml/regression.py:2624: error: Decorated property not supported
   pyspark/ml/regression.py:2632: error: Decorated property not supported
   pyspark/ml/regression.py:2640: error: Decorated property not supported
   pyspark/ml/clustering.py:42: error: Decorated property not supported
   pyspark/ml/clustering.py:50: error: Decorated property not supported
   pyspark/ml/clustering.py:58: error: Decorated property not supported
   pyspark/ml/clustering.py:66: error: Decorated property not supported
   pyspark/ml/clustering.py:74: error: Decorated property not supported
   pyspark/ml/clustering.py:82: error: Decorated property not supported
   pyspark/ml/clustering.py:90: error: Decorated property not supported
   pyspark/ml/clustering.py:153: error: Decorated property not supported
   pyspark/ml/clustering.py:163: error: Decorated property not supported
   pyspark/ml/clustering.py:176: error: Decorated property not supported
   pyspark/ml/clustering.py:186: error: Decorated property not supported
   pyspark/ml/clustering.py:445: error: Decorated property not supported
   pyspark/ml/clustering.py:453: error: Decorated property not supported
   pyspark/ml/clustering.py:461: error: Decorated property not supported
   pyspark/ml/clustering.py:477: error: Decorated property not supported
   pyspark/ml/clustering.py:560: error: Decorated property not supported
   pyspark/ml/clustering.py:824: error: Decorated property not supported
   pyspark/ml/clustering.py:1017: error: Decorated property not supported
   pyspark/ml/classification.py:101: error: Decorated property not supported
   pyspark/ml/classification.py:208: error: Decorated property not supported
   pyspark/ml/classification.py:256: error: Decorated property not supported
   pyspark/ml/classification.py:264: error: Decorated property not supported
   pyspark/ml/classification.py:272: error: Decorated property not supported
   pyspark/ml/classification.py:281: error: Decorated property not supported
   pyspark/ml/classification.py:290: error: Decorated property not supported
   pyspark/ml/classification.py:304: error: Decorated property not supported
   pyspark/ml/classification.py:312: error: Decorated property not supported
   pyspark/ml/classification.py:320: error: Decorated property not supported
   pyspark/ml/classification.py:328: error: Decorated property not supported
   pyspark/ml/classification.py:343: error: Decorated property not supported
   pyspark/ml/classification.py:353: error: Decorated property not supported
   pyspark/ml/classification.py:362: error: Decorated property not supported
   pyspark/ml/classification.py:370: error: Decorated property not supported
   pyspark/ml/classification.py:379: error: Decorated property not supported
   pyspark/ml/classification.py:403: error: Decorated property not supported
   pyspark/ml/classification.py:413: error: Decorated property not supported
   pyspark/ml/classification.py:430: error: Decorated property not supported
   pyspark/ml/classification.py:439: error: Decorated property not supported
   pyspark/ml/classification.py:452: error: Decorated property not supported
   pyspark/ml/classification.py:461: error: Decorated property not supported
   pyspark/ml/classification.py:471: error: Decorated property not supported
   pyspark/ml/classification.py:480: error: Decorated property not supported
   pyspark/ml/classification.py:490: error: Decorated property not supported
   pyspark/ml/classification.py:709: error: Decorated property not supported
   pyspark/ml/classification.py:717: error: Decorated property not supported
   pyspark/ml/classification.py:1182: error: Decorated property not supported
   pyspark/ml/classification.py:1191: error: Decorated property not supported
   pyspark/ml/classification.py:1200: error: Decorated property not supported
   pyspark/ml/classification.py:1208: error: Decorated property not supported
   pyspark/ml/classification.py:1216: error: Decorated property not supported
   pyspark/ml/classification.py:1259: error: Decorated property not supported
   pyspark/ml/classification.py:1268: error: Decorated property not supported
   pyspark/ml/classification.py:1525: error: Decorated property not supported
   pyspark/ml/classification.py:1786: error: Decorated property not supported
   pyspark/ml/classification.py:1801: error: Decorated property not supported
   pyspark/ml/classification.py:1807: error: Decorated property not supported
   pyspark/ml/classification.py:2181: error: Decorated property not supported
   pyspark/ml/classification.py:2196: error: Decorated property not supported
   pyspark/ml/classification.py:2399: error: Decorated property not supported
   pyspark/ml/classification.py:2407: error: Decorated property not supported
   pyspark/ml/classification.py:2415: error: Decorated property not supported
   pyspark/ml/classification.py:2632: error: Decorated property not supported
   pyspark/ml/classification.py:3253: error: Decorated property not supported
   pyspark/ml/classification.py:3261: error: Decorated property not supported
   pyspark/ml/classification.py:3269: error: Decorated property not supported
   pyspark/sql/tests/test_udf.py:362: error: Argument 2 to "skipIf" has incompatible type "Optional[str]"; expected "str"
   pyspark/sql/tests/test_udf.py:379: error: Argument 2 to "skipIf" has incompatible type "Optional[str]"; expected "str"
   pyspark/sql/tests/test_udf.py:569: error: Argument 2 to "skipIf" has incompatible type "Optional[str]"; expected "str"
   pyspark/sql/tests/test_types.py:28: error: Module 'pyspark.sql.functions' has no attribute 'col'
   pyspark/sql/tests/test_streaming.py:23: error: Module 'pyspark.sql.functions' has no attribute 'lit'
   pyspark/sql/tests/test_serde.py:24: error: Module 'pyspark.sql.functions' has no attribute 'lit'
   pyspark/sql/tests/test_readwriter.py:22: error: Module 'pyspark.sql.functions' has no attribute 'col'
   pyspark/sql/tests/test_pandas_udf_window.py:21: error: Module 'pyspark.sql.functions' has no attribute 'col'
   pyspark/sql/tests/test_pandas_udf_window.py:21: error: Module 'pyspark.sql.functions' has no attribute 'lit'
   pyspark/sql/tests/test_pandas_udf_window.py:21: error: Module 'pyspark.sql.functions' has no attribute 'mean'
   pyspark/sql/tests/test_pandas_udf_window.py:21: error: Module 'pyspark.sql.functions' has no attribute 'min'
   pyspark/sql/tests/test_pandas_udf_window.py:21: error: Module 'pyspark.sql.functions' has no attribute 'max'
   pyspark/sql/tests/test_pandas_udf_window.py:21: error: Module 'pyspark.sql.functions' has no attribute 'rank'
   pyspark/sql/tests/test_pandas_udf_window.py:34: error: Argument 2 to "skipIf" has incompatible type "Optional[str]"; expected "str"
   pyspark/sql/tests/test_pandas_udf_typehints.py:21: error: Module 'pyspark.sql.functions' has no attribute 'mean'
   pyspark/sql/tests/test_pandas_udf_typehints.py:21: error: Module 'pyspark.sql.functions' has no attribute 'lit'
   pyspark/sql/tests/test_pandas_udf_typehints.py:37: error: Argument 2 to "skipIf" has incompatible type "Optional[str]"; expected "str"
   pyspark/sql/tests/test_pandas_udf_scalar.py:30: error: Module 'pyspark.sql.functions' has no attribute 'col'
   pyspark/sql/tests/test_pandas_udf_scalar.py:30: error: Module 'pyspark.sql.functions' has no attribute 'lit'
   pyspark/sql/tests/test_pandas_udf_scalar.py:30: error: Module 'pyspark.sql.functions' has no attribute 'sum'
   pyspark/sql/tests/test_pandas_udf_scalar.py:49: error: Argument 2 to "skipIf" has incompatible type "Optional[str]"; expected "str"
   pyspark/sql/tests/test_pandas_udf_scalar.py:1098: error: Argument 2 to "skipIf" has incompatible type "Optional[str]"; expected "str"
   pyspark/sql/tests/test_pandas_udf_grouped_agg.py:22: error: Module 'pyspark.sql.functions' has no attribute 'col'
   pyspark/sql/tests/test_pandas_udf_grouped_agg.py:22: error: Module 'pyspark.sql.functions' has no attribute 'lit'
   pyspark/sql/tests/test_pandas_udf_grouped_agg.py:22: error: Module 'pyspark.sql.functions' has no attribute 'mean'
   pyspark/sql/tests/test_pandas_udf_grouped_agg.py:22: error: Module 'pyspark.sql.functions' has no attribute 'sum'
   pyspark/sql/tests/test_pandas_udf_grouped_agg.py:37: error: Argument 2 to "skipIf" has incompatible type "Optional[str]"; expected "str"
   pyspark/sql/tests/test_pandas_udf.py:31: error: Argument 2 to "skipIf" has incompatible type "Optional[str]"; expected "str"
   pyspark/sql/tests/test_pandas_map.py:32: error: Argument 2 to "skipIf" has incompatible type "Optional[str]"; expected "str"
   pyspark/sql/tests/test_pandas_grouped_map.py:26: error: Module 'pyspark.sql.functions' has no attribute 'col'
   pyspark/sql/tests/test_pandas_grouped_map.py:26: error: Module 'pyspark.sql.functions' has no attribute 'lit'
   pyspark/sql/tests/test_pandas_grouped_map.py:26: error: Module 'pyspark.sql.functions' has no attribute 'sum'
   pyspark/sql/tests/test_pandas_grouped_map.py:43: error: Argument 2 to "skipIf" has incompatible type "Optional[str]"; expected "str"
   pyspark/sql/tests/test_pandas_cogrouped_map.py:21: error: Module 'pyspark.sql.functions' has no attribute 'col'
   pyspark/sql/tests/test_pandas_cogrouped_map.py:21: error: Module 'pyspark.sql.functions' has no attribute 'lit'
   pyspark/sql/tests/test_pandas_cogrouped_map.py:21: error: Module 'pyspark.sql.functions' has no attribute 'sum'
   pyspark/sql/tests/test_pandas_cogrouped_map.py:37: error: Argument 2 to "skipIf" has incompatible type "Optional[str]"; expected "str"
   pyspark/sql/tests/test_functions.py:24: error: Module 'pyspark.sql.functions' has no attribute 'col'
   pyspark/sql/tests/test_functions.py:24: error: Module 'pyspark.sql.functions' has no attribute 'lit'
   pyspark/sql/tests/test_dataframe.py:522: error: Argument 2 to "skipIf" has incompatible type "Optional[str]"; expected "str"
   pyspark/sql/tests/test_dataframe.py:534: error: Argument 2 to "skipIf" has incompatible type "Optional[str]"; expected "str"
   pyspark/sql/tests/test_dataframe.py:547: error: Argument 2 to "skipIf" has incompatible type "Optional[str]"; expected "str"
   pyspark/sql/tests/test_dataframe.py:573: error: Argument 2 to "skipIf" has incompatible type "Optional[str]"; expected "str"
   pyspark/sql/tests/test_dataframe.py:585: error: Argument 2 to "skipIf" has incompatible type "Optional[str]"; expected "str"
   pyspark/sql/tests/test_dataframe.py:605: error: Argument 2 to "skipIf" has incompatible type "Optional[str]"; expected "str"
   pyspark/sql/tests/test_dataframe.py:633: error: Argument 2 to "skipIf" has incompatible type "Optional[str]"; expected "str"
   pyspark/sql/tests/test_dataframe.py:661: error: Argument 2 to "skipIf" has incompatible type "Optional[str]"; expected "str"
   pyspark/sql/tests/test_dataframe.py:689: error: Argument 2 to "skipIf" has incompatible type "Optional[str]"; expected "str"
   pyspark/sql/tests/test_dataframe.py:893: error: Argument 2 to "skipIf" has incompatible type "Optional[str]"; expected "str"
   pyspark/sql/tests/test_arrow.py:43: error: Argument 2 to "skipIf" has incompatible type "Optional[str]"; expected "str"
   pyspark/sql/tests/test_arrow.py:460: error: Argument 2 to "skipIf" has incompatible type "Optional[str]"; expected "str"
   pyspark/mllib/linalg/__init__.py:479: error: Argument 1 to "_delegate" has incompatible type "str"; expected "DenseVector"
   pyspark/mllib/linalg/__init__.py:480: error: Argument 1 to "_delegate" has incompatible type "str"; expected "DenseVector"
   pyspark/mllib/linalg/__init__.py:481: error: Argument 1 to "_delegate" has incompatible type "str"; expected "DenseVector"
   pyspark/mllib/linalg/__init__.py:482: error: Argument 1 to "_delegate" has incompatible type "str"; expected "DenseVector"
   pyspark/mllib/linalg/__init__.py:483: error: Argument 1 to "_delegate" has incompatible type "str"; expected "DenseVector"
   pyspark/mllib/linalg/__init__.py:484: error: Argument 1 to "_delegate" has incompatible type "str"; expected "DenseVector"
   pyspark/mllib/linalg/__init__.py:485: error: Argument 1 to "_delegate" has incompatible type "str"; expected "DenseVector"
   pyspark/mllib/linalg/__init__.py:486: error: Argument 1 to "_delegate" has incompatible type "str"; expected "DenseVector"
   pyspark/mllib/linalg/__init__.py:487: error: Argument 1 to "_delegate" has incompatible type "str"; expected "DenseVector"
   pyspark/mllib/linalg/__init__.py:488: error: Argument 1 to "_delegate" has incompatible type "str"; expected "DenseVector"
   pyspark/mllib/linalg/__init__.py:489: error: Argument 1 to "_delegate" has incompatible type "str"; expected "DenseVector"
   pyspark/mllib/linalg/__init__.py:490: error: Argument 1 to "_delegate" has incompatible type "str"; expected "DenseVector"
   pyspark/mllib/linalg/__init__.py:1339: error: Decorated property not supported
   pyspark/mllib/linalg/__init__.py:1348: error: Decorated property not supported
   pyspark/mllib/regression.py:84: error: Decorated property not supported
   pyspark/mllib/regression.py:90: error: Decorated property not supported
   pyspark/mllib/classification.py:58: error: Decorated property not supported
   pyspark/mllib/classification.py:177: error: Decorated property not supported
   pyspark/mllib/classification.py:185: error: Decorated property not supported
   pyspark/testing/streamingutils.py:50: error: Incompatible types in assignment (expression has type "None", variable has type "str")
   pyspark/mllib/recommendation.py:200: error: Decorated property not supported
   pyspark/mllib/feature.py:161: error: Decorated property not supported
   pyspark/mllib/feature.py:169: error: Decorated property not supported
   pyspark/mllib/feature.py:177: error: Decorated property not supported
   pyspark/mllib/feature.py:185: error: Decorated property not supported
   pyspark/mllib/clustering.py:62: error: Decorated property not supported
   pyspark/mllib/clustering.py:69: error: Decorated property not supported
   pyspark/mllib/clustering.py:223: error: Decorated property not supported
   pyspark/mllib/clustering.py:229: error: Decorated property not supported
   pyspark/mllib/clustering.py:417: error: Decorated property not supported
   pyspark/mllib/clustering.py:426: error: Decorated property not supported
   pyspark/mllib/clustering.py:437: error: Decorated property not supported
   pyspark/mllib/clustering.py:596: error: Decorated property not supported
   pyspark/mllib/clustering.py:733: error: Decorated property not supported
   pyspark/mllib/linalg/distributed.py:400: error: Decorated property not supported
   pyspark/mllib/linalg/distributed.py:417: error: Decorated property not supported
   pyspark/mllib/linalg/distributed.py:425: error: Decorated property not supported
   Found 364 errors in 54 files (checked 186 source files)
   ```
   It also enables the mypy check so we can see where it will fail in the future, which allows us to integrate the types file by file.
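
   For context on the biggest bucket above: the `Decorated property not supported` errors come from stacking an extra decorator such as `@since` under `@property`, which mypy cannot model yet (https://github.com/python/mypy/issues/1362). Below is a minimal sketch of the pattern and the `# type: ignore` workaround; the `since` decorator here is a simplified stand-in, not Spark's real implementation:
   ```python
   def since(version):
       # Stand-in versioning decorator: only amends the docstring.
       def deco(f):
           f.__doc__ = ".. versionadded:: %s\n\n%s" % (version, f.__doc__ or "")
           return f
       return deco


   class Model:
       def __init__(self, coefficients):
           self._coefficients = coefficients

       @property  # type: ignore  # silences "Decorated property not supported"
       @since("2.0.0")
       def coefficients(self):
           return self._coefficients
   ```
   The cost of each ignore is that mypy stops checking that property, so it is a stopgap rather than a fix.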



[GitHub] [spark] holdenk edited a comment on pull request #29180: [SPARK-17333][PYSPARK] Enable mypy on the repository

Posted by GitBox <gi...@apache.org>.
holdenk edited a comment on pull request #29180:
URL: https://github.com/apache/spark/pull/29180#issuecomment-663313791


   > @holdenk, I think there is a bit of a difference between Scala/Java and Python. For example, Python type hinting is optional and still premature.
   
   Optional, yes.
   Why do you believe it is premature? The PEP is 6 years old ( https://www.python.org/dev/peps/pep-0484/ ).
   I did an informal survey once we decided to drop old versions of Python and supporting typing seems fairly popular ( https://twitter.com/holdenkarau/status/1282486803249762304 ). We could do a survey with users@ if you don't think this feature would be impactful.
   
   > 
   > We could think about adding the type hints into the code, but maybe we should port them first (or have a separate repo) and see how it goes for end users. Also, porting/managing stub files separately is much easier than fixing them in the code in place. It carries low risk with the same output. For example, fixing the code in place will likely introduce some mistakes. As a bonus, that will likely keep @zero323 as the author, since ideally it will be ported as-is in one PR.
   
   If folks are against in-line types in the Python code, I suppose porting the hints file is an OK compromise.
   As a developer I'd rather have them in-line, but if our options are no types or having them live in a separate file, I'll take the second one. How should we best keep the type hints up to date if they're in a separate file? Could we add something to the PR template for Python requests?
   
   As for a separate repo, what do you envision?



[GitHub] [spark] Fokko commented on a change in pull request #29180: [SPARK-17333][PYSPARK] Enable mypy on the repository

Posted by GitBox <gi...@apache.org>.
Fokko commented on a change in pull request #29180:
URL: https://github.com/apache/spark/pull/29180#discussion_r483431478



##########
File path: dev/lint-python
##########
@@ -122,6 +123,32 @@ function pycodestyle_test {
     fi
 }
 
+function mypy_test {
+    local MYPY_REPORT=
+    local MYPY_STATUS=
+
+    if ! hash "$MYPY_BUILD" 2> /dev/null; then
+        echo "The mypy command was not found."
+        echo "mypy checks failed."
+        # exit 1
+        pip3 install mypy

Review comment:
       I've created a ticket in Jenkins: https://issues.apache.org/jira/browse/SPARK-32797





[GitHub] [spark] Fokko commented on pull request #29180: [SPARK-17333][PYSPARK] Enable mypy on the repository

Posted by GitBox <gi...@apache.org>.
Fokko commented on pull request #29180:
URL: https://github.com/apache/spark/pull/29180#issuecomment-663441753


   > For example, Python type hinting is optional and still premature.
   
   This is definitely not true anymore with Python 3.6. You could say this with Python <=3.5, but it has evolved quite well over time. I haven't seen any project that got worse by having type-hints.
   
   > It would mean going through the incubation process. Which is doable, but rather a lot of work, and not something I would sign someone else up to do if I wanted them to talk to me again. That being said, if we don't have a consensus on bringing it in-tree in some form, I'd be happy to serve as one of the mentors for incubation (but I'd rather not have the types in a separate project).
   
   I think this should be possible, but it requires a lot of documents to be signed. I'm not sure if it requires a full incubation process, or if we can adopt it as part of the Spark repository. As an example from Airflow: Google donated the Airflow Kubernetes operator, and it was simply a separate repo under the Airflow project.
   
   However, having it in a separate repo would make it more cumbersome to keep everything in sync. For example, if you change a signature, you need to do this in both repositories.
   
   > Python we should support Python 3.6+ right? It is accepted to the latest Python 3.8. If the fundamental way of typing changes in the latest Python version, I would say it's still premature and evolving. I guess certainly we don't want to manage multiple versions of stub files or have such typing in codes with if-else of Python versions.
   
   From PEP 544 itself: "Therefore, in this PEP we do not propose to replace the nominal subtyping described by PEP 484 with structural subtyping completely. Instead, protocol classes as specified in this PEP complement normal classes, and users are free to choose where to apply a particular solution. See section on rejected ideas at the end of this PEP for additional motivation."
   
   Also, in the numpy annotations, we already have some awkward stuff:
   ```python
   import sys

   if sys.version_info >= (3, 8):
       from typing import Literal, Protocol
   else:
       from typing_extensions import Literal, Protocol
   ```
   But I guess this is something that we have to accept. For the current annotations, there is also a single version for all the Python versions, and I would suggest keeping it like that for maintainability's sake.
   
   > Python we should support Python 3.6+ right? It is accepted to the latest Python 3.8. If the fundamental way of typing changes in the latest Python version, I would say it's still premature and evolving. I guess certainly we don't want to manage multiple versions of stub files or have such typing in codes with if-else of Python versions.
   
   Exactly, I would keep a single version for all versions of Python that we support :)
   
   > OK, sounds like that's our path forward then, if @Fokko, @zero323 and other folks are OK with that?
   
   Sounds good to me. I'm in favor of having it inline, but having the annotation in the pyi is a good first step :)
   
   > Yes, so I meant to port stubs files into python/pyspark. Maybe we could discuss more about how we'll port in the ongoing dev mailing thread.
   
   Sounds good. I also checked that we can track the type coverage using mypy, so we can see what's covered and what's not. We could generate a list of files that lack coverage and ask the community to help increase it. I already know a lot of people who want to spend some time on this, and it would also be a good candidate for the Outreachy Program.
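
   To illustrate the stub approach for anyone following along: the signatures live in a `.pyi` file next to the module, and mypy checks callers against the stub instead of the implementation. A hypothetical pair (the module and names are made up, not actual PySpark code):
   ```python
   # wordcount.py -- the implementation, no inline annotations needed
   def count_words(text, min_length=1):
       words = [w for w in text.split() if len(w) >= min_length]
       return {w: words.count(w) for w in set(words)}
   ```
   ```python
   # wordcount.pyi -- the stub shipped alongside it
   from typing import Dict

   def count_words(text: str, min_length: int = ...) -> Dict[str, int]: ...
   ```
   Keeping the two in sync is exactly the maintenance question raised earlier in this thread.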



[GitHub] [spark] zero323 edited a comment on pull request #29180: [SPARK-17333][PYSPARK] Enable mypy on the repository

Posted by GitBox <gi...@apache.org>.
zero323 edited a comment on pull request #29180:
URL: https://github.com/apache/spark/pull/29180#issuecomment-663558139


   > > For example, Python type hinting is optional and still premature.
   > 
   > This is definitely not true anymore with Python 3.6. You could say this with Python <=3.5, but it has evolved quite well over time.
   
   I am not sure if I can fully agree here. Hints are about more than the Python version, and parts of the ecosystem evolve faster than Python itself. For example, with pyspark-stubs I had to make significant adjustments.
   
   > I haven't seen any project that got worse by having type-hints.
   
   That's probably true, but not every project can actually benefit from hints (though that's not the case here). 
   
   
   > @HyukjinKwon: we have the types along with the code in Scala and Java. I think having the types alongside the code makes it easier to remember to update them, and gives hints while someone's reading the code.
   
   I think it is more about good workflow than anything else. That being said, I think we should keep in mind that type hints don't serve the same purpose as types in statically typed languages, and certain constructs (like the already mentioned structural subtyping) can mean that we have no actual code to annotate (you can check all the `_typing` modules in `pyspark-stubs` for examples).
   
   Additionally, certain parts of the API might require [somewhat ugly strategies](https://github.com/zero323/pyspark-stubs/blob/master/third_party/3/pyspark/sql/pandas/_typing/__init__.pyi) to get something useful in daily practice.
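
   To make the structural subtyping point concrete, here is a minimal PEP 544 sketch (hypothetical names, not taken from pyspark-stubs). The protocol class exists only for the type checker, so there is no runtime code in the package to hang the annotation on; `Protocol` lives in `typing` on Python 3.8+ and in `typing_extensions` before that:
   ```python
   from typing import Iterable, List, Protocol


   class SupportsTransform(Protocol):
       # Anything with a matching transform() conforms; no inheritance needed.
       def transform(self, dataset: Iterable[float]) -> List[float]: ...


   class Doubler:
       def transform(self, dataset: Iterable[float]) -> List[float]:
           return [2 * x for x in dataset]


   def run(stage: SupportsTransform, data: Iterable[float]) -> List[float]:
       return stage.transform(data)


   print(run(Doubler(), [1.0, 2.0]))  # Doubler never subclasses the protocol
   ```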
   
   
   
   > Sounds good to me. I'm in favor of having it inline, but having the annotation in the pyi is a good first step :)
   
   In-lining can be mostly automated (though AFAIK no tool at the moment supports inlining of `overload`ed signatures; speaking of the stability of the ecosystem).
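
   For reference, the `overload` construct in question; in a stub only the decorated signatures appear, which is what makes mechanical inlining awkward. A hypothetical example, not from pyspark-stubs:
   ```python
   from typing import List, Union, overload


   @overload
   def first_token(source: str) -> str: ...
   @overload
   def first_token(source: List[str]) -> str: ...


   def first_token(source: Union[str, List[str]]) -> str:
       # One runtime implementation backs both declared signatures.
       if isinstance(source, str):
           return source.split()[0]
       return source[0]
   ```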
   
   > @zero323 The annotations aren't random at all.
   
   Sorry for the choice of the word - my point is that we should look beyond making mypy happy. 



[GitHub] [spark] Fokko commented on pull request #29180: [SPARK-17333][PYSPARK] Enable mypy on the repository

Posted by GitBox <gi...@apache.org>.
Fokko commented on pull request #29180:
URL: https://github.com/apache/spark/pull/29180#issuecomment-711388648


   Thanks for the comment @kyprifog. I was under the impression that this PR had already been superseded, but that wasn't the case. I've cleaned it up and processed the comments. Please let me know what you think.



[GitHub] [spark] Fokko closed pull request #29180: [SPARK-17333][PYSPARK] Enable mypy on the repository

Posted by GitBox <gi...@apache.org>.
Fokko closed pull request #29180:
URL: https://github.com/apache/spark/pull/29180


   



[GitHub] [spark] AmplabJenkins commented on pull request #29180: [SPARK-17333][PYSPARK] Enable mypy on the repository

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29180:
URL: https://github.com/apache/spark/pull/29180#issuecomment-662106495


   Can one of the admins verify this patch?




[GitHub] [spark] Fokko commented on a change in pull request #29180: [SPARK-17333][PYSPARK] Enable mypy on the repository

Posted by GitBox <gi...@apache.org>.
Fokko commented on a change in pull request #29180:
URL: https://github.com/apache/spark/pull/29180#discussion_r507199748



##########
File path: dev/lint-python
##########
@@ -122,6 +123,31 @@ function pycodestyle_test {
     fi
 }
 
+function mypy_test {
+    local MYPY_REPORT=
+    local MYPY_STATUS=
+
+    if ! hash "$MYPY_BUILD" 2> /dev/null; then
+        echo "The mypy command was not found."
+        echo "mypy checks failed."
+        exit 1
+    fi

Review comment:
       Better, thanks





[GitHub] [spark] Fokko commented on pull request #29180: [SPARK-17333][PYSPARK] Enable mypy on the repository

Posted by GitBox <gi...@apache.org>.
Fokko commented on pull request #29180:
URL: https://github.com/apache/spark/pull/29180#issuecomment-711392076


   I've force-pushed, so I recreated the PR at https://github.com/apache/spark/pull/30088



[GitHub] [spark] HyukjinKwon commented on pull request #29180: [SPARK-17333][PYSPARK] Enable mypy on the repository

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #29180:
URL: https://github.com/apache/spark/pull/29180#issuecomment-663320589


   > That's from three years ago. Scala type system has certainly evolved more than once every three years.
   
   For Python, we should support Python 3.6+, right? It is accepted into the latest Python, 3.8. If the fundamental way of typing changes in the latest Python version, I would say it's still premature and evolving. I would certainly guess we don't want to manage multiple versions of stub files.
   
   Yes, so I meant to port the stub files into `python/pyspark`. Maybe we could discuss more about how we'll port them in the ongoing thread.



[GitHub] [spark] holdenk commented on pull request #29180: [SPARK-17333][PYSPARK] Enable mypy on the repository

Posted by GitBox <gi...@apache.org>.
holdenk commented on pull request #29180:
URL: https://github.com/apache/spark/pull/29180#issuecomment-663319198


   > It is getting popular and more stable, but the type hinting is still evolving even between minor versions. See https://docs.python.org/3/library/typing.html. One example might be [PEP 544](https://www.python.org/dev/peps/pep-0544/), which allows structural subtyping (Python 3.8). It could change the fundamental way of how we type.
   
   That's from three years ago. The Scala type system has certainly evolved more than once every three years.
   
   > 
   > I don't strongly object to having the code inlined, but I would think it's a safer and easier way to port the files first.
   
   OK, sounds like that's our path forward then, if @Fokko, @zero323 and other folks are OK with that?
   
   > 
   > > How should we best keep the type hints up to date if it's in a seperate file?
   > > Could we add something to the PR templated for Python requests?
   > > As for a separate repo what do you envision?
   > 
   > I am sure we will figure a way out. For example,
   > 
   > * We could take the NumPy approach, e.g. [numpy/__init__.pyi](https://github.com/numpy/numpy/blob/master/numpy/__init__.pyi).
   
   Isn't that the same repo? Or am I missing something?
   
   > * Or we can port his repo to ASF (is it feasible?).
   
   It would mean going through the incubation process. Which is doable, but rather a lot of work, and not something I would sign someone else up to do if I wanted them to talk to me again. That being said, if we don't have a consensus on bringing it in-tree in some form, I'd be happy to serve as one of the mentors for incubation (but I'd rather not have the types in a separate project).
   
   > 
   > I think IDEs can detect it. If I am not mistaken, mypy can detect it too as a linter. If dev people find it confusing, we should maybe document it in the PR template as you suggest, or in the contribution guide.
   
   Sounds good to me :)
   
   

