Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/09/26 19:48:05 UTC

[GitHub] [spark] zero323 commented on a change in pull request #29180: [SPARK-17333][PYSPARK] Enable mypy on the repository

zero323 commented on a change in pull request #29180:
URL: https://github.com/apache/spark/pull/29180#discussion_r495487007



##########
File path: dev/lint-python
##########
@@ -122,6 +123,31 @@ function pycodestyle_test {
     fi
 }
 
+function mypy_test {
+    local MYPY_REPORT=
+    local MYPY_STATUS=
+
+    if ! hash "$MYPY_BUILD" 2> /dev/null; then
+        echo "The mypy command was not found."
+        echo "mypy checks failed."
+        exit 1
+    fi

Review comment:
       I think we should make this optional for now, i.e.
   
   ```bash
       if ! hash "$MYPY_BUILD" 2> /dev/null; then
           echo "The $MYPY_BUILD command was not found. Skipping MyPy test check build for now."
           echo
           return
       fi
   ```

##########
File path: python/pyspark/serializers.pyi
##########
@@ -0,0 +1,128 @@
+#

Review comment:
       This has already been ported.

##########
File path: dev/lint-python
##########
@@ -122,6 +123,31 @@ function pycodestyle_test {
     fi
 }
 
+function mypy_test {
+    local MYPY_REPORT=
+    local MYPY_STATUS=
+
+    if ! hash "$MYPY_BUILD" 2> /dev/null; then
+        echo "The mypy command was not found."
+        echo "mypy checks failed."
+        exit 1
+    fi
+
+    echo "starting $MYPY_BUILD test..."
+    MYPY_REPORT=$( ($MYPY_BUILD --ignore-missing-imports --config-file python/mypy.ini python/) 2>&1)

Review comment:
       I wouldn't use `--ignore-missing-imports` here. It is an all-or-nothing option that silences missing imports across the entire run.
   
   Instead I would use either specific excludes (as we have right now in mypy.ini) or targeted `type: ignore` (as in some test modules and such).
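
    For illustration, the per-module form of the same setting in `mypy.ini` (the module name below is only an example) looks like:

```ini
# Silence missing stubs for one specific dependency instead of
# passing a global --ignore-missing-imports on the command line.
[mypy-py4j.*]
ignore_missing_imports = True
```

    This keeps the check strict for first-party code while tolerating a dependency that ships no stubs.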

##########
File path: python/pyspark/accumulators.pyi
##########
@@ -0,0 +1,76 @@
+#

Review comment:
       This has already been ported.

##########
File path: python/pyspark/broadcast.pyi
##########
@@ -0,0 +1,51 @@
+#

Review comment:
       This has already been ported.

##########
File path: python/mypy.ini
##########
@@ -0,0 +1,20 @@
+[mypy]
+python_version = 3.6
+
+[mypy-pyspark.cloudpickle.*]
+ignore_errors = True
+
+[mypy-pyspark.streaming.tests.*]
+ignore_errors = True
+
+[mypy-pyspark.sql.tests.*]
+ignore_errors = True
+
+[mypy-pyspark.testing.*]
+ignore_errors = True
+
+[mypy-pyspark.tests.*]
+ignore_errors = True

Review comment:
       I am strongly against excluding tests and testing here.
   
    These already type check with only a handful of `ignore`s, and they provide additional coverage.
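
    As a sketch of what those targeted ignores look like (the helper below is hypothetical), a single line-level ignore keeps the rest of a test module checked:

```python
from typing import Dict

def legacy_fixture():
    # An untyped helper, as often found in older test code; mypy cannot
    # verify its return type precisely.
    return {"rows": 3}

# One narrow, documented ignore instead of excluding the whole module:
counts: Dict[str, int] = legacy_fixture()  # type: ignore[assignment]
```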

##########
File path: dev/lint-python
##########
@@ -246,6 +272,7 @@ PYTHON_SOURCE="$(find . -name "*.py")"
 compile_python_test "$PYTHON_SOURCE"
 pycodestyle_test "$PYTHON_SOURCE"
 flake8_test
+mypy_test

Review comment:
       Personally I'd consider having type checks in a separate script, as we might want to extend things over time (I plan to add tests on examples, same as in [`pyspark-stubs`](https://github.com/zero323/pyspark-stubs/blob/d317c27083549b6727542d825e642ad91f275f9d/.travis.yml#L18), and I am thinking about migrating the original data-driven tests).
   
   But it is something we can discuss later.
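
    A minimal sketch of such a standalone script (the file name `dev/lint-typing` and the skip-on-missing behavior are assumptions, not part of this PR):

```shell
#!/usr/bin/env bash
# Hypothetical dev/lint-typing: type checks split out of dev/lint-python,
# leaving room for future additions such as stub tests on the examples.

MYPY_BUILD="${MYPY_BUILD:-mypy}"

mypy_test() {
    # Skip rather than fail when mypy is not installed (optional for now).
    if ! hash "$MYPY_BUILD" 2> /dev/null; then
        echo "The $MYPY_BUILD command was not found. Skipping mypy checks for now."
        return 0
    fi

    echo "starting $MYPY_BUILD test..."
    local report
    report=$("$MYPY_BUILD" --config-file python/mypy.ini python/ 2>&1)
    if [ $? -ne 0 ]; then
        echo "$report"
        echo "mypy checks failed."
        return 1
    fi
    echo "mypy checks passed."
}
```

    `dev/lint-python` (or CI) would then simply invoke `mypy_test` or the script itself.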

##########
File path: python/pyspark/mllib/linalg/__init__.py
##########
@@ -469,7 +469,7 @@ def __getattr__(self, item):
     def __neg__(self):
         return DenseVector(-self.array)
 
-    def _delegate(op):
+    def _delegate(op: str):  # type: ignore

Review comment:
       I believe we can skip this one.

##########
File path: python/pyspark/ml/base.py
##########
@@ -309,7 +309,8 @@ def setPredictionCol(self, value):
         """
         return self._set(predictionCol=value)
 
-    @abstractproperty
+    @property  # type: ignore

Review comment:
       These ignores are not strictly required now, but I think it is OK to keep them, as there is no real harm here and they're likely to prove necessary in the future.
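
    For reference, the pattern in question stacks `@property` on `@abstractmethod`, which is the modern substitute for the deprecated `abstractproperty`; a minimal sketch (class and method names here are illustrative):

```python
from abc import ABCMeta, abstractmethod

class PredictionModel(metaclass=ABCMeta):
    # @property stacked on @abstractmethod replaces the deprecated
    # @abstractproperty; mypy has historically needed a `# type: ignore`
    # on such decorator stacks.
    @property
    @abstractmethod
    def numFeatures(self) -> int:
        """Number of features the model accepts."""
        raise NotImplementedError
```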

##########
File path: python/pyspark/sql/session.py
##########
@@ -76,7 +76,7 @@ class Builder(object):
         """
 
         _lock = RLock()
-        _options = {}
+        _options = {}  # type: ignore

Review comment:
       I believe we can skip this one.
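
    For what it's worth, an explicit variable annotation (valid on the targeted Python 3.6) would also satisfy mypy without the blanket ignore; the value type below is an assumption:

```python
from threading import RLock
from typing import Any, Dict

class Builder:
    # Sketch of SparkSession.Builder: annotating the empty dict gives mypy
    # a concrete type, making the `# type: ignore` unnecessary.
    _lock = RLock()
    _options: Dict[str, Any] = {}
```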

##########
File path: python/pyspark/sql/types.pyi
##########
@@ -0,0 +1,238 @@
+#

Review comment:
       This has already been ported.

##########
File path: python/setup.py
##########
@@ -28,7 +28,8 @@
     print("Failed to load PySpark version file for packaging. You must be in Spark's python dir.",
           file=sys.stderr)
     sys.exit(-1)
-VERSION = __version__  # noqa
+
+VERSION = __version__  # type: ignore  # noqa

Review comment:
       I don't think it is required.

##########
File path: python/pyspark/util.pyi
##########
@@ -0,0 +1,33 @@
+#

Review comment:
       This has already been ported, though I am not sure if we're going to consider it part of the API.

##########
File path: python/pyspark/storagelevel.py
##########
@@ -19,6 +19,14 @@
 
 
 class StorageLevel(object):
+    DISK_ONLY: 'StorageLevel'

Review comment:
       This is already covered with stubs and I think it is better to avoid annotations in `.py` unless they're strictly needed.
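
    i.e. the attribute types can live in the stub instead; a hypothetical fragment of `python/pyspark/storagelevel.pyi` (the `__init__` signature is an approximation):

```python
class StorageLevel:
    DISK_ONLY: "StorageLevel"
    MEMORY_ONLY: "StorageLevel"
    MEMORY_AND_DISK: "StorageLevel"

    def __init__(self, useDisk: bool, useMemory: bool, useOffHeap: bool,
                 deserialized: bool, replication: int = ...) -> None: ...
```

    The runtime `.py` file then stays free of annotations while type checkers still see the class-level attributes.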




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org