You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/07/15 08:08:52 UTC

[GitHub] [spark] Fokko opened a new pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Fokko opened a new pull request #29121:
URL: https://github.com/apache/spark/pull/29121


   Checked with flake8 to remove the unused imports, to enforce static analysis of the code in the future.
   
   Unused imports are bad because they are loaded into memory, and don't add any value :)
   
   ```
   fokkodriesprong@Fan spark % flake8 python | grep -i "imported but unused"
   python/pyspark/cloudpickle.py:46:1: F401 'functools.partial' imported but unused
   python/pyspark/cloudpickle.py:55:1: F401 'traceback' imported but unused
   python/pyspark/heapq3.py:868:5: F401 '_heapq.*' imported but unused
   python/pyspark/__init__.py:61:1: F401 'pyspark.version.__version__' imported but unused
   python/pyspark/__init__.py:62:1: F401 'pyspark._globals._NoValue' imported but unused
   python/pyspark/__init__.py:115:1: F401 'pyspark.sql.SQLContext' imported but unused
   python/pyspark/__init__.py:115:1: F401 'pyspark.sql.HiveContext' imported but unused
   python/pyspark/__init__.py:115:1: F401 'pyspark.sql.Row' imported but unused
   python/pyspark/rdd.py:21:1: F401 're' imported but unused
   python/pyspark/rdd.py:29:1: F401 'tempfile.NamedTemporaryFile' imported but unused
   python/pyspark/mllib/regression.py:26:1: F401 'pyspark.mllib.linalg.SparseVector' imported but unused
   python/pyspark/mllib/clustering.py:28:1: F401 'pyspark.mllib.linalg.SparseVector' imported but unused
   python/pyspark/mllib/clustering.py:28:1: F401 'pyspark.mllib.linalg.DenseVector' imported but unused
   python/pyspark/mllib/classification.py:26:1: F401 'pyspark.mllib.linalg.SparseVector' imported but unused
   python/pyspark/mllib/feature.py:28:1: F401 'pyspark.mllib.linalg.DenseVector' imported but unused
   python/pyspark/mllib/feature.py:28:1: F401 'pyspark.mllib.linalg.SparseVector' imported but unused
   python/pyspark/mllib/feature.py:30:1: F401 'pyspark.mllib.regression.LabeledPoint' imported but unused
   python/pyspark/mllib/tests/test_linalg.py:18:1: F401 'sys' imported but unused
   python/pyspark/mllib/tests/test_linalg.py:642:5: F401 'pyspark.mllib.tests.test_linalg.*' imported but unused
   python/pyspark/mllib/tests/test_feature.py:21:1: F401 'numpy.random' imported but unused
   python/pyspark/mllib/tests/test_feature.py:21:1: F401 'numpy.exp' imported but unused
   python/pyspark/mllib/tests/test_feature.py:23:1: F401 'pyspark.mllib.linalg.Vector' imported but unused
   python/pyspark/mllib/tests/test_feature.py:23:1: F401 'pyspark.mllib.linalg.VectorUDT' imported but unused
   python/pyspark/mllib/tests/test_feature.py:185:5: F401 'pyspark.mllib.tests.test_feature.*' imported but unused
   python/pyspark/mllib/tests/test_util.py:97:5: F401 'pyspark.mllib.tests.test_util.*' imported but unused
   python/pyspark/mllib/tests/test_stat.py:23:1: F401 'pyspark.mllib.linalg.Vector' imported but unused
   python/pyspark/mllib/tests/test_stat.py:23:1: F401 'pyspark.mllib.linalg.SparseVector' imported but unused
   python/pyspark/mllib/tests/test_stat.py:23:1: F401 'pyspark.mllib.linalg.DenseVector' imported but unused
   python/pyspark/mllib/tests/test_stat.py:23:1: F401 'pyspark.mllib.linalg.VectorUDT' imported but unused
   python/pyspark/mllib/tests/test_stat.py:23:1: F401 'pyspark.mllib.linalg._convert_to_vector' imported but unused
   python/pyspark/mllib/tests/test_stat.py:23:1: F401 'pyspark.mllib.linalg.DenseMatrix' imported but unused
   python/pyspark/mllib/tests/test_stat.py:23:1: F401 'pyspark.mllib.linalg.SparseMatrix' imported but unused
   python/pyspark/mllib/tests/test_stat.py:23:1: F401 'pyspark.mllib.linalg.MatrixUDT' imported but unused
   python/pyspark/mllib/tests/test_stat.py:181:5: F401 'pyspark.mllib.tests.test_stat.*' imported but unused
   python/pyspark/mllib/tests/test_streaming_algorithms.py:18:1: F401 'time.time' imported but unused
   python/pyspark/mllib/tests/test_streaming_algorithms.py:18:1: F401 'time.sleep' imported but unused
   python/pyspark/mllib/tests/test_streaming_algorithms.py:470:5: F401 'pyspark.mllib.tests.test_streaming_algorithms.*' imported but unused
   python/pyspark/mllib/tests/test_algorithms.py:295:5: F401 'pyspark.mllib.tests.test_algorithms.*' imported but unused
   python/pyspark/tests/test_serializers.py:90:13: F401 'xmlrunner' imported but unused
   python/pyspark/tests/test_rdd.py:21:1: F401 'sys' imported but unused
   python/pyspark/tests/test_rdd.py:29:1: F401 'pyspark.resource.ResourceProfile' imported but unused
   python/pyspark/tests/test_rdd.py:885:5: F401 'pyspark.tests.test_rdd.*' imported but unused
   python/pyspark/tests/test_readwrite.py:19:1: F401 'sys' imported but unused
   python/pyspark/tests/test_readwrite.py:22:1: F401 'array.array' imported but unused
   python/pyspark/tests/test_readwrite.py:309:5: F401 'pyspark.tests.test_readwrite.*' imported but unused
   python/pyspark/tests/test_join.py:62:5: F401 'pyspark.tests.test_join.*' imported but unused
   python/pyspark/tests/test_taskcontext.py:19:1: F401 'shutil' imported but unused
   python/pyspark/tests/test_taskcontext.py:325:5: F401 'pyspark.tests.test_taskcontext.*' imported but unused
   python/pyspark/tests/test_conf.py:36:5: F401 'pyspark.tests.test_conf.*' imported but unused
   python/pyspark/tests/test_broadcast.py:148:5: F401 'pyspark.tests.test_broadcast.*' imported but unused
   python/pyspark/tests/test_daemon.py:76:5: F401 'pyspark.tests.test_daemon.*' imported but unused
   python/pyspark/tests/test_util.py:77:5: F401 'pyspark.tests.test_util.*' imported but unused
   python/pyspark/tests/test_pin_thread.py:19:1: F401 'random' imported but unused
   python/pyspark/tests/test_pin_thread.py:149:5: F401 'pyspark.tests.test_pin_thread.*' imported but unused
   python/pyspark/tests/test_worker.py:19:1: F401 'sys' imported but unused
   python/pyspark/tests/test_worker.py:26:5: F401 'resource' imported but unused
   python/pyspark/tests/test_worker.py:203:5: F401 'pyspark.tests.test_worker.*' imported but unused
   python/pyspark/tests/test_profiler.py:101:5: F401 'pyspark.tests.test_profiler.*' imported but unused
   python/pyspark/tests/test_shuffle.py:18:1: F401 'sys' imported but unused
   python/pyspark/tests/test_shuffle.py:171:5: F401 'pyspark.tests.test_shuffle.*' imported but unused
   python/pyspark/tests/test_rddbarrier.py:43:5: F401 'pyspark.tests.test_rddbarrier.*' imported but unused
   python/pyspark/tests/test_context.py:129:13: F401 'userlibrary.UserClass' imported but unused
   python/pyspark/tests/test_context.py:140:13: F401 'userlib.UserClass' imported but unused
   python/pyspark/tests/test_context.py:310:5: F401 'pyspark.tests.test_context.*' imported but unused
   python/pyspark/tests/test_appsubmit.py:241:5: F401 'pyspark.tests.test_appsubmit.*' imported but unused
   python/pyspark/streaming/dstream.py:18:1: F401 'sys' imported but unused
   python/pyspark/streaming/tests/test_dstream.py:27:1: F401 'pyspark.RDD' imported but unused
   python/pyspark/streaming/tests/test_dstream.py:647:5: F401 'pyspark.streaming.tests.test_dstream.*' imported but unused
   python/pyspark/streaming/tests/test_kinesis.py:83:5: F401 'pyspark.streaming.tests.test_kinesis.*' imported but unused
   python/pyspark/streaming/tests/test_listener.py:152:5: F401 'pyspark.streaming.tests.test_listener.*' imported but unused
   python/pyspark/streaming/tests/test_context.py:178:5: F401 'pyspark.streaming.tests.test_context.*' imported but unused
   python/pyspark/testing/utils.py:30:5: F401 'scipy.sparse' imported but unused
   python/pyspark/testing/utils.py:36:5: F401 'numpy as np' imported but unused
   python/pyspark/ml/regression.py:25:1: F401 'pyspark.ml.tree._TreeEnsembleParams' imported but unused
   python/pyspark/ml/regression.py:25:1: F401 'pyspark.ml.tree._HasVarianceImpurity' imported but unused
   python/pyspark/ml/regression.py:29:1: F401 'pyspark.ml.wrapper.JavaParams' imported but unused
   python/pyspark/ml/util.py:19:1: F401 'sys' imported but unused
   python/pyspark/ml/__init__.py:25:1: F401 'pyspark.ml.pipeline' imported but unused
   python/pyspark/ml/pipeline.py:18:1: F401 'sys' imported but unused
   python/pyspark/ml/stat.py:22:1: F401 'pyspark.ml.linalg.DenseMatrix' imported but unused
   python/pyspark/ml/stat.py:22:1: F401 'pyspark.ml.linalg.Vectors' imported but unused
   python/pyspark/ml/tests/test_training_summary.py:18:1: F401 'sys' imported but unused
   python/pyspark/ml/tests/test_training_summary.py:364:5: F401 'pyspark.ml.tests.test_training_summary.*' imported but unused
   python/pyspark/ml/tests/test_linalg.py:381:5: F401 'pyspark.ml.tests.test_linalg.*' imported but unused
   python/pyspark/ml/tests/test_tuning.py:427:9: F401 'pyspark.sql.functions as F' imported but unused
   python/pyspark/ml/tests/test_tuning.py:757:5: F401 'pyspark.ml.tests.test_tuning.*' imported but unused
   python/pyspark/ml/tests/test_wrapper.py:120:5: F401 'pyspark.ml.tests.test_wrapper.*' imported but unused
   python/pyspark/ml/tests/test_feature.py:19:1: F401 'sys' imported but unused
   python/pyspark/ml/tests/test_feature.py:304:5: F401 'pyspark.ml.tests.test_feature.*' imported but unused
   python/pyspark/ml/tests/test_image.py:19:1: F401 'py4j' imported but unused
   python/pyspark/ml/tests/test_image.py:22:1: F401 'pyspark.testing.mlutils.PySparkTestCase' imported but unused
   python/pyspark/ml/tests/test_image.py:71:5: F401 'pyspark.ml.tests.test_image.*' imported but unused
   python/pyspark/ml/tests/test_persistence.py:456:5: F401 'pyspark.ml.tests.test_persistence.*' imported but unused
   python/pyspark/ml/tests/test_evaluation.py:56:5: F401 'pyspark.ml.tests.test_evaluation.*' imported but unused
   python/pyspark/ml/tests/test_stat.py:43:5: F401 'pyspark.ml.tests.test_stat.*' imported but unused
   python/pyspark/ml/tests/test_base.py:70:5: F401 'pyspark.ml.tests.test_base.*' imported but unused
   python/pyspark/ml/tests/test_param.py:20:1: F401 'sys' imported but unused
   python/pyspark/ml/tests/test_param.py:375:5: F401 'pyspark.ml.tests.test_param.*' imported but unused
   python/pyspark/ml/tests/test_pipeline.py:62:5: F401 'pyspark.ml.tests.test_pipeline.*' imported but unused
   python/pyspark/ml/tests/test_algorithms.py:333:5: F401 'pyspark.ml.tests.test_algorithms.*' imported but unused
   python/pyspark/ml/param/__init__.py:18:1: F401 'sys' imported but unused
   python/pyspark/resource/tests/test_resources.py:17:1: F401 'random' imported but unused
   python/pyspark/resource/tests/test_resources.py:20:1: F401 'pyspark.resource.ResourceProfile' imported but unused
   python/pyspark/resource/tests/test_resources.py:75:5: F401 'pyspark.resource.tests.test_resources.*' imported but unused
   python/pyspark/sql/functions.py:32:1: F401 'pyspark.sql.udf.UserDefinedFunction' imported but unused
   python/pyspark/sql/functions.py:34:1: F401 'pyspark.sql.pandas.functions.pandas_udf' imported but unused
   python/pyspark/sql/session.py:30:1: F401 'pyspark.sql.types.Row' imported but unused
   python/pyspark/sql/session.py:30:1: F401 'pyspark.sql.types.StringType' imported but unused
   python/pyspark/sql/readwriter.py:1084:5: F401 'pyspark.sql.Row' imported but unused
   python/pyspark/sql/context.py:26:1: F401 'pyspark.sql.types.IntegerType' imported but unused
   python/pyspark/sql/context.py:26:1: F401 'pyspark.sql.types.Row' imported but unused
   python/pyspark/sql/context.py:26:1: F401 'pyspark.sql.types.StringType' imported but unused
   python/pyspark/sql/context.py:27:1: F401 'pyspark.sql.udf.UDFRegistration' imported but unused
   python/pyspark/sql/streaming.py:1212:5: F401 'pyspark.sql.Row' imported but unused
   python/pyspark/sql/tests/test_utils.py:55:5: F401 'pyspark.sql.tests.test_utils.*' imported but unused
   python/pyspark/sql/tests/test_pandas_map.py:18:1: F401 'sys' imported but unused
   python/pyspark/sql/tests/test_pandas_map.py:22:1: F401 'pyspark.sql.functions.pandas_udf' imported but unused
   python/pyspark/sql/tests/test_pandas_map.py:22:1: F401 'pyspark.sql.functions.PandasUDFType' imported but unused
   python/pyspark/sql/tests/test_pandas_map.py:119:5: F401 'pyspark.sql.tests.test_pandas_map.*' imported but unused
   python/pyspark/sql/tests/test_catalog.py:193:5: F401 'pyspark.sql.tests.test_catalog.*' imported but unused
   python/pyspark/sql/tests/test_group.py:39:5: F401 'pyspark.sql.tests.test_group.*' imported but unused
   python/pyspark/sql/tests/test_session.py:361:5: F401 'pyspark.sql.tests.test_session.*' imported but unused
   python/pyspark/sql/tests/test_conf.py:49:5: F401 'pyspark.sql.tests.test_conf.*' imported but unused
   python/pyspark/sql/tests/test_pandas_cogrouped_map.py:19:1: F401 'sys' imported but unused
   python/pyspark/sql/tests/test_pandas_cogrouped_map.py:21:1: F401 'pyspark.sql.functions.sum' imported but unused
   python/pyspark/sql/tests/test_pandas_cogrouped_map.py:21:1: F401 'pyspark.sql.functions.PandasUDFType' imported but unused
   python/pyspark/sql/tests/test_pandas_cogrouped_map.py:29:5: F401 'pandas.util.testing.assert_series_equal' imported but unused
   python/pyspark/sql/tests/test_pandas_cogrouped_map.py:32:5: F401 'pyarrow as pa' imported but unused
   python/pyspark/sql/tests/test_pandas_cogrouped_map.py:248:5: F401 'pyspark.sql.tests.test_pandas_cogrouped_map.*' imported but unused
   python/pyspark/sql/tests/test_udf.py:24:1: F401 'py4j' imported but unused
   python/pyspark/sql/tests/test_pandas_udf_typehints.py:246:5: F401 'pyspark.sql.tests.test_pandas_udf_typehints.*' imported but unused
   python/pyspark/sql/tests/test_functions.py:19:1: F401 'sys' imported but unused
   python/pyspark/sql/tests/test_functions.py:362:9: F401 'pyspark.sql.functions.exists' imported but unused
   python/pyspark/sql/tests/test_functions.py:387:5: F401 'pyspark.sql.tests.test_functions.*' imported but unused
   python/pyspark/sql/tests/test_pandas_udf_scalar.py:21:1: F401 'sys' imported but unused
   python/pyspark/sql/tests/test_pandas_udf_scalar.py:45:5: F401 'pyarrow as pa' imported but unused
   python/pyspark/sql/tests/test_pandas_udf_window.py:355:5: F401 'pyspark.sql.tests.test_pandas_udf_window.*' imported but unused
   python/pyspark/sql/tests/test_arrow.py:38:5: F401 'pyarrow as pa' imported but unused
   python/pyspark/sql/tests/test_pandas_grouped_map.py:20:1: F401 'sys' imported but unused
   python/pyspark/sql/tests/test_pandas_grouped_map.py:38:5: F401 'pyarrow as pa' imported but unused
   python/pyspark/sql/tests/test_dataframe.py:382:9: F401 'pyspark.sql.DataFrame' imported but unused
   python/pyspark/sql/avro/functions.py:125:5: F401 'pyspark.sql.Row' imported but unused
   python/pyspark/sql/pandas/functions.py:19:1: F401 'sys' imported but unused
   ```
   
   After:
   ```
   fokkodriesprong@Fan spark % flake8 python | grep -i "imported but unused"
   fokkodriesprong@Fan spark %
   ```
   
   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   -->
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Fokko commented on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
Fokko commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-658900326


   I've added the error code to Flake8. This was already in the pipeline, but only looking for a few allowed-listed error codes.
   
   Sometime we have to suppress the warning because otherwise we would break backward compatibility. By suppressing it, we make an explicit choice of keeping an unused import.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #29121:
URL: https://github.com/apache/spark/pull/29121#discussion_r461360724



##########
File path: python/pyspark/ml/tests/test_stat.py
##########
@@ -40,7 +40,7 @@ def test_chisquaretest(self):
 
 
 if __name__ == "__main__":
-    from pyspark.ml.tests.test_stat import *
+    from pyspark.ml.tests.test_stat import *  # noqa: F401

Review comment:
       Ah, these imports actually makes the output file paths correct. For example, if you run the unittests, the console outputs are like:
   
   ```
   test_blahblah (pyspark.sql.blahblah.BlahBlahTests) ... ok
   
   ----------------------------------------------------------------------
   Ran 3 tests in 0.001s
   
   OK
   ```
   
   if you remove the imports, the output becomes:
   
   ```
   test_blahblah (__main__.BlahBlahTests) ... ok
   
   ----------------------------------------------------------------------
   Ran 3 tests in 0.001s
   
   OK
   ```
   
   It's related to who we run the tests via our custom script `python/run-tests.py`. Let's leave them as are for now.

##########
File path: python/pyspark/ml/tests/test_wrapper.py
##########
@@ -116,9 +116,8 @@ def test_new_java_array(self):
         java_array = JavaWrapper._new_java_array(str_list, java_class)
         self.assertEqual(_java2py(self.sc, java_array), expected_str_list)
 
-if __name__ == "__main__":
-    from pyspark.ml.tests.test_wrapper import *

Review comment:
       So .. let's don't remove this.

##########
File path: python/pyspark/ml/tests/test_stat.py
##########
@@ -40,7 +40,7 @@ def test_chisquaretest(self):
 
 
 if __name__ == "__main__":
-    from pyspark.ml.tests.test_stat import *
+    from pyspark.ml.tests.test_stat import *  # noqa: F401

Review comment:
       Ah, these imports actually make the output file paths correct. For example, if you run the unittests, the console outputs are like:
   
   ```
   test_blahblah (pyspark.sql.blahblah.BlahBlahTests) ... ok
   
   ----------------------------------------------------------------------
   Ran 3 tests in 0.001s
   
   OK
   ```
   
   if you remove the imports, the output becomes:
   
   ```
   test_blahblah (__main__.BlahBlahTests) ... ok
   
   ----------------------------------------------------------------------
   Ran 3 tests in 0.001s
   
   OK
   ```
   
   It's related to how we run the tests via our custom script `python/run-tests.py`. Let's leave them as are for now.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670707796


   **[Test build #127217 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127217/testReport)** for PR 29121 at commit [`9f37113`](https://github.com/apache/spark/commit/9f3711335716e5fdd2b528c258fef00b27d4c670).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Fokko commented on a change in pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
Fokko commented on a change in pull request #29121:
URL: https://github.com/apache/spark/pull/29121#discussion_r458039269



##########
File path: python/pyspark/ml/tests/test_stat.py
##########
@@ -40,7 +40,7 @@ def test_chisquaretest(self):
 
 
 if __name__ == "__main__":
-    from pyspark.ml.tests.test_stat import *
+    from pyspark.ml.tests.test_stat import *  # noqa: F401

Review comment:
       Can you move this forward to test? I can check if I remove the xmlrunner, what the impact is on the tests by taking a diff of the Jenkins logs.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-659409553


   Can we enable this rule in `flake8` at `lint-python`?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670783658


   Hey, @srowen . Did you see my first comment, "Without the rule enforcement, unused import always happens again"?
   - https://github.com/apache/spark/pull/29121#issuecomment-658818917
   
   I'm wondering if you disagreed my first comment or just didn't see the it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670705663






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-658857106


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-663803155






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

Posted by GitBox <gi...@apache.org>.
srowen commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670786844


   I did. If most of the changes are exceptions, that makes me wonder how much this rule will just trigger false positives. I am not sure how worth it is to enforce this if mostly it turns up false positives.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-663803067


   **[Test build #126525 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126525/testReport)** for PR 29121 at commit [`582a7ef`](https://github.com/apache/spark/commit/582a7ef4af4ddeabda6cc449d66787364deaeaae).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-663816532


   **[Test build #126525 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126525/testReport)** for PR 29121 at commit [`582a7ef`](https://github.com/apache/spark/commit/582a7ef4af4ddeabda6cc449d66787364deaeaae).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-658857106


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670705663






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Fokko commented on a change in pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
Fokko commented on a change in pull request #29121:
URL: https://github.com/apache/spark/pull/29121#discussion_r460090784



##########
File path: python/pyspark/ml/tests/test_stat.py
##########
@@ -40,7 +40,7 @@ def test_chisquaretest(self):
 
 
 if __name__ == "__main__":
-    from pyspark.ml.tests.test_stat import *
+    from pyspark.ml.tests.test_stat import *  # noqa: F401

Review comment:
       Thanks! I've send a mail on the devlist: http://apache-spark-developers-list.1001551.n3.nabble.com/Python-xmlrunner-being-used-td29896.html




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Fokko commented on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
Fokko commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-658628139


   I see your point, but imports are quite trivial to fix when it comes to conflicts. Otherwise we need to add a lot of exceptions, or ignore a lot of rules if we want to use static analysis.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-664707155






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Fokko commented on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
Fokko commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-665093992


   Jenkins test this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
srowen commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-659402844


   It's not a bad idea if there are material benefits to not importing modules. (Unlike in the JVM where generally this only has any impact during compilation).
   
   What about just removing the ones that can be removed, and not attempting to enforce / suppress F401? that's a smaller one-time cleanup.
   
   How reliable is detection of unused imports, as importing can have side-effects?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Fokko commented on a change in pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
Fokko commented on a change in pull request #29121:
URL: https://github.com/apache/spark/pull/29121#discussion_r460374402



##########
File path: python/pyspark/heapq3.py
##########
@@ -865,7 +865,7 @@ def nlargest(n, iterable, key=None):
 
 # If available, use C implementation
 try:
-    from _heapq import *
+    from _heapq import *  # noqa: F401

Review comment:
       Yes, let us not touch these files.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #29121:
URL: https://github.com/apache/spark/pull/29121#discussion_r460361924



##########
File path: python/pyspark/cloudpickle/cloudpickle.py
##########
@@ -57,7 +57,6 @@
 from .compat import pickle
 from typing import Generic, Union, Tuple, Callable
 from pickle import _getattribute
-from importlib._bootstrap import _find_spec

Review comment:
       @Fokko, can we exclude this file in the linter checking? We're already skipping other lint checks (see `dev/tox.ini`) and this is usually exactly matched to the release of cloudpickle.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

Posted by GitBox <gi...@apache.org>.
srowen commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670794000


   No, I don't feel that way. If others would like to merge, go ahead. I would merge the narrower change myself, but do not object to the broader one.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Fokko commented on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
Fokko commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-658967694


   Good point @dongjoon-hyun, I was focusing on getting the CI green again. I've updated the PR description and title.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-658958309


   It would be great if you mention that in the PR title and PR description. Otherwise, the PR title is misleading.
   > By suppressing it,


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-663816674






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun closed pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun closed pull request #29121:
URL: https://github.com/apache/spark/pull/29121


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-658958309


   It would be great if you mention `suppressing` in the PR title and PR description. Otherwise, the PR title is misleading.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670705302


   Retest this please.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-661894530


   **[Test build #126258 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126258/testReport)** for PR 29121 at commit [`c7a791b`](https://github.com/apache/spark/commit/c7a791b10269a2a01853903bd58cd30fc3ebd9c5).
    * This patch **fails Python style tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-658818917


   I also agree with @HyukjinKwon .
   Without the rule enforcement, `unused import` always happens again because we review and merge PR in a parallel way.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670943921


   Thank you for proposing this JIRA and working on this so far, @Fokko .
   
   Let me summarize the final review status since we discussed in many ways.
   - @holdenk supported removing unused `import`.
   - @HyukjinKwon : "I finished my review. I will leave it to you guys since I don't have a strong feeling on this."
   - @dongjoon-hyun : Approved.
   - @srowen : "If others would like to merge, go ahead."
   
   I'll merge this PR. Merged to `master`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670794888


   Oh, while I wrote the above my comment, I didn't notice the last comment from @srowen . However, I believe it's better to adjust now.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-663803067


   **[Test build #126525 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126525/testReport)** for PR 29121 at commit [`582a7ef`](https://github.com/apache/spark/commit/582a7ef4af4ddeabda6cc449d66787364deaeaae).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-661893891


   **[Test build #126258 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126258/testReport)** for PR 29121 at commit [`c7a791b`](https://github.com/apache/spark/commit/c7a791b10269a2a01853903bd58cd30fc3ebd9c5).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-661894546


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670778245






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670794507


   Hi, @Fokko . ~It seems that @srowen clearly gave -1 on this approach. Although I approved this, we cannot merge your PR if there is a -1.~
   
   ~I must admit that I didn't notice that Sean has been such a strong -1 for two weeks while you are working in this way.~ As the one who proposed the original direction strongly, sorry about that.
   
   Could you adjust your PR according to @srowen 's comment?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Fokko commented on a change in pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
Fokko commented on a change in pull request #29121:
URL: https://github.com/apache/spark/pull/29121#discussion_r458039269



##########
File path: python/pyspark/ml/tests/test_stat.py
##########
@@ -40,7 +40,7 @@ def test_chisquaretest(self):
 
 
 if __name__ == "__main__":
-    from pyspark.ml.tests.test_stat import *
+    from pyspark.ml.tests.test_stat import *  # noqa: F401

Review comment:
       Can you move this forward to test? I can check when removing the xmlrunner, what the impact is on the tests by taking a diff of the Jenkins logs. I'll also ask on the devlist.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] holdenk commented on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
holdenk commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-664604888


   So admitedly this is from StackOverflow but it sounds like unused imports in Python tend to have a negligible on-load impact - https://stackoverflow.com/questions/8724045/does-unused-import-and-objects-have-an-performance-impact 
   That being said, if were in a situation where we've got a Python UDF and we're just starting the Python process briefly on each executor, I could see the aggregate of the "one time negligible load" being noticeable.
   
   Personally, regardless of the performance, I'm in favour of cleaning up unused imports and cleaning up our code even if it makes backporting a bit more painful. I think, given that we only backport bug fixes, we should focus on having a code base that is easy to develop in. And being clearer about what the code uses certainly makes it easier for me to form a mental model of a file.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-658818917


   I also agree with @HyukjinKwon .
   Without the rule enforcement, `unused import` always happens again because we review and merge PR in a parallel way
   
   Also, we cannot `Remove unused imports` in this PR. This PR simply suppresses warnings in some cases instead of `remove unused imports`.
   ```
   from pyspark.sql import SQLContext, HiveContext, Row  # noqa: F401
   ```
   
   Sorry, @Fokko . Given the above two factors, I'm -1 for this AS-IS PR.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-661889963






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Fokko commented on a change in pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
Fokko commented on a change in pull request #29121:
URL: https://github.com/apache/spark/pull/29121#discussion_r461329483



##########
File path: python/pyspark/ml/tests/test_stat.py
##########
@@ -40,7 +40,7 @@ def test_chisquaretest(self):
 
 
 if __name__ == "__main__":
-    from pyspark.ml.tests.test_stat import *
+    from pyspark.ml.tests.test_stat import *  # noqa: F401

Review comment:
       Thanks @HyukjinKwon. It looks like it is being used by Jenkins indeed. I will try to remove the imports to see if the tests are still being picked up by Jenkins. These imports look redundant since it is importing the file itself 🤔 

##########
File path: python/pyspark/sql/functions.py
##########
@@ -29,9 +29,9 @@
 from pyspark.sql.dataframe import DataFrame
 from pyspark.sql.types import StringType, DataType
 # Keep UserDefinedFunction import for backwards compatible import; moved in SPARK-22409
-from pyspark.sql.udf import UserDefinedFunction, _create_udf

Review comment:
       Thanks!

##########
File path: python/pyspark/ml/tests/test_wrapper.py
##########
@@ -116,9 +116,8 @@ def test_new_java_array(self):
         java_array = JavaWrapper._new_java_array(str_list, java_class)
         self.assertEqual(_java2py(self.sc, java_array), expected_str_list)
 
-if __name__ == "__main__":
-    from pyspark.ml.tests.test_wrapper import *

Review comment:
       Just a small spike, I believe this is redundant. You're importing everything from the file itself. Let's wait for Jenkins to see if the file is being reported by Jenkins unit test report. If so, the import is not needed, if not, then it is required.

##########
File path: python/pyspark/ml/tests/test_stat.py
##########
@@ -40,7 +40,7 @@ def test_chisquaretest(self):
 
 
 if __name__ == "__main__":
-    from pyspark.ml.tests.test_stat import *
+    from pyspark.ml.tests.test_stat import *  # noqa: F401

Review comment:
       Ack! Missed that one. I've reverted the imports :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-663802581


   Retest this please.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

Posted by GitBox <gi...@apache.org>.
srowen commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670797950


   No -1 here. I don't object to the current change.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670783658






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670778245






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-658818917


   I also agree with @HyukjinKwon .
   Without the rule enforcement, `unused import` always happens again because we review and merge PR in a parallel way
   
   Also, we cannot `Remove unused imports` in this PR. This PR simply suppresses warnings in some cases instead of `remove unused imports`.
   ```python
   from pyspark.sql import SQLContext, HiveContext, Row  # noqa: F401
   ```
   
   Sorry, @Fokko . Given the above two factors, I'm -1 for this AS-IS PR.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670794507


   Hi, @Fokko . It seems that @srowen clearly gave -1 on this approach. Although I approved this, we cannot merge your PR if there is a -1.
   
   I must admit that I didn't notice that Sean has been such a strong -1 for two weeks while you are working in this way. As the one who proposed the original direction strongly, sorry about that.
   
   Could you adjust your PR according to @srowen 's comment?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Fokko commented on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

Posted by GitBox <gi...@apache.org>.
Fokko commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670690427


   Would it be possible to move this forward? :)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-661894556


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126258/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-661893891


   **[Test build #126258 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126258/testReport)** for PR 29121 at commit [`c7a791b`](https://github.com/apache/spark/commit/c7a791b10269a2a01853903bd58cd30fc3ebd9c5).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670791274


   Got it. So, you decided to give -1 for the enforcing for that reason.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on a change in pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #29121:
URL: https://github.com/apache/spark/pull/29121#discussion_r456687430



##########
File path: python/pyspark/ml/tests/test_stat.py
##########
@@ -40,7 +40,7 @@ def test_chisquaretest(self):
 
 
 if __name__ == "__main__":
-    from pyspark.ml.tests.test_stat import *
+    from pyspark.ml.tests.test_stat import *  # noqa: F401

Review comment:
       I think it's not controversial to remove truly unused imports. If you have a PR that just does that, I'd support it. adding the pep rule and exclusions may just not be worth it now.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Fokko commented on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

Posted by GitBox <gi...@apache.org>.
Fokko commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670841152


   I've taken your comment into account, but it got buried somewhere in the comments :) Allow me to summarize:
   
   The rule is being enforced by the flake8 linter. I'll rebase against master in a minute to check if there are any new violations between this PR and master.
   
   The remaining suppressed imports can be put into two categories:
   
   - Imports that are convenience importers. This is a reference that allows users to import a specific class from another module as well, as demonstrated here: https://github.com/apache/spark/pull/29121#discussion_r456681138 Removing these would break backward compatibility, so we have the be cautious here.
   - The imports for the xmlrunner. This looks a bit messy, but this is required with the current way of how to run the rests. Removing this would affect the output, as demonstrated in https://github.com/apache/spark/pull/29121#discussion_r461360724 Maybe in the future, we can run the tests together. I would separate this from the current PR. I've created a ticket to keep this on the backlog: https://issues.apache.org/jira/browse/SPARK-32572


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #29121:
URL: https://github.com/apache/spark/pull/29121#discussion_r460362184



##########
File path: python/pyspark/heapq3.py
##########
@@ -865,7 +865,7 @@ def nlargest(n, iterable, key=None):
 
 # If available, use C implementation
 try:
-    from _heapq import *
+    from _heapq import *  # noqa: F401

Review comment:
       This too. It was ported back from CPython 3, and is excluded in other linter checks. Can we exclude this file as well?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670777774


   **[Test build #127217 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127217/testReport)** for PR 29121 at commit [`9f37113`](https://github.com/apache/spark/commit/9f3711335716e5fdd2b528c258fef00b27d4c670).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Fokko commented on a change in pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
Fokko commented on a change in pull request #29121:
URL: https://github.com/apache/spark/pull/29121#discussion_r460374360



##########
File path: python/pyspark/cloudpickle/cloudpickle.py
##########
@@ -57,7 +57,6 @@
 from .compat import pickle
 from typing import Generic, Union, Tuple, Callable
 from pickle import _getattribute
-from importlib._bootstrap import _find_spec

Review comment:
       Yes, good point! I've moved the flake8 configuration from the command line arguments to the `tox.ini`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670799920


   Got it. I updated my previous comment by marking my misunderstanding.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
srowen commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-661889180


   Jenkins test this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Fokko edited a comment on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
Fokko edited a comment on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-658967694


   Good point @dongjoon-hyun, I was focusing on getting the CI green again. I've updated the PR description and title.
   
   While rereading it. Technically the title is correct. If we suppress the error, the import serves a purpose. Feel free to update if there is something that covers the content better in your opinion.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-663803155






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-658859183


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670707796


   **[Test build #127217 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127217/testReport)** for PR 29121 at commit [`9f37113`](https://github.com/apache/spark/commit/9f3711335716e5fdd2b528c258fef00b27d4c670).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670794888


   While I wrote the above my comment, I didn't notice the last comment from @srowen . However, I believe it's better to adjust now.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on a change in pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #29121:
URL: https://github.com/apache/spark/pull/29121#discussion_r458132433



##########
File path: python/pyspark/ml/tests/test_stat.py
##########
@@ -40,7 +40,7 @@ def test_chisquaretest(self):
 
 
 if __name__ == "__main__":
-    from pyspark.ml.tests.test_stat import *
+    from pyspark.ml.tests.test_stat import *  # noqa: F401

Review comment:
       I can trigger tests though perhaps better to not try enforcing/excluding the import rule now.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-658859183


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #29121:
URL: https://github.com/apache/spark/pull/29121#discussion_r460362434



##########
File path: python/pyspark/ml/tests/test_stat.py
##########
@@ -40,7 +40,7 @@ def test_chisquaretest(self):
 
 
 if __name__ == "__main__":
-    from pyspark.ml.tests.test_stat import *
+    from pyspark.ml.tests.test_stat import *  # noqa: F401

Review comment:
       No, no. don't remove this. It will keep the source code path in console output. It is also used in coverage report as well. XML reporter is used in Jenkins.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-663816674






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-661889963






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-658818917






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #29121:
URL: https://github.com/apache/spark/pull/29121#discussion_r455781637



##########
File path: python/pyspark/mllib/classification.py
##########
@@ -102,6 +102,7 @@ class LogisticRegressionModel(LinearClassificationModel):
       in Multinomial Logistic Regression. By default, it is binary
       logistic regression so numClasses will be set to 2.
 
+    >>> from pyspark.mllib.linalg import SparseVector

Review comment:
       I don't think PySpark examples import relevant PySpark classes within doctests.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670943921


   Thank you for proposing this JIRA and working on this so far, @Fokko .
   
   Let me summarize the final review status since we discussed in many ways.
   - @holdenk supported unused `import` removal.
   - @HyukjinKwon : "I finished my review. I will leave it to you guys since I don't have a strong feeling on this."
   - @dongjoon-hyun : Approved.
   - @srowen : "If others would like to merge, go ahead."
   
   I'll merge this PR in AS-IS status before going diverse again. Merged to `master`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

Posted by GitBox <gi...@apache.org>.
srowen commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670711092


   My last comment was, why do we need to add the rule and then a ton of exclusions? just remove the unused imports. That's a much narrower change


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-658818917


   I also agree with @HyukjinKwon .
   Without the rule enforcement, `unused import` always happens again because we review and merge PR in a parallel way.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Fokko commented on a change in pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
Fokko commented on a change in pull request #29121:
URL: https://github.com/apache/spark/pull/29121#discussion_r456679942



##########
File path: python/pyspark/mllib/classification.py
##########
@@ -102,6 +102,7 @@ class LogisticRegressionModel(LinearClassificationModel):
       in Multinomial Logistic Regression. By default, it is binary
       logistic regression so numClasses will be set to 2.
 
+    >>> from pyspark.mllib.linalg import SparseVector

Review comment:
       The examples are actually ran in the tests. If the SparseVector is imported in the Python file itself, there is no issue. However, if the import isn't used outside of the example, I would suggest to move them inside of the example. An error:
   ```
   **********************************************************************
   File "/home/runner/work/spark/spark/python/pyspark/mllib/feature.py", line 287, in __main__.ChiSqSelector
   Failed example:
       data = sc.parallelize([
           LabeledPoint(0.0, SparseVector(3, {0: 8.0, 1: 7.0})),
           LabeledPoint(1.0, SparseVector(3, {1: 9.0, 2: 6.0})),
           LabeledPoint(1.0, [0.0, 9.0, 8.0]),
           LabeledPoint(2.0, [7.0, 9.0, 5.0]),
           LabeledPoint(2.0, [8.0, 7.0, 3.0])
       ])
   Exception raised:
       Traceback (most recent call last):
         File "/opt/hostedtoolcache/Python/3.6.11/x64/lib/python3.6/doctest.py", line 1330, in __run
           compileflags, 1), test.globs)
         File "<doctest __main__.ChiSqSelector[0]>", line 2, in <module>
           LabeledPoint(0.0, SparseVector(3, {0: 8.0, 1: 7.0})),
       NameError: name 'LabeledPoint' is not defined
   ```

##########
File path: dev/lint-python
##########
@@ -147,7 +147,7 @@ flake8 checks failed."
     fi
 
     echo "starting $FLAKE8_BUILD test..."
-    FLAKE8_REPORT=$( ($FLAKE8_BUILD . --count --select=E901,E999,F821,F822,F823 \
+    FLAKE8_REPORT=$( ($FLAKE8_BUILD . --count --select=E901,E999,F821,F822,F823,F401 \

Review comment:
       I've enabled the rule in Flake8 :)

##########
File path: python/pyspark/__init__.py
##########
@@ -112,7 +112,7 @@ def wrapper(self, *args, **kwargs):
 
 
 # for back compatibility
-from pyspark.sql import SQLContext, HiveContext, Row
+from pyspark.sql import SQLContext, HiveContext, Row  # noqa: F401

Review comment:
       @srowen I'd be happy to do some more cleanup, but ones like these can't be removed easily. They provide backward compatibility, and allows us to do the following:
   ```bash
   fokkodriesprong@Fan spark % python3
   Python 3.7.7 (v3.7.7:d7c567b08f, Mar 10 2020, 02:56:16) 
   [Clang 6.0 (clang-600.0.57)] on darwin
   Type "help", "copyright", "credits" or "license" for more information.
   >>> from pyspark import Row
   >>> # This is allowed due to the import
   ```
   For backward compatibility I would keep these, otherwise we might break a lot of imports.

##########
File path: python/pyspark/ml/tests/test_stat.py
##########
@@ -40,7 +40,7 @@ def test_chisquaretest(self):
 
 
 if __name__ == "__main__":
-    from pyspark.ml.tests.test_stat import *
+    from pyspark.ml.tests.test_stat import *  # noqa: F401

Review comment:
       @srowen We could consider removing these ones. There are some entries for running the tests using the xmlrunner. However it is unclear to me who's using this, maybe the CI?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #29121:
URL: https://github.com/apache/spark/pull/29121#discussion_r454870938



##########
File path: python/pyspark/sql/functions.py
##########
@@ -29,9 +29,9 @@
 from pyspark.sql.dataframe import DataFrame
 from pyspark.sql.types import StringType, DataType
 # Keep UserDefinedFunction import for backwards compatible import; moved in SPARK-22409
-from pyspark.sql.udf import UserDefinedFunction, _create_udf

Review comment:
       `UserDefinedFunction` should be kept as described above.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-661894546






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670943921


   Thank you for proposing this JIRA and working on this so far, @Fokko .
   
   Let me summarize the final review status since we discussed in many ways.
   - @holdenk supported removing unused `import`.
   - @HyukjinKwon : "I finished my review. I will leave it to you guys since I don't have a strong feeling on this."
   - @dongjoon-hyun : Approved.
   - @srowen : "If others would like to merge, go ahead."
   
   I'll merge this PR in AS-IS status before going diverse again. Merged to `master`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org