You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/12/23 04:18:31 UTC

[GitHub] [spark] Yikun opened a new pull request #34993: [SPARK-37721][PYTHON] Fix SPARK_HOME key error when running PySpark test

Yikun opened a new pull request #34993:
URL: https://github.com/apache/spark/pull/34993


   ### What changes were proposed in this pull request?
   Replace `os.environ` direct access with `_find_spark_home` in pyspark testing utils.
   
   ### Why are the changes needed?
   When running PySpark unit test in WSL with PyCharm, all test raise the `KeyError: 'SPARK_HOME'`
   ```
   Launching unittests with arguments python -m unittest test_rdd.RDDTests.test_range in /home/yikun/spark/python/pyspark/testsTraceback (most recent call last):
     File "/mnt/d/Program Files/JetBrains/PyCharm 2021.1.3/plugins/python/helpers/pycharm/_jb_unittest_runner.py", line 35, in <module>
       sys.exit(main(argv=args, module=None, testRunner=unittestpy.TeamcityTestRunner, buffer=not JB_DISABLE_BUFFERING))
     File "/usr/lib/python3.8/unittest/main.py", line 100, in __init__
       self.parseArgs(argv)
     File "/usr/lib/python3.8/unittest/main.py", line 147, in parseArgs
       self.createTests()
     File "/usr/lib/python3.8/unittest/main.py", line 158, in createTests
       self.test = self.testLoader.loadTestsFromNames(self.testNames,
     File "/usr/lib/python3.8/unittest/loader.py", line 220, in loadTestsFromNames
       suites = [self.loadTestsFromName(name, module) for name in names]
     File "/usr/lib/python3.8/unittest/loader.py", line 220, in <listcomp>
       suites = [self.loadTestsFromName(name, module) for name in names]
     File "/usr/lib/python3.8/unittest/loader.py", line 154, in loadTestsFromName
       module = __import__(module_name)
     File "/home/yikun/spark/python/pyspark/tests/test_rdd.py", line 37, in <module>
       from pyspark.testing.utils import ReusedPySparkTestCase, SPARK_HOME, QuietTest
     File "/home/yikun/spark/python/pyspark/testing/utils.py", line 47, in <module>
       SPARK_HOME = os.environ["SPARK_HOME"]#_find_spark_home()
     File "/usr/lib/python3.8/os.py", line 675, in __getitem__
       raise KeyError(key) from None
   KeyError: 'SPARK_HOME' 
   ```
   
   We should use `_find_spark_home` to get SPARK_HOME rather than access `os.environ`.
   
   ### Does this PR introduce _any_ user-facing change?
   No, test only
   
   ### How was this patch tested?
   - UT passed.
   - Run test in Win WSL.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #34993: [SPARK-37721][PYTHON] Fix SPARK_HOME key error when running PySpark test

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #34993:
URL: https://github.com/apache/spark/pull/34993#discussion_r774350553



##########
File path: python/pyspark/testing/utils.py
##########
@@ -152,7 +153,7 @@ def close(self):
 def search_jar(project_relative_path, sbt_jar_name_prefix, mvn_jar_name_prefix):
     # Note that 'sbt_jar_name_prefix' and 'mvn_jar_name_prefix' are used since the prefix can
     # vary for SBT or Maven specifically. See also SPARK-26856
-    project_full_path = os.path.join(os.environ["SPARK_HOME"], project_relative_path)
+    project_full_path = os.path.join(_find_spark_home(), project_relative_path)

Review comment:
       I think we can just use `SPARK_HOME`. SPARK_HOME should be set before importing this in our testing script anyway.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34993: [SPARK-37721][PYTHON] Fix SPARK_HOME key error when running PySpark test

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34993:
URL: https://github.com/apache/spark/pull/34993#issuecomment-1000050229


   **[Test build #146500 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146500/testReport)** for PR 34993 at commit [`2b1040f`](https://github.com/apache/spark/commit/2b1040fecf10cc8697c36b8c50fc62c474c84d4f).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon closed pull request #34993: [SPARK-37721][PYTHON] Fix SPARK_HOME key error when running PySpark test

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #34993:
URL: https://github.com/apache/spark/pull/34993


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Yikun commented on a change in pull request #34993: [SPARK-37721][PYTHON] Fix SPARK_HOME key error when running PySpark test

Posted by GitBox <gi...@apache.org>.
Yikun commented on a change in pull request #34993:
URL: https://github.com/apache/spark/pull/34993#discussion_r774372541



##########
File path: python/pyspark/sql/utils.py
##########
@@ -245,12 +246,7 @@ def require_test_compiled() -> None:
     import os
     import glob
 
-    try:
-        spark_home = os.environ["SPARK_HOME"]
-    except KeyError:
-        raise RuntimeError("SPARK_HOME is not defined in environment")
-
-    test_class_path = os.path.join(spark_home, "sql", "core", "target", "*", "test-classes")
+    test_class_path = os.path.join(_find_spark_home(), "sql", "core", "target", "*", "test-classes")

Review comment:
       It works, but I assumed `python/pyspark/testing/utils.py` should be imported py in tests/* rather than utils (even this function is only used for tests), it perhaps bring [some redundant test import and even circle dependecy](https://github.com/apache/spark/blob/6fb417e21c98283c40713698c58a73e68ea9614f/python/pyspark/testing/sqlutils.py#L30-L58) to sql utils, might bring some performance regression when someone use sql utils in non-test code, so I didn't change this in here. WDYT?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34993: [SPARK-37721][PYTHON] Fix SPARK_HOME key error when running PySpark test

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34993:
URL: https://github.com/apache/spark/pull/34993#issuecomment-1000059358


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50976/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #34993: [SPARK-37721][PYTHON] Fix SPARK_HOME key error when running PySpark test

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #34993:
URL: https://github.com/apache/spark/pull/34993#discussion_r774333683



##########
File path: python/pyspark/testing/utils.py
##########
@@ -22,6 +22,7 @@
 import unittest
 from time import time, sleep
 
+from pyspark.find_spark_home import _find_spark_home
 from pyspark import SparkContext, SparkConf

Review comment:
       Actually it says:
   
   > imports should be sorted lexicographically, ignoring case, according to each module’s full package path (the `path` in `from path import ...`)
   
   Since `pyspark.` is after `pyspark` , I think placing `from pyspark import SparkContext, SparkConf` up seems correct :-). 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34993: [SPARK-37721][PYTHON] Fix SPARK_HOME key error when running PySpark test

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34993:
URL: https://github.com/apache/spark/pull/34993#issuecomment-1000085510


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50976/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34993: [SPARK-37721][PYTHON] Fix SPARK_HOME key error when running PySpark test

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34993:
URL: https://github.com/apache/spark/pull/34993#issuecomment-1000110567


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146505/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #34993: [SPARK-37721][PYTHON] Fix SPARK_HOME key error when running PySpark test

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #34993:
URL: https://github.com/apache/spark/pull/34993#discussion_r774350685



##########
File path: python/pyspark/tests/test_appsubmit.py
##########
@@ -23,13 +23,15 @@
 import unittest
 import zipfile
 
+from pyspark.find_spark_home import _find_spark_home
+
 
 class SparkSubmitTests(unittest.TestCase):
     def setUp(self):
         self.programDir = tempfile.mkdtemp()
         tmp_dir = tempfile.gettempdir()
         self.sparkSubmit = [
-            os.path.join(os.environ.get("SPARK_HOME"), "bin", "spark-submit"),
+            os.path.join(_find_spark_home(), "bin", "spark-submit"),

Review comment:
       Here too. Let's just import `SPARK_HOME`. I think it's fine.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34993: [SPARK-37721][PYTHON] Fix SPARK_HOME key error when running PySpark test

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34993:
URL: https://github.com/apache/spark/pull/34993#issuecomment-1000042280


   **[Test build #146500 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146500/testReport)** for PR 34993 at commit [`2b1040f`](https://github.com/apache/spark/commit/2b1040fecf10cc8697c36b8c50fc62c474c84d4f).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34993: [SPARK-37721][PYTHON] Fix SPARK_HOME key error when running PySpark test

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34993:
URL: https://github.com/apache/spark/pull/34993#issuecomment-1000132241


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50981/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34993: [SPARK-37721][PYTHON] Fix SPARK_HOME key error when running PySpark test

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34993:
URL: https://github.com/apache/spark/pull/34993#issuecomment-1000110567


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146505/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34993: [SPARK-37721][PYTHON] Fix SPARK_HOME key error when running PySpark test

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34993:
URL: https://github.com/apache/spark/pull/34993#issuecomment-1000096203


   **[Test build #146505 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146505/testReport)** for PR 34993 at commit [`90648bc`](https://github.com/apache/spark/commit/90648bc43a62656742f65bb3f657b8e154147b3c).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34993: [SPARK-37721][PYTHON] Fix SPARK_HOME key error when running PySpark test

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34993:
URL: https://github.com/apache/spark/pull/34993#issuecomment-1000141782


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50981/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #34993: [SPARK-37721][PYTHON] Fix SPARK_HOME key error when running PySpark test

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34993:
URL: https://github.com/apache/spark/pull/34993#issuecomment-1000085461


   **[Test build #146505 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146505/testReport)** for PR 34993 at commit [`90648bc`](https://github.com/apache/spark/commit/90648bc43a62656742f65bb3f657b8e154147b3c).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Yikun commented on a change in pull request #34993: [SPARK-37721][PYTHON] Fix SPARK_HOME key error when running PySpark test

Posted by GitBox <gi...@apache.org>.
Yikun commented on a change in pull request #34993:
URL: https://github.com/apache/spark/pull/34993#discussion_r774372541



##########
File path: python/pyspark/sql/utils.py
##########
@@ -245,12 +246,7 @@ def require_test_compiled() -> None:
     import os
     import glob
 
-    try:
-        spark_home = os.environ["SPARK_HOME"]
-    except KeyError:
-        raise RuntimeError("SPARK_HOME is not defined in environment")
-
-    test_class_path = os.path.join(spark_home, "sql", "core", "target", "*", "test-classes")
+    test_class_path = os.path.join(_find_spark_home(), "sql", "core", "target", "*", "test-classes")

Review comment:
       It works, but I assumed `python/pyspark/testing/utils.py` should be imported in tests/* rather than utils (even this function is only used for tests), it perhaps bring [some redundant test import and even circle dependecy](https://github.com/apache/spark/blob/6fb417e21c98283c40713698c58a73e68ea9614f/python/pyspark/testing/sqlutils.py#L30-L58) to sql utils, might bring some performance regression when someone use sql utils in non-test code, so I didn't change this in here. WDYT?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34993: [SPARK-37721][PYTHON] Fix SPARK_HOME key error when running PySpark test

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34993:
URL: https://github.com/apache/spark/pull/34993#issuecomment-1000104776


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50981/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #34993: [SPARK-37721][PYTHON] Fix SPARK_HOME key error when running PySpark test

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #34993:
URL: https://github.com/apache/spark/pull/34993#discussion_r774313424



##########
File path: python/pyspark/testing/utils.py
##########
@@ -22,6 +22,7 @@
 import unittest
 from time import time, sleep
 
+from pyspark.find_spark_home import _find_spark_home
 from pyspark import SparkContext, SparkConf

Review comment:
       ```suggestion
   from pyspark import SparkContext, SparkConf
   from pyspark.find_spark_home import _find_spark_home
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #34993: [SPARK-37721][PYTHON] Fix SPARK_HOME key error when running PySpark test

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #34993:
URL: https://github.com/apache/spark/pull/34993#issuecomment-1000192240


   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34993: [SPARK-37721][PYTHON] Fix SPARK_HOME key error when running PySpark test

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34993:
URL: https://github.com/apache/spark/pull/34993#issuecomment-1000060766


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146500/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #34993: [SPARK-37721][PYTHON] Fix SPARK_HOME key error when running PySpark test

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #34993:
URL: https://github.com/apache/spark/pull/34993#discussion_r774357011



##########
File path: python/pyspark/sql/utils.py
##########
@@ -245,12 +246,7 @@ def require_test_compiled() -> None:
     import os
     import glob
 
-    try:
-        spark_home = os.environ["SPARK_HOME"]
-    except KeyError:
-        raise RuntimeError("SPARK_HOME is not defined in environment")
-
-    test_class_path = os.path.join(spark_home, "sql", "core", "target", "*", "test-classes")
+    test_class_path = os.path.join(_find_spark_home(), "sql", "core", "target", "*", "test-classes")

Review comment:
       Can we use `pyspark.testing.utils.SPARK_HOME` here too?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Yikun commented on a change in pull request #34993: [SPARK-37721][PYTHON] Fix SPARK_HOME key error when running PySpark test

Posted by GitBox <gi...@apache.org>.
Yikun commented on a change in pull request #34993:
URL: https://github.com/apache/spark/pull/34993#discussion_r774372541



##########
File path: python/pyspark/sql/utils.py
##########
@@ -245,12 +246,7 @@ def require_test_compiled() -> None:
     import os
     import glob
 
-    try:
-        spark_home = os.environ["SPARK_HOME"]
-    except KeyError:
-        raise RuntimeError("SPARK_HOME is not defined in environment")
-
-    test_class_path = os.path.join(spark_home, "sql", "core", "target", "*", "test-classes")
+    test_class_path = os.path.join(_find_spark_home(), "sql", "core", "target", "*", "test-classes")

Review comment:
       It works, but I assumed `python/pyspark/testing/utils.py` should be imported py in tests/* rather than utils (even this function is only used for tests), it perhaps bring [some redundant test import and even circle dependecy](https://github.com/apache/spark/blob/6fb417e21c98283c40713698c58a73e68ea9614f/python/pyspark/testing/sqlutils.py#L30-L58) to sql utils, might bring some performance regression when someone use sql utils in non-test code, so I didn't change this in here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34993: [SPARK-37721][PYTHON] Fix SPARK_HOME key error when running PySpark test

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34993:
URL: https://github.com/apache/spark/pull/34993#issuecomment-1000060766


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146500/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34993: [SPARK-37721][PYTHON] Fix SPARK_HOME key error when running PySpark test

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34993:
URL: https://github.com/apache/spark/pull/34993#issuecomment-1000141782


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50981/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #34993: [SPARK-37721][PYTHON] Fix SPARK_HOME key error when running PySpark test

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34993:
URL: https://github.com/apache/spark/pull/34993#issuecomment-1000042280


   **[Test build #146500 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146500/testReport)** for PR 34993 at commit [`2b1040f`](https://github.com/apache/spark/commit/2b1040fecf10cc8697c36b8c50fc62c474c84d4f).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34993: [SPARK-37721][PYTHON] Fix SPARK_HOME key error when running PySpark test

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34993:
URL: https://github.com/apache/spark/pull/34993#issuecomment-1000085510


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50976/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34993: [SPARK-37721][PYTHON] Fix SPARK_HOME key error when running PySpark test

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34993:
URL: https://github.com/apache/spark/pull/34993#issuecomment-1000085461






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Yikun commented on pull request #34993: [SPARK-37721][PYTHON] Fix SPARK_HOME key error when running PySpark test

Posted by GitBox <gi...@apache.org>.
Yikun commented on pull request #34993:
URL: https://github.com/apache/spark/pull/34993#issuecomment-1000059071


   > Maybe let's fix https://github.com/apache/spark/blob/master/python/pyspark/sql/utils.py#L248-L251 too.
   
   Sure, will fix in next commit after confirm https://github.com/apache/spark/pull/34993#discussion_r774325742


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Yikun commented on a change in pull request #34993: [SPARK-37721][PYTHON] Fix SPARK_HOME key error when running PySpark test

Posted by GitBox <gi...@apache.org>.
Yikun commented on a change in pull request #34993:
URL: https://github.com/apache/spark/pull/34993#discussion_r774338593



##########
File path: python/pyspark/testing/utils.py
##########
@@ -22,6 +22,7 @@
 import unittest
 from time import time, sleep
 
+from pyspark.find_spark_home import _find_spark_home
 from pyspark import SparkContext, SparkConf

Review comment:
       oops, I didn't catch the ".", thanks




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #34993: [SPARK-37721][PYTHON] Fix SPARK_HOME key error when running PySpark test

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #34993:
URL: https://github.com/apache/spark/pull/34993#discussion_r774464655



##########
File path: python/pyspark/sql/utils.py
##########
@@ -245,12 +246,7 @@ def require_test_compiled() -> None:
     import os
     import glob
 
-    try:
-        spark_home = os.environ["SPARK_HOME"]
-    except KeyError:
-        raise RuntimeError("SPARK_HOME is not defined in environment")
-
-    test_class_path = os.path.join(spark_home, "sql", "core", "target", "*", "test-classes")
+    test_class_path = os.path.join(_find_spark_home(), "sql", "core", "target", "*", "test-classes")

Review comment:
       oh okay, thats fine.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Yikun commented on a change in pull request #34993: [SPARK-37721][PYTHON] Fix SPARK_HOME key error when running PySpark test

Posted by GitBox <gi...@apache.org>.
Yikun commented on a change in pull request #34993:
URL: https://github.com/apache/spark/pull/34993#discussion_r774325742



##########
File path: python/pyspark/testing/utils.py
##########
@@ -22,6 +22,7 @@
 import unittest
 from time import time, sleep
 
+from pyspark.find_spark_home import _find_spark_home
 from pyspark import SparkContext, SparkConf

Review comment:
       I'm okay with this change, but I'm not sure why we need this change?
   
   Actually, I deliberately put this addtion to the front 😂, because [1]: `imports should be sorted lexicographically`. And flake8 also has a plugin `import-order-style` (not enable in Spark) helps to check it. 
   
   `pysparkfind_spark_home_find_spark_home` should before `pysparkSparkContextSparkConf`.
   
   [1] https://google.github.io/styleguide/pyguide.html?showone=Imports_formatting#313-imports-formatting




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #34993: [SPARK-37721][PYTHON] Fix SPARK_HOME key error when running PySpark test

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #34993:
URL: https://github.com/apache/spark/pull/34993#discussion_r774332398



##########
File path: python/pyspark/testing/utils.py
##########
@@ -22,6 +22,7 @@
 import unittest
 from time import time, sleep
 
+from pyspark.find_spark_home import _find_spark_home
 from pyspark import SparkContext, SparkConf

Review comment:
       oh okay




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #34993: [SPARK-37721][PYTHON] Fix SPARK_HOME key error when running PySpark test

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #34993:
URL: https://github.com/apache/spark/pull/34993#discussion_r774350553



##########
File path: python/pyspark/testing/utils.py
##########
@@ -152,7 +153,7 @@ def close(self):
 def search_jar(project_relative_path, sbt_jar_name_prefix, mvn_jar_name_prefix):
     # Note that 'sbt_jar_name_prefix' and 'mvn_jar_name_prefix' are used since the prefix can
     # vary for SBT or Maven specifically. See also SPARK-26856
-    project_full_path = os.path.join(os.environ["SPARK_HOME"], project_relative_path)
+    project_full_path = os.path.join(_find_spark_home(), project_relative_path)

Review comment:
       I think we can just use `pyspark.testing.utils.SPARK_HOME`. SPARK_HOME should be set before importing this in our testing script anyway.

##########
File path: python/pyspark/tests/test_appsubmit.py
##########
@@ -23,13 +23,15 @@
 import unittest
 import zipfile
 
+from pyspark.find_spark_home import _find_spark_home
+
 
 class SparkSubmitTests(unittest.TestCase):
     def setUp(self):
         self.programDir = tempfile.mkdtemp()
         tmp_dir = tempfile.gettempdir()
         self.sparkSubmit = [
-            os.path.join(os.environ.get("SPARK_HOME"), "bin", "spark-submit"),
+            os.path.join(_find_spark_home(), "bin", "spark-submit"),

Review comment:
       Here too. Let's just import `pyspark.testing.utils.SPARK_HOME`. I think it's fine.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org