Posted to reviews@spark.apache.org by "HyukjinKwon (via GitHub)" <gi...@apache.org> on 2023/11/20 01:32:13 UTC

[PR] [SPARK-45996][PYTHON][CONNECT] Show proper dependency requirement messages for Spark Connect [spark]

HyukjinKwon opened a new pull request, #43894:
URL: https://github.com/apache/spark/pull/43894

   ### What changes were proposed in this pull request?
   
   This PR improves the error messages shown when the dependencies required for Python Spark Connect are missing or too old.
   
   ### Why are the changes needed?
   
   To improve the error messages. Currently, when a dependency is missing, users get raw tracebacks like the following:
   
   ```
   /.../pyspark/shell.py:57: UserWarning: Failed to initialize Spark session.
     warnings.warn("Failed to initialize Spark session.")
   Traceback (most recent call last):
     File "/Users/hyukjin.kwon/workspace/forked/spark/python/pyspark/shell.py", line 52, in <module>
       spark = SparkSession.builder.getOrCreate()
     File "/Users/hyukjin.kwon/workspace/forked/spark/python/pyspark/sql/session.py", line 476, in getOrCreate
       from pyspark.sql.connect.session import SparkSession as RemoteSparkSession
     File "/Users/hyukjin.kwon/workspace/forked/spark/python/pyspark/sql/connect/session.py", line 53, in <module>
       from pyspark.sql.connect.client import SparkConnectClient, ChannelBuilder
     File "/Users/hyukjin.kwon/workspace/forked/spark/python/pyspark/sql/connect/client/__init__.py", line 22, in <module>
       from pyspark.sql.connect.client.core import *  # noqa: F401,F403
     File "/Users/hyukjin.kwon/workspace/forked/spark/python/pyspark/sql/connect/client/core.py", line 51, in <module>
       import google.protobuf.message
   ModuleNotFoundError: No module named 'google'
   ```
   
   ```
   /.../pyspark/shell.py:57: UserWarning: Failed to initialize Spark session.
     warnings.warn("Failed to initialize Spark session.")
   Traceback (most recent call last):
     File "/Users/hyukjin.kwon/workspace/forked/spark/python/pyspark/shell.py", line 52, in <module>
       spark = SparkSession.builder.getOrCreate()
     File "/Users/hyukjin.kwon/workspace/forked/spark/python/pyspark/sql/session.py", line 476, in getOrCreate
       from pyspark.sql.connect.session import SparkSession as RemoteSparkSession
     File "/Users/hyukjin.kwon/workspace/forked/spark/python/pyspark/sql/connect/session.py", line 53, in <module>
       from pyspark.sql.connect.client import SparkConnectClient, ChannelBuilder
     File "/Users/hyukjin.kwon/workspace/forked/spark/python/pyspark/sql/connect/client/__init__.py", line 22, in <module>
       from pyspark.sql.connect.client.core import *  # noqa: F401,F403
     File "/Users/hyukjin.kwon/workspace/forked/spark/python/pyspark/sql/connect/client/core.py", line 52, in <module>
       from grpc_status import rpc_status
   ModuleNotFoundError: No module named 'grpc_status'
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, it changes the user-facing error messages.
   
   ### How was this patch tested?
   
   Manually tested as below:
   
   ```bash
   ➜  spark git:(master) ✗ conda create -y -n python3.10 python=3.10
   ...
   ➜  spark git:(master) ✗ conda activate python3.10
   (python3.10) ➜  spark git:(master) ✗ ./bin/pyspark --remote local
   ...
       raise ImportError(
   ImportError: Pandas >= 1.4.4 must be installed; however, it was not found.
   (python3.10) ➜  spark git:(master) ✗ pip install 'pandas >= 1.4.4'
   ...
   (python3.10) ➜  spark git:(SPARK-45996) ✗ ./bin/pyspark --remote local
   ...
       raise ImportError(
   ImportError: PyArrow >= 4.0.0 must be installed; however, it was not found.
   (python3.10) ➜  spark git:(SPARK-45996) pip install 'PyArrow >= 4.0.0'
   ...
   (python3.10) ➜  spark git:(SPARK-45996) ./bin/pyspark --remote local
   ...
       raise ImportError(
   ImportError: grpcio >= 1.48.1 must be installed; however, it was not found.
   (python3.10) ➜  spark git:(SPARK-45996) pip install 'grpcio >= 1.48.1'
   ...
   (python3.10) ➜  spark git:(SPARK-45996) ./bin/pyspark --remote local
   ...
       raise ImportError(
   ImportError: grpc-status >= 1.48.1 must be installed; however, it was not found.
   (python3.10) ➜  spark git:(SPARK-45996) ✗ pip install 'grpcio-status >= 1.48.1'
   ...
   (python3.10) ➜  spark git:(SPARK-45996) ✗ ./bin/pyspark --remote local
   ...
   Welcome to
         ____              __
        / __/__  ___ _____/ /__
       _\ \/ _ \/ _ `/ __/  '_/
      /__ / .__/\_,_/_/ /_/\_\   version 4.0.0.dev0
         /_/
   
   Using Python version 3.10.13 (main, Sep 11 2023 08:39:02)
   Client connected to the Spark Connect server at localhost
   SparkSession available as 'spark'.
   >>> spark.range(10).show()
   +---+
   | id|
   +---+
   |  0|
   ...
   ```
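
   For reference, below is a rough, hypothetical sketch of the version checks exercised in the session above; PySpark's actual helpers live in `pyspark/sql/connect/utils.py` and differ in detail:

   ```python
   # Hypothetical sketch: report installed vs. required versions for the
   # Spark Connect Python dependencies checked in the session above.
   from importlib.metadata import version, PackageNotFoundError

   MINIMUM_VERSIONS = {
       "pandas": "1.4.4",
       "pyarrow": "4.0.0",
       "grpcio": "1.48.1",
       "grpcio-status": "1.48.1",
   }

   for dist, minimum in MINIMUM_VERSIONS.items():
       try:
           print(f"{dist}: installed {version(dist)}, requires >= {minimum}")
       except PackageNotFoundError:
           print(f"{dist}: not installed, requires >= {minimum}")
   ```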
   
   Note that `grpcio-status` already depends on `googleapis-common-protos` (see https://github.com/grpc/grpc/blob/master/src/python/grpcio_status/setup.py#L67-L69), so it was not installed explicitly.
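
   This can be confirmed locally; a small sketch (assuming `grpcio-status` is installed) that reads the declared dependencies from the installed distribution metadata:

   ```python
   # Hypothetical check: list grpcio-status's declared dependencies and confirm
   # that googleapis-common-protos is among them.
   from importlib.metadata import requires

   deps = requires("grpcio-status") or []
   print([d for d in deps if d.startswith("googleapis-common-protos")])
   ```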
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.
   



Re: [PR] [SPARK-45996][PYTHON][CONNECT] Show proper dependency requirement messages for Spark Connect [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #43894:
URL: https://github.com/apache/spark/pull/43894#issuecomment-1818192423

   Merged to master.



Re: [PR] [SPARK-45996][PYTHON][CONNECT] Show proper dependency requirement messages for Spark Connect [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #43894:
URL: https://github.com/apache/spark/pull/43894#discussion_r1398562718


##########
python/pyspark/sql/connect/utils.py:
##########
@@ -44,14 +46,39 @@ def require_minimum_grpc_version() -> None:
         import grpc
     except ImportError as error:
         raise ImportError(
-            "grpcio >= %s must be installed; however, " "it was not found." % minimum_grpc_version
+            f"grpcio >= {minimum_grpc_version} must be installed; however, it was not found."
         ) from error
     if LooseVersion(grpc.__version__) < LooseVersion(minimum_grpc_version):
         raise ImportError(
-            "grpcio >= %s must be installed; however, "
-            "your version was %s." % (minimum_grpc_version, grpc.__version__)
+            f"grpcio >= {minimum_grpc_version} must be installed; however, "
+            f"your version was {grpc.__version__}."
         )
 
 
+def require_minimum_grpcio_status_version() -> None:
+    """Raise ImportError if grpcio-status is not installed"""
+    minimum_grpc_version = "1.48.1"
+
+    try:
+        import grpc_status

Review Comment:
   This module and the one below do not have `__version__` attributes, interestingly.
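
   A hedged alternative, if a version check were still wanted despite the missing `__version__`: read the installed distribution version from package metadata instead (a sketch, not the PR's code):

   ```python
   # Hypothetical fallback: grpc_status ships no __version__, so read the
   # version recorded in the grpcio-status distribution metadata instead.
   from typing import Optional
   from importlib.metadata import version, PackageNotFoundError

   def installed_grpcio_status_version() -> Optional[str]:
       try:
           return version("grpcio-status")
       except PackageNotFoundError:
           return None

   print(installed_grpcio_status_version())
   ```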



##########
dev/requirements.txt:
##########
@@ -54,7 +54,7 @@ py
 grpcio>=1.48,<1.57
 grpcio-status>=1.48,<1.57
 protobuf==4.25.1
-googleapis-common-protos==1.56.4
+googleapis-common-protos>=1.56.4

Review Comment:
   According to `setup.py`.




Re: [PR] [SPARK-45996][PYTHON][CONNECT] Show proper dependency requirement messages for Spark Connect [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #43894:
URL: https://github.com/apache/spark/pull/43894#issuecomment-1818078573

   Thank you!



Re: [PR] [SPARK-45996][PYTHON][CONNECT] Show proper dependency requirement messages for Spark Connect [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon closed pull request #43894: [SPARK-45996][PYTHON][CONNECT] Show proper dependency requirement messages for Spark Connect
URL: https://github.com/apache/spark/pull/43894

