You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "itholic (via GitHub)" <gi...@apache.org> on 2023/10/26 06:13:08 UTC

[PR] [SPARK-45674][CONNECT][PYTHON] Improve error message for `SparkContext` on Spark Connect. [spark]

itholic opened a new pull request, #43537:
URL: https://github.com/apache/spark/pull/43537

   
   
   ### What changes were proposed in this pull request?
   
   This PR proposes to improve error message for `SparkContext` on Spark Connect.
   
   
   ### Why are the changes needed?
   
   To improve the usability of error message by using proper error class.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   No API change, but only user-facing error message is improved as below:
   
   **Before**
   ```
   >>> spark.sparkContext
   PySparkNotImplementedError: [NOT_IMPLEMENTED] sparkContext() is not implemented.
   ```
   **After**
   ```
   >>> spark.sparkContext
   PySparkAttributeError: [JVM_ATTRIBUTE_NOT_SUPPORTED] Attribute `sparkContext` is not supported in Spark Connect as it depends on the JVM. If you need to use this attribute, do not use Spark Connect when creating your session.
   ```
   
   
   ### How was this patch tested?
   
   The existing CI should pass
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45674][CONNECT][PYTHON] Improve error message for JVM-dependent attributes on Spark Connect. [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #43537:
URL: https://github.com/apache/spark/pull/43537#discussion_r1373993633


##########
python/pyspark/sql/connect/session.py:
##########
@@ -694,14 +694,10 @@ def streams(self) -> "StreamingQueryManager":
     streams.__doc__ = PySparkSession.streams.__doc__
 
     def __getattr__(self, name: str) -> Any:
-        if name in ["_jsc", "_jconf", "_jvm", "_jsparkSession"]:
+        if name in ["_jsc", "_jconf", "_jvm", "_jsparkSession", "sparkContext", "newSession"]:
             raise PySparkAttributeError(
                 error_class="JVM_ATTRIBUTE_NOT_SUPPORTED", message_parameters={"attr_name": name}

Review Comment:
   Sounds good. Added link for creating regular Spark Session docs in the error message:
   
   ```
   Visit https://spark.apache.org/docs/latest/sql-getting-started.html#starting-point-sparksession for creating regular Spark Session in detail.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45674][CONNECT][PYTHON] Improve error message for JVM-dependent attributes on Spark Connect. [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #43537:
URL: https://github.com/apache/spark/pull/43537#discussion_r1375360229


##########
python/pyspark/sql/connect/dataframe.py:
##########
@@ -1639,13 +1639,11 @@ def sampleBy(
     sampleBy.__doc__ = PySparkDataFrame.sampleBy.__doc__
 
     def __getattr__(self, name: str) -> "Column":
-        if name in ["_jseq", "_jdf", "_jmap", "_jcols"]:
+        if name in ["_jseq", "_jdf", "_jmap", "_jcols", "rdd", "toJSON"]:
             raise PySparkAttributeError(
                 error_class="JVM_ATTRIBUTE_NOT_SUPPORTED", message_parameters={"attr_name": name}
             )
         elif name in [
-            "rdd",
-            "toJSON",
             "checkpoint",
             "localCheckpoint",

Review Comment:
   Yes, so I leave them as NOT_IMPLEMENTED for now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45674][CONNECT][PYTHON] Improve error message for `SparkContext` on Spark Connect. [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #43537:
URL: https://github.com/apache/spark/pull/43537#discussion_r1372642457


##########
python/pyspark/sql/connect/session.py:
##########
@@ -694,11 +694,11 @@ def streams(self) -> "StreamingQueryManager":
     streams.__doc__ = PySparkSession.streams.__doc__
 
     def __getattr__(self, name: str) -> Any:
-        if name in ["_jsc", "_jconf", "_jvm", "_jsparkSession"]:
+        if name in ["_jsc", "_jconf", "_jvm", "_jsparkSession", "sparkContext"]:
             raise PySparkAttributeError(
                 error_class="JVM_ATTRIBUTE_NOT_SUPPORTED", message_parameters={"attr_name": name}
             )
-        elif name in ["newSession", "sparkContext"]:
+        elif name in ["newSession"]:

Review Comment:
   Sure, will add more if I find any.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45674][CONNECT][PYTHON] Improve error message for JVM-dependent attributes on Spark Connect. [spark]

Posted by "allisonwang-db (via GitHub)" <gi...@apache.org>.
allisonwang-db commented on code in PR #43537:
URL: https://github.com/apache/spark/pull/43537#discussion_r1373472305


##########
python/pyspark/sql/connect/session.py:
##########
@@ -694,14 +694,10 @@ def streams(self) -> "StreamingQueryManager":
     streams.__doc__ = PySparkSession.streams.__doc__
 
     def __getattr__(self, name: str) -> Any:
-        if name in ["_jsc", "_jconf", "_jvm", "_jsparkSession"]:
+        if name in ["_jsc", "_jconf", "_jvm", "_jsparkSession", "sparkContext", "newSession"]:
             raise PySparkAttributeError(
                 error_class="JVM_ATTRIBUTE_NOT_SUPPORTED", message_parameters={"attr_name": name}

Review Comment:
   I see this in the error message:`If you need to use this attribute, do not use Spark Connect when creating your session.`
   Could we provide more detailed instructions on how users can "not use" Spark Connect?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45674][CONNECT][PYTHON] Improve error message for `SparkContext` on Spark Connect. [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on PR #43537:
URL: https://github.com/apache/spark/pull/43537#issuecomment-1780475441

   cc @HyukjinKwon 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45674][CONNECT][PYTHON] Improve error message for JVM-dependent attributes on Spark Connect. [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #43537:
URL: https://github.com/apache/spark/pull/43537#issuecomment-1784059080

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45674][CONNECT][PYTHON] Improve error message for JVM-dependent attributes on Spark Connect. [spark]

Posted by "allisonwang-db (via GitHub)" <gi...@apache.org>.
allisonwang-db commented on code in PR #43537:
URL: https://github.com/apache/spark/pull/43537#discussion_r1375068730


##########
python/pyspark/sql/connect/dataframe.py:
##########
@@ -1639,13 +1639,11 @@ def sampleBy(
     sampleBy.__doc__ = PySparkDataFrame.sampleBy.__doc__
 
     def __getattr__(self, name: str) -> "Column":
-        if name in ["_jseq", "_jdf", "_jmap", "_jcols"]:
+        if name in ["_jseq", "_jdf", "_jmap", "_jcols", "rdd", "toJSON"]:
             raise PySparkAttributeError(
                 error_class="JVM_ATTRIBUTE_NOT_SUPPORTED", message_parameters={"attr_name": name}
             )
         elif name in [
-            "rdd",
-            "toJSON",
             "checkpoint",
             "localCheckpoint",

Review Comment:
   For `checkpoint` and `localCheckpoint`, we still have this NOT_IMPLEMENTED error. Just curious are we planning to support them in the future?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45674][CONNECT][PYTHON] Improve error message for JVM-dependent attributes on Spark Connect. [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon closed pull request #43537: [SPARK-45674][CONNECT][PYTHON] Improve error message for JVM-dependent attributes on Spark Connect.
URL: https://github.com/apache/spark/pull/43537


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45674][CONNECT][PYTHON] Improve error message for `SparkContext` on Spark Connect. [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #43537:
URL: https://github.com/apache/spark/pull/43537#discussion_r1372639788


##########
python/pyspark/sql/connect/session.py:
##########
@@ -694,11 +694,11 @@ def streams(self) -> "StreamingQueryManager":
     streams.__doc__ = PySparkSession.streams.__doc__
 
     def __getattr__(self, name: str) -> Any:
-        if name in ["_jsc", "_jconf", "_jvm", "_jsparkSession"]:
+        if name in ["_jsc", "_jconf", "_jvm", "_jsparkSession", "sparkContext"]:
             raise PySparkAttributeError(
                 error_class="JVM_ATTRIBUTE_NOT_SUPPORTED", message_parameters={"attr_name": name}
             )
-        elif name in ["newSession", "sparkContext"]:
+        elif name in ["newSession"]:

Review Comment:
   Let's also do `newSession`. Can we fix them all since we're here? I think we shouldn't say it's not implemented but unsupported explicitly. e.g., `DataFrame.rdd`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org