You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "HyukjinKwon (via GitHub)" <gi...@apache.org> on 2023/12/14 07:28:23 UTC

[PR] [SPARK-46402][PYTHON] Add getMessageParameters and getQueryContext support [spark]

HyukjinKwon opened a new pull request, #44349:
URL: https://github.com/apache/spark/pull/44349

   ### What changes were proposed in this pull request?
   
   This PR adds new API with/without Spark Connect as below.
   
   - `getMessageParamater` working fine
   - `getQueryContext`
   
   ### Why are the changes needed?
   
   For feature parity.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, it adds the new API, `QueryContext`, `QueryContextType`, `PySparkException.getQueryContext` with making `getMessageParamater` working fine.
   
   ### How was this patch tested?
   
   Unittests were added.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46402][PYTHON] Add getMessageParameters and getQueryContext support [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #44349:
URL: https://github.com/apache/spark/pull/44349#issuecomment-1855311221

   cc @zhengruifeng @itholic @grundprinzip @hvanhovell 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46402][PYTHON] Add getMessageParameters and getQueryContext support [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44349:
URL: https://github.com/apache/spark/pull/44349#discussion_r1427536554


##########
python/pyspark/errors/exceptions/captured.py:
##########
@@ -332,3 +345,37 @@ class UnknownException(CapturedException, BaseUnknownException):
     """
     None of the other exceptions.
     """
+
+
+class QueryContext(BaseQueryContext):
+    def __init__(self, q: JavaObject):
+        self._q = q
+
+    def contextType(self) -> QueryContextType:
+        context_type = self._q.contextType().toString()
+        assert context_type in ("SQL", "DataFrame")
+        if context_type == "DataFrame":
+            return QueryContextType.DataFrame
+        else:
+            return QueryContextType.SQL
+
+    def objectType(self) -> str:
+        return str(self._q.objectType())
+
+    def objectName(self) -> str:
+        return str(self._q.objectName())
+
+    def startIndex(self) -> int:

Review Comment:
   See my comment linked above https://github.com/apache/spark/pull/43352/files#r1427431997



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46402][PYTHON] Add getMessageParameters and getQueryContext support [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44349:
URL: https://github.com/apache/spark/pull/44349#discussion_r1426317058


##########
python/pyspark/sql/tests/connect/test_utils.py:
##########
@@ -14,13 +14,16 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
+import unittest
 
 from pyspark.testing.connectutils import ReusedConnectTestCase
 from pyspark.sql.tests.test_utils import UtilsTestsMixin
 
 
 class ConnectUtilsTests(ReusedConnectTestCase, UtilsTestsMixin):
-    pass
+    @unittest.skip("SPARK-46397: Different exception thrown")

Review Comment:
   cc @zhengruifeng 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46402][PYTHON] Add getMessageParameters and getQueryContext support [spark]

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on code in PR #44349:
URL: https://github.com/apache/spark/pull/44349#discussion_r1427526487


##########
python/pyspark/errors/exceptions/connect.py:
##########
@@ -374,3 +425,37 @@ class SparkNoSuchElementException(SparkConnectGrpcException, BaseNoSuchElementEx
     """
     No such element exception.
     """
+
+
+class QueryContext(BaseQueryContext):
+    def __init__(self, q: pb2.FetchErrorDetailsResponse.QueryContext):
+        self._q = q
+
+    def contextType(self) -> QueryContextType:
+        context_type = self._q.context_type
+
+        if int(context_type) == QueryContextType.DataFrame.value:
+            return QueryContextType.DataFrame
+        else:
+            return QueryContextType.SQL
+
+    def objectType(self) -> str:
+        return str(self._q.object_type)
+
+    def objectName(self) -> str:
+        return str(self._q.object_name)
+
+    def startIndex(self) -> int:
+        return int(self._q.start_index)
+
+    def stopIndex(self) -> int:
+        return int(self._q.stop_index)
+
+    def fragment(self) -> str:
+        return str(self._q.fragment)
+
+    def callSite(self) -> str:
+        return str(self._q.callSite)

Review Comment:
   probably should rename the protobuf field to `call_site`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46402][PYTHON] Add getMessageParameters and getQueryContext support [spark]

Posted by "garlandz-db (via GitHub)" <gi...@apache.org>.
garlandz-db commented on code in PR #44349:
URL: https://github.com/apache/spark/pull/44349#discussion_r1427535520


##########
python/pyspark/errors/exceptions/captured.py:
##########
@@ -332,3 +345,37 @@ class UnknownException(CapturedException, BaseUnknownException):
     """
     None of the other exceptions.
     """
+
+
+class QueryContext(BaseQueryContext):
+    def __init__(self, q: JavaObject):
+        self._q = q
+
+    def contextType(self) -> QueryContextType:
+        context_type = self._q.contextType().toString()
+        assert context_type in ("SQL", "DataFrame")
+        if context_type == "DataFrame":
+            return QueryContextType.DataFrame
+        else:
+            return QueryContextType.SQL
+
+    def objectType(self) -> str:
+        return str(self._q.objectType())
+
+    def objectName(self) -> str:
+        return str(self._q.objectName())
+
+    def startIndex(self) -> int:

Review Comment:
   will `self._q.startIndex()` throw an exception if its not defined? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46402][PYTHON] Add getMessageParameters and getQueryContext support [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon closed pull request #44349: [SPARK-46402][PYTHON] Add getMessageParameters and getQueryContext support
URL: https://github.com/apache/spark/pull/44349


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46402][PYTHON] Add getMessageParameters and getQueryContext support [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #44349:
URL: https://github.com/apache/spark/pull/44349#discussion_r1426363179


##########
python/pyspark/errors/exceptions/base.py:
##########
@@ -34,6 +34,7 @@ def __init__(
         message: Optional[str] = None,
         error_class: Optional[str] = None,
         message_parameters: Optional[Dict[str, str]] = None,
+        query_contexts: List["QueryContext"] = [],

Review Comment:
   So this is just a field but won't affect the final error message?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46402][PYTHON] Add getMessageParameters and getQueryContext support [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44349:
URL: https://github.com/apache/spark/pull/44349#discussion_r1427432703


##########
python/pyspark/errors/exceptions/captured.py:
##########
@@ -332,3 +345,37 @@ class UnknownException(CapturedException, BaseUnknownException):
     """
     None of the other exceptions.
     """
+
+
+class QueryContext(BaseQueryContext):
+    def __init__(self, q: JavaObject):
+        self._q = q
+
+    def contextType(self) -> QueryContextType:
+        context_type = self._q.contextType().toString()
+        assert context_type in ("SQL", "DataFrame")
+        if context_type == "DataFrame":
+            return QueryContextType.DataFrame
+        else:
+            return QueryContextType.SQL
+
+    def objectType(self) -> str:
+        return str(self._q.objectType())
+
+    def objectName(self) -> str:
+        return str(self._q.objectName())
+
+    def startIndex(self) -> int:

Review Comment:
   There are some bugs about this in Scala side (https://github.com/apache/spark/pull/43352/files#r1427431997). I will match the implementation to Scala side for now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46402][PYTHON] Add getMessageParameters and getQueryContext support [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44349:
URL: https://github.com/apache/spark/pull/44349#discussion_r1427536459


##########
python/pyspark/errors/exceptions/captured.py:
##########
@@ -332,3 +345,37 @@ class UnknownException(CapturedException, BaseUnknownException):
     """
     None of the other exceptions.
     """
+
+
+class QueryContext(BaseQueryContext):
+    def __init__(self, q: JavaObject):
+        self._q = q
+
+    def contextType(self) -> QueryContextType:
+        context_type = self._q.contextType().toString()
+        assert context_type in ("SQL", "DataFrame")
+        if context_type == "DataFrame":
+            return QueryContextType.DataFrame
+        else:
+            return QueryContextType.SQL
+
+    def objectType(self) -> str:
+        return str(self._q.objectType())
+
+    def objectName(self) -> str:
+        return str(self._q.objectName())
+
+    def startIndex(self) -> int:

Review Comment:
   It will (should) throw an exception from the server side. But we currently have some bugs in the Spark Connect Server side so it doesn't throw an exception but returns sth like -1.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46402][PYTHON] Add getMessageParameters and getQueryContext support [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #44349:
URL: https://github.com/apache/spark/pull/44349#discussion_r1426365997


##########
python/pyspark/errors/exceptions/base.py:
##########
@@ -34,6 +34,7 @@ def __init__(
         message: Optional[str] = None,
         error_class: Optional[str] = None,
         message_parameters: Optional[Dict[str, str]] = None,
+        query_contexts: List["QueryContext"] = [],

Review Comment:
   oh, the server side has probably embedded it in the error message.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46402][PYTHON] Add getMessageParameters and getQueryContext support [spark]

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on code in PR #44349:
URL: https://github.com/apache/spark/pull/44349#discussion_r1426501966


##########
python/pyspark/errors/exceptions/base.py:
##########
@@ -294,3 +315,70 @@ class PySparkImportError(PySparkException, ImportError):
     """
     Wrapper class for ImportError to support error classes.
     """
+
+
+class QueryContextType(Enum):
+    SQL = 0
+    DataFrame = 1
+
+
+class QueryContext:
+    def __init__(self, q: Any):
+        self._q = q
+
+    def contextType(self) -> QueryContextType:
+        if hasattr(self._q, "contextType"):
+            context_type = self._q.contextType().toString()

Review Comment:
   shall we add a `assert(context_type in ['SQL', 'DataFrame']`) here?
   



##########
python/pyspark/errors/exceptions/base.py:
##########
@@ -294,3 +315,70 @@ class PySparkImportError(PySparkException, ImportError):
     """
     Wrapper class for ImportError to support error classes.
     """
+
+
+class QueryContextType(Enum):
+    SQL = 0
+    DataFrame = 1
+
+
+class QueryContext:
+    def __init__(self, q: Any):
+        self._q = q
+
+    def contextType(self) -> QueryContextType:
+        if hasattr(self._q, "contextType"):
+            context_type = self._q.contextType().toString()
+            if context_type == "SQL":
+                context_type = 0
+            else:
+                context_type = 1
+        else:
+            context_type = self._q.context_type

Review Comment:
   `q` here can be either a py4j object or a protobuf? ok



##########
python/pyspark/errors/exceptions/base.py:
##########
@@ -294,3 +315,70 @@ class PySparkImportError(PySparkException, ImportError):
     """
     Wrapper class for ImportError to support error classes.
     """
+
+
+class QueryContextType(Enum):
+    SQL = 0
+    DataFrame = 1
+
+
+class QueryContext:
+    def __init__(self, q: Any):
+        self._q = q
+
+    def contextType(self) -> QueryContextType:
+        if hasattr(self._q, "contextType"):
+            context_type = self._q.contextType().toString()
+            if context_type == "SQL":
+                context_type = 0
+            else:
+                context_type = 1
+        else:
+            context_type = self._q.context_type

Review Comment:
   where is this `context_type` coming from?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46402][PYTHON] Add getMessageParameters and getQueryContext support [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #44349:
URL: https://github.com/apache/spark/pull/44349#discussion_r1426361362


##########
python/pyspark/errors/exceptions/base.py:
##########
@@ -106,10 +111,26 @@ def getMessage(self) -> str:
         --------
         :meth:`PySparkException.getErrorClass`
         :meth:`PySparkException.getMessageParameters`
+        :meth:`PySparkException.getQueryContext`
         :meth:`PySparkException.getSqlState`
         """
         return f"[{self.getErrorClass()}] {self._message}"
 
+    def getQueryContext(self) -> List["QueryContext"]:
+        """
+        Returns full error message.

Review Comment:
   it's not error message, it's error context.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46402][PYTHON] Add getMessageParameters and getQueryContext support [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #44349:
URL: https://github.com/apache/spark/pull/44349#issuecomment-1857310013

   Tests passed.
   
   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org