You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/02/24 23:14:59 UTC

[GitHub] [spark] itholic opened a new pull request #35656: [SPARK-38107][SQL] Use error classes in the compilation errors of python/pandas UDFs

itholic opened a new pull request #35656:
URL: https://github.com/apache/spark/pull/35656


   ### What changes were proposed in this pull request?
   
   This PR proposes to migrate the following errors to use error class.
   
   - pandasUDFAggregateNotSupportedInPivotError
   - groupAggPandasUDFUnsupportedByStreamingAggError
   - cannotUseMixtureOfAggFunctionAndGroupAggPandasUDFError
   - usePythonUDFInJoinConditionUnsupportedError
   
   
   ### Why are the changes needed?
   
   To throw a standardized user-facing error or exception for the python/pandas UDF related errors.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Added tests to `QueryCompilationErrorsSuite`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] bjornjorgensen commented on pull request #35656: [SPARK-38107][SQL] Use error classes in the compilation errors of python/pandas UDFs

Posted by GitBox <gi...@apache.org>.

bjornjorgensen commented on pull request #35656:
URL: https://github.com/apache/spark/pull/35656#issuecomment-1050676538


   Yes, just one thing.
   
   "Pivot does not support Pandas UDF aggregate expressions."
   
   Do you mean? 
   Pandas UDF aggregate expressions don't support pivot. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] itholic commented on a change in pull request #35656: [SPARK-38107][SQL] Use error classes in the compilation errors of python/pandas UDFs

Posted by GitBox <gi...@apache.org>.

itholic commented on a change in pull request #35656:
URL: https://github.com/apache/spark/pull/35656#discussion_r814437267



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
##########
@@ -198,7 +198,9 @@ object QueryCompilationErrors {
   }
 
   def pandasUDFAggregateNotSupportedInPivotError(): Throwable = {
-    new AnalysisException("Pandas UDF aggregate expressions are currently not supported in pivot.")

Review comment:
       This message is changed from `"Pandas UDF aggregate expressions are currently not supported in pivot."` to `"Pivot does not support Pandas UDF aggregate expressions."` as suggested in https://spark.apache.org/error-message-guidelines.html.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] itholic commented on a change in pull request #35656: [SPARK-38107][SQL] Use error classes in the compilation errors of python/pandas UDFs

Posted by GitBox <gi...@apache.org>.

itholic commented on a change in pull request #35656:
URL: https://github.com/apache/spark/pull/35656#discussion_r817170583



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
##########
@@ -373,7 +373,8 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
         namedGroupingExpressions, aggregateExpressions, rewrittenResultExpressions, child) =>
 
         if (aggregateExpressions.exists(PythonUDF.isGroupedAggPandasUDF)) {
-          throw QueryCompilationErrors.groupAggPandasUDFUnsupportedByStreamingAggError()
+          throw new AnalysisException(
+            "Streaming aggregation doesn't support group aggregate pandas UDF")

Review comment:
       Because the existing `groupAggPandasUDFUnsupportedByStreamingAggError` is not user-facing error, so I think we might not want to reveal this to the error class.
   
   Also refer to https://github.com/apache/spark/pull/35656/files#r814517495.
   
   Maybe do we also want to leave some comment to the code base?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] itholic commented on pull request #35656: [SPARK-38107][SQL] Use error classes in the compilation errors of python/pandas UDFs

Posted by GitBox <gi...@apache.org>.

itholic commented on pull request #35656:
URL: https://github.com/apache/spark/pull/35656#issuecomment-1054272680


   Thanks, @MaxGekk ! Let me address the comments


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] itholic commented on pull request #35656: [SPARK-38107][SQL] Use error classes in the compilation errors of python/pandas UDFs

Posted by GitBox <gi...@apache.org>.

itholic commented on pull request #35656:
URL: https://github.com/apache/spark/pull/35656#issuecomment-1064564596


   Thanks, @MaxGekk ! Just addressed the comments


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a change in pull request #35656: [SPARK-38107][SQL] Use error classes in the compilation errors of python/pandas UDFs

Posted by GitBox <gi...@apache.org>.

HyukjinKwon commented on a change in pull request #35656:
URL: https://github.com/apache/spark/pull/35656#discussion_r814497993



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
##########
@@ -1333,9 +1333,7 @@ object QueryCompilationErrors {
   }
 
   def groupAggPandasUDFUnsupportedByStreamingAggError(): Throwable = {

Review comment:
       Can we remove this too?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] itholic commented on a change in pull request #35656: [SPARK-38107][SQL] Use error classes in the compilation errors of python/pandas UDFs

Posted by GitBox <gi...@apache.org>.

itholic commented on a change in pull request #35656:
URL: https://github.com/apache/spark/pull/35656#discussion_r817170583



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
##########
@@ -373,7 +373,8 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
         namedGroupingExpressions, aggregateExpressions, rewrittenResultExpressions, child) =>
 
         if (aggregateExpressions.exists(PythonUDF.isGroupedAggPandasUDF)) {
-          throw QueryCompilationErrors.groupAggPandasUDFUnsupportedByStreamingAggError()
+          throw new AnalysisException(
+            "Streaming aggregation doesn't support group aggregate pandas UDF")

Review comment:
       Because the existing `groupAggPandasUDFUnsupportedByStreamingAggError` is not user-facing error, so I remove the `groupAggPandasUDFUnsupportedByStreamingAggError ` since I think we might not want to reveal the non-user-facing error to the error class.
   
   And, also refer to https://github.com/apache/spark/pull/35656/files#r814517495.
   
   Maybe do we also want to leave some comment to the code base?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] itholic commented on pull request #35656: [SPARK-38107][SQL] Use error classes in the compilation errors of python/pandas UDFs

Posted by GitBox <gi...@apache.org>.

itholic commented on pull request #35656:
URL: https://github.com/apache/spark/pull/35656#issuecomment-1053738710


   @bjornjorgensen Yeah, let me refine the message. Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] MaxGekk commented on a change in pull request #35656: [SPARK-38107][SQL] Use error classes in the compilation errors of python/pandas UDFs

Posted by GitBox <gi...@apache.org>.

MaxGekk commented on a change in pull request #35656:
URL: https://github.com/apache/spark/pull/35656#discussion_r823529710



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala
##########
@@ -101,4 +101,69 @@ class QueryCompilationErrorsSuite extends QueryTest with SharedSparkSession {
     assert(e.message ===
       "The argument_index of string format cannot contain position 0$.")
   }
+
+  test("CANNOT_USE_MIXTURE: Using aggregate function with grouped aggregate pandas UDF") {
+    import IntegratedUDFTestUtils._
+
+    val df = Seq(
+      (536361, "85123A", 2, 17850),
+      (536362, "85123B", 4, 17850),
+      (536363, "86123A", 6, 17851)
+    ).toDF("InvoiceNo", "StockCode", "Quantity", "CustomerID")
+    val e = intercept[AnalysisException] {
+        val pandasTestUDF = TestGroupedAggPandasUDF(name = "pandas_udf")
+        df.groupBy("CustomerId")
+          .agg(pandasTestUDF(df("Quantity")), sum(df("Quantity"))).collect()

Review comment:
       Please, fix indentation here. Should be 2 spaces.

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala
##########
@@ -101,4 +101,69 @@ class QueryCompilationErrorsSuite extends QueryTest with SharedSparkSession {
     assert(e.message ===
       "The argument_index of string format cannot contain position 0$.")
   }
+
+  test("CANNOT_USE_MIXTURE: Using aggregate function with grouped aggregate pandas UDF") {
+    import IntegratedUDFTestUtils._
+
+    val df = Seq(
+      (536361, "85123A", 2, 17850),
+      (536362, "85123B", 4, 17850),
+      (536363, "86123A", 6, 17851)
+    ).toDF("InvoiceNo", "StockCode", "Quantity", "CustomerID")
+    val e = intercept[AnalysisException] {
+        val pandasTestUDF = TestGroupedAggPandasUDF(name = "pandas_udf")
+        df.groupBy("CustomerId")
+          .agg(pandasTestUDF(df("Quantity")), sum(df("Quantity"))).collect()
+    }
+
+    assert(e.errorClass === Some("CANNOT_USE_MIXTURE"))
+    assert(e.message ===
+      "Cannot use a mixture of aggregate function and group aggregate pandas UDF")
+  }
+
+  test("UNSUPPORTED_FEATURE: Using Python UDF with unsupported join condition") {
+    import IntegratedUDFTestUtils._
+
+    val df1 = Seq(
+      (536361, "85123A", 2, 17850),
+      (536362, "85123B", 4, 17850),
+      (536363, "86123A", 6, 17851)
+    ).toDF("InvoiceNo", "StockCode", "Quantity", "CustomerID")
+    val df2 = Seq(
+      ("Bob", 17850),
+      ("Alice", 17850),
+      ("Tom", 17851)
+    ).toDF("CustomerName", "CustomerID")
+
+    val e = intercept[AnalysisException] {
+        val pythonTestUDF = TestPythonUDF(name = "python_udf")
+        df1.join(
+          df2, pythonTestUDF(df1("CustomerID") === df2("CustomerID")), "leftouter").collect()

Review comment:
       Please, fix indentation.

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala
##########
@@ -101,4 +101,69 @@ class QueryCompilationErrorsSuite extends QueryTest with SharedSparkSession {
     assert(e.message ===
       "The argument_index of string format cannot contain position 0$.")
   }
+
+  test("CANNOT_USE_MIXTURE: Using aggregate function with grouped aggregate pandas UDF") {
+    import IntegratedUDFTestUtils._
+
+    val df = Seq(
+      (536361, "85123A", 2, 17850),
+      (536362, "85123B", 4, 17850),
+      (536363, "86123A", 6, 17851)
+    ).toDF("InvoiceNo", "StockCode", "Quantity", "CustomerID")
+    val e = intercept[AnalysisException] {
+        val pandasTestUDF = TestGroupedAggPandasUDF(name = "pandas_udf")
+        df.groupBy("CustomerId")
+          .agg(pandasTestUDF(df("Quantity")), sum(df("Quantity"))).collect()
+    }
+
+    assert(e.errorClass === Some("CANNOT_USE_MIXTURE"))
+    assert(e.message ===
+      "Cannot use a mixture of aggregate function and group aggregate pandas UDF")
+  }
+
+  test("UNSUPPORTED_FEATURE: Using Python UDF with unsupported join condition") {
+    import IntegratedUDFTestUtils._
+
+    val df1 = Seq(
+      (536361, "85123A", 2, 17850),
+      (536362, "85123B", 4, 17850),
+      (536363, "86123A", 6, 17851)
+    ).toDF("InvoiceNo", "StockCode", "Quantity", "CustomerID")
+    val df2 = Seq(
+      ("Bob", 17850),
+      ("Alice", 17850),
+      ("Tom", 17851)
+    ).toDF("CustomerName", "CustomerID")
+
+    val e = intercept[AnalysisException] {
+        val pythonTestUDF = TestPythonUDF(name = "python_udf")
+        df1.join(
+          df2, pythonTestUDF(df1("CustomerID") === df2("CustomerID")), "leftouter").collect()
+    }
+
+    assert(e.errorClass === Some("UNSUPPORTED_FEATURE"))
+    assert(e.message ===
+      "The feature is not supported: " +
+      "Using PythonUDF in join condition of join type LeftOuter is not supported")
+  }
+
+  test("UNSUPPORTED_FEATURE: Using pandas UDF aggregate expression with pivot") {
+    import IntegratedUDFTestUtils._
+
+    val df = Seq(
+      (536361, "85123A", 2, 17850),
+      (536362, "85123B", 4, 17850),
+      (536363, "86123A", 6, 17851)
+    ).toDF("InvoiceNo", "StockCode", "Quantity", "CustomerID")
+
+    val e = intercept[AnalysisException] {
+        val pandasTestUDF = TestGroupedAggPandasUDF(name = "pandas_udf")
+        df.groupBy(df("CustomerID")).pivot(df("CustomerID")).agg(pandasTestUDF(df("Quantity")))

Review comment:
       Change indentation

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala
##########
@@ -101,4 +101,69 @@ class QueryCompilationErrorsSuite extends QueryTest with SharedSparkSession {
     assert(e.message ===
       "The argument_index of string format cannot contain position 0$.")
   }
+
+  test("CANNOT_USE_MIXTURE: Using aggregate function with grouped aggregate pandas UDF") {
+    import IntegratedUDFTestUtils._
+
+    val df = Seq(
+      (536361, "85123A", 2, 17850),
+      (536362, "85123B", 4, 17850),
+      (536363, "86123A", 6, 17851)
+    ).toDF("InvoiceNo", "StockCode", "Quantity", "CustomerID")
+    val e = intercept[AnalysisException] {
+        val pandasTestUDF = TestGroupedAggPandasUDF(name = "pandas_udf")
+        df.groupBy("CustomerId")
+          .agg(pandasTestUDF(df("Quantity")), sum(df("Quantity"))).collect()
+    }
+
+    assert(e.errorClass === Some("CANNOT_USE_MIXTURE"))
+    assert(e.message ===
+      "Cannot use a mixture of aggregate function and group aggregate pandas UDF")
+  }
+
+  test("UNSUPPORTED_FEATURE: Using Python UDF with unsupported join condition") {
+    import IntegratedUDFTestUtils._
+
+    val df1 = Seq(
+      (536361, "85123A", 2, 17850),
+      (536362, "85123B", 4, 17850),
+      (536363, "86123A", 6, 17851)
+    ).toDF("InvoiceNo", "StockCode", "Quantity", "CustomerID")
+    val df2 = Seq(
+      ("Bob", 17850),
+      ("Alice", 17850),
+      ("Tom", 17851)
+    ).toDF("CustomerName", "CustomerID")
+
+    val e = intercept[AnalysisException] {
+        val pythonTestUDF = TestPythonUDF(name = "python_udf")
+        df1.join(
+          df2, pythonTestUDF(df1("CustomerID") === df2("CustomerID")), "leftouter").collect()
+    }
+
+    assert(e.errorClass === Some("UNSUPPORTED_FEATURE"))

Review comment:
       Check sqlState too

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala
##########
@@ -101,4 +101,69 @@ class QueryCompilationErrorsSuite extends QueryTest with SharedSparkSession {
     assert(e.message ===
       "The argument_index of string format cannot contain position 0$.")
   }
+
+  test("CANNOT_USE_MIXTURE: Using aggregate function with grouped aggregate pandas UDF") {
+    import IntegratedUDFTestUtils._
+
+    val df = Seq(
+      (536361, "85123A", 2, 17850),
+      (536362, "85123B", 4, 17850),
+      (536363, "86123A", 6, 17851)
+    ).toDF("InvoiceNo", "StockCode", "Quantity", "CustomerID")
+    val e = intercept[AnalysisException] {
+        val pandasTestUDF = TestGroupedAggPandasUDF(name = "pandas_udf")
+        df.groupBy("CustomerId")
+          .agg(pandasTestUDF(df("Quantity")), sum(df("Quantity"))).collect()
+    }
+
+    assert(e.errorClass === Some("CANNOT_USE_MIXTURE"))
+    assert(e.message ===
+      "Cannot use a mixture of aggregate function and group aggregate pandas UDF")
+  }
+
+  test("UNSUPPORTED_FEATURE: Using Python UDF with unsupported join condition") {
+    import IntegratedUDFTestUtils._
+
+    val df1 = Seq(
+      (536361, "85123A", 2, 17850),
+      (536362, "85123B", 4, 17850),
+      (536363, "86123A", 6, 17851)
+    ).toDF("InvoiceNo", "StockCode", "Quantity", "CustomerID")
+    val df2 = Seq(
+      ("Bob", 17850),
+      ("Alice", 17850),
+      ("Tom", 17851)
+    ).toDF("CustomerName", "CustomerID")
+
+    val e = intercept[AnalysisException] {
+        val pythonTestUDF = TestPythonUDF(name = "python_udf")
+        df1.join(
+          df2, pythonTestUDF(df1("CustomerID") === df2("CustomerID")), "leftouter").collect()
+    }
+
+    assert(e.errorClass === Some("UNSUPPORTED_FEATURE"))
+    assert(e.message ===
+      "The feature is not supported: " +
+      "Using PythonUDF in join condition of join type LeftOuter is not supported")
+  }
+
+  test("UNSUPPORTED_FEATURE: Using pandas UDF aggregate expression with pivot") {
+    import IntegratedUDFTestUtils._
+
+    val df = Seq(
+      (536361, "85123A", 2, 17850),
+      (536362, "85123B", 4, 17850),
+      (536363, "86123A", 6, 17851)
+    ).toDF("InvoiceNo", "StockCode", "Quantity", "CustomerID")
+
+    val e = intercept[AnalysisException] {
+        val pandasTestUDF = TestGroupedAggPandasUDF(name = "pandas_udf")
+        df.groupBy(df("CustomerID")).pivot(df("CustomerID")).agg(pandasTestUDF(df("Quantity")))
+    }
+
+    assert(e.errorClass === Some("UNSUPPORTED_FEATURE"))

Review comment:
       check sqlState




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a change in pull request #35656: [SPARK-38107][SQL] Use error classes in the compilation errors of python/pandas UDFs

Posted by GitBox <gi...@apache.org>.

HyukjinKwon commented on a change in pull request #35656:
URL: https://github.com/apache/spark/pull/35656#discussion_r814452730



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala
##########
@@ -101,4 +102,91 @@ class QueryCompilationErrorsSuite extends QueryTest with SharedSparkSession {
     assert(e.message ===
       "The argument_index of string format cannot contain position 0$.")
   }
+
+  test("CANNOT_USE_MIXTURE") {
+    import IntegratedUDFTestUtils._
+
+    val df = Seq(
+      (536361, "85123A", 2, 17850),
+      (536362, "85123B", 4, 17850),
+      (536363, "86123A", 6, 17851)
+    ).toDF("InvoiceNo", "StockCode", "Quantity", "CustomerID")
+    val e = intercept[AnalysisException] {
+        val pandasTestUDF = TestGroupedAggPandasUDF(name = "pandas_udf")
+        df.groupBy("CustomerId")
+          .agg(pandasTestUDF(df("Quantity")), sum(df("Quantity"))).collect()
+    }
+
+    assert(e.errorClass === Some("CANNOT_USE_MIXTURE"))
+    assert(e.message ===
+      "Cannot use a mixture of aggregate function and group aggregate pandas UDF")
+  }
+
+  test("UNSUPPORTED_JOIN_CONDITION") {
+    import IntegratedUDFTestUtils._
+
+    val df1 = Seq(
+      (536361, "85123A", 2, 17850),
+      (536362, "85123B", 4, 17850),
+      (536363, "86123A", 6, 17851)
+    ).toDF("InvoiceNo", "StockCode", "Quantity", "CustomerID")
+    val df2 = Seq(
+      ("Bob", 17850),
+      ("Alice", 17850),
+      ("Tom", 17851)
+    ).toDF("CustomerName", "CustomerID")
+
+    val e = intercept[AnalysisException] {
+        val pythonTestUDF = TestPythonUDF(name = "python_udf")
+        df1.join(
+          df2, pythonTestUDF(df1("CustomerID") === df2("CustomerID")), "leftouter").collect()
+    }
+
+    assert(e.errorClass === Some("UNSUPPORTED_JOIN_CONDITION"))
+    assert(e.message ===
+      "Using PythonUDF in join condition of join type LeftOuter is not supported")
+  }
+
+  test("UNSUPPORTED_PANDAS_UDF_AGGREGATE_EXPRESSION") {
+    import IntegratedUDFTestUtils._
+
+    val df = Seq(
+      (536361, "85123A", 2, 17850),
+      (536362, "85123B", 4, 17850),
+      (536363, "86123A", 6, 17851)
+    ).toDF("InvoiceNo", "StockCode", "Quantity", "CustomerID")
+
+    val e = intercept[AnalysisException] {
+        val pandasTestUDF = TestGroupedAggPandasUDF(name = "pandas_udf")
+        df.groupBy(df("CustomerID")).pivot(df("CustomerID")).agg(pandasTestUDF(df("Quantity")))
+    }
+
+    assert(e.errorClass === Some("UNSUPPORTED_PANDAS_UDF_AGGREGATE_EXPRESSION"))
+    assert(e.message ===
+      "Pivot does not support Pandas UDF aggregate expressions.")
+  }
+
+   test("UNSUPPORTED_STREAMING_AGGREGATION") {

Review comment:
       Let's remove this and `groupAggPandasUDFUnsupportedByStreamingAggError` for now because the `AnalysisException` class is not user-facing. cc @MaxGekk @HeartSaVioR  to confirm




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on pull request #35656: [SPARK-38107][SQL] Use error classes in the compilation errors of python/pandas UDFs

Posted by GitBox <gi...@apache.org>.

HyukjinKwon commented on pull request #35656:
URL: https://github.com/apache/spark/pull/35656#issuecomment-1063566286


   Looks good but I think it needs @MaxGekk 's signoff.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on pull request #35656: [SPARK-38107][SQL] Use error classes in the compilation errors of python/pandas UDFs

Posted by GitBox <gi...@apache.org>.

HyukjinKwon commented on pull request #35656:
URL: https://github.com/apache/spark/pull/35656#issuecomment-1064968150


   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] itholic commented on a change in pull request #35656: [SPARK-38107][SQL] Use error classes in the compilation errors of python/pandas UDFs

Posted by GitBox <gi...@apache.org>.

itholic commented on a change in pull request #35656:
URL: https://github.com/apache/spark/pull/35656#discussion_r817170583



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
##########
@@ -373,7 +373,8 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
         namedGroupingExpressions, aggregateExpressions, rewrittenResultExpressions, child) =>
 
         if (aggregateExpressions.exists(PythonUDF.isGroupedAggPandasUDF)) {
-          throw QueryCompilationErrors.groupAggPandasUDFUnsupportedByStreamingAggError()
+          throw new AnalysisException(
+            "Streaming aggregation doesn't support group aggregate pandas UDF")

Review comment:
       Because the existing `groupAggPandasUDFUnsupportedByStreamingAggError` is not user-facing error, so I firstly remove it from `QueryCompilationErrors` since I think maybe we don't want to reveal the non-user-facing error to the error class.
   
   And, also refer to https://github.com/apache/spark/pull/35656/files#r814517495.
   
   Btw, maybe do we also want to leave some comment to the code base?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon closed pull request #35656: [SPARK-38107][SQL] Use error classes in the compilation errors of python/pandas UDFs

Posted by GitBox <gi...@apache.org>.

HyukjinKwon closed pull request #35656:
URL: https://github.com/apache/spark/pull/35656


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on pull request #35656: [SPARK-38107][SQL] Use error classes in the compilation errors of python/pandas UDFs

Posted by GitBox <gi...@apache.org>.

HyukjinKwon commented on pull request #35656:
URL: https://github.com/apache/spark/pull/35656#issuecomment-1064650036


   @itholic mind retriggering https://github.com/itholic/spark/runs/5503408210?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] itholic commented on pull request #35656: [SPARK-38107][SQL] Use error classes in the compilation errors of python/pandas UDFs

Posted by GitBox <gi...@apache.org>.

itholic commented on pull request #35656:
URL: https://github.com/apache/spark/pull/35656#issuecomment-1063563252


   @HyukjinKwon @MaxGekk Would you mind confirming the applied comments when you find some time ? 🙏 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a change in pull request #35656: [SPARK-38107][SQL] Use error classes in the compilation errors of python/pandas UDFs

Posted by GitBox <gi...@apache.org>.

HyukjinKwon commented on a change in pull request #35656:
URL: https://github.com/apache/spark/pull/35656#discussion_r814517495



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
##########
@@ -1330,18 +1332,15 @@ object QueryCompilationErrors {
       s"Expected: ${dataType.typeName}; Found: ${expression.dataType.typeName}")
   }
 
-  def groupAggPandasUDFUnsupportedByStreamingAggError(): Throwable = {

Review comment:
       We will have to manually throw an exception of `AnalysisException` where `groupAggPandasUDFUnsupportedByStreamingAggError` was referred.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] itholic commented on a change in pull request #35656: [SPARK-38107][SQL] Use error classes in the compilation errors of python/pandas UDFs

Posted by GitBox <gi...@apache.org>.

itholic commented on a change in pull request #35656:
URL: https://github.com/apache/spark/pull/35656#discussion_r814519396



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
##########
@@ -1330,18 +1332,15 @@ object QueryCompilationErrors {
       s"Expected: ${dataType.typeName}; Found: ${expression.dataType.typeName}")
   }
 
-  def groupAggPandasUDFUnsupportedByStreamingAggError(): Throwable = {

Review comment:
       Yeah, let me address




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] itholic commented on a change in pull request #35656: [SPARK-38107][SQL] Use error classes in the compilation errors of python/pandas UDFs

Posted by GitBox <gi...@apache.org>.

itholic commented on a change in pull request #35656:
URL: https://github.com/apache/spark/pull/35656#discussion_r814437267



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
##########
@@ -198,7 +198,9 @@ object QueryCompilationErrors {
   }
 
   def pandasUDFAggregateNotSupportedInPivotError(): Throwable = {
-    new AnalysisException("Pandas UDF aggregate expressions are currently not supported in pivot.")

Review comment:
       This message is changed from `"Pandas UDF aggregate expressions are currently not supported in pivot."` to `"Pivot does not support Pandas UDF aggregate expressions."` as suggested in [Error Message Guidelines](https://spark.apache.org/error-message-guidelines.html).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] MaxGekk commented on a change in pull request #35656: [SPARK-38107][SQL] Use error classes in the compilation errors of python/pandas UDFs

Posted by GitBox <gi...@apache.org>.

MaxGekk commented on a change in pull request #35656:
URL: https://github.com/apache/spark/pull/35656#discussion_r815633479



##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -152,6 +155,12 @@
   "UNSUPPORTED_GROUPING_EXPRESSION" : {
     "message" : [ "grouping()/grouping_id() can only be used with GroupingSets/Cube/Rollup" ]
   },
+  "UNSUPPORTED_JOIN_CONDITION" : {
+    "message" : [ "Using PythonUDF in join condition of join type %s is not supported" ]
+  },
+  "UNSUPPORTED_PANDAS_UDF_AGGREGATE_EXPRESSION" : {
+    "message" : [ "Pandas UDF aggregate expressions don't support pivot." ]
+  },

Review comment:
       Can't you re-use the existing error class `UNSUPPORTED_FEATURE`?

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala
##########
@@ -101,4 +101,67 @@ class QueryCompilationErrorsSuite extends QueryTest with SharedSparkSession {
     assert(e.message ===
       "The argument_index of string format cannot contain position 0$.")
   }
+
+  test("CANNOT_USE_MIXTURE") {

Review comment:
       Please, look at other test titles. They contain an error class and some message which explain what the particular test checks. Could you add some message here (and below tests) too, please.

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
##########
@@ -373,7 +373,8 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
         namedGroupingExpressions, aggregateExpressions, rewrittenResultExpressions, child) =>
 
         if (aggregateExpressions.exists(PythonUDF.isGroupedAggPandasUDF)) {
-          throw QueryCompilationErrors.groupAggPandasUDFUnsupportedByStreamingAggError()
+          throw new AnalysisException(
+            "Streaming aggregation doesn't support group aggregate pandas UDF")

Review comment:
       Could you clarify why did you make this replacement?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org