You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "davidm-db (via GitHub)" <gi...@apache.org> on 2024/03/20 22:09:02 UTC

[PR] [SPARK-47256][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_102[4-7] [spark]

davidm-db opened a new pull request, #45622:
URL: https://github.com/apache/spark/pull/45622

   ### What changes were proposed in this pull request?
   This PR renames a few error classes related to usage of FILTER keyword with aggregation funcations and has minor error message improvements:
   - _LEGACY_ERROR_TEMP_1024 => AGGREGATE_FILTER_EXPRESSION_NON_DETERMINISTIC
   - _LEGACY_ERROR_TEMP_1025 => AGGREGATE_FILTER_EXPRESSION_NOT_BOOLEAN
   - _LEGACY_ERROR_TEMP_1026 => AGGREGATE_FILTER_EXPRESSION_CONTAINS_AGGREGATE
   - _LEGACY_ERROR_TEMP_1027 => AGGREGATE_FILTER_EXPRESSION_CONTAINS_WINDOW_FUNCTION
   
   Also, this PR changes tests in corresponding test suite to use `checkError()` method which checks the error class name, context, etc.
   
   ### Why are the changes needed?
   Proper error names and messages improve user experience with Spark SQL.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, this PR changes user-facing error class and message.
   
   ### How was this patch tested?
   By running tests from `AnalysisErrorSuite`, where tests were improved and adapted to the new changes.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47256][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_102[4-7] [spark]

Posted by "davidm-db (via GitHub)" <gi...@apache.org>.
davidm-db commented on PR #45622:
URL: https://github.com/apache/spark/pull/45622#issuecomment-2014071433

   @MaxGekk I have addressed your latest comments by adding <filterExpr>, <aggExpr> and <windowExpr> parameters to the error messages. This significantly complicates test methods though, but I have constructed Expression strings using classes derived from Expression without any hardcoding, so I think it should be fine.
   
   The only way I have found to make everything work nicely (having nice output for the user, as well as be able to compare outputs properly in tests) is to use `toPrettySql()` method to convert expression to SQL-like string.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47256][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_102[4-7] [spark]

Posted by "davidm-db (via GitHub)" <gi...@apache.org>.
davidm-db commented on code in PR #45622:
URL: https://github.com/apache/spark/pull/45622#discussion_r1534891599


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -1746,6 +1746,34 @@
     ],
     "sqlState" : "22012"
   },
+  "INVALID_AGGREGATE_FILTER" : {
+    "message" : [
+      "The FILTER expression <filterExpr> in an aggregate function is invalid."

Review Comment:
   To-do: for clarity, surround all <*Expr> with quotes?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47256][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_102[4-7] [spark]

Posted by "davidm-db (via GitHub)" <gi...@apache.org>.
davidm-db commented on PR #45622:
URL: https://github.com/apache/spark/pull/45622#issuecomment-2012191053

   @MaxGekk one more question, should these error classes be added to some `.md` file that lists all errors returned by Spark SQL? For example, error `WRITE_STREAM_NOT_ALLOWED` from the same `error-classes.json` file is added to `docs/sql-error-conditions.md` with error class name and error description.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47256][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_102[4-7] [spark]

Posted by "davidm-db (via GitHub)" <gi...@apache.org>.
davidm-db commented on code in PR #45622:
URL: https://github.com/apache/spark/pull/45622#discussion_r1535541672


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -1746,6 +1746,34 @@
     ],
     "sqlState" : "22012"
   },
+  "INVALID_AGGREGATE_FILTER" : {
+    "message" : [
+      "The FILTER expression <filterExpr> in an aggregate function is invalid."

Review Comment:
   Not needed now since I changed from `toPrettySQL()` to `toSQLExpr()` which basically adds quotes around the SQL string.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47256][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_102[4-7] [spark]

Posted by "davidm-db (via GitHub)" <gi...@apache.org>.
davidm-db commented on code in PR #45622:
URL: https://github.com/apache/spark/pull/45622#discussion_r1534890289


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -1746,6 +1746,34 @@
     ],
     "sqlState" : "22012"
   },
+  "INVALID_AGGREGATE_FILTER" : {
+    "message" : [
+      "The FILTER expression <filterExpr> in an aggregate function is invalid."
+    ],
+    "subClass" : {
+      "CONTAINS_AGGREGATE" : {
+        "message" : [
+          "Expected a FILTER expression without aggregation, but found <aggExpr>."
+        ]
+      },
+      "CONTAINS_WINDOW_FUNCTION" : {
+        "message" : [
+          "Expected a FILTER expression without window function, but found <windowExpr>."
+        ]
+      },
+      "NON_DETERMINISTIC" : {
+        "message" : [
+          "Expected non-deterministic FILTER expression."

Review Comment:
   To-do: "deterministic" instead of "non-deterministic"



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47256][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_102[4-7] [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #45622:
URL: https://github.com/apache/spark/pull/45622#discussion_r1536792184


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala:
##########
@@ -295,10 +295,88 @@ class AnalysisErrorSuite extends AnalysisTest with DataTypeErrorsBase {
         "aggregate(array(1, 2, 3), 0, (acc, x) -> acc + x) FILTER (WHERE c > 1)", 7, 76)))
   }
 
-  errorTest(
-    "non-deterministic filter predicate in aggregate functions",
-    CatalystSqlParser.parsePlan("SELECT count(a) FILTER (WHERE rand(int(c)) > 1) FROM TaBlE2"),
-    "FILTER expression is non-deterministic, it cannot be used in aggregate functions" :: Nil)
+  test("SPARK-47256: non deterministic FILTER expression in an aggregate function") {

Review Comment:
   Remove this and other added tests.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47256][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_102[4-7] [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #45622:
URL: https://github.com/apache/spark/pull/45622#discussion_r1534454045


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -1,4 +1,32 @@
 {
+  "AGGREGATE_FILTER_EXPRESSION_ERROR" : {
+    "message" : [
+      "FILTER expression in aggregate function has errors:"

Review Comment:
   When an user query is big enough, it is hard to identify which filter the error belongs to. Let's help to user, and point out the filter expression:
   ```suggestion
         "The FILTER expression <filterExpr> in an aggregate function is invalid."
   ```



##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -1,4 +1,32 @@
 {
+  "AGGREGATE_FILTER_EXPRESSION_ERROR" : {
+    "message" : [
+      "FILTER expression in aggregate function has errors:"
+    ],
+    "subClass" : {
+      "CONTAINS_AGGREGATE" : {
+        "message" : [
+          "FILTER expression contains an aggregation."

Review Comment:
   And if you can, point out the aggregate expression. Just take `AggregateExpression` and pass it to `aggregateInAggregateFilterError()` in:
   ```scala
               case Some(filter) if filter.exists(_.isInstanceOf[AggregateExpression]) =>
                 throw QueryCompilationErrors.aggregateInAggregateFilterError()
   ```



##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -1,4 +1,32 @@
 {
+  "AGGREGATE_FILTER_EXPRESSION_ERROR" : {
+    "message" : [
+      "FILTER expression in aggregate function has errors:"
+    ],
+    "subClass" : {
+      "CONTAINS_AGGREGATE" : {
+        "message" : [
+          "FILTER expression contains an aggregation."

Review Comment:
   Let's tell to users what we expected and what we found:
   ```suggestion
             "Expected an expression without any aggregation but found <aggExpr>."
   ```



##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -1,4 +1,32 @@
 {
+  "AGGREGATE_FILTER_EXPRESSION_ERROR" : {

Review Comment:
   The `_ERROR` word is useless because all error classes in the files are related to errors. Let's remove the word, and assign shorted and "conditional" name like `INVALID_AGGREGATE_FILTER`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47256][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_102[4-7] [spark]

Posted by "davidm-db (via GitHub)" <gi...@apache.org>.
davidm-db commented on code in PR #45622:
URL: https://github.com/apache/spark/pull/45622#discussion_r1535547077


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -1,4 +1,32 @@
 {
+  "AGGREGATE_FILTER_EXPRESSION_ERROR" : {

Review Comment:
   Done as suggested.



##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -1,4 +1,32 @@
 {
+  "AGGREGATE_FILTER_EXPRESSION_ERROR" : {
+    "message" : [
+      "FILTER expression in aggregate function has errors:"

Review Comment:
   Done as suggested.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47256][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_102[4-7] [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #45622:
URL: https://github.com/apache/spark/pull/45622#discussion_r1534454045


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -1,4 +1,32 @@
 {
+  "AGGREGATE_FILTER_EXPRESSION_ERROR" : {
+    "message" : [
+      "FILTER expression in aggregate function has errors:"

Review Comment:
   When user's query is big enough, it is hard to identify which filter the error belongs to. Let's help to user, and point out the filter expression:
   ```suggestion
         "The FILTER expression <filterExpr> in an aggregate function is invalid."
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47256][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_102[4-7] [spark]

Posted by "davidm-db (via GitHub)" <gi...@apache.org>.
davidm-db commented on code in PR #45622:
URL: https://github.com/apache/spark/pull/45622#discussion_r1533667336


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala:
##########
@@ -295,10 +295,50 @@ class AnalysisErrorSuite extends AnalysisTest with DataTypeErrorsBase {
         "aggregate(array(1, 2, 3), 0, (acc, x) -> acc + x) FILTER (WHERE c > 1)", 7, 76)))
   }
 
-  errorTest(
-    "non-deterministic filter predicate in aggregate functions",
-    CatalystSqlParser.parsePlan("SELECT count(a) FILTER (WHERE rand(int(c)) > 1) FROM TaBlE2"),
-    "FILTER expression is non-deterministic, it cannot be used in aggregate functions" :: Nil)
+  test("SPARK-47256: non deterministic FILTER expression in an aggregate function") {
+    val plan =
+      CatalystSqlParser.parsePlan("SELECT count(a) FILTER (WHERE rand(int(c)) > 1) FROM TaBlE2")

Review Comment:
   All errors are available/visible through public API. So, if I understood your comment properly, I don't think this should be converted to internal errors. Example of one of the errors:
   <img width="1011" alt="image" src="https://github.com/apache/spark/assets/163021185/0c117a62-98c1-42b7-80fc-09e8309d918f">
   
   However, triggering public API in this test suite seems like too much setup work, since this suite doesn't extend `SQLTestUtilsBase` trait, that provides the support for `sql()` method that triggers public API.
   
   What I'm thinking about is additional tests in some other test suite that would reproduce errors through public API, though I'm not yet sure where exactly to add such tests.
   
   Am I missing something or my thoughts sound good?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47256][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_102[4-7] [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #45622:
URL: https://github.com/apache/spark/pull/45622#discussion_r1535193927


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala:
##########
@@ -295,10 +295,50 @@ class AnalysisErrorSuite extends AnalysisTest with DataTypeErrorsBase {
         "aggregate(array(1, 2, 3), 0, (acc, x) -> acc + x) FILTER (WHERE c > 1)", 7, 76)))
   }
 
-  errorTest(
-    "non-deterministic filter predicate in aggregate functions",
-    CatalystSqlParser.parsePlan("SELECT count(a) FILTER (WHERE rand(int(c)) > 1) FROM TaBlE2"),
-    "FILTER expression is non-deterministic, it cannot be used in aggregate functions" :: Nil)
+  test("SPARK-47256: non deterministic FILTER expression in an aggregate function") {
+    val plan =
+      CatalystSqlParser.parsePlan("SELECT count(a) FILTER (WHERE rand(int(c)) > 1) FROM TaBlE2")

Review Comment:
   > However, triggering public API in this test suite seems like too much setup work ...
   
   Could you move the test queries to `group-by-filter.sql`, and regenerate the golden file.



##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -1746,6 +1746,34 @@
     ],
     "sqlState" : "22012"
   },
+  "INVALID_AGGREGATE_FILTER" : {
+    "message" : [
+      "The FILTER expression <filterExpr> in an aggregate function is invalid."
+    ],
+    "subClass" : {
+      "CONTAINS_AGGREGATE" : {
+        "message" : [
+          "Expected a FILTER expression without aggregation, but found <aggExpr>."
+        ]
+      },
+      "CONTAINS_WINDOW_FUNCTION" : {
+        "message" : [
+          "Expected a FILTER expression without window function, but found <windowExpr>."
+        ]
+      },
+      "NON_DETERMINISTIC" : {
+        "message" : [
+          "Expected non-deterministic FILTER expression."
+        ]
+      },
+      "NOT_BOOLEAN" : {
+        "message" : [
+          "Expected FILTER expression of boolean type."

Review Comment:
   Let's follow the convention of `toSQLType`, and output the type in the upper case:
   ```suggestion
             "Expected a FILTER expression of the BOOLEAN type."
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47256][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_102[4-7] [spark]

Posted by "davidm-db (via GitHub)" <gi...@apache.org>.
davidm-db commented on code in PR #45622:
URL: https://github.com/apache/spark/pull/45622#discussion_r1535544501


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala:
##########
@@ -620,28 +620,36 @@ private[sql] object QueryCompilationErrors extends QueryErrorsBase with Compilat
       messageParameters = Map("prettyName" -> toSQLId(prettyName), "syntax" -> toSQLStmt(syntax)))
   }
 
-  def nonDeterministicFilterInAggregateError(): Throwable = {
+  def nonDeterministicFilterInAggregateError(filterExpr: Expression): Throwable = {
     new AnalysisException(
-      errorClass = "_LEGACY_ERROR_TEMP_1024",
-      messageParameters = Map.empty)
+      errorClass = "INVALID_AGGREGATE_FILTER.NON_DETERMINISTIC",
+      messageParameters = Map("filterExpr" -> toPrettySQL(filterExpr)))

Review Comment:
   Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47256][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_102[4-7] [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on PR #45622:
URL: https://github.com/apache/spark/pull/45622#issuecomment-2017685998

   @davidm-db BTW, do you have an account at OSS JIRA? If so, could you leave a comment that you are working on the ticket:  https://issues.apache.org/jira/browse/SPARK-47256 otherwise, please, submit a request for new account.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47256][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_102[4-7] [spark]

Posted by "davidm-db (via GitHub)" <gi...@apache.org>.
davidm-db commented on PR #45622:
URL: https://github.com/apache/spark/pull/45622#issuecomment-2019842649

   Thanks @MaxGekk and thanks for the help!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47256][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_102[4-7] [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #45622:
URL: https://github.com/apache/spark/pull/45622#discussion_r1533441619


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -4627,6 +4627,26 @@
     ],
     "sqlState" : "42KDF"
   },
+  "AGGREGATE_FILTER_EXPRESSION_NON_DETERMINISTIC" : {

Review Comment:
   If it is possible, let's do that. But we should avoid embedding text to source code or passing parts of error messages as parameters. Tech writers should correct error messages only in `error-classes.json`, and don't modify source code.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47256][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_102[4-7] [spark]

Posted by "davidm-db (via GitHub)" <gi...@apache.org>.
davidm-db commented on code in PR #45622:
URL: https://github.com/apache/spark/pull/45622#discussion_r1535545940


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala:
##########
@@ -295,10 +295,50 @@ class AnalysisErrorSuite extends AnalysisTest with DataTypeErrorsBase {
         "aggregate(array(1, 2, 3), 0, (acc, x) -> acc + x) FILTER (WHERE c > 1)", 7, 76)))
   }
 
-  errorTest(
-    "non-deterministic filter predicate in aggregate functions",
-    CatalystSqlParser.parsePlan("SELECT count(a) FILTER (WHERE rand(int(c)) > 1) FROM TaBlE2"),
-    "FILTER expression is non-deterministic, it cannot be used in aggregate functions" :: Nil)
+  test("SPARK-47256: non deterministic FILTER expression in an aggregate function") {
+    val plan =
+      CatalystSqlParser.parsePlan("SELECT count(a) FILTER (WHERE rand(int(c)) > 1) FROM TaBlE2")

Review Comment:
   Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47256][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_102[4-7] [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #45622:
URL: https://github.com/apache/spark/pull/45622#discussion_r1535079267


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala:
##########
@@ -620,28 +620,36 @@ private[sql] object QueryCompilationErrors extends QueryErrorsBase with Compilat
       messageParameters = Map("prettyName" -> toSQLId(prettyName), "syntax" -> toSQLStmt(syntax)))
   }
 
-  def nonDeterministicFilterInAggregateError(): Throwable = {
+  def nonDeterministicFilterInAggregateError(filterExpr: Expression): Throwable = {
     new AnalysisException(
-      errorClass = "_LEGACY_ERROR_TEMP_1024",
-      messageParameters = Map.empty)
+      errorClass = "INVALID_AGGREGATE_FILTER.NON_DETERMINISTIC",
+      messageParameters = Map("filterExpr" -> toPrettySQL(filterExpr)))

Review Comment:
   Could you use `toSQLExpr` instead of `toPrettySQL`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47256][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_102[4-7] [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #45622:
URL: https://github.com/apache/spark/pull/45622#discussion_r1533260149


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala:
##########
@@ -295,10 +295,50 @@ class AnalysisErrorSuite extends AnalysisTest with DataTypeErrorsBase {
         "aggregate(array(1, 2, 3), 0, (acc, x) -> acc + x) FILTER (WHERE c > 1)", 7, 76)))
   }
 
-  errorTest(
-    "non-deterministic filter predicate in aggregate functions",
-    CatalystSqlParser.parsePlan("SELECT count(a) FILTER (WHERE rand(int(c)) > 1) FROM TaBlE2"),
-    "FILTER expression is non-deterministic, it cannot be used in aggregate functions" :: Nil)
+  test("SPARK-47256: non deterministic FILTER expression in an aggregate function") {
+    val plan =
+      CatalystSqlParser.parsePlan("SELECT count(a) FILTER (WHERE rand(int(c)) > 1) FROM TaBlE2")

Review Comment:
   @davidm-db Could you reproduce the errors using public APIs, please. If it is not possible, we should consider to convert the error to internal errors (see `SparkException.internalError()`).



##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -4627,6 +4627,26 @@
     ],
     "sqlState" : "42KDF"
   },
+  "AGGREGATE_FILTER_EXPRESSION_NON_DETERMINISTIC" : {

Review Comment:
   Please, create an error class w/ a common error message, and its sub-classes w/ specific error messages.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47256][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_102[4-7] [spark]

Posted by "stefankandic (via GitHub)" <gi...@apache.org>.
stefankandic commented on code in PR #45622:
URL: https://github.com/apache/spark/pull/45622#discussion_r1533431044


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -4627,6 +4627,26 @@
     ],
     "sqlState" : "42KDF"
   },
+  "AGGREGATE_FILTER_EXPRESSION_NON_DETERMINISTIC" : {

Review Comment:
   just a thought but we could also have just one error with parametrized message



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47256][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_102[4-7] [spark]

Posted by "davidm-db (via GitHub)" <gi...@apache.org>.
davidm-db commented on code in PR #45622:
URL: https://github.com/apache/spark/pull/45622#discussion_r1533648414


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -4627,6 +4627,26 @@
     ],
     "sqlState" : "42KDF"
   },
+  "AGGREGATE_FILTER_EXPRESSION_NON_DETERMINISTIC" : {

Review Comment:
   I haven't found a way/example to do what Stefan suggested without having to change source code, that is, to do everything within `error-classes.json`. So, I went with what Max suggested.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47256][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_102[4-7] [spark]

Posted by "davidm-db (via GitHub)" <gi...@apache.org>.
davidm-db commented on code in PR #45622:
URL: https://github.com/apache/spark/pull/45622#discussion_r1535547549


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -1,4 +1,32 @@
 {
+  "AGGREGATE_FILTER_EXPRESSION_ERROR" : {
+    "message" : [
+      "FILTER expression in aggregate function has errors:"
+    ],
+    "subClass" : {
+      "CONTAINS_AGGREGATE" : {
+        "message" : [
+          "FILTER expression contains an aggregation."

Review Comment:
   Done as suggested.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47256][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_102[4-7] [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #45622:
URL: https://github.com/apache/spark/pull/45622#discussion_r1536792012


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala:
##########
@@ -295,10 +295,50 @@ class AnalysisErrorSuite extends AnalysisTest with DataTypeErrorsBase {
         "aggregate(array(1, 2, 3), 0, (acc, x) -> acc + x) FILTER (WHERE c > 1)", 7, 76)))
   }
 
-  errorTest(
-    "non-deterministic filter predicate in aggregate functions",
-    CatalystSqlParser.parsePlan("SELECT count(a) FILTER (WHERE rand(int(c)) > 1) FROM TaBlE2"),
-    "FILTER expression is non-deterministic, it cannot be used in aggregate functions" :: Nil)
+  test("SPARK-47256: non deterministic FILTER expression in an aggregate function") {
+    val plan =
+      CatalystSqlParser.parsePlan("SELECT count(a) FILTER (WHERE rand(int(c)) > 1) FROM TaBlE2")

Review Comment:
   @davidm-db Could you remove the tests from `AnalysisErrorSuite` because they just duplicate `group-by-filter.sql`, and don't bring any benefits.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47256][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_102[4-7] [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk closed pull request #45622: [SPARK-47256][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_102[4-7]
URL: https://github.com/apache/spark/pull/45622


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47256][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_102[4-7] [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on PR #45622:
URL: https://github.com/apache/spark/pull/45622#issuecomment-2018518146

   @davidm-db Congratulations with your first contribution to Apache Spark!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47256][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_102[4-7] [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on PR #45622:
URL: https://github.com/apache/spark/pull/45622#issuecomment-2011239760

   @davidm-db Could you trigger GitHub actions w/ tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47256][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_102[4-7] [spark]

Posted by "davidm-db (via GitHub)" <gi...@apache.org>.
davidm-db closed pull request #45622: [SPARK-47256][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_102[4-7]
URL: https://github.com/apache/spark/pull/45622


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47256][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_102[4-7] [spark]

Posted by "davidm-db (via GitHub)" <gi...@apache.org>.
davidm-db commented on code in PR #45622:
URL: https://github.com/apache/spark/pull/45622#discussion_r1535545314


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -1746,6 +1746,34 @@
     ],
     "sqlState" : "22012"
   },
+  "INVALID_AGGREGATE_FILTER" : {
+    "message" : [
+      "The FILTER expression <filterExpr> in an aggregate function is invalid."
+    ],
+    "subClass" : {
+      "CONTAINS_AGGREGATE" : {
+        "message" : [
+          "Expected a FILTER expression without aggregation, but found <aggExpr>."
+        ]
+      },
+      "CONTAINS_WINDOW_FUNCTION" : {
+        "message" : [
+          "Expected a FILTER expression without window function, but found <windowExpr>."
+        ]
+      },
+      "NON_DETERMINISTIC" : {
+        "message" : [
+          "Expected non-deterministic FILTER expression."
+        ]
+      },
+      "NOT_BOOLEAN" : {
+        "message" : [
+          "Expected FILTER expression of boolean type."

Review Comment:
   Thanks for the suggestions, I wasn't aware completely of this. Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47256][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_102[4-7] [spark]

Posted by "davidm-db (via GitHub)" <gi...@apache.org>.
davidm-db commented on code in PR #45622:
URL: https://github.com/apache/spark/pull/45622#discussion_r1533672407


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala:
##########
@@ -119,7 +119,7 @@ case class TestFunctionWithTypeCheckFailure(
 
 case class UnresolvedTestPlan() extends UnresolvedLeafNode
 
-class AnalysisErrorSuite extends AnalysisTest with DataTypeErrorsBase {
+class AnalysisErrorSuite extends AnalysisTest with DataTypeErrorsBase with SharedSparkSession {

Review Comment:
   To-do: Remove `SharedSparkSession` trait, accidentally added while experimenting.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47256][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_102[4-7] [spark]

Posted by "davidm-db (via GitHub)" <gi...@apache.org>.
davidm-db commented on PR #45622:
URL: https://github.com/apache/spark/pull/45622#issuecomment-2011935735

   > @davidm-db Could you trigger GitHub actions w/ tests.
   
   I didn't initially setup actions in my forked repo properly. Now that I have pushed new commit, seems like the Build is running properly? @MaxGekk is that it or am I supposed to trigger tests additionally?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47256][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_102[4-7] [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #45622:
URL: https://github.com/apache/spark/pull/45622#discussion_r1536792261


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala:
##########
@@ -807,22 +885,6 @@ class AnalysisErrorSuite extends AnalysisTest with DataTypeErrorsBase {
       """"explode(array(min(a)))", "explode(array(max(a)))"""" :: Nil
   )
 
-  errorTest(
-    "SPARK-38666: non-boolean aggregate filter",
-    CatalystSqlParser.parsePlan("SELECT sum(c) filter (where e) FROM TaBlE2"),
-    "FILTER expression is not of type boolean" :: Nil)
-
-  errorTest(
-    "SPARK-38666: aggregate in aggregate filter",
-    CatalystSqlParser.parsePlan("SELECT sum(c) filter (where max(e) > 1) FROM TaBlE2"),
-    "FILTER expression contains aggregate" :: Nil)
-
-  errorTest(
-    "SPARK-38666: window function in aggregate filter",
-    CatalystSqlParser.parsePlan("SELECT sum(c) " +
-       "filter (where nth_value(e, 2) over(order by b) > 1) FROM TaBlE2"),
-    "FILTER expression contains window function" :: Nil)
-

Review Comment:
   Keep it removed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47256][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_102[4-7] [spark]

Posted by "davidm-db (via GitHub)" <gi...@apache.org>.
davidm-db commented on code in PR #45622:
URL: https://github.com/apache/spark/pull/45622#discussion_r1537324629


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala:
##########
@@ -295,10 +295,88 @@ class AnalysisErrorSuite extends AnalysisTest with DataTypeErrorsBase {
         "aggregate(array(1, 2, 3), 0, (acc, x) -> acc + x) FILTER (WHERE c > 1)", 7, 76)))
   }
 
-  errorTest(
-    "non-deterministic filter predicate in aggregate functions",
-    CatalystSqlParser.parsePlan("SELECT count(a) FILTER (WHERE rand(int(c)) > 1) FROM TaBlE2"),
-    "FILTER expression is non-deterministic, it cannot be used in aggregate functions" :: Nil)
+  test("SPARK-47256: non deterministic FILTER expression in an aggregate function") {

Review Comment:
   Completely didn't notice that I haven't committed this change. Done, thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47256][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_102[4-7] [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on PR #45622:
URL: https://github.com/apache/spark/pull/45622#issuecomment-2018514495

   +1, LGTM. Merging to master.
   Thank you, @davidm-db and @stefankandic for review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org