You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by "beliefer (via GitHub)" <gi...@apache.org> on 2023/11/20 12:12:33 UTC

[PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

beliefer opened a new pull request, #43910:
URL: https://github.com/apache/spark/pull/43910

   ### What changes were proposed in this pull request?
   Spark SQL parser have a special rule to parse `[percentile_cont|percentile_disc](percentage) WITHIN GROUP (ORDER BY v)`.
   We should merge this rule into the `functionCall`.
   
   
   ### Why are the changes needed?
   Merge the parse rule of PercentileCont and PercentileDisc into functionCall
   
   
   ### Does this PR introduce _any_ user-facing change?
   'No'.
   
   
   ### How was this patch tested?
   New test cases.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   'No'.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1403371641


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -2367,6 +2367,21 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor
         case agg: AggregateFunction =>
           // Note: PythonUDAF does not support these advanced clauses.
           if (agg.isInstanceOf[PythonUDAF]) checkUnsupportedAggregateClause(agg, u)
+          val newAgg = agg match {
+            case factory: InverseDistributionFactory if u.sortOrder.isDefined =>
+              if (u.isDistinct) {
+                throw QueryCompilationErrors.distinctInverseDistributionFunctionUnsupportedError(
+                  factory.prettyName)
+              }
+              factory.createInverseDistributionFunction(u.sortOrder.get)
+            case factory: InverseDistributionFactory =>
+              throw QueryCompilationErrors.inverseDistributionFunctionMissingWithinGroupError(
+                factory.prettyName)
+            case other if u.sortOrder.isDefined =>
+              throw QueryCompilationErrors.unsupportedInverseDistributionFunctionError(

Review Comment:
   maybe we can add an extra case match at the beginning
   ```
   case _ if !func.isInstanceOf[InverseDistributionFactory] && u.sortOrder.isDefined => fail
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1460451568


##########
common/utils/src/main/resources/error/README.md:
##########
@@ -1309,6 +1309,7 @@ The following SQLSTATEs are collated from:
 |HZ320    |HZ   |RDA-specific condition                            |320     |version not supported                                       |RDA/SQL        |Y       |RDA/SQL                                                                     |
 |HZ321    |HZ   |RDA-specific condition                            |321     |TCP/IP error                                                |RDA/SQL        |Y       |RDA/SQL                                                                     |
 |HZ322    |HZ   |RDA-specific condition                            |322     |TLS alert                                                   |RDA/SQL        |Y       |RDA/SQL                                                                     |
+|ID001    |IM   |Invalid inverse distribution function             |001     |Invalid inverse distribution function                       |SQL/Foundation |N       |SQL/Foundation PostgreSQL Oracle Snowflake Redshift H2                      |

Review Comment:
   OK



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1403375882


##########
sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4:
##########
@@ -975,6 +975,7 @@ primaryExpression
     | LEFT_PAREN query RIGHT_PAREN                                                             #subqueryExpression
     | functionName LEFT_PAREN (setQuantifier? argument+=functionArgument
        (COMMA argument+=functionArgument)*)? RIGHT_PAREN
+       (WITHIN GROUP LEFT_PAREN ORDER BY sortItem RIGHT_PAREN)?

Review Comment:
   can we check other databases? Is the sort ordering always one item?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer closed pull request #43910: [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall
URL: https://github.com/apache/spark/pull/43910


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1400103512


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala:
##########
@@ -444,3 +444,81 @@ case class PercentileDisc(
     }
   }
 }
+
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(percentage) WITHIN GROUP (ORDER BY col) - Return a percentile value based on " +
+    "a continuous distribution of numeric or ANSI interval column `col` at the given " +
+    "`percentage` (specified in ORDER BY clause).",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(0.25) WITHIN GROUP (ORDER BY col) FROM VALUES (0), (10) AS tab(col);
+       2.5
+      > SELECT _FUNC_(0.25) WITHIN GROUP (ORDER BY col) FROM VALUES (INTERVAL '0' MONTH), (INTERVAL '10' MONTH) AS tab(col);
+       0-2
+  """,
+  group = "agg_funcs",
+  since = "4.0.0")
+// scalastyle:on line.size.limit
+object PercentileContBuilder extends ExpressionBuilder {
+  override def build(funcName: String, expressions: Seq[Expression]): Expression = {
+    val numArgs = expressions.length
+    if (numArgs == 2) {

Review Comment:
   For now, we implement WITHIN GROUP within the expression itself, instead of a general approach. I think this will not be changed in the near future. However, from the API perspective, WITHIN GROUP does like a general feature for aggregate functions.
   
   We should make the framework more flexible. My idea is
   1. function lookup should only look at the function name and inputs, not WITHIN GROUP (it's consistent with other features like DISTINCT, FILTER, etc.)
   2. after function lookup, we get an expression, and the expression should implement a trait to indicate that it supports `WITHIN GROUP`. Otherwise we fail.
   
   For example, we can have a 
   ```
   trait SupportsWithinGroup {
     def withOrderings(orderings: Seq[SortOrder])
   }
   ```
   
   The two percentile expressions should implement this trait so that they can get the orderings in a post-hoc way.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1403375320


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala:
##########
@@ -298,11 +298,12 @@ case class UnresolvedFunction(
     arguments: Seq[Expression],
     isDistinct: Boolean,
     filter: Option[Expression] = None,
-    ignoreNulls: Boolean = false)
+    ignoreNulls: Boolean = false,
+    sortOrder: Option[SortOrder] = None)

Review Comment:
   maybe `orderingWithinGroup`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1400585209


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala:
##########
@@ -444,3 +444,81 @@ case class PercentileDisc(
     }
   }
 }
+
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(percentage) WITHIN GROUP (ORDER BY col) - Return a percentile value based on " +
+    "a continuous distribution of numeric or ANSI interval column `col` at the given " +
+    "`percentage` (specified in ORDER BY clause).",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(0.25) WITHIN GROUP (ORDER BY col) FROM VALUES (0), (10) AS tab(col);
+       2.5
+      > SELECT _FUNC_(0.25) WITHIN GROUP (ORDER BY col) FROM VALUES (INTERVAL '0' MONTH), (INTERVAL '10' MONTH) AS tab(col);
+       0-2
+  """,
+  group = "agg_funcs",
+  since = "4.0.0")
+// scalastyle:on line.size.limit
+object PercentileContBuilder extends ExpressionBuilder {
+  override def build(funcName: String, expressions: Seq[Expression]): Expression = {
+    val numArgs = expressions.length
+    if (numArgs == 2) {

Review Comment:
   Because the two percentile expressions have a lot of complexity. I recommend the factory mode for inverse distribution function.
   ```
   trait InverseDistributionFactory extends AggregateFunction {
     def createInverseDistributionFunction(sortOrder: SortOrder): AggregateFunction
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer commented on PR #43910:
URL: https://github.com/apache/spark/pull/43910#issuecomment-1838565793

   The GA failure is unrelated.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1403370382


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -2367,6 +2367,21 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor
         case agg: AggregateFunction =>
           // Note: PythonUDAF does not support these advanced clauses.
           if (agg.isInstanceOf[PythonUDAF]) checkUnsupportedAggregateClause(agg, u)
+          val newAgg = agg match {
+            case factory: InverseDistributionFactory if u.sortOrder.isDefined =>
+              if (u.isDistinct) {
+                throw QueryCompilationErrors.distinctInverseDistributionFunctionUnsupportedError(
+                  factory.prettyName)
+              }
+              factory.createInverseDistributionFunction(u.sortOrder.get)
+            case factory: InverseDistributionFactory =>
+              throw QueryCompilationErrors.inverseDistributionFunctionMissingWithinGroupError(
+                factory.prettyName)
+            case other if u.sortOrder.isDefined =>
+              throw QueryCompilationErrors.unsupportedInverseDistributionFunctionError(

Review Comment:
   It only covers unsupported agg func. We should cover all cases in `validateFunction`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1400112228


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -1846,6 +1846,34 @@
     },
     "sqlState" : "42000"
   },
+  "INVALID_INVERSE_DISTRIBUTION_FUNCTION" : {
+    "message" : [
+      "Invalid inverse distribution function <funcName>."

Review Comment:
   I referenced H2, SQLServer and Postgres, they call these function `inverse distribution function`.
   Addition, some databases call these functions as `ordered set function`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1400112228


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -1846,6 +1846,34 @@
     },
     "sqlState" : "42000"
   },
+  "INVALID_INVERSE_DISTRIBUTION_FUNCTION" : {
+    "message" : [
+      "Invalid inverse distribution function <funcName>."

Review Comment:
   I references H2, SQLServer and Postgres, they call these function inverse distribution function.
   Addition, some databases call these functions as `ordered set function`.



##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -1846,6 +1846,34 @@
     },
     "sqlState" : "42000"
   },
+  "INVALID_INVERSE_DISTRIBUTION_FUNCTION" : {
+    "message" : [
+      "Invalid inverse distribution function <funcName>."

Review Comment:
   I references H2, SQLServer and Postgres, they call these function `inverse distribution function`.
   Addition, some databases call these functions as `ordered set function`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1413839440


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -1860,6 +1860,11 @@
         "message" : [
           "WITHIN GROUP is required for inverse distribution function."
         ]
+      },
+      "WRONG_NUM_ORDERINGS" : {
+        "message" : [
+          "WITHIN GROUP requires <expectedNum> orderings but the actual number is <actualNum>."

Review Comment:
   oh, let's make it a bit clearer then.
   ```
   Requires <expectedNum> orderings in WITHIN GROUP but got <actualNum>
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1400094864


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -1846,6 +1846,34 @@
     },
     "sqlState" : "42000"
   },
+  "INVALID_INVERSE_DISTRIBUTION_FUNCTION" : {
+    "message" : [
+      "Invalid inverse distribution function <funcName>."

Review Comment:
   is `inverse distribution` a common terminology?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1404331597


##########
sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4:
##########
@@ -975,6 +975,7 @@ primaryExpression
     | LEFT_PAREN query RIGHT_PAREN                                                             #subqueryExpression
     | functionName LEFT_PAREN (setQuantifier? argument+=functionArgument
        (COMMA argument+=functionArgument)*)? RIGHT_PAREN
+       (WITHIN GROUP LEFT_PAREN ORDER BY sortItem RIGHT_PAREN)?

Review Comment:
   I checked the inverse distribution functions in Postgres and Oracle 
   `percentile_cont(0.25) WITHIN GROUP (ORDER BY salary, bonus)`
   
   Postgres throws
   ```
   SQL 错误 [42883]: ERROR: function percentile_cont(numeric, numeric, double precision) does not exist
     建议：No function matches the given name and argument types. You might need to add explicit type casts.
     位置：52
   ```
   Oracle throws
   `SQL 错误 [909] [42000]: ORA-00909: 参数个数无效`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1400095922


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -2232,13 +2232,22 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor
             }
           }
 
-          case u @ UnresolvedFunction(nameParts, arguments, _, _, _) => withPosition(u) {
-            resolveBuiltinOrTempFunction(nameParts, arguments, Some(u)).getOrElse {
+          case u @ UnresolvedFunction(nameParts, arguments, _, _, _, sortOrder) => withPosition(u) {
+            val args = sortOrder match {
+              case Some(s) if nameParts.length == 1 &&
+                (nameParts.head == "percentile_cont" || nameParts.head == "percentile_disc") =>

Review Comment:
   can we lookup the function first then check the expression? it's hacky to check the name here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1403385272


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala:
##########
@@ -350,6 +352,36 @@ case class Median(child: Expression)
       newChild: Expression): Median = this.copy(child = newChild)
 }
 
+/**
+ * A factory used to create the inverse distribution function during the analysis phase.
+ * This factory is also an aggregate function that cannot be evaluated.
+ */
+trait InverseDistributionFactory extends AggregateFunction {

Review Comment:
   e.g. we can put a null literal as a placeholder for `PercentileCont.left`, and replace it later



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1412087449


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala:
##########
@@ -444,3 +463,59 @@ case class PercentileDisc(
     }
   }
 }
+
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(percentage) WITHIN GROUP (ORDER BY col) - Return a percentile value based on " +
+    "a continuous distribution of numeric or ANSI interval column `col` at the given " +
+    "`percentage` (specified in ORDER BY clause).",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(0.25) WITHIN GROUP (ORDER BY col) FROM VALUES (0), (10) AS tab(col);
+       2.5
+      > SELECT _FUNC_(0.25) WITHIN GROUP (ORDER BY col) FROM VALUES (INTERVAL '0' MONTH), (INTERVAL '10' MONTH) AS tab(col);
+       0-2
+  """,
+  group = "agg_funcs",
+  since = "4.0.0")
+// scalastyle:on line.size.limit
+object PercentileContBuilder extends ExpressionBuilder {
+  override def build(funcName: String, expressions: Seq[Expression]): Expression = {
+    val numArgs = expressions.length
+    if (numArgs == 1) {
+      PercentileCont(UnresolvedWithinGroup, expressions(0))
+    } else if (numArgs == 0) {
+      throw QueryCompilationErrors.inverseDistributionFunctionMissingPercentageError(funcName)

Review Comment:
   why do we need to throw a different error here? It's just `QueryCompilationErrors.wrongNumArgsError`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1403372590


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -2367,6 +2367,21 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor
         case agg: AggregateFunction =>
           // Note: PythonUDAF does not support these advanced clauses.
           if (agg.isInstanceOf[PythonUDAF]) checkUnsupportedAggregateClause(agg, u)
+          val newAgg = agg match {
+            case factory: InverseDistributionFactory if u.sortOrder.isDefined =>
+              if (u.isDistinct) {
+                throw QueryCompilationErrors.distinctInverseDistributionFunctionUnsupportedError(
+                  factory.prettyName)
+              }
+              factory.createInverseDistributionFunction(u.sortOrder.get)
+            case factory: InverseDistributionFactory =>
+              throw QueryCompilationErrors.inverseDistributionFunctionMissingWithinGroupError(
+                factory.prettyName)
+            case other if u.sortOrder.isDefined =>
+              throw QueryCompilationErrors.unsupportedInverseDistributionFunctionError(

Review Comment:
   BTW, I think the existing `functionWithUnsupportedSyntaxError` is fine here. We don't need to create a new error.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer commented on PR #43910:
URL: https://github.com/apache/spark/pull/43910#issuecomment-1833375606

   @cloud-fan Please take a look? Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1413751001


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -1860,6 +1860,11 @@
         "message" : [
           "WITHIN GROUP is required for inverse distribution function."
         ]
+      },
+      "WRONG_NUM_ORDERINGS" : {
+        "message" : [
+          "WITHIN GROUP requires <expectedNum> orderings but the actual number is <actualNum>."

Review Comment:
   can we mention the function name?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1413434804


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala:
##########
@@ -417,6 +429,17 @@ case class PercentileDisc(
     s"$prettyName($distinct${right.sql}) WITHIN GROUP (ORDER BY ${left.sql}$direction)"
   }
 
+  override def withOrderingWithinGroup(orderingWithinGroup: Seq[SortOrder]): AggregateFunction = {
+    if (orderingWithinGroup.length != 1) {
+      throw QueryCompilationErrors.wrongNumArgsError(

Review Comment:
   ditto



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1460419199


##########
common/utils/src/main/resources/error/README.md:
##########
@@ -1309,6 +1309,7 @@ The following SQLSTATEs are collated from:
 |HZ320    |HZ   |RDA-specific condition                            |320     |version not supported                                       |RDA/SQL        |Y       |RDA/SQL                                                                     |
 |HZ321    |HZ   |RDA-specific condition                            |321     |TCP/IP error                                                |RDA/SQL        |Y       |RDA/SQL                                                                     |
 |HZ322    |HZ   |RDA-specific condition                            |322     |TLS alert                                                   |RDA/SQL        |Y       |RDA/SQL                                                                     |
+|ID001    |IM   |Invalid inverse distribution function             |001     |Invalid inverse distribution function                       |SQL/Foundation |N       |SQL/Foundation PostgreSQL Oracle Snowflake Redshift H2                      |

Review Comment:
   I selected `ID001` due to ID is the abbreviation of `Invalid inverse distribution function`.
   `42K0K` seems good too.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1400136130


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -2232,13 +2232,22 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor
             }
           }
 
-          case u @ UnresolvedFunction(nameParts, arguments, _, _, _) => withPosition(u) {
-            resolveBuiltinOrTempFunction(nameParts, arguments, Some(u)).getOrElse {
+          case u @ UnresolvedFunction(nameParts, arguments, _, _, _, sortOrder) => withPosition(u) {
+            val args = sortOrder match {
+              case Some(s) if nameParts.length == 1 &&
+                (nameParts.head == "percentile_cont" || nameParts.head == "percentile_disc") =>

Review Comment:
   Yes. These code just is a temp implementation. It's used let you give a first review.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1400112228


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -1846,6 +1846,34 @@
     },
     "sqlState" : "42000"
   },
+  "INVALID_INVERSE_DISTRIBUTION_FUNCTION" : {
+    "message" : [
+      "Invalid inverse distribution function <funcName>."

Review Comment:
   I references H2 SQLServer and Postgres, they call these function inverse distribution function.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1403387271


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala:
##########
@@ -350,6 +352,36 @@ case class Median(child: Expression)
       newChild: Expression): Median = this.copy(child = newChild)
 }
 
+/**
+ * A factory used to create the inverse distribution function during the analysis phase.
+ * This factory is also an aggregate function that cannot be evaluated.
+ */
+trait InverseDistributionFactory extends AggregateFunction {

Review Comment:
   We still need a base trait
   ```
   trait SupportsOrderingWithinGroup { self: AggregateFunction =>
     def withOrderingWithinGroup(orderingWithGroup: Seq[SortOrder]): Expression
   }
   
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1404331597


##########
sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4:
##########
@@ -975,6 +975,7 @@ primaryExpression
     | LEFT_PAREN query RIGHT_PAREN                                                             #subqueryExpression
     | functionName LEFT_PAREN (setQuantifier? argument+=functionArgument
        (COMMA argument+=functionArgument)*)? RIGHT_PAREN
+       (WITHIN GROUP LEFT_PAREN ORDER BY sortItem RIGHT_PAREN)?

Review Comment:
   I checked the inverse distribution functions in Postgres and Oracle 
   Postgres throws
   ```
   SQL 错误 [42883]: ERROR: function percentile_cont(numeric, numeric, double precision) does not exist
     建议：No function matches the given name and argument types. You might need to add explicit type casts.
     位置：52
   ```
   Oracle throws
   `SQL 错误 [909] [42000]: ORA-00909: 参数个数无效`



##########
sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4:
##########
@@ -975,6 +975,7 @@ primaryExpression
     | LEFT_PAREN query RIGHT_PAREN                                                             #subqueryExpression
     | functionName LEFT_PAREN (setQuantifier? argument+=functionArgument
        (COMMA argument+=functionArgument)*)? RIGHT_PAREN
+       (WITHIN GROUP LEFT_PAREN ORDER BY sortItem RIGHT_PAREN)?

Review Comment:
   I checked the inverse distribution functions in Postgres and Oracle 
   Postgres throws
   ```
   SQL 错误 [42883]: ERROR: function percentile_cont(numeric, numeric, double precision) does not exist
     建议：No function matches the given name and argument types. You might need to add explicit type casts.
     位置：52
   ```
   Oracle throws
   `SQL 错误 [909] [42000]: ORA-00909: 参数个数无效`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1460131124


##########
common/utils/src/main/resources/error/README.md:
##########
@@ -1309,6 +1309,7 @@ The following SQLSTATEs are collated from:
 |HZ320    |HZ   |RDA-specific condition                            |320     |version not supported                                       |RDA/SQL        |Y       |RDA/SQL                                                                     |
 |HZ321    |HZ   |RDA-specific condition                            |321     |TCP/IP error                                                |RDA/SQL        |Y       |RDA/SQL                                                                     |
 |HZ322    |HZ   |RDA-specific condition                            |322     |TLS alert                                                   |RDA/SQL        |Y       |RDA/SQL                                                                     |
+|ID001    |IM   |Invalid inverse distribution function             |001     |Invalid inverse distribution function                       |SQL/Foundation |N       |SQL/Foundation PostgreSQL Oracle Snowflake Redshift H2                      |

Review Comment:
   I can't find it in https://www.postgresql.org/docs/current/errcodes-appendix.html , did you make it up yourself? @beliefer 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1460279707


##########
common/utils/src/main/resources/error/README.md:
##########
@@ -1309,6 +1309,7 @@ The following SQLSTATEs are collated from:
 |HZ320    |HZ   |RDA-specific condition                            |320     |version not supported                                       |RDA/SQL        |Y       |RDA/SQL                                                                     |
 |HZ321    |HZ   |RDA-specific condition                            |321     |TCP/IP error                                                |RDA/SQL        |Y       |RDA/SQL                                                                     |
 |HZ322    |HZ   |RDA-specific condition                            |322     |TLS alert                                                   |RDA/SQL        |Y       |RDA/SQL                                                                     |
+|ID001    |IM   |Invalid inverse distribution function             |001     |Invalid inverse distribution function                       |SQL/Foundation |N       |SQL/Foundation PostgreSQL Oracle Snowflake Redshift H2                      |

Review Comment:
   You means reference the error code from postgresql?
   If so, I will pick it up.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1403388877


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala:
##########
@@ -350,6 +352,36 @@ case class Median(child: Expression)
       newChild: Expression): Median = this.copy(child = newChild)
 }
 
+/**
+ * A factory used to create the inverse distribution function during the analysis phase.
+ * This factory is also an aggregate function that cannot be evaluated.
+ */
+trait InverseDistributionFactory extends AggregateFunction {

Review Comment:
   if null literal is too hack, we can create a placeholder expression.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1400112228


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -1846,6 +1846,34 @@
     },
     "sqlState" : "42000"
   },
+  "INVALID_INVERSE_DISTRIBUTION_FUNCTION" : {
+    "message" : [
+      "Invalid inverse distribution function <funcName>."

Review Comment:
   I references H2 and Postgres, they call these function inverse distribution function.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer commented on PR #43910:
URL: https://github.com/apache/spark/pull/43910#issuecomment-1839846140

   The GA failure is unrelated.
   Merged to master
   @cloud-fan Thank you very much!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1404711706


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala:
##########
@@ -350,6 +352,36 @@ case class Median(child: Expression)
       newChild: Expression): Median = this.copy(child = newChild)
 }
 
+/**
+ * A factory used to create the inverse distribution function during the analysis phase.
+ * This factory is also an aggregate function that cannot be evaluated.
+ */
+trait InverseDistributionFactory extends AggregateFunction {

Review Comment:
   I think using the factory model or using `SupportsOrderingWithinGroup` is just a difference in specific implementation methods, and I have no preference for the two.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1412078992


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -2367,6 +2367,27 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor
         case agg: AggregateFunction =>
           // Note: PythonUDAF does not support these advanced clauses.
           if (agg.isInstanceOf[PythonUDAF]) checkUnsupportedAggregateClause(agg, u)
+          val newAgg = agg match {
+            case idf: SupportsOrderingWithinGroup if u.orderingWithinGroup.isDefined =>
+              if (u.isDistinct) {
+                throw QueryCompilationErrors.distinctInverseDistributionFunctionUnsupportedError(
+                  idf.prettyName)
+              }
+              if (idf.isFake) {
+                idf.withOrderingWithinGroup(u.orderingWithinGroup.get)
+              } else {
+                idf
+              }
+            case idf: SupportsOrderingWithinGroup =>
+              throw QueryCompilationErrors.inverseDistributionFunctionMissingWithinGroupError(
+                idf.prettyName)
+            case _ =>
+              if (u.orderingWithinGroup.isDefined) {
+                throw QueryCompilationErrors.functionWithUnsupportedSyntaxError(
+                  func.prettyName, "WITHIN GROUP (ORDER BY clause)")

Review Comment:
   ```suggestion
                     func.prettyName, "WITHIN GROUP (ORDER BY ...)")
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1412076685


##########
sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4:
##########
@@ -975,6 +975,7 @@ primaryExpression
     | LEFT_PAREN query RIGHT_PAREN                                                             #subqueryExpression
     | functionName LEFT_PAREN (setQuantifier? argument+=functionArgument
        (COMMA argument+=functionArgument)*)? RIGHT_PAREN
+       (WITHIN GROUP LEFT_PAREN ORDER BY sortItem RIGHT_PAREN)?

Review Comment:
   https://www.postgresql.org/docs/9.4/functions-aggregate.html
   
   So the WITHIN GROUP can have multiple order by items. We should allow it in the parser and can fail for certain functions that only allow one sort order.
   <img width="1593" alt="image" src="https://github.com/apache/spark/assets/3182036/e53ebb9f-7f91-4a37-b843-9f8ebd4406d3">
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1412078444


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -2367,6 +2367,27 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor
         case agg: AggregateFunction =>
           // Note: PythonUDAF does not support these advanced clauses.
           if (agg.isInstanceOf[PythonUDAF]) checkUnsupportedAggregateClause(agg, u)
+          val newAgg = agg match {
+            case idf: SupportsOrderingWithinGroup if u.orderingWithinGroup.isDefined =>

Review Comment:
   This is not future-proof. `AggregateWindowFunction` extends `AggregateFunction`, but we are not performing the DISTINCT check for it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1404302461


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -2367,6 +2367,21 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor
         case agg: AggregateFunction =>
           // Note: PythonUDAF does not support these advanced clauses.
           if (agg.isInstanceOf[PythonUDAF]) checkUnsupportedAggregateClause(agg, u)
+          val newAgg = agg match {
+            case factory: InverseDistributionFactory if u.sortOrder.isDefined =>
+              if (u.isDistinct) {
+                throw QueryCompilationErrors.distinctInverseDistributionFunctionUnsupportedError(
+                  factory.prettyName)
+              }
+              factory.createInverseDistributionFunction(u.sortOrder.get)
+            case factory: InverseDistributionFactory =>
+              throw QueryCompilationErrors.inverseDistributionFunctionMissingWithinGroupError(
+                factory.prettyName)
+            case other if u.sortOrder.isDefined =>
+              throw QueryCompilationErrors.unsupportedInverseDistributionFunctionError(

Review Comment:
   > BTW, I think the existing `functionWithUnsupportedSyntaxError` is fine here. We don't need to create a new error.
   
   I tried to find out the suitable error, but it too many. Thank you for the reminder.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1400112228


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -1846,6 +1846,34 @@
     },
     "sqlState" : "42000"
   },
+  "INVALID_INVERSE_DISTRIBUTION_FUNCTION" : {
+    "message" : [
+      "Invalid inverse distribution function <funcName>."

Review Comment:
   I references H2, SQLServer and Postgres, they call these function inverse distribution function.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1411953063


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -1846,6 +1846,29 @@
     },
     "sqlState" : "42000"
   },
+  "INVALID_INVERSE_DISTRIBUTION_FUNCTION" : {
+    "message" : [
+      "Invalid inverse distribution function <funcName>."
+    ],
+    "subClass" : {
+      "DISTINCT_UNSUPPORTED" : {
+        "message" : [
+          "Cannot use DISTINCT with WITHIN GROUP."
+        ]
+      },
+      "PERCENTILE_PERCENTAGE_MISSING" : {
+        "message" : [
+          "Missing percentage."

Review Comment:
   This doesn't seem like a thing to inverse distribution function. It should be specific to certain functions and shouldn't be in this error class.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1412086209


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/SupportsOrderingWithinGroup.scala:
##########
@@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.expressions.SortOrder
+
+/**
+ * The trait used to set the [[SortOrder]] after inverse distribution functions parsed.
+ */
+trait SupportsOrderingWithinGroup { self: AggregateFunction =>
+  def isFake: Boolean

Review Comment:
   do we need this flag? I think we can always call `withOrderingWithinGroup`, as it only happens once we turn `UnresolvedFunction` into resolved ones.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1412088178


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -2367,6 +2367,27 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor
         case agg: AggregateFunction =>
           // Note: PythonUDAF does not support these advanced clauses.
           if (agg.isInstanceOf[PythonUDAF]) checkUnsupportedAggregateClause(agg, u)
+          val newAgg = agg match {
+            case idf: SupportsOrderingWithinGroup if u.orderingWithinGroup.isDefined =>

Review Comment:
   Good catch!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1413757547


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -1860,6 +1860,11 @@
         "message" : [
           "WITHIN GROUP is required for inverse distribution function."
         ]
+      },
+      "WRONG_NUM_ORDERINGS" : {
+        "message" : [
+          "WITHIN GROUP requires <expectedNum> orderings but the actual number is <actualNum>."

Review Comment:
   The parent error class contains it.
   `"Invalid inverse distribution function <funcName>."`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1460425591


##########
common/utils/src/main/resources/error/README.md:
##########
@@ -1309,6 +1309,7 @@ The following SQLSTATEs are collated from:
 |HZ320    |HZ   |RDA-specific condition                            |320     |version not supported                                       |RDA/SQL        |Y       |RDA/SQL                                                                     |
 |HZ321    |HZ   |RDA-specific condition                            |321     |TCP/IP error                                                |RDA/SQL        |Y       |RDA/SQL                                                                     |
 |HZ322    |HZ   |RDA-specific condition                            |322     |TLS alert                                                   |RDA/SQL        |Y       |RDA/SQL                                                                     |
+|ID001    |IM   |Invalid inverse distribution function             |001     |Invalid inverse distribution function                       |SQL/Foundation |N       |SQL/Foundation PostgreSQL Oracle Snowflake Redshift H2                      |

Review Comment:
   OK, so you invented it. Let's not do it again. SQLSTATE is a standard and the prefix has meanings. Can you open a followup PR to change it to 42K0K? thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1403384646


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala:
##########
@@ -350,6 +352,36 @@ case class Median(child: Expression)
       newChild: Expression): Median = this.copy(child = newChild)
 }
 
+/**
+ * A factory used to create the inverse distribution function during the analysis phase.
+ * This factory is also an aggregate function that cannot be evaluated.
+ */
+trait InverseDistributionFactory extends AggregateFunction {

Review Comment:
   instead of having a factory, shall we create a fake `PercentileCont` first and then update its `left` expression later in `validateFunction`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1403376757


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala:
##########
@@ -350,6 +352,36 @@ case class Median(child: Expression)
       newChild: Expression): Median = this.copy(child = newChild)
 }
 
+/**
+ * A factory used to create the inverse distribution function during the analysis phase.
+ * This factory is also an aggregate function that cannot be evaluated.
+ */
+trait InverseDistributionFactory extends AggregateFunction {

Review Comment:
   can we move it to a new file?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1404301532


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -2367,6 +2367,21 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor
         case agg: AggregateFunction =>
           // Note: PythonUDAF does not support these advanced clauses.
           if (agg.isInstanceOf[PythonUDAF]) checkUnsupportedAggregateClause(agg, u)
+          val newAgg = agg match {
+            case factory: InverseDistributionFactory if u.sortOrder.isDefined =>
+              if (u.isDistinct) {
+                throw QueryCompilationErrors.distinctInverseDistributionFunctionUnsupportedError(
+                  factory.prettyName)
+              }
+              factory.createInverseDistributionFunction(u.sortOrder.get)
+            case factory: InverseDistributionFactory =>
+              throw QueryCompilationErrors.inverseDistributionFunctionMissingWithinGroupError(
+                factory.prettyName)
+            case other if u.sortOrder.isDefined =>
+              throw QueryCompilationErrors.unsupportedInverseDistributionFunctionError(

Review Comment:
   > It only covers unsupported agg func. We should cover all cases in `validateFunction`
   
   Thank you for the reminder.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1412085256


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala:
##########
@@ -318,7 +319,17 @@ case class UnresolvedFunction(
   override protected def withNewChildrenInternal(
       newChildren: IndexedSeq[Expression]): UnresolvedFunction = {
     if (filter.isDefined) {
-      copy(arguments = newChildren.dropRight(1), filter = Some(newChildren.last))
+      if (orderingWithinGroup.isDefined) {
+        val newSortOrder = Some(newChildren.last.asInstanceOf[SortOrder])
+        val args = newChildren.dropRight(1)

Review Comment:
   nit: `dropRight` creates a new collection, let's more efficient here
   ```
   val newSortOrder = Some(newChildren.last.asInstanceOf[SortOrder])
   val newFilter = Some(newChildren(newChildren.length - 2).asInstanceOf[SortOrder])
   ... arguments = newChildren.dropRight(2)
   ```



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala:
##########
@@ -318,7 +319,17 @@ case class UnresolvedFunction(
   override protected def withNewChildrenInternal(
       newChildren: IndexedSeq[Expression]): UnresolvedFunction = {
     if (filter.isDefined) {
-      copy(arguments = newChildren.dropRight(1), filter = Some(newChildren.last))
+      if (orderingWithinGroup.isDefined) {
+        val newSortOrder = Some(newChildren.last.asInstanceOf[SortOrder])
+        val args = newChildren.dropRight(1)

Review Comment:
   nit: `dropRight` creates a new collection, let's be more efficient here
   ```
   val newSortOrder = Some(newChildren.last.asInstanceOf[SortOrder])
   val newFilter = Some(newChildren(newChildren.length - 2).asInstanceOf[SortOrder])
   ... arguments = newChildren.dropRight(2)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1412093209


##########
sql/core/src/test/resources/sql-tests/analyzer-results/percentiles.sql.out:
##########
@@ -149,6 +149,226 @@ Aggregate [median(v#x) AS median(v)#x, percentile(v#x, cast(0.5 as double), 1, 0
                +- LocalRelation [k#x, v#x]
 
 
+-- !query
+SELECT
+  round(v, 0) WITHIN GROUP (ORDER BY v)
+FROM aggr
+-- !query analysis
+org.apache.spark.sql.AnalysisException
+{
+  "errorClass" : "_LEGACY_ERROR_TEMP_1023",

Review Comment:
   Let me make a separate PR for this, because this mistake is very common and can be easily corrected. It's not easy to review, and I'm also not good at resolving conflicts.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1412089938


##########
sql/core/src/test/resources/sql-tests/analyzer-results/percentiles.sql.out:
##########
@@ -149,6 +149,226 @@ Aggregate [median(v#x) AS median(v)#x, percentile(v#x, cast(0.5 as double), 1, 0
                +- LocalRelation [k#x, v#x]
 
 
+-- !query
+SELECT
+  round(v, 0) WITHIN GROUP (ORDER BY v)
+FROM aggr
+-- !query analysis
+org.apache.spark.sql.AnalysisException
+{
+  "errorClass" : "_LEGACY_ERROR_TEMP_1023",

Review Comment:
   since we are here, shall we create error classes for them?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1413434642


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala:
##########
@@ -374,6 +375,17 @@ case class PercentileCont(left: Expression, right: Expression, reverse: Boolean
     percentile.checkInputDataTypes()
   }
 
+  override def withOrderingWithinGroup(orderingWithinGroup: Seq[SortOrder]): AggregateFunction = {
+    if (orderingWithinGroup.length != 1) {
+      throw QueryCompilationErrors.wrongNumArgsError(
+        nodeName, Seq(2), orderingWithinGroup.length + 1)

Review Comment:
   Can we create a new sub-error-class under `INVALID_INVERSE_DISTRIBUTION_FUNCTION`? like `WRONG_NUM_ORDERINGS`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #43910:
URL: https://github.com/apache/spark/pull/43910#discussion_r1460313065


##########
common/utils/src/main/resources/error/README.md:
##########
@@ -1309,6 +1309,7 @@ The following SQLSTATEs are collated from:
 |HZ320    |HZ   |RDA-specific condition                            |320     |version not supported                                       |RDA/SQL        |Y       |RDA/SQL                                                                     |
 |HZ321    |HZ   |RDA-specific condition                            |321     |TCP/IP error                                                |RDA/SQL        |Y       |RDA/SQL                                                                     |
 |HZ322    |HZ   |RDA-specific condition                            |322     |TLS alert                                                   |RDA/SQL        |Y       |RDA/SQL                                                                     |
+|ID001    |IM   |Invalid inverse distribution function             |001     |Invalid inverse distribution function                       |SQL/Foundation |N       |SQL/Foundation PostgreSQL Oracle Snowflake Redshift H2                      |

Review Comment:
   I mean we should be honest here. If it's not from PostgreSQL or other system, and we do want a new category here, I'd suggest `42K0K`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org