You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/10/13 05:00:27 UTC

[GitHub] [spark] LuciferYang opened a new pull request, #38234: [WIP][SPARK-40761][SQL] Migrate type check failures of percentile expressions onto error classes

LuciferYang opened a new pull request, #38234:
URL: https://github.com/apache/spark/pull/38234

   ### What changes were proposed in this pull request?
   This pr replace `TypeCheckFailure` by `DataTypeMismatch` in type checks in the percentile expressions, includes `ApproximatePercentile.scala` and `percentiles.scala`
   
   ### Why are the changes needed?
   Migration onto error classes unifies Spark SQL error messages.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. The PR changes user-facing error messages.
   
   ### How was this patch tested?
   - Pass GitHub Actions


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #38234: [WIP][SPARK-40761][SQL] Migrate type check failures of percentile expressions onto error classes

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #38234:
URL: https://github.com/apache/spark/pull/38234#discussion_r994176160


##########
core/src/main/resources/error/error-classes.json:
##########
@@ -951,6 +951,45 @@
       "3. set \"spark.sql.legacy.allowUntypedScalaUDF\" to \"true\" and use this API with caution"
     ]
   },
+  "INVALID_PERCENTAGES" : {
+    "message" : [
+      "The percentage(s) "
+    ],
+    "subClass" : {

Review Comment:
   OK



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #38234: [SPARK-40761][SQL] Migrate type check failures of percentile expressions onto error classes

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #38234:
URL: https://github.com/apache/spark/pull/38234#discussion_r995232922


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala:
##########
@@ -118,17 +118,44 @@ case class ApproximatePercentile(
     val defaultCheck = super.checkInputDataTypes()
     if (defaultCheck.isFailure) {
       defaultCheck
-    } else if (!percentageExpression.foldable || !accuracyExpression.foldable) {
-      TypeCheckFailure(s"The accuracy or percentage provided must be a constant literal")
+    } else if (!percentageExpression.foldable) {
+      DataTypeMismatch(
+        errorSubClass = "MUST_BE_CONSTANT",

Review Comment:
   OK, Let me see how to make `NON_FOLDABLE_INPUT ` more general
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a diff in pull request #38234: [SPARK-40761][SQL] Migrate type check failures of percentile expressions onto error classes

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on code in PR #38234:
URL: https://github.com/apache/spark/pull/38234#discussion_r994852662


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala:
##########
@@ -118,17 +118,44 @@ case class ApproximatePercentile(
     val defaultCheck = super.checkInputDataTypes()
     if (defaultCheck.isFailure) {
       defaultCheck
-    } else if (!percentageExpression.foldable || !accuracyExpression.foldable) {
-      TypeCheckFailure(s"The accuracy or percentage provided must be a constant literal")
+    } else if (!percentageExpression.foldable) {
+      DataTypeMismatch(
+        errorSubClass = "MUST_BE_CONSTANT",
+        messageParameters = Map(
+          "exprName" -> "percentage(s)",
+          "currentValue" -> percentageExpression.toString
+        )
+      )
+    } else if (!accuracyExpression.foldable) {
+      DataTypeMismatch(
+        errorSubClass = "MUST_BE_CONSTANT",

Review Comment:
   dito



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala:
##########
@@ -118,17 +118,44 @@ case class ApproximatePercentile(
     val defaultCheck = super.checkInputDataTypes()
     if (defaultCheck.isFailure) {
       defaultCheck
-    } else if (!percentageExpression.foldable || !accuracyExpression.foldable) {
-      TypeCheckFailure(s"The accuracy or percentage provided must be a constant literal")
+    } else if (!percentageExpression.foldable) {
+      DataTypeMismatch(
+        errorSubClass = "MUST_BE_CONSTANT",
+        messageParameters = Map(
+          "exprName" -> "percentage(s)",
+          "currentValue" -> percentageExpression.toString
+        )
+      )
+    } else if (!accuracyExpression.foldable) {
+      DataTypeMismatch(
+        errorSubClass = "MUST_BE_CONSTANT",
+        messageParameters = Map(
+          "exprName" -> "accuracy",
+          "currentValue" -> accuracyExpression.toString
+        )
+      )
     } else if (accuracy <= 0 || accuracy > Int.MaxValue) {
-      TypeCheckFailure(s"The accuracy provided must be a literal between (0, ${Int.MaxValue}]" +
-        s" (current value = $accuracy)")
+      DataTypeMismatch(
+        errorSubClass = "VALUE_OUT_OF_RANGE",
+        messageParameters = Map(
+          "exprName" -> "accuracy",
+          "valueRange" -> s"(0, ${Int.MaxValue}]",
+          "currentValue" -> accuracy.toString
+        )
+      )
     } else if (percentages == null) {
-      TypeCheckFailure("Percentage value must not be null")
+      DataTypeMismatch(
+        errorSubClass = "MUST_NOT_NULL",

Review Comment:
   Maybe, `UNEXPECTED_NULL`?



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala:
##########
@@ -118,17 +118,44 @@ case class ApproximatePercentile(
     val defaultCheck = super.checkInputDataTypes()
     if (defaultCheck.isFailure) {
       defaultCheck
-    } else if (!percentageExpression.foldable || !accuracyExpression.foldable) {
-      TypeCheckFailure(s"The accuracy or percentage provided must be a constant literal")
+    } else if (!percentageExpression.foldable) {
+      DataTypeMismatch(
+        errorSubClass = "MUST_BE_CONSTANT",

Review Comment:
   In fact, the input must be a foldable expr. Can't you re-use the existing error sub-class `NON_FOLDABLE_INPUT`? Maybe, make it more general to be able to use it here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #38234: [SPARK-40761][SQL] Migrate type check failures of percentile expressions onto error classes

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #38234:
URL: https://github.com/apache/spark/pull/38234#issuecomment-1279699763

   Thanks very much for your guidance in this pr @MaxGekk 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #38234: [SPARK-40761][SQL] Migrate type check failures of percentile expressions onto error classes

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #38234:
URL: https://github.com/apache/spark/pull/38234#discussion_r995865267


##########
core/src/main/resources/error/error-classes.json:
##########
@@ -170,7 +170,7 @@
       },
       "NON_FOLDABLE_INPUT" : {
         "message" : [
-          "the input should be a foldable string expression and not null; however, got <inputExpr>."
+          "the input <inputName> should be a foldable <inputType> expression; however, got <inputExpr>."

Review Comment:
   if input is null, should change to use `UNEXPECTED_NULL`, else the inputType will be `Void`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk closed pull request #38234: [SPARK-40761][SQL] Migrate type check failures of percentile expressions onto error classes

Posted by GitBox <gi...@apache.org>.
MaxGekk closed pull request #38234: [SPARK-40761][SQL] Migrate type check failures of percentile expressions onto error classes
URL: https://github.com/apache/spark/pull/38234


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a diff in pull request #38234: [WIP][SPARK-40761][SQL] Migrate type check failures of percentile expressions onto error classes

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on code in PR #38234:
URL: https://github.com/apache/spark/pull/38234#discussion_r994174346


##########
core/src/main/resources/error/error-classes.json:
##########
@@ -951,6 +951,45 @@
       "3. set \"spark.sql.legacy.allowUntypedScalaUDF\" to \"true\" and use this API with caution"
     ]
   },
+  "INVALID_PERCENTAGES" : {
+    "message" : [
+      "The percentage(s) "
+    ],
+    "subClass" : {

Review Comment:
   Please, move to the error class `DATATYPE_MISMATCH`



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala:
##########
@@ -118,17 +118,32 @@ case class ApproximatePercentile(
     val defaultCheck = super.checkInputDataTypes()
     if (defaultCheck.isFailure) {
       defaultCheck
-    } else if (!percentageExpression.foldable || !accuracyExpression.foldable) {
-      TypeCheckFailure(s"The accuracy or percentage provided must be a constant literal")
+    } else if (!percentageExpression.foldable) {
+      DataTypeMismatch(
+        errorSubClass = "INVALID_PERCENTAGES.MUST_CONSTANT",

Review Comment:
   Pass just an error sub-classes of `DATATYPE_MISMATCH`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a diff in pull request #38234: [SPARK-40761][SQL] Migrate type check failures of percentile expressions onto error classes

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on code in PR #38234:
URL: https://github.com/apache/spark/pull/38234#discussion_r995704873


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala:
##########
@@ -118,17 +118,46 @@ case class ApproximatePercentile(
     val defaultCheck = super.checkInputDataTypes()
     if (defaultCheck.isFailure) {
       defaultCheck
-    } else if (!percentageExpression.foldable || !accuracyExpression.foldable) {
-      TypeCheckFailure(s"The accuracy or percentage provided must be a constant literal")
+    } else if (!percentageExpression.foldable) {
+      DataTypeMismatch(
+        errorSubClass = "NON_FOLDABLE_INPUT",
+        messageParameters = Map(
+          "inputName" -> "percentage(s)",
+          "inputType" -> "double",

Review Comment:
   Use, `toSQLType`



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala:
##########
@@ -118,17 +118,46 @@ case class ApproximatePercentile(
     val defaultCheck = super.checkInputDataTypes()
     if (defaultCheck.isFailure) {
       defaultCheck
-    } else if (!percentageExpression.foldable || !accuracyExpression.foldable) {
-      TypeCheckFailure(s"The accuracy or percentage provided must be a constant literal")
+    } else if (!percentageExpression.foldable) {
+      DataTypeMismatch(
+        errorSubClass = "NON_FOLDABLE_INPUT",
+        messageParameters = Map(
+          "inputName" -> "percentage(s)",
+          "inputType" -> "double",
+          "inputExpr" -> percentageExpression.toString

Review Comment:
   Please, use `toSQLExpr()`



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala:
##########
@@ -118,17 +118,46 @@ case class ApproximatePercentile(
     val defaultCheck = super.checkInputDataTypes()
     if (defaultCheck.isFailure) {
       defaultCheck
-    } else if (!percentageExpression.foldable || !accuracyExpression.foldable) {
-      TypeCheckFailure(s"The accuracy or percentage provided must be a constant literal")
+    } else if (!percentageExpression.foldable) {
+      DataTypeMismatch(
+        errorSubClass = "NON_FOLDABLE_INPUT",
+        messageParameters = Map(
+          "inputName" -> "percentage(s)",

Review Comment:
   Is it a parameter. If so, could you use parameter name from its description:
   _FUNC_(col, **percentage** [, accuracy])



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala:
##########
@@ -118,17 +118,46 @@ case class ApproximatePercentile(
     val defaultCheck = super.checkInputDataTypes()
     if (defaultCheck.isFailure) {
       defaultCheck
-    } else if (!percentageExpression.foldable || !accuracyExpression.foldable) {
-      TypeCheckFailure(s"The accuracy or percentage provided must be a constant literal")
+    } else if (!percentageExpression.foldable) {
+      DataTypeMismatch(
+        errorSubClass = "NON_FOLDABLE_INPUT",
+        messageParameters = Map(
+          "inputName" -> "percentage(s)",
+          "inputType" -> "double",
+          "inputExpr" -> percentageExpression.toString
+        )
+      )
+    } else if (!accuracyExpression.foldable) {
+      DataTypeMismatch(
+        errorSubClass = "NON_FOLDABLE_INPUT",
+        messageParameters = Map(
+          "inputName" -> "accuracy",
+          "inputType" -> "int",
+          "inputExpr" -> accuracyExpression.toString
+        )
+      )
     } else if (accuracy <= 0 || accuracy > Int.MaxValue) {
-      TypeCheckFailure(s"The accuracy provided must be a literal between (0, ${Int.MaxValue}]" +
-        s" (current value = $accuracy)")
+      DataTypeMismatch(
+        errorSubClass = "VALUE_OUT_OF_RANGE",
+        messageParameters = Map(
+          "exprName" -> "accuracy",
+          "valueRange" -> s"(0, ${Int.MaxValue}]",
+          "currentValue" -> accuracy.toString

Review Comment:
   Use `toSQLValue()`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on pull request #38234: [SPARK-40761][SQL] Migrate type check failures of percentile expressions onto error classes

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on PR #38234:
URL: https://github.com/apache/spark/pull/38234#issuecomment-1279676492

   +1, LGTM. Merging to master.
   Thank you, @LuciferYang.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org