You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by "dengziming (via GitHub)" <gi...@apache.org> on 2023/10/28 03:40:21 UTC

[PR] [SPARK-45710][SQL] Assign names to error _LEGACY_ERROR_TEMP_21[59,60,61,62] [spark]

dengziming opened a new pull request, #43567:
URL: https://github.com/apache/spark/pull/43567

   ### What changes were proposed in this pull request?
   The 4 errors are `[CONCAT/FLATTEN/CREATE/UNION]_ARRAYS_WITH_ELEMENTS_EXCEED_LIMIT`,
   CONCAT_XXX is used in concat/array_insert;
   FLATTEN_XXX is only used in flatten;
   CREATE_XXX is used in array_repeat/array_insert/array_distinct/array_union/array_intersect/array_remove;
   UNION_XXX is used in array_union/array_distinct;
   
   I changed all inconsistent usages to `CREATE_XXX`, and `CONCAT/FLATTEN/UNION_XXX` is only used in concat/flatten/union respectively.
   
   
   ### Why are the changes needed?
   To assign proper name as a part of activity in SPARK-37935.
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, the error message will include the error class name.
   
   
   ### How was this patch tested?
   CREATE can be tested from use code, CONCAT/FLATTEN is tested using a `ColumnarArray`, UNION can't be tested since it will deduplicate the data can create a array which will cause OOM.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-45710][SQL] Assign names to error _LEGACY_ERROR_TEMP_21[59,60,61,62] [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.

MaxGekk commented on PR #43567:
URL: https://github.com/apache/spark/pull/43567#issuecomment-1793792762

   The failed python GA is not related to the changes, and it passed a couple commits above.
   +1, LGTM. Merging to master.
   Thank you, @dengziming.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-45710][SQL] Assign names to error _LEGACY_ERROR_TEMP_21[59,60,61,62] [spark]

Posted by "dengziming (via GitHub)" <gi...@apache.org>.

dengziming commented on PR #43567:
URL: https://github.com/apache/spark/pull/43567#issuecomment-1783686054

   The use of the 4 similar errors is a bit confusing, we have separate errors for flatten/concat/union, but we are using `CREATE_XXX` for array_repeat/array_insert/array_distinct/...
   I firstly move `CREATE_XXX` to a subclass of `INVALID_PARAMETER_VALUE`, but moved it out since it's used in other places.
   Do you think we should merge these 4 errors into one `CREATE_ARRAYS_WITH_ELEMENTS_EXCEED_LIMIT`. @MaxGekk 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-45710][SQL] Assign names to error _LEGACY_ERROR_TEMP_21[59,60,61,62] [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.

MaxGekk commented on code in PR #43567:
URL: https://github.com/apache/spark/pull/43567#discussion_r1382526287


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala:
##########
@@ -2586,13 +2577,14 @@ private[sql] object QueryExecutionErrors extends QueryErrorsBase with ExecutionE
   }
 
   def tooManyArrayElementsError(
-      numElements: Int,
-      elementSize: Int): SparkIllegalArgumentException = {
+      numElements: Long,
+      maxRoundedArrayLength: Int): SparkIllegalArgumentException = {
     new SparkIllegalArgumentException(
-      errorClass = "TOO_MANY_ARRAY_ELEMENTS",
+      errorClass = "COLLECTION_SIZE_LIMIT_EXCEEDED",

Review Comment:
   This should be a sub-class.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-45710][SQL] Assign names to error _LEGACY_ERROR_TEMP_21[59,60,61,62] [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.

MaxGekk commented on code in PR #43567:
URL: https://github.com/apache/spark/pull/43567#discussion_r1382362098


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -405,6 +405,29 @@
     ],
     "sqlState" : "42704"
   },
+  "COLLECTION_SIZE_LIMIT_EXCEEDED" : {
+    "message" : [
+      "Can't create array with <numberOfElements> elements which exceeding the array size limit <maxRoundedArrayLength>,"
+    ],
+    "subClass" : {
+      "FUNCTION" : {
+        "message" : [
+          "unsuccessful try to create arrays in function <functionName>."
+        ]
+      },
+      "INITIALIZE" : {
+        "message" : [
+          "cannot initialize array with specified parameters."

Review Comment:
   ```suggestion
             "cannot initialize an array with specified parameters."
   ```



##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -405,6 +405,29 @@
     ],
     "sqlState" : "42704"
   },
+  "COLLECTION_SIZE_LIMIT_EXCEEDED" : {
+    "message" : [
+      "Can't create array with <numberOfElements> elements which exceeding the array size limit <maxRoundedArrayLength>,"
+    ],
+    "subClass" : {
+      "FUNCTION" : {
+        "message" : [
+          "unsuccessful try to create arrays in function <functionName>."
+        ]
+      },
+      "INITIALIZE" : {
+        "message" : [
+          "cannot initialize array with specified parameters."
+        ]
+      },
+      "PARAMETER" : {
+        "message" : [
+          "the value of parameter(s) <parameter> in <functionName> is invalid."

Review Comment:
   ```suggestion
             "the value of parameter(s) <parameter> in the function <functionName> is invalid."
   ```



##########
sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala:
##########
@@ -1095,6 +1098,55 @@ class QueryExecutionErrorsSuite
       )
     )
   }
+
+  test("Elements exceed limit for concat()") {
+    val array = new ColumnarArray(
+      new ConstantColumnVector(Int.MaxValue, BooleanType), 0, Int.MaxValue)
+
+    checkError(
+      exception = intercept[SparkRuntimeException] {
+        Concat(Seq(Literal.create(array, ArrayType(BooleanType)))).eval(EmptyRow)
+      },
+      errorClass = "COLLECTION_SIZE_LIMIT_EXCEEDED.FUNCTION",
+      parameters = Map(
+        "numberOfElements" -> Int.MaxValue.toString,
+        "maxRoundedArrayLength" -> MAX_ROUNDED_ARRAY_LENGTH.toString,
+        "functionName" -> toSQLId("concat")
+      )
+    )
+  }
+
+  test("Elements exceed limit for flatten()") {
+    val array = new ColumnarArray(
+      new ConstantColumnVector(Int.MaxValue, BooleanType), 0, Int.MaxValue)
+
+    checkError(
+      exception = intercept[SparkRuntimeException] {
+        Flatten(CreateArray(Seq(Literal.create(array, ArrayType(BooleanType))))).eval(EmptyRow)
+      },
+      errorClass = "COLLECTION_SIZE_LIMIT_EXCEEDED.FUNCTION",
+      parameters = Map(
+        "numberOfElements" -> Int.MaxValue.toString,
+        "maxRoundedArrayLength" -> MAX_ROUNDED_ARRAY_LENGTH.toString,
+        "functionName" -> toSQLId("flatten")
+      )
+    )
+  }
+
+  test("Elements exceed limit for array_repeat()") {
+    checkError(
+      exception = intercept[SparkRuntimeException] {
+        sql("select array_repeat(1, 2147483647)").collect()
+      },
+      errorClass = "COLLECTION_SIZE_LIMIT_EXCEEDED.PARAMETER",
+      parameters = Map(
+        "parameter" -> toSQLId("count"),
+        "numberOfElements" -> "2147483647",

Review Comment:
   nit: place to a val if it is the same as in `sql`



##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -405,6 +405,29 @@
     ],
     "sqlState" : "42704"
   },
+  "COLLECTION_SIZE_LIMIT_EXCEEDED" : {
+    "message" : [
+      "Can't create array with <numberOfElements> elements which exceeding the array size limit <maxRoundedArrayLength>,"
+    ],
+    "subClass" : {
+      "FUNCTION" : {
+        "message" : [
+          "unsuccessful try to create arrays in function <functionName>."

Review Comment:
   ```suggestion
             "unsuccessful try to create arrays in the function <functionName>."
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-45710][SQL] Assign names to error _LEGACY_ERROR_TEMP_21[59,60,61,62] [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.

MaxGekk closed pull request #43567: [SPARK-45710][SQL] Assign names to error _LEGACY_ERROR_TEMP_21[59,60,61,62]
URL: https://github.com/apache/spark/pull/43567


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org