You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "dengziming (via GitHub)" <gi...@apache.org> on 2023/09/15 05:02:48 UTC

[GitHub] [spark] dengziming opened a new pull request, #42939: SPARK-43254: Assign a name to the error _LEGACY_ERROR_TEMP_2018

dengziming opened a new pull request, #42939:
URL: https://github.com/apache/spark/pull/42939

   ### What changes were proposed in this pull request?
   Assign the name `CLASS_UNSUPPORTED_BY_MAP_OBJECTS` to the legacy error class `_LEGACY_ERROR_TEMP_2018`.
   
   
   ### Why are the changes needed?
   To assign proper name as a part of activity in SPARK-37935
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, the error message will include the error class name
   
   
   ### How was this patch tested?
   Add a unit test to produce the error from user code.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on pull request #42939: [SPARK-43254][SQL] Assign a name to the error _LEGACY_ERROR_TEMP_2018

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on PR #42939:
URL: https://github.com/apache/spark/pull/42939#issuecomment-1728998636

   +1, LGTM. Merging to master.
   Thank you, @dengziming.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dengziming commented on pull request #42939: [SPARK-43254][SQL] Assign a name to the error _LEGACY_ERROR_TEMP_2018

Posted by "dengziming (via GitHub)" <gi...@apache.org>.
dengziming commented on PR #42939:
URL: https://github.com/apache/spark/pull/42939#issuecomment-1728918587

   The CI passed, ping @MaxGekk 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on pull request #42939: SPARK-43254: Assign a name to the error _LEGACY_ERROR_TEMP_2018

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on PR #42939:
URL: https://github.com/apache/spark/pull/42939#issuecomment-1722537839

   @dengziming Please, format PR's title as other using tags `[SPARK-XXXXX][SQL] Assign ...`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42939: SPARK-43254: Assign a name to the error _LEGACY_ERROR_TEMP_2018

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #42939:
URL: https://github.com/apache/spark/pull/42939#discussion_r1328174627


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala:
##########
@@ -170,7 +170,7 @@ object ExpressionEncoder {
    * Function that deserializes an [[InternalRow]] into an object of type `T`. This class is not
    * thread-safe.
    */
-  class Deserializer[T](private val expressions: Seq[Expression])
+  class Deserializer[T](val expressions: Seq[Expression])

Review Comment:
   I think for testing purpose https://github.com/apache/spark/pull/42939/files#diff-b98c99535d2b28cb47774860d500030e732c244c55b1ac05aead5d1cf1e7a602R2585.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk closed pull request #42939: [SPARK-43254][SQL] Assign a name to the error _LEGACY_ERROR_TEMP_2018

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk closed pull request #42939: [SPARK-43254][SQL] Assign a name to the error _LEGACY_ERROR_TEMP_2018
URL: https://github.com/apache/spark/pull/42939


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a diff in pull request #42939: SPARK-43254: Assign a name to the error _LEGACY_ERROR_TEMP_2018

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #42939:
URL: https://github.com/apache/spark/pull/42939#discussion_r1328128269


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala:
##########
@@ -170,7 +170,7 @@ object ExpressionEncoder {
    * Function that deserializes an [[InternalRow]] into an object of type `T`. This class is not
    * thread-safe.
    */
-  class Deserializer[T](private val expressions: Seq[Expression])
+  class Deserializer[T](val expressions: Seq[Expression])

Review Comment:
   Could you clarify why do you need this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dengziming commented on a diff in pull request #42939: [SPARK-43254][SQL] Assign a name to the error _LEGACY_ERROR_TEMP_2018

Posted by "dengziming (via GitHub)" <gi...@apache.org>.
dengziming commented on code in PR #42939:
URL: https://github.com/apache/spark/pull/42939#discussion_r1328286795


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala:
##########
@@ -170,7 +170,7 @@ object ExpressionEncoder {
    * Function that deserializes an [[InternalRow]] into an object of type `T`. This class is not
    * thread-safe.
    */
-  class Deserializer[T](private val expressions: Seq[Expression])
+  class Deserializer[T](val expressions: Seq[Expression])

Review Comment:
   Yes, change for testing.



##########
sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala:
##########
@@ -2561,6 +2563,40 @@ class DatasetSuite extends QueryTest
 
     checkDataset(ds.filter(f(col("_1"))), Tuple1(ValueClass(2)))
   }
+
+  test("CLASS_UNSUPPORTED_BY_MAP_OBJECTS when creating dataset") {

Review Comment:
   Not sure is this the best place for this test case, currently we don't have test case for creating dataset using `spark.createDatSet`, so I add it to here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on pull request #42939: [SPARK-43254][SQL] Assign a name to the error _LEGACY_ERROR_TEMP_2018

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on PR #42939:
URL: https://github.com/apache/spark/pull/42939#issuecomment-1726452334

   @dengziming Could you rebase on the recent master, please.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dengziming commented on a diff in pull request #42939: [SPARK-43254][SQL] Assign a name to the error _LEGACY_ERROR_TEMP_2018

Posted by "dengziming (via GitHub)" <gi...@apache.org>.
dengziming commented on code in PR #42939:
URL: https://github.com/apache/spark/pull/42939#discussion_r1329790053


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -332,6 +332,11 @@
     ],
     "sqlState" : "22003"
   },
+  "CLASS_UNSUPPORTED_BY_MAP_OBJECTS" : {
+    "message" : [
+      "class `<cls>` is not supported by `MapObjects` as resulting collection."

Review Comment:
   Nice improvement, done.



##########
sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala:
##########
@@ -2561,6 +2563,40 @@ class DatasetSuite extends QueryTest
 
     checkDataset(ds.filter(f(col("_1"))), Tuple1(ValueClass(2)))
   }
+
+  test("CLASS_UNSUPPORTED_BY_MAP_OBJECTS when creating dataset") {
+    withSQLConf(
+      // Set CODEGEN_FACTORY_MODE to default value to reproduce CLASS_UNSUPPORTED_BY_MAP_OBJECTS
+      SQLConf.CODEGEN_FACTORY_MODE.key -> CodegenObjectFactoryMode.NO_CODEGEN.toString) {
+      // Create our own encoder to cover the default encoder from spark.implicits._
+      implicit val im: ExpressionEncoder[Array[Int]] = ExpressionEncoder(
+        AgnosticEncoders.IterableEncoder(
+          ClassTag(classOf[Array[Int]]), BoxedIntEncoder, false, false))
+
+      val df = spark.createDataset(Seq(Array(1)))
+      val exception = intercept[org.apache.spark.SparkRuntimeException] {
+        df.collect()
+      }
+      val expressions = im.resolveAndBind(df.queryExecution.logical.output,
+        spark.sessionState.analyzer)
+        .createDeserializer().expressions
+
+      // Expression decoding error
+      checkError(
+        exception = exception,
+        errorClass = "_LEGACY_ERROR_TEMP_2151",

Review Comment:
   Yes, I created SPARK-45213 for this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a diff in pull request #42939: [SPARK-43254][SQL] Assign a name to the error _LEGACY_ERROR_TEMP_2018

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #42939:
URL: https://github.com/apache/spark/pull/42939#discussion_r1329724651


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -332,6 +332,11 @@
     ],
     "sqlState" : "22003"
   },
+  "CLASS_UNSUPPORTED_BY_MAP_OBJECTS" : {
+    "message" : [
+      "class `<cls>` is not supported by `MapObjects` as resulting collection."

Review Comment:
   Let's change the order:
   ```suggestion
         "`MapObjects` does not support the class <cls> as resulting collection."
   ```



##########
sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala:
##########
@@ -2561,6 +2563,40 @@ class DatasetSuite extends QueryTest
 
     checkDataset(ds.filter(f(col("_1"))), Tuple1(ValueClass(2)))
   }
+
+  test("CLASS_UNSUPPORTED_BY_MAP_OBJECTS when creating dataset") {
+    withSQLConf(
+      // Set CODEGEN_FACTORY_MODE to default value to reproduce CLASS_UNSUPPORTED_BY_MAP_OBJECTS
+      SQLConf.CODEGEN_FACTORY_MODE.key -> CodegenObjectFactoryMode.NO_CODEGEN.toString) {
+      // Create our own encoder to cover the default encoder from spark.implicits._
+      implicit val im: ExpressionEncoder[Array[Int]] = ExpressionEncoder(
+        AgnosticEncoders.IterableEncoder(
+          ClassTag(classOf[Array[Int]]), BoxedIntEncoder, false, false))
+
+      val df = spark.createDataset(Seq(Array(1)))
+      val exception = intercept[org.apache.spark.SparkRuntimeException] {
+        df.collect()
+      }
+      val expressions = im.resolveAndBind(df.queryExecution.logical.output,
+        spark.sessionState.analyzer)
+        .createDeserializer().expressions
+
+      // Expression decoding error
+      checkError(
+        exception = exception,
+        errorClass = "_LEGACY_ERROR_TEMP_2151",

Review Comment:
   Could you open an JIRA for this, and point out the test, please.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org