You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/10/26 01:55:46 UTC

[GitHub] [spark] HyukjinKwon commented on a change in pull request #29421: [SPARK-32388][SQL][test-hadoop2.7][test-hive1.2] TRANSFORM with schema-less mode should keep the same with hive

HyukjinKwon commented on a change in pull request #29421:
URL: https://github.com/apache/spark/pull/29421#discussion_r511684691



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/BaseScriptTransformationExec.scala
##########
@@ -111,15 +111,14 @@ trait BaseScriptTransformationExec extends UnaryExecNode {
               .zip(outputFieldWriters)
               .map { case (data, writer) => writer(data) })
       } else {
-        // In schema less mode, hive default serde will choose first two output column as output
-        // if output column size less then 2, it will throw ArrayIndexOutOfBoundsException.
-        // Here we change spark's behavior same as hive's default serde.
-        // But in hive, TRANSFORM with schema less behavior like origin spark, we will fix this
-        // to keep spark and hive behavior same in SPARK-32388
+        // In schema less mode, hive will choose first two output column as output.
+        // If output column size less then 2, it will return NULL for columns with missing values.
+        // Here we split row string and choose first 2 values, if values's size less then 2,
+        // we pad NULL value until 2 to make behavior same with hive.
         val kvWriter = CatalystTypeConverters.createToCatalystConverter(StringType)
         prevLine: String =>
           new GenericInternalRow(
-            prevLine.split(outputRowFormat).slice(0, 2)
+            prevLine.split(outputRowFormat).slice(0, 2).padTo(2, null)

Review comment:
       @AngersZhuuuu, can you add a configuration to control the legacy behaviour, and add a note to the migration guide since it's arguable to say this is a bug? With that, I think I can leave my sign-off and merge.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org