Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/10/28 08:55:01 UTC

[GitHub] [spark] juliuszsompolski commented on a diff in pull request #38397: [SPARK-40918][SQL] Mismatch between FileSourceScanExec and Orc and ParquetFileFormat on producing columnar output

juliuszsompolski commented on code in PR #38397:
URL: https://github.com/apache/spark/pull/38397#discussion_r1007823310


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala:
##########
@@ -126,9 +136,24 @@ class OrcFileFormat
 
     val resultSchema = StructType(requiredSchema.fields ++ partitionSchema.fields)
     val sqlConf = sparkSession.sessionState.conf
-    val enableVectorizedReader = supportBatch(sparkSession, resultSchema)
     val capacity = sqlConf.orcVectorizedReaderBatchSize
 
+    // Should always be set by FileSourceScanExec creating this.
+    // Check conf before checking option, to allow working around an issue by changing conf.
+    val enableVectorizedReader = sqlConf.orcVectorizedReaderEnabled &&
+      options.get(FileFormat.OPTION_RETURNING_BATCH)
+        .getOrElse {
+          throw new IllegalArgumentException(
+            "OPTION_RETURNING_BATCH should always be set for OrcFileFormat. " +
+              "To work around this issue, set spark.sql.orc.enableVectorizedReader=false.")

Review Comment:
   > Is this a correct recommendation? Why not recommend to set OPTION_RETURNING_BATCH?
   
   @dongjoon-hyun Passing OPTION_RETURNING_BATCH is something that the developer of the calling code, which failed to set this option, can do. An end user who runs into this issue by hitting some code path that doesn't set the option has a different remedy: disabling spark.sql.orc.enableVectorizedReader. Hence the message calls it a "workaround", not a "fix".
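
To illustrate why the conf is checked before the option, here is a minimal standalone sketch of the same short-circuit pattern. It is not the PR's code: `ReturningBatchSketch`, the `confVectorizedEnabled` parameter, the key string `"returning_batch"`, and the trailing `.toBoolean` are assumptions made for illustration, and the diff above is cut off before the point where the option value is converted.

```scala
// Hypothetical stand-in for the check in the diff; does not use Spark classes.
object ReturningBatchSketch {
  // Assumed key name, for illustration only.
  val OptionReturningBatch = "returning_batch"

  def enableVectorizedReader(
      confVectorizedEnabled: Boolean,
      options: Map[String, String]): Boolean = {
    // `&&` short-circuits: when the conf is false, the option is never
    // looked up, so a missing option cannot throw.
    confVectorizedEnabled &&
      options.get(OptionReturningBatch)
        .getOrElse {
          throw new IllegalArgumentException(
            "OPTION_RETURNING_BATCH should always be set. " +
              "To work around this issue, disable the vectorized reader conf.")
        }
        .toBoolean // Assumed conversion; the original diff is truncated here.
  }
}
```

With the conf disabled, the call returns false even when the option is absent, which is what makes the conf usable as an end-user workaround. With the conf enabled and the option absent, the call throws, which points the developer of the calling code at the missing option.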



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

