Posted to reviews@spark.apache.org by "LuciferYang (via GitHub)" <gi...@apache.org> on 2024/01/05 05:52:42 UTC

Re: [PR] [SPARK-46598][SQL] OrcColumnarBatchReader should use ConstantColumnVector for missing columns [spark]

LuciferYang commented on PR #44598:
URL: https://github.com/apache/spark/pull/44598#issuecomment-1878163361

   ```
   [info] - SPARK-39557 INSERT INTO statements with tables with array defaults *** FAILED *** (448 milliseconds)
   [info]   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 711.0 failed 1 times, most recent failure: Lost task 0.0 in stage 711.0 (TID 965) (localhost executor driver): java.lang.RuntimeException: DataType ARRAY<INT> is not supported in column vectorized reader.
   [info] 	at org.apache.spark.sql.execution.vectorized.ColumnVectorUtils.populate(ColumnVectorUtils.java:96)
   [info] 	at org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.initBatch(OrcColumnarBatchReader.java:197)
   [info] 	at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$2(OrcFileFormat.scala:214)
   [info] 	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:218)
   ```
   It seems that using `ConstantColumnVector` here would require some refactoring of the `ColumnVectorUtils.populate` method, which, as the trace shows, currently rejects `ARRAY<INT>`.
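
To illustrate the failure mode in the trace above, here is a toy Java model. It is not Spark's actual code: `PopulateSketch`, `SUPPORTED`, and this `populate` signature are hypothetical stand-ins. It only shows how a populate step that accepts a fixed set of types rejects `ARRAY<INT>` with the same message, which is why array defaults would need extra handling.

```java
import java.util.Set;

// Toy model only, NOT Spark code: every name here is a hypothetical stand-in.
public class PopulateSketch {
    // Types this toy "constant vector" can hold as a single repeated value.
    static final Set<String> SUPPORTED = Set.of("INT", "BIGINT", "DOUBLE", "STRING");

    // Accepts supported types and rejects everything else, mirroring the
    // error message seen in the stack trace.
    static String populate(String dataType, Object value) {
        if (!SUPPORTED.contains(dataType)) {
            throw new RuntimeException(
                "DataType " + dataType + " is not supported in column vectorized reader.");
        }
        return dataType + "=" + value;
    }

    public static void main(String[] args) {
        // A primitive default value is accepted.
        System.out.println(populate("INT", 42));
        // An array default value is rejected, as in the failing test.
        try {
            populate("ARRAY<INT>", new int[] {1, 2});
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch, supporting array defaults would mean adding a branch for complex types rather than throwing. What that branch should look like in Spark is left open here.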


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

