Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2019/11/13 17:39:41 UTC

[GitHub] [incubator-iceberg] xabriel opened a new pull request #640: Allow ordered projections when writing

URL: https://github.com/apache/incubator-iceberg/pull/640
 
 
   Given a table with schema `{ id: Int, data: String }`, we want to be able to
   write back a projection of its columns:
   
   ```
   spark.read
     .format("iceberg")
     .load(...)
     .select("id")
     .write
     .format("iceberg")
     .mode("append")
     .save(...)
   ```
   
   
   Previously, this write failed with:
   ```
   java.lang.AssertionError: index (1) should < 1
   	at org.apache.spark.sql.catalyst.expressions.UnsafeRow.assertIndexIsValid(UnsafeRow.java:131)
   	at org.apache.spark.sql.catalyst.expressions.UnsafeRow.isNullAt(UnsafeRow.java:352)
   	at org.apache.spark.sql.catalyst.expressions.UnsafeRow.get(UnsafeRow.java:308)
   	at org.apache.iceberg.spark.data.SparkParquetWriters$InternalRowWriter.get(SparkParquetWriters.java:471)
   	at org.apache.iceberg.spark.data.SparkParquetWriters$InternalRowWriter.get(SparkParquetWriters.java:453)
   	at org.apache.iceberg.parquet.ParquetValueWriters$StructWriter.write(ParquetValueWriters.java:444)
   	at org.apache.iceberg.parquet.ParquetWriter.add(ParquetWriter.java:110)
   	at org.apache.iceberg.spark.source.Writer$BaseWriter.writeInternal(Writer.java:388)
   	at org.apache.iceberg.spark.source.Writer$UnpartitionedWriter.write(Writer.java:472)
   	at org.apache.iceberg.spark.source.Writer$UnpartitionedWriter.write(Writer.java:455)
   	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$$anonfun$run$3.apply(WriteToDataSourceV2Exec.scala:118)
   	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$$anonfun$run$3.apply(WriteToDataSourceV2Exec.scala:116)
   	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
   	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:146)
   	at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:67)
   	at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:66)
   	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
   	at org.apache.spark.scheduler.Task.run(Task.scala:123)
   	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
   	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
   	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   ```
   
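   The stack trace shows Iceberg reading field index 1 from an `UnsafeRow` that has only one field, which suggests the writer expects every `UnsafeRow` to match `table.schema` rather than the schema of the data being written.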
   
   With this PR, we pass the write schema down to the writers, so writes of ordered projections are allowed.
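
To illustrate the failure mode, here is a toy model in plain Java. It is not Iceberg or Spark code: `ProjectionWriteSketch`, `Row`, and `write` are hypothetical names, and the bounds check only imitates the message seen in the stack trace. It sketches what happens when a positional writer is built from the two-column table schema but receives a one-field row, and how building it from the write schema avoids the out-of-range read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only; every name here is hypothetical and not part of Iceberg or Spark. */
final class ProjectionWriteSketch {

  /** Stand-in for a row with positional access and a bounds check. */
  static final class Row {
    private final Object[] values;

    Row(Object... values) {
      this.values = values;
    }

    Object get(int index) {
      if (index < 0 || index >= values.length) {
        throw new IndexOutOfBoundsException(
            "index (" + index + ") should < " + values.length);
      }
      return values[index];
    }
  }

  /** Reads one value per column of the given schema, by position. */
  static List<Object> write(List<String> schemaColumns, Row row) {
    List<Object> written = new ArrayList<>();
    for (int i = 0; i < schemaColumns.size(); i++) {
      written.add(row.get(i));
    }
    return written;
  }

  public static void main(String[] args) {
    List<String> tableSchema = Arrays.asList("id", "data");
    List<String> writeSchema = Arrays.asList("id");
    Row projected = new Row(1); // the row produced by select("id")

    // Writer driven by the write schema: reads only index 0.
    System.out.println(write(writeSchema, projected));

    // Writer driven by the table schema: also tries index 1 and fails.
    try {
      write(tableSchema, projected);
    } catch (IndexOutOfBoundsException e) {
      System.out.println("failed: " + e.getMessage());
    }
  }
}
```

In this sketch the only difference between the two calls is which schema drives the loop over field positions, which is the mismatch the stack trace hints at.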
