Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/01/07 06:03:54 UTC

[GitHub] [iceberg] mehtaashish23 commented on issue #2040: Partial data ingestion to Iceberg in failing with Spark 3.0.x

mehtaashish23 commented on issue #2040:
URL: https://github.com/apache/iceberg/issues/2040#issuecomment-755905969


   @rdblue: I tried to debug this. In Spark 3.0.x (not the latest master), `V2WriteCommand` gets the table (`NamedRelation`) with 3 columns in its output (see [here]), whereas in Spark 3.2.x (latest master) we are able to write partial data because we get only 2 columns there. I am not sure what major things have changed between the two, or whether the change is intended.
   
   NOTE: If I try to insert the data with an `INSERT` SQL statement in Spark 3.2.x (where `dataFrame.write` works fine), I hit the `AnalysisException` shown in the transcript below.
   
   So at this point it is not clear to me what the correct direction for resolving this is, given the inconsistent behavior between the two write paths.
   
   ```
   scala> sc.parallelize(Seq((s"f1ValueFile",s"f2ValueFile"))).toDF("f1","f2").registerTempTable("input")
   warning: there was one deprecation warning (since 2.0.0); for details, enable `:setting -deprecation' or `:replay -deprecation'
   
   scala> spark.read.format("iceberg").load(masterTablePath).registerTempTable("target")
   warning: there was one deprecation warning (since 2.0.0); for details, enable `:setting -deprecation' or `:replay -deprecation'
   
   scala> spark.sql("insert into target (select * from input)").show(false)
   org.apache.spark.sql.AnalysisException: Cannot write to 'file:////Users/ashmehta/git_code/aep/siphon-tools/test/172adb28-dff0-4063-bdea-0cd533c4c8de-master', not enough data columns:
   Table columns: 'f1', 'f2', 'f3'
   Data columns: 'f1', 'f2';
   ```
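   
   For reference, one possible interim workaround (not a fix for the inconsistency itself) is to pad the input with typed nulls so that its columns match the table's columns before writing. This is only a minimal sketch: `missingColumns` and `padToSchema` are hypothetical helper names, and it assumes a flat schema, case-sensitive column names, and that the missing column (`f3` here) is optional in the Iceberg table.
   
   ```scala
   import org.apache.spark.sql.DataFrame
   import org.apache.spark.sql.functions.{col, lit}
   import org.apache.spark.sql.types.StructType

   // Table columns that the incoming data does not provide
   // (exact, case-sensitive name match; hypothetical helper).
   def missingColumns(tableCols: Seq[String], dataCols: Seq[String]): Seq[String] =
     tableCols.filterNot(c => dataCols.contains(c))

   // Hypothetical helper: project `df` into the table's column order,
   // filling every absent column with a null cast to the table's type.
   // Assumes a flat schema and that the absent columns are optional.
   def padToSchema(df: DataFrame, tableSchema: StructType): DataFrame = {
     val missing = missingColumns(tableSchema.fieldNames.toSeq, df.columns.toSeq).toSet
     val projected = tableSchema.fields.map { field =>
       if (missing.contains(field.name)) lit(null).cast(field.dataType).as(field.name)
       else col(field.name).cast(field.dataType).as(field.name)
     }
     df.select(projected: _*)
   }
   ```
   
   With the tables from the transcript above, the padded frame would be built as `padToSchema(spark.table("input"), spark.table("target").schema)`, which yields columns `f1`, `f2`, `f3` with `f3` set to null. I have not verified that writing it succeeds on both Spark versions.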
   
   
   [here]: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala#L47


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


