Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/07/16 09:59:13 UTC

[GitHub] [iceberg] GrigorievNick commented on issue #1202: [Question] Do Spark Iceberg module implement Copy On Write Delete and Update?

GrigorievNick commented on issue #1202:
URL: https://github.com/apache/iceberg/issues/1202#issuecomment-659308492


   > Both Spark 2.4 and Spark 3.0 support dynamic partition overwrite. Spark 3.0 also supports overwrite by expression, although the expression must match all rows in a data file or no rows of a data file, or else it will cause an exception because the granularity of delete is a whole data file.
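The file-granularity rule in the quoted reply can be illustrated with a toy model. This is a minimal sketch, not Iceberg's API: `DataFile` and `files_to_drop` are hypothetical names, and a data file is modeled as an in-memory tuple of rows. A delete that works purely on metadata can only drop whole files, so any file in which the predicate matches some rows but not others has to be rejected.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class DataFile:
    # Hypothetical stand-in for a data file: a path plus the rows it holds.
    path: str
    rows: Tuple[dict, ...]


def files_to_drop(files: List[DataFile], predicate: Callable[[dict], bool]) -> List[str]:
    """Whole-file delete: each file must match the predicate on all of its
    rows (the file is dropped) or on none of them (the file is kept)."""
    dropped = []
    for f in files:
        matches = [predicate(r) for r in f.rows]
        if all(matches):
            dropped.append(f.path)  # every row matches: drop the whole file
        elif any(matches):
            # Only some rows match: removing the file would lose live rows.
            raise ValueError(f"cannot delete from {f.path}: predicate matches only some rows")
    return dropped
```

With files partitioned by `day`, a predicate like `day == 1` succeeds; a predicate that splits a file's rows raises instead of silently rewriting the file.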
   
   But the overwrite implemented for delete is much smarter than overwriting all data in the partition:
   it changes only the files that contain changes, while a simple overwrite rewrites the whole partition.
   Of course, I could read all data from the partition -> manipulate it -> overwrite it,
   but I can do that with any code. What I am looking for is a way to update only the files that match the changes.
   So, as I understand it, there is no such solution right now, is that right?
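The difference between rewriting only the affected files and overwriting a whole partition can be sketched with a toy copy-on-write model. This is an illustration under stated assumptions, not Iceberg's implementation: `DataFile`, `copy_on_write_delete` and the `.rewritten` path suffix are all hypothetical, and files are in-memory tuples of rows.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class DataFile:
    # Hypothetical stand-in for a data file: a path plus the rows it holds.
    path: str
    rows: Tuple[dict, ...]


def copy_on_write_delete(
    files: List[DataFile], predicate: Callable[[dict], bool]
) -> Tuple[List[str], List[DataFile]]:
    """Return (paths to remove, replacement files). Only files containing at
    least one matching row are rewritten; all other files are left untouched."""
    to_remove: List[str] = []
    to_add: List[DataFile] = []
    for f in files:
        kept = tuple(r for r in f.rows if not predicate(r))
        if len(kept) == len(f.rows):
            continue  # no matching rows: this file is not rewritten
        to_remove.append(f.path)
        if kept:  # a file whose rows all match needs no replacement
            to_add.append(DataFile(f.path + ".rewritten", kept))
    return to_remove, to_add
```

If a partition holds three files and only one contains a matching row, this touches one file, whereas a partition overwrite would rewrite all three.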
   
   I can implement it manually using the low-level (Java core) API.
   But in that case I have one more question, which I can't find an answer to in the docs:
   is it possible to run concurrent [Table Operation](https://iceberg.apache.org/api/#table-metadata) -> `newRewrite` operations?
   A short explanation: different Spark partitions will each overwrite one or a few data files.
   Of course, each partition's work is idempotent, and the partitions run in parallel.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
