Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/12/16 23:39:23 UTC

[GitHub] [iceberg] mehtaashish23 opened a new issue #1949: Read options to read append and delete with overwrite snapshots in Incremental reads

mehtaashish23 opened a new issue #1949:
URL: https://github.com/apache/iceberg/issues/1949


   Currently, IncrementalDataTableScan doesn't support incremental reads on overwrite snapshots [here]. As a client, though, I should be able to read just the appended data or just the deleted data explicitly and construct an incremental reader at the application level.
   For instance, a client doing updates based on a primary key could reconstruct CDC by reading the appended data and the deleted data into separate DataFrames and then joining them on the client side.
   
   The reader should accept options:
   1. To read overwrite snapshots (to expose the data appended by them)
   2. To read only the deleted DataFiles during an incremental scan
   
   NOTE: This is to support clients that prefer a "copy on write" implementation with Iceberg for executing SQL such as DELETE or MERGE INTO (in the future).
   
   [here]: https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/IncrementalDataTableScan.java#L122


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] mehtaashish23 commented on issue #1949: Read options to read append and delete with overwrite snapshots in Incremental reads

Posted by GitBox <gi...@apache.org>.
mehtaashish23 commented on issue #1949:
URL: https://github.com/apache/iceberg/issues/1949#issuecomment-747106969


   @rdblue, @aokolnychyi, let me know your thoughts on this. We would like to contribute this to the community ASAP, since we are looking to use copy-on-write mode with DELETE/MERGE INTO (before switching to a "merge on read" implementation).
   
   cc: @dilipbiswal




[GitHub] [iceberg] rdblue commented on issue #1949: Read options to read append and delete with overwrite snapshots in Incremental reads

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #1949:
URL: https://github.com/apache/iceberg/issues/1949#issuecomment-747156346


   Can you give us a clearer example of what you're trying to support here? I'm trying to understand the use case for ignoring the deletes in an overwrite.




[GitHub] [iceberg] mehtaashish23 commented on issue #1949: Read options to read append and delete with overwrite snapshots in Incremental reads

Posted by GitBox <gi...@apache.org>.
mehtaashish23 commented on issue #1949:
URL: https://github.com/apache/iceberg/issues/1949#issuecomment-748189545


   @rdblue I started an email thread for this and will summarize the details here once it concludes, but here is the gist.
   
   Our main use case is primary-key-based datasets (like a MySQL binlog export), where DELETE and MERGE always update records based on the primary key. Because I know the primary key and updates are always keyed on it, I can reconstruct CDC from the table's appended and deleted data by taking a full outer join on the primary key between the two. That exposes which rows were updated, inserted, or deleted, along with the previous value for updates and deletes.
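
A minimal sketch of that client-side join, in plain Python rather than Spark, to make the classification concrete. Everything here is illustrative: the function name `reconstruct_cdc`, the `id`/`name` columns, and the sample rows are assumptions, not part of Iceberg or of any proposed API. It also assumes at most one row per primary key on each side, and that copy-on-write rewrites carry unchanged rows into the new files, so identical before/after rows are skipped.

```python
from typing import Dict, List

Row = Dict[str, object]


def reconstruct_cdc(appended: List[Row], deleted: List[Row], key: str) -> List[Row]:
    """Full outer join of appended and deleted rows on the primary key.

    Hypothetical helper: assumes one row per key on each side.
    """
    added_by_key = {row[key]: row for row in appended}
    deleted_by_key = {row[key]: row for row in deleted}

    changes = []
    # The union of keys from both sides is what makes this a full outer join.
    for k in sorted(added_by_key.keys() | deleted_by_key.keys()):
        before = deleted_by_key.get(k)
        after = added_by_key.get(k)
        if before is None:
            op = "insert"   # key only in the appended data
        elif after is None:
            op = "delete"   # key only in the deleted data
        elif before == after:
            continue        # row was rewritten unchanged, not a real change
        else:
            op = "update"   # key on both sides with different values
        changes.append({"op": op, "key": k, "before": before, "after": after})
    return changes


# Rows from the deleted data files and the appended data files of one overwrite.
deleted_rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "c"},
]
appended_rows = [
    {"id": 1, "name": "a"},   # carried over unchanged by the rewrite
    {"id": 2, "name": "B"},   # updated
    {"id": 4, "name": "d"},   # newly inserted
]

changes = reconstruct_cdc(appended_rows, deleted_rows, key="id")
for change in changes:
    print(change["op"], change["key"], change["before"], change["after"])
```

With Spark DataFrames, the same idea would be a full outer join on the key column followed by a filter on which side is null; the dictionary version above only shows the classification logic.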
   




[GitHub] [iceberg] ayush-san commented on issue #1949: Read options to read append and delete with overwrite snapshots in Incremental reads

Posted by GitBox <gi...@apache.org>.
ayush-san commented on issue #1949:
URL: https://github.com/apache/iceberg/issues/1949#issuecomment-856442050


   @mehtaashish23 We are also doing something similar. Were you able to contribute this feature?

