Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/06/01 21:32:02 UTC

[GitHub] [iceberg] SreeramGarlapati commented on pull request #2611: Spark3 structured streaming micro_batch read support

SreeramGarlapati commented on pull request #2611:
URL: https://github.com/apache/iceberg/pull/2611#issuecomment-852459405


   @holdenk - thanks for your first pass on the PR. I would also be happy to hear the thoughts of https://github.com/viirya, but I am unable to tag them. Please tag them if you can.
   
   The GDPR point in the PR description is about choosing between:
   a) ignoring deletes by default, or
   b) requiring a special flag to ignore deletes.
   
   The reasoning is as follows:
   * In Spark Structured Streaming, we stream the Iceberg table row by row, so there is no way to stream deletes from the table.
   * So when we encounter a delete, we are left with 2 choices:
       1. Fail with `UnsupportedSnapshotDataOperation - DELETE`.
       2. Ignore the delete and move on.
   * Almost all users of Iceberg tables want to be GDPR compliant.
   * That means they will want to delete some rows from their Iceberg table and still stream reads out of that table.
   * So if we throw UnsupportedOperation whenever we encounter a Snapshot of type Delete while reading from an Iceberg table, potentially every table out there will need to handle this!
   * My proposal is therefore to accept that Iceberg tables will have GDPR deletes: if the Iceberg table has Snapshots that are marked as Delete, we ignore those Snapshots for streaming read purposes. In a later PR I will expose a Spark option that gives the ability to fail the streaming read if a Delete is encountered.
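
To make the two choices concrete, here is a minimal sketch of the proposed behaviour in plain Python. It is only a model, not the code in this PR: `Snapshot`, `UnsupportedSnapshotOperation`, `stream_rows` and the `fail_on_delete` flag are hypothetical names invented for illustration, and the sketch only considers append and delete snapshots.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for a table snapshot (not Iceberg's real class)."""
    snapshot_id: int
    operation: str              # "append" or "delete" in this sketch
    rows: Tuple[str, ...] = ()  # rows added by an append snapshot


class UnsupportedSnapshotOperation(Exception):
    """Hypothetical error for choice 1: fail when a delete is encountered."""


def stream_rows(snapshots: Iterable[Snapshot],
                fail_on_delete: bool = False) -> Iterator[str]:
    """Yield appended rows in snapshot order.

    Delete snapshots are skipped by default (choice 2). With the
    hypothetical fail_on_delete flag set, the read fails instead (choice 1).
    """
    for snap in snapshots:
        if snap.operation == "delete":
            if fail_on_delete:
                raise UnsupportedSnapshotOperation(
                    f"snapshot {snap.snapshot_id}: DELETE")
            continue  # ignore the delete and move on
        yield from snap.rows


history = [
    Snapshot(1, "append", ("a", "b")),
    Snapshot(2, "delete"),
    Snapshot(3, "append", ("c",)),
]
print(list(stream_rows(history)))
```

With the default, the stream keeps going past the delete snapshot; with `fail_on_delete=True` the same history raises as soon as snapshot 2 is reached, which is the opt-in behaviour I would expose in a later PR.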
   
   Does this make sense? Happy to discuss.

