Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/07/03 01:19:03 UTC

[GitHub] [iceberg] rdblue commented on issue #360: Spec: Add column equality delete files

rdblue commented on issue #360:
URL: https://github.com/apache/iceberg/issues/360#issuecomment-653281824


   I think we're in agreement on a few points for moving forward:
   
   * We will use a static schema for equality deletes
   * We need to be able to reconstruct an equivalent stream of changes for streaming CDC pipelines
   * We should add a way to encode all of the columns for an equality delete and identify the subset used for deletion (for efficiency)
   * We should add a way to encode all columns into position delete files
   * For the CDC case, we'll first assume that we have the entire deleted row in delete events
   * We should handle a stream of upserts as a separate use case
   
   The doc describes ways to use both equality and position deletes for CDC streams. It sounds like equality deletes would be ideal if (1) events have a unique ID and (2) the execution has exactly-once semantics. Otherwise, I think it is possible to use position deletes. Which do you plan to target?
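
To make the trade-off between the two delete types concrete, here is a minimal Python sketch. It is a toy in-memory model, not Iceberg's API or file format: `EqualityDelete`, `PositionDelete`, `apply_equality_deletes`, `apply_position_deletes`, and the file name `data-0.parquet` are all hypothetical names used only for illustration. It assumes a delete record carries the full deleted row plus the subset of columns used for matching, as proposed in the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EqualityDelete:
    """Hypothetical record: the full deleted row plus the columns to match on."""
    row: dict
    equality_columns: tuple


@dataclass(frozen=True)
class PositionDelete:
    """Hypothetical record: a data file, a row position, and optionally the row."""
    file_path: str
    pos: int
    row: Optional[dict] = None


def apply_equality_deletes(rows, deletes):
    """Drop every row whose equality columns match any delete record."""
    keys = {
        (d.equality_columns, tuple(d.row[c] for c in d.equality_columns))
        for d in deletes
    }
    column_sets = {d.equality_columns for d in deletes}
    return [
        r for r in rows
        if not any((cols, tuple(r[c] for c in cols)) in keys for cols in column_sets)
    ]


def apply_position_deletes(file_path, rows, deletes):
    """Drop only the rows at the listed positions in the given file."""
    dead = {d.pos for d in deletes if d.file_path == file_path}
    return [r for i, r in enumerate(rows) if i not in dead]


# id=1 appears twice, e.g. an event replayed without exactly-once semantics.
rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 1, "name": "a"},
]

# The equality delete stores the whole row but matches on "id" only,
# so it removes both copies of id=1.
after_equality = apply_equality_deletes(
    rows, [EqualityDelete(row={"id": 1, "name": "a"}, equality_columns=("id",))]
)

# The position delete removes exactly one row: position 0 of this file.
after_position = apply_position_deletes(
    "data-0.parquet", rows, [PositionDelete("data-0.parquet", 0)]
)
```

In this toy model the equality delete removes every row with a matching `id`, which is only the intended result when IDs are unique and events are not duplicated. The position delete removes a single row regardless, at the cost of having to know where that row lives.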

