Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/09/30 01:00:41 UTC

[GitHub] [iceberg] electrum commented on pull request #1499: Update the Iceberg spec for row-level deletes

electrum commented on pull request #1499:
URL: https://github.com/apache/iceberg/pull/1499#issuecomment-701098117


   I've thought a lot about equality deletes and I think they are the wrong design for what I see as the common case. Deletes often occur declaratively via a SQL statement such as `DELETE FROM t WHERE x = 5`. We can execute this very efficiently by simply recording the predicate `x = 5` rather than finding and recording all matching rows. That's great.
   
   What doesn't make sense is recording the file name. The delete applies to everything visible in the table. I actually can't think of any reason why we'd want to restrict it to a specific file. One way to solve this is to make the file name optional -- treat it as a special column in the table that is present in the delete file only when it is needed as a filter.
   
   If we remove the file name, then the equality delete becomes very small -- we would typically expect only one record per delete operation. At that point, having a separate file is overkill. Reading a separate file during planning for a single record is expensive. It would be better to store the equality delete inline, with the other metadata.
   
   Now that equality deletes are small and inline, we no longer have to limit them to simple equality. Basic range filters would cover common use cases, such as _"delete all data older than X"_.
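
   As a rough sketch of the idea only: the snippet below models an inline delete as a small predicate record with an optional file-name filter, and applies it while reading rows. Every name here (`InlineDelete`, `live_rows`, the `ts` column, the file names) is hypothetical and is not part of the Iceberg spec or API. It also ignores snapshot ordering, so it assumes every delete applies to every row it sees.

```python
import operator
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

# Hypothetical operator table: equality plus basic range comparisons.
_OPS = {
    "=": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


@dataclass(frozen=True)
class InlineDelete:
    """One delete predicate stored inline with table metadata (assumed layout)."""

    column: str
    op: str
    value: Any
    # Optional filter: None means the delete applies to the whole table.
    file_name: Optional[str] = None

    def matches(self, row: Dict[str, Any], file_name: str) -> bool:
        # A delete scoped to another file never matches rows from this one.
        if self.file_name is not None and self.file_name != file_name:
            return False
        return _OPS[self.op](row[self.column], self.value)


def live_rows(
    rows: List[Dict[str, Any]], file_name: str, deletes: List[InlineDelete]
) -> List[Dict[str, Any]]:
    """Return the rows of one data file that no delete predicate matches."""
    return [
        row
        for row in rows
        if not any(d.matches(row, file_name) for d in deletes)
    ]


# One record per delete operation, no matter how many rows it removes.
deletes = [
    InlineDelete("x", "=", 5),    # DELETE FROM t WHERE x = 5
    InlineDelete("ts", "<", 100),  # "delete all data older than X"
    InlineDelete("x", "=", 7, file_name="other.parquet"),  # file-scoped
]

rows = [
    {"x": 5, "ts": 200},
    {"x": 7, "ts": 50},
    {"x": 7, "ts": 200},
]
remaining = live_rows(rows, "data-0.parquet", deletes)
```

   The third delete shows the file name acting as an optional filter: it is scoped to a different file, so it does not remove the `x = 7` row read from `data-0.parquet`.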

