Posted to issues@iceberg.apache.org by "jackye1995 (via GitHub)" <gi...@apache.org> on 2023/02/10 05:47:36 UTC

[GitHub] [iceberg] jackye1995 commented on issue #6781: Fix migration of Delta table that has performed VACUUM

jackye1995 commented on issue #6781:
URL: https://github.com/apache/iceberg/issues/6781#issuecomment-1425214434

   Thanks for the explanation!
   
   I am not sure how Delta uses its logs. Does each log have a unique ID? Is it usable by end users? For Iceberg, users can query and time travel by snapshot ID, and they can look up snapshot IDs in the `snapshots` system table. Is there a similar feature in Delta?
   
   If the log ID is an internal concept, then I would opt for just solution 2. Even if the log ID is available for users, I would still say we should prioritize solution 2, because as an end user, I should not have to care about the starting Delta log version when I want to do a migration. It should just work. I would not even expose any configuration at this moment, and would just treat the file-not-found error as a bug that we fix with your proposal:
   
   > we can catch the IOException when trying to build the DataFile and skip the whole snapshot if any parquet file can not be found. Specifically, we should do this when there has been no version migrated yet. If there are some successfully migrated snapshot earlier, then the IOException must be caused by something else and we shall not skip the version as delta logs are consecutive. 
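   
   To make the quoted rule concrete, here is a minimal sketch of the skip logic, not the actual migration code. `VacuumAwareMigration`, `VersionReader`, and `readDataFiles` are hypothetical names used only for illustration; the real change would sit wherever the `DataFile` objects are built from each Delta log version.
   
   ```java
   import java.io.IOException;
   import java.io.UncheckedIOException;
   import java.util.ArrayList;
   import java.util.List;
   
   public class VacuumAwareMigration {
       /** Hypothetical stand-in for building the data files of one Delta log version. */
       interface VersionReader {
           List<String> readDataFiles(long version) throws IOException;
       }
   
       /** Returns the versions that were migrated, skipping leading versions whose files are gone. */
       static List<Long> migrate(long firstVersion, long lastVersion, VersionReader reader) {
           List<Long> migrated = new ArrayList<>();
           for (long version = firstVersion; version <= lastVersion; version++) {
               try {
                   reader.readDataFiles(version);
                   migrated.add(version);
               } catch (IOException e) {
                   if (migrated.isEmpty()) {
                       // Nothing migrated yet: assume the files were removed by VACUUM
                       // and skip this whole version.
                       continue;
                   }
                   // A version has already been migrated and Delta logs are consecutive,
                   // so this failure must have another cause. Do not skip it.
                   throw new UncheckedIOException(e);
               }
           }
           return migrated;
       }
   }
   ```
   
   With this shape, a table whose versions 0 and 1 were vacuumed would start the migration at version 2, while a missing file at any later version would still fail the migration.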
   
   Also, I am curious whether this is the same experience in Databricks Delta. I think it should not be, because there needs to be a process to keep the Delta log size short. @ericlgoodman do you know anything about this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


For additional commands, e-mail: issues-help@iceberg.apache.org