Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/07/23 05:22:38 UTC

[GitHub] [iceberg] pvary commented on issue #5339: Adding the same file twice for the same table

pvary commented on issue #5339:
URL: https://github.com/apache/iceberg/issues/5339#issuecomment-1193063396

   > Yea I think there was an old similar discussion here: #3064. I think we can do a per check of all files added in same transaction, but anything beyond that involves an expensive spark call to check for duplicates in the table itself?
   
   Thanks @szehon-ho, I was not aware of the old thread. Accepting duplicated files seems like a reasonable compromise if we do not parse the whole table metadata anyway.
   How much metadata is parsed when we have a `Table` object at hand? Which metadata files do we read when we commit something? Does anyone have a quick answer for this, or shall I check?
   
   Thanks everyone for the answers!
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: issues-help@iceberg.apache.org