Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/07/22 23:30:29 UTC

[GitHub] [iceberg] hililiwei commented on issue #5339: Adding the same file twice for the same table

hililiwei commented on issue #5339:
URL: https://github.com/apache/iceberg/issues/5339#issuecomment-1193003590

   Similarly, when we write data from Flink, we need a way to avoid double commits.
   We could make it the default behavior to reject a file that is added a second time. Besides comparing the file path, we should also compare the file content, for example by verifying an MD5 checksum, to confirm that the two files really are identical.
   If we really do want to add a duplicate file, we could require an explicit option such as 'force', as @Spince suggested.
   Alternatively, we could reverse the logic and allow duplicates by default, making the check opt-in. That has the advantage of preserving compatibility with our current behavior.
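
   As a rough illustration of the check described above, here is a minimal in-memory sketch in Python. It is not Iceberg's API: `TableFiles`, `append_file`, `DuplicateFileError`, and the `force` and `check_duplicates` parameters are hypothetical names chosen for the example. It also assumes that a path seen before is rejected whether or not the content matches, with the MD5 comparison only used to tell the two cases apart.

```python
import hashlib


class DuplicateFileError(Exception):
    """Raised when a path that is already tracked is appended again."""


def md5_of(content: bytes) -> str:
    """Return the hex MD5 digest of the file content."""
    return hashlib.md5(content).hexdigest()


class TableFiles:
    """Hypothetical stand-in for the data files tracked by one table."""

    def __init__(self, check_duplicates: bool = True):
        # check_duplicates=False models the "allow by default" variant.
        self.check_duplicates = check_duplicates
        self.checksum_by_path = {}  # path -> MD5 of the last content added
        self.appended = []          # every append, duplicates included

    def append_file(self, path: str, content: bytes, force: bool = False) -> None:
        checksum = md5_of(content)
        known = self.checksum_by_path.get(path)
        if self.check_duplicates and not force and known is not None:
            if known == checksum:
                raise DuplicateFileError(
                    f"{path} was already added with identical content"
                )
            raise DuplicateFileError(
                f"{path} was already added, but with different content"
            )
        self.checksum_by_path[path] = checksum
        self.appended.append(path)


if __name__ == "__main__":
    files = TableFiles()
    files.append_file("data/part-0.parquet", b"rows")
    try:
        files.append_file("data/part-0.parquet", b"rows")
    except DuplicateFileError as err:
        print("rejected:", err)
    # An explicit force still lets the duplicate through.
    files.append_file("data/part-0.parquet", b"rows", force=True)
    print(len(files.appended))
```

   Constructing `TableFiles(check_duplicates=False)` corresponds to the reversed default, where duplicates are accepted unless the check is switched on.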
   In conclusion, I believe that it is useful to provide such a mechanism.
   
   Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

