Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/03/17 03:39:32 UTC

[GitHub] [iceberg] openinx commented on pull request #2010: Core: Add primary key spec.

openinx commented on pull request #2010:
URL: https://github.com/apache/iceberg/pull/2010#issuecomment-800769586

Okay, after reconsidering the primary key uniqueness issue: it's hard to guarantee uniqueness in an embedded table format library. If both a Spark job and a Flink streaming job are writing to the same Iceberg table, I can't think of a good and efficient way to guarantee the uniqueness of the primary key. If we had an online server in front of those data files, it would be easy to guarantee uniqueness, because all write requests would be sent to the same server and the server could decide how to reject duplicate write requests. For an Iceberg table format, though, it's hard to synchronize between different computation jobs.
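
To make the contrast concrete, here is a toy Python sketch. It is not Iceberg code: `KeyServer` and `SharedTable` are hypothetical names made up for illustration, and the model ignores snapshots, commits, and file formats entirely. It only shows that a single writer-facing server can reject a duplicate key at write time, while two independent jobs appending their own files can each add the same key without noticing.

```python
from dataclasses import dataclass, field


@dataclass
class KeyServer:
    """Hypothetical single online server that sees every write request."""
    seen: set = field(default_factory=set)
    rows: list = field(default_factory=list)

    def write(self, key, value):
        # Every writer goes through this one place, so a duplicate is easy to reject.
        if key in self.seen:
            return False
        self.seen.add(key)
        self.rows.append((key, value))
        return True


@dataclass
class SharedTable:
    """Toy stand-in for a table that independent jobs append files to."""
    files: list = field(default_factory=list)

    def append_file(self, rows):
        # Each job commits its own file without looking at other jobs' keys.
        self.files.append(list(rows))

    def duplicate_keys(self):
        # Count how often each key appears across all appended files.
        counts = {}
        for data_file in self.files:
            for key, _ in data_file:
                counts[key] = counts.get(key, 0) + 1
        return sorted(k for k, n in counts.items() if n > 1)


# Centralized case: the second write of key 1 is rejected.
server = KeyServer()
accepted = [server.write(1, "a"), server.write(1, "b")]

# Embedded-library case: two jobs each append key 1, and both succeed.
table = SharedTable()
table.append_file([(1, "from batch job")])
table.append_file([(1, "from streaming job")])
dupes = table.duplicate_keys()
```

In the second case the duplicate only becomes visible after the fact, by scanning everything that was written, which is the kind of cross-job synchronization that seems too expensive to require of every writer.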

So I'm fine with introducing the primary key without enforced uniqueness. @jackye1995 Did you start this work in your repo? Should we update this PR based on the above discussion? (Sorry about the delay.)
