You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/12/30 18:01:28 UTC

[GitHub] [iceberg] aokolnychyi commented on a change in pull request #1974: Flink: Add ChangeLog DataStream end-to-end unit tests.

aokolnychyi commented on a change in pull request #1974:
URL: https://github.com/apache/iceberg/pull/1974#discussion_r550279316



##########
File path: flink/src/main/java/org/apache/iceberg/flink/sink/FlinkSink.java
##########
@@ -169,6 +172,17 @@ public Builder writeParallelism(int newWriteParallelism) {
       return this;
     }
 
+    /**
+     * Configuring the equality field columns for iceberg table that accept CDC or UPSERT events.
+     *
+     * @param columns defines the iceberg table's key.
+     * @return {@link Builder} to connect the iceberg table.
+     */
+    public Builder equalityFieldColumns(List<String> columns) {

Review comment:
       I’d vote for not ensuring uniqueness as it is really hard at scale. If we are to ensure this at write, we have to join the incoming data with the target table making it really expensive. Doing this at read would require sorting the data not only by the sort key but also by the sequence number. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org