You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/06/26 06:53:46 UTC

[GitHub] [iceberg] stevenzwu opened a new pull request #2745: add an optional uidPrefix to FlinkSink#Builder to explicitly set oper…

stevenzwu opened a new pull request #2745:
URL: https://github.com/apache/iceberg/pull/2745


   …ator id.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] openinx commented on a change in pull request #2745: Flink: Add an optional uidPrefix to FlinkSink#Builder to explicitly set operator uid.

Posted by GitBox <gi...@apache.org>.
openinx commented on a change in pull request #2745:
URL: https://github.com/apache/iceberg/pull/2745#discussion_r659745312



##########
File path: flink/src/main/java/org/apache/iceberg/flink/sink/FlinkSink.java
##########
@@ -205,6 +207,30 @@ public Builder equalityFieldColumns(List<String> columns) {
       return this;
     }
 
+    /**
+     * Set the uid prefix for FlinkSink operators. Note that FlinkSink internally consists of multiple operators (like
+     * writer, committer, dummy sink etc.) Actually operator uid will be appended with a suffix like "uid-writer".
+     * <p>
+     * Flink auto generates operator uids if not set explicitly. It is a recommended
+     * <a href="https://ci.apache.org/projects/flink/flink-docs-master/docs/ops/production_ready/">
+     * best-practice to set uid for all operators</a> before deploying to production. Flink has an option to {@code
+     * pipeline.auto-generate-uids=false} to disable auto-generation and force explicit setting of all operator uids.
+     * <p>
+     * Be careful with setting this for an existing job, because now we are changing the opeartor uid from an
+     * auto-generated one to this new value. When deploying the change with a checkpoint, Flink won't be able to restore
+     * the previous Flink sink operator state (more specifically the committer operator state). You need to use {@code
+     * --allowNonRestoredState} to ignore the previous sink state. During restore Flink sink state is used to check if
+     * checkpointed files were actually committed or not. {@code --allowNonRestoredState} can lead to data loss if the
+     * Iceberg commit failed in the last completed checkpoints.
+     *
+     * @param newPrefix defines the iceberg table's key.

Review comment:
       > @param newPrefix defines the iceberg table's key.
   
   I think we will need a correct parameter doc for this.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] openinx merged pull request #2745: Flink: Add an optional uidPrefix to FlinkSink#Builder to explicitly set operator uid.

Posted by GitBox <gi...@apache.org>.
openinx merged pull request #2745:
URL: https://github.com/apache/iceberg/pull/2745


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] openinx merged pull request #2745: Flink: Add an optional uidPrefix to FlinkSink#Builder to explicitly set operator uid.

Posted by GitBox <gi...@apache.org>.
openinx merged pull request #2745:
URL: https://github.com/apache/iceberg/pull/2745


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] openinx commented on pull request #2745: Flink: Add an optional uidPrefix to FlinkSink#Builder to explicitly set operator uid.

Posted by GitBox <gi...@apache.org>.
openinx commented on pull request #2745:
URL: https://github.com/apache/iceberg/pull/2745#issuecomment-870351727


   Got this PR merged,  thanks @stevenzwu for contributing ! 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] openinx commented on pull request #2745: Flink: Add an optional uidPrefix to FlinkSink#Builder to explicitly set operator uid.

Posted by GitBox <gi...@apache.org>.
openinx commented on pull request #2745:
URL: https://github.com/apache/iceberg/pull/2745#issuecomment-869647545


   @stevenzwu This PR looks pretty good to me, just left a small comment to fix this parameter comment !   The javadoc looks very friendly to datastream users.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] openinx commented on pull request #2745: Flink: Add an optional uidPrefix to FlinkSink#Builder to explicitly set operator uid.

Posted by GitBox <gi...@apache.org>.
openinx commented on pull request #2745:
URL: https://github.com/apache/iceberg/pull/2745#issuecomment-869647545






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] stevenzwu commented on pull request #2745: add an optional uidPrefix to FlinkSink#Builder to explicitly set oper…

Posted by GitBox <gi...@apache.org>.
stevenzwu commented on pull request #2745:
URL: https://github.com/apache/iceberg/pull/2745#issuecomment-868958704


   @openinx can you take a look? 
   
   This is to address this email thread from mailing list: https://www.mail-archive.com/dev@iceberg.apache.org/msg02269.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] openinx commented on a change in pull request #2745: Flink: Add an optional uidPrefix to FlinkSink#Builder to explicitly set operator uid.

Posted by GitBox <gi...@apache.org>.
openinx commented on a change in pull request #2745:
URL: https://github.com/apache/iceberg/pull/2745#discussion_r659745312



##########
File path: flink/src/main/java/org/apache/iceberg/flink/sink/FlinkSink.java
##########
@@ -205,6 +207,30 @@ public Builder equalityFieldColumns(List<String> columns) {
       return this;
     }
 
+    /**
+     * Set the uid prefix for FlinkSink operators. Note that FlinkSink internally consists of multiple operators (like
+     * writer, committer, dummy sink etc.) Actually operator uid will be appended with a suffix like "uid-writer".
+     * <p>
+     * Flink auto generates operator uids if not set explicitly. It is a recommended
+     * <a href="https://ci.apache.org/projects/flink/flink-docs-master/docs/ops/production_ready/">
+     * best-practice to set uid for all operators</a> before deploying to production. Flink has an option to {@code
+     * pipeline.auto-generate-uids=false} to disable auto-generation and force explicit setting of all operator uids.
+     * <p>
+     * Be careful with setting this for an existing job, because now we are changing the opeartor uid from an
+     * auto-generated one to this new value. When deploying the change with a checkpoint, Flink won't be able to restore
+     * the previous Flink sink operator state (more specifically the committer operator state). You need to use {@code
+     * --allowNonRestoredState} to ignore the previous sink state. During restore Flink sink state is used to check if
+     * checkpointed files were actually committed or not. {@code --allowNonRestoredState} can lead to data loss if the
+     * Iceberg commit failed in the last completed checkpoints.
+     *
+     * @param newPrefix defines the iceberg table's key.

Review comment:
       > @param newPrefix defines the iceberg table's key.
   
   I think we will need a correct parameter doc for this.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org