Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/06/05 05:15:46 UTC

[GitHub] [hudi] wangxianghu commented on a change in pull request #3024: [HUDI-1924] Add BootstrapFunction to support index bootstrap

wangxianghu commented on a change in pull request #3024:
URL: https://github.com/apache/hudi/pull/3024#discussion_r645943833



##########
File path: hudi-flink/src/main/java/org/apache/hudi/streamer/HoodieFlinkStreamer.java
##########
@@ -93,8 +95,15 @@ public static void main(String[] args) throws Exception {
         ), kafkaProps))
         .name("kafka_source")
         .uid("uid_kafka_source")
-        .map(new RowDataToHoodieFunction<>(rowType, conf), TypeInformation.of(HoodieRecord.class))
-        // Key-by record key, to avoid multiple subtasks write to a partition at the same time
+        .map(new RowDataToHoodieFunction<>(rowType, conf), TypeInformation.of(HoodieRecord.class));
+    if (conf.getBoolean(FlinkOptions.INDEX_BOOTSTRAP_ENABLED)) {

Review comment:
   1. Since this is a very time-consuming operation, we'd better add a 'TODO' here indicating that optimization is needed.
   2. Maybe we should remind users that `HoodieDeltaStreamer` is not suitable for initializing on large tables when there is no checkpoint to restore from. That could go in another PR against the website docs.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org