You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/03/28 08:47:01 UTC

[GitHub] [hudi] danny0405 commented on a change in pull request #5093: [HUDI-3539] Flink bucket index bucketID bootstrap optimization.

danny0405 commented on a change in pull request #5093:
URL: https://github.com/apache/hudi/pull/5093#discussion_r836190712



##########
File path: hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/bucket/BucketStreamWriteFunction.java
##########
@@ -109,25 +118,30 @@ public void initializeState(FunctionInitializationContext context) throws Except
   @Override
   public void snapshotState() {
     super.snapshotState();
-    this.bucketIndex.putAll(this.incBucketIndex);
     this.incBucketIndex.clear();
   }
 
   @Override
   public void processElement(I i, ProcessFunction<I, Object>.Context context, Collector<Object> collector) throws Exception {
     HoodieRecord<?> record = (HoodieRecord<?>) i;
     final HoodieKey hoodieKey = record.getKey();
+    final String partition = hoodieKey.getPartitionPath();
     final HoodieRecordLocation location;
 
+    bootstrapIndexIfNeed(partition);
+    Map<Integer, String> bucketToFileIdMap = bucketIndex.get(partition);
     final int bucketNum = BucketIdentifier.getBucketId(hoodieKey, indexKeyFields, this.bucketNum);

Review comment:
       `.get(partition)` -> `computeIfAbsent(partition, p -> new HashMap<>())`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org