Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/04/07 10:36:27 UTC

[GitHub] [hudi] codope commented on a diff in pull request #4118: [HUDI-2774] Handle duplicate instants while fetching pending clustering plans

codope commented on code in PR #4118:
URL: https://github.com/apache/hudi/pull/4118#discussion_r844982092


##########
hudi-common/src/main/java/org/apache/hudi/common/util/ClusteringUtils.java:
##########
@@ -124,7 +125,16 @@
         // get all filegroups in the plan
         getFileGroupEntriesInClusteringPlan(clusteringPlan.getLeft(), clusteringPlan.getRight()));
 
-    Map<HoodieFileGroupId, HoodieInstant> resultMap = resultStream.collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
+    Map<HoodieFileGroupId, HoodieInstant> resultMap;
+    try {
+      resultMap = resultStream.collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
+    } catch (Exception e) {
+      if (e instanceof IllegalStateException && e.getMessage().contains("Duplicate key")) {
+        throw new HoodieException("Found duplicate file groups pending clustering. If you're running deltastreamer in continuous mode, consider adding delay using --min-sync-interval-seconds. "

Review Comment:
   Anyway, we now have OCC with the in-process lock provider when the metadata table is enabled, and users only need to set one config to adjust the concurrency mode for deltastreamer/Spark streaming: `HoodieWriteConfig#AUTO_ADJUST_LOCK_CONFIGS`
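
   For context on the hunk above: `Collectors.toMap` without a merge function throws `IllegalStateException` when two stream elements map to the same key, which is the failure the new `try`/`catch` translates into a more actionable error. Below is a minimal standalone sketch of that behavior, not the PR's code. The class `DuplicateKeyDemo`, its helper `entry`, and the `String` keys/values standing in for `HoodieFileGroupId`/`HoodieInstant` are all hypothetical, and `RuntimeException` stands in for `HoodieException`.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DuplicateKeyDemo {

  // Hypothetical helper: builds a (fileGroupId, instant) pair using plain strings.
  public static Map.Entry<String, String> entry(String fileGroupId, String instant) {
    return new AbstractMap.SimpleEntry<>(fileGroupId, instant);
  }

  // Collects the entries into a map. Collectors.toMap with no merge function
  // throws IllegalStateException if two entries share the same key, so we catch
  // that type and rethrow with a message aimed at the user.
  // RuntimeException is a stand-in for HoodieException (assumption).
  public static Map<String, String> collectStrict(List<Map.Entry<String, String>> entries) {
    try {
      return entries.stream()
          .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    } catch (IllegalStateException e) {
      throw new RuntimeException("Found duplicate file groups pending clustering", e);
    }
  }

  public static void main(String[] args) {
    // Distinct file groups: collection succeeds.
    Map<String, String> ok = collectStrict(Arrays.asList(
        entry("fg-1", "instant-001"),
        entry("fg-2", "instant-002")));
    System.out.println("Distinct keys collected: " + ok.size());

    // The same file group appears in two pending plans: collection fails.
    try {
      collectStrict(Arrays.asList(
          entry("fg-1", "instant-001"),
          entry("fg-1", "instant-002")));
    } catch (RuntimeException e) {
      System.out.println("Failed as expected: " + e.getMessage());
    }
  }
}
```

   The sketch catches `IllegalStateException` directly instead of checking the message for "Duplicate key", since the message text is not part of the documented `Collectors.toMap` contract. Whether that narrower catch is appropriate in `ClusteringUtils` depends on what else in the collect pipeline could throw `IllegalStateException`.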


