Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/09/06 23:16:28 UTC

[GitHub] [hudi] nsivabalan commented on a diff in pull request #5478: [HUDI-3998] Fix getCommitsSinceLastCleaning failed when async cleaning

nsivabalan commented on code in PR #5478:
URL: https://github.com/apache/hudi/pull/5478#discussion_r964245759


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanActionExecutor.java:
##########
@@ -64,11 +65,16 @@ private int getCommitsSinceLastCleaning() {
     Option<HoodieInstant> lastCleanInstant = table.getActiveTimeline().getCleanerTimeline().filterCompletedInstants().lastInstant();
     HoodieTimeline commitTimeline = table.getActiveTimeline().getCommitsTimeline().filterCompletedInstants();
 
-    String latestCleanTs;
-    int numCommits = 0;
-    if (lastCleanInstant.isPresent()) {
-      latestCleanTs = lastCleanInstant.get().getTimestamp();
-      numCommits = commitTimeline.findInstantsAfter(latestCleanTs).countInstants();
+    int numCommits;
+    if (lastCleanInstant.isPresent() && !table.getActiveTimeline().isEmpty(lastCleanInstant.get())) {
+      try {
+        HoodieCleanMetadata cleanMetadata = TimelineMetadataUtils
+            .deserializeHoodieCleanMetadata(table.getActiveTimeline().getInstantDetails(lastCleanInstant.get()).get());
+        String lastCompletedCommitTimestamp = cleanMetadata.getLastCompletedCommitTimestamp();
+        numCommits = commitTimeline.findInstantsAfter(lastCompletedCommitTimestamp).countInstants();
+      } catch (IOException e) {
+        throw new HoodieIOException(e.getMessage(), e);

Review Comment:
   The exception passed as the second argument already carries the message. Can we make the first argument a custom message instead, e.g. "Parsing of last clean instant " + lastCleanInstant.get() + " failed"?
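
A minimal sketch of what this suggestion might look like, using stand-in types so it compiles on its own. `HoodieIOException` below is a local stub that only mirrors the (message, cause) constructor used in the diff, and `parseCleanMetadata` is a hypothetical placeholder for the deserialization call; neither is the PR's actual code.

```java
import java.io.IOException;

// Local stand-in for Hudi's HoodieIOException so the sketch compiles alone.
class HoodieIOException extends RuntimeException {
  HoodieIOException(String msg, IOException cause) {
    super(msg, cause);
  }
}

class CleanInstantParsing {
  // Hypothetical placeholder for the metadata deserialization; it always fails here.
  static String parseCleanMetadata(String instant) throws IOException {
    throw new IOException("corrupt avro payload");
  }

  static String lastCompletedCommit(String lastCleanInstant) {
    try {
      return parseCleanMetadata(lastCleanInstant);
    } catch (IOException e) {
      // The cause already carries e.getMessage(); the first argument adds context.
      throw new HoodieIOException(
          "Parsing of last clean instant " + lastCleanInstant + " failed", e);
    }
  }
}
```

With this shape, the log shows which clean instant could not be parsed, while the original I/O error stays reachable through `getCause()`.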



##########
hudi-timeline-service/src/main/java/org/apache/hudi/timeline/service/RequestHandler.java:
##########
@@ -502,14 +502,20 @@ public void handle(@NotNull Context context) throws Exception {
         if (refreshCheck) {
           long beginFinalCheck = System.currentTimeMillis();
           if (isLocalViewBehind(context)) {
-            String errMsg =
-                "Last known instant from client was "
-                    + context.queryParam(RemoteHoodieTableFileSystemView.LAST_INSTANT_TS,
-                        HoodieTimeline.INVALID_INSTANT_TS)
-                    + " but server has the following timeline "
-                    + viewManager.getFileSystemView(context.queryParam(RemoteHoodieTableFileSystemView.BASEPATH_PARAM))
-                        .getTimeline().getInstants().collect(Collectors.toList());
-            throw new BadRequestResponse(errMsg);
+            String lastInstantTs = context.queryParam(RemoteHoodieTableFileSystemView.LAST_INSTANT_TS,
+                HoodieTimeline.INVALID_INSTANT_TS);
+            HoodieTimeline localTimeline =
+                viewManager.getFileSystemView(context.queryParam(RemoteHoodieTableFileSystemView.BASEPATH_PARAM)).getTimeline();
+            HoodieTimeline afterLastInstantTimeLine = localTimeline.findInstantsAfter(lastInstantTs).filterCompletedInstants();
+            if (!(afterLastInstantTimeLine.countInstants() == 1

Review Comment:
   So are we making an exception for just one case here?
   That is, when the client has one extra commit compared to the timeline server, and that commit is a clean action, we don't trigger a refresh?
   Can you move this check into a separate method? I can see us adding more cases here going forward.
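
One way the extraction might look, sketched with stand-in types. The diff above is cut off mid-condition, so the check below follows the reviewer's reading (exactly one completed instant after the client's last known instant, and that instant is a clean) rather than the PR's actual code. `Instant`, `CLEAN_ACTION`, and `isOnlyTrailingClean` are hypothetical names introduced for illustration.

```java
import java.util.List;

class RefreshCheck {
  // Stand-in for the timeline's clean action name (an assumption for this sketch).
  static final String CLEAN_ACTION = "clean";

  // Stand-in for a timeline instant: just an action name and a timestamp.
  static final class Instant {
    final String action;
    final String timestamp;

    Instant(String action, String timestamp) {
      this.action = action;
      this.timestamp = timestamp;
    }
  }

  // Hypothetical helper: true when the only completed instant after the
  // client's last known instant is a clean. Keeping the rule in one method
  // gives later exemptions a single place to live.
  static boolean isOnlyTrailingClean(List<Instant> completedAfterLastInstant) {
    return completedAfterLastInstant.size() == 1
        && CLEAN_ACTION.equals(completedAfterLastInstant.get(0).action);
  }
}
```

The caller would then read as a single negated call to the helper, and each new exemption becomes another named method rather than a longer inline condition.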
   
   



##########
hudi-common/src/main/avro/HoodieCleanerPlan.avsc:
##########
@@ -42,6 +42,11 @@
       }],
       "default" : null
     },
+    {
+      "name": "lastCompletedCommitTimestamp",
+      "type": "string",
+      "default" : ""

Review Comment:
   Same here: why an empty string default rather than null?



##########
hudi-common/src/main/avro/HoodieCleanMetadata.avsc:
##########
@@ -23,6 +23,7 @@
      {"name": "timeTakenInMillis", "type": "long"},
      {"name": "totalFilesDeleted", "type": "int"},
      {"name": "earliestCommitToRetain", "type": "string"},
+     {"name": "lastCompletedCommitTimestamp", "type": "string", "default" : ""},

Review Comment:
   Why an empty string? Could we go with null instead?
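
If the null default were adopted, the field would need a union type, with `null` conventionally listed first so that `null` is a valid default. A sketch of how the field might then read (an illustration, not taken from the PR):

```json
{"name": "lastCompletedCommitTimestamp", "type": ["null", "string"], "default": null}
```

Readers of the metadata would then have to treat a null value as "not recorded", for example when the file was written before this field existed.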


