You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/09/06 15:15:59 UTC

[GitHub] [hudi] nsivabalan commented on a change in pull request #3426: [HUDI-2285] Metadata Table synchronous design

nsivabalan commented on a change in pull request #3426:
URL: https://github.com/apache/hudi/pull/3426#discussion_r702969171



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTimelineArchiveLog.java
##########
@@ -200,20 +200,19 @@ public boolean archiveIfRequired(HoodieEngineContext context) throws IOException
         .collect(Collectors.groupingBy(i -> Pair.of(i.getTimestamp(),
             HoodieInstant.getComparableAction(i.getAction()))));
 
-    // If metadata table is enabled, do not archive instants which are more recent that the latest synced
-    // instant on the metadata table. This is required for metadata table sync.
+    // If metadata table is enabled, do not archive instants which are more recent that the last compaction on the
+    // metadata table.
     if (config.useFileListingMetadata()) {
       try (HoodieTableMetadata tableMetadata = HoodieTableMetadata.create(table.getContext(), config.getMetadataConfig(),
           config.getBasePath(), FileSystemViewStorageConfig.FILESYSTEM_VIEW_SPILLABLE_DIR.defaultValue())) {
-        Option<String> lastSyncedInstantTime = tableMetadata.getSyncedInstantTime();
-
-        if (lastSyncedInstantTime.isPresent()) {
-          LOG.info("Limiting archiving of instants to last synced instant on metadata table at " + lastSyncedInstantTime.get());
-          instants = instants.filter(i -> HoodieTimeline.compareTimestamps(i.getTimestamp(), HoodieTimeline.LESSER_THAN,
-              lastSyncedInstantTime.get()));
-        } else {
-          LOG.info("Not archiving as there is no instants yet on the metadata table");
+        Option<String> latestCompactionTime = tableMetadata.getLatestCompactionTime();

Review comment:
       bcoz, as per the patch, we assume metadata table is in full sync w/ dataset table, up until compacted time for sure. for delta commits, there could be some gaps due to multi-writes. But when it comes to base file(metadata table), we ensure there are no gaps and hence in this piece of code, if compaction hasn't happened on metadata table, we can't archive. Else, we could miss out on some commits not synced to metadata table. Also on similar lines, we limit the instants only to those which are compacted in metadata table.  
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org