Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/08/24 21:26:11 UTC

[GitHub] [hudi] bvaradar commented on a change in pull request #1964: [HUDI-1191] Add incremental meta client API to query partitions changed

bvaradar commented on a change in pull request #1964:
URL: https://github.com/apache/hudi/pull/1964#discussion_r475895369



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java
##########
@@ -296,6 +300,42 @@ public boolean isBeforeTimelineStarts(String instant) {
     return details.apply(instant);
   }
 
+  /**

Review comment:
       Timeline APIs are, in general, only about instants. I think adding partitions here breaks that abstraction. Can you move this to a helper class?

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieTimeline.java
##########
@@ -232,6 +233,12 @@
    */
   Option<byte[]> getInstantDetails(HoodieInstant instant);
 
+  /**
+   * Returns partitions that have been modified in the timeline. This includes internal operations such as clean.
+   * Note that this only returns data for completed instants.
+   */
+  List<String> getPartitionsMutated();

Review comment:
       +1 on the abstraction point. I think having a separate helper class would be better.

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java
##########
@@ -296,6 +300,42 @@ public boolean isBeforeTimelineStarts(String instant) {
     return details.apply(instant);
   }
 
+  /**
+   * Returns partitions that have been modified in the timeline. This includes internal operations such as clean.
+   * Note that this only returns data for completed instants.
+   */
+  public List<String> getPartitionsMutated() {
+    return filterCompletedInstants().getInstants().flatMap(s -> {
+      switch (s.getAction()) {
+        case HoodieTimeline.COMMIT_ACTION:
+        case HoodieTimeline.DELTA_COMMIT_ACTION:
+          try {
+            HoodieCommitMetadata commitMetadata = HoodieCommitMetadata.fromBytes(getInstantDetails(s).get(), HoodieCommitMetadata.class);
+            return commitMetadata.getPartitionToWriteStats().keySet().stream();
+          } catch (IOException e) {
+            throw new HoodieIOException("Failed to get partitions written between " + firstInstant() + " " + lastInstant(), e);
+          }
+        case HoodieTimeline.CLEAN_ACTION:

Review comment:
       We don't need to look at clean (or even compaction) instants to figure out changed partitions for the Hive sync case.
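
       To make the two suggestions above concrete (a separate helper class, and only commit and delta commit actions), here is a rough sketch of the shape I have in mind. It is not Hudi code: TimelinePartitionSketch, Instant and getWrittenPartitions are hypothetical names, and Instant is a simplified stand-in for a HoodieInstant together with the partition keys from its HoodieCommitMetadata. Returning each partition once, in first-seen order, is also my assumption; the patch may want something different.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, NOT part of Hudi. It keeps partition logic out of the
// timeline classes, which then only need to expose instants.
final class TimelinePartitionSketch {

  // Same string values as the corresponding HoodieTimeline constants.
  static final String COMMIT_ACTION = "commit";
  static final String DELTA_COMMIT_ACTION = "deltacommit";

  // Simplified stand-in (assumption): one instant plus the partitions its
  // commit metadata reports as written.
  static final class Instant {
    final String action;
    final boolean completed;
    final List<String> partitionsWritten;

    Instant(String action, boolean completed, List<String> partitionsWritten) {
      this.action = action;
      this.completed = completed;
      this.partitionsWritten = partitionsWritten;
    }
  }

  private TimelinePartitionSketch() {
    // Static helper, no instances.
  }

  // Collects partitions written by completed commit and delta commit instants.
  // Every other action (clean, compaction, ...) is skipped on purpose.
  static List<String> getWrittenPartitions(List<Instant> instants) {
    Set<String> partitions = new LinkedHashSet<>(); // dedupe, keep first-seen order
    for (Instant instant : instants) {
      if (!instant.completed) {
        continue; // only completed instants carry usable metadata
      }
      if (COMMIT_ACTION.equals(instant.action) || DELTA_COMMIT_ACTION.equals(instant.action)) {
        partitions.addAll(instant.partitionsWritten);
      }
    }
    return new ArrayList<>(partitions);
  }
}
```

       In the real helper the input would be the timeline, and the partitions would come from the getPartitionToWriteStats().keySet() call already in the patch. The sketch only shows where the filtering would live.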




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org