You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by "vinothchandar (via GitHub)" <gi...@apache.org> on 2023/04/13 01:58:16 UTC

[GitHub] [hudi] vinothchandar commented on a diff in pull request #7627: [HUDI-5517] HoodieTimeline support filter instants by state transition time

vinothchandar commented on code in PR #7627:
URL: https://github.com/apache/hudi/pull/7627#discussion_r1164862966


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/IncrementalRelation.scala:
##########
@@ -205,6 +218,9 @@ class IncrementalRelation(val sqlContext: SQLContext,
       val endInstantArchived = commitTimeline.isBeforeTimelineStarts(endInstantTime)
 
       val scanDf = if (fallbackToFullTableScan && (startInstantArchived || endInstantArchived)) {
+        if (useStateTransitionTime) {
+          throw new HoodieException("Cannot use stateTransitionTime while enables full table scan")

Review Comment:
   throw error or could we log a warn and move on?



##########
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieInstant.java:
##########
@@ -70,32 +100,38 @@ public enum State {
     NIL
   }
 
-  private State state = State.COMPLETED;
-  private String action;
-  private String timestamp;
+  private final State state;
+  private final String action;
+  private final String timestamp;
+  private final String stateTransitionTime;
 
   /**
    * Load the instant from the meta FileStatus.
    */
   public HoodieInstant(FileStatus fileStatus) {
     // First read the instant timestamp. [==>20170101193025<==].commit
     String fileName = fileStatus.getPath().getName();
-    String fileExtension = getTimelineFileExtension(fileName);
-    timestamp = fileName.replace(fileExtension, "");
-
-    // Next read the action for this marker
-    action = fileExtension.replaceFirst(".", "");
-    if (action.equals("inflight")) {
-      // This is to support backwards compatibility on how in-flight commit files were written
-      // General rule is inflight extension is .<action>.inflight, but for commit it is .inflight
-      action = "commit";
-      state = State.INFLIGHT;
-    } else if (action.contains(HoodieTimeline.INFLIGHT_EXTENSION)) {
-      state = State.INFLIGHT;
-      action = action.replace(HoodieTimeline.INFLIGHT_EXTENSION, "");
-    } else if (action.contains(HoodieTimeline.REQUESTED_EXTENSION)) {
-      state = State.REQUESTED;
-      action = action.replace(HoodieTimeline.REQUESTED_EXTENSION, "");
+    Matcher matcher = NAME_FORMAT.matcher(fileName);
+    if (matcher.find()) {
+      timestamp = matcher.group(1);
+      if (matcher.group(2).equals(HoodieTimeline.INFLIGHT_EXTENSION)) {
+        // This is to support backwards compatibility on how in-flight commit files were written
+        // General rule is inflight extension is .<action>.inflight, but for commit it is .inflight
+        action = "commit";
+        state = State.INFLIGHT;
+      } else {
+        action = matcher.group(2).replaceFirst(".", "");
+        if (matcher.groupCount() == 3 && matcher.group(3) != null) {
+          state = State.valueOf(matcher.group(3).replaceFirst(".", "").toUpperCase());
+        } else {
+          // Like 20230104152218702.commit
+          state = State.COMPLETED;
+        }
+      }
+      stateTransitionTime =

Review Comment:
   I am worried that the modification time will change when you say copy over the table to a new location .. Then all the incremental queries need to be restarted?



##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/IncrementalRelation.scala:
##########
@@ -82,9 +87,17 @@ class IncrementalRelation(val sqlContext: SQLContext,
 
   private val lastInstant = commitTimeline.lastInstant().get()
 
-  private val commitsTimelineToReturn = commitTimeline.findInstantsInRange(
-    optParams(DataSourceReadOptions.BEGIN_INSTANTTIME.key),
-    optParams.getOrElse(DataSourceReadOptions.END_INSTANTTIME.key(), lastInstant.getTimestamp))
+  private val commitsTimelineToReturn = {
+    if (useStateTransitionTime) {

Review Comment:
   how does this affect the timetravel query? (which basically hits the same code path as inc query with same begin and end instant times)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org