You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by "prashantwason (via GitHub)" <gi...@apache.org> on 2023/04/28 22:45:50 UTC

[GitHub] [hudi] prashantwason opened a new pull request, #8605: [HUDI-6152] Fixed the check for older timestamps with second granularity during index tagLocation.

prashantwason opened a new pull request, #8605:
URL: https://github.com/apache/hudi/pull/8605

   [HUDI-6152] Fixed the check for older timestamps with second granularity during index tagLocation.
   
   ### Change Logs
   
   Check explicitly for older style timestamps in the timeline during tagLocation.
   
   ### Impact
   
   Fixed the check.
   
   ### Risk level (write none, low medium or high below)
   
   None
   
   ### Documentation Update
   
   None
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] yihua merged pull request #8605: [HUDI-6152] Fixed the check for older timestamps with second granularity during index tagLocation.

Posted by "yihua (via GitHub)" <gi...@apache.org>.
yihua merged PR #8605:
URL: https://github.com/apache/hudi/pull/8605


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] nsivabalan commented on a diff in pull request #8605: [HUDI-6152] Fixed the check for older timestamps with second granularity during index tagLocation.

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan commented on code in PR #8605:
URL: https://github.com/apache/hudi/pull/8605#discussion_r1186572034


##########
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieIndex.java:
##########
@@ -587,6 +594,43 @@ public void testSimpleGlobalIndexTagLocationWhenShouldUpdatePartitionPath() thro
     assertEquals(incomingPayloadSamePartition.getJsonData(), ((RawTripTestPayload) record.getData()).getJsonData());
   }
 
+  @Test
+  public void testCheckIfValidCommit() throws Exception {
+    setUp(IndexType.BLOOM, true, false);
+
+    // When timeline is empty, all commits are invalid
+    HoodieTimeline timeline = new HoodieDefaultTimeline(Collections.EMPTY_LIST.stream(), metaClient.getActiveTimeline()::getInstantDetails);
+    assertTrue(timeline.empty());
+    assertFalse(HoodieIndexUtils.checkIfValidCommit(timeline, "001"));
+    assertFalse(HoodieIndexUtils.checkIfValidCommit(timeline, HoodieActiveTimeline.createNewInstantTime()));
+    assertFalse(HoodieIndexUtils.checkIfValidCommit(timeline, HoodieTableMetadata.SOLO_COMMIT_TIMESTAMP));
+
+    // Valid when timeline contains the timestamp or the timestamp is before timeline start
+    final HoodieInstant instant1 = new HoodieInstant(false, HoodieTimeline.DELTA_COMMIT_ACTION, "010");
+    String instantTimestamp = HoodieActiveTimeline.createNewInstantTime();
+    final HoodieInstant instant2 = new HoodieInstant(false, HoodieTimeline.DELTA_COMMIT_ACTION, HoodieActiveTimeline.createNewInstantTime());
+    timeline = new HoodieDefaultTimeline(Arrays.asList(instant1, instant2).stream(), metaClient.getActiveTimeline()::getInstantDetails);
+    assertFalse(timeline.empty());
+    assertTrue(HoodieIndexUtils.checkIfValidCommit(timeline, instant1.getTimestamp()));
+    assertTrue(HoodieIndexUtils.checkIfValidCommit(timeline, instant2.getTimestamp()));
+    // no instant on timeline
+    assertFalse(HoodieIndexUtils.checkIfValidCommit(timeline, instantTimestamp));
+    // future timestamp
+    assertFalse(HoodieIndexUtils.checkIfValidCommit(timeline, HoodieActiveTimeline.createNewInstantTime()));
+    // timestamp before timeline starts
+    assertTrue(HoodieIndexUtils.checkIfValidCommit(timeline, "001"));
+    assertTrue(HoodieIndexUtils.checkIfValidCommit(timeline, HoodieTableMetadata.SOLO_COMMIT_TIMESTAMP));
+
+    // Check for older timestamp which have sec granularity and an extension of DEFAULT_MILLIS_EXT may have been added via Timeline operations
+    instantTimestamp = HoodieActiveTimeline.createNewInstantTime();
+    final String instanceTimestampSec = instantTimestamp.substring(0, instantTimestamp.length() - HoodieInstantTimeGenerator.DEFAULT_MILLIS_EXT.length());
+    final HoodieInstant instant3 = new HoodieInstant(false, HoodieTimeline.DELTA_COMMIT_ACTION, instanceTimestampSec);
+    timeline = new HoodieDefaultTimeline(Arrays.asList(instant1, instant3).stream(), metaClient.getActiveTimeline()::getInstantDetails);
+    assertFalse(timeline.empty());
+    assertFalse(HoodieIndexUtils.checkIfValidCommit(timeline, instantTimestamp));

Review Comment:
   minor. 
   can we add couple of more cases. 
   a. generate a sec format instant time and checkIfValid should return false assuming its > first entry in the active timeline.  CheckifContainsOrBefore() should return false.
   b. generate a sec format instant time lesser than first entry in the active timeline. CheckifContainsOrBefore() should return true 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on pull request #8605: [HUDI-6152] Fixed the check for older timestamps with second granularity during index tagLocation.

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on PR #8605:
URL: https://github.com/apache/hudi/pull/8605#issuecomment-1560507783

   @prashantwason There are some checkstyle isses we need to fix: https://github.com/apache/hudi/actions/runs/5065044371/jobs/9093199158?pr=8605


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] prashantwason commented on a diff in pull request #8605: [HUDI-6152] Fixed the check for older timestamps with second granularity during index tagLocation.

Posted by "prashantwason (via GitHub)" <gi...@apache.org>.
prashantwason commented on code in PR #8605:
URL: https://github.com/apache/hudi/pull/8605#discussion_r1203451740


##########
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieIndex.java:
##########
@@ -587,6 +594,43 @@ public void testSimpleGlobalIndexTagLocationWhenShouldUpdatePartitionPath() thro
     assertEquals(incomingPayloadSamePartition.getJsonData(), ((RawTripTestPayload) record.getData()).getJsonData());
   }
 
+  @Test
+  public void testCheckIfValidCommit() throws Exception {
+    setUp(IndexType.BLOOM, true, false);
+
+    // When timeline is empty, all commits are invalid
+    HoodieTimeline timeline = new HoodieDefaultTimeline(Collections.EMPTY_LIST.stream(), metaClient.getActiveTimeline()::getInstantDetails);
+    assertTrue(timeline.empty());
+    assertFalse(HoodieIndexUtils.checkIfValidCommit(timeline, "001"));
+    assertFalse(HoodieIndexUtils.checkIfValidCommit(timeline, HoodieActiveTimeline.createNewInstantTime()));
+    assertFalse(HoodieIndexUtils.checkIfValidCommit(timeline, HoodieTableMetadata.SOLO_COMMIT_TIMESTAMP));
+
+    // Valid when timeline contains the timestamp or the timestamp is before timeline start
+    final HoodieInstant instant1 = new HoodieInstant(false, HoodieTimeline.DELTA_COMMIT_ACTION, "010");
+    String instantTimestamp = HoodieActiveTimeline.createNewInstantTime();
+    final HoodieInstant instant2 = new HoodieInstant(false, HoodieTimeline.DELTA_COMMIT_ACTION, HoodieActiveTimeline.createNewInstantTime());
+    timeline = new HoodieDefaultTimeline(Arrays.asList(instant1, instant2).stream(), metaClient.getActiveTimeline()::getInstantDetails);
+    assertFalse(timeline.empty());
+    assertTrue(HoodieIndexUtils.checkIfValidCommit(timeline, instant1.getTimestamp()));
+    assertTrue(HoodieIndexUtils.checkIfValidCommit(timeline, instant2.getTimestamp()));
+    // no instant on timeline
+    assertFalse(HoodieIndexUtils.checkIfValidCommit(timeline, instantTimestamp));
+    // future timestamp
+    assertFalse(HoodieIndexUtils.checkIfValidCommit(timeline, HoodieActiveTimeline.createNewInstantTime()));
+    // timestamp before timeline starts
+    assertTrue(HoodieIndexUtils.checkIfValidCommit(timeline, "001"));
+    assertTrue(HoodieIndexUtils.checkIfValidCommit(timeline, HoodieTableMetadata.SOLO_COMMIT_TIMESTAMP));
+
+    // Check for older timestamp which have sec granularity and an extension of DEFAULT_MILLIS_EXT may have been added via Timeline operations
+    instantTimestamp = HoodieActiveTimeline.createNewInstantTime();
+    final String instanceTimestampSec = instantTimestamp.substring(0, instantTimestamp.length() - HoodieInstantTimeGenerator.DEFAULT_MILLIS_EXT.length());
+    final HoodieInstant instant3 = new HoodieInstant(false, HoodieTimeline.DELTA_COMMIT_ACTION, instanceTimestampSec);
+    timeline = new HoodieDefaultTimeline(Arrays.asList(instant1, instant3).stream(), metaClient.getActiveTimeline()::getInstantDetails);
+    assertFalse(timeline.empty());
+    assertFalse(HoodieIndexUtils.checkIfValidCommit(timeline, instantTimestamp));

Review Comment:
   Added



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8605: [HUDI-6152] Fixed the check for older timestamps with second granularity during index tagLocation.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8605:
URL: https://github.com/apache/hudi/pull/8605#issuecomment-1561258592

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "141f56805c474aca55f9952ca35e75726f4171cc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16749",
       "triggerID" : "141f56805c474aca55f9952ca35e75726f4171cc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c3b02c63d14d0ff217ed697eef6daba701f9bb88",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17326",
       "triggerID" : "c3b02c63d14d0ff217ed697eef6daba701f9bb88",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c3b02c63d14d0ff217ed697eef6daba701f9bb88 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17326) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8605: [HUDI-6152] Fixed the check for older timestamps with second granularity during index tagLocation.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8605:
URL: https://github.com/apache/hudi/pull/8605#issuecomment-1528210426

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "141f56805c474aca55f9952ca35e75726f4171cc",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16749",
       "triggerID" : "141f56805c474aca55f9952ca35e75726f4171cc",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 141f56805c474aca55f9952ca35e75726f4171cc Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16749) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] prashantwason commented on a diff in pull request #8605: [HUDI-6152] Fixed the check for older timestamps with second granularity during index tagLocation.

Posted by "prashantwason (via GitHub)" <gi...@apache.org>.
prashantwason commented on code in PR #8605:
URL: https://github.com/apache/hudi/pull/8605#discussion_r1203436289


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java:
##########
@@ -170,9 +171,35 @@ public static List<String> filterKeysFromFile(Path filePath, List<String> candid
     return foundRecordKeys;
   }
 
+  /**
+   * Check if the given commit timestamp is valid for the timeline.
+   *
+   * The commit timestamp is considered to be valid if:
+   *   1. the commit timestamp is present in the timeline, or
+   *   2. the commit timestamp is less than the first commit timestamp in the timeline
+   *
+   * @param commitTimeline  The timeline
+   * @param commitTs        The commit timestamp to check
+   * @return                true if the commit timestamp is valid for the timeline
+   */
   public static boolean checkIfValidCommit(HoodieTimeline commitTimeline, String commitTs) {
-    // Check if the last commit ts for this row is 1) present in the timeline or
-    // 2) is less than the first commit ts in the timeline
-    return !commitTimeline.empty() && commitTimeline.containsOrBeforeTimelineStarts(commitTs);
+    if (commitTimeline.empty()) {
+      return false;
+    }
+
+    // Check for 0.8+ timestamps which have msec granularity

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8605: [HUDI-6152] Fixed the check for older timestamps with second granularity during index tagLocation.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8605:
URL: https://github.com/apache/hudi/pull/8605#issuecomment-1560519104

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "141f56805c474aca55f9952ca35e75726f4171cc",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16749",
       "triggerID" : "141f56805c474aca55f9952ca35e75726f4171cc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c3b02c63d14d0ff217ed697eef6daba701f9bb88",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17326",
       "triggerID" : "c3b02c63d14d0ff217ed697eef6daba701f9bb88",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 141f56805c474aca55f9952ca35e75726f4171cc Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16749) 
   * c3b02c63d14d0ff217ed697eef6daba701f9bb88 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17326) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8605: [HUDI-6152] Fixed the check for older timestamps with second granularity during index tagLocation.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8605:
URL: https://github.com/apache/hudi/pull/8605#issuecomment-1569365504

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "141f56805c474aca55f9952ca35e75726f4171cc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16749",
       "triggerID" : "141f56805c474aca55f9952ca35e75726f4171cc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c3b02c63d14d0ff217ed697eef6daba701f9bb88",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17326",
       "triggerID" : "c3b02c63d14d0ff217ed697eef6daba701f9bb88",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4255b784b88cb29b346462f5ca8b49fd7091ab85",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17494",
       "triggerID" : "4255b784b88cb29b346462f5ca8b49fd7091ab85",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 4255b784b88cb29b346462f5ca8b49fd7091ab85 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17494) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] yihua commented on pull request #8605: [HUDI-6152] Fixed the check for older timestamps with second granularity during index tagLocation.

Posted by "yihua (via GitHub)" <gi...@apache.org>.
yihua commented on PR #8605:
URL: https://github.com/apache/hudi/pull/8605#issuecomment-1569191152

   I fixed the checkstyle issues.  Once CI passes, we can land the PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] prashantwason commented on pull request #8605: [HUDI-6152] Fixed the check for older timestamps with second granularity during index tagLocation.

Posted by "prashantwason (via GitHub)" <gi...@apache.org>.
prashantwason commented on PR #8605:
URL: https://github.com/apache/hudi/pull/8605#issuecomment-1560477871

   @nsivabalan PTAL again.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] nsivabalan commented on a diff in pull request #8605: [HUDI-6152] Fixed the check for older timestamps with second granularity during index tagLocation.

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan commented on code in PR #8605:
URL: https://github.com/apache/hudi/pull/8605#discussion_r1185550921


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java:
##########
@@ -170,9 +171,35 @@ public static List<String> filterKeysFromFile(Path filePath, List<String> candid
     return foundRecordKeys;
   }
 
+  /**
+   * Check if the given commit timestamp is valid for the timeline.
+   *
+   * The commit timestamp is considered to be valid if:
+   *   1. the commit timestamp is present in the timeline, or
+   *   2. the commit timestamp is less than the first commit timestamp in the timeline
+   *
+   * @param commitTimeline  The timeline
+   * @param commitTs        The commit timestamp to check
+   * @return                true if the commit timestamp is valid for the timeline
+   */
   public static boolean checkIfValidCommit(HoodieTimeline commitTimeline, String commitTs) {
-    // Check if the last commit ts for this row is 1) present in the timeline or
-    // 2) is less than the first commit ts in the timeline
-    return !commitTimeline.empty() && commitTimeline.containsOrBeforeTimelineStarts(commitTs);
+    if (commitTimeline.empty()) {
+      return false;
+    }
+
+    // Check for 0.8+ timestamps which have msec granularity
+    if (commitTimeline.containsOrBeforeTimelineStarts(commitTs)) {
+      return true;
+    }
+
+    // Check for older timestamp which have sec granularity and an extension of DEFAULT_MILLIS_EXT may have been added via Timeline operations
+    if (commitTs.length() == HoodieInstantTimeGenerator.MILLIS_INSTANT_TIMESTAMP_FORMAT_LENGTH && commitTs.endsWith(HoodieInstantTimeGenerator.DEFAULT_MILLIS_EXT)) {
+      final String actualOlderFormatTs = commitTs.substring(0, commitTs.length() - HoodieInstantTimeGenerator.DEFAULT_MILLIS_EXT.length());
+      if (commitTimeline.containsOrBeforeTimelineStarts(actualOlderFormatTs)) {
+        return true;
+      }
+    }

Review Comment:
   when we migrated from sec to ms, we were able to manage it w/o explicitly adding a new version to the timeline. 
   
   wrt, moving it inside : 
   I also feel, moving it inside makes sense so that any callers will get it rather than just the IndexUtils
   



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java:
##########
@@ -170,9 +171,35 @@ public static List<String> filterKeysFromFile(Path filePath, List<String> candid
     return foundRecordKeys;
   }
 
+  /**
+   * Check if the given commit timestamp is valid for the timeline.
+   *
+   * The commit timestamp is considered to be valid if:
+   *   1. the commit timestamp is present in the timeline, or
+   *   2. the commit timestamp is less than the first commit timestamp in the timeline
+   *
+   * @param commitTimeline  The timeline
+   * @param commitTs        The commit timestamp to check
+   * @return                true if the commit timestamp is valid for the timeline
+   */
   public static boolean checkIfValidCommit(HoodieTimeline commitTimeline, String commitTs) {
-    // Check if the last commit ts for this row is 1) present in the timeline or
-    // 2) is less than the first commit ts in the timeline
-    return !commitTimeline.empty() && commitTimeline.containsOrBeforeTimelineStarts(commitTs);
+    if (commitTimeline.empty()) {
+      return false;
+    }
+
+    // Check for 0.8+ timestamps which have msec granularity

Review Comment:
   minor. check for "0.10.0 +" 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8605: [HUDI-6152] Fixed the check for older timestamps with second granularity during index tagLocation.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8605:
URL: https://github.com/apache/hudi/pull/8605#issuecomment-1560512097

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "141f56805c474aca55f9952ca35e75726f4171cc",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16749",
       "triggerID" : "141f56805c474aca55f9952ca35e75726f4171cc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c3b02c63d14d0ff217ed697eef6daba701f9bb88",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "c3b02c63d14d0ff217ed697eef6daba701f9bb88",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 141f56805c474aca55f9952ca35e75726f4171cc Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16749) 
   * c3b02c63d14d0ff217ed697eef6daba701f9bb88 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on a diff in pull request #8605: [HUDI-6152] Fixed the check for older timestamps with second granularity during index tagLocation.

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8605:
URL: https://github.com/apache/hudi/pull/8605#discussion_r1203490569


##########
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java:
##########
@@ -388,12 +388,23 @@ public boolean containsInstant(HoodieInstant instant) {
 
   @Override
   public boolean containsInstant(String ts) {
-    return getInstantsAsStream().anyMatch(s -> s.getTimestamp().equals(ts));
+    // Check for 0.10.0+ timestamps which have msec granularity
+    if (getInstantsAsStream().anyMatch(s -> s.getTimestamp().equals(ts))) {
+      return true;
+    }
+
+    // Check for older timestamp which have sec granularity and an extension of DEFAULT_MILLIS_EXT may have been added via Timeline operations
+    if (ts.length() == HoodieInstantTimeGenerator.MILLIS_INSTANT_TIMESTAMP_FORMAT_LENGTH && ts.endsWith(HoodieInstantTimeGenerator.DEFAULT_MILLIS_EXT)) {
+      final String actualOlderFormatTs = ts.substring(0, ts.length() - HoodieInstantTimeGenerator.DEFAULT_MILLIS_EXT.length());
+      return containsOrBeforeTimelineStarts(actualOlderFormatTs);
+    }
+
+    return false;
   }
 
   @Override
   public boolean containsOrBeforeTimelineStarts(String instant) {
-    return getInstantsAsStream().anyMatch(s -> s.getTimestamp().equals(instant)) || isBeforeTimelineStarts(instant);
+    return containsInstant(instant) || isBeforeTimelineStarts(instant);

Review Comment:
   Nice catch ~



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8605: [HUDI-6152] Fixed the check for older timestamps with second granularity during index tagLocation.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8605:
URL: https://github.com/apache/hudi/pull/8605#issuecomment-1569233109

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "141f56805c474aca55f9952ca35e75726f4171cc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16749",
       "triggerID" : "141f56805c474aca55f9952ca35e75726f4171cc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c3b02c63d14d0ff217ed697eef6daba701f9bb88",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17326",
       "triggerID" : "c3b02c63d14d0ff217ed697eef6daba701f9bb88",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4255b784b88cb29b346462f5ca8b49fd7091ab85",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "4255b784b88cb29b346462f5ca8b49fd7091ab85",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c3b02c63d14d0ff217ed697eef6daba701f9bb88 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17326) 
   * 4255b784b88cb29b346462f5ca8b49fd7091ab85 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8605: [HUDI-6152] Fixed the check for older timestamps with second granularity during index tagLocation.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8605:
URL: https://github.com/apache/hudi/pull/8605#issuecomment-1528196371

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "141f56805c474aca55f9952ca35e75726f4171cc",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "141f56805c474aca55f9952ca35e75726f4171cc",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 141f56805c474aca55f9952ca35e75726f4171cc UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on a diff in pull request #8605: [HUDI-6152] Fixed the check for older timestamps with second granularity during index tagLocation.

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8605:
URL: https://github.com/apache/hudi/pull/8605#discussion_r1181157260


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java:
##########
@@ -170,9 +171,35 @@ public static List<String> filterKeysFromFile(Path filePath, List<String> candid
     return foundRecordKeys;
   }
 
+  /**
+   * Check if the given commit timestamp is valid for the timeline.
+   *
+   * The commit timestamp is considered to be valid if:
+   *   1. the commit timestamp is present in the timeline, or
+   *   2. the commit timestamp is less than the first commit timestamp in the timeline
+   *
+   * @param commitTimeline  The timeline
+   * @param commitTs        The commit timestamp to check
+   * @return                true if the commit timestamp is valid for the timeline
+   */
   public static boolean checkIfValidCommit(HoodieTimeline commitTimeline, String commitTs) {
-    // Check if the last commit ts for this row is 1) present in the timeline or
-    // 2) is less than the first commit ts in the timeline
-    return !commitTimeline.empty() && commitTimeline.containsOrBeforeTimelineStarts(commitTs);
+    if (commitTimeline.empty()) {
+      return false;
+    }
+
+    // Check for 0.8+ timestamps which have msec granularity
+    if (commitTimeline.containsOrBeforeTimelineStarts(commitTs)) {
+      return true;
+    }
+
+    // Check for older timestamp which have sec granularity and an extension of DEFAULT_MILLIS_EXT may have been added via Timeline operations
+    if (commitTs.length() == HoodieInstantTimeGenerator.MILLIS_INSTANT_TIMESTAMP_FORMAT_LENGTH && commitTs.endsWith(HoodieInstantTimeGenerator.DEFAULT_MILLIS_EXT)) {
+      final String actualOlderFormatTs = commitTs.substring(0, commitTs.length() - HoodieInstantTimeGenerator.DEFAULT_MILLIS_EXT.length());
+      if (commitTimeline.containsOrBeforeTimelineStarts(actualOlderFormatTs)) {
+        return true;
+      }
+    }

Review Comment:
   Shouldm't we fix this method instead? `commitTimeline.containsOrBeforeTimelineStarts` and should we have a version number for the timeline?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8605: [HUDI-6152] Fixed the check for older timestamps with second granularity during index tagLocation.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8605:
URL: https://github.com/apache/hudi/pull/8605#issuecomment-1528692772

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "141f56805c474aca55f9952ca35e75726f4171cc",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16749",
       "triggerID" : "141f56805c474aca55f9952ca35e75726f4171cc",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 141f56805c474aca55f9952ca35e75726f4171cc Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16749) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] prashantwason commented on a diff in pull request #8605: [HUDI-6152] Fixed the check for older timestamps with second granularity during index tagLocation.

Posted by "prashantwason (via GitHub)" <gi...@apache.org>.
prashantwason commented on code in PR #8605:
URL: https://github.com/apache/hudi/pull/8605#discussion_r1203435656


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java:
##########
@@ -170,9 +171,35 @@ public static List<String> filterKeysFromFile(Path filePath, List<String> candid
     return foundRecordKeys;
   }
 
+  /**
+   * Check if the given commit timestamp is valid for the timeline.
+   *
+   * The commit timestamp is considered to be valid if:
+   *   1. the commit timestamp is present in the timeline, or
+   *   2. the commit timestamp is less than the first commit timestamp in the timeline
+   *
+   * @param commitTimeline  The timeline
+   * @param commitTs        The commit timestamp to check
+   * @return                true if the commit timestamp is valid for the timeline
+   */
   public static boolean checkIfValidCommit(HoodieTimeline commitTimeline, String commitTs) {
-    // Check if the last commit ts for this row is 1) present in the timeline or
-    // 2) is less than the first commit ts in the timeline
-    return !commitTimeline.empty() && commitTimeline.containsOrBeforeTimelineStarts(commitTs);
+    if (commitTimeline.empty()) {
+      return false;
+    }
+
+    // Check for 0.8+ timestamps which have msec granularity
+    if (commitTimeline.containsOrBeforeTimelineStarts(commitTs)) {
+      return true;
+    }
+
+    // Check for older timestamp which have sec granularity and an extension of DEFAULT_MILLIS_EXT may have been added via Timeline operations
+    if (commitTs.length() == HoodieInstantTimeGenerator.MILLIS_INSTANT_TIMESTAMP_FORMAT_LENGTH && commitTs.endsWith(HoodieInstantTimeGenerator.DEFAULT_MILLIS_EXT)) {
+      final String actualOlderFormatTs = commitTs.substring(0, commitTs.length() - HoodieInstantTimeGenerator.DEFAULT_MILLIS_EXT.length());
+      if (commitTimeline.containsOrBeforeTimelineStarts(actualOlderFormatTs)) {
+        return true;
+      }
+    }

Review Comment:
   Moved to HoodieDefaultTimeline



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8605: [HUDI-6152] Fixed the check for older timestamps with second granularity during index tagLocation.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8605:
URL: https://github.com/apache/hudi/pull/8605#issuecomment-1569240213

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "141f56805c474aca55f9952ca35e75726f4171cc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16749",
       "triggerID" : "141f56805c474aca55f9952ca35e75726f4171cc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c3b02c63d14d0ff217ed697eef6daba701f9bb88",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17326",
       "triggerID" : "c3b02c63d14d0ff217ed697eef6daba701f9bb88",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4255b784b88cb29b346462f5ca8b49fd7091ab85",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17494",
       "triggerID" : "4255b784b88cb29b346462f5ca8b49fd7091ab85",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c3b02c63d14d0ff217ed697eef6daba701f9bb88 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17326) 
   * 4255b784b88cb29b346462f5ca8b49fd7091ab85 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17494) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org