You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/06/11 17:05:00 UTC

[GitHub] [hudi] kasured opened a new issue, #5843: [SUPPORT] Hoodie can request and complete commits far in the future on its timeline

kasured opened a new issue, #5843:
URL: https://github.com/apache/hudi/issues/5843

   **Describe the problem you faced**
   
   Under certain circumstances Hoodies can create a commit on a timeline far in the future (days and years). 
   
   **To Reproduce**
   
   At the moment there is no clear way to reproduce the issue. It was discovered with the same symptoms as described in https://github.com/apache/hudi/issues/2533 when Hoodies denied to rollback the commit because there is another commit that it has found in the future. 
   
   **Expected behavior**
   
   Hoodie does not create successful commits far in the future
   
   **Environment Description**
   
   * EMR version: 6.5.0
   
   * Hudi version : 0.9.0-amzn-1	
   
   * Spark version : 3.1.2
   
   * Hadoop version : 3.2.1
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   **Additional context**
   
   * We have a single Spark Application with multiple Structured Streaming queries running at the same time as described in https://github.com/apache/hudi/issues/5298
   * We are using COW with a single writer model and we have `hoodie.cleaner.policy.failed.writes=true` by default
   * It is expected that Spark Driver can be restarted by YARN so if there is an inflight commit that has not been completed it will be rolled-back. In method `BaseRollbackActionExecutor#validateRollbackCommitSequence` there is a logic that prevents the roll back because the future commit has been found on the timeline. Which is also understandable 
   ```
    private void validateRollbackCommitSequence() {
       // Continue to provide the same behavior if policy is EAGER (similar to pendingRollback logic). This is required
       // since with LAZY rollback we support parallel writing which can allow a new inflight while rollback is ongoing
       // Remove this once we support LAZY rollback of failed writes by default as parallel writing becomes the default
       // writer mode.
       if (config.getFailedWritesCleanPolicy().isEager()) {
   ```
   * After looking into the s3 location we were able to see the following 
   ```
   2022-06-01 08:43:16      12147 20220601084314.clean.inflight
   2022-06-01 08:43:16      12147 20220601084314.clean.requested
   2022-06-01 08:43:17      10731 20220601084314.clean
   
   2022-06-01 09:00:54          0 20220629090052.commit.requested
   2022-06-01 09:05:50      54609 20220629090052.inflight
   2022-06-01 09:08:26      68699 20220629090052.commit
   
   2022-06-01 09:55:09          0 20220601095507.commit.requested
   2022-06-01 10:04:21      56777 20220601095507.inflight
   2022-06-01 10:13:10      68725 20220601095507.commit
   
   2022-06-01 12:00:46          0 20220601120045.commit.requested
   ```
   * As can be seen above Hoodie were expectedly trying to roll back the non-completed commit 20220601120045 but it found the completed commit  20220629090052 which is more that 28 days in the future as per Hoodie instant time
   * We have checked another locations and it looks like there are even commits had been created and completed in 2023 
   ```
   2022-06-11 16:05:28       2337 20220611160525.clean.requested
   2022-06-11 16:33:15       8789 20220611163013.commit
   2022-06-11 16:30:14          0 20220611163013.commit.requested
   2022-06-11 16:30:22       3928 20220611163013.inflight
   
   2022-05-26 08:30:53       1746 20231226083050.clean
   2022-05-26 08:30:52       1752 20231226083050.clean.inflight
   2022-05-26 08:30:52       1752 20231226083050.clean.requested
   
   ```
   * The logic of creating the instant time is in the `HoodieActiveTimeline#createNewInstantTime`
   ```
   private static AtomicReference<String> lastInstantTime = new AtomicReference<>(String.valueOf(Integer.MIN_VALUE));
   /**
    * Returns next instant time that adds N milliseconds in the {@link #COMMIT_FORMATTER} format.
    * Ensures each instant time is atleast 1 second apart since we create instant times at second granularity
    */
   public static String createNewInstantTime(long milliseconds) {
     return lastInstantTime.updateAndGet((oldVal) -> {
       String newCommitTime;
       do {
         newCommitTime = HoodieActiveTimeline.COMMIT_FORMATTER.format(new Date(System.currentTimeMillis() + milliseconds));
       } while (HoodieTimeline.compareTimestamps(newCommitTime, LESSER_THAN_OR_EQUALS, oldVal));
       return newCommitTime;
     });
   }
   ```
   * I am planning to reproduce this issue but it looks like Hoodies might get in the state when optimistic locking with CAS approach causes multiple threads to update the value until that condition disappears and the commit can be created and more importantly flushed
   * This might cause unexpected querying results when doing incremental querying.
   
   **Stacktrace**
   
   ```Caused by: org.apache.hudi.exception.HoodieRollbackException: Found commits after time :20220601120045, please rollback greater commits first```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] kasured commented on issue #5843: [SUPPORT] Hoodie can request and complete commits far in the future on its timeline

Posted by GitBox <gi...@apache.org>.
kasured commented on issue #5843:
URL: https://github.com/apache/hudi/issues/5843#issuecomment-1161409868

   Hello @nsivabalan Unfortunatelly we do not have easy way to backport and deploy it. We are working with AWS team to create a patched version of the 0.9.0 release, but It looks like the potential impact of the issues might be really concerning


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] kasured commented on issue #5843: [SUPPORT] Hoodie can request and complete commits far in the future on its timeline

Posted by GitBox <gi...@apache.org>.
kasured commented on issue #5843:
URL: https://github.com/apache/hudi/issues/5843#issuecomment-1289941384

   @nsivabalan Sorry for the delay. Ever since we upgraded to 0.11.0 we have not seen this so far. Again the main root cause is the concurrency issue caused by using SimpleDateFormat. Not sure if 0.9.0 can be even used without a backport


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] codope commented on issue #5843: [SUPPORT] Hoodie can request and complete commits far in the future on its timeline

Posted by GitBox <gi...@apache.org>.
codope commented on issue #5843:
URL: https://github.com/apache/hudi/issues/5843#issuecomment-1239171918

   @kasured Did you hear back from AWS wrt the patched version? Would it be easier to upgrade to latest EMR 6.x version which has Hudi 0.11.1 with this fix?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] kasured commented on issue #5843: [SUPPORT] Hoodie can request and complete commits far in the future on its timeline

Posted by GitBox <gi...@apache.org>.
kasured commented on issue #5843:
URL: https://github.com/apache/hudi/issues/5843#issuecomment-1163331800

   @nsivabalan to-day we saw a case when it is not only future commit that has been created but also the commit with the non valid date 
   
   2022-06-16 02:31:03       2443 20220631003058.clean
   2022-06-16 02:31:01       2417 20220631003058.clean.inflight
   2022-06-16 02:31:00       2417 20220631003058.clean.requested
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] nsivabalan commented on issue #5843: [SUPPORT] Hoodie can request and complete commits far in the future on its timeline

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #5843:
URL: https://github.com/apache/hudi/issues/5843#issuecomment-1154523375

   you might be right. We did get reports from some users and hence the fix. Unfortunately, currently we don't support backporting fixes yet. Can you patch it up on your end? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] nsivabalan commented on issue #5843: [SUPPORT] Hoodie can request and complete commits far in the future on its timeline

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #5843:
URL: https://github.com/apache/hudi/issues/5843#issuecomment-1289933810

   @kasured : gentle ping. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] codope closed issue #5843: [SUPPORT] Hoodie can request and complete commits far in the future on its timeline

Posted by "codope (via GitHub)" <gi...@apache.org>.
codope closed issue #5843: [SUPPORT] Hoodie can request and complete commits far in the future on its timeline
URL: https://github.com/apache/hudi/issues/5843


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] kasured commented on issue #5843: [SUPPORT] Hoodie can request and complete commits far in the future on its timeline

Posted by GitBox <gi...@apache.org>.
kasured commented on issue #5843:
URL: https://github.com/apache/hudi/issues/5843#issuecomment-1154342855

   @nsivabalan I have been able to reproduce the behaviour locally with hoodie 0.9.0 Please check the repository here https://github.com/kasured/hudi-timeline-5843. When you run the code as instructed in branch `main` you will be able to see some commits that are created in the future. I was able to get commits 1 hour in the future. But I would say that might depend on multiple factors and the difference might be as displayed in the description of the issues.
   
   ```
   ##################################################################################################
   Number of generated(attempted) instants            : 107
   Number of commits in the future                    : 9
   Number of ArrayIndexOutOfBoundsException exceptions: 0
   ##################################################################################################
   ```
   
   I have not been able to reproduce another which I believe relating to the same root cause issue `ArrayIndexOutOfBoundsException` that we from time to time get on the EMR cluster  
   
   ```
    message: Micro batch id=1297 threw following exception: 
   java.lang.ArrayIndexOutOfBoundsException: -1
   	at java.util.Calendar.getDisplayName(Calendar.java:2114)
   	at java.text.SimpleDateFormat.subFormat(SimpleDateFormat.java:1125)
   	at java.text.SimpleDateFormat.format(SimpleDateFormat.java:966)
   	at java.text.SimpleDateFormat.format(SimpleDateFormat.java:936)
   	at java.text.DateFormat.format(DateFormat.java:345)
   	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.lambda$createNewInstantTime$0(HoodieActiveTimeline.java:92)
   ```
   
   In that repository that I created If you switch to branch `hoodie/0.10.0` though and run the code as described you will see significant difference in the number of generated instants and also I was not able to reproduce that future commit issue. So I wonder if https://github.com/apache/hudi/pull/4073 can be back-ported to 0.9.0. Without that code this version is susceptible to subtle issue and data corruptions
   
   ```
   ##################################################################################################
   Number of generated(attempted) instants            : 28722
   Number of commits in the future                    : 0
   Number of ArrayIndexOutOfBoundsException exceptions: 0
   ##################################################################################################
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] nsivabalan commented on issue #5843: [SUPPORT] Hoodie can request and complete commits far in the future on its timeline

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #5843:
URL: https://github.com/apache/hudi/issues/5843#issuecomment-1154016807

   actually there was a bug wrt rollback validation. we did fix it with 0.11 https://github.com/apache/hudi/pull/5157
   There could be another bug as well which we fixed it along similar lines with 0.11. I can try to get you the actual patch if you are interested. Can you try with 0.11 and let us know if you still see the issue. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] nsivabalan commented on issue #5843: [SUPPORT] Hoodie can request and complete commits far in the future on its timeline

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #5843:
URL: https://github.com/apache/hudi/issues/5843#issuecomment-1154017361

   guess this is the other fix https://github.com/apache/hudi/pull/2863
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] kasured commented on issue #5843: [SUPPORT] Hoodie can request and complete commits far in the future on its timeline

Posted by GitBox <gi...@apache.org>.
kasured commented on issue #5843:
URL: https://github.com/apache/hudi/issues/5843#issuecomment-1154047895

   We are not using clustering, so I am not sure if these far-in-the-future commits are caused by that. We have both inline and async clustering disabled "hoodie.clustering.inline" = "false" and "hoodie.clustering.async.enabled" = "false"
   
   Again, we are trying to understand how this commits happen to appear in the timeline. I might be thinking of the impact of `HoodieActiveTimeline` using not thread-safe `java.text.SimpleDateFormat` when generating the timeline instant
   
   I am seeing that 0.10.0 has been reworked to remove that issue in https://issues.apache.org/jira/browse/HUDI-2831 and https://github.com/apache/hudi/pull/4073 But we are on the version 0.9.0 which might be very much susceptible to this subtle threat. Even in the single writer scenario it looks like that COMMIT_FORMATTER can be accessed by multiple threads. In our scenario we are creating multiple Streaming Queries in a single spark application which can amplify such concurrency issues


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] ad1happy2go commented on issue #5843: [SUPPORT] Hoodie can request and complete commits far in the future on its timeline

Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.
ad1happy2go commented on issue #5843:
URL: https://github.com/apache/hudi/issues/5843#issuecomment-1537986491

   @kasured Closing this issue as the fix worked and issue not present in higher version. Please reopen in case of any concerns.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org