You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/08/04 12:22:04 UTC

[GitHub] [hudi] nsivabalan opened a new pull request #1912: [HUDI-1098] Adding TimedWaitOnAppearConsistencyGuard

nsivabalan opened a new pull request #1912:
URL: https://github.com/apache/hudi/pull/1912


   ## What is the purpose of the pull request
   
   Introducing a TimedWaitOnAppearConsistencyGuard for eventual consistent stores. This will sleep for configured period of time only on APPEAR. It is a no-op for DISAPPEAR. This is specifically for eventually consistent stores like S3A filesystem and here is the rational.
   This guard is used when deleting data files corresponding to marker files that needs to be deleted.
   There are two tricky cases that needs to be considered. Case 1 : A data file creation is eventually consistent and hence
    when issuing deletes, the file may not be found. Case 2: a data file was never created in the first place as the process crashed.
    In S3A, GET and LIST are eventually consistent, and delete() implementation internally does a LIST/EXISTS.
    Prior to this patch, hudi was leveraging FailSafeConsistencyGuard which was doing the following to delete data files.
    Step1: wait for all files to appear with linear backoff timer w/ a max timer.
    Step2: issue deletes
    Step3: wait for all files to disappear.
    Step1 and Step2 is handled by {@link FailSafeConsistencyGuard}.
   
    We are simplifying these steps with TimedWaitOnAppearConsistencyGaurd as below.
    Step1: Sleep for a configured threshold.
    Step2: issue deletes.
   
    With this, if any files that was created, should be available within configured threshold(eventual consistency).
    Delete() will return false if FileNotFound. So, both cases are taken care of this {@link ConsistencyGuard}.
   Step3 is not required since if FileIsNotFound, delete() would have returned false and hence we ignore the return values. But if file exists and if file could not be deleted, some exception would be thrown. 
   
   ## Brief change log
   
     - *Added TimedWaitOnAppearConsistencyGuard to delete a bunch of files in eventually consistent cloud stores*
   
   ## Verify this pull request
   
   This change added tests and can be verified as follows:
   
     - *Added tests to TestConsistencyGuard to verify the change.*
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] bvaradar merged pull request #1912: [HUDI-1098] Adding OptimisticConsistencyGuard to be used during FinalizeWrite instead of FailSafeConsistencyGuard

Posted by GitBox <gi...@apache.org>.
bvaradar merged pull request #1912:
URL: https://github.com/apache/hudi/pull/1912


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] bvaradar commented on pull request #1912: [HUDI-1098] Adding OptimisticConsistencyGuard to be used during FinalizeWrite instead of FailSafeConsistencyGuard

Posted by GitBox <gi...@apache.org>.
bvaradar commented on pull request #1912:
URL: https://github.com/apache/hudi/pull/1912#issuecomment-671105923


   @nsivabalan : Please fix the checkstyle issues when you get a chance. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] bvaradar commented on pull request #1912: [HUDI-1098] Adding TimedWaitOnAppearConsistencyGuard

Posted by GitBox <gi...@apache.org>.
bvaradar commented on pull request #1912:
URL: https://github.com/apache/hudi/pull/1912#issuecomment-669662462


   @umehrot2 : I will be reviewing this PR. In the interest of testing time, can you in parallel test and confirm if this would be effective in solving commit abort issue you are  facing.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #1912: [HUDI-1098] Adding TimedWaitOnAppearConsistencyGuard

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #1912:
URL: https://github.com/apache/hudi/pull/1912#discussion_r465142508



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/fs/TimedWaitOnAppearConsistencyGaurd.java
##########
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.fs;
+
+import org.apache.hudi.common.util.ValidationUtils;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.concurrent.TimeoutException;
+
+/**
+ * A consistency guard which sleeps for configured period of time only on APPEAR. It is a no-op for DISAPPEAR.
+ * This is specifically for S3A filesystem and here is the rational.
+ * This guard is used when deleting data files corresponding to marker files that needs to be deleted.
+ * There are two tricky cases that needs to be considered. Case 1 : A data file creation is eventually consistent and hence
+ * when issuing deletes, it may not be found. Case 2: a data file was never created in the first place since the process crashed.
+ * In S3A, GET and LIST are eventually consistent, and delete() implementation internally does a LIST/EXISTS.
+ * Prior to this patch, hudi was leveraging {@link FailSafeConsistencyGuard} which was doing the following to delete data files.
+ * Step1: wait for all files to appear with linear backoff.
+ * Step2: issue deletes
+ * Step3: wait for all files to disappear with linear backoff.
+ * Step1 and Step2 is handled by {@link FailSafeConsistencyGuard}.
+ *
+ * We are simplifying these steps with {@link TimedWaitOnAppearConsistencyGaurd}.
+ * Step1: Sleep for a configured threshold.
+ * Step2: issue deletes.
+ *
+ * With this, if any files that was created, should be available within configured threshold(eventual consistency).
+ * Delete() will return false if FileNotFound. So, both cases are taken care of this {@link ConsistencyGuard}.
+ */
+public class TimedWaitOnAppearConsistencyGaurd implements ConsistencyGuard {
+
+  private static final Logger LOG = LogManager.getLogger(TimedWaitOnAppearConsistencyGaurd.class);
+
+  private final ConsistencyGuardConfig consistencyGuardConfig;
+
+  public TimedWaitOnAppearConsistencyGaurd(FileSystem fs, ConsistencyGuardConfig consistencyGuardConfig) {
+    this.consistencyGuardConfig = consistencyGuardConfig;
+    ValidationUtils.checkArgument(consistencyGuardConfig.isConsistencyCheckEnabled());
+  }
+
+  @Override
+  public void waitTillFileAppears(Path filePath) throws IOException, TimeoutException {
+    try {
+      Thread.sleep(consistencyGuardConfig.getInitialConsistencyCheckIntervalMs());

Review comment:
       I am repurposing "hoodie.consistency.check.initial_interval_ms" for this sleep time. Let me know if we need to introduce a new config for this. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #1912: [HUDI-1098] Adding TimedWaitOnAppearConsistencyGuard

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #1912:
URL: https://github.com/apache/hudi/pull/1912#discussion_r465039422



##########
File path: hudi-client/src/main/java/org/apache/hudi/table/HoodieTable.java
##########
@@ -505,16 +507,26 @@ private boolean waitForCondition(String partitionPath, Stream<Pair<String, Strin
     final FileSystem fileSystem = metaClient.getRawFs();
     List<String> fileList = partitionFilePaths.map(Pair::getValue).collect(Collectors.toList());
     try {
-      getFailSafeConsistencyGuard(fileSystem).waitTill(partitionPath, fileList, visibility);
+      getConsistencyGuard(fileSystem, config.getConsistencyGuardConfig()).waitTill(partitionPath, fileList, visibility);
     } catch (IOException | TimeoutException ioe) {
       LOG.error("Got exception while waiting for files to show up", ioe);
       return false;
     }
     return true;
   }
 
-  private ConsistencyGuard getFailSafeConsistencyGuard(FileSystem fileSystem) {
-    return new FailSafeConsistencyGuard(fileSystem, config.getConsistencyGuardConfig());
+  /**
+   * Instantiate {@link ConsistencyGuard} via reflection, passing in the required args.
+   * <p>
+   * Default consistencyGuard class is {@link FailSafeConsistencyGuard}.
+   */
+  public static ConsistencyGuard getConsistencyGuard(FileSystem fs, ConsistencyGuardConfig consistencyGuardConfig) throws IOException {

Review comment:
       I see FSUtils also instantiates FailSafeConsistencyGuard for WrapperFileSystem. Do we need similar reflection based instantiation fix there too?
   https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/fs/FSUtils.java
   Check lines 529 to 535. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on pull request #1912: [HUDI-1098] Adding TimedWaitOnAppearConsistencyGuard

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on pull request #1912:
URL: https://github.com/apache/hudi/pull/1912#issuecomment-670479909


   @bvaradar : since you suggested to have the TimedWaitOnAppearCG as default opt in, I would suggest to introduce a new config for the sleep time. so that we can set it to 2 or 3 secs as default. The existing config which we are repurposing, has a default value of 400 ms and users may not realize to fix this value since this is going to be default opt in. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar removed a comment on pull request #1912: [HUDI-1098] Adding TimedWaitOnAppearConsistencyGuard

Posted by GitBox <gi...@apache.org>.
vinothchandar removed a comment on pull request #1912:
URL: https://github.com/apache/hudi/pull/1912#issuecomment-668853480


   @umehrot2 can you please review this? 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] bvaradar commented on a change in pull request #1912: [HUDI-1098] Adding OptimisticConsistencyGuard to be used during FinalizeWrite instead of FailSafeConsistencyGuard

Posted by GitBox <gi...@apache.org>.
bvaradar commented on a change in pull request #1912:
URL: https://github.com/apache/hudi/pull/1912#discussion_r467631868



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/fs/ConsistencyGuardConfig.java
##########
@@ -36,15 +36,15 @@
   // time between successive attempts to ensure written data's metadata is consistent on storage
   private static final String INITIAL_CONSISTENCY_CHECK_INTERVAL_MS_PROP =
       "hoodie.consistency.check.initial_interval_ms";
-  private static long DEFAULT_INITIAL_CONSISTENCY_CHECK_INTERVAL_MS = 2000L;
+  private static long DEFAULT_INITIAL_CONSISTENCY_CHECK_INTERVAL_MS = 400L;

Review comment:
       Sounds good.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on pull request #1912: [HUDI-1098] Adding TimedWaitOnAppearConsistencyGuard

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #1912:
URL: https://github.com/apache/hudi/pull/1912#issuecomment-668708393


   @bvaradar you are most familiar with this. can you please review this? 
   
   @umehrot2 let's see if we agree on the principles here


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #1912: [HUDI-1098] Adding TimedWaitOnAppearConsistencyGuard

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #1912:
URL: https://github.com/apache/hudi/pull/1912#discussion_r465008616



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/fs/ConsistencyGuardConfig.java
##########
@@ -36,15 +36,15 @@
   // time between successive attempts to ensure written data's metadata is consistent on storage
   private static final String INITIAL_CONSISTENCY_CHECK_INTERVAL_MS_PROP =
       "hoodie.consistency.check.initial_interval_ms";
-  private static long DEFAULT_INITIAL_CONSISTENCY_CHECK_INTERVAL_MS = 2000L;
+  private static long DEFAULT_INITIAL_CONSISTENCY_CHECK_INTERVAL_MS = 400L;

Review comment:
       I have also fixed some default configs. If incase someone is using FailSafeConsistencyGuard, in case of data file not present, this guard will execute for 256 secs which may not be required. Hence curtailing to 140 secs max. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on pull request #1912: [HUDI-1098] Adding TimedWaitOnAppearConsistencyGuard

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on pull request #1912:
URL: https://github.com/apache/hudi/pull/1912#issuecomment-670480225


   @umehrot2 : Would appreciate if you agree on the approach here. Before I go ahead and address feedback want to have consensus. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on pull request #1912: [HUDI-1098] Adding TimedWaitOnAppearConsistencyGuard

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on pull request #1912:
URL: https://github.com/apache/hudi/pull/1912#issuecomment-670295868


   Synced up with @bvaradar  on the diff. Here are some changes/conclusions we narrowed down after our discussion. 
   - We felt exposing TimedWaitOnAppearCG to external users may not make much sense. We do not want to make TimedWaitOnAppear as the global consistency guard in all places, but it is very specific to finalizeWrite code path. 
   - And so, instead of defining a config for consistencyGuardClass, we will introduce a config in ConsistencyGuard (for lack of better word, lets say "simplifyFailSafeConsistencyToTimedSleep/simplifyFailSafeConsistencyToTimedWait" for now). Will enable this config by default in which case TimedWaitOnAppearCG will be used in finalize Write. If users wish to use FailSafeCG even in finalizeWrite code paths, they can disable this config value on which FailSafeCG will be used. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on pull request #1912: [HUDI-1098] Adding OptimisticConsistencyGuard to be used during FinalizeWrite instead of FailSafeConsistencyGuard

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on pull request #1912:
URL: https://github.com/apache/hudi/pull/1912#issuecomment-670927465


   @bvaradar : patch is good to be reviewed. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on pull request #1912: [HUDI-1098] Adding TimedWaitOnAppearConsistencyGuard

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #1912:
URL: https://github.com/apache/hudi/pull/1912#issuecomment-668853480


   @umehrot2 can you please review this? 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org