You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/01/29 00:16:02 UTC

[GitHub] [incubator-hudi] nbalajee opened a new pull request #1288: HUDI-117 Close file handle before throwing an exception due to append…

nbalajee opened a new pull request #1288: HUDI-117 Close file handle before throwing an exception due to append…
URL: https://github.com/apache/incubator-hudi/pull/1288
 
 
   … failure.
   
   Add test cases to handle/verify stage failure scenarios.
   
   ## What is the purpose of the pull request
   * Fix for HDFS "LeaseRecoveryInProgress" exception (triggered by an append failure) that results in stage failures and eventually job getting killed.
   
   ## Brief change log
   * Add a shutdown hook to handle JVM crash or executor getting killed, during write.
   * Upon encountering an append failure, close the file handle, so that subsequent task attempts on a different executor would be able to acquire lease on the .log file.
   
   ## Verify this pull request
   
   This change added tests and can be verified as follows:
   
   
     - Added test cases to verify that upon stage failures, data records present in multiple log files can be successfully handled/merged.
     
   ## Committer checklist
   
    - [ x] Has a corresponding JIRA in PR title & commit
    
    - [x ] Commit message is descriptive of the change
    
    - [x ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1288: [HUDI-117] Close file handle before throwing an exception due to append…

Posted by GitBox <gi...@apache.org>.
n3nash commented on a change in pull request #1288: [HUDI-117] Close file handle before throwing an exception due to append…
URL: https://github.com/apache/incubator-hudi/pull/1288#discussion_r372190260
 
 

 ##########
 File path: hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFormatWriter.java
 ##########
 @@ -256,7 +280,22 @@ private void handleAppendExceptionOrRecoverLease(Path path, RemoteException e)
         throw new HoodieException(e);
       }
     } else {
-      throw new HoodieIOException("Failed to open an append stream ", e);
+      // When fs.append() has failed and an exception is thrown, by closing the output stream
+      // we shall force hdfs to release the lease on the log file. When Spark driver retries this task (with
 
 Review comment:
   Instead of Spark driver, lets just say when "spark retries.."

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] n3nash commented on issue #1288: [HUDI-117] Close file handle before throwing an exception due to append…

Posted by GitBox <gi...@apache.org>.
n3nash commented on issue #1288: [HUDI-117] Close file handle before throwing an exception due to append…
URL: https://github.com/apache/incubator-hudi/pull/1288#issuecomment-579547411
 
 
   @nbalajee It seems like the build is failing, can you check ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] n3nash merged pull request #1288: [HUDI-117] Close file handle before throwing an exception due to append…

Posted by GitBox <gi...@apache.org>.
n3nash merged pull request #1288: [HUDI-117] Close file handle before throwing an exception due to append…
URL: https://github.com/apache/incubator-hudi/pull/1288
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1288: [HUDI-117] Close file handle before throwing an exception due to append…

Posted by GitBox <gi...@apache.org>.
n3nash commented on a change in pull request #1288: [HUDI-117] Close file handle before throwing an exception due to append…
URL: https://github.com/apache/incubator-hudi/pull/1288#discussion_r372190443
 
 

 ##########
 File path: hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFormatWriter.java
 ##########
 @@ -256,7 +280,22 @@ private void handleAppendExceptionOrRecoverLease(Path path, RemoteException e)
         throw new HoodieException(e);
       }
     } else {
-      throw new HoodieIOException("Failed to open an append stream ", e);
+      // When fs.append() has failed and an exception is thrown, by closing the output stream
+      // we shall force hdfs to release the lease on the log file. When Spark driver retries this task (with
+      // new attemptId, say taskId.1) it will be able to acquire lease on the log file (as output stream was
+      // closed properly by taskId.0).
+      //
+      // If close() call were to fail throwing an exception, our best bet is to rollover to a new log file.
+      try {
+        close();
+        // output stream has been successfully closed and lease on the log file has been released.
+        throw new HoodieIOException("Failed to append to the output stream ", e);
 
 Review comment:
   Can you add a comment here why we are throwing an exception here

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1288: [HUDI-117] Close file handle before throwing an exception due to append…

Posted by GitBox <gi...@apache.org>.
n3nash commented on a change in pull request #1288: [HUDI-117] Close file handle before throwing an exception due to append…
URL: https://github.com/apache/incubator-hudi/pull/1288#discussion_r372190536
 
 

 ##########
 File path: hudi-common/src/test/java/org/apache/hudi/common/table/log/TestHoodieLogFormat.java
 ##########
 @@ -1113,6 +1113,99 @@ public void testAvroLogRecordReaderWithMixedInsertsCorruptsAndRollback()
     assertEquals("We would read 0 records", 0, scanner.getTotalLogRecords());
   }
 
+  /*
+   * During a spark stage failure, when the stage is retried, tasks that are part of the previous attempt
+   * of the stage would continue to run.  As a result two different tasks could be performing the same operation.
+   * When trying to update the log file, only one of the tasks would succeed (one holding lease on the log file).
+   *
+   * In order to make progress in this scenario, second task attempting to update the log file would rollover to
+   * a new version of the log file.  As a result, we might end up with two log files with same set of data records
+   * present in both of them.
+   *
+   * Following uint tests mimic this scenario to ensure that the reader can handle merging multiple log files with
+   * duplicate data.
+   *
+   */
+  private void testAvroLogRecordReaderMergingMultipleLogFiles(int numRecordsInLog1, int numRecordsInLog2)
+      throws IOException, URISyntaxException, InterruptedException {
+    try {
+      // Write one Data bloc with same InstantTime (written in same batch)
 
 Review comment:
   nit : s/bloc/block

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] nbalajee commented on issue #1288: [HUDI-117] Close file handle before throwing an exception due to append…

Posted by GitBox <gi...@apache.org>.
nbalajee commented on issue #1288: [HUDI-117] Close file handle before throwing an exception due to append…
URL: https://github.com/apache/incubator-hudi/pull/1288#issuecomment-579556806
 
 
   > @nbalajee It seems like the build is failing, can you check ?
   
   :: problems summary ::
   :::: WARNINGS
   		module not found: org.apache.spark#spark-avro_2.11;2.4.4

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services