Posted to common-issues@hadoop.apache.org by GitBox <gi...@apache.org> on 2019/08/12 17:42:14 UTC

[GitHub] [hadoop] adoroszlai opened a new pull request #1282: HDDS-1908. TestMultiBlockWritesWithDnFailures is failing

URL: https://github.com/apache/hadoop/pull/1282
 
 
   ## What changes were proposed in this pull request?
   
   Multi-block write tests fail most of the time because the Ratis leader election timeout is about the same length as the client retry timeout (5 retries of 1 second each).  The client frequently gives up before a leader is elected, which causes `KeyOutputStream.handleException` to exclude the entire pipeline.  The TestMultiBlockWritesWithDnFailures test has only 6 nodes, 2 of which are shut down as part of the test.  Thus, once a pipeline has been excluded this way, the subsequent write fails because a new block cannot be allocated.
   
   This change decreases the leader election timeout and increases client retries.  It is basically an extension of [HDDS-1780](https://issues.apache.org/jira/browse/HDDS-1780).
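
The timing race described above can be illustrated with a small arithmetic sketch. It is not code from the patch: the class and method names are hypothetical, the "before" election timeout of 5 seconds is only an approximation of "about the same length", and the "after" values are made up for illustration rather than taken from the actual configuration.

```java
import java.time.Duration;

// Hypothetical helper, not part of Ozone or Ratis: compares the total time a
// client keeps retrying against the time a leader election may take.
final class RetryBudgetSketch {

    /** Total time the client keeps retrying: retry count times retry interval. */
    static Duration clientRetryBudget(int maxRetries, Duration retryInterval) {
        return retryInterval.multipliedBy(maxRetries);
    }

    /** True only if the client is still retrying after the election timeout has passed. */
    static boolean clientOutlastsElection(Duration retryBudget, Duration electionTimeout) {
        return retryBudget.compareTo(electionTimeout) > 0;
    }

    public static void main(String[] args) {
        // Before: 5 retries of 1 second, election timeout assumed to be roughly 5 seconds.
        Duration before = clientRetryBudget(5, Duration.ofSeconds(1));
        System.out.println("before: client outlasts election = "
            + clientOutlastsElection(before, Duration.ofSeconds(5)));

        // After: illustrative values only (more retries, shorter election timeout).
        Duration after = clientRetryBudget(10, Duration.ofSeconds(1));
        System.out.println("after:  client outlasts election = "
            + clientOutlastsElection(after, Duration.ofSeconds(1)));
    }
}
```

When the two durations are roughly equal, the client has no margin and the outcome depends on timing, which matches the intermittent nature of the failure.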
   
   Additional changes:
   
    * move `testMultiBlockWritesWithIntermittentDnFailures` to `TestMultiBlockWritesWithDnFailures`
    * remove unused `maxRetries` member
    * call cluster `shutdown()` regardless of test success/failure (see also [HDDS-1949](https://issues.apache.org/jira/browse/HDDS-1949))
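
The last item, shutting the cluster down regardless of the test outcome, follows the usual try/finally pattern. The sketch below is a minimal illustration under stated assumptions: `FakeCluster` is a hypothetical stand-in rather than the real mini cluster class, and the actual patch may use a different mechanism, such as a test framework teardown hook.

```java
// Hypothetical sketch, not the code from this pull request.
final class ShutdownSketch {

    /** Stand-in for a test cluster; only tracks whether it is still running. */
    static final class FakeCluster {
        private boolean running = true;

        void shutdown() {
            running = false;
        }

        boolean isRunning() {
            return running;
        }
    }

    /** Runs the test body and always shuts the cluster down, even if the body throws. */
    static void runWithCluster(FakeCluster cluster, Runnable testBody) {
        try {
            testBody.run();
        } finally {
            cluster.shutdown();
        }
    }

    public static void main(String[] args) {
        FakeCluster cluster = new FakeCluster();
        try {
            runWithCluster(cluster, () -> {
                throw new IllegalStateException("simulated test failure");
            });
        } catch (IllegalStateException expected) {
            // The failure still propagates to the caller; cleanup has already run.
        }
        System.out.println("cluster running after failed test: " + cluster.isRunning());
    }
}
```

Without the `finally` block, a failing test would leave the cluster running and could interfere with tests that run afterwards.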
   
   https://issues.apache.org/jira/browse/HDDS-1908
   
   ## How was this patch tested?
   
   Ran both test classes 10+ times, without any intermittent failure.
   
   ```
   [INFO] Running org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient
   [INFO] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 157.086 s - in org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient
   [INFO] Running org.apache.hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures
   [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 75.308 s - in org.apache.hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures
   ```
