Posted to hdfs-dev@hadoop.apache.org by GitBox <gi...@apache.org> on 2019/10/13 09:26:48 UTC
[GitHub] [hadoop-ozone] elek opened a new pull request #6: HDDS-2214. TestSCMContainerPlacementRackAware has an intermittent fai…
URL: https://github.com/apache/hadoop-ozone/pull/6
## What changes were proposed in this pull request?
Fixing an intermittent unit test.
## What is the problem
For example from the nightly build:
```
<testcase name="testNoFallback[8]" classname="org.apache.hadoop.hdds.scm.container.placement.algorithms.TestSCMContainerPlacementRackAware" time="0.014">
<failure type="java.lang.AssertionError">java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at org.apache.hadoop.hdds.scm.container.placement.algorithms.TestSCMContainerPlacementRackAware.testNoFallback(TestSCMContainerPlacementRackAware.java:276)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
```
The problem is in the testNoFallback:
Let's say we have 11 nodes (from parameter) and we would like to choose 5 nodes (hard coded in the test).
As the first two replicas are chosen from the same rack and all the others from different racks, it's not possible to place all 5, so we expect a failure.
But we have an assertion that the success count is at least 3. This is true only if the first two replicas are placed on rack1 (5 nodes) or rack2 (5 nodes). If the first replica is placed on rack3 (one node), the placement fails immediately:
Lucky case, when we have success count > 3:
```
rack1 -- node1
rack1 -- node2 -- FIRST replica
rack1 -- node3 -- SECOND replica
rack1 -- node4
rack1 -- node5
rack2 -- node6
rack2 -- node7 -- THIRD replica
rack2 -- node8
rack2 -- node9
rack2 -- node10
rack3 -- node11 -- FOURTH replica
```
The specific case when we have success count == 1, because we can't choose the second replica on rack3 (this is when the test fails):
```
rack1 -- node1
rack1 -- node2
rack1 -- node3
rack1 -- node4
rack1 -- node5
rack2 -- node6
rack2 -- node7
rack2 -- node8
rack2 -- node9
rack2 -- node10
rack3 -- node11 -- FIRST replica
```
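The two cases above can be reproduced with a small stand-alone sketch. It is not the real placement code: `PlacementSketch`, `successCount` and `chanceOfFailure` are hypothetical names, the policy is reduced to the rule described above (second replica on the same rack as the first, every further replica on a rack not used yet, no fallback), and the failure estimate assumes the first node is picked uniformly at random from the 11 nodes.

```java
class PlacementSketch {
  // Rack sizes from the description: rack1 = 5 nodes, rack2 = 5 nodes, rack3 = 1 node.
  static final int[] RACK_SIZES = {5, 5, 1};

  // Number of replicas placed before the simplified no-fallback policy gives up.
  static int successCount(int firstRack, int wanted) {
    int placed = 1; // the first replica always fits on its own node
    if (wanted == 1) {
      return placed;
    }
    // The second replica needs another node on the same rack.
    if (RACK_SIZES[firstRack] < 2) {
      return placed;
    }
    placed++;
    // Every further replica needs a rack that has not been used yet.
    for (int rack = 0; rack < RACK_SIZES.length && placed < wanted; rack++) {
      if (rack != firstRack && RACK_SIZES[rack] > 0) {
        placed++;
      }
    }
    return placed;
  }

  // Assumption: the first node is chosen uniformly, so 1 of 11 picks lands on rack3.
  static double chanceOfFailure(int runs) {
    return 1.0 - Math.pow(10.0 / 11.0, runs);
  }

  public static void main(String[] args) {
    for (int rack = 0; rack < RACK_SIZES.length; rack++) {
      System.out.println("first replica on rack" + (rack + 1)
          + " -> success count " + successCount(rack, 5));
    }
    System.out.println("chance of at least one failure in 100 runs: "
        + chanceOfFailure(100));
  }
}
```

Under these assumptions the success count is 4 when the first replica lands on rack1 or rack2 and 1 when it lands on rack3, so an assertion of "at least 3" can only hold in the first two cases.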
## What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-2289
## How was this patch tested?
With IntelliJ you can execute the unit test multiple times (1000x) or until the next failure. Execute it with and without the patch. Without the patch, I usually hit the problem within the first 100 executions.