Posted to hdfs-dev@hadoop.apache.org by "Marton Elek (Jira)" <ji...@apache.org> on 2019/10/01 09:29:00 UTC

[jira] [Created] (HDDS-2214) TestSCMContainerPlacementRackAware has an intermittent failure

Marton Elek created HDDS-2214:
---------------------------------

             Summary: TestSCMContainerPlacementRackAware has an intermittent failure
                 Key: HDDS-2214
                 URL: https://issues.apache.org/jira/browse/HDDS-2214
             Project: Hadoop Distributed Data Store
          Issue Type: Improvement
            Reporter: Marton Elek
            Assignee: Marton Elek


For example from the nightly build:
{code:java}
  <testcase name="testNoFallback[8]" classname="org.apache.hadoop.hdds.scm.container.placement.algorithms.TestSCMContainerPlacementRackAware" time="0.014">
    <failure type="java.lang.AssertionError">java.lang.AssertionError
        at org.junit.Assert.fail(Assert.java:86)
        at org.junit.Assert.assertTrue(Assert.java:41)
        at org.junit.Assert.assertTrue(Assert.java:52)
        at org.apache.hadoop.hdds.scm.container.placement.algorithms.TestSCMContainerPlacementRackAware.testNoFallback(TestSCMContainerPlacementRackAware.java:276)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
 {code}
The problem is in testNoFallback:

Let's say we have 11 nodes (from the test parameter) and we would like to choose 5 nodes (hard-coded in the test).

Because the first two replicas are chosen from the same rack and each of the others from a different rack, placing all 5 is not possible with 3 racks, so we expect a failure.

But we also have an assertion that the success count is at least 3. This holds only if the first two replicas are placed on rack1 (5 nodes) or rack2 (5 nodes). If the first replica is placed on rack3 (one node), the placement fails immediately:

 

The lucky case, where the success count is > 3:
{code:java}
 rack1 -- node1 
 rack1 -- node2 -- FIRST replica
 rack1 -- node3 -- SECOND replica
 rack1 -- node4
 rack1 -- node5 
 rack2 -- node6
 rack2 -- node7 -- THIRD replica
 rack2 -- node8
 rack2 -- node9 
 rack2 -- node10
 rack3 -- node11 -- FOURTH replica{code}
The specific case where the success count == 1, because we can't choose the second replica on rack3 (this is when the test fails):
{code:java}
 rack1 -- node1 
 rack1 -- node2
 rack1 -- node3
 rack1 -- node4
 rack1 -- node5 
 rack2 -- node6
 rack2 -- node7
 rack2 -- node8
 rack2 -- node9 
 rack2 -- node10
 rack3 -- node11 -- FIRST replica{code}
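
The two cases above can be reproduced with a small model. This is only a sketch of the rule as described in this issue (second replica on the same rack as the first, every further replica on a rack not used yet), not the actual SCM placement code. The class name RackAwarePlacementSketch and the method successCount are hypothetical names introduced for illustration.

```java
/** Simplified, hypothetical model of the placement rule described in this issue. */
class RackAwarePlacementSketch {

  /** Rack sizes from the example: rack1 = 5 nodes, rack2 = 5 nodes, rack3 = 1 node. */
  static final int[] RACK_SIZES = {5, 5, 1};

  /**
   * Returns how many of the requested replicas can be placed when the first
   * replica lands on firstRack (0-based index into rackSizes).
   */
  static int successCount(int[] rackSizes, int firstRack, int requested) {
    if (requested <= 0) {
      return 0;
    }
    int placed = 1; // the first replica goes to firstRack
    if (placed == requested) {
      return placed;
    }
    // The second replica must share the rack with the first one,
    // so that rack needs at least two nodes.
    if (rackSizes[firstRack] < 2) {
      return placed;
    }
    placed++;
    // Every further replica needs a rack that has not been used so far.
    for (int rack = 0; rack < rackSizes.length && placed < requested; rack++) {
      if (rack != firstRack && rackSizes[rack] > 0) {
        placed++;
      }
    }
    return placed;
  }

  public static void main(String[] args) {
    for (int rack = 0; rack < RACK_SIZES.length; rack++) {
      System.out.println("first replica on rack" + (rack + 1)
          + " -> success count " + successCount(RACK_SIZES, rack, 5));
    }
  }
}
```

In this model the success count is 4 when the first replica lands on rack1 or rack2, and 1 when it lands on rack3, which is the case where the "at least 3" assertion fails.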

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
