Posted to dev@slider.apache.org by "Gour Saha (JIRA)" <ji...@apache.org> on 2015/06/16 04:24:00 UTC

[jira] [Created] (SLIDER-905) Container request fails when Slider requests container with node label and host constraints

Gour Saha created SLIDER-905:
--------------------------------

             Summary: Container request fails when Slider requests container with node label and host constraints
                 Key: SLIDER-905
                 URL: https://issues.apache.org/jira/browse/SLIDER-905
             Project: Slider
          Issue Type: Bug
          Components: appmaster, core
    Affects Versions: Slider 0.80
            Reporter: Gour Saha


The cluster had node labels defined: 8 hosts were labelled regionserver_label and 1 host was labelled master_label. An HBase app was created with 1 master and 8 regionservers, and the resource spec was set so that only 1 regionserver would come up per host. In its final running state, 8 regionservers were running on 8 different nodes and the master on its own node.

At this point, one of the regionserver containers failed. Slider made a request to RM for a replacement container, this time with both a node label and a host constraint (the host where the previous container had failed). RM fulfilled the container request, but Slider failed with the following exception:

{code}
2015-06-15 15:51:05,674 [AmExecutor-006] INFO  util.RackResolver - Resolved cn072.ambari.apache.org to /default-rack
2015-06-15 15:51:05,677 [AmExecutor-006] ERROR actions.QueueExecutor - Exception processing org.apache.slider.server.appmaster.actions.ReviewAndFlexApplicationSize@bd73e28 name='onContainersCompleted', delay=0, attrs=4, sequenceNumber=33}: org.apache.hadoop.yarn.client.api.InvalidContainerRequestException: Cannot specify node label with rack and node
org.apache.hadoop.yarn.client.api.InvalidContainerRequestException: Cannot specify node label with rack and node
        at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.checkNodeLabelExpression(AMRMClientImpl.java:617)
        at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.addContainerRequest(AMRMClientImpl.java:425)
        at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.addContainerRequest(AMRMClientAsyncImpl.java:166)
        at org.apache.slider.server.appmaster.operations.AsyncRMOperationHandler.addContainerRequest(AsyncRMOperationHandler.java:106)
        at org.apache.slider.server.appmaster.operations.ContainerRequestOperation.execute(ContainerRequestOperation.java:38)
        at org.apache.slider.server.appmaster.operations.RMOperationHandler.execute(RMOperationHandler.java:28)
        at org.apache.slider.server.appmaster.SliderAppMaster.execute(SliderAppMaster.java:1886)
        at org.apache.slider.server.appmaster.SliderAppMaster.executeNodeReview(SliderAppMaster.java:1805)
        at org.apache.slider.server.appmaster.SliderAppMaster.handleReviewAndFlexApplicationSize(SliderAppMaster.java:1787)
        at org.apache.slider.server.appmaster.actions.ReviewAndFlexApplicationSize.execute(ReviewAndFlexApplicationSize.java:41)
        at org.apache.slider.server.appmaster.actions.QueueExecutor.run(QueueExecutor.java:73)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
2015-06-15 15:51:05,680 [AmExecutor-006] ERROR appmaster.SliderAppMaster - Exception in AmExecutor-006: org.apache.hadoop.yarn.client.api.InvalidContainerRequestException: Cannot specify node label with rack and node
org.apache.hadoop.yarn.client.api.InvalidContainerRequestException: Cannot specify node label with rack and node
        at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.checkNodeLabelExpression(AMRMClientImpl.java:617)
        at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.addContainerRequest(AMRMClientImpl.java:425)
        at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.addContainerRequest(AMRMClientAsyncImpl.java:166)
        at org.apache.slider.server.appmaster.operations.AsyncRMOperationHandler.addContainerRequest(AsyncRMOperationHandler.java:106)
        at org.apache.slider.server.appmaster.operations.ContainerRequestOperation.execute(ContainerRequestOperation.java:38)
        at org.apache.slider.server.appmaster.operations.RMOperationHandler.execute(RMOperationHandler.java:28)
        at org.apache.slider.server.appmaster.SliderAppMaster.execute(SliderAppMaster.java:1886)
        at org.apache.slider.server.appmaster.SliderAppMaster.executeNodeReview(SliderAppMaster.java:1805)
        at org.apache.slider.server.appmaster.SliderAppMaster.handleReviewAndFlexApplicationSize(SliderAppMaster.java:1787)
        at org.apache.slider.server.appmaster.actions.ReviewAndFlexApplicationSize.execute(ReviewAndFlexApplicationSize.java:41)
        at org.apache.slider.server.appmaster.actions.QueueExecutor.run(QueueExecutor.java:73)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
2015-06-15 15:56:38,828 [CuratorFramework-0] ERROR curator.ConnectionState - Connection timed out for connection string (cn070.ambari.apache.org:2181) and timeout (15000) / elapsed (15068)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:198)
        at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88)
        at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:113)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:763)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:749)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:56)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl$3.call(CuratorFrameworkImpl.java:244)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
2015-06-15 15:56:39,830 [CuratorFramework-0] ERROR curator.ConnectionState - Connection timed out for connection string (cn070.ambari.apache.org:2181) and timeout (15000) / elapsed (16070)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:198)
        at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88)
        at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:113)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:763)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:749)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:56)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl$3.call(CuratorFrameworkImpl.java:244)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
{code}
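
For illustration, the rule named in the exception message can be modelled in a few lines of plain Java. This is only a sketch with no Hadoop dependency: LabelRequestSketch, Request, check and relaxForLabel are hypothetical names, the check is inferred from the message text rather than copied from AMRMClientImpl, and the host name is taken from the log above purely as sample data. relaxForLabel shows one possible way to avoid the rejection (dropping host and rack placement whenever a label is set); it is an assumption, not the fix adopted for this issue.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model of a labelled container request; not a Hadoop or Slider class.
class LabelRequestSketch {

    static final class Request {
        final String nodeLabel;    // node label expression, may be null or empty
        final List<String> nodes;  // host constraints
        final List<String> racks;  // rack constraints

        Request(String nodeLabel, List<String> nodes, List<String> racks) {
            this.nodeLabel = nodeLabel;
            this.nodes = nodes;
            this.racks = racks;
        }
    }

    // Assumed rule, inferred from the exception message: a request that
    // carries a node label must not also name specific nodes or racks.
    static void check(Request r) {
        if (r.nodeLabel == null || r.nodeLabel.isEmpty()) {
            return; // no label, so host and rack constraints are allowed
        }
        if (!r.nodes.isEmpty() || !r.racks.isEmpty()) {
            throw new IllegalArgumentException(
                "Cannot specify node label with rack and node");
        }
    }

    // Possible workaround (assumption): keep the label, drop the placement hints.
    static Request relaxForLabel(Request r) {
        if (r.nodeLabel == null || r.nodeLabel.isEmpty()) {
            return r;
        }
        return new Request(r.nodeLabel,
            Collections.<String>emptyList(), Collections.<String>emptyList());
    }

    public static void main(String[] args) {
        // Replacement request as described in the report: label plus host.
        Request replacement = new Request("regionserver_label",
            Arrays.asList("cn072.ambari.apache.org"),
            Collections.<String>emptyList());
        try {
            check(replacement);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        check(relaxForLabel(replacement)); // passes: label only
        System.out.println("relaxed request accepted");
    }
}
```

The trade-off in this sketch is that the relaxed request loses the preference for the previously used host and relies on the label alone to pick a node.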



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)