Posted to dev@slider.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2015/06/16 14:00:05 UTC

[jira] [Commented] (SLIDER-905) Container request fails when Slider requests container with node label and host constraints

    [ https://issues.apache.org/jira/browse/SLIDER-905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587928#comment-14587928 ] 

Steve Loughran commented on SLIDER-905:
---------------------------------------

OK, it looks like it is forbidden to specify one or more hostnames/rack names together with a label expression. I'm copying the check from {{AMRMClientImpl}} into {{OutstandingRequest}} so we can test it.

# For an original placed request, we just ignore the label, caching it for reuse.
# For escalation, our current policy is "send a relaxed request to YARN", but still include the node list. We're going to have to replace that with a new policy:

* When escalating requests with a label expression, do not include the original nodes; simply supply the label expression.
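
A rough sketch of that policy, for illustration only. {{LabelRequestSketch}} and all of its fields and methods are hypothetical names, not the actual {{OutstandingRequest}} code, and the validation only paraphrases the rule behind the "Cannot specify node label with rack and node" error in the stack trace below.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a container request; not Slider's real class. */
final class LabelRequestSketch {
    final List<String> nodes;      // hosts named in the request sent to YARN
    final String labelToSend;      // label expression sent to YARN, or null
    final String cachedLabel;      // label remembered for a later escalation
    final boolean relaxLocality;   // true once the request has been relaxed

    private LabelRequestSketch(List<String> nodes, String labelToSend,
                               String cachedLabel, boolean relaxLocality) {
        // Every request is validated before it could be handed to YARN.
        validate(nodes, Collections.<String>emptyList(), labelToSend);
        this.nodes = nodes;
        this.labelToSend = labelToSend;
        this.cachedLabel = cachedLabel;
        this.relaxLocality = relaxLocality;
    }

    static boolean hasLabel(String label) {
        return label != null && !label.trim().isEmpty();
    }

    /** Reject a label expression combined with explicit nodes or racks. */
    static void validate(List<String> nodes, List<String> racks, String label) {
        boolean placed = (nodes != null && !nodes.isEmpty())
            || (racks != null && !racks.isEmpty());
        if (hasLabel(label) && placed) {
            throw new IllegalArgumentException(
                "Cannot specify node label with rack and node");
        }
    }

    /** Original placed request: keep the nodes, only cache the label. */
    static LabelRequestSketch placed(List<String> nodes, String label) {
        return new LabelRequestSketch(nodes, null, label, false);
    }

    /** Escalation: with a label, drop the nodes and send only the label. */
    LabelRequestSketch escalate() {
        if (hasLabel(cachedLabel)) {
            return new LabelRequestSketch(
                Collections.<String>emptyList(), cachedLabel, cachedLabel, true);
        }
        // No label: relaxed request that still includes the node list.
        return new LabelRequestSketch(nodes, null, null, true);
    }
}
```

With this shape, a placed request for a failed regionserver's host carries no label, and its escalated form carries only the label, so neither trips the check.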

> Container request fails when Slider requests container with node label and host constraints
> -------------------------------------------------------------------------------------------
>
>                 Key: SLIDER-905
>                 URL: https://issues.apache.org/jira/browse/SLIDER-905
>             Project: Slider
>          Issue Type: Bug
>          Components: appmaster, core
>    Affects Versions: Slider 0.80
>            Reporter: Gour Saha
>            Assignee: Steve Loughran
>
> This cluster had node labels defined and 8 hosts were labelled with regionserver_label and 1 host labelled with master_label. HBase app was created with 1 master and 8 regionservers and resource spec was set in a way such that only 1 regionserver would come up in 1 host. So in its final running state, 8 regionservers were running in 8 different nodes and the master in its own node.
> At this point, one of the regionserver containers failed. Slider made a request to RM for a replacement container, this time with node label and host constraint (the host where the previous container failed). RM fulfilled the container request, but Slider failed with the following exception:
> {code}
> 2015-06-15 15:51:05,674 [AmExecutor-006] INFO  util.RackResolver - Resolved cn072.ambari.apache.org to /default-rack
> 2015-06-15 15:51:05,677 [AmExecutor-006] ERROR actions.QueueExecutor - Exception processing org.apache.slider.server.appmaster.actions.ReviewAndFlexApplicationSize@bd73e28 name='onContainersCompleted', delay=0, attrs=4, sequenceNumber=33}: org.apache.hadoop.yarn.client.api.InvalidContainerRequestException: Cannot specify node label with rack and node
> org.apache.hadoop.yarn.client.api.InvalidContainerRequestException: Cannot specify node label with rack and node
>         at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.checkNodeLabelExpression(AMRMClientImpl.java:617)
>         at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.addContainerRequest(AMRMClientImpl.java:425)
>         at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.addContainerRequest(AMRMClientAsyncImpl.java:166)
>         at org.apache.slider.server.appmaster.operations.AsyncRMOperationHandler.addContainerRequest(AsyncRMOperationHandler.java:106)
>         at org.apache.slider.server.appmaster.operations.ContainerRequestOperation.execute(ContainerRequestOperation.java:38)
>         at org.apache.slider.server.appmaster.operations.RMOperationHandler.execute(RMOperationHandler.java:28)
>         at org.apache.slider.server.appmaster.SliderAppMaster.execute(SliderAppMaster.java:1886)
>         at org.apache.slider.server.appmaster.SliderAppMaster.executeNodeReview(SliderAppMaster.java:1805)
>         at org.apache.slider.server.appmaster.SliderAppMaster.handleReviewAndFlexApplicationSize(SliderAppMaster.java:1787)
>         at org.apache.slider.server.appmaster.actions.ReviewAndFlexApplicationSize.execute(ReviewAndFlexApplicationSize.java:41)
>         at org.apache.slider.server.appmaster.actions.QueueExecutor.run(QueueExecutor.java:73)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> 2015-06-15 15:51:05,680 [AmExecutor-006] ERROR appmaster.SliderAppMaster - Exception in AmExecutor-006: org.apache.hadoop.yarn.client.api.InvalidContainerRequestException: Cannot specify node label with rack and node
> org.apache.hadoop.yarn.client.api.InvalidContainerRequestException: Cannot specify node label with rack and node
>         at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.checkNodeLabelExpression(AMRMClientImpl.java:617)
>         at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.addContainerRequest(AMRMClientImpl.java:425)
>         at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.addContainerRequest(AMRMClientAsyncImpl.java:166)
>         at org.apache.slider.server.appmaster.operations.AsyncRMOperationHandler.addContainerRequest(AsyncRMOperationHandler.java:106)
>         at org.apache.slider.server.appmaster.operations.ContainerRequestOperation.execute(ContainerRequestOperation.java:38)
>         at org.apache.slider.server.appmaster.operations.RMOperationHandler.execute(RMOperationHandler.java:28)
>         at org.apache.slider.server.appmaster.SliderAppMaster.execute(SliderAppMaster.java:1886)
>         at org.apache.slider.server.appmaster.SliderAppMaster.executeNodeReview(SliderAppMaster.java:1805)
>         at org.apache.slider.server.appmaster.SliderAppMaster.handleReviewAndFlexApplicationSize(SliderAppMaster.java:1787)
>         at org.apache.slider.server.appmaster.actions.ReviewAndFlexApplicationSize.execute(ReviewAndFlexApplicationSize.java:41)
>         at org.apache.slider.server.appmaster.actions.QueueExecutor.run(QueueExecutor.java:73)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> 2015-06-15 15:56:38,828 [CuratorFramework-0] ERROR curator.ConnectionState - Connection timed out for connection string (cn070.ambari.apache.org:2181) and timeout (15000) / elapsed (15068)
> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
>         at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:198)
>         at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88)
>         at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:113)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:763)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:749)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:56)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl$3.call(CuratorFrameworkImpl.java:244)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> 2015-06-15 15:56:39,830 [CuratorFramework-0] ERROR curator.ConnectionState - Connection timed out for connection string (cn070.ambari.apache.org:2181) and timeout (15000) / elapsed (16070)
> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
>         at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:198)
>         at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88)
>         at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:113)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:763)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:749)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:56)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl$3.call(CuratorFrameworkImpl.java:244)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> {code}


