Posted to dev@slider.apache.org by "Gour Saha (JIRA)" <ji...@apache.org> on 2014/08/13 09:55:11 UTC

[jira] [Created] (SLIDER-313) Slider AM fails to restart on kill post newly created application

Gour Saha created SLIDER-313:
--------------------------------

             Summary: Slider AM fails to restart on kill post newly created application
                 Key: SLIDER-313
                 URL: https://issues.apache.org/jira/browse/SLIDER-313
             Project: Slider
          Issue Type: Bug
          Components: appmaster, core
    Affects Versions: Slider 0.40
            Reporter: Gour Saha


The Slider AM fails to restart after being killed when the application is newly created. On AM restart, containers are reported with unexpectedly high priority values: 1073741825 and 1073741826.

Steps to reproduce:

1. Create a new application
  slider create cl1 ... 

2. Kill the AM

3. Wait for the RM to start a new AM

4. Check the AM logs; they show a NullPointerException like the one below:
{noformat}
Exception: java.lang.NullPointerException
14/08/13 00:42:59 ERROR main.ServiceLauncher: Exception: java.lang.NullPointerException
java.lang.NullPointerException
	at org.apache.slider.providers.agent.AgentProviderService.rebuildContainerDetails(AgentProviderService.java:344)
	at org.apache.slider.server.appmaster.SliderAppMaster.createAndRunCluster(SliderAppMaster.java:722)
	at org.apache.slider.server.appmaster.SliderAppMaster.runService(SliderAppMaster.java:454)
	at org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:186)
	at org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(ServiceLauncher.java:471)
	at org.apache.slider.core.main.ServiceLauncher.launchServiceAndExit(ServiceLauncher.java:401)
	at org.apache.slider.core.main.ServiceLauncher.serviceMain(ServiceLauncher.java:626)
	at org.apache.slider.server.appmaster.SliderAppMaster.main(SliderAppMaster.java:1735)
14/08/13 00:42:59 INFO util.ExitUtil: Exiting with status 32
{noformat}

5. Check the RM logs for the container assignment entries, such as the one below:
{noformat}
2014-08-13 02:04:36,991 INFO  capacity.LeafQueue (LeafQueue.java:assignContainer(1352)) - assignedContainer application attempt=appattempt_1407891977820_0005_000001 container=Container: [ContainerId: container_1407891977820_0005_01_000002, NodeId: c6409.ambari.apache.org:45454, NodeHttpAddress: c6409.ambari.apache.org:8042, Resource: <memory:256, vCores:1>, Priority: 1073741825, Token: null, ] queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:768, vCores:3>, usedCapacity=0.375, absoluteUsedCapacity=0.375, numApps=1, numContainers=3 clusterResource=<memory:2048, vCores:8>
{noformat}

Note that Priority is assigned the value {color:red}1073741825{color}.

For an HBase application, the Slider AM expects the priority of its agent containers to be either 1 or 2. The observed values equal (1 << 30) + 1 and (1 << 30) + 2, which matches the following constant defined in the ContainerPriority class, apparently used for a locality feature:
{noformat}
NOLOCATION = 1 << 30
{noformat}
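
A minimal sketch for checking how the priorities seen in the RM log relate to this constant. The class PriorityDecodeSketch and its helper methods are hypothetical and are not part of Slider's ContainerPriority API; only the NOLOCATION value and the two observed priorities come from this report.

```java
public class PriorityDecodeSketch {
    // Value quoted in this report from the ContainerPriority class.
    static final int NOLOCATION = 1 << 30;

    // Hypothetical helper: true if the NOLOCATION bit is set in the priority.
    static boolean hasNoLocation(int priority) {
        return (priority & NOLOCATION) != 0;
    }

    // Hypothetical helper: clear the NOLOCATION bit and return the remaining bits.
    static int stripNoLocation(int priority) {
        return priority & ~NOLOCATION;
    }

    public static void main(String[] args) {
        // Priorities reported on AM restart in this issue.
        int[] observed = {1073741825, 1073741826};
        for (int p : observed) {
            System.out.println(p + " -> noLocation=" + hasNoLocation(p)
                    + ", remaining=" + stripNoLocation(p));
        }
    }
}
```

With the bit cleared, the two observed values reduce to 1 and 2, the priorities the AM expects for the agent containers. Whether the restarted AM should mask the bit this way is an assumption, not something this report confirms.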

After a subsequent freeze and thaw, killing the AM no longer triggers this issue.

Note: The NPE in AgentProviderService has been fixed, but the fix is not merged yet. However, a fix for this issue is still required for the SLIDER-285 feature to work when the AM of a newly created application is killed.


