Posted to common-commits@hadoop.apache.org by ac...@apache.org on 2013/07/30 14:56:09 UTC

svn commit: r1508428 [2/2] - /hadoop/common/branches/branch-2.1.0-beta/hadoop-common-project/hadoop-common/src/main/docs/releasenotes.html

Modified: hadoop/common/branches/branch-2.1.0-beta/hadoop-common-project/hadoop-common/src/main/docs/releasenotes.html
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-2.1.0-beta/hadoop-common-project/hadoop-common/src/main/docs/releasenotes.html?rev=1508428&r1=1508427&r2=1508428&view=diff
==============================================================================
--- hadoop/common/branches/branch-2.1.0-beta/hadoop-common-project/hadoop-common/src/main/docs/releasenotes.html (original)
+++ hadoop/common/branches/branch-2.1.0-beta/hadoop-common-project/hadoop-common/src/main/docs/releasenotes.html Tue Jul 30 12:56:09 2013
@@ -12,57 +12,179 @@ These release notes include new develope
 <a name="changes"/>
 <h2>Changes since Hadoop 2.0.5-alpha</h2>
 <ul>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-968">YARN-968</a>.
+     Blocker bug reported by Kihwal Lee and fixed by Vinod Kumar Vavilapalli <br>
+     <b>RM admin commands don't work</b><br>
+     <blockquote>If an RM admin command is issued using the CLI, I get something like the following:
+
+13/07/24 17:19:40 INFO client.RMProxy: Connecting to ResourceManager at xxxx.com/1.2.3.4:1234
+refreshQueues: Unknown protocol: org.apache.hadoop.yarn.api.ResourceManagerAdministrationProtocolPB
+
+</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-961">YARN-961</a>.
+     Blocker bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
+     <b>ContainerManagerImpl should enforce token on server. Today it is [TOKEN, SIMPLE]</b><br>
+     <blockquote>We should only accept SecurityAuthMethod.TOKEN for ContainerManagementProtocol. Today it also accepts SIMPLE for unsecured environments.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-960">YARN-960</a>.
+     Blocker bug reported by Alejandro Abdelnur and fixed by Daryn Sharp <br>
+     <b>TestMRCredentials and  TestBinaryTokenFile are failing on trunk</b><br>
+     <blockquote>Not sure, but this may be a fallout from YARN-701 and/or related to YARN-945.
+
+Making it a blocker until the full impact of the issue is scoped.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-945">YARN-945</a>.
+     Blocker bug reported by Bikas Saha and fixed by Vinod Kumar Vavilapalli <br>
+     <b>AM register failing after AMRMToken</b><br>
+     <blockquote>509 2013-07-19 15:53:55,569 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 54313: readAndProcess from client 127.0.0.1       threw exception [org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled.  Available:[TOKEN]]
+510 org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled.  Available:[TOKEN]
+511   at org.apache.hadoop.ipc.Server$Connection.initializeAuthContext(Server.java:1531)
+512   at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1482)
+513   at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:788)
+514   at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:587)
+515   at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:562)
+</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-937">YARN-937</a>.
+     Blocker bug reported by Arun C Murthy and fixed by Alejandro Abdelnur <br>
+     <b>Fix unmanaged AM in non-secure/secure setup post YARN-701</b><br>
+     <blockquote>Fix unmanaged AM in non-secure/secure setup post YARN-701 since app-tokens will be used in both scenarios.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-932">YARN-932</a>.
+     Major bug reported by Sandy Ryza and fixed by Karthik Kambatla <br>
+     <b>TestResourceLocalizationService.testLocalizationInit can fail on JDK7</b><br>
+     <blockquote>It looks like this is occurring when testLocalizationInit doesn't run first.  Somehow yarn.nodemanager.log-dirs is getting set by one of the other tests (to ${yarn.log.dir}/userlogs), but yarn.log.dir isn't being set.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-927">YARN-927</a>.
+     Major task reported by Bikas Saha and fixed by Bikas Saha <br>
+     <b>Change ContainerRequest to not have more than 1 container count and remove StoreContainerRequest</b><br>
+     <blockquote>The downside is having to use more than 1 container request when requesting more than 1 container at * priority. For most other use cases that have specific locations we need to make multiple container requests anyway. This will also remove unnecessary duplication caused by StoredContainerRequest. It will make getMatchingRequest() always available and removeContainerRequest() easy to use.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-926">YARN-926</a>.
+     Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Jian He <br>
+     <b>ContainerManagerProtcol APIs should take in requests for multiple containers</b><br>
+     <blockquote>AMs typically have to launch multiple containers on a node and the current single container APIs aren't helping. We should have all the APIs take in multiple requests and return multiple responses.
+
+The client libraries could expose both the single and multi-container requests.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-922">YARN-922</a>.
+     Major sub-task reported by Jian He and fixed by Jian He (resourcemanager)<br>
+     <b>Change FileSystemRMStateStore to use directories</b><br>
+     <blockquote>Store each app and its attempts in the same directory so that removing application state is only one operation</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-919">YARN-919</a>.
+     Minor bug reported by Mayank Bansal and fixed by Mayank Bansal <br>
+     <b>Document setting default heap sizes in yarn env</b><br>
+     <blockquote>Right now there are no defaults in the yarn-env scripts for the resource manager and node manager, and if users want to override them, they have to go to the documentation, find the variables, and change the script.
+
+There is no straightforward way to change it in the script. Just updating the variables with defaults.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-918">YARN-918</a>.
+     Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
+     <b>ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload after YARN-701</b><br>
+     <blockquote>Once we use AMRMToken irrespective of Kerberos after YARN-701, we don't need ApplicationAttemptId in the RPC payload. This is an API change, so doing it as a blocker for 2.1.0-beta.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-912">YARN-912</a>.
+     Major bug reported by Bikas Saha and fixed by Mayank Bansal <br>
+     <b>Create exceptions package in common/api for yarn and move client facing exceptions to them</b><br>
+     <blockquote>Exceptions like InvalidResourceBlacklistRequestException, InvalidResourceRequestException, InvalidApplicationMasterRequestException, etc. are currently inside the ResourceManager and not visible to clients.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-909">YARN-909</a>.
+     Minor bug reported by Chuan Liu and fixed by Chuan Liu (nodemanager)<br>
+     <b>Disable TestLinuxContainerExecutorWithMocks on Windows</b><br>
+     <blockquote>This unit test tests a Linux specific feature. We should skip this unit test on Windows. A similar unit test 'TestLinuxContainerExecutor' was already skipped when running on Windows.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-897">YARN-897</a>.
+     Blocker bug reported by Djellel Eddine Difallah and fixed by Djellel Eddine Difallah (capacityscheduler)<br>
+     <b>CapacityScheduler wrongly sorted queues</b><br>
+     <blockquote>The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity defines the sort order. This ensures that the queue with the least UsedCapacity receives resources next. On containerAssignment we correctly update the order, but we fail to do so on container completions. This corrupts the TreeSet structure, and under-capacity queues might starve for resources.
+</blockquote></li>
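The pitfall behind YARN-897 is generic to sorted collections: if an element's sort key (here, a queue's used capacity) is mutated while the element sits inside a TreeSet, the set's ordering silently goes stale. The snippet below is only a minimal, hypothetical illustration of the remove/update/re-insert discipline such a fix requires; the DemoQueue class, field names, and values are placeholders, not the CapacityScheduler's own types.

{code}
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical stand-in for a scheduler queue sorted by used capacity.
class DemoQueue {
    final String name;
    float usedCapacity;
    DemoQueue(String name, float usedCapacity) {
        this.name = name;
        this.usedCapacity = usedCapacity;
    }
}

public class SortedQueueDemo {
    public static void main(String[] args) {
        TreeSet<DemoQueue> queues = new TreeSet<DemoQueue>(new Comparator<DemoQueue>() {
            public int compare(DemoQueue x, DemoQueue y) {
                int c = Float.compare(x.usedCapacity, y.usedCapacity);
                return c != 0 ? c : x.name.compareTo(y.name);
            }
        });
        DemoQueue a = new DemoQueue("a", 0.2f);
        DemoQueue b = new DemoQueue("b", 0.5f);
        queues.add(a);
        queues.add(b);

        // Wrong: mutating the key in place (a.usedCapacity = 0.9f) leaves the
        // TreeSet ordering stale, which is the class of corruption described above.

        // Right: remove, update the key, then re-insert so the order is rebuilt.
        queues.remove(a);
        a.usedCapacity = 0.9f;
        queues.add(a);

        System.out.println(queues.first().name); // prints "b", now the least-used queue
    }
}
{code}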
+<li> <a href="https://issues.apache.org/jira/browse/YARN-894">YARN-894</a>.
+     Minor bug reported by Chuan Liu and fixed by Chuan Liu (nodemanager)<br>
+     <b>NodeHealthScriptRunner timeout checking is inaccurate on Windows</b><br>
+     <blockquote>In the {{NodeHealthScriptRunner}} method, we set the HealthChecker status based on the Shell execution results. Some statuses are based on the exception thrown during the Shell script execution.
+
+Currently, we catch a non-ExitCodeException from ShellCommandExecutor, and if Shell has the timeout status set at the same time, we also set the HealthChecker status to timeout.
+
+We have the following execution sequence in Shell:
+1) In the main thread, schedule a delayed timer task that will kill the original process upon timeout.
+2) In the main thread, open a buffered reader and feed in the process's standard input stream.
+3) When the timeout happens, the timer task calls {{Process#destroy()}} to kill the main process.
+
+On Linux, when the timeout happens and the process is killed, the buffered reader throws an IOException with the message "Stream closed" in the main thread.
+
+On Windows, we don't get the IOException. Only "-1" is returned from the reader, indicating the stream is finished. As a result, the timeout status is not set on Windows, and {{TestNodeHealthService}} fails on Windows because of this.
+ </blockquote></li>
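The portability gap described in YARN-894 is that the only signal the reading thread gets when the watchdog destroys the child differs by platform: an IOException ("Stream closed") on Linux versus a plain end-of-stream on Windows. The sketch below is a generic "run a command with a watchdog timeout" pattern, not Hadoop's own Shell class; the command and the timeout value are placeholders. It shows why deciding "timed out" from a flag set by the timer task is more robust than keying on whether an exception was thrown.

{code}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicBoolean;

public class TimedScriptRunner {
    public static void main(String[] args) throws Exception {
        // Placeholder long-running command standing in for a health script.
        final Process p = new ProcessBuilder("sleep", "30").start();
        final AtomicBoolean timedOut = new AtomicBoolean(false);

        Timer watchdog = new Timer(true);
        watchdog.schedule(new TimerTask() {
            public void run() {
                timedOut.set(true);
                p.destroy(); // kill the child when the timeout fires
            }
        }, 5000);

        BufferedReader out = new BufferedReader(new InputStreamReader(p.getInputStream()));
        try {
            String line;
            // On Windows this loop typically just ends (readLine() returns null) after destroy().
            while ((line = out.readLine()) != null) {
                System.out.println(line);
            }
        } catch (IOException e) {
            // On Linux, destroy() typically surfaces here as "Stream closed".
        } finally {
            watchdog.cancel();
        }

        // Portable check: decide "timed out" from the flag, not from whether an
        // IOException happened, which is effectively what YARN-894 addresses.
        System.out.println(timedOut.get() ? "health script timed out" : "health script finished");
    }
}
{code}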
+<li> <a href="https://issues.apache.org/jira/browse/YARN-883">YARN-883</a>.
+     Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
+     <b>Expose Fair Scheduler-specific queue metrics</b><br>
+     <blockquote>When the Fair Scheduler is enabled, QueueMetrics should include fair share, minimum share, and maximum share.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-877">YARN-877</a>.
+     Major sub-task reported by Junping Du and fixed by Junping Du (scheduler)<br>
+     <b>Allow for black-listing resources in FifoScheduler</b><br>
+     <blockquote>YARN-750 already addressed black-listing in the YARN API and the CS scheduler; this jira adds the implementation for the FifoScheduler.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-875">YARN-875</a>.
+     Major bug reported by Bikas Saha and fixed by Xuan Gong <br>
+     <b>Application can hang if AMRMClientAsync callback thread has exception</b><br>
+     <blockquote>Currently that thread will die and then never call back, so the app can hang. A possible solution could be to catch Throwable in the callback and then call client.onError().</blockquote></li>
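Until such a fix lands inside AMRMClientAsync itself, an application can apply the defensive pattern YARN-875 suggests on its own side: wrap each callback body so an unexpected Throwable is routed to a single error path instead of escaping and killing the callback thread. This is only a sketch, assuming the AMRMClientAsync.CallbackHandler interface of this release; the handleAllocated/handleCompleted helpers are hypothetical application logic, and the real fix inside the library may look different.

{code}
import java.util.List;

import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerStatus;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;

// A callback handler that never lets an exception escape the callback thread;
// failures are funneled to onError() so the AM can react instead of hanging.
public class SafeCallbackHandler implements AMRMClientAsync.CallbackHandler {

    @Override
    public void onContainersAllocated(List<Container> containers) {
        try {
            handleAllocated(containers);   // application logic (placeholder)
        } catch (Throwable t) {
            onError(t);                    // surface the failure instead of dying silently
        }
    }

    @Override
    public void onContainersCompleted(List<ContainerStatus> statuses) {
        try {
            handleCompleted(statuses);     // application logic (placeholder)
        } catch (Throwable t) {
            onError(t);
        }
    }

    @Override public void onShutdownRequest() { /* stop the AM */ }
    @Override public void onNodesUpdated(List<NodeReport> updatedNodes) { }
    @Override public float getProgress() { return 0.5f; }

    @Override
    public void onError(Throwable e) {
        // Last-resort handling: log and begin an orderly shutdown of the application.
        System.err.println("AM callback failed: " + e);
    }

    private void handleAllocated(List<Container> containers) { /* placeholder */ }
    private void handleCompleted(List<ContainerStatus> statuses) { /* placeholder */ }
}
{code}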
 <li> <a href="https://issues.apache.org/jira/browse/YARN-874">YARN-874</a>.
      Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
      <b>Tracking YARN/MR test failures after HADOOP-9421 and YARN-827</b><br>
      <blockquote>HADOOP-9421 and YARN-827 broke some YARN/MR tests. Tracking those..</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-873">YARN-873</a>.
+     Major sub-task reported by Bikas Saha and fixed by Xuan Gong <br>
+     <b>YARNClient.getApplicationReport(unknownAppId) returns a null report</b><br>
+     <blockquote>How can the client find out that the app does not exist?</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-869">YARN-869</a>.
      Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
      <b>ResourceManagerAdministrationProtocol should neither be public(yet) nor in yarn.api</b><br>
      <blockquote>This is a admin only api that we don't know yet if people can or should write new tools against. I am going to move it to yarn.server.api and make it @Private..</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-866">YARN-866</a>.
+     Major test reported by Wei Yan and fixed by Wei Yan <br>
+     <b>Add test for class ResourceWeights</b><br>
+     <blockquote>Add test case for the class ResourceWeights</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-865">YARN-865</a>.
+     Major improvement reported by Xuan Gong and fixed by Xuan Gong <br>
+     <b>RM webservices can't query based on application Types</b><br>
+     <blockquote>The resource manager web service api to get the list of apps doesn't have a query parameter for appTypes.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-861">YARN-861</a>.
      Critical bug reported by Devaraj K and fixed by Vinod Kumar Vavilapalli (nodemanager)<br>
      <b>TestContainerManager is failing</b><br>
-     <blockquote>https://builds.apache.org/job/Hadoop-Yarn-trunk/246/
-
-{code:xml}
-Running org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager
-Tests run: 7, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 19.249 sec &lt;&lt;&lt; FAILURE!
-testContainerManagerInitialization(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager)  Time elapsed: 286 sec  &lt;&lt;&lt; FAILURE!
-junit.framework.ComparisonFailure: expected:&lt;[asf009.sp2.ygridcore.ne]t&gt; but was:&lt;[localhos]t&gt;
-	at junit.framework.Assert.assertEquals(Assert.java:85)
-
+     <blockquote>https://builds.apache.org/job/Hadoop-Yarn-trunk/246/
+
+{code:xml}
+Running org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager
+Tests run: 7, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 19.249 sec &lt;&lt;&lt; FAILURE!
+testContainerManagerInitialization(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager)  Time elapsed: 286 sec  &lt;&lt;&lt; FAILURE!
+junit.framework.ComparisonFailure: expected:&lt;[asf009.sp2.ygridcore.ne]t&gt; but was:&lt;[localhos]t&gt;
+	at junit.framework.Assert.assertEquals(Assert.java:85)
+
 {code}</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-854">YARN-854</a>.
      Blocker bug reported by Ramya Sunil and fixed by Omkar Vinit Joshi <br>
      <b>App submission fails on secure deploy</b><br>
-     <blockquote>App submission on secure cluster fails with the following exception:
-
-{noformat}
-INFO mapreduce.Job: Job jobID failed with state FAILED due to: Application applicationID failed 2 times due to AM Container for appattemptID exited with  exitCode: -1000 due to: App initialization failed (255) with output: main : command provided 0
-main : user is qa_user
-javax.security.sasl.SaslException: DIGEST-MD5: digest response format violation. Mismatched response. [Caused by org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): DIGEST-MD5: digest response format violation. Mismatched response.]
-	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
-	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
-	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
-	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
-	at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
-	at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)
-	at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:65)
-	at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:235)
-	at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
-	at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:348)
-Caused by: org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): DIGEST-MD5: digest response format violation. Mismatched response.
-	at org.apache.hadoop.ipc.Client.call(Client.java:1298)
-	at org.apache.hadoop.ipc.Client.call(Client.java:1250)
-	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:204)
-	at $Proxy7.heartbeat(Unknown Source)
-	at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62)
-	... 3 more
-
-.Failing this attempt.. Failing the application.
-
+     <blockquote>App submission on secure cluster fails with the following exception:
+
+{noformat}
+INFO mapreduce.Job: Job jobID failed with state FAILED due to: Application applicationID failed 2 times due to AM Container for appattemptID exited with  exitCode: -1000 due to: App initialization failed (255) with output: main : command provided 0
+main : user is qa_user
+javax.security.sasl.SaslException: DIGEST-MD5: digest response format violation. Mismatched response. [Caused by org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): DIGEST-MD5: digest response format violation. Mismatched response.]
+	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
+	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
+	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
+	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
+	at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
+	at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)
+	at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:65)
+	at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:235)
+	at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
+	at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:348)
+Caused by: org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): DIGEST-MD5: digest response format violation. Mismatched response.
+	at org.apache.hadoop.ipc.Client.call(Client.java:1298)
+	at org.apache.hadoop.ipc.Client.call(Client.java:1250)
+	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:204)
+	at $Proxy7.heartbeat(Unknown Source)
+	at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62)
+	... 3 more
+
+.Failing this attempt.. Failing the application.
+
 {noformat}</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-853">YARN-853</a>.
+     Major bug reported by Devaraj K and fixed by Devaraj K (capacityscheduler)<br>
+     <b>maximum-am-resource-percent doesn't work after refreshQueues command</b><br>
+     <blockquote>If we update the yarn.scheduler.capacity.maximum-am-resource-percent / yarn.scheduler.capacity.&lt;queue-path&gt;.maximum-am-resource-percent configuration and then do the refreshNodes, it uses the new config value to calculate Max Active Applications and Max Active Application Per User. If we add a new node after issuing the 'rmadmin -refreshQueues' command, it uses the old maximum-am-resource-percent config value to calculate Max Active Applications and Max Active Application Per User.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-852">YARN-852</a>.
      Minor bug reported by Chuan Liu and fixed by Chuan Liu <br>
      <b>TestAggregatedLogFormat.testContainerLogsFileAccess fails on Windows</b><br>
@@ -78,13 +200,59 @@ Caused by: org.apache.hadoop.ipc.RemoteE
 <li> <a href="https://issues.apache.org/jira/browse/YARN-848">YARN-848</a>.
      Major bug reported by Hitesh Shah and fixed by Hitesh Shah <br>
      <b>Nodemanager does not register with RM using the fully qualified hostname</b><br>
-     <blockquote>If the hostname is misconfigured to not be fully qualified ( i.e. hostname returns foo and hostname -f returns foo.bar.xyz ), the NM ends up registering with the RM using only "foo". This can create problems if DNS cannot resolve the hostname properly. 
-
+     <blockquote>If the hostname is misconfigured to not be fully qualified ( i.e. hostname returns foo and hostname -f returns foo.bar.xyz ), the NM ends up registering with the RM using only "foo". This can create problems if DNS cannot resolve the hostname properly. 
+
 Furthermore, HDFS uses fully qualified hostnames which can end up affecting locality matches when allocating containers based on block locations. </blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-846">YARN-846</a>.
      Major sub-task reported by Jian He and fixed by Jian He <br>
      <b>Move pb Impl from yarn-api to yarn-common</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-845">YARN-845</a>.
+     Major sub-task reported by Arpit Gupta and fixed by Mayank Bansal (resourcemanager)<br>
+     <b>RM crash with NPE on NODE_UPDATE</b><br>
+     <blockquote>The following stack trace is generated in the RM:
+
+{code}
+n, service: 68.142.246.147:45454 }, ] resource=&lt;memory:1536, vCores:1&gt; queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=&lt;memory:44544, vCores:29&gt;usedCapacity=0.90625, absoluteUsedCapacity=0.90625, numApps=1, numContainers=29 usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used=&lt;memory:44544, vCores:29&gt; cluster=&lt;memory:49152, vCores:48&gt;
+2013-06-17 12:43:53,655 INFO  capacity.ParentQueue (ParentQueue.java:completedContainer(696)) - completedContainer queue=root usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used=&lt;memory:44544, vCores:29&gt; cluster=&lt;memory:49152, vCores:48&gt;
+2013-06-17 12:43:53,656 INFO  capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(832)) - Application appattempt_1371448527090_0844_000001 released container container_1371448527090_0844_01_000005 on node: host: hostXX:45454 #containers=4 available=2048 used=6144 with event: FINISHED
+2013-06-17 12:43:53,656 INFO  capacity.CapacityScheduler (CapacityScheduler.java:nodeUpdate(661)) - Trying to fulfill reservation for application application_1371448527090_0844 on node: hostXX:45454
+2013-06-17 12:43:53,656 INFO  fica.FiCaSchedulerApp (FiCaSchedulerApp.java:unreserve(435)) - Application application_1371448527090_0844 unreserved  on node host: hostXX:45454 #containers=4 available=2048 used=6144, currently has 4 at priority 20; currentReservation &lt;memory:6144, vCores:4&gt;
+2013-06-17 12:43:53,656 INFO  scheduler.AppSchedulingInfo (AppSchedulingInfo.java:updateResourceRequests(168)) - checking for deactivate...
+2013-06-17 12:43:53,657 FATAL resourcemanager.ResourceManager (ResourceManager.java:run(422)) - Error in handling event type NODE_UPDATE to the scheduler
+java.lang.NullPointerException
+        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.unreserve(FiCaSchedulerApp.java:432)
+        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.unreserve(LeafQueue.java:1416)
+        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1346)
+        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1221)
+        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1180)
+        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignReservedContainer(LeafQueue.java:939)
+        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:803)
+        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:665)
+        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:727)
+        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:83)
+        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:413)
+        at java.lang.Thread.run(Thread.java:662)
+2013-06-17 12:43:53,659 INFO  resourcemanager.ResourceManager (ResourceManager.java:run(426)) - Exiting, bbye..
+2013-06-17 12:43:53,665 INFO  mortbay.log (Slf4jLog.java:info(67)) - Stopped SelectChannelConnector@hostXX:8088
+2013-06-17 12:43:53,765 ERROR delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(513)) - InterruptedExcpetion recieved for ExpiredTokenRemover thread java.lang.InterruptedException: sleep interrupted
+2013-06-17 12:43:53,766 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics system...
+2013-06-17 12:43:53,767 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped.
+2013-06-17 12:43:53,767 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system shutdown complete.
+2013-06-17 12:43:53,768 WARN  amlauncher.ApplicationMasterLauncher (ApplicationMasterLauncher.java:run(98)) - org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread interrupted. Returning.
+2013-06-17 12:43:53,768 INFO  ipc.Server (Server.java:stop(2167)) - Stopping server on 8033
+2013-06-17 12:43:53,770 INFO  ipc.Server (Server.java:run(686)) - Stopping IPC Server listener on 8033
+2013-06-17 12:43:53,770 INFO  ipc.Server (Server.java:stop(2167)) - Stopping server on 8032
+2013-06-17 12:43:53,770 INFO  ipc.Server (Server.java:run(828)) - Stopping IPC Server Responder
+2013-06-17 12:43:53,771 INFO  ipc.Server (Server.java:run(686)) - Stopping IPC Server listener on 8032
+2013-06-17 12:43:53,771 INFO  ipc.Server (Server.java:run(828)) - Stopping IPC Server Responder
+2013-06-17 12:43:53,771 INFO  ipc.Server (Server.java:stop(2167)) - Stopping server on 8030
+2013-06-17 12:43:53,773 INFO  ipc.Server (Server.java:run(686)) - Stopping IPC Server listener on 8030
+2013-06-17 12:43:53,773 INFO  ipc.Server (Server.java:stop(2167)) - Stopping server on 8031
+2013-06-17 12:43:53,773 INFO  ipc.Server (Server.java:run(828)) - Stopping IPC Server Responder
+2013-06-17 12:43:53,774 INFO  ipc.Server (Server.java:run(686)) - Stopping IPC Server listener on 8031
+2013-06-17 12:43:53,775 INFO  ipc.Server (Server.java:run(828)) - Stopping IPC Server Responder
+{code}</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-841">YARN-841</a>.
      Major sub-task reported by Siddharth Seth and fixed by Vinod Kumar Vavilapalli <br>
      <b>Annotate and document AuxService APIs</b><br>
@@ -96,24 +264,24 @@ Furthermore, HDFS uses fully qualified h
 <li> <a href="https://issues.apache.org/jira/browse/YARN-839">YARN-839</a>.
      Minor bug reported by Chuan Liu and fixed by Chuan Liu <br>
      <b>TestContainerLaunch.testContainerEnvVariables fails on Windows</b><br>
-     <blockquote>The unit test case fails on Windows due to job id or container id was not printed out as part of the container script. Later, the test tries to read the pid from output of the file, and fails.
-
-Exception in trunk:
-{noformat}
-Running org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
-Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 9.903 sec &lt;&lt;&lt; FAILURE!
-testContainerEnvVariables(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch)  Time elapsed: 1307 sec  &lt;&lt;&lt; ERROR!
-java.lang.NullPointerException
-        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testContainerEnvVariables(TestContainerLaunch.java:278)
-        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
-        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
-        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
-        at java.lang.reflect.Method.invoke(Method.java:597)
-        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
-        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
-        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
-        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
-        at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:62)
+     <blockquote>The unit test case fails on Windows because the job id or container id is not printed out as part of the container script. Later, the test tries to read the pid from the output file, and fails.
+
+Exception in trunk:
+{noformat}
+Running org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
+Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 9.903 sec &lt;&lt;&lt; FAILURE!
+testContainerEnvVariables(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch)  Time elapsed: 1307 sec  &lt;&lt;&lt; ERROR!
+java.lang.NullPointerException
+        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testContainerEnvVariables(TestContainerLaunch.java:278)
+        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
+        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
+        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
+        at java.lang.reflect.Method.invoke(Method.java:597)
+        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
+        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
+        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
+        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
+        at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:62)
 {noformat}</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-837">YARN-837</a>.
      Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
@@ -154,7 +322,7 @@ java.lang.NullPointerException
 <li> <a href="https://issues.apache.org/jira/browse/YARN-824">YARN-824</a>.
      Major sub-task reported by Jian He and fixed by Jian He <br>
      <b>Add  static factory to yarn client lib interface and change it to abstract class</b><br>
-     <blockquote>Do this for AMRMClient, NMClient, YarnClient. and annotate its impl as private.
+     <blockquote>Do this for AMRMClient, NMClient, YarnClient. and annotate its impl as private.
 The purpose is not to expose impl</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-823">YARN-823</a>.
      Major sub-task reported by Jian He and fixed by Jian He <br>
@@ -168,76 +336,88 @@ The purpose is not to expose impl</block
      Major sub-task reported by Jian He and fixed by Jian He <br>
      <b>Rename FinishApplicationMasterRequest.setFinishApplicationStatus to setFinalApplicationStatus to be consistent with getter</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-820">YARN-820</a>.
+     Major sub-task reported by Bikas Saha and fixed by Mayank Bansal <br>
+     <b>NodeManager has invalid state transition after error in resource localization</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-814">YARN-814</a>.
+     Major sub-task reported by Hitesh Shah and fixed by Jian He <br>
+     <b>Difficult to diagnose a failed container launch when error due to invalid environment variable</b><br>
+     <blockquote>The container's launch script sets up environment variables, symlinks, etc.
+
+If there is any failure when setting up the basic context (before the actual user's process is launched), nothing is captured by the NM. This makes it impossible to diagnose the reason for the failure.
+
+To reproduce, set an env var whose value contains characters that throw syntax errors in bash.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-812">YARN-812</a>.
      Major bug reported by Ramya Sunil and fixed by Siddharth Seth <br>
      <b>Enabling app summary logs causes 'FileNotFound' errors</b><br>
-     <blockquote>RM app summary logs have been enabled as per the default config:
-
-{noformat}
-#
-# Yarn ResourceManager Application Summary Log 
-#
-# Set the ResourceManager summary log filename
-yarn.server.resourcemanager.appsummary.log.file=rm-appsummary.log
-# Set the ResourceManager summary log level and appender
-yarn.server.resourcemanager.appsummary.logger=INFO,RMSUMMARY
-
-# Appender for ResourceManager Application Summary Log
-# Requires the following properties to be set
-#    - hadoop.log.dir (Hadoop Log directory)
-#    - yarn.server.resourcemanager.appsummary.log.file (resource manager app summary log filename)
-#    - yarn.server.resourcemanager.appsummary.logger (resource manager app summary log level and appender)
-
-log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=${yarn.server.resourcemanager.appsummary.logger}
-log4j.additivity.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=false
-log4j.appender.RMSUMMARY=org.apache.log4j.RollingFileAppender
-log4j.appender.RMSUMMARY.File=${hadoop.log.dir}/${yarn.server.resourcemanager.appsummary.log.file}
-log4j.appender.RMSUMMARY.MaxFileSize=256MB
-log4j.appender.RMSUMMARY.MaxBackupIndex=20
-log4j.appender.RMSUMMARY.layout=org.apache.log4j.PatternLayout
-log4j.appender.RMSUMMARY.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
-{noformat}
-
-This however, throws errors while running commands as non-superuser:
-{noformat}
--bash-4.1$ hadoop dfs -ls /
-DEPRECATED: Use of this script to execute hdfs command is deprecated.
-Instead use the hdfs command for it.
-
-log4j:ERROR setFile(null,true) call failed.
-java.io.FileNotFoundException: /var/log/hadoop/hadoopqa/rm-appsummary.log (No such file or directory)
-        at java.io.FileOutputStream.openAppend(Native Method)
-        at java.io.FileOutputStream.&lt;init&gt;(FileOutputStream.java:192)
-        at java.io.FileOutputStream.&lt;init&gt;(FileOutputStream.java:116)
-        at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
-        at org.apache.log4j.RollingFileAppender.setFile(RollingFileAppender.java:207)
-        at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
-        at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
-        at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
-        at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
-        at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842)
-        at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768)
-        at org.apache.log4j.PropertyConfigurator.parseCatsAndRenderers(PropertyConfigurator.java:672)
-        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:516)
-        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580)
-        at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
-        at org.apache.log4j.LogManager.&lt;clinit&gt;(LogManager.java:127)
-        at org.apache.log4j.Logger.getLogger(Logger.java:104)
-        at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:289)
-        at org.apache.commons.logging.impl.Log4JLogger.&lt;init&gt;(Log4JLogger.java:109)
-        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
-        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
-        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
-        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
-        at org.apache.commons.logging.impl.LogFactoryImpl.createLogFromClass(LogFactoryImpl.java:1116)
-        at org.apache.commons.logging.impl.LogFactoryImpl.discoverLogImplementation(LogFactoryImpl.java:858)
-        at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:604)
-        at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:336)
-        at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:310)
-        at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:685)
-        at org.apache.hadoop.fs.FsShell.&lt;clinit&gt;(FsShell.java:41)
-Found 1 items
-drwxr-xr-x   - hadoop   hadoop            0 2013-06-12 21:28 /user
+     <blockquote>RM app summary logs have been enabled as per the default config:
+
+{noformat}
+#
+# Yarn ResourceManager Application Summary Log 
+#
+# Set the ResourceManager summary log filename
+yarn.server.resourcemanager.appsummary.log.file=rm-appsummary.log
+# Set the ResourceManager summary log level and appender
+yarn.server.resourcemanager.appsummary.logger=INFO,RMSUMMARY
+
+# Appender for ResourceManager Application Summary Log
+# Requires the following properties to be set
+#    - hadoop.log.dir (Hadoop Log directory)
+#    - yarn.server.resourcemanager.appsummary.log.file (resource manager app summary log filename)
+#    - yarn.server.resourcemanager.appsummary.logger (resource manager app summary log level and appender)
+
+log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=${yarn.server.resourcemanager.appsummary.logger}
+log4j.additivity.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=false
+log4j.appender.RMSUMMARY=org.apache.log4j.RollingFileAppender
+log4j.appender.RMSUMMARY.File=${hadoop.log.dir}/${yarn.server.resourcemanager.appsummary.log.file}
+log4j.appender.RMSUMMARY.MaxFileSize=256MB
+log4j.appender.RMSUMMARY.MaxBackupIndex=20
+log4j.appender.RMSUMMARY.layout=org.apache.log4j.PatternLayout
+log4j.appender.RMSUMMARY.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
+{noformat}
+
+This however, throws errors while running commands as non-superuser:
+{noformat}
+-bash-4.1$ hadoop dfs -ls /
+DEPRECATED: Use of this script to execute hdfs command is deprecated.
+Instead use the hdfs command for it.
+
+log4j:ERROR setFile(null,true) call failed.
+java.io.FileNotFoundException: /var/log/hadoop/hadoopqa/rm-appsummary.log (No such file or directory)
+        at java.io.FileOutputStream.openAppend(Native Method)
+        at java.io.FileOutputStream.&lt;init&gt;(FileOutputStream.java:192)
+        at java.io.FileOutputStream.&lt;init&gt;(FileOutputStream.java:116)
+        at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
+        at org.apache.log4j.RollingFileAppender.setFile(RollingFileAppender.java:207)
+        at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
+        at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
+        at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
+        at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
+        at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842)
+        at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768)
+        at org.apache.log4j.PropertyConfigurator.parseCatsAndRenderers(PropertyConfigurator.java:672)
+        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:516)
+        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580)
+        at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
+        at org.apache.log4j.LogManager.&lt;clinit&gt;(LogManager.java:127)
+        at org.apache.log4j.Logger.getLogger(Logger.java:104)
+        at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:289)
+        at org.apache.commons.logging.impl.Log4JLogger.&lt;init&gt;(Log4JLogger.java:109)
+        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
+        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
+        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
+        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
+        at org.apache.commons.logging.impl.LogFactoryImpl.createLogFromClass(LogFactoryImpl.java:1116)
+        at org.apache.commons.logging.impl.LogFactoryImpl.discoverLogImplementation(LogFactoryImpl.java:858)
+        at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:604)
+        at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:336)
+        at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:310)
+        at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:685)
+        at org.apache.hadoop.fs.FsShell.&lt;clinit&gt;(FsShell.java:41)
+Found 1 items
+drwxr-xr-x   - hadoop   hadoop            0 2013-06-12 21:28 /user
 {noformat}</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-806">YARN-806</a>.
      Major sub-task reported by Jian He and fixed by Jian He <br>
@@ -254,120 +434,124 @@ drwxr-xr-x   - hadoop   hadoop          
 <li> <a href="https://issues.apache.org/jira/browse/YARN-799">YARN-799</a>.
      Major bug reported by Chris Riccomini and fixed by Chris Riccomini (nodemanager)<br>
      <b>CgroupsLCEResourcesHandler tries to write to cgroup.procs</b><br>
-     <blockquote>The implementation of
-
-bq. ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java
-
-Tells the container-executor to write PIDs to cgroup.procs:
-
-{code}
-  public String getResourcesOption(ContainerId containerId) {
-    String containerName = containerId.toString();
-    StringBuilder sb = new StringBuilder("cgroups=");
-
-    if (isCpuWeightEnabled()) {
-      sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + "/cgroup.procs");
-      sb.append(",");
-    }
-
-    if (sb.charAt(sb.length() - 1) == ',') {
-      sb.deleteCharAt(sb.length() - 1);
-    } 
-    return sb.toString();
-  }
-{code}
-
-Apparently, this file has not always been writeable:
-
-https://patchwork.kernel.org/patch/116146/
-http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html
-https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html
-
-The RHEL version of the Linux kernel that I'm using has a CGroup module that has a non-writeable cgroup.procs file.
-
-{quote}
-$ uname -a
-Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
-{quote}
-
-As a result, when the container-executor tries to run, it fails with this error message:
-
-bq.    fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",
-
-This is because the executor is given a resource by the CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable:
-
-{quote}
-$ pwd 
-/cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_000001
-$ ls -l
-total 0
--r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs
--rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us
--rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us
--rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares
--rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release
--rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks
-{quote}
-
-I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, and this appears to have fixed the problem.
-
-I can think of several potential resolutions to this ticket:
-
-1. Ignore the problem, and make people patch YARN when they hit this issue.
-2. Write to /tasks instead of /cgroup.procs for everyone
-3. Check permissioning on /cgroup.procs prior to writing to it, and fall back to /tasks.
-4. Add a config to yarn-site that lets admins specify which file to write to.
-
+     <blockquote>The implementation of
+
+bq. ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java
+
+Tells the container-executor to write PIDs to cgroup.procs:
+
+{code}
+  public String getResourcesOption(ContainerId containerId) {
+    String containerName = containerId.toString();
+    StringBuilder sb = new StringBuilder("cgroups=");
+
+    if (isCpuWeightEnabled()) {
+      sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + "/cgroup.procs");
+      sb.append(",");
+    }
+
+    if (sb.charAt(sb.length() - 1) == ',') {
+      sb.deleteCharAt(sb.length() - 1);
+    } 
+    return sb.toString();
+  }
+{code}
+
+Apparently, this file has not always been writeable:
+
+https://patchwork.kernel.org/patch/116146/
+http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html
+https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html
+
+The RHEL version of the Linux kernel that I'm using has a CGroup module that has a non-writeable cgroup.procs file.
+
+{quote}
+$ uname -a
+Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
+{quote}
+
+As a result, when the container-executor tries to run, it fails with this error message:
+
+bq.    fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",
+
+This is because the executor is given a resource by the CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable:
+
+{quote}
+$ pwd 
+/cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_000001
+$ ls -l
+total 0
+-r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs
+-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us
+-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us
+-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares
+-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release
+-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks
+{quote}
+
+I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, and this appears to have fixed the problem.
+
+I can think of several potential resolutions to this ticket:
+
+1. Ignore the problem, and make people patch YARN when they hit this issue.
+2. Write to /tasks instead of /cgroup.procs for everyone
+3. Check permissioning on /cgroup.procs prior to writing to it, and fall back to /tasks.
+4. Add a config to yarn-site that lets admins specify which file to write to.
+
 Thoughts?</blockquote></li>
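Option 3 above amounts to a small capability probe before choosing which cgroup file to hand to the container-executor. The sketch below only illustrates that idea (class name and directory are hypothetical, and it is not the change that was actually committed). Note also that File.canWrite() reflects the NodeManager's own user, whereas the container-executor may switch to a different user, so a real implementation would need a more careful check.

{code}
import java.io.File;

// Illustrative probe for option 3: prefer cgroup.procs but fall back to
// "tasks" when the kernel exposes cgroup.procs as read-only.
public class CgroupTaskFileChooser {

    static String chooseTaskFile(String cgroupDir) {
        File procs = new File(cgroupDir, "cgroup.procs");
        if (procs.exists() && procs.canWrite()) {
            return procs.getAbsolutePath();
        }
        // Older kernels (e.g. some RHEL 6 builds) mount cgroup.procs read-only.
        return new File(cgroupDir, "tasks").getAbsolutePath();
    }

    public static void main(String[] args) {
        // Hypothetical container cgroup directory, as in the report above.
        String dir = "/cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_000001";
        System.out.println("cgroups=" + chooseTaskFile(dir));
    }
}
{code}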
 <li> <a href="https://issues.apache.org/jira/browse/YARN-795">YARN-795</a>.
      Major bug reported by Wei Yan and fixed by Wei Yan (scheduler)<br>
      <b>Fair scheduler queue metrics should subtract allocated vCores from available vCores</b><br>
-     <blockquote>The queue metrics of fair scheduler doesn't subtract allocated vCores from available vCores, causing the available vCores returned is incorrect.
+     <blockquote>The queue metrics of the fair scheduler don't subtract allocated vCores from available vCores, causing the available vCores returned to be incorrect.
 This is happening because {code}QueueMetrics.getAllocateResources(){code} doesn't return the allocated vCores.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-792">YARN-792</a>.
      Major sub-task reported by Jian He and fixed by Jian He <br>
      <b>Move NodeHealthStatus from yarn.api.record to yarn.server.api.record</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-791">YARN-791</a>.
+     Blocker sub-task reported by Sandy Ryza and fixed by Sandy Ryza (api , resourcemanager)<br>
+     <b>Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-789">YARN-789</a>.
      Major improvement reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (scheduler)<br>
      <b>Enable zero capabilities resource requests in fair scheduler</b><br>
-     <blockquote>Per discussion in YARN-689, reposting updated use case:
-
-1. I have a set of services co-existing with a Yarn cluster.
-
-2. These services run out of band from Yarn. They are not started as yarn containers and they don't use Yarn containers for processing.
-
-3. These services use, dynamically, different amounts of CPU and memory based on their load. They manage their CPU and memory requirements independently. In other words, depending on their load, they may require more CPU but not memory or vice-versa.
-By using YARN as RM for these services I'm able share and utilize the resources of the cluster appropriately and in a dynamic way. Yarn keeps tab of all the resources.
-
-These services run an AM that reserves resources on their behalf. When this AM gets the requested resources, the services bump up their CPU/memory utilization out of band from Yarn. If the Yarn allocations are released/preempted, the services back off on their resources utilization. By doing this, Yarn and these service correctly share the cluster resources, being Yarn RM the only one that does the overall resource bookkeeping.
-
-The services AM, not to break the lifecycle of containers, start containers in the corresponding NMs. These container processes do basically a sleep forever (i.e. sleep 10000d). They are almost not using any CPU nor memory (less than 1MB). Thus it is reasonable to assume their required CPU and memory utilization is NIL (more on hard enforcement later). Because of this almost NIL utilization of CPU and memory, it is possible to specify, when doing a request, zero as one of the dimensions (CPU or memory).
-
-The current limitation is that the increment is also the minimum. 
-
-If we set the memory increment to 1MB. When doing a pure CPU request, we would have to specify 1MB of memory. That would work. However it would allow discretionary memory requests without a desired normalization (increments of 256, 512, etc).
-
-If we set the CPU increment to 1CPU. When doing a pure memory request, we would have to specify 1CPU. CPU amounts a much smaller than memory amounts, and because we don't have fractional CPUs, it would mean that all my pure memory requests will be wasting 1 CPU thus reducing the overall utilization of the cluster.
-
-Finally, on hard enforcement. 
-
-* For CPU. Hard enforcement can be done via a cgroup cpu controller. Using an absolute minimum of a few CPU shares (ie 10) in the LinuxContainerExecutor we ensure there is enough CPU cycles to run the sleep process. This absolute minimum would only kick-in if zero is allowed, otherwise will never kick in as the shares for 1 CPU are 1024.
-
+     <blockquote>Per discussion in YARN-689, reposting updated use case:
+
+1. I have a set of services co-existing with a Yarn cluster.
+
+2. These services run out of band from Yarn. They are not started as yarn containers and they don't use Yarn containers for processing.
+
+3. These services use, dynamically, different amounts of CPU and memory based on their load. They manage their CPU and memory requirements independently. In other words, depending on their load, they may require more CPU but not memory or vice-versa.
+By using YARN as RM for these services I'm able share and utilize the resources of the cluster appropriately and in a dynamic way. Yarn keeps tab of all the resources.
+
+These services run an AM that reserves resources on their behalf. When this AM gets the requested resources, the services bump up their CPU/memory utilization out of band from Yarn. If the Yarn allocations are released/preempted, the services back off on their resources utilization. By doing this, Yarn and these service correctly share the cluster resources, being Yarn RM the only one that does the overall resource bookkeeping.
+
+The services AM, not to break the lifecycle of containers, start containers in the corresponding NMs. These container processes do basically a sleep forever (i.e. sleep 10000d). They are almost not using any CPU nor memory (less than 1MB). Thus it is reasonable to assume their required CPU and memory utilization is NIL (more on hard enforcement later). Because of this almost NIL utilization of CPU and memory, it is possible to specify, when doing a request, zero as one of the dimensions (CPU or memory).
+
+The current limitation is that the increment is also the minimum. 
+
+If we set the memory increment to 1MB. When doing a pure CPU request, we would have to specify 1MB of memory. That would work. However it would allow discretionary memory requests without a desired normalization (increments of 256, 512, etc).
+
+If we set the CPU increment to 1CPU. When doing a pure memory request, we would have to specify 1CPU. CPU amounts are much smaller than memory amounts, and because we don't have fractional CPUs, it would mean that all my pure memory requests will be wasting 1 CPU thus reducing the overall utilization of the cluster.
+
+Finally, on hard enforcement. 
+
+* For CPU. Hard enforcement can be done via a cgroup cpu controller. Using an absolute minimum of a few CPU shares (ie 10) in the LinuxContainerExecutor we ensure there is enough CPU cycles to run the sleep process. This absolute minimum would only kick-in if zero is allowed, otherwise will never kick in as the shares for 1 CPU are 1024.
+
 * For Memory. Hard enforcement is currently done by the ProcfsBasedProcessTree.java, using a minimum absolute of 1 or 2 MBs would take care of zero memory resources. And again,  this absolute minimum would only kick-in if zero is allowed, otherwise will never kick in as the increment memory is in several MBs if not 1GB.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-787">YARN-787</a>.
      Blocker sub-task reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (api)<br>
      <b>Remove resource min from Yarn client API</b><br>
-     <blockquote>Per discussions in YARN-689 and YARN-769 we should remove minimum from the API as this is a scheduler internal thing.
+     <blockquote>Per discussions in YARN-689 and YARN-769 we should remove minimum from the API as this is a scheduler internal thing.
 </blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-782">YARN-782</a>.
      Critical improvement reported by Sandy Ryza and fixed by Sandy Ryza (nodemanager)<br>
      <b>vcores-pcores ratio functions differently from vmem-pmem ratio in misleading way </b><br>
-     <blockquote>The vcores-pcores ratio functions differently from the vmem-pmem ratio in the sense that the vcores-pcores ratio has an impact on allocations and the vmem-pmem ratio does not.
-
-If I double my vmem-pmem ratio, the only change that occurs is that my containers, after being scheduled, are less likely to be killed for using too much virtual memory.  But if I double my vcore-pcore ratio, my nodes will appear to the ResourceManager to contain double the amount of CPU space, which will affect scheduling decisions.
-
-The lack of consistency will exacerbate the already difficult problem of resource configuration.
+     <blockquote>The vcores-pcores ratio functions differently from the vmem-pmem ratio in the sense that the vcores-pcores ratio has an impact on allocations and the vmem-pmem ratio does not.
+
+If I double my vmem-pmem ratio, the only change that occurs is that my containers, after being scheduled, are less likely to be killed for using too much virtual memory.  But if I double my vcore-pcore ratio, my nodes will appear to the ResourceManager to contain double the amount of CPU space, which will affect scheduling decisions.
+
+The lack of consistency will exacerbate the already difficult problem of resource configuration.
 </blockquote></li>
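+To make the asymmetry concrete, a rough illustration with made-up numbers (not actual scheduler or NodeManager code):
+{code}
+// Sketch: on an 8-core node, what each ratio actually changes.
+public class RatioEffect {
+  public static void main(String[] args) {
+    int physicalCores = 8;
+    double vcorePcoreRatio = 2.0;   // doubling this doubles what the RM schedules against
+    double vmemPmemRatio = 2.1;     // doubling this only relaxes the post-scheduling kill check
+
+    int vcoresSeenByRM = (int) (physicalCores * vcorePcoreRatio);         // 16 "CPUs" to allocate
+    long containerPmemMb = 1024;
+    long vmemKillThresholdMb = (long) (containerPmemMb * vmemPmemRatio);  // ~2150 MB allowed
+
+    System.out.println("RM schedules against " + vcoresSeenByRM + " vcores");
+    System.out.println("A 1GB container is killed above " + vmemKillThresholdMb + " MB of vmem");
+  }
+}
+{code}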
 <li> <a href="https://issues.apache.org/jira/browse/YARN-781">YARN-781</a>.
      Major sub-task reported by Devaraj Das and fixed by Jian He <br>
@@ -384,18 +568,22 @@ The lack of consistency will exacerbate 
 <li> <a href="https://issues.apache.org/jira/browse/YARN-767">YARN-767</a>.
      Major bug reported by Jian He and fixed by Jian He <br>
      <b>Initialize Application status metrics  when QueueMetrics is initialized</b><br>
-     <blockquote>Applications: ResourceManager.QueueMetrics.AppsSubmitted, ResourceManager.QueueMetrics.AppsRunning, ResourceManager.QueueMetrics.AppsPending, ResourceManager.QueueMetrics.AppsCompleted, ResourceManager.QueueMetrics.AppsKilled, ResourceManager.QueueMetrics.AppsFailed
+     <blockquote>Applications: ResourceManager.QueueMetrics.AppsSubmitted, ResourceManager.QueueMetrics.AppsRunning, ResourceManager.QueueMetrics.AppsPending, ResourceManager.QueueMetrics.AppsCompleted, ResourceManager.QueueMetrics.AppsKilled, ResourceManager.QueueMetrics.AppsFailed
 For now these metrics are created only when they are needed; we want them to be visible as soon as QueueMetrics is initialized.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-764">YARN-764</a>.
      Major bug reported by nemon lou and fixed by nemon lou (resourcemanager)<br>
      <b>blank Used Resources on Capacity Scheduler page </b><br>
-     <blockquote>Even when there are jobs running,used resources is empty on Capacity Scheduler page for leaf queue.(I use google-chrome on windows 7.)
+     <blockquote>Even when there are jobs running, Used Resources is empty on the Capacity Scheduler page for leaf queues. (I use Google Chrome on Windows 7.)
 After changing resource.java's toString method by replacing "&lt;&gt;" with "{}", this bug gets fixed.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-763">YARN-763</a>.
+     Major bug reported by Bikas Saha and fixed by Xuan Gong <br>
+     <b>AMRMClientAsync should stop heartbeating after receiving shutdown from RM</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-761">YARN-761</a>.
      Major bug reported by Vinod Kumar Vavilapalli and fixed by Zhijie Shen <br>
      <b>TestNMClientAsync fails sometimes</b><br>
-     <blockquote>See https://builds.apache.org/job/PreCommit-YARN-Build/1101//testReport/.
-
+     <blockquote>See https://builds.apache.org/job/PreCommit-YARN-Build/1101//testReport/.
+
 It passed on my machine though.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-760">YARN-760</a>.
      Major bug reported by Sandy Ryza and fixed by Niranjan Singh (nodemanager)<br>
@@ -428,8 +616,8 @@ It passed on my machine though.</blockqu
 <li> <a href="https://issues.apache.org/jira/browse/YARN-750">YARN-750</a>.
      Major sub-task reported by Arun C Murthy and fixed by Arun C Murthy <br>
      <b>Allow for black-listing resources in YARN API and Impl in CS</b><br>
-     <blockquote>YARN-392 and YARN-398 enhance scheduler api to allow for white-lists of resources.
-
+     <blockquote>YARN-392 and YARN-398 enhance scheduler api to allow for white-lists of resources.
+
 This jira is a companion to allow for black-listing (in CS).</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-749">YARN-749</a>.
      Major sub-task reported by Arun C Murthy and fixed by Arun C Murthy <br>
@@ -442,13 +630,13 @@ This jira is a companion to allow for bl
 <li> <a href="https://issues.apache.org/jira/browse/YARN-746">YARN-746</a>.
      Major sub-task reported by Steve Loughran and fixed by Steve Loughran <br>
      <b>rename Service.register() and Service.unregister() to registerServiceListener() &amp; unregisterServiceListener() respectively</b><br>
-     <blockquote>make it clear what you are registering on a {{Service}} by naming the methods {{registerServiceListener()}} &amp; {{unregisterServiceListener()}} respectively.
-
+     <blockquote>Make it clear what you are registering on a {{Service}} by naming the methods {{registerServiceListener()}} &amp; {{unregisterServiceListener()}} respectively.
+
 This only affects a couple of production classes; {{Service.register()}} is used in some of the lifecycle tests of YARN-530. There are no tests of {{Service.unregister()}}, which is something that could be corrected.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-742">YARN-742</a>.
      Major bug reported by Kihwal Lee and fixed by Jason Lowe (nodemanager)<br>
      <b>Log aggregation causes a lot of redundant setPermission calls</b><br>
-     <blockquote>In one of our clusters, namenode RPC is spending 45% of its time on serving setPermission calls. Further investigation has revealed that most calls are redundantly made on /mapred/logs/&lt;user&gt;/logs. Also mkdirs calls are made before this.
+     <blockquote>In one of our clusters, namenode RPC is spending 45% of its time on serving setPermission calls. Further investigation has revealed that most calls are redundantly made on /mapred/logs/&lt;user&gt;/logs. Also mkdirs calls are made before this.
 </blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-739">YARN-739</a>.
      Major sub-task reported by Siddharth Seth and fixed by Omkar Vinit Joshi <br>
@@ -458,6 +646,12 @@ This only affects a couple of production
      Major sub-task reported by Jian He and fixed by Jian He <br>
      <b>Some Exceptions no longer need to be wrapped by YarnException and can be directly thrown out after YARN-142 </b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-736">YARN-736</a>.
+     Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
+     <b>Add a multi-resource fair sharing metric</b><br>
+     <blockquote>Currently, at a regular interval, the fair scheduler computes a fair memory share for each queue and application inside it.  This fair share is not used for scheduling decisions, but is displayed in the web UI, exposed as a metric, and used for preemption decisions.
+
+With DRF and multi-resource scheduling, assigning a memory share as the fair share metric to every queue no longer makes sense.  It's not obvious what the replacement should be, but probably something like fractional fairness within a queue, or distance from an ideal cluster state.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-735">YARN-735</a>.
      Major sub-task reported by Jian He and fixed by Jian He <br>
      <b>Make ApplicationAttemptID, ContainerID, NodeID immutable</b><br>
@@ -465,35 +659,39 @@ This only affects a couple of production
 <li> <a href="https://issues.apache.org/jira/browse/YARN-733">YARN-733</a>.
      Major bug reported by Zhijie Shen and fixed by Zhijie Shen <br>
      <b>TestNMClient fails occasionally</b><br>
-     <blockquote>The problem happens at:
-{code}
-        // getContainerStatus can be called after stopContainer
-        try {
-          ContainerStatus status = nmClient.getContainerStatus(
-              container.getId(), container.getNodeId(),
-              container.getContainerToken());
-          assertEquals(container.getId(), status.getContainerId());
-          assertEquals(ContainerState.RUNNING, status.getState());
-          assertTrue("" + i, status.getDiagnostics().contains(
-              "Container killed by the ApplicationMaster."));
-          assertEquals(-1000, status.getExitStatus());
-        } catch (YarnRemoteException e) {
-          fail("Exception is not expected");
-        }
-{code}
-
-NMClientImpl#stopContainer returns, but container hasn't been stopped immediately. ContainerManangerImpl implements stopContainer in async style. Therefore, the container's status is in transition. NMClientImpl#getContainerStatus immediately after stopContainer will get either the RUNNING status or the COMPLETE one.
-
-There will be the similar problem wrt NMClientImpl#startContainer.
+     <blockquote>The problem happens at:
+{code}
+        // getContainerStatus can be called after stopContainer
+        try {
+          ContainerStatus status = nmClient.getContainerStatus(
+              container.getId(), container.getNodeId(),
+              container.getContainerToken());
+          assertEquals(container.getId(), status.getContainerId());
+          assertEquals(ContainerState.RUNNING, status.getState());
+          assertTrue("" + i, status.getDiagnostics().contains(
+              "Container killed by the ApplicationMaster."));
+          assertEquals(-1000, status.getExitStatus());
+        } catch (YarnRemoteException e) {
+          fail("Exception is not expected");
+        }
+{code}
+
+NMClientImpl#stopContainer returns, but the container hasn't been stopped immediately. ContainerManagerImpl implements stopContainer in an async style. Therefore, the container's status is in transition: NMClientImpl#getContainerStatus called immediately after stopContainer will get either the RUNNING status or the COMPLETE one.
+
+There will be a similar problem wrt NMClientImpl#startContainer.
 </blockquote></li>
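+One way a test can tolerate this race is to poll for the terminal state instead of asserting on the first status. A minimal sketch in the style of the snippet above (not the actual TestNMClient fix; the surrounding test is assumed to declare throws Exception):
+{code}
+// Sketch: poll until the NM reports COMPLETE instead of asserting immediately,
+// because stopContainer is handled asynchronously by the ContainerManager.
+long deadline = System.currentTimeMillis() + 10000;
+ContainerStatus status = nmClient.getContainerStatus(
+    container.getId(), container.getNodeId(), container.getContainerToken());
+while (status.getState() != ContainerState.COMPLETE
+    && System.currentTimeMillis() < deadline) {
+  Thread.sleep(100);
+  status = nmClient.getContainerStatus(
+      container.getId(), container.getNodeId(), container.getContainerToken());
+}
+assertEquals(ContainerState.COMPLETE, status.getState());
+{code}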
 <li> <a href="https://issues.apache.org/jira/browse/YARN-731">YARN-731</a>.
      Major sub-task reported by Siddharth Seth and fixed by Zhijie Shen <br>
      <b>RPCUtil.unwrapAndThrowException should unwrap remote RuntimeExceptions</b><br>
      <blockquote>Will be required for YARN-662. Also, remote NPEs show up incorrectly for some unit tests.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-727">YARN-727</a>.
+     Blocker sub-task reported by Siddharth Seth and fixed by Xuan Gong <br>
+     <b>ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter</b><br>
+     <blockquote>Now that an ApplicationType is registered on ApplicationSubmission, getAllApplications should be able to use this string to query for a specific application type.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-726">YARN-726</a>.
      Critical bug reported by Siddharth Seth and fixed by Mayank Bansal <br>
      <b>Queue, FinishTime fields broken on RM UI</b><br>
-     <blockquote>The queue shows up as "Invalid Date"
+     <blockquote>The queue shows up as "Invalid Date"
 Finish Time shows up as a Long value.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-724">YARN-724</a>.
      Major sub-task reported by Jian He and fixed by Jian He <br>
@@ -510,8 +708,8 @@ Finish Time shows up as a Long value.</b
 <li> <a href="https://issues.apache.org/jira/browse/YARN-717">YARN-717</a>.
      Major sub-task reported by Jian He and fixed by Jian He <br>
      <b>Copy BuilderUtil methods into token-related records</b><br>
-     <blockquote>This is separated from YARN-711,as after changing yarn.api.token from interface to abstract class, eg: ClientTokenPBImpl has to extend two classes: both TokenPBImpl and ClientToken abstract class, which is not allowed in JAVA.
-
+     <blockquote>This is separated from YARN-711, as after changing yarn.api.token from an interface to an abstract class, e.g. ClientTokenPBImpl would have to extend two classes, both TokenPBImpl and the ClientToken abstract class, which is not allowed in Java.
+
 We may remove the ClientToken/ContainerToken/DelegationToken interfaces and just use the common Token interface.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-716">YARN-716</a>.
      Major task reported by Siddharth Seth and fixed by Siddharth Seth <br>
@@ -520,69 +718,74 @@ We may remove the ClientToken/ContainerT
 <li> <a href="https://issues.apache.org/jira/browse/YARN-715">YARN-715</a>.
      Major bug reported by Siddharth Seth and fixed by Vinod Kumar Vavilapalli <br>
      <b>TestDistributedShell and TestUnmanagedAMLauncher are failing</b><br>
-     <blockquote>Tests are timing out. Looks like this is related to YARN-617.
-{code}
-2013-05-21 17:40:23,693 ERROR [IPC Server handler 0 on 54024] containermanager.ContainerManagerImpl (ContainerManagerImpl.java:authorizeRequest(412)) - Unauthorized request to start container.
-Expected containerId: user Found: container_1369183214008_0001_01_000001
-2013-05-21 17:40:23,694 ERROR [IPC Server handler 0 on 54024] security.UserGroupInformation (UserGroupInformation.java:doAs(1492)) - PriviledgedActionException as:user (auth:SIMPLE) cause:org.apache.hado
-Expected containerId: user Found: container_1369183214008_0001_01_000001
-2013-05-21 17:40:23,695 INFO  [IPC Server handler 0 on 54024] ipc.Server (Server.java:run(1864)) - IPC Server handler 0 on 54024, call org.apache.hadoop.yarn.api.ContainerManagerPB.startContainer from 10.
-Expected containerId: user Found: container_1369183214008_0001_01_000001
-org.apache.hadoop.yarn.exceptions.YarnRemoteException: Unauthorized request to start container.
-Expected containerId: user Found: container_1369183214008_0001_01_000001
-  at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:43)
-  at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.authorizeRequest(ContainerManagerImpl.java:413)
-  at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainer(ContainerManagerImpl.java:440)
-  at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagerPBServiceImpl.startContainer(ContainerManagerPBServiceImpl.java:72)
-  at org.apache.hadoop.yarn.proto.ContainerManager$ContainerManagerService$2.callBlockingMethod(ContainerManager.java:83)
-  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527)
+     <blockquote>Tests are timing out. Looks like this is related to YARN-617.
+{code}
+2013-05-21 17:40:23,693 ERROR [IPC Server handler 0 on 54024] containermanager.ContainerManagerImpl (ContainerManagerImpl.java:authorizeRequest(412)) - Unauthorized request to start container.
+Expected containerId: user Found: container_1369183214008_0001_01_000001
+2013-05-21 17:40:23,694 ERROR [IPC Server handler 0 on 54024] security.UserGroupInformation (UserGroupInformation.java:doAs(1492)) - PriviledgedActionException as:user (auth:SIMPLE) cause:org.apache.hado
+Expected containerId: user Found: container_1369183214008_0001_01_000001
+2013-05-21 17:40:23,695 INFO  [IPC Server handler 0 on 54024] ipc.Server (Server.java:run(1864)) - IPC Server handler 0 on 54024, call org.apache.hadoop.yarn.api.ContainerManagerPB.startContainer from 10.
+Expected containerId: user Found: container_1369183214008_0001_01_000001
+org.apache.hadoop.yarn.exceptions.YarnRemoteException: Unauthorized request to start container.
+Expected containerId: user Found: container_1369183214008_0001_01_000001
+  at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:43)
+  at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.authorizeRequest(ContainerManagerImpl.java:413)
+  at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainer(ContainerManagerImpl.java:440)
+  at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagerPBServiceImpl.startContainer(ContainerManagerPBServiceImpl.java:72)
+  at org.apache.hadoop.yarn.proto.ContainerManager$ContainerManagerService$2.callBlockingMethod(ContainerManager.java:83)
+  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527)
 {code}</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-714">YARN-714</a>.
      Major sub-task reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
      <b>AMRM protocol changes for sending NMToken list</b><br>
-     <blockquote>NMToken will be sent to AM on allocate call if
-1) AM doesn't already have NMToken for the underlying NM
-2) Key rolled over on RM and AM gets new container on the same NM.
+     <blockquote>NMToken will be sent to AM on allocate call if
+1) AM doesn't already have NMToken for the underlying NM
+2) Key rolled over on RM and AM gets new container on the same NM.
 On allocate call RM will send a consolidated list of all required NMTokens.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-711">YARN-711</a>.
      Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Jian He <br>
      <b>Copy BuilderUtil methods into individual records</b><br>
-     <blockquote>BuilderUtils is one giant utils class which has all the factory methods needed for creating records. It is painful for users to figure out how to create records. We are better off having the factories in each record, that way users can easily create records.
-
+     <blockquote>BuilderUtils is one giant utils class which has all the factory methods needed for creating records. It is painful for users to figure out how to create records. We are better off having the factories in each record; that way users can easily create records.
+
 As a first step, we should just copy all the factory methods into individual classes, deprecate BuilderUtils and then slowly move all code off BuilderUtils.</blockquote></li>
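+A minimal sketch of the per-record factory pattern being proposed, using a purely hypothetical record (real YARN records are protobuf-backed and more involved):
+{code}
+// Sketch: a record exposing its own newInstance() factory instead of going through BuilderUtils.
+// "ExampleRecord" is hypothetical.
+public class ExampleRecord {
+  private final String name;
+  private final int value;
+
+  private ExampleRecord(String name, int value) {
+    this.name = name;
+    this.value = value;
+  }
+
+  // The factory lives on the record itself, so callers can discover it easily.
+  public static ExampleRecord newInstance(String name, int value) {
+    return new ExampleRecord(name, value);
+  }
+
+  public String getName() { return name; }
+  public int getValue() { return value; }
+}
+{code}
+Callers would then write ExampleRecord.newInstance("foo", 42) instead of hunting for the right BuilderUtils method.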
 <li> <a href="https://issues.apache.org/jira/browse/YARN-708">YARN-708</a>.
      Major task reported by Siddharth Seth and fixed by Siddharth Seth <br>
      <b>Move RecordFactory classes to hadoop-yarn-api, miscellaneous fixes to the interfaces</b><br>
-     <blockquote>This is required for additional changes in YARN-528. 
+     <blockquote>This is required for additional changes in YARN-528. 
 Some of the interfaces could use some cleanup as well - they shouldn't be declaring YarnException (Runtime) in their signature.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-706">YARN-706</a>.
      Major bug reported by Zhijie Shen and fixed by Zhijie Shen <br>
      <b>Race Condition in TestFSDownload</b><br>
-     <blockquote>See the test failure in YARN-695
-
+     <blockquote>See the test failure in YARN-695
+
 https://builds.apache.org/job/PreCommit-YARN-Build/957//testReport/org.apache.hadoop.yarn.util/TestFSDownload/testDownloadPatternJar/</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-701">YARN-701</a>.
+     Blocker sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
+     <b>ApplicationTokens should be used irrespective of kerberos</b><br>
+     <blockquote> - Single code path for secure and non-secure cases is useful for testing and coverage.
+ - Having this in non-secure mode will help us avoid accidental bugs in AMs DDoS'ing and bringing down the RM.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-700">YARN-700</a>.
      Major bug reported by Ivan Mitic and fixed by Ivan Mitic <br>
      <b>TestInfoBlock fails on Windows because of line ending mismatch</b><br>
-     <blockquote>Exception:
-{noformat}
-Running org.apache.hadoop.yarn.webapp.view.TestInfoBlock
-Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.962 sec &lt;&lt;&lt; FAILURE!
-testMultilineInfoBlock(org.apache.hadoop.yarn.webapp.view.TestInfoBlock)  Time elapsed: 873 sec  &lt;&lt;&lt; FAILURE!
-java.lang.AssertionError: 
-	at org.junit.Assert.fail(Assert.java:91)
-	at org.junit.Assert.assertTrue(Assert.java:43)
-	at org.junit.Assert.assertTrue(Assert.java:54)
-	at org.apache.hadoop.yarn.webapp.view.TestInfoBlock.testMultilineInfoBlock(TestInfoBlock.java:79)
-	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
-	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
-	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
-	at java.lang.reflect.Method.invoke(Method.java:597)
-	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
-	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
-	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
-	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
-	at org.junit.internal.runners.statements.FailOnTimeout$1.run(FailOnTimeout.java:28)
+     <blockquote>Exception:
+{noformat}
+Running org.apache.hadoop.yarn.webapp.view.TestInfoBlock
+Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.962 sec &lt;&lt;&lt; FAILURE!
+testMultilineInfoBlock(org.apache.hadoop.yarn.webapp.view.TestInfoBlock)  Time elapsed: 873 sec  &lt;&lt;&lt; FAILURE!
+java.lang.AssertionError: 
+	at org.junit.Assert.fail(Assert.java:91)
+	at org.junit.Assert.assertTrue(Assert.java:43)
+	at org.junit.Assert.assertTrue(Assert.java:54)
+	at org.apache.hadoop.yarn.webapp.view.TestInfoBlock.testMultilineInfoBlock(TestInfoBlock.java:79)
+	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
+	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
+	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
+	at java.lang.reflect.Method.invoke(Method.java:597)
+	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
+	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
+	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
+	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
+	at org.junit.internal.runners.statements.FailOnTimeout$1.run(FailOnTimeout.java:28)
 {noformat}</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-695">YARN-695</a>.
      Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
@@ -591,28 +794,28 @@ java.lang.AssertionError: 
 <li> <a href="https://issues.apache.org/jira/browse/YARN-694">YARN-694</a>.
      Major bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
      <b>Start using NMTokens to authenticate all communication with NM</b><br>
-     <blockquote>AM uses the NMToken to authenticate all the AM-NM communication.
-NM will validate NMToken in below manner
-* If NMToken is using current or previous master key then the NMToken is valid. In this case it will update its cache with this key corresponding to appId.
-* If NMToken is using the master key which is present in NM's cache corresponding to AM's appId then it will be validated based on this.
-* If NMToken is invalid then NM will reject AM calls.
-
-Modification for ContainerToken
-* At present RPC validates AM-NM communication based on ContainerToken. It will be replaced with NMToken. Also now onwards AM will use NMToken per NM (replacing earlier behavior of ContainerToken per container per NM).
-* startContainer in case of Secured environment is using ContainerToken from UGI YARN-617; however after this it will use it from the payload (Container).
+     <blockquote>AM uses the NMToken to authenticate all the AM-NM communication.
+NM will validate the NMToken in the following manner:
+* If NMToken is using current or previous master key then the NMToken is valid. In this case it will update its cache with this key corresponding to appId.
+* If NMToken is using the master key which is present in NM's cache corresponding to AM's appId then it will be validated based on this.
+* If NMToken is invalid then NM will reject AM calls.
+
+Modification for ContainerToken
+* At present RPC validates AM-NM communication based on ContainerToken. It will be replaced with NMToken. Also, from now on the AM will use one NMToken per NM (replacing the earlier behavior of one ContainerToken per container per NM).
+* In a secured environment, startContainer currently uses the ContainerToken from the UGI (YARN-617); however, after this it will use the one from the payload (Container).
 * ContainerToken will exist and it will only be used to validate the AM's container start request.</blockquote></li>
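+The validation rules above boil down to a small decision; a minimal sketch with hypothetical types and names (not the actual NodeManager code):
+{code}
+// Sketch (hypothetical types/names): accept or reject an incoming NMToken per the rules above.
+boolean isNMTokenValid(int tokenKeyId, String appId,
+    int currentMasterKeyId, int previousMasterKeyId,
+    java.util.Map<String, Integer> keyIdCacheByAppId) {
+  if (tokenKeyId == currentMasterKeyId || tokenKeyId == previousMasterKeyId) {
+    // Rule 1: signed with the current or previous master key -> valid; remember it for this app.
+    keyIdCacheByAppId.put(appId, tokenKeyId);
+    return true;
+  }
+  Integer cachedKeyId = keyIdCacheByAppId.get(appId);
+  if (cachedKeyId != null && cachedKeyId == tokenKeyId) {
+    // Rule 2: matches the key previously cached for this AM's appId -> still valid.
+    return true;
+  }
+  // Rule 3: invalid token -> the NM rejects the AM call.
+  return false;
+}
+{code}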
 <li> <a href="https://issues.apache.org/jira/browse/YARN-693">YARN-693</a>.
      Major bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
      <b>Sending NMToken to AM on allocate call</b><br>
-     <blockquote>This is part of YARN-613.
-As per the updated design, AM will receive per NM, NMToken in following scenarios
-* AM is receiving first container on underlying NM.
-* AM is receiving container on underlying NM after either NM or RM rebooted.
-** After RM reboot, as RM doesn't remember (persist) the information about keys issued per AM per NM, it will reissue tokens in case AM gets new container on underlying NM. However on NM side NM will still retain older token until it receives new token to support long running jobs (in work preserving environment).
-** After NM reboot, RM will delete the token information corresponding to that AM for all AMs.
-* AM is receiving container on underlying NM after NMToken master key is rolled over on RM side.
-In all the cases if AM receives new NMToken then it is suppose to store it for future NM communication until it receives a new one.
-
+     <blockquote>This is part of YARN-613.
+As per the updated design, the AM will receive an NMToken per NM in the following scenarios:
+* AM is receiving first container on underlying NM.
+* AM is receiving container on underlying NM after either NM or RM rebooted.
+** After RM reboot, as RM doesn't remember (persist) the information about keys issued per AM per NM, it will reissue tokens in case AM gets new container on underlying NM. However on NM side NM will still retain older token until it receives new token to support long running jobs (in work preserving environment).
+** After NM reboot, RM will delete the token information corresponding to that NM for all AMs.
+* AM is receiving container on underlying NM after NMToken master key is rolled over on RM side.
+In all the cases, if the AM receives a new NMToken then it is supposed to store it for future NM communication until it receives a newer one.
+
 AMRMClient should expose these NMTokens to the client.</blockquote></li>
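+A minimal sketch of the AM-side bookkeeping implied above, with hypothetical names and a placeholder token type (the actual AMRMClient API may differ):
+{code}
+// Sketch: keep only the most recent NMToken per NM address; a token received later
+// replaces the earlier one for the same NM.
+private final java.util.concurrent.ConcurrentMap<String, String> nmTokenByNodeAddress =
+    new java.util.concurrent.ConcurrentHashMap<String, String>();
+
+void storeNMTokens(java.util.Map<String, String> tokensFromAllocateResponse) {
+  nmTokenByNodeAddress.putAll(tokensFromAllocateResponse);
+}
+
+String nmTokenFor(String nodeAddress) {
+  return nmTokenByNodeAddress.get(nodeAddress);  // used for all future calls to that NM
+}
+{code}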
 <li> <a href="https://issues.apache.org/jira/browse/YARN-692">YARN-692</a>.
      Major bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
@@ -621,9 +824,14 @@ AMRMClient should expose these NMToken t
 <li> <a href="https://issues.apache.org/jira/browse/YARN-690">YARN-690</a>.
      Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (resourcemanager)<br>
      <b>RM exits on token cancel/renew problems</b><br>
-     <blockquote>The DelegationTokenRenewer thread is critical to the RM.  When a non-IOException occurs, the thread calls System.exit to prevent the RM from running w/o the thread.  It should be exiting only on non-RuntimeExceptions.
-
+     <blockquote>The DelegationTokenRenewer thread is critical to the RM.  When a non-IOException occurs, the thread calls System.exit to prevent the RM from running w/o the thread.  It should be exiting only on non-RuntimeExceptions.
+
 The problem is especially bad in 23 because the yarn protobuf layer converts IOExceptions into UndeclaredThrowableExceptions (RuntimeException) which causes the renewer to abort the process.  An UnknownHostException takes down the RM...</blockquote></li>
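+A minimal sketch of the kind of handling the description argues for, with placeholder names (renewToken, LOG, token); not the actual DelegationTokenRenewer code:
+{code}
+// Sketch: tolerate per-token failures instead of exiting the RM process.
+try {
+  renewToken(token);  // may surface an IOException wrapped by the protobuf RPC layer
+} catch (IOException e) {
+  LOG.warn("Failed to renew token, will retry later", e);
+} catch (RuntimeException e) {
+  // e.g. an UndeclaredThrowableException wrapping an UnknownHostException:
+  // log it and keep the renewer thread alive rather than calling System.exit().
+  LOG.error("Unexpected error while renewing token", e);
+}
+{code}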
+<li> <a href="https://issues.apache.org/jira/browse/YARN-688">YARN-688</a>.
+     Major bug reported by Jian He and fixed by Jian He <br>
+     <b>Containers not cleaned up when NM received SHUTDOWN event from NodeStatusUpdater</b><br>
+     <blockquote>Currently, both the SHUTDOWN event from NodeStatusUpdater and the CleanupContainers event happen to be on the same dispatcher thread, so the CleanupContainers event will not be processed until the SHUTDOWN event is processed. See the similar problem in YARN-495.
+On normal NM shutdown, this is not a problem since the normal stop happens on the shutdownHook thread.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-686">YARN-686</a>.
      Major sub-task reported by Sandy Ryza and fixed by Sandy Ryza (api)<br>
      <b>Flatten NodeReport</b><br>
@@ -636,6 +844,11 @@ The problem is especially bad in 23 beca
      Major sub-task reported by Xuan Gong and fixed by Xuan Gong <br>
      <b>Change ResourceTracker API and LocalizationProtocol API to throw YarnRemoteException and IOException</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-661">YARN-661</a>.
+     Major bug reported by Jason Lowe and fixed by Omkar Vinit Joshi (nodemanager)<br>
+     <b>NM fails to cleanup local directories for users</b><br>
+     <blockquote>YARN-71 added deletion of local directories on startup, but in practice it fails to delete the directories because of permission problems.  The top-level usercache directory is owned by the user but is in a directory that is not writable by the user.  Therefore the deletion of the user's usercache directory, as the user, fails due to lack of permissions.
+</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-660">YARN-660</a>.
      Major sub-task reported by Bikas Saha and fixed by Bikas Saha <br>
      <b>Improve AMRMClient with matching requests</b><br>
@@ -644,6 +857,10 @@ The problem is especially bad in 23 beca
      Major bug reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
      <b>Fair scheduler metrics should subtract allocated memory from available memory</b><br>
      <blockquote>In the scheduler web UI, cluster metrics reports that the "Memory Total" goes up when an application is allocated resources.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-654">YARN-654</a>.
+     Major bug reported by Bikas Saha and fixed by Xuan Gong <br>
+     <b>AMRMClient: Perform sanity checks for parameters of public methods</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-651">YARN-651</a>.
      Major sub-task reported by Xuan Gong and fixed by Xuan Gong <br>
      <b>Change ContainerManagerPBClientImpl and RMAdminProtocolPBClientImpl to throw IOException and YarnRemoteException</b><br>
@@ -655,9 +872,9 @@ The problem is especially bad in 23 beca
 <li> <a href="https://issues.apache.org/jira/browse/YARN-646">YARN-646</a>.
      Major bug reported by Dapeng Sun and fixed by Dapeng Sun (documentation)<br>
      <b>Some issues in Fair Scheduler's document</b><br>
-     <blockquote>Issues are found in the doc page for Fair Scheduler http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html:
-1.In the section &#8220;Configuration&#8221;, It contains two properties named &#8220;yarn.scheduler.fair.minimum-allocation-mb&#8221;, the second one should be &#8220;yarn.scheduler.fair.maximum-allocation-mb&#8221;
-2.In the section &#8220;Allocation file format&#8221;, the document tells &#8220; The format contains three types of elements&#8221;, but it lists four types of elements following that.
+     <blockquote>Issues are found in the doc page for Fair Scheduler http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html:
+1. In the section &#8220;Configuration&#8221;, it contains two properties named &#8220;yarn.scheduler.fair.minimum-allocation-mb&#8221;; the second one should be &#8220;yarn.scheduler.fair.maximum-allocation-mb&#8221;.
+2. In the section &#8220;Allocation file format&#8221;, the document says &#8220;The format contains three types of elements&#8221;, but it then lists four types of elements.
 </blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-645">YARN-645</a>.
      Major bug reported by Jian He and fixed by Jian He <br>
@@ -722,8 +939,8 @@ The problem is especially bad in 23 beca
 <li> <a href="https://issues.apache.org/jira/browse/YARN-617">YARN-617</a>.
      Minor sub-task reported by Vinod Kumar Vavilapalli and fixed by Omkar Vinit Joshi <br>
      <b>In unsecure mode, AM can fake resource requirements </b><br>
-     <blockquote>Without security, it is impossible to completely avoid AMs faking resources. We can at the least make it as difficult as possible by using the same container tokens and the RM-NM shared key mechanism over unauthenticated RM-NM channel.
-
+     <blockquote>Without security, it is impossible to completely avoid AMs faking resources. We can at least make it as difficult as possible by using the same container tokens and the RM-NM shared key mechanism over the unauthenticated RM-NM channel.
+
 At a minimum, this will avoid accidental bugs in AMs in unsecure mode.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-615">YARN-615</a>.

[... 2402 lines stripped ...]