Posted to commits@samza.apache.org by "Sanil Jain (Jira)" <ji...@apache.org> on 2020/02/26 23:37:00 UTC

[jira] [Created] (SAMZA-2474) Release expired allocated resources from a job

Sanil Jain created SAMZA-2474:
---------------------------------

             Summary: Release expired allocated resources from a job
                 Key: SAMZA-2474
                 URL: https://issues.apache.org/jira/browse/SAMZA-2474
             Project: Samza
          Issue Type: New Feature
            Reporter: Sanil Jain


YARN deems an allocated resource expired once it has gone unused for longer than
yarn.resourcemanager.rm.container-allocation.expiry-interval-ms
Samza, however, still holds on to that resource and tries to start a container on it, which results in start failures like the one in the exception below. Samza should release such expired resources and not start new containers on them.

 

```

2020-02-21 00:45:28.033 [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #48] YarnClusterResourceManager [INFO] Got start error notification for Container ID: container_e05_1563223715359_0384_01_000833 for Processor ID: 34-0
org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container. This token is expired. current time is 1582245928028 found 1582245749262 Note: System times on machines may be out of sync. Check system time and time zones.
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
    at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
    at org.apache.hadoop.yarn.client.api.impl.NMClientImpl.startContainer(NMClientImpl.java:207)
    at org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer$StartContainerTransition.transition(NMClientAsyncImpl.java:378)
    at org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer$StartContainerTransition.transition(NMClientAsyncImpl.java:363)
    at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer.handle(NMClientAsyncImpl.java:498)
    at org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$ContainerEventProcessor.run(NMClientAsyncImpl.java:557)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2020-02-21 00:45:28.034 [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #61] YarnClusterResourceManager [INFO] Got stop notification for Container ID: container_e05_1563223715359_0384_01_000183 for Processor ID: 35-0
2020-02-21 00:45:28.034 [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #48] ContainerProcessManager [INFO] Container ID: container_e05_1563223715359_0384_01_000833 matched pending Processor ID: 34-0 on host: lva1-app1115.corp.linkedin.com
2020-02-21 00:45:28.034 [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #48] ContainerProcessManager [ERROR] Launch failed for pending Processor ID: 34-0 on Container ID: container_e05_1563223715359_0384_01_000833 on host: lva1-app1115.corp.linkedin.com with exception: {}
org.apache.samza.clustermanager.ProcessorLaunchException: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container. This token is expired. current time is 1582245928028 found 1582245749262 Note: System times on machines may be out of sync. Check system time and time zones.
    at org.apache.samza.job.yarn.YarnClusterResourceManager.onStartContainerError(YarnClusterResourceManager.java:552)
    at org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer$StartContainerTransition.onExceptionRaised(NMClientAsyncImpl.java:401)
    at org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer$StartContainerTransition.transition(NMClientAsyncImpl.java:390)
    at org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer$StartContainerTransition.transition(NMClientAsyncImpl.java:363)
    at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer.handle(NMClientAsyncImpl.java:498)
    at org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$ContainerEventProcessor.run(NMClientAsyncImpl.java:557)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container. This token is expired. current time is 1582245928028 found 1582245749262 Note: System times on machines may be out of sync. Check system time and time zones.
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
    at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
    at org.apache.hadoop.yarn.client.api.impl.NMClientImpl.startContainer(NMClientImpl.java:207)
    at org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer$StartContainerTransition.transition(NMClientAsyncImpl.java:378)
    ... 10 more

```
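
One possible shape for the requested behaviour is sketched below. It is only an illustration, not Samza's or YARN's actual code: the class AllocatedResourceTracker and its methods are hypothetical names, and it assumes the job knows the configured expiry interval and records a timestamp whenever a resource is allocated. Resources that have been idle for at least that interval are drained so the caller can release them instead of attempting a container start.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not part of Samza or YARN): remembers when each
 * resource was allocated and reports the ones that have sat unused for
 * at least the configured expiry interval.
 */
class AllocatedResourceTracker {
    private final long expiryIntervalMs;
    // container id -> allocation time in ms, kept in allocation order
    private final Map<String, Long> allocatedAtMs = new LinkedHashMap<>();

    AllocatedResourceTracker(long expiryIntervalMs) {
        if (expiryIntervalMs <= 0) {
            throw new IllegalArgumentException("expiryIntervalMs must be positive");
        }
        this.expiryIntervalMs = expiryIntervalMs;
    }

    /** Record that a resource was handed to the job at time nowMs. */
    void onAllocated(String containerId, long nowMs) {
        allocatedAtMs.put(containerId, nowMs);
    }

    /** Stop tracking a resource once a container launch was issued on it. */
    void onLaunched(String containerId) {
        allocatedAtMs.remove(containerId);
    }

    /** True if the resource is still tracked and has been idle too long. */
    boolean isExpired(String containerId, long nowMs) {
        Long allocatedAt = allocatedAtMs.get(containerId);
        return allocatedAt != null && nowMs - allocatedAt >= expiryIntervalMs;
    }

    /**
     * Remove and return the ids of all expired resources. The caller is
     * expected to release these rather than start containers on them.
     */
    List<String> drainExpired(long nowMs) {
        List<String> expired = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = allocatedAtMs.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMs - entry.getValue() >= expiryIntervalMs) {
                expired.add(entry.getKey());
                it.remove();
            }
        }
        return expired;
    }
}
```

In such a design the job would call drainExpired periodically, or check isExpired just before a launch, and hand the returned resources back to the cluster manager. Passing the current time in as a parameter keeps the sketch easy to test without depending on the wall clock.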

 


