Posted to commits@samza.apache.org by "Bharath Kumarasubramanian (Jira)" <ji...@apache.org> on 2020/05/27 03:31:00 UTC

[jira] [Closed] (SAMZA-2475) Add an allocated resource expiry timeout in samza yarn type of apps

     [ https://issues.apache.org/jira/browse/SAMZA-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bharath Kumarasubramanian closed SAMZA-2475.
--------------------------------------------

> Add an allocated resource expiry timeout in samza yarn type of apps
> -------------------------------------------------------------------
>
>                 Key: SAMZA-2475
>                 URL: https://issues.apache.org/jira/browse/SAMZA-2475
>             Project: Samza
>          Issue Type: New Feature
>            Reporter: Sanil Jain
>            Assignee: Sanil Jain
>            Priority: Major
>             Fix For: 1.5
>
>          Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Today, if a Samza app cannot use an allocated resource within
> yarn.resourcemanager.rm.container-allocation.expiry-interval-ms (the YARN ResourceManager's container-allocation expiry interval),
> starting the container fails with the exception below. This can be avoided by setting an allocated resource timeout shorter than that interval.
>  
> {code:java}
> 2020-02-21 00:45:28.033 [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #48] YarnClusterResourceManager [INFO] Got start error notification for Container ID: container_e05_1563223715359_0384_01_000833 for Processor ID: 34-0 
> org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
> This token is expired. current time is 1582245928028 found 1582245749262
> Note: System times on machines may be out of sync. Check system time and time zones.
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> 	at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
> 	at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
> 	at org.apache.hadoop.yarn.client.api.impl.NMClientImpl.startContainer(NMClientImpl.java:207)
> 	at org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer$StartContainerTransition.transition(NMClientAsyncImpl.java:378)
> 	at org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer$StartContainerTransition.transition(NMClientAsyncImpl.java:363)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> 	at org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer.handle(NMClientAsyncImpl.java:498)
> 	at org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$ContainerEventProcessor.run(NMClientAsyncImpl.java:557)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> 2020-02-21 00:45:28.034 [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #61] YarnClusterResourceManager [INFO] Got stop notification for Container ID: container_e05_1563223715359_0384_01_000183 for Processor ID: 35-0
> 2020-02-21 00:45:28.034 [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #48] ContainerProcessManager [INFO] Container ID: container_e05_1563223715359_0384_01_000833 matched pending Processor ID: 34-0 on host: lva1-app1115.corp.linkedin.com
> 2020-02-21 00:45:28.034 [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #48] ContainerProcessManager [ERROR] Launch failed for pending Processor ID: 34-0 on Container ID: container_e05_1563223715359_0384_01_000833 on host: lva1-app1115.corp.linkedin.com with exception: {}
> org.apache.samza.clustermanager.ProcessorLaunchException: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
> This token is expired. current time is 1582245928028 found 1582245749262
> Note: System times on machines may be out of sync. Check system time and time zones.
> 	at org.apache.samza.job.yarn.YarnClusterResourceManager.onStartContainerError(YarnClusterResourceManager.java:552)
> 	at org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer$StartContainerTransition.onExceptionRaised(NMClientAsyncImpl.java:401)
> 	at org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer$StartContainerTransition.transition(NMClientAsyncImpl.java:390)
> 	at org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer$StartContainerTransition.transition(NMClientAsyncImpl.java:363)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> 	at org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer.handle(NMClientAsyncImpl.java:498)
> 	at org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$ContainerEventProcessor.run(NMClientAsyncImpl.java:557)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
> This token is expired. current time is 1582245928028 found 1582245749262
> Note: System times on machines may be out of sync. Check system time and time zones.
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> 	at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
> 	at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
> 	at org.apache.hadoop.yarn.client.api.impl.NMClientImpl.startContainer(NMClientImpl.java:207)
> 	at org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer$StartContainerTransition.transition(NMClientAsyncImpl.java:378)
> 	... 10 more
> {code}
>  
>  
>  
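A minimal Java sketch of the check the description implies: record when a resource was allocated, and treat it as expired once it has been held for longer than a Samza-side timeout that is strictly shorter than the YARN expiry interval. The class name, method names, and the 9-minute timeout below are hypothetical illustrations, not Samza's actual API or configuration keys. The 600000 ms constant is the stock default of yarn.resourcemanager.rm.container-allocation.expiry-interval-ms in yarn-default.xml; clusters may override it.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch (not Samza's real API): decide whether an allocated
 * resource has been held too long to launch on safely.
 */
public final class AllocatedResourceExpiry {

  // Stock default of yarn.resourcemanager.rm.container-allocation.expiry-interval-ms (10 min).
  static final long YARN_ALLOCATION_EXPIRY_MS = TimeUnit.MINUTES.toMillis(10);

  private AllocatedResourceExpiry() {}

  /** Returns true if the resource has been held for at least timeoutMs. */
  static boolean isExpired(long allocatedAtMs, long nowMs, long timeoutMs) {
    return nowMs - allocatedAtMs >= timeoutMs;
  }

  /** Rejects timeouts that would not fire before YARN expires the container token. */
  static void validateTimeout(long timeoutMs, long yarnExpiryMs) {
    if (timeoutMs <= 0 || timeoutMs >= yarnExpiryMs) {
      throw new IllegalArgumentException(
          "timeout must be > 0 and < YARN expiry interval (" + yarnExpiryMs + " ms)");
    }
  }

  public static void main(String[] args) {
    // Hypothetical timeout: leaves one minute of headroom below the YARN interval.
    long timeoutMs = TimeUnit.MINUTES.toMillis(9);
    validateTimeout(timeoutMs, YARN_ALLOCATION_EXPIRY_MS);

    long allocatedAtMs = 0L;
    // Held for 5 minutes: still safe to launch.
    System.out.println(isExpired(allocatedAtMs, TimeUnit.MINUTES.toMillis(5), timeoutMs)); // false
    // Held for 9 minutes: treat as expired instead of attempting a launch.
    System.out.println(isExpired(allocatedAtMs, TimeUnit.MINUTES.toMillis(9), timeoutMs)); // true
  }
}
```

Checking the allocation age on the application side, rather than waiting for the NodeManager to reject an expired token, would let the app give the stale resource back and request a fresh one before the launch fails with the error shown in the log.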



--
This message was sent by Atlassian Jira
(v8.3.4#803005)