Posted to hdfs-user@hadoop.apache.org by Fengyun RAO <ra...@gmail.com> on 2014/04/02 15:55:19 UTC

Re: YarnException: Unauthorized request to start container. This token is expired.

Thank you, Omkar,

I'm new to Hadoop and all the settings are at their defaults, so I guess the
expiration interval is 10 minutes.

The exception happens when running a big job that occupies all the
resources of all nodes.

When running a small job that leaves many containers free, no exception is
thrown.


Actually, I didn't quite follow what "reservation" means.
I guess you mean that the RM creates the token at the time of reservation, so
by the time it assigns the container to the AM, the token has already expired.
Is this correct?

Could you do me a favor and help me find the JIRA, or tell me which version
fixed the problem?

Thanks!

2014-03-30 0:33 GMT+08:00 omkar joshi <om...@gmail.com>:

> Can you check a few things?
> What is the container expiry interval set to?
> How many containers are getting allocated?
> Is there any reservation of containers happening?
> If yes, then that was a known problem. I don't remember the JIRA number,
> though. The underlying problem in the reservation case was that it creates a
> token at the time of reservation and not when it issues the token to the AM.
>
>
>
> On Fri, Mar 28, 2014 at 6:03 AM, Leibnitz <se...@gmail.com> wrote:
>
>> no doubt
>>
>> Sent from my iPhone 6
>>
>> > On Mar 23, 2014, at 17:37, Fengyun RAO <ra...@gmail.com> wrote:
>> >
>> > What does this exception mean? I googled a lot, and all the results say it's because the time is not synchronized between the datanode and the namenode.
>> > However, I checked all the servers: the ntpd service is on, and the time differences are less than 1 second.
>> > What's more, the tasks do not always fail on particular datanodes.
>> > A task fails, then restarts and succeeds. If it were a time problem, I guess it would always fail.
>> >
>> > My Hadoop version is CDH5 beta. Below is the detailed log:
>> >
>> > 14/03/23 14:57:06 INFO mapreduce.Job: Running job:
>> job_1394434496930_0032
>> > 14/03/23 14:57:17 INFO mapreduce.Job: Job job_1394434496930_0032
>> running in uber mode : false
>> > 14/03/23 14:57:17 INFO mapreduce.Job:  map 0% reduce 0%
>> > 14/03/23 15:08:01 INFO mapreduce.Job: Task Id :
>> attempt_1394434496930_0032_m_000034_0, Status : FAILED
>> > Container launch failed for container_1394434496930_0032_01_000041 :
>> org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to
>> start container.
>> > This token is expired. current time is 1395558481146 found 1395558443384
>> >        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
>> Method)
>> >        at
>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>> >        at
>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>> >        at
>> java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>> >        at
>> org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
>> >        at
>> org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
>> >        at
>> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155)
>> >        at
>> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:370)
>> >        at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> >        at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> >        at java.lang.Thread.run(Thread.java:724)
>> >
>> > 14/03/23 15:08:02 INFO mapreduce.Job:  map 1% reduce 0%
>> > 14/03/23 15:09:36 INFO mapreduce.Job: Task Id :
>> attempt_1394434496930_0032_m_000036_0, Status : FAILED
>> > Container launch failed for container_1394434496930_0032_01_000038 :
>> org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to
>> start container.
>> > This token is expired. current time is 1395558575889 found 1395558443245
>> >        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
>> Method)
>> >        at
>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>> >        at
>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>> >        at
>> java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>> >        at
>> org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
>> >        at
>> org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
>> >        at
>> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155)
>> >        at
>> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:370)
>> >        at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> >        at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> >        at java.lang.Thread.run(Thread.java:724)
>> >
>>
>
>
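
The two "This token is expired" messages in the log above carry epoch-millisecond timestamps, so the delay can be worked out directly. The sketch below is only an illustration: the helper names `to_local` and `describe` are made up, the GMT+08:00 offset is taken from the mail headers, and the 10-minute interval is the default guessed at in the message above, not a value read from the cluster.

```python
from datetime import datetime, timedelta, timezone

# (current time, token expiry) pairs in epoch milliseconds, copied from the
# two "This token is expired" messages in the quoted log.
FAILURES = [
    (1395558481146, 1395558443384),
    (1395558575889, 1395558443245),
]

# Assumption: the cluster clocks are shown in GMT+08:00, as in the mail headers.
LOCAL_TZ = timezone(timedelta(hours=8))

# Assumption: the expiry interval is the 10-minute default.
EXPIRY_INTERVAL_MS = 10 * 60 * 1000


def to_local(epoch_ms: int) -> datetime:
    """Convert epoch milliseconds to a timezone-aware datetime (second precision)."""
    return datetime.fromtimestamp(epoch_ms // 1000, tz=LOCAL_TZ)


def describe(current_ms: int, expiry_ms: int) -> dict:
    """Summarise one failure: how late the launch was and when the token was likely issued."""
    return {
        "late_by_ms": current_ms - expiry_ms,
        "expired_at": to_local(expiry_ms).strftime("%H:%M:%S"),
        # Issue time is inferred as expiry minus the assumed interval.
        "issued_at": to_local(expiry_ms - EXPIRY_INTERVAL_MS).strftime("%H:%M:%S"),
    }


if __name__ == "__main__":
    for current_ms, expiry_ms in FAILURES:
        print(describe(current_ms, expiry_ms))
```

Under those assumptions both tokens expired at 15:07:23 and would have been issued around 14:57:23, a few seconds after the job started running at 14:57:17, while the launches were attempted 37762 ms and 132644 ms after expiry. That pattern fits a token that was created long before the launch was attempted; it does not by itself rule out clock skew.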

Re: YarnException: Unauthorized request to start container. This token is expired.

Posted by Wangda Tan <wh...@gmail.com>.
Fengyun,
I think I have run into a similar problem before. You can check whether the
RM's clock and the NM's clock are synchronized.

Regards,
Wangda Tan
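
One way to make that check concrete is to sample the clock on every host at roughly the same moment and compare each NodeManager against the ResourceManager. The sketch below only does the comparison; the host names, the sample readings and the 1-second tolerance are hypothetical, and collecting the readings (for example by running GNU `date +%s%3N` on each host) is left out.

```python
def clock_offsets_ms(readings: dict, reference_host: str) -> dict:
    """Return each host's clock offset in ms relative to the reference host."""
    reference = readings[reference_host]
    return {
        host: value - reference
        for host, value in readings.items()
        if host != reference_host
    }


def hosts_out_of_sync(readings: dict, reference_host: str, tolerance_ms: int = 1000) -> list:
    """List hosts whose offset from the reference exceeds the tolerance."""
    offsets = clock_offsets_ms(readings, reference_host)
    return sorted(host for host, off in offsets.items() if abs(off) > tolerance_ms)


if __name__ == "__main__":
    # Hypothetical epoch-millisecond readings taken at about the same time.
    sample = {"rm": 1395558443000, "nm1": 1395558443400, "nm2": 1395558446500}
    print(clock_offsets_ms(sample, "rm"))
    print(hosts_out_of_sync(sample, "rm"))
```

Readings taken one after another include the time spent collecting them, so small offsets from this kind of check should be treated as noise.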



Re: YarnException: Unauthorized request to start container. This token is expired.

Posted by Fengyun RAO <ra...@gmail.com>.
I've found the JIRA page: https://issues.apache.org/jira/browse/YARN-180

However, I still don't quite understand container reservation.
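
As a rough picture of the problem described earlier in the thread: when a node has no free resources, the scheduler can reserve a container on it and only hand the container to the AM once resources free up. If the token's expiry is stamped at reservation time, a long wait can use up the whole interval. The toy model below is not YARN code; `Token`, `token_for_container` and the timeline are invented for illustration, and the 10-minute interval is the assumed default.

```python
from dataclasses import dataclass

# Assumption: 10-minute expiry interval, as guessed earlier in the thread.
EXPIRY_INTERVAL_MS = 10 * 60 * 1000


@dataclass
class Token:
    expires_at_ms: int

    def is_expired(self, now_ms: int) -> bool:
        # The launch request is rejected once "now" is past the expiry stamp.
        return now_ms > self.expires_at_ms


def token_for_container(reserved_at_ms: int, handed_to_am_at_ms: int,
                        stamp_at_reservation: bool) -> Token:
    """Stamp the expiry either at reservation time or when the AM receives the container."""
    created = reserved_at_ms if stamp_at_reservation else handed_to_am_at_ms
    return Token(expires_at_ms=created + EXPIRY_INTERVAL_MS)


# Hypothetical timeline: reserved at t=0, handed to the AM 12 minutes later
# once the busy node frees up, launch attempted 5 seconds after that.
reserved = 0
handed = 12 * 60 * 1000
launch = handed + 5_000

buggy = token_for_container(reserved, handed, stamp_at_reservation=True)
fixed = token_for_container(reserved, handed, stamp_at_reservation=False)

if __name__ == "__main__":
    print("stamped at reservation, expired at launch:", buggy.is_expired(launch))
    print("stamped at hand-off, expired at launch:", fixed.is_expired(launch))
```

In this model the first token is already expired when the launch is attempted and the second is not, which would also match the failure showing up only for big jobs that keep every node full.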



Re: YarnException: Unauthorized request to start container. This token is expired.

Posted by Wangda Tan <wh...@gmail.com>.
Fengyun,
I think I have met a similar problem before, you can check if RM's time and
NM's time are set synchronized or not.

Regards,
Wangda Tan


On Wed, Apr 2, 2014 at 9:55 PM, Fengyun RAO <ra...@gmail.com> wrote:

> thank you, omkar,
>
> I'm fresh to Hadoop, and all the settings are default, so I guess the
> expiration is 10 minutes.
>
> The exception happens when running big job, which occupies all the
> resources of all nodes.
>
> When running small job, with many containers remained, no exception was
> thrown.
>
>
> Actually I didn't quite follow you, what "reservation" means,
> I guess you mean RM creates the token at the time of reservation, but when
> it assigns the container to AM, the token is expired.
> Is this correct?
>
> Can I ask you a favor to help me find the jira? or tell me which version
> fixed the problem?
>
> Thanks!
>
> 2014-03-30 0:33 GMT+08:00 omkar joshi <om...@gmail.com>:
>
> Can you check few things?
>> What is the container expiry interval set to?
>> How many containers are getting allocated?
>> Is there any reservation of the containers happening..?
>> if yes then that was a known problem...I don't remember the jira number
>> though... Underlying problem in case of reservation was that it creates a
>> token at the time of reservation and not when it issues the token to AM.
>>
>>
>>
>> On Fri, Mar 28, 2014 at 6:03 AM, Leibnitz <se...@gmail.com> wrote:
>>
>>> no doubt
>>>
>>> Sent from my iPhone 6
>>>
>>> > On Mar 23, 2014, at 17:37, Fengyun RAO <ra...@gmail.com> wrote:
>>> >
>>> > What does this exception mean? I googled a lot, all the results tell
>>> me it's because the time is not synchronized between datanode and namenode.
>>> > However, I checked all the servers, that the ntpd service is on, and
>>> the time differences are less than 1 second.
>>> > What's more, the tasks are not always failing on certain datanodes.
>>> > It fails and then it restarts and succeeds. If it were the time
>>> problem, I guess it would always fail.
>>> >
>>> > My hadoop version is CDH5 beta. Below is the detailed log:
>>> >
>>> > 14/03/23 14:57:06 INFO mapreduce.Job: Running job:
>>> job_1394434496930_0032
>>> > 14/03/23 14:57:17 INFO mapreduce.Job: Job job_1394434496930_0032
>>> running in uber mode : false
>>> > 14/03/23 14:57:17 INFO mapreduce.Job:  map 0% reduce 0%
>>> > 14/03/23 15:08:01 INFO mapreduce.Job: Task Id :
>>> attempt_1394434496930_0032_m_000034_0, Status : FAILED
>>> > Container launch failed for container_1394434496930_0032_01_000041 :
>>> org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to
>>> start container.
>>> > This token is expired. current time is 1395558481146 found
>>> 1395558443384
>>> >        at
>>> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>> >        at
>>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>> >        at
>>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>> >        at
>>> java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>> >        at
>>> org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
>>> >        at
>>> org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
>>> >        at
>>> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155)
>>> >        at
>>> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:370)
>>> >        at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >        at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >        at java.lang.Thread.run(Thread.java:724)
>>> >
>>> > 14/03/23 15:08:02 INFO mapreduce.Job:  map 1% reduce 0%
>>> > 14/03/23 15:09:36 INFO mapreduce.Job: Task Id :
>>> attempt_1394434496930_0032_m_000036_0, Status : FAILED
>>> > Container launch failed for container_1394434496930_0032_01_000038 :
>>> org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to
>>> start container.
>>> > This token is expired. current time is 1395558575889 found
>>> 1395558443245
>>> >        at
>>> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>> >        at
>>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>> >        at
>>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>> >        at
>>> java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>> >        at
>>> org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
>>> >        at
>>> org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
>>> >        at
>>> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155)
>>> >        at
>>> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:370)
>>> >        at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >        at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >        at java.lang.Thread.run(Thread.java:724)
>>> >
>>>
>>
>>
>

Re: YarnException: Unauthorized request to start container. This token is expired.

Posted by Wangda Tan <wh...@gmail.com>.
Fengyun,
I think I have met a similar problem before, you can check if RM's time and
NM's time are set synchronized or not.

Regards,
Wangda Tan


On Wed, Apr 2, 2014 at 9:55 PM, Fengyun RAO <ra...@gmail.com> wrote:

> thank you, omkar,
>
> I'm fresh to Hadoop, and all the settings are default, so I guess the
> expiration is 10 minutes.
>
> The exception happens when running big job, which occupies all the
> resources of all nodes.
>
> When running small job, with many containers remained, no exception was
> thrown.
>
>
> Actually I didn't quite follow you, what "reservation" means,
> I guess you mean RM creates the token at the time of reservation, but when
> it assigns the container to AM, the token is expired.
> Is this correct?
>
> Can I ask you a favor to help me find the jira? or tell me which version
> fixed the problem?
>
> Thanks!
>
> 2014-03-30 0:33 GMT+08:00 omkar joshi <om...@gmail.com>:
>
> Can you check few things?
>> What is the container expiry interval set to?
>> How many containers are getting allocated?
>> Is there any reservation of the containers happening..?
>> if yes then that was a known problem...I don't remember the jira number
>> though... Underlying problem in case of reservation was that it creates a
>> token at the time of reservation and not when it issues the token to AM.
>>
>>
>>
>> On Fri, Mar 28, 2014 at 6:03 AM, Leibnitz <se...@gmail.com> wrote:
>>
>>> no doubt
>>>
>>> Sent from my iPhone 6
>>>
>>> > On Mar 23, 2014, at 17:37, Fengyun RAO <ra...@gmail.com> wrote:
>>> >
>>> > What does this exception mean? I googled a lot, all the results tell
>>> me it's because the time is not synchronized between datanode and namenode.
>>> > However, I checked all the servers, that the ntpd service is on, and
>>> the time differences are less than 1 second.
>>> > What's more, the tasks are not always failing on certain datanodes.
>>> > It fails and then it restarts and succeeds. If it were the time
>>> problem, I guess it would always fail.
>>> >
>>> > My hadoop version is CDH5 beta. Below is the detailed log:
>>> >
>>> > 14/03/23 14:57:06 INFO mapreduce.Job: Running job:
>>> job_1394434496930_0032
>>> > 14/03/23 14:57:17 INFO mapreduce.Job: Job job_1394434496930_0032
>>> running in uber mode : false
>>> > 14/03/23 14:57:17 INFO mapreduce.Job:  map 0% reduce 0%
>>> > 14/03/23 15:08:01 INFO mapreduce.Job: Task Id :
>>> attempt_1394434496930_0032_m_000034_0, Status : FAILED
>>> > Container launch failed for container_1394434496930_0032_01_000041 :
>>> org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to
>>> start container.
>>> > This token is expired. current time is 1395558481146 found
>>> 1395558443384
>>> >        at
>>> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>> >        at
>>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>> >        at
>>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>> >        at
>>> java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>> >        at
>>> org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
>>> >        at
>>> org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
>>> >        at
>>> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155)
>>> >        at
>>> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:370)
>>> >        at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >        at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >        at java.lang.Thread.run(Thread.java:724)
>>> >
>>> > 14/03/23 15:08:02 INFO mapreduce.Job:  map 1% reduce 0%
>>> > 14/03/23 15:09:36 INFO mapreduce.Job: Task Id :
>>> attempt_1394434496930_0032_m_000036_0, Status : FAILED
>>> > Container launch failed for container_1394434496930_0032_01_000038 :
>>> org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to
>>> start container.
>>> > This token is expired. current time is 1395558575889 found
>>> 1395558443245
>>> >        at
>>> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>> >        at
>>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>> >        at
>>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>> >        at
>>> java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>> >        at
>>> org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
>>> >        at
>>> org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
>>> >        at
>>> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155)
>>> >        at
>>> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:370)
>>> >        at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >        at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >        at java.lang.Thread.run(Thread.java:724)
>>> >
>>>
>>
>>
>

Re: YarnException: Unauthorized request to start container. This token is expired.

Posted by Fengyun RAO <ra...@gmail.com>.
I've found the jira page: https://issues.apache.org/jira/browse/YARN-180

though I don't quite understand container reservation.


2014-04-02 21:55 GMT+08:00 Fengyun RAO <ra...@gmail.com>:

> thank you, omkar,
>
> I'm fresh to Hadoop, and all the settings are default, so I guess the
> expiration is 10 minutes.
>
> The exception happens when running big job, which occupies all the
> resources of all nodes.
>
> When running small job, with many containers remained, no exception was
> thrown.
>
>
> Actually I didn't quite follow you, what "reservation" means,
> I guess you mean RM creates the token at the time of reservation, but when
> it assigns the container to AM, the token is expired.
> Is this correct?
>
> Can I ask you a favor to help me find the jira? or tell me which version
> fixed the problem?
>
> Thanks!
>
> 2014-03-30 0:33 GMT+08:00 omkar joshi <om...@gmail.com>:
>
> Can you check few things?
>> What is the container expiry interval set to?
>> How many containers are getting allocated?
>> Is there any reservation of the containers happening..?
>> if yes then that was a known problem...I don't remember the jira number
>> though... Underlying problem in case of reservation was that it creates a
>> token at the time of reservation and not when it issues the token to AM.
>>
>>
>>
>> On Fri, Mar 28, 2014 at 6:03 AM, Leibnitz <se...@gmail.com> wrote:
>>
>>> no doubt
>>>
>>> Sent from my iPhone 6
>>>
>>> > On Mar 23, 2014, at 17:37, Fengyun RAO <ra...@gmail.com> wrote:
>>> >
>>> > What does this exception mean? I googled a lot, all the results tell
>>> me it's because the time is not synchronized between datanode and namenode.
>>> > However, I checked all the servers, that the ntpd service is on, and
>>> the time differences are less than 1 second.
>>> > What's more, the tasks are not always failing on certain datanodes.
>>> > It fails and then it restarts and succeeds. If it were the time
>>> problem, I guess it would always fail.
>>> >
>>> > My hadoop version is CDH5 beta. Below is the detailed log:
>>> >
>>> > 14/03/23 14:57:06 INFO mapreduce.Job: Running job: job_1394434496930_0032
>>> > 14/03/23 14:57:17 INFO mapreduce.Job: Job job_1394434496930_0032 running in uber mode : false
>>> > 14/03/23 14:57:17 INFO mapreduce.Job:  map 0% reduce 0%
>>> > 14/03/23 15:08:01 INFO mapreduce.Job: Task Id : attempt_1394434496930_0032_m_000034_0, Status : FAILED
>>> > Container launch failed for container_1394434496930_0032_01_000041 : org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
>>> > This token is expired. current time is 1395558481146 found 1395558443384
>>> >        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>> >        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>> >        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>> >        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>> >        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
>>> >        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
>>> >        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155)
>>> >        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:370)
>>> >        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >        at java.lang.Thread.run(Thread.java:724)
>>> >
>>> > 14/03/23 15:08:02 INFO mapreduce.Job:  map 1% reduce 0%
>>> > 14/03/23 15:09:36 INFO mapreduce.Job: Task Id : attempt_1394434496930_0032_m_000036_0, Status : FAILED
>>> > Container launch failed for container_1394434496930_0032_01_000038 : org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
>>> > This token is expired. current time is 1395558575889 found 1395558443245
>>> >        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>> >        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>> >        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>> >        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>> >        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
>>> >        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
>>> >        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155)
>>> >        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:370)
>>> >        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >        at java.lang.Thread.run(Thread.java:724)
>>> >
>>>
>>
>>
>

Re: YarnException: Unauthorized request to start container. This token is expired.

Posted by Wangda Tan <wh...@gmail.com>.
Fengyun,
I think I have run into a similar problem before. You can check whether the
RM's and the NM's clocks are synchronized.

Regards,
Wangda Tan
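
To weigh the clock-synchronization explanation against the log in this thread, it may help to compute how far past expiry each token was when the launch was rejected. The sketch below is only an illustration: it assumes that, in the error text, "current time" is the NodeManager's clock at launch and "found" is the token's expiry timestamp, both in epoch milliseconds. The names `TOKEN_EXPIRED` and `expiry_overshoot_ms` are made up for this example and are not part of Hadoop.

```python
import re

# Pattern for the two epoch-millisecond values in the error text.
TOKEN_EXPIRED = re.compile(r"current time is (\d+) found (\d+)")


def expiry_overshoot_ms(message):
    """Return how many ms past the token's expiry the launch was attempted.

    Assumes "current time" is the NM clock and "found" is the token expiry.
    Returns None if the message does not match the expected format.
    """
    match = TOKEN_EXPIRED.search(message)
    if match is None:
        return None
    current_ms, expiry_ms = int(match.group(1)), int(match.group(2))
    return current_ms - expiry_ms


# The two error lines quoted in this thread.
messages = [
    "This token is expired. current time is 1395558481146 found 1395558443384",
    "This token is expired. current time is 1395558575889 found 1395558443245",
]

overshoots = [expiry_overshoot_ms(m) for m in messages]
for message, overshoot in zip(messages, overshoots):
    print(f"{overshoot} ms past expiry: {message}")
```

For the two failures in the log this gives 37762 ms and 132644 ms, roughly 38 and 133 seconds. Under the assumptions above, that is much larger than the sub-second clock difference reported in the original post, so skew of that size would not by itself account for these rejections.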



Re: YarnException: Unauthorized request to start container. This token is expired.

Posted by Fengyun RAO <ra...@gmail.com>.
I've found the JIRA page: https://issues.apache.org/jira/browse/YARN-180

though I still don't quite understand container reservation.
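
As a rough aid to the reservation question: when a node does not currently have enough free resources for a request, the scheduler can reserve a container on that node and hand it out once resources free up. The problem described earlier in this thread is that the token was created at reservation time rather than when the container was issued to the AM. The toy model below is a sketch of that timing issue only; it is not YARN's code, every name in it is hypothetical, and the 10-minute interval is the default guessed at in this thread.

```python
from dataclasses import dataclass

# Assumed expiry interval: the 10-minute default guessed at in this thread.
EXPIRY_INTERVAL_MS = 10 * 60 * 1000


@dataclass
class Token:
    expiry_ms: int


def mint_token(now_ms):
    """Create a token that expires one interval after it is minted."""
    return Token(expiry_ms=now_ms + EXPIRY_INTERVAL_MS)


def launch_allowed(token, now_ms):
    """Toy NM-side check: reject once the current time is past the expiry."""
    return now_ms <= token.expiry_ms


# A busy cluster: the container sits reserved for 11 minutes before the
# node has room and the container is issued to the AM.
reserved_at_ms = 0
issued_at_ms = 11 * 60 * 1000
launch_at_ms = issued_at_ms + 5_000  # AM asks the NM to start it 5 s later

# Behaviour described as the known problem: token minted at reservation.
token_from_reservation = mint_token(reserved_at_ms)
# Behaviour one would expect instead: token minted when issued to the AM.
token_from_issue = mint_token(issued_at_ms)

buggy_ok = launch_allowed(token_from_reservation, launch_at_ms)
fixed_ok = launch_allowed(token_from_issue, launch_at_ms)
print("minted at reservation:", buggy_ok)
print("minted at issue:", fixed_ok)
```

In this model the token minted at reservation is already expired when the AM tries to launch, while the one minted at issue time is still valid. That would also fit the observation that only the big job, which fills every node, hits the exception.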


