Posted to user@hadoop.apache.org by Nur Kholis Majid <nu...@gmail.com> on 2015/02/27 13:29:46 UTC

How to set AM attempt interval?

Hi All,

Many of my jobs fail because the AM tries to rerun the job after a very
short interval (only 6 seconds). How can I increase the interval?

https://dl.dropboxusercontent.com/u/33705885/2015-02-27_145104.png

Thank you.

Re: How to set AM attempt interval?

Posted by Nur Kholis Majid <nu...@gmail.com>.

Hi Vinod,

Here is the Diagnostics message from the RM Web UI page:
Application application_1424919411720_0878 failed 10 times due to
Error launching appattempt_1424919411720_0878_000010. Got exception:
java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:197)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:209)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.setupTokens(AMLauncher.java:226)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.createAMContainerLaunchContext(AMLauncher.java:198)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:108)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
. Failing the application.

The log link shows only the following message and does not produce any
stdout or stderr files:
Logs not available for container_1424919411720_0878_08_000001_14.
Aggregation may not be complete, Check back later or try the
nodemanager at hadoopdn01:8041

Here is the screenshot:
https://dl.dropboxusercontent.com/u/33705885/2015-03-02_163138.png

Thank you.

On Sat, Feb 28, 2015 at 2:56 AM, Vinod Kumar Vavilapalli
<vi...@hortonworks.com> wrote:
> That's an old JIRA. The right solution is not an AM-retry interval but
> launching the AM somewhere.
>
> Why is your AM failing in the first place? If it is due to a full disk, the
> situation should be better with YARN-1781. Can you use the configuration
> (yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage)
> added in YARN-1781?
>
> +Vinod
>
> On Feb 27, 2015, at 7:31 AM, Ted Yu <yu...@gmail.com> wrote:
>
> Looks like this is related:
> https://issues.apache.org/jira/browse/YARN-964
>
> On Fri, Feb 27, 2015 at 4:29 AM, Nur Kholis Majid
> <nu...@gmail.com> wrote:
>>
>> Hi All,
>>
>> Many of my jobs fail because the AM tries to rerun the job after a very
>> short interval (only 6 seconds). How can I increase the interval?
>>
>> https://dl.dropboxusercontent.com/u/33705885/2015-02-27_145104.png
>>
>> Thank you.
>
>
>

Re: How to set AM attempt interval?

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

That's an old JIRA. The right solution is not an AM-retry interval but launching the AM somewhere.

Why is your AM failing in the first place? If it is due to a full disk, the situation should be better with YARN-1781. Can you use the configuration (yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage) added in YARN-1781?

+Vinod
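
For readers unfamiliar with the property mentioned above, here is a minimal sketch of how it might appear in yarn-site.xml and what a per-disk utilization threshold means in practice. The value 90.0, the helper names, and the strict "greater than" comparison are all assumptions made for illustration; this is not the NodeManager's actual implementation, and the right value depends on the cluster.

```python
import shutil
import xml.etree.ElementTree as ET

# Property name as given in the thread (added by YARN-1781).
PROP = "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"

# Hypothetical yarn-site.xml fragment. 90.0 is an illustrative value only,
# not a recommendation and not necessarily the default of any release.
YARN_SITE_FRAGMENT = """
<configuration>
  <property>
    <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
    <value>90.0</value>
  </property>
</configuration>
"""


def read_threshold(xml_text, name=PROP):
    """Return the configured percentage as a float, or None if the property is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return float(prop.findtext("value"))
    return None


def utilization_percent(path):
    """Percentage of the filesystem holding `path` that is currently in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def over_threshold(percent_used, threshold):
    """True if a disk at `percent_used` exceeds the threshold (assumed strict comparison)."""
    return percent_used > threshold


if __name__ == "__main__":
    threshold = read_threshold(YARN_SITE_FRAGMENT)
    # "." stands in for a NodeManager local or log directory (hypothetical path).
    used = utilization_percent(".")
    status = "over" if over_threshold(used, threshold) else "within"
    print(f"{used:.1f}% used, {status} the {threshold}% limit")
```

Running the script against the real local and log directories of a node such as hadoopdn01 would give a quick indication of whether a full disk is a plausible cause before changing any configuration.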

On Feb 27, 2015, at 7:31 AM, Ted Yu <yu...@gmail.com> wrote:

Looks like this is related:
https://issues.apache.org/jira/browse/YARN-964

On Fri, Feb 27, 2015 at 4:29 AM, Nur Kholis Majid <nu...@gmail.com> wrote:
Hi All,

Many of my jobs fail because the AM tries to rerun the job after a very
short interval (only 6 seconds). How can I increase the interval?

https://dl.dropboxusercontent.com/u/33705885/2015-02-27_145104.png

Thank you.

Re: How to set AM attempt interval?

Posted by Ted Yu <yu...@gmail.com>.

Looks like this is related:
https://issues.apache.org/jira/browse/YARN-964

On Fri, Feb 27, 2015 at 4:29 AM, Nur Kholis Majid <
nur.kholis.majid@gmail.com> wrote:

> Hi All,
>
> Many of my jobs fail because the AM tries to rerun the job after a very
> short interval (only 6 seconds). How can I increase the interval?
>
> https://dl.dropboxusercontent.com/u/33705885/2015-02-27_145104.png
>
> Thank you.
>
