Posted to mapreduce-dev@hadoop.apache.org by "Amol Kekre (JIRA)" <ji...@apache.org> on 2011/07/21 02:05:58 UTC

[jira] [Created] (MAPREDUCE-2718) Job fails if AppMaster is killed

Job fails if AppMaster is killed
--------------------------------

                 Key: MAPREDUCE-2718
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2718
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mrv2
            Reporter: Amol Kekre
             Fix For: 0.23.0


Started a cluster. Submitted a sleep job with around 10000 maps and 1000 reduces.
When 5000 maps had completed, the AppMaster was killed.
The RM web UI showed the application as failed.
After retrying 50 times, the job client failed with:
{
java.lang.reflect.UndeclaredThrowableException
        at org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getTaskAttemptCompletionEvents(MRClientProtocolPBClientImpl.java:161)
        at org.apache.hadoop.mapred.ClientServiceDelegate.getTaskCompletionEvents(ClientServiceDelegate.java:254)
        at org.apache.hadoop.mapred.YARNRunner.getTaskCompletionEvents(YARNRunner.java:520)
        at org.apache.hadoop.mapreduce.Job.getTaskCompletionEvents(Job.java:540)
        at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1130)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1084)
        at org.apache.hadoop.mapreduce.SleepJob.run(SleepJob.java:259)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
        at org.apache.hadoop.mapreduce.SleepJob.main(SleepJob.java:191)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
        at org.apache.hadoop.test.MapredTestDriver.run(MapredTestDriver.java:111)
        at org.apache.hadoop.test.MapredTestDriver.main(MapredTestDriver.java:118)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:192)
Caused by: com.google.protobuf.ServiceException: java.net.ConnectException: Call to /98.137.103.174:42557 failed on connection exception: java.net.ConnectException: Connection refused
        at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:96)
        at $Proxy11.getTaskAttemptCompletionEvents(Unknown Source)
        at org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getTaskAttemptCompletionEvents(MRClientProtocolPBClientImpl.java:154)
        ... 21 more
Caused by: java.net.ConnectException: Call to /... failed on connection exception: java.net.ConnectException: Connection refused
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:1087)
        at org.apache.hadoop.ipc.Client.call(Client.java:1063)
        at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:250)
        at org.apache.hadoop.yarn.ipc.$Proxy10.call(Unknown Source)
        at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:94)
        ... 23 more
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:375)
        at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:448)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:536)
        at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:211)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1196)
        at org.apache.hadoop.ipc.Client.call(Client.java:1040)
}
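
The trace above ends in "Connection refused" after the client has retried a fixed number of times against the AppMaster's address. As a rough illustration of that failure pattern only, here is a minimal bounded-retry sketch. It is not the actual ClientServiceDelegate code; the class name BoundedRetry, the method callWithRetries, and its parameters are hypothetical and exist only for this example.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Hadoop: retries a call a bounded number
// of times and surfaces the last ConnectException once attempts run out.
final class BoundedRetry {

    static <T> T callWithRetries(Callable<T> op, int maxAttempts, long sleepMillis)
            throws Exception {
        ConnectException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (ConnectException e) {
                // The remote endpoint is not listening; remember the error and retry.
                last = e;
                Thread.sleep(sleepMillis);
            }
        }
        // Every attempt was refused: give up and keep the last error as the cause.
        throw new IOException("Gave up after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        try {
            // Simulates an endpoint that never comes back, as after the AppMaster is killed.
            callWithRetries(() -> {
                calls[0]++;
                throw new ConnectException("Connection refused");
            }, 50, 0L);
        } catch (IOException e) {
            System.out.println(e.getMessage() + ", calls made: " + calls[0]);
        }
    }
}
```

If the endpoint never returns at the same address, a loop of this shape can only fail; the client would need some way to learn a new address before retrying.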

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Resolved] (MAPREDUCE-2718) Job fails if AppMaster is killed

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy resolved MAPREDUCE-2718.
--------------------------------------

    Resolution: Not A Problem

