Posted to user@hadoop.apache.org by Srinivas Chamarthi <sr...@gmail.com> on 2013/10/11 09:22:06 UTC

Map Reduce Job fails

I have a 2-node cluster (HDP1, HDP2), set up as follows.

HDP 1

1. name node
2. data node
3. node manager
4. resource manager

HDP 2

1. node manager
2. data node

When I submit the MapReduce job on HDP1, the job runs on node HDP2, which
is fine.
But the job fails, and in the userlogs/syslog of the application on HDP2 I
find this exception:


2013-10-11 03:10:35,700 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics
report from attempt_1381261418535_0021_m_000000_0: Container launch failed
for container_1381261418535_0021_01_000003 : java.net.ConnectException:
Call From 249/172.xy.ab.249 to ip-172-xy-ab-249.localdomain:37785 failed on
connection exception: java.net.ConnectException: Connection refused; For
more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)


It looks like the node is trying to connect to itself, but for some reason
nothing is answering on that port.
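
One way to test that reading is to probe the address from the log directly. The following is a minimal sketch, assuming Python 3 is available on the node; `port_is_open` is a hypothetical helper name, not part of Hadoop.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection resolves the host name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False
```

Calling `port_is_open("ip-172-xy-ab-249.localdomain", 37785)` on HDP2, with the host and port copied from the exception, would show whether anything accepts connections at the address the caller resolved. A `False` result does not distinguish a refused connection from a name that fails to resolve.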


Any idea what I am missing?

1. Maybe I am not running some required process on HDP2,
2. or I am not providing the right configuration to find a server on HDP1.

Any help is greatly appreciated.

thx
srinivas

Re: Map Reduce Job fails

Posted by Srinivas Chamarthi <sr...@gmail.com>.
The issue was with the /etc/hosts files. Thanks for letting me explore on my
own; I now understand a lot of the internals.
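
The thread does not say what exactly was wrong in /etc/hosts. One common cause of this symptom is a node's own host name being mapped to a loopback address, so that other daemons are told to connect to an address where nothing is listening. The sketch below assumes that cause; `loopback_mapped_names` is a hypothetical helper, and the default ignore list is an assumption about which names are expected to be loopback.

```python
import ipaddress

# Names that are normally expected to map to loopback (an assumption).
DEFAULT_IGNORE = frozenset(
    {"localhost", "localhost.localdomain", "ip6-localhost", "ip6-loopback"}
)


def loopback_mapped_names(hosts_text, ignore=DEFAULT_IGNORE):
    """Return host names that a hosts file maps to a loopback address."""
    flagged = []
    for line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        address, *names = line.split()
        try:
            is_loopback = ipaddress.ip_address(address).is_loopback
        except ValueError:
            # Skip lines whose first field is not a valid IP address.
            continue
        if is_loopback:
            flagged.extend(name for name in names if name not in ignore)
    return flagged
```

Running `loopback_mapped_names(open("/etc/hosts").read())` on each node lists any cluster host names that point at loopback. An empty list means this particular problem is not present, not that the file is otherwise correct.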


On Fri, Oct 11, 2013 at 3:28 AM, Srinivas Chamarthi <
srinivas.chamarthi@gmail.com> wrote:

> from the stack trace, I believe, it is trying to start/connect the
> ApplicationMaster and fails to connect to it. I am not sure if this is
> related to ec2 loopback adapter.
>
>
> On Fri, Oct 11, 2013 at 12:22 AM, Srinivas Chamarthi <
> srinivas.chamarthi@gmail.com> wrote:
>
>> I have a 2 node cluster (HDP1, HDP2) as mentioned below.
>>
>> HDP 1
>>
>> 1.name node ,
>> 2.data node,
>> 3. node manager
>> 4. resource manager
>>
>> HDP 2
>>
>> 1. node manager
>> 2. data node
>>
>> when I submit the map reduce job on HDP1 , the job runs on node HDP2
>> which is fine.
>> But the job fails and in the userlogs/syslogs of the application on HDP2
>> I am finding an exception
>>
>>
>> 2013-10-11 03:10:35,700 INFO [AsyncDispatcher event handler]
>> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics
>> report from attempt_1381261418535_0021_m_000000_0: Container launch failed
>> for container_1381261418535_0021_01_000003 : java.net.ConnectException:
>> Call From 249/172.xy.ab.249 to ip-172-xy-ab-249.localdomain:37785 failed on
>> connection exception: java.net.ConnectException: Connection refused; For
>> more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
>>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
>> Method)
>>
>>
>> looks like it is trying to connect to same server but for some reason it
>> cannot find something it is looking for on that port.
>>
>>
>> any idea what I am missing ?
>>
>> 1. one may be I am missing to run some process on HDP2
>> 2. or not providing the right configuration to find a server on HDP1.
>>
>> any help is greatly appreciated
>>
>> thx
>> srinivas
>>
>>
>

Re: Map Reduce Job fails

Posted by Srinivas Chamarthi <sr...@gmail.com>.
From the stack trace, I believe it is trying to start or connect to the
ApplicationMaster and failing to connect. I am not sure whether this is
related to the EC2 loopback adapter.
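
Whether a loopback mapping is involved can be checked by asking the resolver what a host name maps to. This is a minimal sketch, assuming Python 3; `resolves_to_loopback` is a hypothetical helper, and it only reports what the local resolver returns, not what Hadoop itself uses.

```python
import ipaddress
import socket


def resolves_to_loopback(hostname):
    """Return True if any address resolved for hostname is a loopback address."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # The name does not resolve at all.
        return False
    for info in infos:
        # info[4] is the socket address; its first item is the IP string.
        address = info[4][0].split("%", 1)[0]  # drop any IPv6 zone suffix
        if ipaddress.ip_address(address).is_loopback:
            return True
    return False
```

For example, `resolves_to_loopback(socket.gethostname())` run on HDP2 would indicate whether the node's own name points back at 127.x.x.x or ::1.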


On Fri, Oct 11, 2013 at 12:22 AM, Srinivas Chamarthi <
srinivas.chamarthi@gmail.com> wrote:

> I have a 2 node cluster (HDP1, HDP2) as mentioned below.
>
> HDP 1
>
> 1.name node ,
> 2.data node,
> 3. node manager
> 4. resource manager
>
> HDP 2
>
> 1. node manager
> 2. data node
>
> when I submit the map reduce job on HDP1 , the job runs on node HDP2 which
> is fine.
> But the job fails and in the userlogs/syslogs of the application on HDP2 I
> am finding an exception
>
>
> 2013-10-11 03:10:35,700 INFO [AsyncDispatcher event handler]
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics
> report from attempt_1381261418535_0021_m_000000_0: Container launch failed
> for container_1381261418535_0021_01_000003 : java.net.ConnectException:
> Call From 249/172.xy.ab.249 to ip-172-xy-ab-249.localdomain:37785 failed on
> connection exception: java.net.ConnectException: Connection refused; For
> more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
>
>
> looks like it is trying to connect to same server but for some reason it
> cannot find something it is looking for on that port.
>
>
> any idea what I am missing ?
>
> 1. one may be I am missing to run some process on HDP2
> 2. or not providing the right configuration to find a server on HDP1.
>
> any help is greatly appreciated
>
> thx
> srinivas
>
>
