Posted to common-user@hadoop.apache.org by john smith <js...@gmail.com> on 2011/09/06 09:48:28 UTC

Map phase hanging for wordcount example

Hi Folks,

I am working on a 3-node cluster (1 NN + 2 DNs). I loaded some test data
with replication factor 3 (around 400 MB). However, when I run the wordcount
example, it hangs at map 0%.

bin/hadoop jar hadoop-examples-0.20.3-SNAPSHOT.jar wordcount /test_data
/out2
11/09/06 13:07:28 INFO input.FileInputFormat: Total input paths to process :
2
11/09/06 13:07:28 INFO mapred.JobClient: Running job: job_201109061248_0002
11/09/06 13:07:29 INFO mapred.JobClient:  map 0% reduce 0%

The TTs and DNs are running fine on my slaves; I see them when I run the
jps command.


Can anyone help me out with this? Any idea why this would happen? I am
totally clueless, as nothing shows up in the logs either!

Thanks,
jS

Re: Map phase hanging for wordcount example

Posted by Harsh J <ha...@cloudera.com>.
The wordcount example, by default, will run a single reducer. This is
controllable by passing -Dmapred.reduce.tasks=2 to your launcher. The
following will work:

hadoop jar hadoop-examples.jar wordcount -Dmapred.reduce.tasks=2 input output

Note that just because a cluster has N nodes, it does not need to run N
reducers. The reducer count does not depend on cluster size; it is simply a
user-configurable number with a default value of 1.
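If you'd rather change the default than pass -D on every run, the same
property can be set in the client-side mapred-site.xml. A minimal sketch
(this uses the pre-0.21 property name, matching the 0.20 series in this
thread; a per-job -Dmapred.reduce.tasks=... still overrides it):

```xml
<!-- mapred-site.xml on the submitting client:
     default number of reduce tasks for jobs that don't set their own -->
<property>
  <name>mapred.reduce.tasks</name>
  <value>2</value>
</property>
```

In driver code the equivalent is job.setNumReduceTasks(2) with the
org.apache.hadoop.mapreduce.Job API (or setNumReduceTasks(2) on the older
JobConf).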




-- 
Harsh J

Re: Map phase hanging for wordcount example

Posted by john smith <js...@gmail.com>.
Yep, it works! I just synced the /etc/hosts files without changing any other
configs, and now it's working fine. Thanks for the help, Harsh. Sorry for
spamming the list without checking my TT logs properly.

Also, one more doubt: any idea why it's scheduling only a single reducer? I
have 2 datanodes and I expected it to run 2 reducers (data size of 500 MB).

Any hints?



Re: Map phase hanging for wordcount example

Posted by Harsh J <ha...@cloudera.com>.
John,

Yes, it looks like your slave nodes aren't able to properly resolve some
hostnames; Hadoop requires a sane network setup to work properly. And yes,
you should use a hostname for fs.default.name and the other configs to the
extent possible.

The easiest way is to keep a properly synchronized /etc/hosts file.

For example, it may look like so, on all machines:

127.0.0.1 localhost.localdomain localhost
192.168.0.1 master.hadoop master
192.168.0.2 slave3.hadoop slave3
(and so on…)

(This way the master can resolve the slaves, and the slaves can resolve the
master. If you have the time, set up a DNS; that's the best thing to do.)

Then, in core-site.xml you'll need:

fs.default.name = hdfs://master

And in mapred-site.xml:

mapred.job.tracker = master:8021

That should do it, so long as the slave hosts can freely access the
master hosts (no blockage of ports via firewall and such).
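Written out as the XML those two files actually take (a sketch; the property
names are the pre-0.21 ones matching this Hadoop version, and `master` is
the hostname from the /etc/hosts example above):

```xml
<!-- core-site.xml -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master</value>
  </property>
</configuration>

<!-- mapred-site.xml -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:8021</value>
  </property>
</configuration>
```

After changing these, restart the daemons so they pick up the new values.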




-- 
Harsh J

Re: Map phase hanging for wordcount example

Posted by john smith <js...@gmail.com>.
Hey, my TT logs show this:

2011-09-06 13:22:41,860 ERROR org.apache.hadoop.mapred.TaskTracker: Caught
exception: java.net.UnknownHostException: unknown host: rip-pc.local
at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:195)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:853)
at org.apache.hadoop.ipc.Client.call(Client.java:723)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
at $Proxy5.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)


Maybe some error in my configs? I am using IPs in the conf files; should I
put entries in the /etc/hosts files instead?


Re: Map phase hanging for wordcount example

Posted by john smith <js...@gmail.com>.
Hi Harsh,

My jt log : http://pastebin.com/rXAEeDkC

I have some startup exceptions (which I guess don't matter much), but the
tail indicates that it's locating the splits correctly and then it hangs!

Any idea?

Thanks



Re: Map phase hanging for wordcount example

Posted by Harsh J <ha...@cloudera.com>.
I'd check the tail of the JobTracker logs after a job is submitted, to see
if an error/warning there is causing this, and then dig further into
why/what/how.

Hard to tell what your problem specifically is without logs :)




-- 
Harsh J