Posted to common-user@hadoop.apache.org by Mithila Nagendra <mn...@asu.edu> on 2009/04/12 17:39:42 UTC

Map-Reduce Slow Down

Hey all
I recently set up a three-node Hadoop cluster and ran an example on it. It
was pretty fast, and all three nodes were being used (I checked the log
files to make sure that the slaves were utilized).

Now I've set up another cluster consisting of 15 nodes. I ran the same
example, but instead of speeding up, the map-reduce task seems to take
forever! The slaves are not being used for some reason. This second cluster
has lower per-node processing power, but should that make any difference?
How can I ensure that the data is being mapped to all the nodes? Presently,
the only node that seems to be doing all the work is the master node.

Do 15 nodes in a cluster increase the network cost? What can I do to set up
the cluster to function more efficiently?

Thanks!
Mithila Nagendra
Arizona State University

Re: Map-Reduce Slow Down

Posted by Mithila Nagendra <mn...@asu.edu>.
Hey Jason
The problem's fixed! :) My network admin had messed something up! Now it
works! Thanks for your help!

Mithila

On Thu, Apr 16, 2009 at 11:58 PM, Mithila Nagendra <mn...@asu.edu> wrote:

> Thanks Jason! This helps a lot. I'm planning to talk to my network admin
> tomorrow. I'm hoping he'll be able to fix this problem.
> Mithila
>
>
> On Fri, Apr 17, 2009 at 9:00 AM, jason hadoop <ja...@gmail.com>wrote:
>
>> Assuming you are on a linux box, on both machines
>> verify that the servers are listening on the ports you expect via
>> netstat -a -n -t -p
>> -a show all sockets, including those listening for connections
>> -n do not translate ip addresses to host names
>> -t only list tcp sockets
>> -p list the pid/process name
>>
>> on the machine 192.168.0.18
>> you should have a socket bound to 0.0.0.0:54310 with a process of java,
>> and the pid should be the pid of your namenode process.
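>> For example, something along these lines (the 12345 pid is illustrative):
>>
>>   $ netstat -a -n -t -p | grep 54310
>>   tcp   0   0 0.0.0.0:54310   0.0.0.0:*   LISTEN   12345/java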
>>
>> On the remote machine you should be able to *telnet 192.168.0.18 54310*
>> and have it connect:
>> *Connected to 192.168.0.18.
>> Escape character is '^]'.
>> *
>>
>> If the netstat shows the socket accepting and the telnet does not
>> connect, then something is blocking the TCP packets between the machines:
>> one or both machines has a firewall, an intervening router has a
>> firewall, or there is some routing problem.
>> The command /sbin/iptables -L will normally list the firewall rules, if
>> any, for a linux machine.
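>> For instance (as root; flushing is only a quick diagnostic, since it
>> drops every rule until the firewall service is restarted):
>>
>>   /sbin/iptables -L -n   # list rules with numeric addresses
>>   /sbin/iptables -F      # temporarily flush all rules to rule the
>>                          # firewall in or out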
>>
>>
>> You should be able to use telnet to verify that you can connect from the
>> remote machine.
>>
>> On Thu, Apr 16, 2009 at 9:18 PM, Mithila Nagendra <mn...@asu.edu>
>> wrote:
>>
>> > Thanks! I'll see what I can find out.
>> >
>> > On Fri, Apr 17, 2009 at 4:55 AM, jason hadoop <jason.hadoop@gmail.com
>> > >wrote:
>> >
>> > > The firewall was run at system startup; I think there was an
>> > > /etc/sysconfig/iptables file present which triggered the firewall.
>> > > I don't currently have access to any CentOS 5 machines, so I can't
>> > > easily check.
>> > >
>> > >
>> > >
>> > > On Thu, Apr 16, 2009 at 6:54 PM, jason hadoop <jason.hadoop@gmail.com
>> > > >wrote:
>> > >
>> > > > The kickstart script was something that the operations staff was
>> > > > using to initialize new machines. I never actually saw the script,
>> > > > just figured out that there was a firewall in place.
>> > > >
>> > > >
>> > > >
>> > > > On Thu, Apr 16, 2009 at 1:28 PM, Mithila Nagendra <mnagendr@asu.edu
>> > > >wrote:
>> > > >
>> > > >> Jason: the kickstart script - was it something you wrote, or is it
>> > > >> run when the system turns on?
>> > > >> Mithila
>> > > >>
>> > > >> On Thu, Apr 16, 2009 at 1:06 AM, Mithila Nagendra <
>> mnagendr@asu.edu>
>> > > >> wrote:
>> > > >>
>> > > >> > Thanks Jason! Will check that out.
>> > > >> > Mithila
>> > > >> >
>> > > >> >
>> > > >> > On Thu, Apr 16, 2009 at 5:23 AM, jason hadoop <
>> > jason.hadoop@gmail.com
>> > > >> >wrote:
>> > > >> >
>> > > >> >> Double check that there is no firewall in place.
>> > > >> >> At one point a bunch of new machines were kickstarted and placed
>> > > >> >> in a cluster, and they all failed with something similar.
>> > > >> >> It turned out the kickstart script had enabled the firewall with
>> > > >> >> a rule that blocked ports in the 50k range.
>> > > >> >> It took us a while to even think to check, since that was not
>> > > >> >> part of our normal machine configuration.
>> > > >> >>
>> > > >> >> On Wed, Apr 15, 2009 at 11:04 AM, Mithila Nagendra <
>> > mnagendr@asu.edu
>> > > >
>> > > >> >> wrote:
>> > > >> >>
>> > > >> >> > Hi Aaron
>> > > >> >> > I will look into that, thanks!
>> > > >> >> >
>> > > >> >> > I spoke to the admin who oversees the cluster. He said that the
>> > > >> >> > gateway comes into the picture only when one of the nodes
>> > > >> >> > communicates with a node outside of the cluster. But in my case
>> > > >> >> > the communication is carried out between the nodes, which all
>> > > >> >> > belong to the same cluster.
>> > > >> >> >
>> > > >> >> > Mithila
>> > > >> >> >
>> > > >> >> > On Wed, Apr 15, 2009 at 8:59 PM, Aaron Kimball <
>> > aaron@cloudera.com
>> > > >
>> > > >> >> wrote:
>> > > >> >> >
>> > > >> >> > > Hi,
>> > > >> >> > >
>> > > >> >> > > I wrote a blog post a while back about connecting nodes via a
>> > > >> >> > > gateway. See
>> > > >> >> > > http://www.cloudera.com/blog/2008/12/03/securing-a-hadoop-cluster-through-a-gateway/
>> > > >> >> > >
>> > > >> >> > > This assumes that the client is outside the gateway and all
>> > > >> >> > > datanodes/namenode are inside, but the same principles apply.
>> > > >> >> > > You'll just need to set up ssh tunnels from every datanode to
>> > > >> >> > > the namenode.
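>> > > >> >> > > A minimal sketch of one such tunnel, run from a datanode
>> > > >> >> > > (user name, host, and port here are illustrative - adjust to
>> > > >> >> > > your setup):
>> > > >> >> > >
>> > > >> >> > >   ssh -f -N -L 54310:localhost:54310 hadoop@node18
>> > > >> >> > >
>> > > >> >> > > after which that datanode can reach the namenode's RPC port
>> > > >> >> > > as localhost:54310.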
>> > > >> >> > >
>> > > >> >> > > - Aaron
>> > > >> >> > >
>> > > >> >> > >
>> > > >> >> > > On Wed, Apr 15, 2009 at 10:19 AM, Ravi Phulari <
>> > > >> >> rphulari@yahoo-inc.com
>> > > >> >> > >wrote:
>> > > >> >> > >
>> > > >> >> > >> Looks like your NameNode is down.
>> > > >> >> > >> Verify that the Hadoop processes are running (jps should
>> > > >> >> > >> show you all the running Java processes).
>> > > >> >> > >> If your Hadoop processes are running, try restarting them.
>> > > >> >> > >> I guess this problem is due to your fsimage not being
>> > > >> >> > >> correct.
>> > > >> >> > >> You might have to format your namenode.
>> > > >> >> > >> Hope this helps.
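>> > > >> >> > >> For example, from $HADOOP_HOME on the master (0.18.x
>> > > >> >> > >> commands; formatting erases all HDFS metadata, so treat it
>> > > >> >> > >> as a last resort):
>> > > >> >> > >>
>> > > >> >> > >>   jps                          # expect NameNode, JobTracker, etc.
>> > > >> >> > >>   bin/stop-all.sh
>> > > >> >> > >>   bin/hadoop namenode -format  # only if the fsimage really is bad
>> > > >> >> > >>   bin/start-all.sh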
>> > > >> >> > >>
>> > > >> >> > >> Thanks,
>> > > >> >> > >> --
>> > > >> >> > >> Ravi
>> > > >> >> > >>
>> > > >> >> > >>
>> > > >> >> > >> On 4/15/09 10:15 AM, "Mithila Nagendra" <mn...@asu.edu>
>> > > wrote:
>> > > >> >> > >>
>> > > >> >> > >> The log file runs into thousands of lines, with the same
>> > > >> >> > >> message being displayed every time.
>> > > >> >> > >>
>> > > >> >> > >> On Wed, Apr 15, 2009 at 8:10 PM, Mithila Nagendra <
>> > > >> mnagendr@asu.edu>
>> > > >> >> > >> wrote:
>> > > >> >> > >>
>> > > >> >> > >> > The log file hadoop-mithila-datanode-node19.log.2009-04-14
>> > > >> >> > >> > has the following in it:
>> > > >> >> > >> >
>> > > >> >> > >> > 2009-04-14 10:08:11,499 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG:
>> > > >> >> > >> > /************************************************************
>> > > >> >> > >> > STARTUP_MSG: Starting DataNode
>> > > >> >> > >> > STARTUP_MSG:   host = node19/127.0.0.1
>> > > >> >> > >> > STARTUP_MSG:   args = []
>> > > >> >> > >> > STARTUP_MSG:   version = 0.18.3
>> > > >> >> > >> > STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18 -r 736250; compiled by 'ndaley' on Thu Jan 22 23:12:08 UTC 2009
>> > > >> >> > >> > ************************************************************/
>> > > >> >> > >> > 2009-04-14 10:08:12,915 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 0 time(s).
>> > > >> >> > >> > [... the same message repeats for retries 1 through 9 ...]
>> > > >> >> > >> > 2009-04-14 10:08:22,008 INFO org.apache.hadoop.ipc.RPC: Server at node18/192.168.0.18:54310 not available yet, Zzzzz...
>> > > >> >> > >> > [... the ten-retry cycle then repeats for as long as the datanode runs ...]
>> > > >> >> > >> >
>> > > >> >> > >> >
>> > > >> >> > >> > Hmm, I still can't figure it out...
>> > > >> >> > >> >
>> > > >> >> > >> > Mithila
>> > > >> >> > >> >
>> > > >> >> > >> >
>> > > >> >> > >> > On Tue, Apr 14, 2009 at 10:22 PM, Mithila Nagendra <
>> > > >> >> mnagendr@asu.edu
>> > > >> >> > >> >wrote:
>> > > >> >> > >> >
>> > > >> >> > >> >> Also, would the way the port is accessed change if all
>> > > >> >> > >> >> these nodes are connected through a gateway? I mean in the
>> > > >> >> > >> >> hadoop-site.xml file? The Ubuntu systems we worked with
>> > > >> >> > >> >> earlier didn't have a gateway.
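>> > > >> >> > >> >> For reference, the hadoop-site.xml setting involved is
>> > > >> >> > >> >> fs.default.name; every node should point it at the master.
>> > > >> >> > >> >> Something along these lines, exact value illustrative:
>> > > >> >> > >> >>
>> > > >> >> > >> >>   <property>
>> > > >> >> > >> >>     <name>fs.default.name</name>
>> > > >> >> > >> >>     <value>hdfs://node18:54310</value>
>> > > >> >> > >> >>   </property>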
>> > > >> >> > >> >> Mithila
>> > > >> >> > >> >>
>> > > >> >> > >> >> On Tue, Apr 14, 2009 at 9:48 PM, Mithila Nagendra <
>> > > >> >> mnagendr@asu.edu
>> > > >> >> > >> >wrote:
>> > > >> >> > >> >>
>> > > >> >> > >> >>> Aaron: Which log file do I look into - there are a lot
>> > > >> >> > >> >>> of them. Here's what the error looks like:
>> > > >> >> > >> >>> [mithila@node19:~]$ cd hadoop
>> > > >> >> > >> >>> [mithila@node19:~/hadoop]$ bin/hadoop dfs -ls
>> > > >> >> > >> >>> 09/04/14 10:09:29 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 0 time(s).
>> > > >> >> > >> >>> [... the same message repeats for retries 1 through 9 ...]
>> > > >> >> > >> >>> Bad connection to FS. command aborted.
>> > > >> >> > >> >>>
>> > > >> >> > >> >>> Node19 is a slave and Node18 is the master.
>> > > >> >> > >> >>>
>> > > >> >> > >> >>> Mithila
>> > > >> >> > >> >>>
>> > > >> >> > >> >>>
>> > > >> >> > >> >>>
>> > > >> >> > >> >>> On Tue, Apr 14, 2009 at 8:53 PM, Aaron Kimball <
>> > > >> >> aaron@cloudera.com
>> > > >> >> > >> >wrote:
>> > > >> >> > >> >>>
>> > > >> >> > >> >>>> Are there any error messages in the log files on those
>> > > >> >> > >> >>>> nodes?
>> > > >> >> > >> >>>> - Aaron
>> > > >> >> > >> >>>>
>> > > >> >> > >> >>>> On Tue, Apr 14, 2009 at 9:03 AM, Mithila Nagendra <
>> > > >> >> > mnagendr@asu.edu>
>> > > >> >> > >> >>>> wrote:
>> > > >> >> > >> >>>>
>> > > >> >> > >> >>>> > I've drawn a blank here! Can't figure out what's
>> > > >> >> > >> >>>> > wrong with the ports. I can ssh between the nodes but
>> > > >> >> > >> >>>> > can't access the DFS from the slaves - it says "Bad
>> > > >> >> > >> >>>> > connection to DFS". The master seems to be fine.
>> > > >> >> > >> >>>> > Mithila
>> > > >> >> > >> >>>> >
>> > > >> >> > >> >>>> > On Tue, Apr 14, 2009 at 4:28 AM, Mithila Nagendra <
>> > > >> >> > >> mnagendr@asu.edu>
>> > > >> >> > >> >>>> > wrote:
>> > > >> >> > >> >>>> >
>> > > >> >> > >> >>>> > > Yes, I can.
>> > > >> >> > >> >>>> > >
>> > > >> >> > >> >>>> > >
>> > > >> >> > >> >>>> > > On Mon, Apr 13, 2009 at 5:12 PM, Jim Twensky <
>> > > >> >> > >> jim.twensky@gmail.com
>> > > >> >> > >> >>>> > >wrote:
>> > > >> >> > >> >>>> > >
>> > > >> >> > >> >>>> > >> Can you ssh between the nodes?
>> > > >> >> > >> >>>> > >>
>> > > >> >> > >> >>>> > >> -jim
>> > > >> >> > >> >>>> > >>
>> > > >> >> > >> >>>> > >> On Mon, Apr 13, 2009 at 6:49 PM, Mithila Nagendra
>> <
>> > > >> >> > >> >>>> mnagendr@asu.edu>
>> > > >> >> > >> >>>> > >> wrote:
>> > > >> >> > >> >>>> > >>
>> > > >> >> > >> >>>> > >> > Thanks Aaron.
>> > > >> >> > >> >>>> > >> > Jim: The three-node cluster I set up had Ubuntu
>> > > >> >> > >> >>>> > >> > running on it, and the dfs was accessed at port
>> > > >> >> > >> >>>> > >> > 54310. The new cluster which I've set up has Red
>> > > >> >> > >> >>>> > >> > Hat Linux release 7.2 (Enigma) running on it. Now
>> > > >> >> > >> >>>> > >> > when I try to access the dfs from one of the
>> > > >> >> > >> >>>> > >> > slaves I get the following response: dfs cannot
>> > > >> >> > >> >>>> > >> > be accessed. When I access the DFS through the
>> > > >> >> > >> >>>> > >> > master there's no problem. So I feel there's a
>> > > >> >> > >> >>>> > >> > problem with the port. Any ideas? I did check the
>> > > >> >> > >> >>>> > >> > list of slaves; it looks fine to me.
>> > > >> >> > >> >>>> > >> >
>> > > >> >> > >> >>>> > >> > Mithila
>> > > >> >> > >> >>>> > >> >
>> > > >> >> > >> >>>> > >> >
>> > > >> >> > >> >>>> > >> >
>> > > >> >> > >> >>>> > >> >
>> > > >> >> > >> >>>> > >> > On Mon, Apr 13, 2009 at 2:58 PM, Jim Twensky <
>> > > >> >> > >> >>>> jim.twensky@gmail.com>
>> > > >> >> > >> >>>> > >> > wrote:
>> > > >> >> > >> >>>> > >> >
>> > > >> >> > >> >>>> > >> > > Mithila,
>> > > >> >> > >> >>>> > >> > >
>> > > >> >> > >> >>>> > >> > > You said all the slaves were being utilized in
>> > > >> >> > >> >>>> > >> > > the 3-node cluster. Which application did you
>> > > >> >> > >> >>>> > >> > > run to test that, and what was your input size?
>> > > >> >> > >> >>>> > >> > > If you tried the word count application on a
>> > > >> >> > >> >>>> > >> > > 516 MB input file on both cluster setups, then
>> > > >> >> > >> >>>> > >> > > some of your nodes in the 15-node cluster may
>> > > >> >> > >> >>>> > >> > > not be running at all. Generally, one map task
>> > > >> >> > >> >>>> > >> > > is assigned to each input split, and if you are
>> > > >> >> > >> >>>> > >> > > running your cluster with the defaults, the
>> > > >> >> > >> >>>> > >> > > splits are 64 MB each. I got confused when you
>> > > >> >> > >> >>>> > >> > > said the Namenode seemed to do all the work.
>> > > >> >> > >> >>>> > >> > > Can you check conf/slaves and make sure you put
>> > > >> >> > >> >>>> > >> > > the names of all the task trackers there? I
>> > > >> >> > >> >>>> > >> > > also suggest comparing both clusters with a
>> > > >> >> > >> >>>> > >> > > larger input size, say at least 5 GB, to really
>> > > >> >> > >> >>>> > >> > > see a difference.
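>> > > >> >> > >> >>>> > >> > > (Rough arithmetic, assuming the 64 MB default:
>> > > >> >> > >> >>>> > >> > > 516 MB / 64 MB is about 9 splits, hence at most
>> > > >> >> > >> >>>> > >> > > ~9 map tasks - so at least 6 of the 15 nodes
>> > > >> >> > >> >>>> > >> > > would sit idle even when everything is healthy.)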
>> > > >> >> > >> >>>> > >> > >
>> > > >> >> > >> >>>> > >> > > Jim
>> > > >> >> > >> >>>> > >> > >
>> > > >> >> > >> >>>> > >> > > On Mon, Apr 13, 2009 at 4:17 PM, Aaron
>> Kimball <
>> > > >> >> > >> >>>> aaron@cloudera.com>
>> > > >> >> > >> >>>> > >> > wrote:
>> > > >> >> > >> >>>> > >> > >
>> > > >> >> > >> >>>> > >> > > > In hadoop-*-examples.jar, use "randomwriter"
>> > > >> >> > >> >>>> > >> > > > to generate the data and "sort" to sort it.
>> > > >> >> > >> >>>> > >> > > > - Aaron
>> > > >> >> > >> >>>> > >> > > >
>> > > >> >> > >> >>>> > >> > > >
>> > > >> >> > >> >>>> > >> > > > On Sun, Apr 12, 2009 at 9:33 PM, Pankil
>> Doshi
>> > <
>> > > >> >> > >> >>>> > forpankil@gmail.com>
>> > > >> >> > >> >>>> > >> > > wrote:
>> > > >> >> > >> >>>> > >> > > >
>> > > >> >> > >> >>>> > >> > > > > Your data is too small, I guess, for 15
>> > > >> >> > >> >>>> > >> > > > > nodes. So it might be the overhead time of
>> > > >> >> > >> >>>> > >> > > > > these nodes making your total MR jobs more
>> > > >> >> > >> >>>> > >> > > > > time consuming.
>> > > >> >> > >> >>>> > >> > > > > I guess you will have to try with a larger
>> > > >> >> > >> >>>> > >> > > > > set of data.
>> > > >> >> > >> >>>> > >> > > > >
>> > > >> >> > >> >>>> > >> > > > > Pankil
>> > > >> >> > >> >>>> > >> > > > > On Sun, Apr 12, 2009 at 6:54 PM, Mithila
>> > > >> Nagendra <
>> > > >> >> > >> >>>> > >> mnagendr@asu.edu>
>> > > >> >> > >> >>>> > >> > > > > wrote:
>> > > >> >> > >> >>>> > >> > > > >
>> > > >> >> > >> >>>> > >> > > > > > Aaron
>> > > >> >> > >> >>>> > >> > > > > >
>> > > >> >> > >> >>>> > >> > > > > > That could be the issue; my data is just
>> > > >> >> > >> >>>> > >> > > > > > 516MB - wouldn't this see a bit of speed
>> > > >> >> > >> >>>> > >> > > > > > up? Could you guide me to the example?
>> > > >> >> > >> >>>> > >> > > > > > I'll run my cluster on it and see what I
>> > > >> >> > >> >>>> > >> > > > > > get. Also, for my program I had a Java
>> > > >> >> > >> >>>> > >> > > > > > timer running to record the time taken to
>> > > >> >> > >> >>>> > >> > > > > > complete execution. Does Hadoop have a
>> > > >> >> > >> >>>> > >> > > > > > built-in timer?
>> > > >> >> > >> >>>> > >> > > > > >
>> > > >> >> > >> >>>> > >> > > > > > Mithila
>> > > >> >> > >> >>>> > >> > > > > >
>> > > >> >> > >> >>>> > >> > > > > > On Mon, Apr 13, 2009 at 1:13 AM, Aaron
>> > > Kimball
>> > > >> <
>> > > >> >> > >> >>>> > >> aaron@cloudera.com
>> > > >> >> > >> >>>> > >> > >
>> > > >> >> > >> >>>> > >> > > > > wrote:
>> > > >> >> > >> >>>> > >> > > > > >
>> > > >> >> > >> >>>> > >> > > > > > > Virtually none of the examples that
>> > > >> >> > >> >>>> > >> > > > > > > ship with Hadoop are designed to
>> > > >> >> > >> >>>> > >> > > > > > > showcase its speed. Hadoop's speedup
>> > > >> >> > >> >>>> > >> > > > > > > comes from its ability to process very
>> > > >> >> > >> >>>> > >> > > > > > > large volumes of data (starting around,
>> > > >> >> > >> >>>> > >> > > > > > > say, tens of GB per job, and going up
>> > > >> >> > >> >>>> > >> > > > > > > in orders of magnitude from there). So
>> > > >> >> > >> >>>> > >> > > > > > > if you are timing the pi calculator (or
>> > > >> >> > >> >>>> > >> > > > > > > something like that), its results won't
>> > > >> >> > >> >>>> > >> > > > > > > necessarily be very consistent. If a
>> > > >> >> > >> >>>> > >> > > > > > > job doesn't have enough fragments of
>> > > >> >> > >> >>>> > >> > > > > > > data to allocate one per node, some of
>> > > >> >> > >> >>>> > >> > > > > > > the nodes will also just go unused.
>> > > >> >> > >> >>>> > >> > > > > > >
>> > > >> >> > >> >>>> > >> > > > > > > The best example for you to run is to
>> > > >> >> > >> >>>> > >> > > > > > > use randomwriter to fill up your
>> > > >> >> > >> >>>> > >> > > > > > > cluster with several GB of random data
>> > > >> >> > >> >>>> > >> > > > > > > and then run the sort program. If that
>> > > >> >> > >> >>>> > >> > > > > > > doesn't scale up performance from 3
>> > > >> >> > >> >>>> > >> > > > > > > nodes to 15, then you've definitely got
>> > > >> >> > >> >>>> > >> > > > > > > something strange going on.
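>> > > >> >> > >> >>>> > >> > > > > > > Concretely, something like this (jar
>> > > >> >> > >> >>>> > >> > > > > > > name matching your 0.18.3 build; the
>> > > >> >> > >> >>>> > >> > > > > > > output paths are just examples):
>> > > >> >> > >> >>>> > >> > > > > > >
>> > > >> >> > >> >>>> > >> > > > > > >   bin/hadoop jar hadoop-0.18.3-examples.jar randomwriter rand
>> > > >> >> > >> >>>> > >> > > > > > >   bin/hadoop jar hadoop-0.18.3-examples.jar sort rand rand-sorted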
>> > > >> >> > >> >>>> > >> > > > > > >
>> > > >> >> > >> >>>> > >> > > > > > > - Aaron
>> > > >> >> > >> >>>> > >> > > > > > >
>> > > >> >> > >> >>>> > >> > > > > > >
>> > > >> >> > >> >>>> > >> > > > > > > On Sun, Apr 12, 2009 at 8:39 AM,
>> Mithila
>> > > >> >> Nagendra
>> > > >> >> > <
>> > > >> >> > >> >>>> > >> > > mnagendr@asu.edu>
>> > > >> >> > >> >>>> > >> > > > > > > wrote:
>> > > >> >> > >> >>>> > >> > > > > > >
>> > > >> >> > >> >>>> > >> > > > > > > > Hey all
>> > > >> >> > >> >>>> > >> > > > > > > > I recently setup a three node
>> hadoop
>> > > >> cluster
>> > > >> >> and
>> > > >> >> > >> ran
>> > > >> >> > >> >>>> an
>> > > >> >> > >> >>>> > >> > examples
>> > > >> >> > >> >>>> > >> > > on
>> > > >> >> > >> >>>> > >> > > > > it.
>> > > >> >> > >> >>>> > >> > > > > > > It
>> > > >> >> > >> >>>> > >> > > > > > > > was pretty fast, and all the three
>> > nodes
>> > > >> were
>> > > >> >> > >> being
>> > > >> >> > >> >>>> used
>> > > >> >> > >> >>>> > (I
>> > > >> >> > >> >>>> > >> > > checked
>> > > >> >> > >> >>>> > >> > > > > the
>> > > >> >> > >> >>>> > >> > > > > > > log
>> > > >> >> > >> >>>> > >> > > > > > > > files to make sure that the slaves
>> are
>> > > >> >> > utilized).
>> > > >> >> > >> >>>> > >> > > > > > > >
>> > > >> >> > >> >>>> > >> > > > > > > > Now I ve setup another cluster
>> > > consisting
>> > > >> of
>> > > >> >> 15
>> > > >> >> > >> >>>> nodes. I
>> > > >> >> > >> >>>> > ran
>> > > >> >> > >> >>>> > >> > the
>> > > >> >> > >> >>>> > >> > > > same
>> > > >> >> > >> >>>> > >> > > > > > > > example, but instead of speeding
>> up,
>> > the
>> > > >> >> > >> map-reduce
>> > > >> >> > >> >>>> task
>> > > >> >> > >> >>>> > >> seems
>> > > >> >> > >> >>>> > >> > to
>> > > >> >> > >> >>>> > >> > > > > take
>> > > >> >> > >> >>>> > >> > > > > > > > forever! The slaves are not being
>> used
>> > > for
>> > > >> >> some
>> > > >> >> > >> >>>> reason.
>> > > >> >> > >> >>>> > This
>> > > >> >> > >> >>>> > >> > > second
>> > > >> >> > >> >>>> > >> > > > > > > cluster
>> > > >> >> > >> >>>> > >> > > > > > > > has a lower, per node processing
>> > power,
>> > > >> but
>> > > >> >> > should
>> > > >> >> > >> >>>> that
>> > > >> >> > >> >>>> > make
>> > > >> >> > >> >>>> > >> > any
>> > > >> >> > >> >>>> > >> > > > > > > > difference?
>> > > >> >> > >> >>>> > >> > > > > > > > How can I ensure that the data is
>> > being
>> > > >> >> mapped
>> > > >> >> > to
>> > > >> >> > >> all
>> > > >> >> > >> >>>> the
>> > > >> >> > >> >>>> > >> > nodes?
>> > > >> >> > >> >>>> > >> > > > > > > Presently,
>> > > >> >> > >> >>>> > >> > > > > > > > the only node that seems to be
>> doing
>> > all
>> > > >> the
>> > > >> >> > work
>> > > >> >> > >> is
>> > > >> >> > >> >>>> the
>> > > >> >> > >> >>>> > >> Master
>> > > >> >> > >> >>>> > >> > > > node.
>> > > >> >> > >> >>>> > >> > > > > > > >
>> > > >> >> > >> >>>> > >> > > > > > > > Does 15 nodes in a cluster increase
>> > the
>> > > >> >> network
>> > > >> >> > >> cost?
>> > > >> >> > >> >>>> What
>> > > >> >> > >> >>>> > >> can
>> > > >> >> > >> >>>> > >> > I
>> > > >> >> > >> >>>> > >> > > do
>> > > >> >> > >> >>>> > >> > > > > to
>> > > >> >> > >> >>>> > >> > > > > > > > setup
>> > > >> >> > >> >>>> > >> > > > > > > > the cluster to function more
>> > > efficiently?
>> > > >> >> > >> >>>> > >> > > > > > > >
>> > > >> >> > >> >>>> > >> > > > > > > > Thanks!
>> > > >> >> > >> >>>> > >> > > > > > > > Mithila Nagendra
>> > > >> >> > >> >>>> > >> > > > > > > > Arizona State University
>> > > >> >> > >> >>>> > >> > > > > > > >
>> > > >> >> > >> >>>> > >> > > > > > >
>> > > >> >> > >> >>>> > >> > > > > >
>> > > >> >> > >> >>>> > >> > > > >
>> > > >> >> > >> >>>> > >> > > >
>> > > >> >> > >> >>>> > >> > >
>> > > >> >> > >> >>>> > >> >
>> > > >> >> > >> >>>> > >>
>> > > >> >> > >> >>>> > >
>> > > >> >> > >> >>>> > >
>> > > >> >> > >> >>>> >
>> > > >> >> > >> >>>>
>> > > >> >> > >> >>>
>> > > >> >> > >> >>>
>> > > >> >> > >> >>
>> > > >> >> > >> >
>> > > >> >> > >>
>> > > >> >> > >>
>> > > >> >> > >> Ravi
>> > > >> >> > >> --
>> > > >> >> > >>
>> > > >> >> > >>
>> > > >> >> > >
>> > > >> >> >
>> > > >> >>
>> > > >> >>
>> > > >> >>
>> > > >> >> --
>> > > >> >> Alpha Chapters of my book on Hadoop are available
>> > > >> >> http://www.apress.com/book/view/9781430219422
>> > > >> >>
>> > > >> >
>> > > >> >
>> > > >>
>> > > >
>> > > >
>> > > >
>> > > > --
>> > > > Alpha Chapters of my book on Hadoop are available
>> > > > http://www.apress.com/book/view/9781430219422
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > Alpha Chapters of my book on Hadoop are available
>> > > http://www.apress.com/book/view/9781430219422
>> > >
>> >
>>
>>
>>
>> --
>> Alpha Chapters of my book on Hadoop are available
>> http://www.apress.com/book/view/9781430219422
>>
>
>

> > > >> >> > >> >>>> that
> > > >> >> > >> >>>> > make
> > > >> >> > >> >>>> > >> > any
> > > >> >> > >> >>>> > >> > > > > > > > difference?
> > > >> >> > >> >>>> > >> > > > > > > > How can I ensure that the data is
> > being
> > > >> >> mapped
> > > >> >> > to
> > > >> >> > >> all
> > > >> >> > >> >>>> the
> > > >> >> > >> >>>> > >> > nodes?
> > > >> >> > >> >>>> > >> > > > > > > Presently,
> > > >> >> > >> >>>> > >> > > > > > > > the only node that seems to be doing
> > all
> > > >> the
> > > >> >> > work
> > > >> >> > >> is
> > > >> >> > >> >>>> the
> > > >> >> > >> >>>> > >> Master
> > > >> >> > >> >>>> > >> > > > node.
> > > >> >> > >> >>>> > >> > > > > > > >
> > > >> >> > >> >>>> > >> > > > > > > > Does 15 nodes in a cluster increase
> > the
> > > >> >> network
> > > >> >> > >> cost?
> > > >> >> > >> >>>> What
> > > >> >> > >> >>>> > >> can
> > > >> >> > >> >>>> > >> > I
> > > >> >> > >> >>>> > >> > > do
> > > >> >> > >> >>>> > >> > > > > to
> > > >> >> > >> >>>> > >> > > > > > > > setup
> > > >> >> > >> >>>> > >> > > > > > > > the cluster to function more
> > > efficiently?
> > > >> >> > >> >>>> > >> > > > > > > >
> > > >> >> > >> >>>> > >> > > > > > > > Thanks!
> > > >> >> > >> >>>> > >> > > > > > > > Mithila Nagendra
> > > >> >> > >> >>>> > >> > > > > > > > Arizona State University
> > > >> >> > >> >>>> > >> > > > > > > >
> > > >> >> > >> >>>> > >> > > > > > >
> > > >> >> > >> >>>> > >> > > > > >
> > > >> >> > >> >>>> > >> > > > >
> > > >> >> > >> >>>> > >> > > >
> > > >> >> > >> >>>> > >> > >
> > > >> >> > >> >>>> > >> >
> > > >> >> > >> >>>> > >>
> > > >> >> > >> >>>> > >
> > > >> >> > >> >>>> > >
> > > >> >> > >> >>>> >
> > > >> >> > >> >>>>
> > > >> >> > >> >>>
> > > >> >> > >> >>>
> > > >> >> > >> >>
> > > >> >> > >> >
> > > >> >> > >>
> > > >> >> > >>
> > > >> >> > >> Ravi
> > > >> >> > >> --
> > > >> >> > >>
> > > >> >> > >>
> > > >> >> > >
> > > >> >> >
> > > >> >>
> > > >> >>
> > > >> >>
> > > >> >> --
> > > >> >> Alpha Chapters of my book on Hadoop are available
> > > >> >> http://www.apress.com/book/view/9781430219422
> > > >> >>
> > > >> >
> > > >> >
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > Alpha Chapters of my book on Hadoop are available
> > > > http://www.apress.com/book/view/9781430219422
> > > >
> > >
> > >
> > >
> > > --
> > > Alpha Chapters of my book on Hadoop are available
> > > http://www.apress.com/book/view/9781430219422
> > >
> >
>
>
>
> --
> Alpha Chapters of my book on Hadoop are available
> http://www.apress.com/book/view/9781430219422
>

Re: Map-Reduce Slow Down

Posted by jason hadoop <ja...@gmail.com>.
Assuming you are on a Linux box, on both machines
verify that the servers are listening on the ports you expect via
netstat -a -n -t -p
-a  show all sockets, including those listening for connections
-n  do not translate IP addresses to host names
-t  only list TCP sockets
-p  list the PID/process name

on the machine 192.168.0.18
you should see a socket bound to 0.0.0.0:54310 owned by a java process, and
the PID should be the PID of your NameNode process.
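
For example, on the master you could confirm this with something like the
following (a sketch; 54310 is the NameNode port used in this thread, and you
may need to run it as root, or as the user owning the process, for -p to
show the PID):

  netstat -a -n -t -p | grep 54310
  # expected to show, roughly:
  # tcp   0   0 0.0.0.0:54310   0.0.0.0:*   LISTEN   <namenode-pid>/java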

On the remote machine you should be able to telnet 192.168.0.18 54310 and
have it connect:
Connected to 192.168.0.18.
Escape character is '^]'.

If the netstat shows the socket accepting connections and the telnet does not
connect, then something is blocking the TCP packets between the machines: one
or both machines has a firewall, an intervening router has a firewall, or
there is some routing problem.
The command /sbin/iptables -L will normally list the firewall rules, if any,
for a Linux machine.


You should be able to use telnet to verify that you can connect from the
remote machine.
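
Putting the pieces together, a quick check from one of the slaves might look
like this (a sketch; the host, port, and iptables command are the ones
mentioned in this thread):

  # can we reach the NameNode's RPC port at all?
  telnet 192.168.0.18 54310
  # if it hangs or the connection is refused, inspect the firewall rules
  # on both machines (-n skips DNS lookups)
  /sbin/iptables -L -n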

On Thu, Apr 16, 2009 at 9:18 PM, Mithila Nagendra <mn...@asu.edu> wrote:

> Thanks! I'll see what I can find out.



-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422

Re: Map-Reduce Slow Down

Posted by Mithila Nagendra <mn...@asu.edu>.
Thanks! I'll see what I can find out.

On Fri, Apr 17, 2009 at 4:55 AM, jason hadoop <ja...@gmail.com> wrote:

> The firewall was run at system startup; I think there was an
> /etc/sysconfig/iptables file present which triggered it.
> I don't currently have access to any CentOS 5 machines, so I can't easily
> check.

Re: Map-Reduce Slow Down

Posted by jason hadoop <ja...@gmail.com>.
The firewall was enabled at system startup; I think there was an
/etc/sysconfig/iptables file present, which triggered it. I don't currently
have access to any CentOS 5 machines, so I can't easily check.
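
If it helps anyone else debugging this, here is a rough sketch of what to
check on a RHEL/CentOS-style box (the exact service and chkconfig usage below
is my assumption about a stock install, not something verified on Mithila's
cluster):

# list the active filter rules; an empty ruleset means nothing is blocked
/sbin/iptables -L -n

# see whether the firewall service is set to start at boot, and its state now
/sbin/chkconfig --list iptables
/sbin/service iptables status

# if it is the culprit, open the namenode RPC port (54310 in this setup) ...
/sbin/iptables -I INPUT -p tcp --dport 54310 -j ACCEPT

# ... or stop the firewall and keep it from coming back after a reboot
/sbin/service iptables stop
/sbin/chkconfig iptables off

Note that rules added with iptables -I are lost on reboot unless you run
/sbin/service iptables save afterwards.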



On Thu, Apr 16, 2009 at 6:54 PM, jason hadoop <ja...@gmail.com>wrote:

> The kickstart script was something that the operations staff was using to
> initialize new machines. I never actually saw the script; I just figured out
> that there was a firewall in place.
>
>
>
> On Thu, Apr 16, 2009 at 1:28 PM, Mithila Nagendra <mn...@asu.edu>wrote:
>
>> Jason: the kickstart script - was it something you wrote, or is it run
>> when the system turns on?
>> Mithila

-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422

Re: Map-Reduce Slow Down

Posted by jason hadoop <ja...@gmail.com>.
The kickstart script was something that the operations staff was using to
initialize new machines. I never actually saw the script; I just figured out
that there was a firewall in place.
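
For reference, the firewall state on a kickstarted machine is typically set
by a one-line directive in the ks.cfg file. The fragment below is purely
illustrative, since none of us saw the actual script, but it shows the kind
of line that would have this effect:

# hypothetical ks.cfg fragment
firewall --enabled --port=22:tcp
# a Hadoop-friendly install would instead use:
# firewall --disabled

If the kickstart configuration can't be changed, opening the blocked ports
(or stopping the firewall service) after installation works just as well.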


On Thu, Apr 16, 2009 at 1:28 PM, Mithila Nagendra <mn...@asu.edu> wrote:

> Jason: the kickstart script - was it something you wrote, or is it run when
> the system turns on?
> Mithila

-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422

Re: Map-Reduce Slow Down

Posted by Mithila Nagendra <mn...@asu.edu>.
Jason: the kickstart script - was it something you wrote, or is it run when
the system turns on?
Mithila

On Thu, Apr 16, 2009 at 1:06 AM, Mithila Nagendra <mn...@asu.edu> wrote:

> Thanks Jason! Will check that out.
> Mithila
>
>
> On Thu, Apr 16, 2009 at 5:23 AM, jason hadoop <ja...@gmail.com>wrote:
>
>> Double check that there is no firewall in place.
>> At one point a bunch of new machines were kickstarted and placed in a
>> cluster and they all failed with something similar.
>> It turned out the kickstart script turned enabled the firewall with a rule
>> that blocked ports in the 50k range.
>> It took us a while to even think to check that was not a part of our
>> normal
>> machine configuration
>>
>> On Wed, Apr 15, 2009 at 11:04 AM, Mithila Nagendra <mn...@asu.edu>
>> wrote:
>>
>> > Hi Aaron
>> > I will look into that thanks!
>> >
>> > I spoke to the admin who overlooks the cluster. He said that the gateway
>> > comes in to the picture only when one of the nodes communicates with a
>> node
>> > outside of the cluster. But in my case the communication is carried out
>> > between the nodes which all belong to the same cluster.
>> >
>> > Mithila
>> >
>> > On Wed, Apr 15, 2009 at 8:59 PM, Aaron Kimball <aa...@cloudera.com>
>> wrote:
>> >
>> > > Hi,
>> > >
>> > > I wrote a blog post a while back about connecting nodes via a gateway.
>> > See
>> > >
>> >
>> http://www.cloudera.com/blog/2008/12/03/securing-a-hadoop-cluster-through-a-gateway/
>> > >
>> > > This assumes that the client is outside the gateway and all
>> > > datanodes/namenode are inside, but the same principles apply. You'll
>> just
>> > > need to set up ssh tunnels from every datanode to the namenode.
>> > >
>> > > - Aaron
>> > >
>> > >
>> > > On Wed, Apr 15, 2009 at 10:19 AM, Ravi Phulari <
>> rphulari@yahoo-inc.com
>> > >wrote:
>> > >
>> > >> Looks like your NameNode is down .
>> > >> Verify if hadoop process are running (   jps should show you all java
>> > >> running process).
>> > >> If your hadoop process are running try restarting your hadoop process
>> .
>> > >> I guess this problem is due to your fsimage not being correct .
>> > >> You might have to format your namenode.
>> > >> Hope this helps.
>> > >>
>> > >> Thanks,
>> > >> --
>> > >> Ravi
>> > >>
>> > >>
>> > >> On 4/15/09 10:15 AM, "Mithila Nagendra" <mn...@asu.edu> wrote:
>> > >>
>> > >> The log file runs into thousands of line with the same message being
>> > >> displayed every time.
>> > >>
>> > >> On Wed, Apr 15, 2009 at 8:10 PM, Mithila Nagendra <mn...@asu.edu>
>> > >> wrote:
>> > >>
>> > >> > The log file : hadoop-mithila-datanode-node19.log.2009-04-14 has
>> the
>> > >> > following in it:
>> > >> >
>> > >> > 2009-04-14 10:08:11,499 INFO org.apache.hadoop.dfs.DataNode:
>> > >> STARTUP_MSG:
>> > >> > /************************************************************
>> > >> > STARTUP_MSG: Starting DataNode
>> > >> > STARTUP_MSG:   host = node19/127.0.0.1
>> > >> > STARTUP_MSG:   args = []
>> > >> > STARTUP_MSG:   version = 0.18.3
>> > >> > STARTUP_MSG:   build =
>> > >> > https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18-r
>> > >> > 736250; compiled by 'ndaley' on Thu Jan 22 23:12:08 UTC 2009
>> > >> > ************************************************************/
>> > >> > 2009-04-14 10:08:12,915 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >> connect
>> > >> > to server: node18/192.168.0.18:54310. Already tried 0 time(s).
>> > >> > 2009-04-14 10:08:13,925 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >> connect
>> > >> > to server: node18/192.168.0.18:54310. Already tried 1 time(s).
>> > >> > 2009-04-14 10:08:14,935 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >> connect
>> > >> > to server: node18/192.168.0.18:54310. Already tried 2 time(s).
>> > >> > 2009-04-14 10:08:15,945 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >> connect
>> > >> > to server: node18/192.168.0.18:54310. Already tried 3 time(s).
>> > >> > 2009-04-14 10:08:16,955 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >> connect
>> > >> > to server: node18/192.168.0.18:54310. Already tried 4 time(s).
>> > >> > 2009-04-14 10:08:17,965 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >> connect
>> > >> > to server: node18/192.168.0.18:54310. Already tried 5 time(s).
>> > >> > 2009-04-14 10:08:18,975 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >> connect
>> > >> > to server: node18/192.168.0.18:54310. Already tried 6 time(s).
>> > >> > 2009-04-14 10:08:19,985 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >> connect
>> > >> > to server: node18/192.168.0.18:54310. Already tried 7 time(s).
>> > >> > 2009-04-14 10:08:20,995 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >> connect
>> > >> > to server: node18/192.168.0.18:54310. Already tried 8 time(s).
>> > >> > 2009-04-14 10:08:22,005 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >> connect
>> > >> > to server: node18/192.168.0.18:54310. Already tried 9 time(s).
>> > >> > 2009-04-14 10:08:22,008 INFO org.apache.hadoop.ipc.RPC: Server at
>> > >> node18/
>> > >> > 192.168.0.18:54310 not available yet, Zzzzz...
>> > >> > 2009-04-14 10:08:24,025 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >> connect
>> > >> > to server: node18/192.168.0.18:54310. Already tried 0 time(s).
>> > >> > 2009-04-14 10:08:25,035 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >> connect
>> > >> > to server: node18/192.168.0.18:54310. Already tried 1 time(s).
>> > >> > 2009-04-14 10:08:26,045 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >> connect
>> > >> > to server: node18/192.168.0.18:54310. Already tried 2 time(s).
>> > >> > 2009-04-14 10:08:27,055 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >> connect
>> > >> > to server: node18/192.168.0.18:54310. Already tried 3 time(s).
>> > >> > 2009-04-14 10:08:28,065 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >> connect
>> > >> > to server: node18/192.168.0.18:54310. Already tried 4 time(s).
>> > >> > 2009-04-14 10:08:29,075 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >> connect
>> > >> > to server: node18/192.168.0.18:54310. Already tried 5 time(s).
>> > >> > 2009-04-14 10:08:30,085 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >> connect
>> > >> > to server: node18/192.168.0.18:54310. Already tried 6 time(s).
>> > >> > 2009-04-14 10:08:31,095 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >> connect
>> > >> > to server: node18/192.168.0.18:54310. Already tried 7 time(s).
>> > >> > 2009-04-14 10:08:32,105 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >> connect
>> > >> > to server: node18/192.168.0.18:54310. Already tried 8 time(s).
>> > >> > 2009-04-14 10:08:33,115 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >> connect
>> > >> > to server: node18/192.168.0.18:54310. Already tried 9 time(s).
>> > >> > 2009-04-14 10:08:33,116 INFO org.apache.hadoop.ipc.RPC: Server at
>> > >> node18/
>> > >> > 192.168.0.18:54310 not available yet, Zzzzz...
>> > >> > 2009-04-14 10:08:35,135 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >> connect
>> > >> > to server: node18/192.168.0.18:54310. Already tried 0 time(s).
>> > >> > 2009-04-14 10:08:36,145 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >> connect
>> > >> > to server: node18/192.168.0.18:54310. Already tried 1 time(s).
>> > >> > 2009-04-14 10:08:37,155 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >> connect
>> > >> > to server: node18/192.168.0.18:54310. Already tried 2 time(s).
>> > >> >
>> > >> >
>> > >> > Hmmm, I still can't figure it out...
>> > >> >
>> > >> > Mithila
>> > >> >
>> > >> >
>> > >> > On Tue, Apr 14, 2009 at 10:22 PM, Mithila Nagendra <
>> mnagendr@asu.edu
>> > >> >wrote:
>> > >> >
>> > >> >> Also, would the way the port is accessed change if all these nodes
>> > >> >> are connected through a gateway? I mean in the hadoop-site.xml file?
>> > >> >> The Ubuntu systems we worked with earlier didn't have a gateway.
>> > >> >> Mithila
>> > >> >>
>> > >> >> On Tue, Apr 14, 2009 at 9:48 PM, Mithila Nagendra <
>> mnagendr@asu.edu
>> > >> >wrote:
>> > >> >>
>> > >> >>> Aaron: Which log file do I look into - there are a lot of them.
>> > >> >>> Here's what the error looks like:
>> > >> >>> [mithila@node19:~]$ cd hadoop
>> > >> >>> [mithila@node19:~/hadoop]$ bin/hadoop dfs -ls
>> > >> >>> 09/04/14 10:09:29 INFO ipc.Client: Retrying connect to server:
>> > node18/
>> > >> >>> 192.168.0.18:54310. Already tried 0 time(s).
>> > >> >>> 09/04/14 10:09:30 INFO ipc.Client: Retrying connect to server:
>> > node18/
>> > >> >>> 192.168.0.18:54310. Already tried 1 time(s).
>> > >> >>> 09/04/14 10:09:31 INFO ipc.Client: Retrying connect to server:
>> > node18/
>> > >> >>> 192.168.0.18:54310. Already tried 2 time(s).
>> > >> >>> 09/04/14 10:09:32 INFO ipc.Client: Retrying connect to server:
>> > node18/
>> > >> >>> 192.168.0.18:54310. Already tried 3 time(s).
>> > >> >>> 09/04/14 10:09:33 INFO ipc.Client: Retrying connect to server:
>> > node18/
>> > >> >>> 192.168.0.18:54310. Already tried 4 time(s).
>> > >> >>> 09/04/14 10:09:34 INFO ipc.Client: Retrying connect to server:
>> > node18/
>> > >> >>> 192.168.0.18:54310. Already tried 5 time(s).
>> > >> >>> 09/04/14 10:09:35 INFO ipc.Client: Retrying connect to server:
>> > node18/
>> > >> >>> 192.168.0.18:54310. Already tried 6 time(s).
>> > >> >>> 09/04/14 10:09:36 INFO ipc.Client: Retrying connect to server:
>> > node18/
>> > >> >>> 192.168.0.18:54310. Already tried 7 time(s).
>> > >> >>> 09/04/14 10:09:37 INFO ipc.Client: Retrying connect to server:
>> > node18/
>> > >> >>> 192.168.0.18:54310. Already tried 8 time(s).
>> > >> >>> 09/04/14 10:09:38 INFO ipc.Client: Retrying connect to server:
>> > node18/
>> > >> >>> 192.168.0.18:54310. Already tried 9 time(s).
>> > >> >>> Bad connection to FS. command aborted.
>> > >> >>>
>> > >> >>> Node19 is a slave and Node18 is the master.
>> > >> >>>
>> > >> >>> Mithila
>> > >> >>>
>> > >> >>>
>> > >> >>>
>> > >> >>> On Tue, Apr 14, 2009 at 8:53 PM, Aaron Kimball <
>> aaron@cloudera.com
>> > >> >wrote:
>> > >> >>>
>> > >> >>>> Are there any error messages in the log files on those nodes?
>> > >> >>>> - Aaron
>> > >> >>>>
>> > >> >>>> On Tue, Apr 14, 2009 at 9:03 AM, Mithila Nagendra <
>> > mnagendr@asu.edu>
>> > >> >>>> wrote:
>> > >> >>>>
>> > >> >>>> > I've drawn a blank here! Can't figure out what's wrong with the
>> > >> >>>> > ports. I can ssh between the nodes but can't access the DFS from
>> > >> >>>> > the slaves - it says "Bad connection to DFS". The master seems to
>> > >> >>>> > be fine.
>> > >> >>>> > Mithila
>> > >> >>>> >
>> > >> >>>> > On Tue, Apr 14, 2009 at 4:28 AM, Mithila Nagendra <
>> > >> mnagendr@asu.edu>
>> > >> >>>> > wrote:
>> > >> >>>> >
>> > >> >>>> > > Yes, I can.
>> > >> >>>> > >
>> > >> >>>> > >
>> > >> >>>> > > On Mon, Apr 13, 2009 at 5:12 PM, Jim Twensky <
>> > >> jim.twensky@gmail.com
>> > >> >>>> > >wrote:
>> > >> >>>> > >
>> > >> >>>> > >> Can you ssh between the nodes?
>> > >> >>>> > >>
>> > >> >>>> > >> -jim
>> > >> >>>> > >>
>> > >> >>>> > >> On Mon, Apr 13, 2009 at 6:49 PM, Mithila Nagendra <
>> > >> >>>> mnagendr@asu.edu>
>> > >> >>>> > >> wrote:
>> > >> >>>> > >>
>> > >> >>>> > >> > Thanks Aaron.
>> > >> >>>> > >> > Jim: The three nodes I set up had Ubuntu running on them,
>> > >> >>>> > >> > and the dfs was accessed at port 54310. The new cluster
>> > >> >>>> > >> > which I've set up has Red Hat Linux release 7.2 (Enigma)
>> > >> >>>> > >> > running on it. Now when I try to access the dfs from one
>> > >> >>>> > >> > of the slaves I get the following response: dfs cannot be
>> > >> >>>> > >> > accessed. When I access the DFS through the master there's
>> > >> >>>> > >> > no problem. So I feel there's a problem with the port. Any
>> > >> >>>> > >> > ideas? I did check the list of slaves, it looks fine to me.
>> > >> >>>> > >> >
>> > >> >>>> > >> > Mithila
>> > >> >>>> > >> >
>> > >> >>>> > >> >
>> > >> >>>> > >> >
>> > >> >>>> > >> >
>> > >> >>>> > >> > On Mon, Apr 13, 2009 at 2:58 PM, Jim Twensky <
>> > >> >>>> jim.twensky@gmail.com>
>> > >> >>>> > >> > wrote:
>> > >> >>>> > >> >
>> > >> >>>> > >> > > Mithila,
>> > >> >>>> > >> > >
>> > >> >>>> > >> > > You said all the slaves were being utilized in the 3
>> node
>> > >> >>>> cluster.
>> > >> >>>> > >> Which
>> > >> >>>> > >> > > application did you run to test that and what was your
>> > input
>> > >> >>>> size?
>> > >> >>>> > If
>> > >> >>>> > >> you
>> > >> >>>> > >> > > tried the word count application on a 516 MB input file
>> on
>> > >> both
>> > >> >>>> > >> cluster
>> > >> >>>> > >> > > setups, then some of your nodes in the 15 node cluster
>> may
>> > >> not
>> > >> >>>> be
>> > >> >>>> > >> running
>> > >> >>>> > >> > > at
>> > >> >>>> > >> > > all. Generally, one map job is assigned to each input
>> > split
>> > >> and
>> > >> >>>> if
>> > >> >>>> > you
>> > >> >>>> > >> > are
>> > >> >>>> > >> > > running your cluster with the defaults, the splits are
>> 64
>> > MB
>> > >> >>>> each. I
>> > >> >>>> > >> got
>> > >> >>>> > >> > > confused when you said the Namenode seemed to do all
>> the
>> > >> work.
>> > >> >>>> Can
>> > >> >>>> > you
>> > >> >>>> > >> > > check
>> > >> >>>> > >> > > conf/slaves and make sure you put the names of all task
>> > >> >>>> trackers
>> > >> >>>> > >> there? I
>> > >> >>>> > >> > > also suggest comparing both clusters with a larger
>> input
>> > >> size,
>> > >> >>>> say
>> > >> >>>> > at
>> > >> >>>> > >> > least
>> > >> >>>> > >> > > 5 GB, to really see a difference.
>> > >> >>>> > >> > >
>> > >> >>>> > >> > > Jim
>> > >> >>>> > >> > >
>> > >> >>>> > >> > > On Mon, Apr 13, 2009 at 4:17 PM, Aaron Kimball <
>> > >> >>>> aaron@cloudera.com>
>> > >> >>>> > >> > wrote:
>> > >> >>>> > >> > >
>> > >> >>>> > >> > > > in hadoop-*-examples.jar, use "randomwriter" to
>> generate
>> > >> the
>> > >> >>>> data
>> > >> >>>> > >> and
>> > >> >>>> > >> > > > "sort"
>> > >> >>>> > >> > > > to sort it.
>> > >> >>>> > >> > > > - Aaron
>> > >> >>>> > >> > > >
>> > >> >>>> > >> > > > On Sun, Apr 12, 2009 at 9:33 PM, Pankil Doshi <
>> > >> >>>> > forpankil@gmail.com>
>> > >> >>>> > >> > > wrote:
>> > >> >>>> > >> > > >
>> > >> >>>> > >> > > > > Your data is too small, I guess, for 15 nodes. So it
>> > >> >>>> > >> > > > > might be the overhead of these nodes making your
>> > >> >>>> > >> > > > > total MR jobs more time-consuming. I guess you will
>> > >> >>>> > >> > > > > have to try with a larger set of data.
>> > >> >>>> > >> > > > >
>> > >> >>>> > >> > > > > Pankil
>> > >> >>>> > >> > > > > On Sun, Apr 12, 2009 at 6:54 PM, Mithila Nagendra <
>> > >> >>>> > >> mnagendr@asu.edu>
>> > >> >>>> > >> > > > > wrote:
>> > >> >>>> > >> > > > >
>> > >> >>>> > >> > > > > > Aaron
>> > >> >>>> > >> > > > > >
>> > >> >>>> > >> > > > > > That could be the issue, my data is just 516MB -
>> > >> wouldn't
>> > >> >>>> this
>> > >> >>>> > >> see
>> > >> >>>> > >> > a
>> > >> >>>> > >> > > > bit
>> > >> >>>> > >> > > > > of
>> > >> >>>> > >> > > > > > speed up?
>> > >> >>>> > >> > > > > > Could you guide me to the example? I ll run my
>> > cluster
>> > >> on
>> > >> >>>> it
>> > >> >>>> > and
>> > >> >>>> > >> > see
>> > >> >>>> > >> > > > what
>> > >> >>>> > >> > > > > I
>> > >> >>>> > >> > > > > > get. Also for my program I had a java timer
>> running
>> > to
>> > >> >>>> record
>> > >> >>>> > >> the
>> > >> >>>> > >> > > time
>> > >> >>>> > >> > > > > > taken
>> > >> >>>> > >> > > > > > to complete execution. Does Hadoop have an
>> inbuilt
>> > >> timer?
>> > >> >>>> > >> > > > > >
>> > >> >>>> > >> > > > > > Mithila
>> > >> >>>> > >> > > > > >
>> > >> >>>> > >> > > > > > On Mon, Apr 13, 2009 at 1:13 AM, Aaron Kimball <
>> > >> >>>> > >> aaron@cloudera.com
>> > >> >>>> > >> > >
>> > >> >>>> > >> > > > > wrote:
>> > >> >>>> > >> > > > > >
>> > >> >>>> > >> > > > > > > Virtually none of the examples that ship with
>> > Hadoop
>> > >> >>>> are
>> > >> >>>> > >> designed
>> > >> >>>> > >> > > to
>> > >> >>>> > >> > > > > > > showcase its speed. Hadoop's speedup comes from
>> > its
>> > >> >>>> ability
>> > >> >>>> > to
>> > >> >>>> > >> > > > process
>> > >> >>>> > >> > > > > > very
>> > >> >>>> > >> > > > > > > large volumes of data (starting around, say,
>> tens
>> > of
>> > >> GB
>> > >> >>>> per
>> > >> >>>> > >> job,
>> > >> >>>> > >> > > and
>> > >> >>>> > >> > > > > > going
>> > >> >>>> > >> > > > > > > up in orders of magnitude from there). So if
>> you
>> > are
>> > >> >>>> timing
>> > >> >>>> > >> the
>> > >> >>>> > >> > pi
>> > >> >>>> > >> > > > > > > calculator (or something like that), its
>> results
>> > >> won't
>> > >> >>>> > >> > necessarily
>> > >> >>>> > >> > > be
>> > >> >>>> > >> > > > > > very
>> > >> >>>> > >> > > > > > > consistent. If a job doesn't have enough
>> fragments
>> > >> of
>> > >> >>>> data
>> > >> >>>> > to
>> > >> >>>> > >> > > > allocate
>> > >> >>>> > >> > > > > > one
>> > >> >>>> > >> > > > > > > per each node, some of the nodes will also just
>> go
>> > >> >>>> unused.
>> > >> >>>> > >> > > > > > >
>> > >> >>>> > >> > > > > > > The best example for you to run is to use
>> > >> randomwriter
>> > >> >>>> to
>> > >> >>>> > fill
>> > >> >>>> > >> up
>> > >> >>>> > >> > > > your
>> > >> >>>> > >> > > > > > > cluster with several GB of random data and then
>> > run
>> > >> the
>> > >> >>>> sort
>> > >> >>>> > >> > > program.
>> > >> >>>> > >> > > > > If
>> > >> >>>> > >> > > > > > > that doesn't scale up performance from 3 nodes
>> to
>> > >> 15,
>> > >> >>>> then
>> > >> >>>> > >> you've
>> > >> >>>> > >> > > > > > > definitely
>> > >> >>>> > >> > > > > > > got something strange going on.
>> > >> >>>> > >> > > > > > >
>> > >> >>>> > >> > > > > > > - Aaron
>> > >> >>>> > >> > > > > > >
>> > >> >>>> > >> > > > > > >
>> > >> >>>> > >> > > > > > > On Sun, Apr 12, 2009 at 8:39 AM, Mithila
>> Nagendra
>> > <
>> > >> >>>> > >> > > mnagendr@asu.edu>
>> > >> >>>> > >> > > > > > > wrote:
>> > >> >>>> > >> > > > > > >
>> > >> >>>> > >> > > > > > > > Hey all
>> > >> >>>> > >> > > > > > > > I recently setup a three node hadoop cluster
>> and
>> > >> ran
>> > >> >>>> an
>> > >> >>>> > >> > examples
>> > >> >>>> > >> > > on
>> > >> >>>> > >> > > > > it.
>> > >> >>>> > >> > > > > > > It
>> > >> >>>> > >> > > > > > > > was pretty fast, and all the three nodes were
>> > >> being
>> > >> >>>> used
>> > >> >>>> > (I
>> > >> >>>> > >> > > checked
>> > >> >>>> > >> > > > > the
>> > >> >>>> > >> > > > > > > log
>> > >> >>>> > >> > > > > > > > files to make sure that the slaves are
>> > utilized).
>> > >> >>>> > >> > > > > > > >
>> > >> >>>> > >> > > > > > > > Now I ve setup another cluster consisting of
>> 15
>> > >> >>>> nodes. I
>> > >> >>>> > ran
>> > >> >>>> > >> > the
>> > >> >>>> > >> > > > same
>> > >> >>>> > >> > > > > > > > example, but instead of speeding up, the
>> > >> map-reduce
>> > >> >>>> task
>> > >> >>>> > >> seems
>> > >> >>>> > >> > to
>> > >> >>>> > >> > > > > take
>> > >> >>>> > >> > > > > > > > forever! The slaves are not being used for
>> some
>> > >> >>>> reason.
>> > >> >>>> > This
>> > >> >>>> > >> > > second
>> > >> >>>> > >> > > > > > > cluster
>> > >> >>>> > >> > > > > > > > has a lower, per node processing power, but
>> > should
>> > >> >>>> that
>> > >> >>>> > make
>> > >> >>>> > >> > any
>> > >> >>>> > >> > > > > > > > difference?
>> > >> >>>> > >> > > > > > > > How can I ensure that the data is being
>> mapped
>> > to
>> > >> all
>> > >> >>>> the
>> > >> >>>> > >> > nodes?
>> > >> >>>> > >> > > > > > > Presently,
>> > >> >>>> > >> > > > > > > > the only node that seems to be doing all the
>> > work
>> > >> is
>> > >> >>>> the
>> > >> >>>> > >> Master
>> > >> >>>> > >> > > > node.
>> > >> >>>> > >> > > > > > > >
>> > >> >>>> > >> > > > > > > > Does 15 nodes in a cluster increase the
>> network
>> > >> cost?
>> > >> >>>> What
>> > >> >>>> > >> can
>> > >> >>>> > >> > I
>> > >> >>>> > >> > > do
>> > >> >>>> > >> > > > > to
>> > >> >>>> > >> > > > > > > > setup
>> > >> >>>> > >> > > > > > > > the cluster to function more efficiently?
>> > >> >>>> > >> > > > > > > >
>> > >> >>>> > >> > > > > > > > Thanks!
>> > >> >>>> > >> > > > > > > > Mithila Nagendra
>> > >> >>>> > >> > > > > > > > Arizona State University
>> > >> >>>> > >> > > > > > > >
>> > >> >>>> > >> > > > > > >
>> > >> >>>> > >> > > > > >
>> > >> >>>> > >> > > > >
>> > >> >>>> > >> > > >
>> > >> >>>> > >> > >
>> > >> >>>> > >> >
>> > >> >>>> > >>
>> > >> >>>> > >
>> > >> >>>> > >
>> > >> >>>> >
>> > >> >>>>
>> > >> >>>
>> > >> >>>
>> > >> >>
>> > >> >
>> > >>
>> > >>
>> > >> Ravi
>> > >> --
>> > >>
>> > >>
>> > >
>> >
>>
>>
>>
>> --
>> Alpha Chapters of my book on Hadoop are available
>> http://www.apress.com/book/view/9781430219422
>>
>
>

Re: Map-Reduce Slow Down

Posted by Mithila Nagendra <mn...@asu.edu>.
Thanks Jason! Will check that out.
Mithila

On Thu, Apr 16, 2009 at 5:23 AM, jason hadoop <ja...@gmail.com>wrote:

> Double-check that there is no firewall in place.
> At one point a bunch of new machines were kickstarted and placed in a
> cluster, and they all failed with something similar.
> It turned out the kickstart script had enabled the firewall with a rule
> that blocked ports in the 50k range.
> It took us a while to even think to check, since a firewall was not part
> of our normal machine configuration.
>
> On Wed, Apr 15, 2009 at 11:04 AM, Mithila Nagendra <mn...@asu.edu>
> wrote:
>
> > Hi Aaron
> > I will look into that, thanks!
> >
> > I spoke to the admin who oversees the cluster. He said that the gateway
> > comes into the picture only when one of the nodes communicates with a node
> > outside of the cluster. But in my case the communication is carried out
> > between nodes which all belong to the same cluster.
> >
> > Mithila
> >
> > On Wed, Apr 15, 2009 at 8:59 PM, Aaron Kimball <aa...@cloudera.com>
> wrote:
> >
> > > Hi,
> > >
> > > I wrote a blog post a while back about connecting nodes via a gateway.
> > See
> > >
> >
> http://www.cloudera.com/blog/2008/12/03/securing-a-hadoop-cluster-through-a-gateway/
> > >
> > > This assumes that the client is outside the gateway and all
> > > datanodes/namenode are inside, but the same principles apply. You'll
> just
> > > need to set up ssh tunnels from every datanode to the namenode.
> > >
> > > - Aaron
> > >
> > >
> > > On Wed, Apr 15, 2009 at 10:19 AM, Ravi Phulari <rphulari@yahoo-inc.com
> > >wrote:
> > >
> > >> Looks like your NameNode is down.
> > >> Verify that the Hadoop processes are running (jps should show you all
> > >> the running Java processes).
> > >> If your Hadoop processes are running, try restarting them.
> > >> I guess this problem is due to your fsimage not being correct.
> > >> You might have to format your NameNode.
> > >> Hope this helps.
> > >>
> > >> Thanks,
> > >> --
> > >> Ravi
> > >>
> > >>
> > >> On 4/15/09 10:15 AM, "Mithila Nagendra" <mn...@asu.edu> wrote:
> > >>
> > >> The log file runs into thousands of line with the same message being
> > >> displayed every time.
> > >>
> > >> On Wed, Apr 15, 2009 at 8:10 PM, Mithila Nagendra <mn...@asu.edu>
> > >> wrote:
> > >>
> > >> > The log file : hadoop-mithila-datanode-node19.log.2009-04-14 has the
> > >> > following in it:
> > >> >
> > >> > 2009-04-14 10:08:11,499 INFO org.apache.hadoop.dfs.DataNode:
> > >> STARTUP_MSG:
> > >> > /************************************************************
> > >> > STARTUP_MSG: Starting DataNode
> > >> > STARTUP_MSG:   host = node19/127.0.0.1
> > >> > STARTUP_MSG:   args = []
> > >> > STARTUP_MSG:   version = 0.18.3
> > >> > STARTUP_MSG:   build =
> > >> > https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18 -r
> > >> > 736250; compiled by 'ndaley' on Thu Jan 22 23:12:08 UTC 2009
> > >> > ************************************************************/
> > >> > 2009-04-14 10:08:12,915 INFO org.apache.hadoop.ipc.Client: Retrying
> > >> connect
> > >> > to server: node18/192.168.0.18:54310. Already tried 0 time(s).
> > >> > 2009-04-14 10:08:13,925 INFO org.apache.hadoop.ipc.Client: Retrying
> > >> connect
> > >> > to server: node18/192.168.0.18:54310. Already tried 1 time(s).
> > >> > 2009-04-14 10:08:14,935 INFO org.apache.hadoop.ipc.Client: Retrying
> > >> connect
> > >> > to server: node18/192.168.0.18:54310. Already tried 2 time(s).
> > >> > 2009-04-14 10:08:15,945 INFO org.apache.hadoop.ipc.Client: Retrying
> > >> connect
> > >> > to server: node18/192.168.0.18:54310. Already tried 3 time(s).
> > >> > 2009-04-14 10:08:16,955 INFO org.apache.hadoop.ipc.Client: Retrying
> > >> connect
> > >> > to server: node18/192.168.0.18:54310. Already tried 4 time(s).
> > >> > 2009-04-14 10:08:17,965 INFO org.apache.hadoop.ipc.Client: Retrying
> > >> connect
> > >> > to server: node18/192.168.0.18:54310. Already tried 5 time(s).
> > >> > 2009-04-14 10:08:18,975 INFO org.apache.hadoop.ipc.Client: Retrying
> > >> connect
> > >> > to server: node18/192.168.0.18:54310. Already tried 6 time(s).
> > >> > 2009-04-14 10:08:19,985 INFO org.apache.hadoop.ipc.Client: Retrying
> > >> connect
> > >> > to server: node18/192.168.0.18:54310. Already tried 7 time(s).
> > >> > 2009-04-14 10:08:20,995 INFO org.apache.hadoop.ipc.Client: Retrying
> > >> connect
> > >> > to server: node18/192.168.0.18:54310. Already tried 8 time(s).
> > >> > 2009-04-14 10:08:22,005 INFO org.apache.hadoop.ipc.Client: Retrying
> > >> connect
> > >> > to server: node18/192.168.0.18:54310. Already tried 9 time(s).
> > >> > 2009-04-14 10:08:22,008 INFO org.apache.hadoop.ipc.RPC: Server at
> > >> node18/
> > >> > 192.168.0.18:54310 not available yet, Zzzzz...
> > >> > 2009-04-14 10:08:24,025 INFO org.apache.hadoop.ipc.Client: Retrying
> > >> connect
> > >> > to server: node18/192.168.0.18:54310. Already tried 0 time(s).
> > >> > 2009-04-14 10:08:25,035 INFO org.apache.hadoop.ipc.Client: Retrying
> > >> connect
> > >> > to server: node18/192.168.0.18:54310. Already tried 1 time(s).
> > >> > 2009-04-14 10:08:26,045 INFO org.apache.hadoop.ipc.Client: Retrying
> > >> connect
> > >> > to server: node18/192.168.0.18:54310. Already tried 2 time(s).
> > >> > 2009-04-14 10:08:27,055 INFO org.apache.hadoop.ipc.Client: Retrying
> > >> connect
> > >> > to server: node18/192.168.0.18:54310. Already tried 3 time(s).
> > >> > 2009-04-14 10:08:28,065 INFO org.apache.hadoop.ipc.Client: Retrying
> > >> connect
> > >> > to server: node18/192.168.0.18:54310. Already tried 4 time(s).
> > >> > 2009-04-14 10:08:29,075 INFO org.apache.hadoop.ipc.Client: Retrying
> > >> connect
> > >> > to server: node18/192.168.0.18:54310. Already tried 5 time(s).
> > >> > 2009-04-14 10:08:30,085 INFO org.apache.hadoop.ipc.Client: Retrying
> > >> connect
> > >> > to server: node18/192.168.0.18:54310. Already tried 6 time(s).
> > >> > 2009-04-14 10:08:31,095 INFO org.apache.hadoop.ipc.Client: Retrying
> > >> connect
> > >> > to server: node18/192.168.0.18:54310. Already tried 7 time(s).
> > >> > 2009-04-14 10:08:32,105 INFO org.apache.hadoop.ipc.Client: Retrying
> > >> connect
> > >> > to server: node18/192.168.0.18:54310. Already tried 8 time(s).
> > >> > 2009-04-14 10:08:33,115 INFO org.apache.hadoop.ipc.Client: Retrying
> > >> connect
> > >> > to server: node18/192.168.0.18:54310. Already tried 9 time(s).
> > >> > 2009-04-14 10:08:33,116 INFO org.apache.hadoop.ipc.RPC: Server at
> > >> node18/
> > >> > 192.168.0.18:54310 not available yet, Zzzzz...
> > >> > 2009-04-14 10:08:35,135 INFO org.apache.hadoop.ipc.Client: Retrying
> > >> connect
> > >> > to server: node18/192.168.0.18:54310. Already tried 0 time(s).
> > >> > 2009-04-14 10:08:36,145 INFO org.apache.hadoop.ipc.Client: Retrying
> > >> connect
> > >> > to server: node18/192.168.0.18:54310. Already tried 1 time(s).
> > >> > 2009-04-14 10:08:37,155 INFO org.apache.hadoop.ipc.Client: Retrying
> > >> connect
> > >> > to server: node18/192.168.0.18:54310. Already tried 2 time(s).
> > >> >
> > >> >
> > >> > Hmmm, I still can't figure it out...
> > >> >
> > >> > Mithila
> > >> >
> > >> >
> > >> > On Tue, Apr 14, 2009 at 10:22 PM, Mithila Nagendra <
> mnagendr@asu.edu
> > >> >wrote:
> > >> >
> > >> >> Also, would the way the port is accessed change if all these nodes
> > >> >> are connected through a gateway? I mean in the hadoop-site.xml file?
> > >> >> The Ubuntu systems we worked with earlier didn't have a gateway.
> > >> >> Mithila
> > >> >>
> > >> >> On Tue, Apr 14, 2009 at 9:48 PM, Mithila Nagendra <
> mnagendr@asu.edu
> > >> >wrote:
> > >> >>
> > >> >>> Aaron: Which log file do I look into - there are a lot of them. Here's
> > >> >>> what the error looks like:
> > >> >>> [mithila@node19:~]$ cd hadoop
> > >> >>> [mithila@node19:~/hadoop]$ bin/hadoop dfs -ls
> > >> >>> 09/04/14 10:09:29 INFO ipc.Client: Retrying connect to server:
> > node18/
> > >> >>> 192.168.0.18:54310. Already tried 0 time(s).
> > >> >>> 09/04/14 10:09:30 INFO ipc.Client: Retrying connect to server:
> > node18/
> > >> >>> 192.168.0.18:54310. Already tried 1 time(s).
> > >> >>> 09/04/14 10:09:31 INFO ipc.Client: Retrying connect to server:
> > node18/
> > >> >>> 192.168.0.18:54310. Already tried 2 time(s).
> > >> >>> 09/04/14 10:09:32 INFO ipc.Client: Retrying connect to server:
> > node18/
> > >> >>> 192.168.0.18:54310. Already tried 3 time(s).
> > >> >>> 09/04/14 10:09:33 INFO ipc.Client: Retrying connect to server:
> > node18/
> > >> >>> 192.168.0.18:54310. Already tried 4 time(s).
> > >> >>> 09/04/14 10:09:34 INFO ipc.Client: Retrying connect to server:
> > node18/
> > >> >>> 192.168.0.18:54310. Already tried 5 time(s).
> > >> >>> 09/04/14 10:09:35 INFO ipc.Client: Retrying connect to server:
> > node18/
> > >> >>> 192.168.0.18:54310. Already tried 6 time(s).
> > >> >>> 09/04/14 10:09:36 INFO ipc.Client: Retrying connect to server:
> > node18/
> > >> >>> 192.168.0.18:54310. Already tried 7 time(s).
> > >> >>> 09/04/14 10:09:37 INFO ipc.Client: Retrying connect to server:
> > node18/
> > >> >>> 192.168.0.18:54310. Already tried 8 time(s).
> > >> >>> 09/04/14 10:09:38 INFO ipc.Client: Retrying connect to server:
> > node18/
> > >> >>> 192.168.0.18:54310. Already tried 9 time(s).
> > >> >>> Bad connection to FS. command aborted.
> > >> >>>
> > >> >>> Node19 is a slave and Node18 is the master.
> > >> >>>
> > >> >>> Mithila
> > >> >>>
> > >> >>>
> > >> >>>
> > >> >>> On Tue, Apr 14, 2009 at 8:53 PM, Aaron Kimball <
> aaron@cloudera.com
> > >> >wrote:
> > >> >>>
> > >> >>>> Are there any error messages in the log files on those nodes?
> > >> >>>> - Aaron
> > >> >>>>
> > >> >>>> On Tue, Apr 14, 2009 at 9:03 AM, Mithila Nagendra <
> > mnagendr@asu.edu>
> > >> >>>> wrote:
> > >> >>>>
> > >> >>>> > I've drawn a blank here! Can't figure out what's wrong with the
> > >> >>>> > ports. I can ssh between the nodes but can't access the DFS from
> > >> >>>> > the slaves - it says "Bad connection to DFS". The master seems to
> > >> >>>> > be fine.
> > >> >>>> > Mithila
> > >> >>>> >
> > >> >>>> > On Tue, Apr 14, 2009 at 4:28 AM, Mithila Nagendra <
> > >> mnagendr@asu.edu>
> > >> >>>> > wrote:
> > >> >>>> >
> > >> >>>> > > Yes, I can.
> > >> >>>> > >
> > >> >>>> > >
> > >> >>>> > > On Mon, Apr 13, 2009 at 5:12 PM, Jim Twensky <
> > >> jim.twensky@gmail.com
> > >> >>>> > >wrote:
> > >> >>>> > >
> > >> >>>> > >> Can you ssh between the nodes?
> > >> >>>> > >>
> > >> >>>> > >> -jim
> > >> >>>> > >>
> > >> >>>> > >> On Mon, Apr 13, 2009 at 6:49 PM, Mithila Nagendra <
> > >> >>>> mnagendr@asu.edu>
> > >> >>>> > >> wrote:
> > >> >>>> > >>
> > >> >>>> > >> > Thanks Aaron.
> > >> >>>> > >> > Jim: The three nodes I set up had Ubuntu running on them,
> > >> >>>> > >> > and the dfs was accessed at port 54310. The new cluster
> > >> >>>> > >> > which I've set up has Red Hat Linux release 7.2 (Enigma)
> > >> >>>> > >> > running on it. Now when I try to access the dfs from one
> > >> >>>> > >> > of the slaves I get the following response: dfs cannot be
> > >> >>>> > >> > accessed. When I access the DFS through the master there's
> > >> >>>> > >> > no problem. So I feel there's a problem with the port. Any
> > >> >>>> > >> > ideas? I did check the list of slaves, it looks fine to me.
> > >> >>>> > >> >
> > >> >>>> > >> > Mithila
> > >> >>>> > >> >
> > >> >>>> > >> >
> > >> >>>> > >> >
> > >> >>>> > >> >
> > >> >>>> > >> > On Mon, Apr 13, 2009 at 2:58 PM, Jim Twensky <
> > >> >>>> jim.twensky@gmail.com>
> > >> >>>> > >> > wrote:
> > >> >>>> > >> >
> > >> >>>> > >> > > Mithila,
> > >> >>>> > >> > >
> > >> >>>> > >> > > You said all the slaves were being utilized in the 3
> node
> > >> >>>> cluster.
> > >> >>>> > >> Which
> > >> >>>> > >> > > application did you run to test that and what was your
> > input
> > >> >>>> size?
> > >> >>>> > If
> > >> >>>> > >> you
> > >> >>>> > >> > > tried the word count application on a 516 MB input file
> on
> > >> both
> > >> >>>> > >> cluster
> > >> >>>> > >> > > setups, than some of your nodes in the 15 node cluster
> may
> > >> not
> > >> >>>> be
> > >> >>>> > >> running
> > >> >>>> > >> > > at
> > >> >>>> > >> > > all. Generally, one map job is assigned to each input
> > split
> > >> and
> > >> >>>> if
> > >> >>>> > you
> > >> >>>> > >> > are
> > >> >>>> > >> > > running your cluster with the defaults, the splits are
> 64
> > MB
> > >> >>>> each. I
> > >> >>>> > >> got
> > >> >>>> > >> > > confused when you said the Namenode seemed to do all the
> > >> work.
> > >> >>>> Can
> > >> >>>> > you
> > >> >>>> > >> > > check
> > >> >>>> > >> > > conf/slaves and make sure you put the names of all task
> > >> >>>> trackers
> > >> >>>> > >> there? I
> > >> >>>> > >> > > also suggest comparing both clusters with a larger input
> > >> size,
> > >> >>>> say
> > >> >>>> > at
> > >> >>>> > >> > least
> > >> >>>> > >> > > 5 GB, to really see a difference.
> > >> >>>> > >> > >
> > >> >>>> > >> > > Jim
> > >> >>>> > >> > >
> > >> >>>> > >> > > On Mon, Apr 13, 2009 at 4:17 PM, Aaron Kimball <
> > >> >>>> aaron@cloudera.com>
> > >> >>>> > >> > wrote:
> > >> >>>> > >> > >
> > >> >>>> > >> > > > in hadoop-*-examples.jar, use "randomwriter" to
> generate
> > >> the
> > >> >>>> data
> > >> >>>> > >> and
> > >> >>>> > >> > > > "sort"
> > >> >>>> > >> > > > to sort it.
> > >> >>>> > >> > > > - Aaron
> > >> >>>> > >> > > >
> > >> >>>> > >> > > > On Sun, Apr 12, 2009 at 9:33 PM, Pankil Doshi <
> > >> >>>> > forpankil@gmail.com>
> > >> >>>> > >> > > wrote:
> > >> >>>> > >> > > >
> > >> >>>> > >> > > > > Your data is too small, I guess, for 15 nodes. So it
> > >> >>>> > >> > > > > might be the overhead of these nodes making your
> > >> >>>> > >> > > > > total MR jobs more time-consuming. I guess you will
> > >> >>>> > >> > > > > have to try with a larger set of data.
> > >> >>>> > >> > > > >
> > >> >>>> > >> > > > > Pankil
> > >> >>>> > >> > > > > On Sun, Apr 12, 2009 at 6:54 PM, Mithila Nagendra <
> > >> >>>> > >> mnagendr@asu.edu>
> > >> >>>> > >> > > > > wrote:
> > >> >>>> > >> > > > >
> > >> >>>> > >> > > > > > Aaron
> > >> >>>> > >> > > > > >
> > >> >>>> > >> > > > > > That could be the issue, my data is just 516MB -
> > >> wouldn't
> > >> >>>> this
> > >> >>>> > >> see
> > >> >>>> > >> > a
> > >> >>>> > >> > > > bit
> > >> >>>> > >> > > > > of
> > >> >>>> > >> > > > > > speed up?
> > >> >>>> > >> > > > > > Could you guide me to the example? I ll run my
> > cluster
> > >> on
> > >> >>>> it
> > >> >>>> > and
> > >> >>>> > >> > see
> > >> >>>> > >> > > > what
> > >> >>>> > >> > > > > I
> > >> >>>> > >> > > > > > get. Also for my program I had a java timer
> running
> > to
> > >> >>>> record
> > >> >>>> > >> the
> > >> >>>> > >> > > time
> > >> >>>> > >> > > > > > taken
> > >> >>>> > >> > > > > > to complete execution. Does Hadoop have an inbuilt
> > >> timer?
> > >> >>>> > >> > > > > >
> > >> >>>> > >> > > > > > Mithila
> > >> >>>> > >> > > > > >
> > >> >>>> > >> > > > > > On Mon, Apr 13, 2009 at 1:13 AM, Aaron Kimball <
> > >> >>>> > >> aaron@cloudera.com
> > >> >>>> > >> > >
> > >> >>>> > >> > > > > wrote:
> > >> >>>> > >> > > > > >
> > >> >>>> > >> > > > > > > Virtually none of the examples that ship with
> > Hadoop
> > >> >>>> are
> > >> >>>> > >> designed
> > >> >>>> > >> > > to
> > >> >>>> > >> > > > > > > showcase its speed. Hadoop's speedup comes from
> > its
> > >> >>>> ability
> > >> >>>> > to
> > >> >>>> > >> > > > process
> > >> >>>> > >> > > > > > very
> > >> >>>> > >> > > > > > > large volumes of data (starting around, say,
> tens
> > of
> > >> GB
> > >> >>>> per
> > >> >>>> > >> job,
> > >> >>>> > >> > > and
> > >> >>>> > >> > > > > > going
> > >> >>>> > >> > > > > > > up in orders of magnitude from there). So if you
> > are
> > >> >>>> timing
> > >> >>>> > >> the
> > >> >>>> > >> > pi
> > >> >>>> > >> > > > > > > calculator (or something like that), its results
> > >> won't
> > >> >>>> > >> > necessarily
> > >> >>>> > >> > > be
> > >> >>>> > >> > > > > > very
> > >> >>>> > >> > > > > > > consistent. If a job doesn't have enough
> fragments
> > >> of
> > >> >>>> data
> > >> >>>> > to
> > >> >>>> > >> > > > allocate
> > >> >>>> > >> > > > > > one
> > >> >>>> > >> > > > > > > per each node, some of the nodes will also just
> go
> > >> >>>> unused.
> > >> >>>> > >> > > > > > >
> > >> >>>> > >> > > > > > > The best example for you to run is to use
> > >> randomwriter
> > >> >>>> to
> > >> >>>> > fill
> > >> >>>> > >> up
> > >> >>>> > >> > > > your
> > >> >>>> > >> > > > > > > cluster with several GB of random data and then
> > run
> > >> the
> > >> >>>> sort
> > >> >>>> > >> > > program.
> > >> >>>> > >> > > > > If
> > >> >>>> > >> > > > > > > that doesn't scale up performance from 3 nodes
> to
> > >> 15,
> > >> >>>> then
> > >> >>>> > >> you've
> > >> >>>> > >> > > > > > > definitely
> > >> >>>> > >> > > > > > > got something strange going on.
> > >> >>>> > >> > > > > > >
> > >> >>>> > >> > > > > > > - Aaron
> > >> >>>> > >> > > > > > >
> > >> >>>> > >> > > > > > >
> > >> >>>> > >> > > > > > > On Sun, Apr 12, 2009 at 8:39 AM, Mithila
> Nagendra
> > <
> > >> >>>> > >> > > mnagendr@asu.edu>
> > >> >>>> > >> > > > > > > wrote:
> > >> >>>> > >> > > > > > >
> > >> >>>> > >> > > > > > > > Hey all
> > >> >>>> > >> > > > > > > > I recently setup a three node hadoop cluster
> and
> > >> ran
> > >> >>>> an
> > >> >>>> > >> > examples
> > >> >>>> > >> > > on
> > >> >>>> > >> > > > > it.
> > >> >>>> > >> > > > > > > It
> > >> >>>> > >> > > > > > > > was pretty fast, and all the three nodes were
> > >> being
> > >> >>>> used
> > >> >>>> > (I
> > >> >>>> > >> > > checked
> > >> >>>> > >> > > > > the
> > >> >>>> > >> > > > > > > log
> > >> >>>> > >> > > > > > > > files to make sure that the slaves are
> > utilized).
> > >> >>>> > >> > > > > > > >
> > >> >>>> > >> > > > > > > > Now I ve setup another cluster consisting of
> 15
> > >> >>>> nodes. I
> > >> >>>> > ran
> > >> >>>> > >> > the
> > >> >>>> > >> > > > same
> > >> >>>> > >> > > > > > > > example, but instead of speeding up, the
> > >> map-reduce
> > >> >>>> task
> > >> >>>> > >> seems
> > >> >>>> > >> > to
> > >> >>>> > >> > > > > take
> > >> >>>> > >> > > > > > > > forever! The slaves are not being used for
> some
> > >> >>>> reason.
> > >> >>>> > This
> > >> >>>> > >> > > second
> > >> >>>> > >> > > > > > > cluster
> > >> >>>> > >> > > > > > > > has a lower, per node processing power, but
> > should
> > >> >>>> that
> > >> >>>> > make
> > >> >>>> > >> > any
> > >> >>>> > >> > > > > > > > difference?
> > >> >>>> > >> > > > > > > > How can I ensure that the data is being mapped
> > to
> > >> all
> > >> >>>> the
> > >> >>>> > >> > nodes?
> > >> >>>> > >> > > > > > > Presently,
> > >> >>>> > >> > > > > > > > the only node that seems to be doing all the
> > work
> > >> is
> > >> >>>> the
> > >> >>>> > >> Master
> > >> >>>> > >> > > > node.
> > >> >>>> > >> > > > > > > >
> > >> >>>> > >> > > > > > > > Does 15 nodes in a cluster increase the
> network
> > >> cost?
> > >> >>>> What
> > >> >>>> > >> can
> > >> >>>> > >> > I
> > >> >>>> > >> > > do
> > >> >>>> > >> > > > > to
> > >> >>>> > >> > > > > > > > setup
> > >> >>>> > >> > > > > > > > the cluster to function more efficiently?
> > >> >>>> > >> > > > > > > >
> > >> >>>> > >> > > > > > > > Thanks!
> > >> >>>> > >> > > > > > > > Mithila Nagendra
> > >> >>>> > >> > > > > > > > Arizona State University
> > >> >>>> > >> > > > > > > >
> > >> >>>> > >> > > > > > >
> > >> >>>> > >> > > > > >
> > >> >>>> > >> > > > >
> > >> >>>> > >> > > >
> > >> >>>> > >> > >
> > >> >>>> > >> >
> > >> >>>> > >>
> > >> >>>> > >
> > >> >>>> > >
> > >> >>>> >
> > >> >>>>
> > >> >>>
> > >> >>>
> > >> >>
> > >> >
> > >>
> > >>
> > >> Ravi
> > >> --
> > >>
> > >>
> > >
> >
>
>
>
> --
> Alpha Chapters of my book on Hadoop are available
> http://www.apress.com/book/view/9781430219422
>

Re: Map-Reduce Slow Down

Posted by jason hadoop <ja...@gmail.com>.
Double-check that there is no firewall in place.
At one point a bunch of new machines were kickstarted and placed in a
cluster, and they all failed with something similar.
It turned out the kickstart script had enabled the firewall with a rule
that blocked ports in the 50k range.
It took us a while to even think to check, since a firewall was not part of
our normal machine configuration.
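
For anyone hitting the same wall, a quick way to rule the firewall in or out
is to list the rules and, if need be, drop them temporarily. This is only a
sketch for a stock Red Hat iptables setup - the service name and the exact
port range to open are assumptions, so adjust them for your machines:

    /sbin/iptables -L -n            # list the active rules, as root
    /sbin/service iptables stop     # disable the firewall, then retry the job
    # if that fixes it, turn the firewall back on and open just the Hadoop
    # ports instead of leaving it down:
    /sbin/iptables -I INPUT -p tcp --dport 54310 -j ACCEPT
    /sbin/iptables -I INPUT -p tcp --dport 50000:50100 -j ACCEPT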

On Wed, Apr 15, 2009 at 11:04 AM, Mithila Nagendra <mn...@asu.edu> wrote:

> Hi Aaron
> I will look into that, thanks!
>
> I spoke to the admin who oversees the cluster. He said that the gateway
> comes into the picture only when one of the nodes communicates with a node
> outside of the cluster. But in my case the communication is carried out
> between nodes which all belong to the same cluster.
>
> Mithila
>
> On Wed, Apr 15, 2009 at 8:59 PM, Aaron Kimball <aa...@cloudera.com> wrote:
>
> > Hi,
> >
> > I wrote a blog post a while back about connecting nodes via a gateway.
> See
> >
> http://www.cloudera.com/blog/2008/12/03/securing-a-hadoop-cluster-through-a-gateway/
> >
> > This assumes that the client is outside the gateway and all
> > datanodes/namenode are inside, but the same principles apply. You'll just
> > need to set up ssh tunnels from every datanode to the namenode.
> >
> > - Aaron
> >
> >
> > On Wed, Apr 15, 2009 at 10:19 AM, Ravi Phulari <rphulari@yahoo-inc.com
> >wrote:
> >
> >> Looks like your NameNode is down.
> >> Verify that the Hadoop processes are running (jps should show you all the
> >> running Java processes).
> >> If your Hadoop processes are running, try restarting them.
> >> I guess this problem is due to your fsimage not being correct.
> >> You might have to format your NameNode.
> >> Hope this helps.
> >>
> >> Thanks,
> >> --
> >> Ravi
> >>
> >>
> >> On 4/15/09 10:15 AM, "Mithila Nagendra" <mn...@asu.edu> wrote:
> >>
> >> The log file runs into thousands of line with the same message being
> >> displayed every time.
> >>
> >> On Wed, Apr 15, 2009 at 8:10 PM, Mithila Nagendra <mn...@asu.edu>
> >> wrote:
> >>
> >> > The log file : hadoop-mithila-datanode-node19.log.2009-04-14 has the
> >> > following in it:
> >> >
> >> > 2009-04-14 10:08:11,499 INFO org.apache.hadoop.dfs.DataNode:
> >> STARTUP_MSG:
> >> > /************************************************************
> >> > STARTUP_MSG: Starting DataNode
> >> > STARTUP_MSG:   host = node19/127.0.0.1
> >> > STARTUP_MSG:   args = []
> >> > STARTUP_MSG:   version = 0.18.3
> >> > STARTUP_MSG:   build =
> >> > https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18 -r
> >> > 736250; compiled by 'ndaley' on Thu Jan 22 23:12:08 UTC 2009
> >> > ************************************************************/
> >> > 2009-04-14 10:08:12,915 INFO org.apache.hadoop.ipc.Client: Retrying
> >> connect
> >> > to server: node18/192.168.0.18:54310. Already tried 0 time(s).
> >> > 2009-04-14 10:08:13,925 INFO org.apache.hadoop.ipc.Client: Retrying
> >> connect
> >> > to server: node18/192.168.0.18:54310. Already tried 1 time(s).
> >> > 2009-04-14 10:08:14,935 INFO org.apache.hadoop.ipc.Client: Retrying
> >> connect
> >> > to server: node18/192.168.0.18:54310. Already tried 2 time(s).
> >> > 2009-04-14 10:08:15,945 INFO org.apache.hadoop.ipc.Client: Retrying
> >> connect
> >> > to server: node18/192.168.0.18:54310. Already tried 3 time(s).
> >> > 2009-04-14 10:08:16,955 INFO org.apache.hadoop.ipc.Client: Retrying
> >> connect
> >> > to server: node18/192.168.0.18:54310. Already tried 4 time(s).
> >> > 2009-04-14 10:08:17,965 INFO org.apache.hadoop.ipc.Client: Retrying
> >> connect
> >> > to server: node18/192.168.0.18:54310. Already tried 5 time(s).
> >> > 2009-04-14 10:08:18,975 INFO org.apache.hadoop.ipc.Client: Retrying
> >> connect
> >> > to server: node18/192.168.0.18:54310. Already tried 6 time(s).
> >> > 2009-04-14 10:08:19,985 INFO org.apache.hadoop.ipc.Client: Retrying
> >> connect
> >> > to server: node18/192.168.0.18:54310. Already tried 7 time(s).
> >> > 2009-04-14 10:08:20,995 INFO org.apache.hadoop.ipc.Client: Retrying
> >> connect
> >> > to server: node18/192.168.0.18:54310. Already tried 8 time(s).
> >> > 2009-04-14 10:08:22,005 INFO org.apache.hadoop.ipc.Client: Retrying
> >> connect
> >> > to server: node18/192.168.0.18:54310. Already tried 9 time(s).
> >> > 2009-04-14 10:08:22,008 INFO org.apache.hadoop.ipc.RPC: Server at
> >> node18/
> >> > 192.168.0.18:54310 not available yet, Zzzzz...
> >> > 2009-04-14 10:08:24,025 INFO org.apache.hadoop.ipc.Client: Retrying
> >> connect
> >> > to server: node18/192.168.0.18:54310. Already tried 0 time(s).
> >> > 2009-04-14 10:08:25,035 INFO org.apache.hadoop.ipc.Client: Retrying
> >> connect
> >> > to server: node18/192.168.0.18:54310. Already tried 1 time(s).
> >> > 2009-04-14 10:08:26,045 INFO org.apache.hadoop.ipc.Client: Retrying
> >> connect
> >> > to server: node18/192.168.0.18:54310. Already tried 2 time(s).
> >> > 2009-04-14 10:08:27,055 INFO org.apache.hadoop.ipc.Client: Retrying
> >> connect
> >> > to server: node18/192.168.0.18:54310. Already tried 3 time(s).
> >> > 2009-04-14 10:08:28,065 INFO org.apache.hadoop.ipc.Client: Retrying
> >> connect
> >> > to server: node18/192.168.0.18:54310. Already tried 4 time(s).
> >> > 2009-04-14 10:08:29,075 INFO org.apache.hadoop.ipc.Client: Retrying
> >> connect
> >> > to server: node18/192.168.0.18:54310. Already tried 5 time(s).
> >> > 2009-04-14 10:08:30,085 INFO org.apache.hadoop.ipc.Client: Retrying
> >> connect
> >> > to server: node18/192.168.0.18:54310. Already tried 6 time(s).
> >> > 2009-04-14 10:08:31,095 INFO org.apache.hadoop.ipc.Client: Retrying
> >> connect
> >> > to server: node18/192.168.0.18:54310. Already tried 7 time(s).
> >> > 2009-04-14 10:08:32,105 INFO org.apache.hadoop.ipc.Client: Retrying
> >> connect
> >> > to server: node18/192.168.0.18:54310. Already tried 8 time(s).
> >> > 2009-04-14 10:08:33,115 INFO org.apache.hadoop.ipc.Client: Retrying
> >> connect
> >> > to server: node18/192.168.0.18:54310. Already tried 9 time(s).
> >> > 2009-04-14 10:08:33,116 INFO org.apache.hadoop.ipc.RPC: Server at
> >> node18/
> >> > 192.168.0.18:54310 not available yet, Zzzzz...
> >> > 2009-04-14 10:08:35,135 INFO org.apache.hadoop.ipc.Client: Retrying
> >> connect
> >> > to server: node18/192.168.0.18:54310. Already tried 0 time(s).
> >> > 2009-04-14 10:08:36,145 INFO org.apache.hadoop.ipc.Client: Retrying
> >> connect
> >> > to server: node18/192.168.0.18:54310. Already tried 1 time(s).
> >> > 2009-04-14 10:08:37,155 INFO org.apache.hadoop.ipc.Client: Retrying
> >> connect
> >> > to server: node18/192.168.0.18:54310. Already tried 2 time(s).
> >> >
> >> >
> >> > Hmmm, I still can't figure it out...
> >> >
> >> > Mithila
> >> >
> >> >
> >> > On Tue, Apr 14, 2009 at 10:22 PM, Mithila Nagendra <mnagendr@asu.edu
> >> >wrote:
> >> >
> >> >> Also, would the way the port is accessed change if all these nodes are
> >> >> connected through a gateway? I mean in the hadoop-site.xml file? The
> >> >> Ubuntu systems we worked with earlier didn't have a gateway.
> >> >> Mithila
> >> >>
> >> >> On Tue, Apr 14, 2009 at 9:48 PM, Mithila Nagendra <mnagendr@asu.edu
> >> >wrote:
> >> >>
> >> >>> Aaron: Which log file do I look into - there are a lot of them. Here's
> >> >>> what the error looks like:
> >> >>> [mithila@node19:~]$ cd hadoop
> >> >>> [mithila@node19:~/hadoop]$ bin/hadoop dfs -ls
> >> >>> 09/04/14 10:09:29 INFO ipc.Client: Retrying connect to server:
> node18/
> >> >>> 192.168.0.18:54310. Already tried 0 time(s).
> >> >>> 09/04/14 10:09:30 INFO ipc.Client: Retrying connect to server:
> node18/
> >> >>> 192.168.0.18:54310. Already tried 1 time(s).
> >> >>> 09/04/14 10:09:31 INFO ipc.Client: Retrying connect to server:
> node18/
> >> >>> 192.168.0.18:54310. Already tried 2 time(s).
> >> >>> 09/04/14 10:09:32 INFO ipc.Client: Retrying connect to server:
> node18/
> >> >>> 192.168.0.18:54310. Already tried 3 time(s).
> >> >>> 09/04/14 10:09:33 INFO ipc.Client: Retrying connect to server:
> node18/
> >> >>> 192.168.0.18:54310. Already tried 4 time(s).
> >> >>> 09/04/14 10:09:34 INFO ipc.Client: Retrying connect to server:
> node18/
> >> >>> 192.168.0.18:54310. Already tried 5 time(s).
> >> >>> 09/04/14 10:09:35 INFO ipc.Client: Retrying connect to server:
> node18/
> >> >>> 192.168.0.18:54310. Already tried 6 time(s).
> >> >>> 09/04/14 10:09:36 INFO ipc.Client: Retrying connect to server:
> node18/
> >> >>> 192.168.0.18:54310. Already tried 7 time(s).
> >> >>> 09/04/14 10:09:37 INFO ipc.Client: Retrying connect to server:
> node18/
> >> >>> 192.168.0.18:54310. Already tried 8 time(s).
> >> >>> 09/04/14 10:09:38 INFO ipc.Client: Retrying connect to server:
> node18/
> >> >>> 192.168.0.18:54310. Already tried 9 time(s).
> >> >>> Bad connection to FS. command aborted.
> >> >>>
> >> >>> Node19 is a slave and Node18 is the master.
> >> >>>
> >> >>> Mithila
> >> >>>
> >> >>>
> >> >>>
> >> >>> On Tue, Apr 14, 2009 at 8:53 PM, Aaron Kimball <aaron@cloudera.com
> >> >wrote:
> >> >>>
> >> >>>> Are there any error messages in the log files on those nodes?
> >> >>>> - Aaron
> >> >>>>
> >> >>>> On Tue, Apr 14, 2009 at 9:03 AM, Mithila Nagendra <
> mnagendr@asu.edu>
> >> >>>> wrote:
> >> >>>>
> >> >>>> > I've drawn a blank here! Can't figure out what's wrong with the
> >> >>>> > ports. I can ssh between the nodes but can't access the DFS from
> >> >>>> > the slaves - it says "Bad connection to DFS". The master seems to
> >> >>>> > be fine.
> >> >>>> > Mithila
> >> >>>> >
> >> >>>> > On Tue, Apr 14, 2009 at 4:28 AM, Mithila Nagendra <
> >> mnagendr@asu.edu>
> >> >>>> > wrote:
> >> >>>> >
> >> >>>> > > Yes, I can.
> >> >>>> > >
> >> >>>> > >
> >> >>>> > > On Mon, Apr 13, 2009 at 5:12 PM, Jim Twensky <
> >> jim.twensky@gmail.com
> >> >>>> > >wrote:
> >> >>>> > >
> >> >>>> > >> Can you ssh between the nodes?
> >> >>>> > >>
> >> >>>> > >> -jim
> >> >>>> > >>
> >> >>>> > >> On Mon, Apr 13, 2009 at 6:49 PM, Mithila Nagendra <
> >> >>>> mnagendr@asu.edu>
> >> >>>> > >> wrote:
> >> >>>> > >>
> >> >>>> > >> > Thanks Aaron.
> >> >>>> > >> > Jim: The three nodes I set up had Ubuntu running on them,
> >> >>>> > >> > and the dfs was accessed at port 54310. The new cluster
> >> >>>> > >> > which I've set up has Red Hat Linux release 7.2 (Enigma)
> >> >>>> > >> > running on it. Now when I try to access the dfs from one
> >> >>>> > >> > of the slaves I get the following response: dfs cannot be
> >> >>>> > >> > accessed. When I access the DFS through the master there's
> >> >>>> > >> > no problem. So I feel there's a problem with the port. Any
> >> >>>> > >> > ideas? I did check the list of slaves, it looks fine to me.
> >> >>>> > >> >
> >> >>>> > >> > Mithila
> >> >>>> > >> >
> >> >>>> > >> >
> >> >>>> > >> >
> >> >>>> > >> >
> >> >>>> > >> > On Mon, Apr 13, 2009 at 2:58 PM, Jim Twensky <
> >> >>>> jim.twensky@gmail.com>
> >> >>>> > >> > wrote:
> >> >>>> > >> >
> >> >>>> > >> > > Mithila,
> >> >>>> > >> > >
> >> >>>> > >> > > You said all the slaves were being utilized in the 3 node
> >> >>>> cluster.
> >> >>>> > >> Which
> >> >>>> > >> > > application did you run to test that and what was your
> input
> >> >>>> size?
> >> >>>> > If
> >> >>>> > >> you
> >> >>>> > >> > > tried the word count application on a 516 MB input file on
> >> both
> >> >>>> > >> cluster
> >> >>>> > >> > > setups, then some of your nodes in the 15 node cluster may
> >> not
> >> >>>> be
> >> >>>> > >> running
> >> >>>> > >> > > at
> >> >>>> > >> > > all. Generally, one map job is assigned to each input
> split
> >> and
> >> >>>> if
> >> >>>> > you
> >> >>>> > >> > are
> >> >>>> > >> > > running your cluster with the defaults, the splits are 64
> MB
> >> >>>> each. I
> >> >>>> > >> got
> >> >>>> > >> > > confused when you said the Namenode seemed to do all the
> >> work.
> >> >>>> Can
> >> >>>> > you
> >> >>>> > >> > > check
> >> >>>> > >> > > conf/slaves and make sure you put the names of all task
> >> >>>> trackers
> >> >>>> > >> there? I
> >> >>>> > >> > > also suggest comparing both clusters with a larger input
> >> size,
> >> >>>> say
> >> >>>> > at
> >> >>>> > >> > least
> >> >>>> > >> > > 5 GB, to really see a difference.
> >> >>>> > >> > >
> >> >>>> > >> > > Jim
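
(To make Jim's numbers concrete: with the default 64 MB split size, a 516 MB
input yields only nine map tasks - 516/64 rounded up - so at most nine of the
15 nodes can be running maps at any one time even when the cluster is healthy.)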
> >> >>>> > >> > >
> >> >>>> > >> > > On Mon, Apr 13, 2009 at 4:17 PM, Aaron Kimball <
> >> >>>> aaron@cloudera.com>
> >> >>>> > >> > wrote:
> >> >>>> > >> > >
> >> >>>> > >> > > > in hadoop-*-examples.jar, use "randomwriter" to generate
> >> the
> >> >>>> data
> >> >>>> > >> and
> >> >>>> > >> > > > "sort"
> >> >>>> > >> > > > to sort it.
> >> >>>> > >> > > > - Aaron
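
Concretely, that amounts to something like the following - the jar name
matching the 0.18.3 tarball is an assumption:

    bin/hadoop jar hadoop-0.18.3-examples.jar randomwriter rand
    bin/hadoop jar hadoop-0.18.3-examples.jar sort rand rand-sort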
> >> >>>> > >> > > >
> >> >>>> > >> > > > On Sun, Apr 12, 2009 at 9:33 PM, Pankil Doshi <
> >> >>>> > forpankil@gmail.com>
> >> >>>> > >> > > wrote:
> >> >>>> > >> > > >
> >> >>>> > >> > > > > Your data is too small, I guess, for 15 nodes. So it
> >> >>>> > >> > > > > might be the overhead of these nodes making your total
> >> >>>> > >> > > > > MR jobs more time-consuming. I guess you will have to
> >> >>>> > >> > > > > try with a larger set of data.
> >> >>>> > >> > > > >
> >> >>>> > >> > > > > Pankil
> >> >>>> > >> > > > > On Sun, Apr 12, 2009 at 6:54 PM, Mithila Nagendra <
> >> >>>> > >> mnagendr@asu.edu>
> >> >>>> > >> > > > > wrote:
> >> >>>> > >> > > > >
> >> >>>> > >> > > > > > Aaron
> >> >>>> > >> > > > > >
> >> >>>> > >> > > > > > That could be the issue, my data is just 516MB -
> >> wouldn't
> >> >>>> this
> >> >>>> > >> see
> >> >>>> > >> > a
> >> >>>> > >> > > > bit
> >> >>>> > >> > > > > of
> >> >>>> > >> > > > > > speed up?
> >> >>>> > >> > > > > > Could you guide me to the example? I ll run my
> cluster
> >> on
> >> >>>> it
> >> >>>> > and
> >> >>>> > >> > see
> >> >>>> > >> > > > what
> >> >>>> > >> > > > > I
> >> >>>> > >> > > > > > get. Also for my program I had a java timer running
> to
> >> >>>> record
> >> >>>> > >> the
> >> >>>> > >> > > time
> >> >>>> > >> > > > > > taken
> >> >>>> > >> > > > > > to complete execution. Does Hadoop have an inbuilt
> >> timer?
> >> >>>> > >> > > > > >
> >> >>>> > >> > > > > > Mithila
> >> >>>> > >> > > > > >
> >> >>>> > >> > > > > > On Mon, Apr 13, 2009 at 1:13 AM, Aaron Kimball <
> >> >>>> > >> aaron@cloudera.com
> >> >>>> > >> > >
> >> >>>> > >> > > > > wrote:
> >> >>>> > >> > > > > >
> >> >>>> > >> > > > > > > Virtually none of the examples that ship with
> Hadoop
> >> >>>> are
> >> >>>> > >> designed
> >> >>>> > >> > > to
> >> >>>> > >> > > > > > > showcase its speed. Hadoop's speedup comes from
> its
> >> >>>> ability
> >> >>>> > to
> >> >>>> > >> > > > process
> >> >>>> > >> > > > > > very
> >> >>>> > >> > > > > > > large volumes of data (starting around, say, tens
> of
> >> GB
> >> >>>> per
> >> >>>> > >> job,
> >> >>>> > >> > > and
> >> >>>> > >> > > > > > going
> >> >>>> > >> > > > > > > up in orders of magnitude from there). So if you
> are
> >> >>>> timing
> >> >>>> > >> the
> >> >>>> > >> > pi
> >> >>>> > >> > > > > > > calculator (or something like that), its results
> >> won't
> >> >>>> > >> > necessarily
> >> >>>> > >> > > be
> >> >>>> > >> > > > > > very
> >> >>>> > >> > > > > > > consistent. If a job doesn't have enough fragments
> >> of
> >> >>>> data
> >> >>>> > to
> >> >>>> > >> > > > allocate
> >> >>>> > >> > > > > > one
> >> >>>> > >> > > > > > > per each node, some of the nodes will also just go
> >> >>>> unused.
> >> >>>> > >> > > > > > >
> >> >>>> > >> > > > > > > The best example for you to run is to use
> >> randomwriter
> >> >>>> to
> >> >>>> > fill
> >> >>>> > >> up
> >> >>>> > >> > > > your
> >> >>>> > >> > > > > > > cluster with several GB of random data and then
> run
> >> the
> >> >>>> sort
> >> >>>> > >> > > program.
> >> >>>> > >> > > > > If
> >> >>>> > >> > > > > > > that doesn't scale up performance from 3 nodes to
> >> 15,
> >> >>>> then
> >> >>>> > >> you've
> >> >>>> > >> > > > > > > definitely
> >> >>>> > >> > > > > > > got something strange going on.
> >> >>>> > >> > > > > > >
> >> >>>> > >> > > > > > > - Aaron
> >> >>>> > >> > > > > > >
> >> >>>> > >> > > > > > >
> >> >>>> > >> > > > > > > On Sun, Apr 12, 2009 at 8:39 AM, Mithila Nagendra
> <
> >> >>>> > >> > > mnagendr@asu.edu>
> >> >>>> > >> > > > > > > wrote:
> >> >>>> > >> > > > > > >
> >> >>>> > >> > > > > > > > Hey all
> >> >>>> > >> > > > > > > > I recently setup a three node hadoop cluster and
> >> ran
> >> >>>> an
> >> >>>> > >> > examples
> >> >>>> > >> > > on
> >> >>>> > >> > > > > it.
> >> >>>> > >> > > > > > > It
> >> >>>> > >> > > > > > > > was pretty fast, and all the three nodes were
> >> being
> >> >>>> used
> >> >>>> > (I
> >> >>>> > >> > > checked
> >> >>>> > >> > > > > the
> >> >>>> > >> > > > > > > log
> >> >>>> > >> > > > > > > > files to make sure that the slaves are
> utilized).
> >> >>>> > >> > > > > > > >
> >> >>>> > >> > > > > > > > Now I ve setup another cluster consisting of 15
> >> >>>> nodes. I
> >> >>>> > ran
> >> >>>> > >> > the
> >> >>>> > >> > > > same
> >> >>>> > >> > > > > > > > example, but instead of speeding up, the
> >> map-reduce
> >> >>>> task
> >> >>>> > >> seems
> >> >>>> > >> > to
> >> >>>> > >> > > > > take
> >> >>>> > >> > > > > > > > forever! The slaves are not being used for some
> >> >>>> reason.
> >> >>>> > This
> >> >>>> > >> > > second
> >> >>>> > >> > > > > > > cluster
> >> >>>> > >> > > > > > > > has a lower, per node processing power, but
> should
> >> >>>> that
> >> >>>> > make
> >> >>>> > >> > any
> >> >>>> > >> > > > > > > > difference?
> >> >>>> > >> > > > > > > > How can I ensure that the data is being mapped
> to
> >> all
> >> >>>> the
> >> >>>> > >> > nodes?
> >> >>>> > >> > > > > > > Presently,
> >> >>>> > >> > > > > > > > the only node that seems to be doing all the
> work
> >> is
> >> >>>> the
> >> >>>> > >> Master
> >> >>>> > >> > > > node.
> >> >>>> > >> > > > > > > >
> >> >>>> > >> > > > > > > > Does 15 nodes in a cluster increase the network
> >> cost?
> >> >>>> What
> >> >>>> > >> can
> >> >>>> > >> > I
> >> >>>> > >> > > do
> >> >>>> > >> > > > > to
> >> >>>> > >> > > > > > > > setup
> >> >>>> > >> > > > > > > > the cluster to function more efficiently?
> >> >>>> > >> > > > > > > >
> >> >>>> > >> > > > > > > > Thanks!
> >> >>>> > >> > > > > > > > Mithila Nagendra
> >> >>>> > >> > > > > > > > Arizona State University
> >> >>>> > >> > > > > > > >
> >> >>>> > >> > > > > > >
> >> >>>> > >> > > > > >
> >> >>>> > >> > > > >
> >> >>>> > >> > > >
> >> >>>> > >> > >
> >> >>>> > >> >
> >> >>>> > >>
> >> >>>> > >
> >> >>>> > >
> >> >>>> >
> >> >>>>
> >> >>>
> >> >>>
> >> >>
> >> >
> >>
> >>
> >> Ravi
> >> --
> >>
> >>
> >
>



-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422

Re: Map-Reduce Slow Down

Posted by Mithila Nagendra <mn...@asu.edu>.
Hi Aaron
I will look into that, thanks!

I spoke to the admin who oversees the cluster. He said that the gateway
comes into the picture only when one of the nodes communicates with a node
outside of the cluster. But in my case the communication is carried out
between nodes which all belong to the same cluster.

Mithila
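
One quick way to double-check what the admin said is to ask a slave for its
route to the master - a sketch, assuming the usual Red Hat tool paths:

    # run from a slave, e.g. node19; the route to node18 should be a direct
    # link route on the LAN, with no "via <gateway>" hop in the output
    /sbin/ip route get 192.168.0.18
    /usr/sbin/traceroute node18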

On Wed, Apr 15, 2009 at 8:59 PM, Aaron Kimball <aa...@cloudera.com> wrote:

> Hi,
>
> I wrote a blog post a while back about connecting nodes via a gateway. See
> http://www.cloudera.com/blog/2008/12/03/securing-a-hadoop-cluster-through-a-gateway/
>
> This assumes that the client is outside the gateway and all
> datanodes/namenode are inside, but the same principles apply. You'll just
> need to set up ssh tunnels from every datanode to the namenode.
>
> - Aaron
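
A minimal sketch of one such tunnel, assuming OpenSSH and the namenode port
used in this thread - run on each datanode:

    ssh -f -N -L 54310:localhost:54310 mithila@node18

and then point fs.default.name at localhost:54310 in that datanode's
hadoop-site.xml.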
>
>
> On Wed, Apr 15, 2009 at 10:19 AM, Ravi Phulari <rp...@yahoo-inc.com>wrote:
>
>> Looks like your NameNode is down.
>> Verify that the Hadoop processes are running (jps should show you all the
>> running Java processes).
>> If your Hadoop processes are running, try restarting them.
>> I guess this problem is due to your fsimage not being correct.
>> You might have to format your NameNode.
>> Hope this helps.
>>
>> Thanks,
>> --
>> Ravi
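
With the 0.18 scripts, Ravi's checks amount to something like this, run from
the Hadoop install directory on the master (the format step is a last resort
- it wipes everything in HDFS):

    jps                  # expect NameNode and JobTracker on the master,
                         # DataNode and TaskTracker on the slaves
    bin/stop-all.sh
    bin/start-all.sh
    bin/hadoop namenode -format   # only if the fsimage really is corrupt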
>>
>>
>> On 4/15/09 10:15 AM, "Mithila Nagendra" <mn...@asu.edu> wrote:
>>
>> The log file runs into thousands of line with the same message being
>> displayed every time.
>>
>> On Wed, Apr 15, 2009 at 8:10 PM, Mithila Nagendra <mn...@asu.edu>
>> wrote:
>>
>> > The log file : hadoop-mithila-datanode-node19.log.2009-04-14 has the
>> > following in it:
>> >
>> > 2009-04-14 10:08:11,499 INFO org.apache.hadoop.dfs.DataNode:
>> STARTUP_MSG:
>> > /************************************************************
>> > STARTUP_MSG: Starting DataNode
>> > STARTUP_MSG:   host = node19/127.0.0.1
>> > STARTUP_MSG:   args = []
>> > STARTUP_MSG:   version = 0.18.3
>> > STARTUP_MSG:   build =
>> > https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18 -r
>> > 736250; compiled by 'ndaley' on Thu Jan 22 23:12:08 UTC 2009
>> > ************************************************************/
>> > 2009-04-14 10:08:12,915 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect
>> > to server: node18/192.168.0.18:54310. Already tried 0 time(s).
>> > 2009-04-14 10:08:13,925 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect
>> > to server: node18/192.168.0.18:54310. Already tried 1 time(s).
>> > 2009-04-14 10:08:14,935 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect
>> > to server: node18/192.168.0.18:54310. Already tried 2 time(s).
>> > 2009-04-14 10:08:15,945 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect
>> > to server: node18/192.168.0.18:54310. Already tried 3 time(s).
>> > 2009-04-14 10:08:16,955 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect
>> > to server: node18/192.168.0.18:54310. Already tried 4 time(s).
>> > 2009-04-14 10:08:17,965 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect
>> > to server: node18/192.168.0.18:54310. Already tried 5 time(s).
>> > 2009-04-14 10:08:18,975 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect
>> > to server: node18/192.168.0.18:54310. Already tried 6 time(s).
>> > 2009-04-14 10:08:19,985 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect
>> > to server: node18/192.168.0.18:54310. Already tried 7 time(s).
>> > 2009-04-14 10:08:20,995 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect
>> > to server: node18/192.168.0.18:54310. Already tried 8 time(s).
>> > 2009-04-14 10:08:22,005 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect
>> > to server: node18/192.168.0.18:54310. Already tried 9 time(s).
>> > 2009-04-14 10:08:22,008 INFO org.apache.hadoop.ipc.RPC: Server at
>> node18/
>> > 192.168.0.18:54310 not available yet, Zzzzz...
>> > 2009-04-14 10:08:24,025 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect
>> > to server: node18/192.168.0.18:54310. Already tried 0 time(s).
>> > 2009-04-14 10:08:25,035 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect
>> > to server: node18/192.168.0.18:54310. Already tried 1 time(s).
>> > 2009-04-14 10:08:26,045 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect
>> > to server: node18/192.168.0.18:54310. Already tried 2 time(s).
>> > 2009-04-14 10:08:27,055 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect
>> > to server: node18/192.168.0.18:54310. Already tried 3 time(s).
>> > 2009-04-14 10:08:28,065 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect
>> > to server: node18/192.168.0.18:54310. Already tried 4 time(s).
>> > 2009-04-14 10:08:29,075 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect
>> > to server: node18/192.168.0.18:54310. Already tried 5 time(s).
>> > 2009-04-14 10:08:30,085 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect
>> > to server: node18/192.168.0.18:54310. Already tried 6 time(s).
>> > 2009-04-14 10:08:31,095 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect
>> > to server: node18/192.168.0.18:54310. Already tried 7 time(s).
>> > 2009-04-14 10:08:32,105 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect
>> > to server: node18/192.168.0.18:54310. Already tried 8 time(s).
>> > 2009-04-14 10:08:33,115 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect
>> > to server: node18/192.168.0.18:54310. Already tried 9 time(s).
>> > 2009-04-14 10:08:33,116 INFO org.apache.hadoop.ipc.RPC: Server at
>> node18/
>> > 192.168.0.18:54310 not available yet, Zzzzz...
>> > 2009-04-14 10:08:35,135 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect
>> > to server: node18/192.168.0.18:54310. Already tried 0 time(s).
>> > 2009-04-14 10:08:36,145 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect
>> > to server: node18/192.168.0.18:54310. Already tried 1 time(s).
>> > 2009-04-14 10:08:37,155 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect
>> > to server: node18/192.168.0.18:54310. Already tried 2 time(s).
>> >
>> >
>> > Hmmm I still cant figure it out..
>> >
>> > Mithila
>> >
>> >
>> > On Tue, Apr 14, 2009 at 10:22 PM, Mithila Nagendra <mnagendr@asu.edu
>> >wrote:
>> >
>> >> Also, would the way the port is accessed change if all these nodes are
>> >> connected through a gateway? I mean in the hadoop-site.xml file? The
>> Ubuntu
>> >> systems we worked with earlier didnt have a gateway.
>> >> Mithila
>> >>
>> >> On Tue, Apr 14, 2009 at 9:48 PM, Mithila Nagendra <mnagendr@asu.edu
>> >wrote:
>> >>
>> >>> Aaron: Which log file do I look into - there are alot of them. Here s
>> >>> what the error looks like:
>> >>> [mithila@node19:~]$ cd hadoop
>> >>> [mithila@node19:~/hadoop]$ bin/hadoop dfs -ls
>> >>> 09/04/14 10:09:29 INFO ipc.Client: Retrying connect to server: node18/
>> >>> 192.168.0.18:54310. Already tried 0 time(s).
>> >>> 09/04/14 10:09:30 INFO ipc.Client: Retrying connect to server: node18/
>> >>> 192.168.0.18:54310. Already tried 1 time(s).
>> >>> 09/04/14 10:09:31 INFO ipc.Client: Retrying connect to server: node18/
>> >>> 192.168.0.18:54310. Already tried 2 time(s).
>> >>> 09/04/14 10:09:32 INFO ipc.Client: Retrying connect to server: node18/
>> >>> 192.168.0.18:54310. Already tried 3 time(s).
>> >>> 09/04/14 10:09:33 INFO ipc.Client: Retrying connect to server: node18/
>> >>> 192.168.0.18:54310. Already tried 4 time(s).
>> >>> 09/04/14 10:09:34 INFO ipc.Client: Retrying connect to server: node18/
>> >>> 192.168.0.18:54310. Already tried 5 time(s).
>> >>> 09/04/14 10:09:35 INFO ipc.Client: Retrying connect to server: node18/
>> >>> 192.168.0.18:54310. Already tried 6 time(s).
>> >>> 09/04/14 10:09:36 INFO ipc.Client: Retrying connect to server: node18/
>> >>> 192.168.0.18:54310. Already tried 7 time(s).
>> >>> 09/04/14 10:09:37 INFO ipc.Client: Retrying connect to server: node18/
>> >>> 192.168.0.18:54310. Already tried 8 time(s).
>> >>> 09/04/14 10:09:38 INFO ipc.Client: Retrying connect to server: node18/
>> >>> 192.168.0.18:54310. Already tried 9 time(s).
>> >>> Bad connection to FS. command aborted.
>> >>>
>> >>> Node19 is a slave and Node18 is the master.
>> >>>
>> >>> Mithila
>> >>>
>> >>>
>> >>>
>> >>> On Tue, Apr 14, 2009 at 8:53 PM, Aaron Kimball <aaron@cloudera.com
>> >wrote:
>> >>>
>> >>>> Are there any error messages in the log files on those nodes?
>> >>>> - Aaron
>> >>>>
>> >>>> On Tue, Apr 14, 2009 at 9:03 AM, Mithila Nagendra <mn...@asu.edu>
>> >>>> wrote:
>> >>>>
>> >>>> > I ve drawn a blank here! Can't figure out what s wrong with the
>> ports.
>> >>>> I
>> >>>> > can
>> >>>> > ssh between the nodes but cant access the DFS from the slaves -
>> says
>> >>>> "Bad
>> >>>> > connection to DFS". Master seems to be fine.
>> >>>> > Mithila
>> >>>> >
>> >>>> > On Tue, Apr 14, 2009 at 4:28 AM, Mithila Nagendra <
>> mnagendr@asu.edu>
>> >>>> > wrote:
>> >>>> >
>> >>>> > > Yes I can..
>> >>>> > >
>> >>>> > >
>> >>>> > > On Mon, Apr 13, 2009 at 5:12 PM, Jim Twensky <
>> jim.twensky@gmail.com
>> >>>> > >wrote:
>> >>>> > >
>> >>>> > >> Can you ssh between the nodes?
>> >>>> > >>
>> >>>> > >> -jim
>> >>>> > >>
>> >>>> > >> On Mon, Apr 13, 2009 at 6:49 PM, Mithila Nagendra <
>> >>>> mnagendr@asu.edu>
>> >>>> > >> wrote:
>> >>>> > >>
>> >>>> > >> > Thanks Aaron.
>> >>>> > >> > Jim: The three clusters I setup had ubuntu running on them and
>> >>>> the dfs
>> >>>> > >> was
>> >>>> > >> > accessed at port 54310. The new cluster which I ve setup has
>> Red
>> >>>> Hat
>> >>>> > >> Linux
>> >>>> > >> > release 7.2 (Enigma) running on it. Now when I try to access
>> the
>> >>>> dfs
>> >>>> > from
>> >>>> > >> > one
>> >>>> > >> > of the slaves i get the following response: dfs cannot be
>> >>>> accessed.
>> >>>> > When
>> >>>> > >> I
>> >>>> > >> > access the DFS through the master there's no problem. So I
>> feel
>> >>>> there
>> >>>> > a
>> >>>> > >> > problem with the port. Any ideas? I did check the list of
>> slaves,
>> >>>> it
>> >>>> > >> looks
>> >>>> > >> > fine to me.
>> >>>> > >> >
>> >>>> > >> > Mithila
>> >>>> > >> >
>> >>>> > >> >
>> >>>> > >> >
>> >>>> > >> >
>> >>>> > >> > On Mon, Apr 13, 2009 at 2:58 PM, Jim Twensky <
>> >>>> jim.twensky@gmail.com>
>> >>>> > >> > wrote:
>> >>>> > >> >
>> >>>> > >> > > Mithila,
>> >>>> > >> > >
>> >>>> > >> > > You said all the slaves were being utilized in the 3 node
>> >>>> cluster.
>> >>>> > >> Which
>> >>>> > >> > > application did you run to test that and what was your input
>> >>>> size?
>> >>>> > If
>> >>>> > >> you
>> >>>> > >> > > tried the word count application on a 516 MB input file on
>> both
>> >>>> > >> cluster
>> >>>> > >> > > setups, then some of your nodes in the 15 node cluster may
>> not
>> >>>> be
>> >>>> > >> running
>> >>>> > >> > > at
>> >>>> > >> > > all. Generally, one map job is assigned to each input split
>> and
>> >>>> if
>> >>>> > you
>> >>>> > >> > are
>> >>>> > >> > > running your cluster with the defaults, the splits are 64 MB
>> >>>> each. I
>> >>>> > >> got
>> >>>> > >> > > confused when you said the Namenode seemed to do all the
>> work.
>> >>>> Can
>> >>>> > you
>> >>>> > >> > > check
>> >>>> > >> > > conf/slaves and make sure you put the names of all task
>> >>>> trackers
>> >>>> > >> there? I
>> >>>> > >> > > also suggest comparing both clusters with a larger input
>> size,
>> >>>> say
>> >>>> > at
>> >>>> > >> > least
>> >>>> > >> > > 5 GB, to really see a difference.
>> >>>> > >> > >
>> >>>> > >> > > Jim
>> >>>> > >> > >
>> >>>> > >> > > On Mon, Apr 13, 2009 at 4:17 PM, Aaron Kimball <
>> >>>> aaron@cloudera.com>
>> >>>> > >> > wrote:
>> >>>> > >> > >
>> >>>> > >> > > > in hadoop-*-examples.jar, use "randomwriter" to generate
>> the
>> >>>> data
>> >>>> > >> and
>> >>>> > >> > > > "sort"
>> >>>> > >> > > > to sort it.
>> >>>> > >> > > > - Aaron
>> >>>> > >> > > >
>> >>>> > >> > > > On Sun, Apr 12, 2009 at 9:33 PM, Pankil Doshi <
>> >>>> > forpankil@gmail.com>
>> >>>> > >> > > wrote:
>> >>>> > >> > > >
>> >>>> > >> > > > > Your data is too small I guess for 15 clusters ..So it
>> >>>> might be
>> >>>> > >> > > overhead
>> >>>> > >> > > > > time of these clusters making your total MR jobs more
>> time
>> >>>> > >> consuming.
>> >>>> > >> > > > > I guess you will have to try with larger set of data..
>> >>>> > >> > > > >
>> >>>> > >> > > > > Pankil
>> >>>> > >> > > > > On Sun, Apr 12, 2009 at 6:54 PM, Mithila Nagendra <
>> >>>> > >> mnagendr@asu.edu>
>> >>>> > >> > > > > wrote:
>> >>>> > >> > > > >
>> >>>> > >> > > > > > Aaron
>> >>>> > >> > > > > >
>> >>>> > >> > > > > > That could be the issue, my data is just 516MB -
>> wouldn't
>> >>>> this
>> >>>> > >> see
>> >>>> > >> > a
>> >>>> > >> > > > bit
>> >>>> > >> > > > > of
>> >>>> > >> > > > > > speed up?
>> >>>> > >> > > > > > Could you guide me to the example? I ll run my cluster
>> on
>> >>>> it
>> >>>> > and
>> >>>> > >> > see
>> >>>> > >> > > > what
>> >>>> > >> > > > > I
>> >>>> > >> > > > > > get. Also for my program I had a java timer running to
>> >>>> record
>> >>>> > >> the
>> >>>> > >> > > time
>> >>>> > >> > > > > > taken
>> >>>> > >> > > > > > to complete execution. Does Hadoop have an inbuilt
>> timer?
>> >>>> > >> > > > > >
>> >>>> > >> > > > > > Mithila
>> >>>> > >> > > > > >
>> >>>> > >> > > > > > On Mon, Apr 13, 2009 at 1:13 AM, Aaron Kimball <
>> >>>> > >> aaron@cloudera.com
>> >>>> > >> > >
>> >>>> > >> > > > > wrote:
>> >>>> > >> > > > > >
>> >>>> > >> > > > > > > Virtually none of the examples that ship with Hadoop
>> >>>> are
>> >>>> > >> designed
>> >>>> > >> > > to
>> >>>> > >> > > > > > > showcase its speed. Hadoop's speedup comes from its
>> >>>> ability
>> >>>> > to
>> >>>> > >> > > > process
>> >>>> > >> > > > > > very
>> >>>> > >> > > > > > > large volumes of data (starting around, say, tens of
>> GB
>> >>>> per
>> >>>> > >> job,
>> >>>> > >> > > and
>> >>>> > >> > > > > > going
>> >>>> > >> > > > > > > up in orders of magnitude from there). So if you are
>> >>>> timing
>> >>>> > >> the
>> >>>> > >> > pi
>> >>>> > >> > > > > > > calculator (or something like that), its results
>> won't
>> >>>> > >> > necessarily
>> >>>> > >> > > be
>> >>>> > >> > > > > > very
>> >>>> > >> > > > > > > consistent. If a job doesn't have enough fragments
>> of
>> >>>> data
>> >>>> > to
>> >>>> > >> > > > allocate
>> >>>> > >> > > > > > one
>> >>>> > >> > > > > > > per each node, some of the nodes will also just go
>> >>>> unused.
>> >>>> > >> > > > > > >
>> >>>> > >> > > > > > > The best example for you to run is to use
>> randomwriter
>> >>>> to
>> >>>> > fill
>> >>>> > >> up
>> >>>> > >> > > > your
>> >>>> > >> > > > > > > cluster with several GB of random data and then run
>> the
>> >>>> sort
>> >>>> > >> > > program.
>> >>>> > >> > > > > If
>> >>>> > >> > > > > > > that doesn't scale up performance from 3 nodes to
>> 15,
>> >>>> then
>> >>>> > >> you've
>> >>>> > >> > > > > > > definitely
>> >>>> > >> > > > > > > got something strange going on.
>> >>>> > >> > > > > > >
>> >>>> > >> > > > > > > - Aaron
>> >>>> > >> > > > > > >
>> >>>> > >> > > > > > >
>> >>>> > >> > > > > > > On Sun, Apr 12, 2009 at 8:39 AM, Mithila Nagendra <
>> >>>> > >> > > mnagendr@asu.edu>
>> >>>> > >> > > > > > > wrote:
>> >>>> > >> > > > > > >
>> >>>> > >> > > > > > > > Hey all
>> >>>> > >> > > > > > > > I recently setup a three node hadoop cluster and
>> ran
>> >>>> an
>> >>>> > >> > examples
>> >>>> > >> > > on
>> >>>> > >> > > > > it.
>> >>>> > >> > > > > > > It
>> >>>> > >> > > > > > > > was pretty fast, and all the three nodes were
>> being
>> >>>> used
>> >>>> > (I
>> >>>> > >> > > checked
>> >>>> > >> > > > > the
>> >>>> > >> > > > > > > log
>> >>>> > >> > > > > > > > files to make sure that the slaves are utilized).
>> >>>> > >> > > > > > > >
>> >>>> > >> > > > > > > > Now I ve setup another cluster consisting of 15
>> >>>> nodes. I
>> >>>> > ran
>> >>>> > >> > the
>> >>>> > >> > > > same
>> >>>> > >> > > > > > > > example, but instead of speeding up, the
>> map-reduce
>> >>>> task
>> >>>> > >> seems
>> >>>> > >> > to
>> >>>> > >> > > > > take
>> >>>> > >> > > > > > > > forever! The slaves are not being used for some
>> >>>> reason.
>> >>>> > This
>> >>>> > >> > > second
>> >>>> > >> > > > > > > cluster
>> >>>> > >> > > > > > > > has a lower, per node processing power, but should
>> >>>> that
>> >>>> > make
>> >>>> > >> > any
>> >>>> > >> > > > > > > > difference?
>> >>>> > >> > > > > > > > How can I ensure that the data is being mapped to
>> all
>> >>>> the
>> >>>> > >> > nodes?
>> >>>> > >> > > > > > > Presently,
>> >>>> > >> > > > > > > > the only node that seems to be doing all the work
>> is
>> >>>> the
>> >>>> > >> Master
>> >>>> > >> > > > node.
>> >>>> > >> > > > > > > >
>> >>>> > >> > > > > > > > Does 15 nodes in a cluster increase the network
>> cost?
>> >>>> What
>> >>>> > >> can
>> >>>> > >> > I
>> >>>> > >> > > do
>> >>>> > >> > > > > to
>> >>>> > >> > > > > > > > setup
>> >>>> > >> > > > > > > > the cluster to function more efficiently?
>> >>>> > >> > > > > > > >
>> >>>> > >> > > > > > > > Thanks!
>> >>>> > >> > > > > > > > Mithila Nagendra
>> >>>> > >> > > > > > > > Arizona State University
>> >>>> > >> > > > > > > >
>> >>>> > >> > > > > > >
>> >>>> > >> > > > > >
>> >>>> > >> > > > >
>> >>>> > >> > > >
>> >>>> > >> > >
>> >>>> > >> >
>> >>>> > >>
>> >>>> > >
>> >>>> > >
>> >>>> >
>> >>>>
>> >>>
>> >>>
>> >>
>> >
>>
>>
>> Ravi
>> --
>>
>>
>

Re: Map-Reduce Slow Down

Posted by Aaron Kimball <aa...@cloudera.com>.
Hi,

I wrote a blog post a while back about connecting nodes via a gateway. See
http://www.cloudera.com/blog/2008/12/03/securing-a-hadoop-cluster-through-a-gateway/

This assumes that the client is outside the gateway and all
datanodes/namenode are inside, but the same principles apply. You'll just
need to set up ssh tunnels from every datanode to the namenode.
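
As a sketch (the host name, port, and user account below are just the ones
from this thread, not something I can verify for your network), a forward on
each slave might look like:

  # run on each datanode: forward its local port 54310 to the namenode's RPC port
  ssh -f -N -L 54310:node18:54310 mithila@node18

with fs.default.name in hadoop-site.xml on that slave then pointing at
localhost:54310 so the datanode talks through the tunnel.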

- Aaron

On Wed, Apr 15, 2009 at 10:19 AM, Ravi Phulari <rp...@yahoo-inc.com>wrote:

> Looks like your NameNode is down.
> Verify that the Hadoop processes are running (jps should show you all
> running Java processes).
> If your Hadoop processes are running, try restarting them.
> I guess this problem is due to your fsimage not being correct.
> You might have to format your namenode.
> Hope this helps.
>
> Thanks,
> --
> Ravi
>
>
> On 4/15/09 10:15 AM, "Mithila Nagendra" <mn...@asu.edu> wrote:
>
> The log file runs into thousands of lines with the same message being
> displayed every time.
>
> On Wed, Apr 15, 2009 at 8:10 PM, Mithila Nagendra <mn...@asu.edu>
> wrote:
>
> > The log file : hadoop-mithila-datanode-node19.log.2009-04-14 has the
> > following in it:
> >
> > 2009-04-14 10:08:11,499 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG:
> > /************************************************************
> > STARTUP_MSG: Starting DataNode
> > STARTUP_MSG:   host = node19/127.0.0.1
> > STARTUP_MSG:   args = []
> > STARTUP_MSG:   version = 0.18.3
> > STARTUP_MSG:   build =
> > https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18 -r
> > 736250; compiled by 'ndaley' on Thu Jan 22 23:12:08 UTC 2009
> > ************************************************************/
> > 2009-04-14 10:08:12,915 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: node18/192.168.0.18:54310. Already tried 0 time(s).
> > 2009-04-14 10:08:13,925 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: node18/192.168.0.18:54310. Already tried 1 time(s).
> > 2009-04-14 10:08:14,935 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: node18/192.168.0.18:54310. Already tried 2 time(s).
> > 2009-04-14 10:08:15,945 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: node18/192.168.0.18:54310. Already tried 3 time(s).
> > 2009-04-14 10:08:16,955 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: node18/192.168.0.18:54310. Already tried 4 time(s).
> > 2009-04-14 10:08:17,965 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: node18/192.168.0.18:54310. Already tried 5 time(s).
> > 2009-04-14 10:08:18,975 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: node18/192.168.0.18:54310. Already tried 6 time(s).
> > 2009-04-14 10:08:19,985 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: node18/192.168.0.18:54310. Already tried 7 time(s).
> > 2009-04-14 10:08:20,995 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: node18/192.168.0.18:54310. Already tried 8 time(s).
> > 2009-04-14 10:08:22,005 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: node18/192.168.0.18:54310. Already tried 9 time(s).
> > 2009-04-14 10:08:22,008 INFO org.apache.hadoop.ipc.RPC: Server at node18/
> > 192.168.0.18:54310 not available yet, Zzzzz...
> > 2009-04-14 10:08:24,025 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: node18/192.168.0.18:54310. Already tried 0 time(s).
> > 2009-04-14 10:08:25,035 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: node18/192.168.0.18:54310. Already tried 1 time(s).
> > 2009-04-14 10:08:26,045 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: node18/192.168.0.18:54310. Already tried 2 time(s).
> > 2009-04-14 10:08:27,055 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: node18/192.168.0.18:54310. Already tried 3 time(s).
> > 2009-04-14 10:08:28,065 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: node18/192.168.0.18:54310. Already tried 4 time(s).
> > 2009-04-14 10:08:29,075 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: node18/192.168.0.18:54310. Already tried 5 time(s).
> > 2009-04-14 10:08:30,085 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: node18/192.168.0.18:54310. Already tried 6 time(s).
> > 2009-04-14 10:08:31,095 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: node18/192.168.0.18:54310. Already tried 7 time(s).
> > 2009-04-14 10:08:32,105 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: node18/192.168.0.18:54310. Already tried 8 time(s).
> > 2009-04-14 10:08:33,115 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: node18/192.168.0.18:54310. Already tried 9 time(s).
> > 2009-04-14 10:08:33,116 INFO org.apache.hadoop.ipc.RPC: Server at node18/
> > 192.168.0.18:54310 not available yet, Zzzzz...
> > 2009-04-14 10:08:35,135 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: node18/192.168.0.18:54310. Already tried 0 time(s).
> > 2009-04-14 10:08:36,145 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: node18/192.168.0.18:54310. Already tried 1 time(s).
> > 2009-04-14 10:08:37,155 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: node18/192.168.0.18:54310. Already tried 2 time(s).
> >
> >
> > Hmmm I still cant figure it out..
> >
> > Mithila
> >
> >
> > On Tue, Apr 14, 2009 at 10:22 PM, Mithila Nagendra <mnagendr@asu.edu
> >wrote:
> >
> >> Also, would the way the port is accessed change if all these nodes are
> >> connected through a gateway? I mean in the hadoop-site.xml file? The
> Ubuntu
> >> systems we worked with earlier didnt have a gateway.
> >> Mithila
> >>
> >> On Tue, Apr 14, 2009 at 9:48 PM, Mithila Nagendra <mnagendr@asu.edu
> >wrote:
> >>
> >>> Aaron: Which log file do I look into - there are alot of them. Here s
> >>> what the error looks like:
> >>> [mithila@node19:~]$ cd hadoop
> >>> [mithila@node19:~/hadoop]$ bin/hadoop dfs -ls
> >>> 09/04/14 10:09:29 INFO ipc.Client: Retrying connect to server: node18/
> >>> 192.168.0.18:54310. Already tried 0 time(s).
> >>> 09/04/14 10:09:30 INFO ipc.Client: Retrying connect to server: node18/
> >>> 192.168.0.18:54310. Already tried 1 time(s).
> >>> 09/04/14 10:09:31 INFO ipc.Client: Retrying connect to server: node18/
> >>> 192.168.0.18:54310. Already tried 2 time(s).
> >>> 09/04/14 10:09:32 INFO ipc.Client: Retrying connect to server: node18/
> >>> 192.168.0.18:54310. Already tried 3 time(s).
> >>> 09/04/14 10:09:33 INFO ipc.Client: Retrying connect to server: node18/
> >>> 192.168.0.18:54310. Already tried 4 time(s).
> >>> 09/04/14 10:09:34 INFO ipc.Client: Retrying connect to server: node18/
> >>> 192.168.0.18:54310. Already tried 5 time(s).
> >>> 09/04/14 10:09:35 INFO ipc.Client: Retrying connect to server: node18/
> >>> 192.168.0.18:54310. Already tried 6 time(s).
> >>> 09/04/14 10:09:36 INFO ipc.Client: Retrying connect to server: node18/
> >>> 192.168.0.18:54310. Already tried 7 time(s).
> >>> 09/04/14 10:09:37 INFO ipc.Client: Retrying connect to server: node18/
> >>> 192.168.0.18:54310. Already tried 8 time(s).
> >>> 09/04/14 10:09:38 INFO ipc.Client: Retrying connect to server: node18/
> >>> 192.168.0.18:54310. Already tried 9 time(s).
> >>> Bad connection to FS. command aborted.
> >>>
> >>> Node19 is a slave and Node18 is the master.
> >>>
> >>> Mithila
> >>>
> >>>
> >>>
> >>> On Tue, Apr 14, 2009 at 8:53 PM, Aaron Kimball <aaron@cloudera.com
> >wrote:
> >>>
> >>>> Are there any error messages in the log files on those nodes?
> >>>> - Aaron
> >>>>
> >>>> On Tue, Apr 14, 2009 at 9:03 AM, Mithila Nagendra <mn...@asu.edu>
> >>>> wrote:
> >>>>
> >>>> > I ve drawn a blank here! Can't figure out what s wrong with the
> ports.
> >>>> I
> >>>> > can
> >>>> > ssh between the nodes but cant access the DFS from the slaves - says
> >>>> "Bad
> >>>> > connection to DFS". Master seems to be fine.
> >>>> > Mithila
> >>>> >
> >>>> > On Tue, Apr 14, 2009 at 4:28 AM, Mithila Nagendra <mnagendr@asu.edu
> >
> >>>> > wrote:
> >>>> >
> >>>> > > Yes I can..
> >>>> > >
> >>>> > >
> >>>> > > On Mon, Apr 13, 2009 at 5:12 PM, Jim Twensky <
> jim.twensky@gmail.com
> >>>> > >wrote:
> >>>> > >
> >>>> > >> Can you ssh between the nodes?
> >>>> > >>
> >>>> > >> -jim
> >>>> > >>
> >>>> > >> On Mon, Apr 13, 2009 at 6:49 PM, Mithila Nagendra <
> >>>> mnagendr@asu.edu>
> >>>> > >> wrote:
> >>>> > >>
> >>>> > >> > Thanks Aaron.
> >>>> > >> > Jim: The three clusters I setup had ubuntu running on them and
> >>>> the dfs
> >>>> > >> was
> >>>> > >> > accessed at port 54310. The new cluster which I ve setup has
> Red
> >>>> Hat
> >>>> > >> Linux
> >>>> > >> > release 7.2 (Enigma) running on it. Now when I try to access the
> >>>> dfs
> >>>> > from
> >>>> > >> > one
> >>>> > >> > of the slaves i get the following response: dfs cannot be
> >>>> accessed.
> >>>> > When
> >>>> > >> I
> >>>> > >> > access the DFS through the master there's no problem. So I
> feel
> >>>> there
> >>>> > a
> >>>> > >> > problem with the port. Any ideas? I did check the list of
> slaves,
> >>>> it
> >>>> > >> looks
> >>>> > >> > fine to me.
> >>>> > >> >
> >>>> > >> > Mithila
> >>>> > >> >
> >>>> > >> >
> >>>> > >> >
> >>>> > >> >
> >>>> > >> > On Mon, Apr 13, 2009 at 2:58 PM, Jim Twensky <
> >>>> jim.twensky@gmail.com>
> >>>> > >> > wrote:
> >>>> > >> >
> >>>> > >> > > Mithila,
> >>>> > >> > >
> >>>> > >> > > You said all the slaves were being utilized in the 3 node
> >>>> cluster.
> >>>> > >> Which
> >>>> > >> > > application did you run to test that and what was your input
> >>>> size?
> >>>> > If
> >>>> > >> you
> >>>> > >> > > tried the word count application on a 516 MB input file on
> both
> >>>> > >> cluster
> >>>> > >> > > setups, then some of your nodes in the 15 node cluster may
> not
> >>>> be
> >>>> > >> running
> >>>> > >> > > at
> >>>> > >> > > all. Generally, one map job is assigned to each input split
> and
> >>>> if
> >>>> > you
> >>>> > >> > are
> >>>> > >> > > running your cluster with the defaults, the splits are 64 MB
> >>>> each. I
> >>>> > >> got
> >>>> > >> > > confused when you said the Namenode seemed to do all the
> work.
> >>>> Can
> >>>> > you
> >>>> > >> > > check
> >>>> > >> > > conf/slaves and make sure you put the names of all task
> >>>> trackers
> >>>> > >> there? I
> >>>> > >> > > also suggest comparing both clusters with a larger input
> size,
> >>>> say
> >>>> > at
> >>>> > >> > least
> >>>> > >> > > 5 GB, to really see a difference.
> >>>> > >> > >
> >>>> > >> > > Jim
> >>>> > >> > >
> >>>> > >> > > On Mon, Apr 13, 2009 at 4:17 PM, Aaron Kimball <
> >>>> aaron@cloudera.com>
> >>>> > >> > wrote:
> >>>> > >> > >
> >>>> > >> > > > in hadoop-*-examples.jar, use "randomwriter" to generate
> the
> >>>> data
> >>>> > >> and
> >>>> > >> > > > "sort"
> >>>> > >> > > > to sort it.
> >>>> > >> > > > - Aaron
> >>>> > >> > > >
> >>>> > >> > > > On Sun, Apr 12, 2009 at 9:33 PM, Pankil Doshi <
> >>>> > forpankil@gmail.com>
> >>>> > >> > > wrote:
> >>>> > >> > > >
> >>>> > >> > > > > Your data is too small I guess for 15 clusters ..So it
> >>>> might be
> >>>> > >> > > overhead
> >>>> > >> > > > > time of these clusters making your total MR jobs more
> time
> >>>> > >> consuming.
> >>>> > >> > > > > I guess you will have to try with larger set of data..
> >>>> > >> > > > >
> >>>> > >> > > > > Pankil
> >>>> > >> > > > > On Sun, Apr 12, 2009 at 6:54 PM, Mithila Nagendra <
> >>>> > >> mnagendr@asu.edu>
> >>>> > >> > > > > wrote:
> >>>> > >> > > > >
> >>>> > >> > > > > > Aaron
> >>>> > >> > > > > >
> >>>> > >> > > > > > That could be the issue, my data is just 516MB -
> wouldn't
> >>>> this
> >>>> > >> see
> >>>> > >> > a
> >>>> > >> > > > bit
> >>>> > >> > > > > of
> >>>> > >> > > > > > speed up?
> >>>> > >> > > > > > Could you guide me to the example? I ll run my cluster
> on
> >>>> it
> >>>> > and
> >>>> > >> > see
> >>>> > >> > > > what
> >>>> > >> > > > > I
> >>>> > >> > > > > > get. Also for my program I had a java timer running to
> >>>> record
> >>>> > >> the
> >>>> > >> > > time
> >>>> > >> > > > > > taken
> >>>> > >> > > > > > to complete execution. Does Hadoop have an inbuilt
> timer?
> >>>> > >> > > > > >
> >>>> > >> > > > > > Mithila
> >>>> > >> > > > > >
> >>>> > >> > > > > > On Mon, Apr 13, 2009 at 1:13 AM, Aaron Kimball <
> >>>> > >> aaron@cloudera.com
> >>>> > >> > >
> >>>> > >> > > > > wrote:
> >>>> > >> > > > > >
> >>>> > >> > > > > > > Virtually none of the examples that ship with Hadoop
> >>>> are
> >>>> > >> designed
> >>>> > >> > > to
> >>>> > >> > > > > > > showcase its speed. Hadoop's speedup comes from its
> >>>> ability
> >>>> > to
> >>>> > >> > > > process
> >>>> > >> > > > > > very
> >>>> > >> > > > > > > large volumes of data (starting around, say, tens of
> GB
> >>>> per
> >>>> > >> job,
> >>>> > >> > > and
> >>>> > >> > > > > > going
> >>>> > >> > > > > > > up in orders of magnitude from there). So if you are
> >>>> timing
> >>>> > >> the
> >>>> > >> > pi
> >>>> > >> > > > > > > calculator (or something like that), its results
> won't
> >>>> > >> > necessarily
> >>>> > >> > > be
> >>>> > >> > > > > > very
> >>>> > >> > > > > > > consistent. If a job doesn't have enough fragments of
> >>>> data
> >>>> > to
> >>>> > >> > > > allocate
> >>>> > >> > > > > > one
> >>>> > >> > > > > > > per each node, some of the nodes will also just go
> >>>> unused.
> >>>> > >> > > > > > >
> >>>> > >> > > > > > > The best example for you to run is to use
> randomwriter
> >>>> to
> >>>> > fill
> >>>> > >> up
> >>>> > >> > > > your
> >>>> > >> > > > > > > cluster with several GB of random data and then run
> the
> >>>> sort
> >>>> > >> > > program.
> >>>> > >> > > > > If
> >>>> > >> > > > > > > that doesn't scale up performance from 3 nodes to 15,
> >>>> then
> >>>> > >> you've
> >>>> > >> > > > > > > definitely
> >>>> > >> > > > > > > got something strange going on.
> >>>> > >> > > > > > >
> >>>> > >> > > > > > > - Aaron
> >>>> > >> > > > > > >
> >>>> > >> > > > > > >
> >>>> > >> > > > > > > On Sun, Apr 12, 2009 at 8:39 AM, Mithila Nagendra <
> >>>> > >> > > mnagendr@asu.edu>
> >>>> > >> > > > > > > wrote:
> >>>> > >> > > > > > >
> >>>> > >> > > > > > > > Hey all
> >>>> > >> > > > > > > > I recently setup a three node hadoop cluster and
> ran
> >>>> an
> >>>> > >> > examples
> >>>> > >> > > on
> >>>> > >> > > > > it.
> >>>> > >> > > > > > > It
> >>>> > >> > > > > > > > was pretty fast, and all the three nodes were being
> >>>> used
> >>>> > (I
> >>>> > >> > > checked
> >>>> > >> > > > > the
> >>>> > >> > > > > > > log
> >>>> > >> > > > > > > > files to make sure that the slaves are utilized).
> >>>> > >> > > > > > > >
> >>>> > >> > > > > > > > Now I ve setup another cluster consisting of 15
> >>>> nodes. I
> >>>> > ran
> >>>> > >> > the
> >>>> > >> > > > same
> >>>> > >> > > > > > > > example, but instead of speeding up, the map-reduce
> >>>> task
> >>>> > >> seems
> >>>> > >> > to
> >>>> > >> > > > > take
> >>>> > >> > > > > > > > forever! The slaves are not being used for some
> >>>> reason.
> >>>> > This
> >>>> > >> > > second
> >>>> > >> > > > > > > cluster
> >>>> > >> > > > > > > > has a lower, per node processing power, but should
> >>>> that
> >>>> > make
> >>>> > >> > any
> >>>> > >> > > > > > > > difference?
> >>>> > >> > > > > > > > How can I ensure that the data is being mapped to
> all
> >>>> the
> >>>> > >> > nodes?
> >>>> > >> > > > > > > Presently,
> >>>> > >> > > > > > > > the only node that seems to be doing all the work
> is
> >>>> the
> >>>> > >> Master
> >>>> > >> > > > node.
> >>>> > >> > > > > > > >
> >>>> > >> > > > > > > > Does 15 nodes in a cluster increase the network
> cost?
> >>>> What
> >>>> > >> can
> >>>> > >> > I
> >>>> > >> > > do
> >>>> > >> > > > > to
> >>>> > >> > > > > > > > setup
> >>>> > >> > > > > > > > the cluster to function more efficiently?
> >>>> > >> > > > > > > >
> >>>> > >> > > > > > > > Thanks!
> >>>> > >> > > > > > > > Mithila Nagendra
> >>>> > >> > > > > > > > Arizona State University
> >>>> > >> > > > > > > >
> >>>> > >> > > > > > >
> >>>> > >> > > > > >
> >>>> > >> > > > >
> >>>> > >> > > >
> >>>> > >> > >
> >>>> > >> >
> >>>> > >>
> >>>> > >
> >>>> > >
> >>>> >
> >>>>
> >>>
> >>>
> >>
> >
>
>
> Ravi
> --
>
>

Re: Map-Reduce Slow Down

Posted by Ravi Phulari <rp...@yahoo-inc.com>.
Looks like your NameNode is down.
Verify that the Hadoop processes are running (jps should show you all running Java processes).
If your Hadoop processes are running, try restarting them.
I guess this problem is due to your fsimage not being correct.
You might have to format your namenode.
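For example, a minimal sequence would look something like this (paths assume
the standard 0.18 tarball layout, run from the Hadoop directory on the
master; note that formatting wipes everything stored in HDFS):

  bin/stop-all.sh              # stop all daemons across the cluster
  bin/hadoop namenode -format  # rebuild the fsimage from scratch
  bin/start-all.sh             # bring the NameNode and slaves back up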
Hope this helps.

Thanks,
--
Ravi


On 4/15/09 10:15 AM, "Mithila Nagendra" <mn...@asu.edu> wrote:

The log file runs into thousands of lines with the same message being
displayed every time.

On Wed, Apr 15, 2009 at 8:10 PM, Mithila Nagendra <mn...@asu.edu> wrote:

> The log file : hadoop-mithila-datanode-node19.log.2009-04-14 has the
> following in it:
>
> 2009-04-14 10:08:11,499 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting DataNode
> STARTUP_MSG:   host = node19/127.0.0.1
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.18.3
> STARTUP_MSG:   build =
> https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18 -r
> 736250; compiled by 'ndaley' on Thu Jan 22 23:12:08 UTC 2009
> ************************************************************/
> 2009-04-14 10:08:12,915 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 0 time(s).
> 2009-04-14 10:08:13,925 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 1 time(s).
> 2009-04-14 10:08:14,935 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 2 time(s).
> 2009-04-14 10:08:15,945 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 3 time(s).
> 2009-04-14 10:08:16,955 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 4 time(s).
> 2009-04-14 10:08:17,965 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 5 time(s).
> 2009-04-14 10:08:18,975 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 6 time(s).
> 2009-04-14 10:08:19,985 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 7 time(s).
> 2009-04-14 10:08:20,995 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 8 time(s).
> 2009-04-14 10:08:22,005 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 9 time(s).
> 2009-04-14 10:08:22,008 INFO org.apache.hadoop.ipc.RPC: Server at node18/
> 192.168.0.18:54310 not available yet, Zzzzz...
> 2009-04-14 10:08:24,025 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 0 time(s).
> 2009-04-14 10:08:25,035 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 1 time(s).
> 2009-04-14 10:08:26,045 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 2 time(s).
> 2009-04-14 10:08:27,055 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 3 time(s).
> 2009-04-14 10:08:28,065 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 4 time(s).
> 2009-04-14 10:08:29,075 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 5 time(s).
> 2009-04-14 10:08:30,085 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 6 time(s).
> 2009-04-14 10:08:31,095 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 7 time(s).
> 2009-04-14 10:08:32,105 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 8 time(s).
> 2009-04-14 10:08:33,115 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 9 time(s).
> 2009-04-14 10:08:33,116 INFO org.apache.hadoop.ipc.RPC: Server at node18/
> 192.168.0.18:54310 not available yet, Zzzzz...
> 2009-04-14 10:08:35,135 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 0 time(s).
> 2009-04-14 10:08:36,145 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 1 time(s).
> 2009-04-14 10:08:37,155 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 2 time(s).
>
>
> Hmmm I still cant figure it out..
>
> Mithila
>
>
> On Tue, Apr 14, 2009 at 10:22 PM, Mithila Nagendra <mn...@asu.edu>wrote:
>
>> Also, would the way the port is accessed change if all these nodes are
>> connected through a gateway? I mean in the hadoop-site.xml file? The Ubuntu
>> systems we worked with earlier didnt have a gateway.
>> Mithila
>>
>> On Tue, Apr 14, 2009 at 9:48 PM, Mithila Nagendra <mn...@asu.edu>wrote:
>>
>>> Aaron: Which log file do I look into - there are alot of them. Here s
>>> what the error looks like:
>>> [mithila@node19:~]$ cd hadoop
>>> [mithila@node19:~/hadoop]$ bin/hadoop dfs -ls
>>> 09/04/14 10:09:29 INFO ipc.Client: Retrying connect to server: node18/
>>> 192.168.0.18:54310. Already tried 0 time(s).
>>> 09/04/14 10:09:30 INFO ipc.Client: Retrying connect to server: node18/
>>> 192.168.0.18:54310. Already tried 1 time(s).
>>> 09/04/14 10:09:31 INFO ipc.Client: Retrying connect to server: node18/
>>> 192.168.0.18:54310. Already tried 2 time(s).
>>> 09/04/14 10:09:32 INFO ipc.Client: Retrying connect to server: node18/
>>> 192.168.0.18:54310. Already tried 3 time(s).
>>> 09/04/14 10:09:33 INFO ipc.Client: Retrying connect to server: node18/
>>> 192.168.0.18:54310. Already tried 4 time(s).
>>> 09/04/14 10:09:34 INFO ipc.Client: Retrying connect to server: node18/
>>> 192.168.0.18:54310. Already tried 5 time(s).
>>> 09/04/14 10:09:35 INFO ipc.Client: Retrying connect to server: node18/
>>> 192.168.0.18:54310. Already tried 6 time(s).
>>> 09/04/14 10:09:36 INFO ipc.Client: Retrying connect to server: node18/
>>> 192.168.0.18:54310. Already tried 7 time(s).
>>> 09/04/14 10:09:37 INFO ipc.Client: Retrying connect to server: node18/
>>> 192.168.0.18:54310. Already tried 8 time(s).
>>> 09/04/14 10:09:38 INFO ipc.Client: Retrying connect to server: node18/
>>> 192.168.0.18:54310. Already tried 9 time(s).
>>> Bad connection to FS. command aborted.
>>>
>>> Node19 is a slave and Node18 is the master.
>>>
>>> Mithila
>>>
>>>
>>>
>>> On Tue, Apr 14, 2009 at 8:53 PM, Aaron Kimball <aa...@cloudera.com>wrote:
>>>
>>>> Are there any error messages in the log files on those nodes?
>>>> - Aaron
>>>>
>>>> On Tue, Apr 14, 2009 at 9:03 AM, Mithila Nagendra <mn...@asu.edu>
>>>> wrote:
>>>>
>>>> > I ve drawn a blank here! Can't figure out what s wrong with the ports.
>>>> I
>>>> > can
>>>> > ssh between the nodes but cant access the DFS from the slaves - says
>>>> "Bad
>>>> > connection to DFS". Master seems to be fine.
>>>> > Mithila
>>>> >
>>>> > On Tue, Apr 14, 2009 at 4:28 AM, Mithila Nagendra <mn...@asu.edu>
>>>> > wrote:
>>>> >
>>>> > > Yes I can..
>>>> > >
>>>> > >
>>>> > > On Mon, Apr 13, 2009 at 5:12 PM, Jim Twensky <jim.twensky@gmail.com
>>>> > >wrote:
>>>> > >
>>>> > >> Can you ssh between the nodes?
>>>> > >>
>>>> > >> -jim
>>>> > >>
>>>> > >> On Mon, Apr 13, 2009 at 6:49 PM, Mithila Nagendra <
>>>> mnagendr@asu.edu>
>>>> > >> wrote:
>>>> > >>
>>>> > >> > Thanks Aaron.
>>>> > >> > Jim: The three clusters I setup had ubuntu running on them and
>>>> the dfs
>>>> > >> was
>>>> > >> > accessed at port 54310. The new cluster which I ve setup has Red
>>>> Hat
>>>> > >> Linux
>>>> > >> > release 7.2 (Enigma) running on it. Now when I try to access the
>>>> dfs
>>>> > from
>>>> > >> > one
>>>> > >> > of the slaves i get the following response: dfs cannot be
>>>> accessed.
>>>> > When
>>>> > >> I
>>>> > >> > access the DFS through the master there's no problem. So I feel
>>>> there
>>>> > a
>>>> > >> > problem with the port. Any ideas? I did check the list of slaves,
>>>> it
>>>> > >> looks
>>>> > >> > fine to me.
>>>> > >> >
>>>> > >> > Mithila
>>>> > >> >
>>>> > >> >
>>>> > >> >
>>>> > >> >
>>>> > >> > On Mon, Apr 13, 2009 at 2:58 PM, Jim Twensky <
>>>> jim.twensky@gmail.com>
>>>> > >> > wrote:
>>>> > >> >
>>>> > >> > > Mithila,
>>>> > >> > >
>>>> > >> > > You said all the slaves were being utilized in the 3 node
>>>> cluster.
>>>> > >> Which
>>>> > >> > > application did you run to test that and what was your input
>>>> size?
>>>> > If
>>>> > >> you
>>>> > >> > > tried the word count application on a 516 MB input file on both
>>>> > >> cluster
>>>> > >> > > setups, then some of your nodes in the 15 node cluster may not
>>>> be
>>>> > >> running
>>>> > >> > > at
>>>> > >> > > all. Generally, one map job is assigned to each input split and
>>>> if
>>>> > you
>>>> > >> > are
>>>> > >> > > running your cluster with the defaults, the splits are 64 MB
>>>> each. I
>>>> > >> got
>>>> > >> > > confused when you said the Namenode seemed to do all the work.
>>>> Can
>>>> > you
>>>> > >> > > check
>>>> > >> > > conf/slaves and make sure you put the names of all task
>>>> trackers
>>>> > >> there? I
>>>> > >> > > also suggest comparing both clusters with a larger input size,
>>>> say
>>>> > at
>>>> > >> > least
>>>> > >> > > 5 GB, to really see a difference.
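>>>> > >> > > (To put numbers on it: 516 MB / 64 MB comes to about 9 splits,
>>>> > >> > > so at most 9 of the 15 nodes would get a map task at any one
>>>> > >> > > time, and that's before any scheduling overhead.)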
>>>> > >> > >
>>>> > >> > > Jim
>>>> > >> > >
>>>> > >> > > On Mon, Apr 13, 2009 at 4:17 PM, Aaron Kimball <
>>>> aaron@cloudera.com>
>>>> > >> > wrote:
>>>> > >> > >
>>>> > >> > > > in hadoop-*-examples.jar, use "randomwriter" to generate the
>>>> data
>>>> > >> and
>>>> > >> > > > "sort"
>>>> > >> > > > to sort it.
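>>>> > >> > > > Something like this should do it (the jar name assumes the
>>>> > >> > > > 0.18.3 release; the output paths are arbitrary):
>>>> > >> > > >
>>>> > >> > > >   bin/hadoop jar hadoop-0.18.3-examples.jar randomwriter rand
>>>> > >> > > >   bin/hadoop jar hadoop-0.18.3-examples.jar sort rand rand-sorted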
>>>> > >> > > > - Aaron
>>>> > >> > > >
>>>> > >> > > > On Sun, Apr 12, 2009 at 9:33 PM, Pankil Doshi <
>>>> > forpankil@gmail.com>
>>>> > >> > > wrote:
>>>> > >> > > >
>>>> > >> > > > > Your data is too small I guess for 15 clusters ..So it
>>>> might be
>>>> > >> > > overhead
>>>> > >> > > > > time of these clusters making your total MR jobs more time
>>>> > >> consuming.
>>>> > >> > > > > I guess you will have to try with larger set of data..
>>>> > >> > > > >
>>>> > >> > > > > Pankil
>>>> > >> > > > > On Sun, Apr 12, 2009 at 6:54 PM, Mithila Nagendra <
>>>> > >> mnagendr@asu.edu>
>>>> > >> > > > > wrote:
>>>> > >> > > > >
>>>> > >> > > > > > Aaron
>>>> > >> > > > > >
>>>> > >> > > > > > That could be the issue, my data is just 516MB - wouldn't
>>>> this
>>>> > >> see
>>>> > >> > a
>>>> > >> > > > bit
>>>> > >> > > > > of
>>>> > >> > > > > > speed up?
>>>> > >> > > > > > Could you guide me to the example? I ll run my cluster on
>>>> it
>>>> > and
>>>> > >> > see
>>>> > >> > > > what
>>>> > >> > > > > I
>>>> > >> > > > > > get. Also for my program I had a java timer running to
>>>> record
>>>> > >> the
>>>> > >> > > time
>>>> > >> > > > > > taken
>>>> > >> > > > > > to complete execution. Does Hadoop have an inbuilt timer?
>>>> > >> > > > > >
>>>> > >> > > > > > Mithila
>>>> > >> > > > > >
>>>> > >> > > > > > On Mon, Apr 13, 2009 at 1:13 AM, Aaron Kimball <
>>>> > >> aaron@cloudera.com
>>>> > >> > >
>>>> > >> > > > > wrote:
>>>> > >> > > > > >
>>>> > >> > > > > > > Virtually none of the examples that ship with Hadoop
>>>> are
>>>> > >> designed
>>>> > >> > > to
>>>> > >> > > > > > > showcase its speed. Hadoop's speedup comes from its
>>>> ability
>>>> > to
>>>> > >> > > > process
>>>> > >> > > > > > very
>>>> > >> > > > > > > large volumes of data (starting around, say, tens of GB
>>>> per
>>>> > >> job,
>>>> > >> > > and
>>>> > >> > > > > > going
>>>> > >> > > > > > > up in orders of magnitude from there). So if you are
>>>> timing
>>>> > >> the
>>>> > >> > pi
>>>> > >> > > > > > > calculator (or something like that), its results won't
>>>> > >> > necessarily
>>>> > >> > > be
>>>> > >> > > > > > very
>>>> > >> > > > > > > consistent. If a job doesn't have enough fragments of
>>>> data
>>>> > to
>>>> > >> > > > allocate
>>>> > >> > > > > > one
>>>> > >> > > > > > > per each node, some of the nodes will also just go
>>>> unused.
>>>> > >> > > > > > >
>>>> > >> > > > > > > The best example for you to run is to use randomwriter
>>>> to
>>>> > fill
>>>> > >> up
>>>> > >> > > > your
>>>> > >> > > > > > > cluster with several GB of random data and then run the
>>>> sort
>>>> > >> > > program.
>>>> > >> > > > > If
>>>> > >> > > > > > > that doesn't scale up performance from 3 nodes to 15,
>>>> then
>>>> > >> you've
>>>> > >> > > > > > > definitely
>>>> > >> > > > > > > got something strange going on.
>>>> > >> > > > > > >
>>>> > >> > > > > > > - Aaron
>>>> > >> > > > > > >
>>>> > >> > > > > > >
>>>> > >> > > > > > > On Sun, Apr 12, 2009 at 8:39 AM, Mithila Nagendra <
>>>> > >> > > mnagendr@asu.edu>
>>>> > >> > > > > > > wrote:
>>>> > >> > > > > > >
>>>> > >> > > > > > > > Hey all
>>>> > >> > > > > > > > I recently setup a three node hadoop cluster and ran
>>>> an
>>>> > >> > examples
>>>> > >> > > on
>>>> > >> > > > > it.
>>>> > >> > > > > > > It
>>>> > >> > > > > > > > was pretty fast, and all the three nodes were being
>>>> used
>>>> > (I
>>>> > >> > > checked
>>>> > >> > > > > the
>>>> > >> > > > > > > log
>>>> > >> > > > > > > > files to make sure that the slaves are utilized).
>>>> > >> > > > > > > >
>>>> > >> > > > > > > > Now I ve setup another cluster consisting of 15
>>>> nodes. I
>>>> > ran
>>>> > >> > the
>>>> > >> > > > same
>>>> > >> > > > > > > > example, but instead of speeding up, the map-reduce
>>>> task
>>>> > >> seems
>>>> > >> > to
>>>> > >> > > > > take
>>>> > >> > > > > > > > forever! The slaves are not being used for some
>>>> reason.
>>>> > This
>>>> > >> > > second
>>>> > >> > > > > > > cluster
>>>> > >> > > > > > > > has a lower, per node processing power, but should
>>>> that
>>>> > make
>>>> > >> > any
>>>> > >> > > > > > > > difference?
>>>> > >> > > > > > > > How can I ensure that the data is being mapped to all
>>>> the
>>>> > >> > nodes?
>>>> > >> > > > > > > Presently,
>>>> > >> > > > > > > > the only node that seems to be doing all the work is
>>>> the
>>>> > >> Master
>>>> > >> > > > node.
>>>> > >> > > > > > > >
>>>> > >> > > > > > > > Does 15 nodes in a cluster increase the network cost?
>>>> What
>>>> > >> can
>>>> > >> > I
>>>> > >> > > do
>>>> > >> > > > > to
>>>> > >> > > > > > > > setup
>>>> > >> > > > > > > > the cluster to function more efficiently?
>>>> > >> > > > > > > >
>>>> > >> > > > > > > > Thanks!
>>>> > >> > > > > > > > Mithila Nagendra
>>>> > >> > > > > > > > Arizona State University
>>>> > >> > > > > > > >
>>>> > >> > > > > > >
>>>> > >> > > > > >
>>>> > >> > > > >
>>>> > >> > > >
>>>> > >> > >
>>>> > >> >
>>>> > >>
>>>> > >
>>>> > >
>>>> >
>>>>
>>>
>>>
>>
>


Ravi
--


Re: Map-Reduce Slow Down

Posted by Mithila Nagendra <mn...@asu.edu>.
The log file runs into thousands of lines with the same message being
displayed every time.

On Wed, Apr 15, 2009 at 8:10 PM, Mithila Nagendra <mn...@asu.edu> wrote:

> The log file : hadoop-mithila-datanode-node19.log.2009-04-14 has the
> following in it:
>
> 2009-04-14 10:08:11,499 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting DataNode
> STARTUP_MSG:   host = node19/127.0.0.1
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.18.3
> STARTUP_MSG:   build =
> https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18 -r
> 736250; compiled by 'ndaley' on Thu Jan 22 23:12:08 UTC 2009
> ************************************************************/
> 2009-04-14 10:08:12,915 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 0 time(s).
> 2009-04-14 10:08:13,925 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 1 time(s).
> 2009-04-14 10:08:14,935 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 2 time(s).
> 2009-04-14 10:08:15,945 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 3 time(s).
> 2009-04-14 10:08:16,955 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 4 time(s).
> 2009-04-14 10:08:17,965 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 5 time(s).
> 2009-04-14 10:08:18,975 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 6 time(s).
> 2009-04-14 10:08:19,985 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 7 time(s).
> 2009-04-14 10:08:20,995 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 8 time(s).
> 2009-04-14 10:08:22,005 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 9 time(s).
> 2009-04-14 10:08:22,008 INFO org.apache.hadoop.ipc.RPC: Server at node18/
> 192.168.0.18:54310 not available yet, Zzzzz...
> 2009-04-14 10:08:24,025 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 0 time(s).
> 2009-04-14 10:08:25,035 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 1 time(s).
> 2009-04-14 10:08:26,045 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 2 time(s).
> 2009-04-14 10:08:27,055 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 3 time(s).
> 2009-04-14 10:08:28,065 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 4 time(s).
> 2009-04-14 10:08:29,075 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 5 time(s).
> 2009-04-14 10:08:30,085 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 6 time(s).
> 2009-04-14 10:08:31,095 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 7 time(s).
> 2009-04-14 10:08:32,105 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 8 time(s).
> 2009-04-14 10:08:33,115 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 9 time(s).
> 2009-04-14 10:08:33,116 INFO org.apache.hadoop.ipc.RPC: Server at node18/
> 192.168.0.18:54310 not available yet, Zzzzz...
> 2009-04-14 10:08:35,135 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 0 time(s).
> 2009-04-14 10:08:36,145 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 1 time(s).
> 2009-04-14 10:08:37,155 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: node18/192.168.0.18:54310. Already tried 2 time(s).
>
>
> Hmmm I still cant figure it out..
>
> Mithila
>
>
> On Tue, Apr 14, 2009 at 10:22 PM, Mithila Nagendra <mn...@asu.edu>wrote:
>
>> Also, would the way the port is accessed change if all these nodes are
>> connected through a gateway? I mean in the hadoop-site.xml file? The Ubuntu
>> systems we worked with earlier didnt have a gateway.
>> Mithila
>>
>> On Tue, Apr 14, 2009 at 9:48 PM, Mithila Nagendra <mn...@asu.edu>wrote:
>>
>>> Aaron: Which log file do I look into - there are alot of them. Here s
>>> what the error looks like:
>>> [mithila@node19:~]$ cd hadoop
>>> [mithila@node19:~/hadoop]$ bin/hadoop dfs -ls
>>> 09/04/14 10:09:29 INFO ipc.Client: Retrying connect to server: node18/
>>> 192.168.0.18:54310. Already tried 0 time(s).
>>> 09/04/14 10:09:30 INFO ipc.Client: Retrying connect to server: node18/
>>> 192.168.0.18:54310. Already tried 1 time(s).
>>> 09/04/14 10:09:31 INFO ipc.Client: Retrying connect to server: node18/
>>> 192.168.0.18:54310. Already tried 2 time(s).
>>> 09/04/14 10:09:32 INFO ipc.Client: Retrying connect to server: node18/
>>> 192.168.0.18:54310. Already tried 3 time(s).
>>> 09/04/14 10:09:33 INFO ipc.Client: Retrying connect to server: node18/
>>> 192.168.0.18:54310. Already tried 4 time(s).
>>> 09/04/14 10:09:34 INFO ipc.Client: Retrying connect to server: node18/
>>> 192.168.0.18:54310. Already tried 5 time(s).
>>> 09/04/14 10:09:35 INFO ipc.Client: Retrying connect to server: node18/
>>> 192.168.0.18:54310. Already tried 6 time(s).
>>> 09/04/14 10:09:36 INFO ipc.Client: Retrying connect to server: node18/
>>> 192.168.0.18:54310. Already tried 7 time(s).
>>> 09/04/14 10:09:37 INFO ipc.Client: Retrying connect to server: node18/
>>> 192.168.0.18:54310. Already tried 8 time(s).
>>> 09/04/14 10:09:38 INFO ipc.Client: Retrying connect to server: node18/
>>> 192.168.0.18:54310. Already tried 9 time(s).
>>> Bad connection to FS. command aborted.
>>>
>>> Node19 is a slave and Node18 is the master.
>>>
>>> Mithila
>>>
>>>
>>>
>>> On Tue, Apr 14, 2009 at 8:53 PM, Aaron Kimball <aa...@cloudera.com>wrote:
>>>
>>>> Are there any error messages in the log files on those nodes?
>>>> - Aaron
>>>>
>>>> On Tue, Apr 14, 2009 at 9:03 AM, Mithila Nagendra <mn...@asu.edu>
>>>> wrote:
>>>>
>>>> > I ve drawn a blank here! Can't figure out what s wrong with the ports.
>>>> I
>>>> > can
>>>> > ssh between the nodes but cant access the DFS from the slaves - says
>>>> "Bad
>>>> > connection to DFS". Master seems to be fine.
>>>> > Mithila
>>>> >
>>>> > On Tue, Apr 14, 2009 at 4:28 AM, Mithila Nagendra <mn...@asu.edu>
>>>> > wrote:
>>>> >
>>>> > > Yes I can..
>>>> > >
>>>> > >
>>>> > > On Mon, Apr 13, 2009 at 5:12 PM, Jim Twensky <jim.twensky@gmail.com
>>>> > >wrote:
>>>> > >
>>>> > >> Can you ssh between the nodes?
>>>> > >>
>>>> > >> -jim
>>>> > >>
>>>> > >> On Mon, Apr 13, 2009 at 6:49 PM, Mithila Nagendra <
>>>> mnagendr@asu.edu>
>>>> > >> wrote:
>>>> > >>
>>>> > >> > Thanks Aaron.
>>>> > >> > Jim: The three clusters I setup had ubuntu running on them and
>>>> the dfs
>>>> > >> was
>>>> > >> > accessed at port 54310. The new cluster which I ve setup has Red
>>>> Hat
>>>> > >> Linux
>>>> > >> > release 7.2 (Enigma) running on it. Now when I try to access the
>>>> dfs
>>>> > from
>>>> > >> > one
>>>> > >> > of the slaves i get the following response: dfs cannot be
>>>> accessed.
>>>> > When
>>>> > >> I
>>>> > >> > access the DFS through the master there's no problem. So I feel
>>>> there
>>>> > a
>>>> > >> > problem with the port. Any ideas? I did check the list of slaves,
>>>> it
>>>> > >> looks
>>>> > >> > fine to me.
>>>> > >> >
>>>> > >> > Mithila
>>>> > >> >
>>>> > >> >
>>>> > >> >
>>>> > >> >
>>>> > >> > On Mon, Apr 13, 2009 at 2:58 PM, Jim Twensky <
>>>> jim.twensky@gmail.com>
>>>> > >> > wrote:
>>>> > >> >
>>>> > >> > > Mithila,
>>>> > >> > >
>>>> > >> > > You said all the slaves were being utilized in the 3 node
>>>> cluster.
>>>> > >> Which
>>>> > >> > > application did you run to test that and what was your input
>>>> size?
>>>> > If
>>>> > >> you
>>>> > >> > > tried the word count application on a 516 MB input file on both
>>>> > >> cluster
>>>> > >> > > setups, then some of your nodes in the 15 node cluster may not
>>>> be
>>>> > >> running
>>>> > >> > > at
>>>> > >> > > all. Generally, one map job is assigned to each input split and
>>>> if
>>>> > you
>>>> > >> > are

Re: Map-Reduce Slow Down

Posted by Mithila Nagendra <mn...@asu.edu>.
The log file hadoop-mithila-datanode-node19.log.2009-04-14 has the
following in it:

2009-04-14 10:08:11,499 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = node19/127.0.0.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.18.3
STARTUP_MSG:   build =
https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18 -r 736250;
compiled by 'ndaley' on Thu Jan 22 23:12:08 UTC 2009
************************************************************/
2009-04-14 10:08:12,915 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: node18/192.168.0.18:54310. Already tried 0 time(s).
2009-04-14 10:08:13,925 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: node18/192.168.0.18:54310. Already tried 1 time(s).
2009-04-14 10:08:14,935 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: node18/192.168.0.18:54310. Already tried 2 time(s).
2009-04-14 10:08:15,945 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: node18/192.168.0.18:54310. Already tried 3 time(s).
2009-04-14 10:08:16,955 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: node18/192.168.0.18:54310. Already tried 4 time(s).
2009-04-14 10:08:17,965 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: node18/192.168.0.18:54310. Already tried 5 time(s).
2009-04-14 10:08:18,975 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: node18/192.168.0.18:54310. Already tried 6 time(s).
2009-04-14 10:08:19,985 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: node18/192.168.0.18:54310. Already tried 7 time(s).
2009-04-14 10:08:20,995 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: node18/192.168.0.18:54310. Already tried 8 time(s).
2009-04-14 10:08:22,005 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: node18/192.168.0.18:54310. Already tried 9 time(s).
2009-04-14 10:08:22,008 INFO org.apache.hadoop.ipc.RPC: Server at node18/
192.168.0.18:54310 not available yet, Zzzzz...
2009-04-14 10:08:24,025 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: node18/192.168.0.18:54310. Already tried 0 time(s).
2009-04-14 10:08:25,035 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: node18/192.168.0.18:54310. Already tried 1 time(s).
2009-04-14 10:08:26,045 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: node18/192.168.0.18:54310. Already tried 2 time(s).
2009-04-14 10:08:27,055 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: node18/192.168.0.18:54310. Already tried 3 time(s).
2009-04-14 10:08:28,065 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: node18/192.168.0.18:54310. Already tried 4 time(s).
2009-04-14 10:08:29,075 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: node18/192.168.0.18:54310. Already tried 5 time(s).
2009-04-14 10:08:30,085 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: node18/192.168.0.18:54310. Already tried 6 time(s).
2009-04-14 10:08:31,095 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: node18/192.168.0.18:54310. Already tried 7 time(s).
2009-04-14 10:08:32,105 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: node18/192.168.0.18:54310. Already tried 8 time(s).
2009-04-14 10:08:33,115 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: node18/192.168.0.18:54310. Already tried 9 time(s).
2009-04-14 10:08:33,116 INFO org.apache.hadoop.ipc.RPC: Server at node18/
192.168.0.18:54310 not available yet, Zzzzz...
2009-04-14 10:08:35,135 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: node18/192.168.0.18:54310. Already tried 0 time(s).
2009-04-14 10:08:36,145 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: node18/192.168.0.18:54310. Already tried 1 time(s).
2009-04-14 10:08:37,155 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: node18/192.168.0.18:54310. Already tried 2 time(s).


Hmm, I still can't figure it out.
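
One thing I do notice in the STARTUP_MSG above: host = node19/127.0.0.1, so
the slave's own hostname resolves to the loopback address. A quick check I
plan to run on node19 (just a sketch; I'm assuming the LAN address ought to
be something like 192.168.0.19, by analogy with node18's 192.168.0.18):

[mithila@node19:~]$ hostname
node19
[mithila@node19:~]$ grep node19 /etc/hosts   # is node19 on the 127.0.0.1 line?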

Mithila



Re: Map-Reduce Slow Down

Posted by Mithila Nagendra <mn...@asu.edu>.
Also, would the way the port is accessed change if all these nodes are
connected through a gateway? I mean in the hadoop-site.xml file? The Ubuntu
systems we worked with earlier didn't have a gateway.
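
For reference, this is roughly what every node's conf/hadoop-site.xml points
at in my setup (a sketch; the hostname and port are mine, and I'm assuming
the slaves must be able to reach exactly this host:port, gateway or not):

[mithila@node19:~/hadoop]$ grep -A 2 'fs.default.name' conf/hadoop-site.xml
    <name>fs.default.name</name>
    <value>hdfs://node18:54310</value>
  </property>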
Mithila


Re: Map-Reduce Slow Down

Posted by Mithila Nagendra <mn...@asu.edu>.
Aaron: Which log file do I look into? There are a lot of them. Here's what
the error looks like:
[mithila@node19:~]$ cd hadoop
[mithila@node19:~/hadoop]$ bin/hadoop dfs -ls
09/04/14 10:09:29 INFO ipc.Client: Retrying connect to server: node18/
192.168.0.18:54310. Already tried 0 time(s).
09/04/14 10:09:30 INFO ipc.Client: Retrying connect to server: node18/
192.168.0.18:54310. Already tried 1 time(s).
09/04/14 10:09:31 INFO ipc.Client: Retrying connect to server: node18/
192.168.0.18:54310. Already tried 2 time(s).
09/04/14 10:09:32 INFO ipc.Client: Retrying connect to server: node18/
192.168.0.18:54310. Already tried 3 time(s).
09/04/14 10:09:33 INFO ipc.Client: Retrying connect to server: node18/
192.168.0.18:54310. Already tried 4 time(s).
09/04/14 10:09:34 INFO ipc.Client: Retrying connect to server: node18/
192.168.0.18:54310. Already tried 5 time(s).
09/04/14 10:09:35 INFO ipc.Client: Retrying connect to server: node18/
192.168.0.18:54310. Already tried 6 time(s).
09/04/14 10:09:36 INFO ipc.Client: Retrying connect to server: node18/
192.168.0.18:54310. Already tried 7 time(s).
09/04/14 10:09:37 INFO ipc.Client: Retrying connect to server: node18/
192.168.0.18:54310. Already tried 8 time(s).
09/04/14 10:09:38 INFO ipc.Client: Retrying connect to server: node18/
192.168.0.18:54310. Already tried 9 time(s).
Bad connection to FS. command aborted.

Node19 is a slave and Node18 is the master.
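
On the master, where the connection works, I can at least check whether any
slave datanode ever registered (a sketch; run from the hadoop directory on
node18, and the exact report format may differ on 0.18.3):

[mithila@node18:~/hadoop]$ bin/hadoop dfsadmin -report

If only one datanode shows up, the slaves never managed to register.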

Mithila




Re: Map-Reduce Slow Down

Posted by Aaron Kimball <aa...@cloudera.com>.
Are there any error messages in the log files on those nodes?
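
By default each daemon writes its own log under the install directory, one
file per daemon per host; a sketch, assuming a stock layout and that
HADOOP_LOG_DIR hasn't been changed:

ls $HADOOP_HOME/logs/
hadoop-<user>-namenode-<host>.log  hadoop-<user>-datanode-<host>.log  ...

The datanode and tasktracker logs on the idle slaves are the interesting
ones.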
- Aaron


Re: Map-Reduce Slow Down

Posted by Mithila Nagendra <mn...@asu.edu>.
I've drawn a blank here! Can't figure out what's wrong with the ports. I can
ssh between the nodes but can't access the DFS from the slaves; it says "Bad
connection to DFS". The master seems to be fine.
Mithila


Re: Map-Reduce Slow Down

Posted by Mithila Nagendra <mn...@asu.edu>.
Yes, I can.


Re: Map-Reduce Slow Down

Posted by Jim Twensky <ji...@gmail.com>.
Can you ssh between the nodes?
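
From the master to each slave without a password prompt, that is. A sketch,
with hostnames from your earlier mails:

ssh node19 'echo ok'

If that prompts for a password or hangs, the start scripts won't reach the
slaves either.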

-jim


Re: Map-Reduce Slow Down

Posted by Mithila Nagendra <mn...@asu.edu>.
Thanks Aaron.
Jim: The three nodes in my first cluster had Ubuntu running on them, and the
DFS was accessed at port 54310. The new cluster which I've set up has Red Hat
Linux release 7.2 (Enigma) running on it. Now when I try to access the DFS
from one of the slaves I get the following response: dfs cannot be accessed.
When I access the DFS through the master there's no problem. So I feel
there's a problem with the port. Any ideas? I did check the list of slaves;
it looks fine to me.

Mithila
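
If the DFS is reachable from the master but not from the slaves, one thing
worth double-checking (besides firewalls) is that fs.default.name in
conf/hadoop-site.xml names a host the slaves can actually resolve, rather
than localhost or 127.0.0.1. A minimal sketch of the relevant entry, with
"master-node" as a placeholder for the real master hostname:

  <property>
    <name>fs.default.name</name>
    <!-- must be a hostname/IP the slaves can reach, not localhost -->
    <value>hdfs://master-node:54310</value>
  </property>

Every node needs the same value here, so that the slaves contact the
namenode on the master instead of looking for one on themselves.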





Re: Map-Reduce Slow Down

Posted by Jim Twensky <ji...@gmail.com>.
Mithila,

You said all the slaves were being utilized in the 3 node cluster. Which
application did you run to test that, and what was your input size? If you
tried the word count application on a 516 MB input file on both cluster
setups, then some of your nodes in the 15 node cluster may not be running at
all. Generally, one map task is assigned to each input split, and if you are
running your cluster with the defaults, the splits are 64 MB each. I got
confused when you said the Namenode seemed to do all the work. Can you check
conf/slaves and make sure you put the names of all the task trackers there?
I also suggest comparing both clusters with a larger input size, say at
least 5 GB, to really see a difference.

Jim
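
For reference, conf/slaves is just a plain list of worker hostnames, one per
line; the start-up scripts on the master ssh into each of them to launch the
datanode and tasktracker daemons. A sketch with made-up hostnames:

  slave01
  slave02
  slave03

and so on, one line per slave (14 or 15 entries for the 15 node cluster,
depending on whether the master also doubles as a worker). Any machine
missing from that file never gets its daemons started by bin/start-dfs.sh
and bin/start-mapred.sh, and will simply sit idle.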


Re: Map-Reduce Slow Down

Posted by Aaron Kimball <aa...@cloudera.com>.
In hadoop-*-examples.jar, use "randomwriter" to generate the data and "sort"
to sort it.
- Aaron
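
Concretely, with the examples jar that ships in the Hadoop distribution,
that amounts to something like this (the output directory names here are
arbitrary):

  bin/hadoop jar hadoop-*-examples.jar randomwriter rand-data
  bin/hadoop jar hadoop-*-examples.jar sort rand-data sorted-data

While the sort runs, the JobTracker web UI (port 50030 by default) shows how
many map and reduce tasks are scheduled on each tasktracker, which makes it
easy to see whether all 15 nodes are participating.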


Re: Map-Reduce Slow Down

Posted by Pankil Doshi <fo...@gmail.com>.
Your data is too small, I guess, for 15 nodes. So it might be the overhead
of coordinating these nodes that is making your total MR jobs more time
consuming. I guess you will have to try with a larger set of data.

Pankil

Re: Map-Reduce Slow Down

Posted by Mithila Nagendra <mn...@asu.edu>.
Aaron

That could be the issue; my data is just 516 MB - but wouldn't this still
see a bit of speedup?
Could you guide me to the example? I'll run my cluster on it and see what I
get. Also, for my program I had a Java timer running to record the time
taken to complete execution. Does Hadoop have a built-in timer?

Mithila
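
As far as I know there is no standalone timer utility: the job client
prints counters when a job finishes and the JobTracker web UI shows start
and finish times per job, but wrapping the blocking job submission in a
plain Java timer works fine. A minimal sketch against the old mapred API
(the class name and job name below are made up for illustration):

  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;

  public class TimedJob {
    public static void main(String[] args) throws Exception {
      JobConf conf = new JobConf(TimedJob.class);
      conf.setJobName("timed-job");
      // ... set input/output paths and mapper/reducer classes here ...
      long start = System.currentTimeMillis();
      JobClient.runJob(conf);  // blocks until the job completes
      long elapsed = System.currentTimeMillis() - start;
      System.out.println("Job took " + elapsed + " ms");
    }
  }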


Re: Map-Reduce Slow Down

Posted by Aaron Kimball <aa...@cloudera.com>.
Virtually none of the examples that ship with Hadoop are designed to
showcase its speed. Hadoop's speedup comes from its ability to process very
large volumes of data (starting around, say, tens of GB per job, and going
up in orders of magnitude from there). So if you are timing the pi
calculator (or something like that), its results won't necessarily be very
consistent. If a job doesn't have enough fragments of data to allocate one
per node, some of the nodes will also just go unused.

The best example for you to run is to use randomwriter to fill up your
cluster with several GB of random data and then run the sort program. If
that doesn't scale up performance from 3 nodes to 15, then you've definitely
got something strange going on.

- Aaron
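
To put numbers on the "enough fragments" point for this particular thread:
with the default 64 MB split size, a 516 MB input yields ceil(516 / 64) = 9
splits, hence at most 9 map tasks. On a 15 node cluster, at least 6 nodes
get no map work at all, while the fixed per-job startup cost stays the
same, so a small input can easily run slower on the larger cluster.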

