Posted to general@hadoop.apache.org by William Kang <we...@gmail.com> on 2010/03/17 07:43:50 UTC

Distributed hadoop setup 0 live datanode problem in cluster

Hi,
I just moved from a pseudo-distributed Hadoop setup to a four-machine fully
distributed Hadoop setup.

But after I start the DFS, no live nodes show up. If I make the master a
slave too, then the datanode on the master machine does show up.

I looked through all the logs and found no errors. The only thing that
looks suspicious is this log from the datanode:


************************************
2010-03-17 02:39:04,003 INFO org.apache.hadoop.ipc.RPC: Server at
/xx.xx.xx.xx:9000 not available yet, Zzzzz...
2010-03-17 02:39:06,064 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: /xx.xx.xx.xx:9000. Already tried 0 time(s).
2010-03-17 02:39:07,076 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: /xx.xx.xx.xx:9000. Already tried 1 time(s).
2010-03-17 02:39:08,081 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: /xx.xx.xx.xx:9000. Already tried 2 time(s).
2010-03-17 02:39:09,098 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: /xx.xx.xx.xx:9000. Already tried 3 time(s).
2010-03-17 02:39:10,159 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: /xx.xx.xx.xx:9000. Already tried 4 time(s).
2010-03-17 02:39:11,179 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: /xx.xx.xx.xx:9000. Already tried 5 time(s).
2010-03-17 02:39:12,221 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: /xx.xx.xx.xx:9000. Already tried 6 time(s).
2010-03-17 02:39:13,372 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: /xx.xx.xx.xx:9000. Already tried 7 time(s).
2010-03-17 02:39:14,545 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: /xx.xx.xx.xx:9000. Already tried 8 time(s).
2010-03-17 02:39:15,558 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: /xx.xx.xx.xx:9000. Already tried 9 time(s).
*************************************

Does anybody know what might cause this problem?

Passwordless SSH among these machines works fine. The owner of the hadoop
folder has been changed to the same hadoop user.
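The retry loop in the log above just means the datanode cannot open a TCP connection to the namenode's port. A standalone probe, run from each slave, can confirm whether that port is reachable at all. This is an illustrative sketch (the class name is hypothetical; substitute the master's hostname, and 9000 is the port from the log):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {
    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    public static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholders: pass the master's hostname; 9000 matches the log above.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9000;
        System.out.println(host + ":" + port + " reachable? "
                + isReachable(host, port, 2000));
    }
}
```

A `false` from a slave suggests a networking or binding problem (for example, the namenode listening only on a loopback address) rather than anything inside Hadoop itself.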

Thanks!


William

Re: when to send distributed cache file

Posted by Amogh Vasekar <am...@yahoo-inc.com>.
Hi Gang,
Yes, the time to distribute files is counted as part of the job's running time (more specifically, the setup time). The time is essentially for the TT to copy the files specified in the distributed cache to its local FS, generally from HDFS unless you have a separate FS for the JT. So in general you may see small time gains when the files to be distributed have a relatively high replication factor.
Wrt blocks, AFAIK, even on HDFS, if the file size < block size, the actual space consumed is the file size itself. The overhead is in storing metadata for that (small) file's block. So when you have the file on local disk, it will still consume only its actual size, not the block size.
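Amogh's last point is easy to verify outside Hadoop: a file written to the local FS consumes and reports its actual byte count, regardless of any HDFS block size. A minimal standalone sketch (class and method names are hypothetical):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class LocalFileSize {
    // Writes `bytes` zero bytes to a temp file and returns the size the local FS reports.
    public static long sizeOnLocalDisk(int bytes) throws IOException {
        File f = File.createTempFile("cache-demo", ".bin");
        f.deleteOnExit();
        try (FileOutputStream out = new FileOutputStream(f)) {
            out.write(new byte[bytes]);
        }
        return f.length();
    }

    public static void main(String[] args) throws IOException {
        // A 1 MB file reports 1 MB locally, not a 64 MB block.
        System.out.println("size on local disk: " + sizeOnLocalDisk(1 << 20) + " bytes");
    }
}
```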

Thanks,
Amogh


On 3/18/10 2:28 AM, "Gang Luo" <lg...@yahoo.com.cn> wrote:

Thanks Ravi.

Here are some observations. I ran job1 to generate some data used by the following job2, without replication. The total size of the job1 output is 25 MB, spread across 50 files. I used the distributed cache to send all the files to the nodes running job2 tasks. When job2 started, it stayed at "map 0% reduce 0%" for 10 minutes. When the job1 output is in 10 files (using 10 reducers in job1), the time spent here is 2 minutes.

So, I think the time to distribute cache files is actually counted as part of the total time of the MR job. And in order to send a cache file from HDFS to the local disk, it transfers at least one block (64 MB by default) even if that file is only 1 MB. Is that right? If so, how much space does that cache file take on the local disk, 64 MB or 1 MB?

-Gang




Hello Gang,
      The framework will copy the necessary files to the slave node before any tasks for the job are executed on that node.
I am not sure whether the time required to distribute the cache is counted in the MapReduce job time, but it is included in the job submission process in JobClient.
--
Ravi

On 3/17/10 11:32 AM, "Gang Luo" <lg...@yahoo.com.cn> wrote:

Hi all,
I am wondering when Hadoop distributes the cache files. Is it the moment we call DistributedCache.addCacheFile()? Will the time to distribute the cache files be counted as part of the MapReduce job time?

Thanks,
-Gang





Re: when to send distributed cache file

Posted by Gang Luo <lg...@yahoo.com.cn>.
Thanks Ravi.

Here are some observations. I ran job1 to generate some data used by the following job2, without replication. The total size of the job1 output is 25 MB, spread across 50 files. I used the distributed cache to send all the files to the nodes running job2 tasks. When job2 started, it stayed at "map 0% reduce 0%" for 10 minutes. When the job1 output is in 10 files (using 10 reducers in job1), the time spent here is 2 minutes.

So, I think the time to distribute cache files is actually counted as part of the total time of the MR job. And in order to send a cache file from HDFS to the local disk, it transfers at least one block (64 MB by default) even if that file is only 1 MB. Is that right? If so, how much space does that cache file take on the local disk, 64 MB or 1 MB?

-Gang



----- Original Message ----
From: Ravi Phulari <rp...@yahoo-inc.com>
To: "common-user@hadoop.apache.org" <co...@hadoop.apache.org>; Gang Luo <lg...@yahoo.com.cn>
Sent: 2010/3/17 (Wed) 3:52:24 PM
Subject: Re: when to send distributed cache file

Hello Gang,
      The framework will copy the necessary files to the slave node before any tasks for the job are executed on that node.
I am not sure whether the time required to distribute the cache is counted in the MapReduce job time, but it is included in the job submission process in JobClient.
--
Ravi

On 3/17/10 11:32 AM, "Gang Luo" <lg...@yahoo.com.cn> wrote:

Hi all,
I am wondering when Hadoop distributes the cache files. Is it the moment we call DistributedCache.addCacheFile()? Will the time to distribute the cache files be counted as part of the MapReduce job time?

Thanks,
-Gang


      

Re: when to send distributed cache file

Posted by Ravi Phulari <rp...@yahoo-inc.com>.
Hello Gang,
      The framework will copy the necessary files to the slave node before any tasks for the job are executed on that node.
I am not sure whether the time required to distribute the cache is counted in the MapReduce job time, but it is included in the job submission process in JobClient.
--
Ravi

On 3/17/10 11:32 AM, "Gang Luo" <lg...@yahoo.com.cn> wrote:

Hi all,
I am wondering when Hadoop distributes the cache files. Is it the moment we call DistributedCache.addCacheFile()? Will the time to distribute the cache files be counted as part of the MapReduce job time?

Thanks,
-Gang







when to send distributed cache file

Posted by Gang Luo <lg...@yahoo.com.cn>.
Hi all,
I am wondering when Hadoop distributes the cache files. Is it the moment we call DistributedCache.addCacheFile()? Will the time to distribute the cache files be counted as part of the MapReduce job time?

Thanks,
-Gang



      

Re: Austin Hadoop Users Group - Tomorrow Evening (Thursday)

Posted by Alexandre Jaquet <al...@gmail.com>.
Hi,

Please let me know if you will publish any documents, presentations,
videos, or other materials.

Thanks in advance

Alexandre Jaquet

2010/3/17 Stephen Watt <sw...@us.ibm.com>

> Hi Folks
>
> The Austin HUG is meeting tomorrow night. I hope to see you there. We have
> speakers from Rackspace (Stu Hood on Cassandra) and IBM (Gino Bustelo on
> BigSheets).
>
> Detailed Information is available at http://austinhug.blogspot.com/
>
> Kind regards
> Steve Watt

Austin Hadoop Users Group - Tomorrow Evening (Thursday)

Posted by Stephen Watt <sw...@us.ibm.com>.
Hi Folks

The Austin HUG is meeting tomorrow night. I hope to see you there. We have 
speakers from Rackspace (Stu Hood on Cassandra) and IBM (Gino Bustelo on 
BigSheets).

Detailed Information is available at http://austinhug.blogspot.com/

Kind regards
Steve Watt

Re: Distributed hadoop setup 0 live datanode problem in cluster

Posted by Steve Loughran <st...@apache.org>.
William Kang wrote:
> Hi Jeff,
> I think I partly found out the reasons of this problem. The /etc/hosts
> 127.0.0.1 has the master's host name in it. And the namenode took 127.0.0.1
> as the ip address of the namenode. I fixed it and I already found two nodes.
> There is one still missing. I will let you guys know what happened.
> Thanks.
> 

This is an ongoing issue with Ubuntu. It's a feature designed to help
laptops and the like, but it is useless when you want to bring up servers
that are visible from the outside and addressed by hostname, which is what
Hadoop and things like Java RMI expect.

see http://linux.derkeiler.com/Mailing-Lists/Ubuntu/2007-08/msg00681.html
http://ubuntuforums.org/showthread.php?t=432875
https://lists.ubuntu.com/archives/ubuntu-users/2008-December/168883.html
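Concretely, the usual culprit is a loopback line in /etc/hosts that carries the machine's real hostname. A before/after sketch (the LAN addresses are hypothetical; ubtserver01 is the hostname from the namenode log in this thread):

```
# Problematic /etc/hosts on the master: the real hostname on a loopback line
127.0.0.1   localhost
127.0.1.1   ubtserver01

# Corrected: bind the hostname to the machine's LAN address instead
127.0.0.1     localhost
192.168.1.10  ubtserver01
192.168.1.11  ubtserver02
```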


I've debated fixing this at a pre-Hadoop level in my code by adding checks
for the hostname and bailing out early if it maps to ::1 or 127.*.*.1,
because neither is useful off the specific host:
http://jira.smartfrog.org/jira/browse/SFOS-1184
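A standalone version of the check Steve describes might look like the following; this is a sketch in plain Java, with hypothetical class and method names, not code from SmartFrog or Hadoop:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostnameSanityCheck {
    // True if `host` resolves to a loopback address (127.0.0.0/8 or ::1).
    public static boolean resolvesToLoopback(String host) throws UnknownHostException {
        return InetAddress.getByName(host).isLoopbackAddress();
    }

    public static void main(String[] args) throws UnknownHostException {
        String hostname = InetAddress.getLocalHost().getHostName();
        if (resolvesToLoopback(hostname)) {
            // Other machines cannot reach a loopback address, so bail out early.
            System.err.println(hostname + " resolves to a loopback address; aborting.");
        } else {
            System.out.println(hostname + " resolves to a routable address.");
        }
    }
}
```

Daemons advertise themselves by resolved hostname, so a hostname that maps to a loopback address is only meaningful on the machine itself.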


Something could go into Hadoop directly, but either way hostname/DNS
problems can be fairly tricky to identify in a timely manner, let alone
handle.

-steve

Re: Distributed hadoop setup 0 live datanode problem in cluster

Posted by William Kang <we...@gmail.com>.
Hi Jeff,
I think I have partly found the cause of this problem. In /etc/hosts, the
127.0.0.1 entry had the master's hostname in it, so the namenode took
127.0.0.1 as its own IP address. I fixed it and two nodes have already
shown up. One is still missing; I will let you guys know what happens.
Thanks.


William

On Wed, Mar 17, 2010 at 3:14 AM, Jeff Zhang <zj...@gmail.com> wrote:

> Can you post your namenode's log ? It seems that your data node can not
> connect to the name node.
>
> On Wed, Mar 17, 2010 at 2:43 PM, William Kang <weliam.cloud@gmail.com
> >wrote:
>
> > Hi,
> > I just moved from pseudo distributed hadoop to a four machine full
> > distributed hadoop setup.
> >
> > But, after I start the dfs, there is no live node showing up. If I make
> > master a slave too, then the datanode in master machine will show up.
> >
> > I looked up all logs and found no errors. The only thing
> > looks suspicious  is the log in the datanode:
> >
> >
> > ************************************
> > 2010-03-17 02:39:04,003 INFO org.apache.hadoop.ipc.RPC: Server at
> > /xx.xx.xx.xx:9000 not available yet, Zzzzz...
> > 2010-03-17 02:39:06,064 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: /xx.xx.xx.xx:9000. Already tried 0 time(s).
> > 2010-03-17 02:39:07,076 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: /xx.xx.xx.xx:9000. Already tried 1 time(s).
> > 2010-03-17 02:39:08,081 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: /xx.xx.xx.xx:9000. Already tried 2 time(s).
> > 2010-03-17 02:39:09,098 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: /xx.xx.xx.xx6:9000. Already tried 3 time(s).
> > 2010-03-17 02:39:10,159 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: /xx.xx.xx.xx:9000. Already tried 4 time(s).
> > 2010-03-17 02:39:11,179 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: /xx.xx.xx.xx:9000. Already tried 5 time(s).
> > 2010-03-17 02:39:12,221 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: /xx.xx.xx.xx:9000. Already tried 6 time(s).
> > 2010-03-17 02:39:13,372 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: /xx.xx.xx.xx:9000. Already tried 7 time(s).
> > 2010-03-17 02:39:14,545 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: /xx.xx.xx.xx:9000. Already tried 8 time(s).
> > 2010-03-17 02:39:15,558 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: /xx.xx.xx.xx:9000. Already tried 9 time(s).
> > *************************************
> >
> > Does anybody know what might cause this problem?
> >
> > ssh among these machines are fine without password. The owner of hadoop
> > folder has been changed to the same hadoop user.
> >
> > Thanks!
> >
> >
> > William
> >
>
>
>
> --
> Best Regards
>
> Jeff Zhang
>

Re: Distributed hadoop setup 0 live datanode problem in cluster

Posted by William Kang <we...@gmail.com>.
Hi Jeff,
Here is the log from my namenode:


************************************************************/
2010-03-17 03:09:59,750 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ubtserver01/127.0.0.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build =
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r
911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
2010-03-17 03:09:59,903 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
Initializing RPC Metrics with hostName=NameNode, port=9000
2010-03-17 03:09:59,909 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at:
ubtserver01/127.0.0.1:9000
2010-03-17 03:09:59,912 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=NameNode, sessionId=null
2010-03-17 03:09:59,914 INFO
org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing
NameNodeMeterics using context
object:org.apache.hadoop.metrics.spi.NullContext
2010-03-17 03:09:59,979 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
fsOwner=cakang,cakang,adm,dialout,cdrom,plugdev,lpadmin,admin,sambashare
2010-03-17 03:09:59,980 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2010-03-17 03:09:59,980 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
isPermissionEnabled=true
2010-03-17 03:09:59,995 INFO
org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
Initializing FSNamesystemMetrics using context
object:org.apache.hadoop.metrics.spi.NullContext
2010-03-17 03:09:59,998 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
FSNamesystemStatusMBean
2010-03-17 03:10:00,042 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files = 1
2010-03-17 03:10:00,048 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files under construction = 0
2010-03-17 03:10:00,048 INFO org.apache.hadoop.hdfs.server.common.Storage:
Image file of size 96 loaded in 0 seconds.
2010-03-17 03:10:00,048 INFO org.apache.hadoop.hdfs.server.common.Storage:
Edits file /opt/hadoop/dfs/name/current/edits of size 4 edits # 0 loaded in
0 seconds.
2010-03-17 03:10:00,122 INFO org.apache.hadoop.hdfs.server.common.Storage:
Image file of size 96 saved in 0 seconds.
2010-03-17 03:10:00,391 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading
FSImage in 436 msecs
2010-03-17 03:10:00,393 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Total number of blocks
= 0
2010-03-17 03:10:00,393 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of invalid
blocks = 0
2010-03-17 03:10:00,393 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
under-replicated blocks = 0
2010-03-17 03:10:00,393 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
 over-replicated blocks = 0
2010-03-17 03:10:00,393 INFO org.apache.hadoop.hdfs.StateChange: STATE*
Leaving safe mode after 0 secs.
2010-03-17 03:10:00,393 INFO org.apache.hadoop.hdfs.StateChange: STATE*
Network topology has 0 racks and 0 datanodes
2010-03-17 03:10:00,393 INFO org.apache.hadoop.hdfs.StateChange: STATE*
UnderReplicatedBlocks has 0 blocks
2010-03-17 03:10:05,572 INFO org.mortbay.log: Logging to
org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
org.mortbay.log.Slf4jLog
2010-03-17 03:10:05,661 INFO org.apache.hadoop.http.HttpServer: Port
returned by webServer.getConnectors()[0].getLocalPort() before open() is -1.
Opening the listener on 50070
2010-03-17 03:10:05,663 INFO org.apache.hadoop.http.HttpServer:
listener.getLocalPort() returned 50070
webServer.getConnectors()[0].getLocalPort() returned 50070
2010-03-17 03:10:05,663 INFO org.apache.hadoop.http.HttpServer: Jetty bound
to port 50070
2010-03-17 03:10:05,663 INFO org.mortbay.log: jetty-6.1.14
2010-03-17 03:11:10,095 INFO org.mortbay.log: Started
SelectChannelConnector@0.0.0.0:50070
2010-03-17 03:11:10,095 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: Web-server up at:
0.0.0.0:50070
2010-03-17 03:11:10,097 INFO org.apache.hadoop.ipc.Server: IPC Server
Responder: starting
2010-03-17 03:11:10,098 INFO org.apache.hadoop.ipc.Server: IPC Server
listener on 9000: starting
2010-03-17 03:11:10,110 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 0 on 9000: starting
2010-03-17 03:11:10,113 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 2 on 9000: starting
2010-03-17 03:11:10,114 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 1 on 9000: starting
2010-03-17 03:11:10,153 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 4 on 9000: starting
2010-03-17 03:11:10,153 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 5 on 9000: starting
2010-03-17 03:11:10,160 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 9 on 9000: starting
2010-03-17 03:11:10,160 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 6 on 9000: starting
2010-03-17 03:11:10,160 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 7 on 9000: starting
2010-03-17 03:11:10,161 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 8 on 9000: starting
2010-03-17 03:11:10,170 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 3 on 9000: starting
2010-03-17 03:15:51,270 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.registerDatanode: node registration from 127.0.0.1:50010 storage
DS-37429514-127.0.0.1-50010-1268798883208
2010-03-17 03:15:51,274 INFO org.apache.hadoop.net.NetworkTopology: Adding a
new node: /default-rack/127.0.0.1:50010

Thanks for the replies.
I am looking forward to hearing from you.


William

On Wed, Mar 17, 2010 at 3:14 AM, Jeff Zhang <zj...@gmail.com> wrote:

> Can you post your namenode's log ? It seems that your data node can not
> connect to the name node.
>
> On Wed, Mar 17, 2010 at 2:43 PM, William Kang <weliam.cloud@gmail.com
> >wrote:
>
> > Hi,
> > I just moved from pseudo distributed hadoop to a four machine full
> > distributed hadoop setup.
> >
> > But, after I start the dfs, there is no live node showing up. If I make
> > master a slave too, then the datanode in master machine will show up.
> >
> > I looked up all logs and found no errors. The only thing
> > looks suspicious  is the log in the datanode:
> >
> >
> > ************************************
> > 2010-03-17 02:39:04,003 INFO org.apache.hadoop.ipc.RPC: Server at
> > /xx.xx.xx.xx:9000 not available yet, Zzzzz...
> > 2010-03-17 02:39:06,064 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: /xx.xx.xx.xx:9000. Already tried 0 time(s).
> > 2010-03-17 02:39:07,076 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: /xx.xx.xx.xx:9000. Already tried 1 time(s).
> > 2010-03-17 02:39:08,081 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: /xx.xx.xx.xx:9000. Already tried 2 time(s).
> > 2010-03-17 02:39:09,098 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: /xx.xx.xx.xx6:9000. Already tried 3 time(s).
> > 2010-03-17 02:39:10,159 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: /xx.xx.xx.xx:9000. Already tried 4 time(s).
> > 2010-03-17 02:39:11,179 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: /xx.xx.xx.xx:9000. Already tried 5 time(s).
> > 2010-03-17 02:39:12,221 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: /xx.xx.xx.xx:9000. Already tried 6 time(s).
> > 2010-03-17 02:39:13,372 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: /xx.xx.xx.xx:9000. Already tried 7 time(s).
> > 2010-03-17 02:39:14,545 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: /xx.xx.xx.xx:9000. Already tried 8 time(s).
> > 2010-03-17 02:39:15,558 INFO org.apache.hadoop.ipc.Client: Retrying
> connect
> > to server: /xx.xx.xx.xx:9000. Already tried 9 time(s).
> > *************************************
> >
> > Does anybody know what might cause this problem?
> >
> > ssh among these machines are fine without password. The owner of hadoop
> > folder has been changed to the same hadoop user.
> >
> > Thanks!
> >
> >
> > William
> >
>
>
>
> --
> Best Regards
>
> Jeff Zhang
>

Re: Distributed hadoop setup 0 live datanode problem in cluster

Posted by Jeff Zhang <zj...@gmail.com>.
Can you post your namenode's log? It seems that your datanode cannot
connect to the namenode.

On Wed, Mar 17, 2010 at 2:43 PM, William Kang <we...@gmail.com>wrote:

> Hi,
> I just moved from pseudo distributed hadoop to a four machine full
> distributed hadoop setup.
>
> But, after I start the dfs, there is no live node showing up. If I make
> master a slave too, then the datanode in master machine will show up.
>
> I looked up all logs and found no errors. The only thing
> looks suspicious  is the log in the datanode:
>
>
> ************************************
> 2010-03-17 02:39:04,003 INFO org.apache.hadoop.ipc.RPC: Server at
> /xx.xx.xx.xx:9000 not available yet, Zzzzz...
> 2010-03-17 02:39:06,064 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /xx.xx.xx.xx:9000. Already tried 0 time(s).
> 2010-03-17 02:39:07,076 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /xx.xx.xx.xx:9000. Already tried 1 time(s).
> 2010-03-17 02:39:08,081 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /xx.xx.xx.xx:9000. Already tried 2 time(s).
> 2010-03-17 02:39:09,098 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /xx.xx.xx.xx6:9000. Already tried 3 time(s).
> 2010-03-17 02:39:10,159 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /xx.xx.xx.xx:9000. Already tried 4 time(s).
> 2010-03-17 02:39:11,179 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /xx.xx.xx.xx:9000. Already tried 5 time(s).
> 2010-03-17 02:39:12,221 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /xx.xx.xx.xx:9000. Already tried 6 time(s).
> 2010-03-17 02:39:13,372 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /xx.xx.xx.xx:9000. Already tried 7 time(s).
> 2010-03-17 02:39:14,545 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /xx.xx.xx.xx:9000. Already tried 8 time(s).
> 2010-03-17 02:39:15,558 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /xx.xx.xx.xx:9000. Already tried 9 time(s).
> *************************************
>
> Does anybody know what might cause this problem?
>
> ssh among these machines are fine without password. The owner of hadoop
> folder has been changed to the same hadoop user.
>
> Thanks!
>
>
> William
>



-- 
Best Regards

Jeff Zhang
