Posted to user@nutch.apache.org by "Bolle, Jeffrey F." <jb...@mitre.org> on 2007/06/07 01:16:27 UTC

Hadoop oddity

In theory I have a cluster with 4 nodes.  When running something like
bin/slaves.sh uptime I get the desired results (all four servers
respond with their uptimes).  However, when I run a crawl only one
server, the host (which also acts as a slave), appears under the nodes
display.  This started after the primary server died and was rebuilt.
Has anyone experienced this before, or does anyone have any tips on
where to begin looking for the problem?  Thanks.
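
(For what it's worth, the set of datanodes the namenode actually sees
can also be listed from the command line, independent of the web
display; this assumes the stock Hadoop tooling that ships with Nutch:

  bin/hadoop dfsadmin -report

If only one node shows up there as well, the problem is in HDFS itself
rather than in the web UI.)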
 
Jeff

RE: Hadoop oddity

Posted by "Bolle, Jeffrey F." <jb...@mitre.org>.
I can ping from every machine to every machine.  And here is another
good one.  I switched the master to one of the other servers and I
still have the exact same problem: only the master machine shows up on
the machines list.

The fs.default.name variable is pointing to the fqdn of the namenode
and the mapred.job.tracker is also pointing to the fqdn of the master
server.
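
For reference, the relevant hadoop-site.xml entries look something like
this (the host name and ports below are placeholders, not the real
values):

  <property>
    <name>fs.default.name</name>
    <value>master.example.com:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>master.example.com:9001</value>
  </property>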

As I said, an oddity.

Jeff


-----Original Message-----
From: Dennis Kubes [mailto:kubes@apache.org] 
Sent: Friday, June 08, 2007 1:55 AM
To: nutch-user@lucene.apache.org
Subject: Re: Hadoop oddity

I was asking if you can ping the master from the slaves.  Can you hit 
the namenode from one or more of the remote datanodes?  If so, in the 
hadoop-site.xml files on the datanodes, is the namenode variable 
pointing to the fqdn of the namenode instead of localhost?

Dennis Kubes

Bolle, Jeffrey F. wrote:
> Everything pings fine and nslookups all come back normally.  The ssh
> connections work just fine, as the bin/slaves.sh program will run and I
> can check all of the uptimes remotely and everything. 
> 
> Looking at the logs there is nothing out of the ordinary.  I see Jetty
> come up on each of the nodes as well as the main server. Jetty says it
> is listening on 0.0.0.0:50070 for the namenode, 0.0.0.0:50060 for the
> tasktracker, 0.0.0.0:50030 for the jobtracker, and 0.0.0.0:50075 for
> the data node.  The datanode logs on all of the clients had a no route
> to host exception from earlier, but other than that there is nothing.
> In the task tracker logs everything looks normal with Jetty starting.
> 
> When running a hadoop fsck / I see that the blocks aren't being
> replicated to any of the servers (which makes complete sense with the
> idea that my master isn't communicating with any of the slaves).
> 
> In my slaves file there is one fqdn per line for each of the 4
> machines.  This file is the same on all 4 machines.  Any ideas on
> debugging this?
> 
> Jeff
> 
> 
> -----Original Message-----
> From: Vishal Shah [mailto:vishals@rediff.co.in] 
> Sent: Thursday, June 07, 2007 1:44 AM
> To: nutch-user@lucene.apache.org
> Subject: RE: Hadoop oddity
> 
> Hi Jeff,
> 
>    Can you also try an nslookup for the master from the slave nodes? Does
> that work properly? Also, it would be good to see the jobtracker and
> tasktracker logs.
> 
> -vishal.
> 
> -----Original Message-----
> From: Dennis Kubes [mailto:kubes@apache.org] 
> Sent: Thursday, June 07, 2007 9:58 AM
> To: nutch-user@lucene.apache.org
> Subject: Re: Hadoop oddity
> 
> The other things to check would be ability to ping from slave nodes, 
> correct fqdn in the slave nodes' hadoop-site.xml file, and correct dns
> setup for the master.
> 
> Dennis Kubes
> 
> Bolle, Jeffrey F. wrote:
>> The hosts file looks fine...still only showing 1 node.  
>>
>> Jeff
>>  
>>
>> -----Original Message-----
>> From: Dennis Kubes [mailto:kubes@apache.org] 
>> Sent: Wednesday, June 06, 2007 7:42 PM
>> To: nutch-user@lucene.apache.org
>> Subject: Re: Hadoop oddity
>>
>> If the hosts file on the namenode is not set up correctly it could be
>> listening only on localhost.  Make sure your /etc/hosts file looks 
>> something like this:
>>
>> 127.0.0.1	localhost localhost.localdomain
>> x.x.x.x		yourcomputer.domain.tld
>>
>> Dennis Kubes
>>
>> Bolle, Jeffrey F. wrote:
>>> In theory I have a cluster with 4 nodes.  When running something like
>>> bin/slaves.sh uptime I get the desired results (all four servers
>>> respond with their uptimes).  However, when I run a crawl only one
>>> server, the host (which also acts as a slave), appears under the nodes
>>> display.  This started after the primary server died and was rebuilt.
>>> Has anyone experienced this before, or does anyone have any tips on
>>> where to begin looking for the problem?  Thanks.
>>>  
>>> Jeff
>>>
> 

Re: Hadoop oddity

Posted by Dennis Kubes <ku...@apache.org>.
I was asking if you can ping the master from the slaves.  Can you hit 
the namenode from one or more of the remote datanodes?  If so, in the 
hadoop-site.xml files on the datanodes, is the namenode variable 
pointing to the fqdn of the namenode instead of localhost?
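
A quick way to check both from one of the datanodes (the master fqdn
below is only an example):

  ping -c 3 master.example.com
  grep -A 1 fs.default.name conf/hadoop-site.xml

The value should be the master's fqdn, not localhost or 127.0.0.1.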

Dennis Kubes

Bolle, Jeffrey F. wrote:
> Everything pings fine and nslookups all come back normally.  The ssh
> connections work just fine, as the bin/slaves.sh program will run and I
> can check all of the uptimes remotely and everything. 
> 
> Looking at the logs there is nothing out of the ordinary.  I see Jetty
> come up on each of the nodes as well as the main server. Jetty says it
> is listening on 0.0.0.0:50070 for the namenode, 0.0.0.0:50060 for the
> tasktracker, 0.0.0.0:50030 for the jobtracker, and 0.0.0.0:50075 for
> the data node.  The datanode logs on all of the clients had a no route
> to host exception from earlier, but other than that there is nothing.
> In the task tracker logs everything looks normal with Jetty starting. 
> 
> When running a hadoop fsck / I see that the blocks aren't being
> replicated to any of the servers (which makes complete sense with the
> idea that my master isn't communicating with any of the slaves).
> 
> In my slaves file there is one fqdn per line for each of the 4
> machines.  This file is the same on all 4 machines.  Any ideas on
> debugging this?
> 
> Jeff
> 
> 
> -----Original Message-----
> From: Vishal Shah [mailto:vishals@rediff.co.in] 
> Sent: Thursday, June 07, 2007 1:44 AM
> To: nutch-user@lucene.apache.org
> Subject: RE: Hadoop oddity
> 
> Hi Jeff,
> 
>    Can you also try an nslookup for the master from the slave nodes? Does
> that work properly? Also, it would be good to see the jobtracker and
> tasktracker logs.
> 
> -vishal.
> 
> -----Original Message-----
> From: Dennis Kubes [mailto:kubes@apache.org] 
> Sent: Thursday, June 07, 2007 9:58 AM
> To: nutch-user@lucene.apache.org
> Subject: Re: Hadoop oddity
> 
> The other things to check would be ability to ping from slave nodes, 
> correct fqdn in the slave nodes' hadoop-site.xml file, and correct dns
> setup for the master.
> 
> Dennis Kubes
> 
> Bolle, Jeffrey F. wrote:
>> The hosts file looks fine...still only showing 1 node.  
>>
>> Jeff
>>  
>>
>> -----Original Message-----
>> From: Dennis Kubes [mailto:kubes@apache.org] 
>> Sent: Wednesday, June 06, 2007 7:42 PM
>> To: nutch-user@lucene.apache.org
>> Subject: Re: Hadoop oddity
>>
>> If the hosts file on the namenode is not set up correctly it could be 
>> listening only on localhost.  Make sure your /etc/hosts file looks 
>> something like this:
>>
>> 127.0.0.1	localhost localhost.localdomain
>> x.x.x.x		yourcomputer.domain.tld
>>
>> Dennis Kubes
>>
>> Bolle, Jeffrey F. wrote:
>>> In theory I have a cluster with 4 nodes.  When running something like
>>> bin/slaves.sh uptime I get the desired results (all four servers
>>> respond with their uptimes).  However, when I run a crawl only one
>>> server, the host (which also acts as a slave), appears under the nodes
>>> display.  This started after the primary server died and was rebuilt.
>>> Has anyone experienced this before, or does anyone have any tips on
>>> where to begin looking for the problem?  Thanks.
>>>  
>>> Jeff
>>>
> 

RE: Hadoop oddity

Posted by "Bolle, Jeffrey F." <jb...@mitre.org>.
Everything pings fine and nslookups all come back normally.  The ssh
connections work just fine, as the bin/slaves.sh program will run and I
can check all of the uptimes remotely and everything. 

Looking at the logs there is nothing out of the ordinary.  I see Jetty
come up on each of the nodes as well as the main server. Jetty says it
is listening on 0.0.0.0:50070 for the namenode, 0.0.0.0:50060 for the
tasktracker, 0.0.0.0:50030 for the jobtracker, and 0.0.0.0:50075 for
the data node.  The datanode logs on all of the clients had a no route
to host exception from earlier, but other than that there is nothing.
In the task tracker logs everything looks normal with Jetty starting. 
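
(The no route to host errors may be worth chasing; they usually point
at a firewall rather than DNS.  From a slave, something like the
following shows whether the namenode's rpc port is reachable at all,
where the port is whatever fs.default.name specifies, 9000 being only
an example:

  telnet master.example.com 9000

If that hangs or is refused while ping works, iptables or a similar
filter on the master is the likely culprit.)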

When running a hadoop fsck / I see that the blocks aren't being
replicated to any of the servers (which makes complete sense with the
idea that my master isn't communicating with any of the slaves).
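
A more detailed view, assuming this Hadoop version supports the usual
fsck flags, is:

  bin/hadoop fsck / -files -blocks -locations

which lists, per file and block, the datanodes actually holding a
replica.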

In my slaves file there is one fqdn per line for each of the 4
machines.  This file is the same on all 4 machines.  Any ideas on
debugging this?

Jeff


-----Original Message-----
From: Vishal Shah [mailto:vishals@rediff.co.in] 
Sent: Thursday, June 07, 2007 1:44 AM
To: nutch-user@lucene.apache.org
Subject: RE: Hadoop oddity

Hi Jeff,

   Can you also try an nslookup for the master from the slave nodes? Does
that work properly? Also, it would be good to see the jobtracker and
tasktracker logs.

-vishal.

-----Original Message-----
From: Dennis Kubes [mailto:kubes@apache.org] 
Sent: Thursday, June 07, 2007 9:58 AM
To: nutch-user@lucene.apache.org
Subject: Re: Hadoop oddity

> The other things to check would be ability to ping from slave nodes, 
> correct fqdn in the slave nodes' hadoop-site.xml file, and correct dns
> setup for the master.

Dennis Kubes

Bolle, Jeffrey F. wrote:
> The hosts file looks fine...still only showing 1 node.  
> 
> Jeff
>  
> 
> -----Original Message-----
> From: Dennis Kubes [mailto:kubes@apache.org] 
> Sent: Wednesday, June 06, 2007 7:42 PM
> To: nutch-user@lucene.apache.org
> Subject: Re: Hadoop oddity
> 
> If the hosts file on the namenode is not set up correctly it could be 
> listening only on localhost.  Make sure your /etc/hosts file looks 
> something like this:
> 
> 127.0.0.1	localhost localhost.localdomain
> x.x.x.x		yourcomputer.domain.tld
> 
> Dennis Kubes
> 
> Bolle, Jeffrey F. wrote:
>> In theory I have a cluster with 4 nodes.  When running something like
>> bin/slaves.sh uptime I get the desired results (all four servers
>> respond with their uptimes).  However, when I run a crawl only one
>> server, the host (which also acts as a slave), appears under the nodes
>> display.  This started after the primary server died and was rebuilt.
>> Has anyone experienced this before, or does anyone have any tips on
>> where to begin looking for the problem?  Thanks.
>>  
>> Jeff
>>


RE: Hadoop oddity

Posted by Vishal Shah <vi...@rediff.co.in>.
Hi Jeff,

   Can you also try an nslookup for the master from the slave nodes? Does
that work properly? Also, it would be good to see the jobtracker and
tasktracker logs.
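
Something along these lines, run from each slave (the fqdn below is
hypothetical):

  nslookup master.example.com

The answer should match the address the master reports for itself.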

-vishal.

-----Original Message-----
From: Dennis Kubes [mailto:kubes@apache.org] 
Sent: Thursday, June 07, 2007 9:58 AM
To: nutch-user@lucene.apache.org
Subject: Re: Hadoop oddity

The other things to check would be ability to ping from slave nodes, 
correct fqdn in the slave nodes' hadoop-site.xml file, and correct dns setup 
for the master.

Dennis Kubes

Bolle, Jeffrey F. wrote:
> The hosts file looks fine...still only showing 1 node.  
> 
> Jeff
>  
> 
> -----Original Message-----
> From: Dennis Kubes [mailto:kubes@apache.org] 
> Sent: Wednesday, June 06, 2007 7:42 PM
> To: nutch-user@lucene.apache.org
> Subject: Re: Hadoop oddity
> 
> If the hosts file on the namenode is not set up correctly it could be 
> listening only on localhost.  Make sure your /etc/hosts file looks 
> something like this:
> 
> 127.0.0.1	localhost localhost.localdomain
> x.x.x.x		yourcomputer.domain.tld
> 
> Dennis Kubes
> 
> Bolle, Jeffrey F. wrote:
>> In theory I have a cluster with 4 nodes.  When running something like
>> bin/slaves.sh uptime I get the desired results (all four servers
>> respond with their uptimes).  However, when I run a crawl only one
>> server, the host (which also acts as a slave), appears under the nodes
>> display.  This started after the primary server died and was rebuilt.
>> Has anyone experienced this before, or does anyone have any tips on
>> where to begin looking for the problem?  Thanks.
>>  
>> Jeff
>>


Re: Hadoop oddity

Posted by Dennis Kubes <ku...@apache.org>.
The other things to check would be ability to ping from slave nodes, 
correct fqdn in the slave nodes' hadoop-site.xml file, and correct dns setup 
for the master.
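
Since bin/slaves.sh already reaches every machine, it can run those
checks on all four boxes at once; for example (the master fqdn and
install path below are hypothetical):

  bin/slaves.sh ping -c 1 master.example.com
  bin/slaves.sh grep -A 1 fs.default.name /opt/hadoop/conf/hadoop-site.xml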

Dennis Kubes

Bolle, Jeffrey F. wrote:
> The hosts file looks fine...still only showing 1 node.  
> 
> Jeff
>  
> 
> -----Original Message-----
> From: Dennis Kubes [mailto:kubes@apache.org] 
> Sent: Wednesday, June 06, 2007 7:42 PM
> To: nutch-user@lucene.apache.org
> Subject: Re: Hadoop oddity
> 
> If the hosts file on the namenode is not set up correctly it could be 
> listening only on localhost.  Make sure your /etc/hosts file looks 
> something like this:
> 
> 127.0.0.1	localhost localhost.localdomain
> x.x.x.x		yourcomputer.domain.tld
> 
> Dennis Kubes
> 
> Bolle, Jeffrey F. wrote:
>> In theory I have a cluster with 4 nodes.  When running something like
>> bin/slaves.sh uptime I get the desired results (all four servers
>> respond with their uptimes).  However, when I run a crawl only one
>> server, the host (which also acts as a slave), appears under the nodes
>> display.  This started after the primary server died and was rebuilt.
>> Has anyone experienced this before, or does anyone have any tips on
>> where to begin looking for the problem?  Thanks.
>>  
>> Jeff
>>

RE: Hadoop oddity

Posted by "Bolle, Jeffrey F." <jb...@mitre.org>.
The hosts file looks fine...still only showing 1 node.  

Jeff
 

-----Original Message-----
From: Dennis Kubes [mailto:kubes@apache.org] 
Sent: Wednesday, June 06, 2007 7:42 PM
To: nutch-user@lucene.apache.org
Subject: Re: Hadoop oddity

If the hosts file on the namenode is not set up correctly it could be 
listening only on localhost.  Make sure your /etc/hosts file looks 
something like this:

127.0.0.1	localhost localhost.localdomain
x.x.x.x		yourcomputer.domain.tld

Dennis Kubes

Bolle, Jeffrey F. wrote:
> In theory I have a cluster with 4 nodes.  When running something like
> bin/slaves.sh uptime I get the desired results (all four servers
> respond with their uptimes).  However, when I run a crawl only one
> server, the host (which also acts as a slave), appears under the nodes
> display.  This started after the primary server died and was rebuilt.
> Has anyone experienced this before, or does anyone have any tips on
> where to begin looking for the problem?  Thanks.
>  
> Jeff
> 

Re: Hadoop oddity

Posted by Dennis Kubes <ku...@apache.org>.
If the hosts file on the namenode is not set up correctly it could be 
listening only on localhost.  Make sure your /etc/hosts file looks 
something like this:

127.0.0.1	localhost localhost.localdomain
x.x.x.x		yourcomputer.domain.tld
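
After correcting the file, one way to confirm nothing is bound to
loopback is to check the listening sockets on the master (50070 is the
namenode's default web port; 9000 stands in for whatever port
fs.default.name specifies):

  netstat -tln | egrep '50070|9000'

A 127.0.0.1 address there means the slaves still cannot reach the
namenode; 0.0.0.0 or the machine's real IP is what you want.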

Dennis Kubes

Bolle, Jeffrey F. wrote:
> In theory I have a cluster with 4 nodes.  When running something like
> bin/slaves.sh uptime I get the desired results (all four servers
> respond with their uptimes).  However, when I run a crawl only one
> server, the host (which also acts as a slave), appears under the nodes
> display.  This started after the primary server died and was rebuilt.
> Has anyone experienced this before, or does anyone have any tips on
> where to begin looking for the problem?  Thanks.
>  
> Jeff
>