Posted to common-user@hadoop.apache.org by Andy XUE <an...@gmail.com> on 2011/03/24 08:43:23 UTC

Hadoop Distributed System Problems: Does not recognise any slave nodes

Hi there:

I'm new to Hadoop and Nutch, and I am trying to run the crawler *Nutch*
on a distributed system powered by *Hadoop*. However, as it turns out,
the distributed system does not recognise any slave nodes in the cluster.
I've been stuck at this point for months and am desperate for a solution.
I would appreciate it if anyone would be kind enough to spend 10 minutes
of their valuable time to help.

Thank you so much!!


This is what I currently encounter:
==================================
In order to set up the Hadoop cluster, I followed the instructions
described in both of these documents:
        http://wiki.apache.org/nutch/NutchHadoopTutorial
        http://hadoop.apache.org/common/docs/current/cluster_setup.html

The problem is that, with a distributed file system (*HDFS* in Hadoop),
the files should be stored on both of the computers. However, all data in
HDFS, which is supposed to be replicated or distributed onto every computer
in the cluster, is only found on the master node. It is not replicated to
the other slave nodes in the cluster, which causes subsequent tasks such as
the *jobtracker* to fail. I've attached a jobtracker log file.

It worked fine when there was only one computer (the master node) in the
cluster and everything was stored on the master node. However, the problem
arises when the program tries to write files onto another computer (a slave
node). The weird part is that HDFS can create folders on the slave nodes but
not the files, so the HDFS folders on the slave nodes are all empty.
The web interfaces (http://materNode:50070 and http://materNode:50030),
which show the status of HDFS and the jobtracker, indicate that there is
only one active node (i.e., the master node); they fail to recognize any of
the slave nodes.

I use Nutch 1.2 and Hadoop 0.20 in the experiment.

Here are the things that I've done:
I followed the instructions in the aforementioned documentation. I created
users with identical usernames on multiple computers, which belong to the
same local network and have Ubuntu 10.10 installed. I set up passphrase-less
SSH keys for all computers, and experiments show that every node in the
cluster can *ssh* to another without requiring a password. I've shut down
the firewall with "*sudo ufw disable*". I've tried to search for solutions
on the Internet, but with no luck so far.
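
Roughly, the SSH and firewall setup was along these lines (the hostname
"slaveNode1" below is only a placeholder for a slave machine):

    # on the master: generate a passphrase-less key and copy it to each node
    ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
    ssh-copy-id slaveNode1
    ssh slaveNode1 hostname    # should log in and print the name, no password asked

    # on every machine: shut down the firewall
    sudo ufw disable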

I'd appreciate any help.

The Hadoop configuration files (*core-site.xml*, *hdfs-site.xml*,
*mapred-site.xml*, and *hadoop-env.sh*) and the log file with the error
message (*hadoop-rui-jobtracker-ss2.log*) are attached.
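
For anyone reading along without the attachments: the settings that matter
most for a multi-node setup are fs.default.name in core-site.xml and
mapred.job.tracker in mapred-site.xml, both of which should point at the
master rather than at localhost. A quick way to inspect them, as a sketch:

    grep -A1 fs.default.name    conf/core-site.xml
    grep -A1 mapred.job.tracker conf/mapred-site.xml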
==================================

Regards
Andy
The University of Melbourne

Re: Hadoop Distributed System Problems: Does not recognise any slave nodes

Posted by Harsh J <qw...@gmail.com>.
Also, is your Hadoop really under nutch/search, or is it under
nutch/search/hadoop-0.x.x? Set HADOOP_HOME to the exact directory that
Hadoop's files exist immediately under.
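
For example (paths below are only illustrative):

    # HADOOP_HOME must be the directory that directly contains bin/, conf/ and lib/
    export HADOOP_HOME=/path/to/nutch/search
    # or, if there is a nested release directory:
    export HADOOP_HOME=/path/to/nutch/search/hadoop-0.x.x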

-- 
Harsh J
http://harshj.com

Re: Hadoop Distributed System Problems: Does not recognise any slave nodes

Posted by Andy XUE <an...@gmail.com>.
Hi all:

Thanks for your help. The problem is finally solved.
The problem I encountered and the corresponding solution are described here:
http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A1
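
In case it helps anyone who lands on this thread with the same symptom: one
common culprit on Ubuntu machines is the node's own hostname resolving to a
loopback address in /etc/hosts, so the daemons never see each other. A quick
check, as a generic sketch (nothing below is specific to my machines):

    hostname
    getent hosts $(hostname)    # should print the machine's real LAN address,
                                # not a loopback entry such as 127.0.1.1
    # if it maps to 127.0.1.1, fix /etc/hosts so every node's hostname points
    # to its real network address, then restart the Hadoop daemons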

Regards
XUE, Yuan (Andy)
The University of Melbourne



Re: Hadoop Distributed System Problems: Does not recognise any slave nodes

Posted by modemide <mo...@gmail.com>.
I'm also new to Hadoop, but I was able to get my cluster up and
running.  I'm not familiar with Nutch, though.

In any case, my assumption is that Nutch relies on a working hadoop
cluster as the base and adds on a few configurations to integrate the
two.

Here are some things that might help you:
* Have you edited your slaves file to include the slave computer and
the masters file to include the jobtracker? (See the sketch after this list.)
* I also noticed that you are using OpenJDK for Java instead of Sun
Java.  I went with the Hadoop-recommended Java distribution.  Is there
any particular reason for using OpenJDK?
* I'll assume, because you said that files are supposed to be replicated
onto every computer, that you only have two computers operating as slaves?
* Do you have the configuration for your slaves done?  Can you attach
those files?  (the attachments worked perfectly for me, I can't visit
the paste sites at work unfortunately)
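
As a sketch only (hostnames below are made up; adjust them to your own
machines), those two files on the master would look something like this:

    # run from ${HADOOP_HOME} on the master
    printf 'masterNode\n'             > conf/masters
    printf 'masterNode\nslaveNode1\n' > conf/slaves   # list the master here only if it
                                                      # should also run DataNode/TaskTracker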


Hope that gets you started in the right direction.  Also, if it helps,
I went through these tutorials several times and found them much more
helpful.  Maybe it will also help you:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/



Re: Hadoop Distributed System Problems: Does not recognise any slave nodes

Posted by Harsh J <qw...@gmail.com>.
Hello,

Thanks for attaching the log.

On Thu, Mar 24, 2011 at 5:34 PM, Andy XUE <an...@gmail.com> wrote:
> and the log file with error
> message (*hadoop-rui-jobtracker-ss2.log <http://db.tt/PPGhEaa>*) are linked.

This is a case of
http://wiki.apache.org/hadoop/FAQ#What_does_.22file_could_only_be_replicated_to_0_nodes.2C_instead_of_1.22_mean.3F
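
A quick way to see whether any DataNodes have actually registered with the
NameNode (a generic sketch; run from your Hadoop directory on the master):

    bin/hadoop dfsadmin -report   # "Datanodes available" should match your slave count
    jps                           # on each machine, DataNode/TaskTracker should be listed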

-- 
Harsh J
http://harshj.com

Hadoop Distributed System Problems: Does not recognise any slave nodes

Posted by Andy XUE <an...@gmail.com>.
Hi there:

The Hadoop configuration files (*core-site.xml* <http://db.tt/co0q25s>,
*hdfs-site.xml* <http://db.tt/TSK7jA6>, *mapred-site.xml* <http://db.tt/8dJoUrp>,
and *hadoop-env.sh* <http://db.tt/FztxTEw>) and the log file with the error
message (*hadoop-rui-jobtracker-ss2.log* <http://db.tt/PPGhEaa>) are linked.

p.s.: Re: Harsh J: Thank you so much for your time and reply; I've uploaded
the configuration and log files as links. The *HADOOP_HOME* directory (i.e.,
*/home/rui/workspace/nutch/search/*) is where *bin/*, *conf/*, *lib/*, etc.
are located, and *start-all.sh* is located at *${HADOOP_HOME}/bin/start-all.sh*.
There is no separate directory for Hadoop; I believe it is integrated into
Nutch.


Regards
Andy
The University of Melbourne

Re: Hadoop Distributed System Problems: Does not recognise any slave nodes

Posted by Harsh J <qw...@gmail.com>.
Hello Andy,

The list forbids some attachments; could you paste your logs on any
available paste service and post a link back here?
http://paste.pocoo.org is a good one.

Your configuration looks alright for a homogeneous cluster setup. When
you say "does not recognize", do you mean that you have one live node
(the master) and the rest are all dead? Have you ensured that your
TaskTracker and DataNode services started successfully on all the slave
machines (as listed in the conf/slaves file)? Check the logs of any
service that does not start successfully - that should help you track
down the issue further.
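
As a generic sketch of that check (run on each slave from the Hadoop
directory; the log file names follow the usual hadoop-<user>-<daemon>-<host>
pattern):

    jps                                        # DataNode and TaskTracker should be listed
    tail -n 50 logs/hadoop-*-datanode-*.log    # startup errors show up here
    tail -n 50 logs/hadoop-*-tasktracker-*.log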

-- 
Harsh J
http://harshj.com